The ability to view tag-level execution profiling (the amount of time spent on each tag in a request) is no longer a dream, and it opens up powerful new forms of debugging and performance tuning.
The Problem: Tools We Lack
Most CFML developers know that both ColdFusion and BlueDragon can be configured to show debugging info at the bottom of a page, which shows (among other things) the total time spent on a request and in each template within that request.
But how often have you wished you could see the time spent on each tag within the request? This would really help in determining where specific bottlenecks exist. While seeing that timing information for each individual tag would be powerful enough, we really need to start a higher level. What we need is a way to view all the requests against the server for a given time period, viewing their total execution time in aggregate, so we can identify just which requests are indeed the longest-running and therefore deserve our closer attention.
All these kinds of profiling/tuning tools have long been available to other developers, but not to us CFML developers. Users of BlueDragon, though, will soon have just this kind of tool in the BlueDragon CFML Profiler (scheduled to be released later this year), which has been designed to address this very kind of challenge.
Solving a Real Problem with the Profiler
Rather than just talk about the tool, let's instead dive right into using it to solve an actual problem. To help both identify the problem and confirm the result of code changes, I've used a load testing tool to run a small set of requests executing a subset of the Macromedia example applications that come with ColdFusion.
It's not important to discuss the load testing tool, nor will I bother detailing the particular requests I ran in the test. And I certainly don't mean to disparage the examples. They weren't written to be demonstrations of well-tuned code but rather very simple examples which beginning CFML developers could easily understand. Even so, they do represent rather typical practices that many developers employ which can hold up performance.
With the Profiler enabled, I ran a simple short load test to run through some of the example pages. (The details of how to start and configure the profiler may change between now and its final release, so I won't elaborate on how it's started now.)
The profiler is an extension to the BlueDragon engine, using a very low-overhead mechanism (aspects) to create a stream of profiling data available on a particular port, which can then be analyzed by a tool that retrieves the stream of profiling information.
Even though it's low-overhead and won't add much burden during profiling, the profiler can still generate a very large amount of information if enabled for too long a test (it's tracking every tag in every request while enabled). Profiling tools aren't really meant to be used to analyze full bore load tests. Again, I was just using a small sample generating multiple requests against several pages running for about 30 seconds.
The current version of the profiler is a web-based interface (HTML, generated from CFML), but a Flash-based version could be a powerful adaptation and is under consideration.
Viewing the Profile of All Requests
With the load test completed (or indeed even while it's running), I opened the web-based interface to view the requests currently being profiled. See Figure 1. The "get latest profiles" button caused the profiler to gather all the current profiling information being generated by the latest set of requests.
The profiler collects all the information since the BlueDragon engine has started, but I can clear the profiled requests at any time before running another set of tests, in order to focus only on those most recently run requests. I'll be doing this later, after performing some code changes based on the information presented from this first load test.
Since profiling may be generated for many requests (even hundreds in my small test), the interface provides for scrolling to show just 10 requests at a time. The list of requests can also be sorted by any of the available columns, and I have chosen to sort it by the "Render Time" column, listing the requests by total execution time per request.
Further, you'll notice that many of them are highlighted in yellow. That's not something I added to the screenshots for this article but rather it's what the profiler will do for you to help isolate the most troubling requests, based on a "filter" field available on the display. I've entered a value of 100 (milliseconds), but you can set it to whatever value you prefer depending on the kind of data shown in your profile display.
We can see that all the requests shown as being of greatest duration are for the page /cfdocs/exampleapps/personneldirectory/results.cfm. The load test ran many other requests, but these are the ones taking the most time, so that's where I can start to focus my tuning attention.
To be fair, the kind of information shown in Figure 1, viewing total time per request, can be obtained from other tools, including simple web server log analysis tools. Indeed, yet another tool offering this sort of aggregated CFML page request information (for both CFMX and BlueDragon/J2EE) is the excellent SeeFusion tool, available at http://seefusion.com.
Even so, there is at least one hint of the greater power of the profiler in that the display also shows the number of tags run within the request. Clearly that's not reported by a web server log analysis tool (nor even by SeeFusion, though it has great value as an adjunct to the profiler which I'll discuss later).
Drilling Down to View All Tags In The Request
At this point, most CFML developers would at least be delighted to be able to take this information and focus on the one page that's showing to cause the most execution time. They would open the code and start eye-balling to see what's up, but the BlueDragon Profiler can do much more for you!
You can click on a given request, and as Figure 2 shows it will drill down and view details for that request, displaying several interesting kinds of detail that CFML developers have simply never seen. First, notice that it displays all the tags that were executed in the request, and further it shows the number of times they were executed, how long they took in aggregate, as well as the average time spent per execution of that tag. Very cool!
Note as well that this information is across all templates executed within the request, meaning included files and CFML custom tags, whose names are all shown at the bottom of the display. That portion of the screen also shows how many times each of those files was used in the request as well as the number of tags in each file and the total time spent in each file.
The display of tags and their execution time also supports the same filter time, so we see in this particular example that the CFQUERY tag is clearly the longest running tag in the request-indeed it is the same time shown for the "template as a whole", so it's clearly the biggest thing the template does. (The highlighting of CFMODULE reflects an anomaly which will be resolved before the final release.)
Drilling Down to View All Executions of a Given Tag in the Request
Again, at this point one may be tempted to go edit the template and find the CFQUERY tag (it does say there's only one in the template), but we don't technically know which file it occurs in (there are four files making up this request). Still, most CFML developers would again happily rely on their traditional experience at this point to root through the files to find the tag and attack it. But they don't have to waste their time!
The profiler helps save that time and hassle in a couple more ways, providing still more powerful profiling information. While the CFQUERY tag is clearly our focus, let's take a moment instead to view what we could see about a tag that shows having been executed in several places within a template (or across many files within the request).
For instance, notice that the CFOUTPUT tag shows at the top of the list of Figure 2. The list is sorted by default by the "hit rate" or number of times a given tag is executed within the request (across all files). Even though it's not a performance concern, let's take a look at drilling down to view the details about all uses of CFOUTPUT (since there's only one occurrence of CFQUERY, which we'll come back to in a moment).
In Figure 3, we see that the 13 executions of CFOUTPUT are spread across two templates (layout.cfm and results.cfm), and notice further that we see the specific line number and number of times executed for that tag for each line (totaling to the 13 that were listed for CFOUTPUT at the top of Figure 2), as well as the time spent (min, max, total, and average) for that tag on that line in that file, across the entire request. Again, potentially powerful stuff!
But the CFOUTPUT tag doesn't show much in terms of the time values displayed (their values weren't even enough to register). I just wanted to diverge for a moment to just show this additional level of functionality available in the profiler. It could come in handy in other circumstances.
Indeed, we could drill down one more time into any one of those files listed (clicking on the file name) to see a further detail display for just that file (for this request), showing the same columns but listing all the tags in that one file. In the interest of time, I'll skip showing that screen.
Going back to the CFQUERY tag identified in Figure 2, if we were to drill down on that page to view the details about the use of CFQUERY across all files in the given request, it would show the details in Figure 4.
Notice that there's a slight (and useful!) difference in Figure 4 compared to Figure 3. Since it is a CFQUERY being displayed, the profiler goes a step further and shows the specific SQL statement associated with that CFQUERY on that line. Nice.
Displaying the Selected Tag's Source Page Context
So at this point we know that we have a rather long-running query at line 34 of results.cfm. Again most CFML developers would be itching to open their editor at this point to get to work on the code! But you may be tickled to learn that if we were still doing analysis at this point (or didn't have ready access to the source, such as on a remote box), we don't have to open the template in an editor to view the code.
The profiler itself offers a simple and useful display of the given tag, in the context of its source. Notice the § symbol displayed to the left of the filename shown in Figure 4. That acts as a hyperlink (different from clicking on the filename) which when clicked will display the CFML for that file, highlighting specifically the line number selected. Figure 5 shows how it opens to displaying line 34, as expected.
Turning to the Code to Tweak and Tune: Attempt 1
So, that's enough of the overview of how to use and understand the profiler displays. Let's use the information we've seen to now actually tune this application and review the results with the profiler. Even without opening an editor, we see in Figure 5 that there are a couple of issues with the CFQUERY and its SQL. Can you spot the possible opportunities for improvement?
Even without too much consideration, it's reasonable to question the use of "select *", which seem a shortcut but really asks the database to gather all the columns in the given table, brings that across the database connection (and network, if the DB is on another box), and then causes the CFML engine to store the results in memory as a recordset. If there are any columns we don't really use, that could be a lot of extra, needless work.
We can scroll down in the display of Figure 5 (or in the editor, if we've gone ahead to open the template), and see that the only fields used in the request are firstnname, lastname, and phone. Would changing the Select to just return those columns make a difference? We could look at the page's debugging output to see what other columns are gathered by the Select * (or open the database or a database viewing tool) to determine if there are enough more columns to justify changing from Select *. But with a powerful profiling tool (and existing load test scripts) at hand, it's easy enough to go ahead and try the change and see what impact it makes.
To clear out the current profiling details from the previous load test, we can click on the "Requests Avail." link that's displayed on the left of all screens, which will return us to the top list (as in Figure 1). At the bottom right below the list of requests is an available "clear all" link which will clear the cache of profiling information. I then made the change to the CFML code and reran the load test. The result of then hitting "get latest profiles" is shown in Figure 6.
Ah! We clearly see a change. The number of requests that exceed the 100 ms filter has clearly decreased. That's a good sign, and for such a small change! But notice that we still see the result.cfm page showing up consistently at the top of the list as the longest running requests. Again, I know my test ran many other pages. We may be happy at this point, but what else might we readily try?
Lest some quibble about the query taking more than 100 MS, let's not be too concerned. What I got during my test on my machine - loaded with other processes including the load tester itself, and even if compared to running the code on CF on my box - isn't really significant here. What matters is whether things on this setup can be improved by making changes to the code.
Tweaking and Tuning: Attempt 2
Again, take a look at the CFML and SQL in the display of Figure 5. If you could also look at the output of the page, you'd find it's a display of information about employees in a company. Hmm. If this isn't a big company, they probably aren't adding new employees all that often. Could it be that they haven't added one in hours? days? weeks? months?
This sounds like an excellent place to consider query caching. (Query caching was introduced in CF 4 yet many CFML developers still don't know about it. See Ben Forta's excellent introductory article from 1999, "Caching in on Performance", available at http://coldfusion.sys-con.com/read/41490.htm, as well as the Macromedia documentation.)
If you're new to query caching, you may worry that because the WHERE clause in the SELECT statement changes based on different incoming form data, that it may not make sense to do caching (you wouldn't want users to see the same data as a previous search for other employees). That's a common misunderstanding. The CFML engine (BlueDragon or CF) will cache based on each unique generated SQL statement. Again, see the docs for more information.
(Savvy developers may have noticed that the CFQUERY being analyzed also failed to use CFQUERYPARAM, where it used of form variables as the value to be searched in the WHERE clause. This could be both a security and an additional performance problem. But ColdFusion doesn't let you use CFQUERYPARAM in a cached query, so you'd have to choose one or the other. Clearly, you could test each approach to compare the performance improvement and then weigh the security concern against performance benefit. If you're using BlueDragon, though, I'm happy to report that you don't need to choose. You could do both, as it doesn't impose that limitation. But that's fodder for another article!)
So how long should we cache the results? Maybe there's a concern about keeping the display data up to date because the employee records can be edited elsewhere in the system. Even caching for just a short amount of time, like 15 minutes (or even a few minutes), can make for a performance boost. Again, we're in a perfect position to try it and see.
So I modified the CFQUERY to cache its results for 15 minutes by adding the attribute, CACHEDWITHIN="#CreateTimeSpan(0,0,15,0)#". I again cleared the profiler data and reran the load test. The results are shown in Figure 7.
Alright! We've made considerable progress. The personneldirectory/results.cfm request is not only listed once, but even that is no longer taking enough time to be highlighted by the filter. Indeed, we see only one remaining file that exceeds our filter. We could attack that next, but notice also that we see a whole new list of other page requests other than the results.cfm page, so clearly we've removed a significant bottleneck. We could also turn our attention to them. As is always the case, tuning is an iterative process.
Conclusion: The Profiler Made Short Work of Tuning
Clearly, the Profiler has been a powerful tool. It's made quick work first of identifying the longest running requests for a given period of time, then we were able to drill down to view the longest-running tag(s) within a given long-running request. Further, we could then drill down to view the uses of that tag across all files in that long-running request. Finally, we could even choose to look at the CFML source to view a selected long-running tag in context of its source.
It took a while to explain it all, but clearly in normal operation this could have taken just minutes to possibly identify and resolve a key bottleneck. This was a query problem, and I mentioned SeeFusion earlier which also has a powerful feature for specifically identifying query bottlenecks. I do recommend you check it out. Even so, I could just as well have found a problem with something other than a query problem. In any case SeeFusion wouldn't have targeted the specific line of code at issue. There's certainly a place for both tools.
If you're looking forward to this powerful profiling capability, I'll just remind you again that it's a BlueDragon-only feature, scheduled to be released later this year (the interface and feature set are subject to change). We already have ideas of still more profiling information we'd like to show (memory used for the objects in the page, and more.) Keep an eye on us.
The best place to learn about BlueDragon and its advantages is to visit the web site, www.newatlanta.com/bluedragon/, which has ample self-help information. To hear when the profiler is released, join the BlueDragon Interest list (www.newatlanta.com/products/bluedragon/ self_help/archive_search/index.cfm), a mailing list staffed by our engineers and customers.