Exceptions in .NET can be very expensive when it comes to CPU cycles. Read how Geeks with Blogs (http://geekswithblogs.net) used Retrace to improve the performance of their application.
Thousands of software developers use Geeks with Blogs (GWB) to host their own personal blog. The code base has slowly morphed and has been modified by several developers over 15 years. As a free service, GWB decided to use Stackify to try improve the overall page load times and CPU usage with the hopes of also being able to use fewer servers to cut hosting costs and improve application performance and user experience. Here’s their story:
We installed Retrace on our web servers and configured log4net to send errors and logs to Stackify. This allowed us to get visibility to errors being logged by log4net, basic server monitoring, key .NET performance counters, and full code profiling details via Retrace. I have to admit, it was pretty amazing to see all of this data in one place.
By sending our log4net logs to Stackify we were able to immediately identify a couple simple bugs that needed to be fixed. Retrace’s page level performance tracking confirmed that two pages (the blog list and individual blog page view) were generating the vast majority of all the traffic to the site. We also found a couple places in our code where we could do a better job of caching database queries.
Out-of-the-box Stackify’s default monitoring included some .NET performance counters. One of which was the number of .NET CLR exceptions being thrown per minute. We noticed that the number was over 10,000 on each server but none of those errors were being logged via log4net. The exceptions were being caught and thrown away and we had no way to know what they were.
This shows total errors being about 400,000 to 700,000 every 30 minutes. Yikes!
It is very common for developers to make helper methods for common tasks like grabbing a column from a SqlDataReader and converting it to the correct type as shown below. This type of code seems innocent but can potentially cause huge performance problems as you will see below.
public DateTime? GetDate(SqlDataReader reader, string columnName) { DateTime? value = null; try { value = DateTime.Parse(reader[columnName].ToString()); } catch { } return value; }
This code can easily throw multiple different exceptions:
#2 and #3 are easy to avoid by refactoring the code as shown below. Checking for null and using TryParse eliminates the potential exceptions.
public DateTime? GetDate(SqlDataReader reader, string columnName) { try { var dbval = reader[columnName]; DateTime dt; if (dbval != DBNull.Value && DateTime.TryParse(dbval.ToString(), out dt)) { return dt; } } catch {} return null; }
The root cause of our performance problems was actually #1, trying to access a column name that was not available in the SqlDataReader. We have several web pages that do different types of searches of blog posts and they use different stored procedures. A long time ago someone made the decision to have one code path that took the stored procedure results and populate them in to a common object that all the pages used. This worked perfect except all the stored procedures returned slightly different columns. This caused a lot of exceptions to occur for the missing columns, but we had no idea that the exceptions even existed since our code did not log the exceptions.
The only way we were able to find these exceptions was from Retrace. Because it does code profiling within the CLR, it is able to get visibility to all exceptions that are thrown. So even though our code was swallowing the exceptions and not logging them via log4net, Stackify was able to give us visibility to them with no coding or logging changes. We were able to immediately find problems in a 15 year old application.
With a few code changes we were able to eliminate several exceptions that we had no idea existed until we used Retrace. This cut CPU usage and enabled us to scale down the # of web servers we were using which will save us monthly hosting costs as well as to improve overall application performance.
This chart shows how the number of errors has dramatically lowered
Page load times improved as well. It is amazing how much of an effect hidden exceptions had on overall page load time across the app. Before the changes, average load times were 500-800 milliseconds and have dropped to 250-500ms depending on site traffic.
This chart below shows request volume by user satisfaction over the same time period. You can see that the yellow sluggish requests have lessened.
Exceptions in .NET are expensive, even when your users don’t know they are happening. The primary cause of it is capturing the stack trace at the time the exception is thrown. The .NET CLR has to stop and walk back through the calling frames and this requires CPU cycles and blocks the thread. At Stackify we would suggest never using exceptions to control application logic flow and to use good defensive coding techniques to avoid issues like those which Geeks with Blogs ran into.
If you would like to be a guest contributor to the Stackify blog please reach out to stackify@stackify.com