At TrackAbout, we already do a lot of system performance measurement. We’re about to start doing even more. Database performance plays a large role in the overall performance of the TrackAbout service. In this blog series, we will discuss a new approach to measuring and monitoring the performance of queries against our Microsoft SQL Server relational database environment.
The performance of our software is an important part of our end-user experience. We encourage all developers to adopt the philosophy that performance is a feature. Apps and services that perform well yield many benefits. At its simplest:
- Speed makes users happy
- Efficiency keeps costs down by requiring less hardware to do the same work, which makes it easier to scale up our company
In technology, many performance problems can be solved by throwing money at them. That generally boils down to either (1) upgrading or adding hardware and infrastructure or (2) investing engineering time to improve the code. It's not always clear which approach yields the best price/performance. First, you need to know where the bottlenecks are. That means measurement.
Ultimately, we intend to define internal Service Level Agreements (SLAs) for query performance to which our engineers will strive to adhere. We do not know yet what our target SLA numbers are going to be. We will have a clearer idea once we start gathering and analyzing current performance data for our queries.
Understanding why certain queries are slow will enable us to better educate our developers as to how to avoid creating performance problems in the first place.
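As a starting point for that baseline data, SQL Server's plan-cache DMVs already track per-statement execution statistics. The sketch below (written against SQL Server 2008 R2) pulls the most expensive cached statements by average elapsed time; note that `sys.dm_exec_query_stats` only reflects plans currently in the cache, so it is a first approximation rather than a complete history.

```sql
-- Sketch: top cached statements by average elapsed time.
-- Times from sys.dm_exec_query_stats are in microseconds.
SELECT TOP (20)
    qs.execution_count,
    qs.total_elapsed_time / qs.execution_count AS avg_elapsed_us,
    qs.total_logical_reads / qs.execution_count AS avg_logical_reads,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset
              WHEN -1 THEN DATALENGTH(st.text)
              ELSE qs.statement_end_offset
          END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY avg_elapsed_us DESC;
```

Capturing snapshots like this over time is what will let us set realistic SLA targets rather than guessing.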
We are open-sourcing all the code we develop as part of this effort. You can find it on GitHub in our sql-perf-monitoring repository.
We are in the process of migrating our storage from one SAN to another. We are in the enviable position of having both SANs attached to our Microsoft SQL Server 2008 R2 SP1 cluster at the same time. Our SQL Server instance is not virtualized.
I began working on a process to migrate all our databases with as little downtime as possible. I imposed the “as little downtime as possible” requirement on myself to see if it was possible to migrate this data with no downtime whatsoever. I time-boxed my efforts to about 3 days of intermittent work (when I could steal the time). Technically, our customers would be fine with a little bit of downtime outside of business hours, but I wanted to challenge myself and see what I could come up with.
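One building block for a zero-downtime move, since both SANs are visible to the cluster at once, is SQL Server's ability to drain a data file into other files in the same filegroup while the database stays online. The sketch below uses hypothetical database, file, and path names; also note that `EMPTYFILE` cannot fully drain the primary data file or the transaction log, which need a different approach (e.g. `ALTER DATABASE ... MODIFY FILE` plus a brief offline step).

```sql
-- Sketch (hypothetical names and paths): move a secondary data file
-- to the new SAN while the database remains online.

-- 1. Add a new file on the new SAN to the same filegroup.
ALTER DATABASE MyDb
    ADD FILE (NAME = MyDb_Data2, FILENAME = N'N:\NewSAN\MyDb_Data2.ndf')
    TO FILEGROUP [SECONDARY];

-- 2. Drain the old file; its pages migrate to other files
--    in the filegroup (i.e., onto the new SAN).
DBCC SHRINKFILE (MyDb_Data1, EMPTYFILE);

-- 3. Remove the now-empty file from the old SAN.
ALTER DATABASE MyDb REMOVE FILE MyDb_Data1;
```

The trade-off is that `EMPTYFILE` moves data page by page, so it is slow and I/O-heavy; it trades elapsed time for availability.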