In this blog post, I wanted to simulate a typical problem that projects big and small encounter when developers have no idea how their queries perform. There are projects where developers give no regard to the performance of what they are building. Some just code without knowing exactly when their work will start to blow up in their faces.
Quite often, software development projects that rely on outsourced manpower experience euphoria right after they stamp the project as 100% done, only to run into trouble once the coders responsible for many of the problems are long gone and off somewhere starting on another mess. Most likely, those coders were never told about the problems they left behind. A lucky few may have been retained and were still around when the problems started to manifest.
The Experiment
So when exactly do our applications start to show some sluggishness?
Here are the results (the setup is explained below):

|               | Dev/Testing | After 15 days | After 30 days | After 180 days |
| _sales_office | 9235        | 29202         | 59877         | 337076         |
| _sales        | 119290      | 378058        | 776146        | 4376480        |
| Qry1          | 1           | 4             | 8             | 48             |
| Qry2          | <1          | <1            | <1            | 4              |
| Qry3          | <1          | <1            | <1            | <1             |
| Qry4          | <1          | <1            | <1            | <1             |
| Qry5          | 2           | 6             | 13            | 73             |
| Qry6          | 2           | 5             | 11            | 66             |
| Qry7          | <1          | <1            | <1            | <1             |
| Qry8          | <1          | <1            | <1            | 2              |
| Qry9          | <1          | <1            | <1            | 5              |

NOTE: the _sales_office and _sales rows show record counts at each stage; the Qry rows show query execution times in seconds.
To answer this question, I conducted an experiment using a very simple two-table application with very few columns. Only the primary and foreign keys were indexed
upon creation; the rest of the columns were not indexed. The application records the quantities of cameras sold by every salesman across a fixed number of regions. I then built a small utility that would inject daily sales records into the tables. I tried to have the records inserted randomly to be as realistic as possible.
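Roughly, the schema and the daily loader looked something like the sketch below. This is only a sketch: column names other than those used in the queries (such as sales_pk), the data types, and the per-day row counts are illustrative, not the exact code used in the experiment.

CREATE TABLE _sales_office (
    sales_office_pk INT IDENTITY(1,1) PRIMARY KEY,
    datesold        DATE        NOT NULL,
    region          VARCHAR(50) NOT NULL,
    salesman        VARCHAR(50) NOT NULL
);

CREATE TABLE _sales (
    sales_pk        INT IDENTITY(1,1) PRIMARY KEY,
    sales_office_pk INT NOT NULL
        FOREIGN KEY REFERENCES _sales_office (sales_office_pk),
    brand           VARCHAR(50) NOT NULL,
    quantity_sold   INT         NOT NULL
);

-- One simulated day of randomized data; the utility repeats this for every day.
DECLARE @d DATE = '20110101';
DECLARE @i INT = 0, @j INT = 0, @office INT;

WHILE @i < 2000            -- roughly matches the daily growth of _sales_office in the results
BEGIN
    INSERT INTO _sales_office (datesold, region, salesman)
    VALUES (@d,
            'Region'   + CAST(ABS(CHECKSUM(NEWID())) % 10  AS VARCHAR(10)),
            'Salesman' + CAST(ABS(CHECKSUM(NEWID())) % 100 AS VARCHAR(10)));
    SET @office = SCOPE_IDENTITY();

    SET @j = 0;
    WHILE @j < 13          -- about a dozen detail rows per header row, as in the results
    BEGIN
        INSERT INTO _sales (sales_office_pk, brand, quantity_sold)
        VALUES (@office,
                'Brand' + CAST(ABS(CHECKSUM(NEWID())) % 5 AS VARCHAR(10)),
                ABS(CHECKSUM(NEWID())) % 20 + 1);
        SET @j = @j + 1;
    END;

    SET @i = @i + 1;
END;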
I then prepared at least eight very simple queries that I could run against the database once it was populated, to see how the code performs. The queries are a mix of things anyone often encounters in a project that has manifested some performance problems.
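One common way to capture elapsed times like those in the results table is SET STATISTICS TIME (any equivalent timing method works just as well):

SET STATISTICS TIME ON;

SELECT COUNT(*) FROM _sales;   -- Q3, as an example

SET STATISTICS TIME OFF;
-- The Messages tab then reports the CPU time and elapsed time in milliseconds for each statement.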
I then prepared the database to be filled with data simulating typical stages of the application's lifecycle, starting from the point when everything is very fast (usually the development and testing stage). I prepared 4 stages so I could see how things progress in terms of processing speed.
I started with the dev/testing stage. Here I wanted to know how fast things are at this point, assuming that, like most projects, no load testing is conducted. As you can see in the results, all queries ran very fast, even the most absurd ones like queries #5 and #6.
Then I emptied the database to signify that the next records to be inserted would all be production records. I wanted to know the speed of things after 15 days. This is usually when everybody tends to be happy knowing that their brand new system is 100% running, and running very FAST. After 15 days, as you can see in the results, things are almost as fast as they were during development. Even the queries from hell (#5 and #6) are more than acceptable. Once in a while I still hear stories of management asking their developers for a list ten thousand items long. In this example, it is a 300k-record list. That looooong! Nevertheless, after 15 days, everybody is expected to be wearing their happy faces.
I also wanted to know how things are after a month of operation. Usually, at this stage, top management celebrates the success of their newly implemented system as things run smooth and pretty. Then they would probably let go of some key contractual personnel, under the impression that their brand new system is running fine and, of course, wanting to cut expenses. No matter how absurd the query, a 700k-record result set still comes back in 13 seconds. Who could complain about that volume and speed?
6 Months Later
Fast forward to 180 days: everybody must be wondering what hit them, and the blamestorming game begins. Those who are left on the project are probably overwhelmed by complaints from non-techies about why things got so slow. The non-techies, for their part, are wondering how that can be when everything was just bought and built 6 months ago. Everything was so brand new.
The Queries
The following are the queries used in the experiment, so you have an idea of how things perform over time. The queries are very simple. #5 and #6 were created to make things obvious, as there are still people who are not aware that those kinds of queries are just plain absurd no matter how badly management needs them (e.g. a report containing a list of everything).
--Q1
SELECT * FROM _sales

--Q2
SELECT * FROM _sales_office

--Q3
SELECT COUNT(*) FROM _sales

--Q4
SELECT COUNT(*) FROM _sales_office

--Q5
SELECT * FROM _sales T1 JOIN _sales_office T2 ON T1.sales_office_pk = T2.sales_office_pk

--Q6
SELECT
    T2.datesold, T2.region, T2.salesman, T1.brand, T1.quantity_sold
FROM
    _sales T1
    JOIN _sales_office T2 ON T1.sales_office_pk = T2.sales_office_pk

--Q7 GENERATE YESTERDAY'S REPORT
SELECT
    T2.datesold, T2.region, T2.salesman, T1.brand, T1.quantity_sold
FROM
    _sales T1
    JOIN _sales_office T2 ON T1.sales_office_pk = T2.sales_office_pk
WHERE
    T2.datesold = (SELECT MAX(datesold) FROM _sales_office)

--Q8 GENERATE AGGREGATED REPORT
SELECT
    T2.datesold, T2.region, T2.salesman, SUM(T1.quantity_sold) AS 'qty_sold'
FROM
    _sales T1
    JOIN _sales_office T2 ON T1.sales_office_pk = T2.sales_office_pk
WHERE
    T2.datesold = (SELECT MAX(datesold) FROM _sales_office)
GROUP BY
    T2.datesold, T2.region, T2.salesman
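Though it was not part of this experiment, it is worth noting that Q7 and Q8 both compute MAX(datesold) and then filter on it. With only the primary and foreign keys indexed, every run has to scan the entire _sales_office table, so their cost keeps growing with the data. A database-aware developer would typically back them with an index along these lines (the index name and the INCLUDE list below are just one way to do it):

CREATE NONCLUSTERED INDEX IX_sales_office_datesold
    ON _sales_office (datesold)
    INCLUDE (region, salesman);

With that in place, finding MAX(datesold) becomes a quick seek on the index instead of a full scan, and the daily report queries stay flat as the table grows.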
Conclusion
There are lots of things everyone can learn from this experiment. It is very important that someone in any project knows what they are doing and what they are up against. This is especially essential now that databases are taken for granted and most of the focus is given to the application side of things. Someone has to know their databases.
**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com