This has been perhaps written quite a number of times by a lot of TSQL gurus out there but earlier I was presented with a TSQL code that prompted me to recheck my notions on subqueries, table variables and temporary tables. The TSQL code presented to me is something built dynamically that depends heavily on user inputs. The issue presented was performance. The code used to run acceptably fine until earlier when things are annoyingly slow.
Upon seeing the code, I said to myself, problem solved. Instantly, I thought I was able to spot what is causing the performance issue. The code was written using TSQL’s temporary tables. Based from experience, I would automatically shoot down any attempt to use temporary tables when executing a TSQL batch. And though Microsoft has introduced table variables as a better alternative to temporary tables, better in terms of technicality and programmability, delving with the two options would always be my last resort. In most cases, I’d rather tackle difficult scenarios requiring handling of temporal data using subqueries. In fact, I immediately ruled a rewrite of the code to minimize the use of temporary tables, and at least replace it with table variables which were supposed to be a lot better. But upon further investigation, I ruled a rewrite because of some other things not related to the topics at hand here. On the other hand, am trying to figure out how things can be converted using subqueries which I would primarily prefer.
In my investigation though, something interesting came up and I thought I’d better time the executions of my experiment while I make the templates of how things are done using subqueries. The results now intrigues me a bit, and probably would take a second look at the code I was shown earlier. I might be looking at the wrong place. Nevertheless, the results of my experiment might interest others out there. So here it is:
The Experiment
I always love to use one of our company’s product, the library system, to test my experiments with anything to do with TSQL as its database is as diverse in design and available test data is in great abundance. I picked one small database sample for my experiment and this is how it went:
- Test 1 is intended to check how different approaches fare on small sets of data. I have to query all library materials that were written in ‘Filipino’ (which would result into 974 rows out of the 895701 rows in a single table). Based on the result, I have to search for a string pattern “bala” in all of the materials’ titles using the LIKE operator which would eventually result to 7 rows.
- Test 2 is basically the same query against the same environment with different query conditions to see how the three approaches fare on large sets of data. So I queried all library materials that were written in ‘English’ (which would result into 43896 rows out of the 895701 rows). Based on the result, I have to search for a string pattern “life” in all of the materials’ titles using the LIKE operator which would eventually result to 440 rows.
I wrote three queries that each would use a specific approach to query my desired output in this experiment.
First query using SUB QUERIES:
Second query uses TABLE VARIABLES:
The third and last query uses TEMPORARY TABLES:
I run all three queries (3x each) after a service restart to clean-up tempdb and I got the following results. Figures are presented in the number of seconds:
Total Rows | Temporal Data Count | Final Result | Sub Queries | Table Variables | Temporary Tables | |
Test 1 | 895701 rows | 974 rows | 7 rows | 2 secs | 4 secs | 2 secs |
Test 2 | 895701 rows | 43896 rows | 440 rows | 3 secs | 23 secs | 4 secs |
The results pretty much proved interesting. I thought, based on the queries I wrote that temporary tables would be slightly slower of the three approaches. The execution plan for table variables and temporary tables are pretty much the same. I was a bit surprised that the execution time of the table variable is almost twice that of a temporary table on small sets of data but temporary tables trumps table variables by a large margin in large set of data. I was expecting a slight difference only as, though both are a bit different in terms of scoping and technicality, and to some extent purpose, both uses TEMPDB as their memory (contrary to the notion that table variables use in-memory) when handling data. What is notable though is that, disk I/O is costlier with Table Variables. I am not sure at this point what causes this cost. Will try to dig deeper when I have more time.
Nevertheless, this particular experiment does not provide a general conclusion that one of the three approaches is better than among the three. I still subscribe to the general notions:
- That sub queries are neat and would perform better in most if not all cases
- That temporary tables would be slower on environments with heavy concurrency as sql server would handle more locks especially when using it within a transaction. I have not tested this scenario here though.
- That temporary tables may be a faster solution if constraints/indexes are needed in between queries, especially when a large set of data is expected, statistics on table variables columns arent created and it can’t have indexes. But surprisingly, this experiment shows Table Variables performed poorly among the three with handling large data set as being its slowest.
- That table variables are a good option technically/programmatically
You just be the judge. It would be prudent to conduct tests. I can’t say my experiment here is conclusive at best, but from hereon, I would avoid using Table Variables whenever I can. For very simple queries like in the tests, it performed poorly. I can’t imagine how things will be with larger datasets and more complex queries.
I am on to my next experiment!
**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com