It is very common that we take for granted data growth. We fail to anticipate the various consequences caused by unmanaged data growth. The most obvious consequence that we immediately see and feel is in the aspect of performance degradation. As huge volumes of data are added to a table, querying would take more time before it can produce results.
To manage or control data growth, one must understand how data are being used so one can mitigate possible repercussions and put in place strategies that would minimize the need to query all of SQL Server’s data to produce the result.
Here I will try to list down the common things that can be done to avoid querying all data:
- Data Partitioning. For example, if a certain business rely tremendously on information taken out from the previous/recent 2 quarters, one can put in place a design where only 2 quarters worth of the most recent data are available, the rest are stored on a separate table and can be classified as less active data. Having a single table containing all of the millions of rows spanning years and decades of data forces you to deal with tables containing irrelevant data which can be inefficient. There are several ways one can data partition in SQL Server. You can do it by design (having schema that physically separate inactive data from active ones) or by using SQL Server’s data partitioning features (e.g. Table Partition).
- Archive Old Data. Remove data that you know that are not used anymore. This will make most of your database objects smaller and quicker. This is a kind of data partitioning strategy too and the difference is that data that are unactive are removed away from the database instead of just marking them as inactive.
- Use of efficient index. By introducing indexes to tables, SQL Server avoids having to go through each row to check if the row gets to be part of the result.
- Sharding. On extreme cases, one can implement sharding. I have written an old article about it when Azure’s SQL database lacked sharding support. it has some sharding mechanism now. But you can glean on how to implement sharding to manage on data growth and scale massively.