Compression in SQL Server Backups

Application development has never been easier or faster, and it has given anyone the means to gather vast amounts of information that is key to one's business. The explosion of data sources that can be mined for useful insight only adds to the pressure of managing tons of data.

The volume of data to be managed has gone through the roof and forces everyone to take a serious look at how to keep it all. Data occupies space in bits, and writing those bits to a storage medium takes time. As the volume of bits grows, so does the time needed to store them. Of course, everybody knows that. But almost seven years since this SQL Server feature became available to help manage data storage, it is still a surprise that not too many have utilized it. Some aren't even aware that such a feature exists in SQL Server.

This simple SQL Server feature is gold. It is the capability of SQL Server to produce compressed backups, be it full, differential, or transaction log. How golden is this feature? I tested it against a 25GB test database to get some numbers:

Method                           Backup size                       Time taken
MSSQL uncompressed               19.1 GB (20,587,249,664 bytes)    309 secs
MSSQL compressed                 3.49 GB (3,752,882,176 bytes)     114 secs
Uncompressed + 7-Zip (normal)    1.14 GB (1,234,287,895 bytes)     309 secs backup + 5,040 secs compression
Uncompressed + 7-Zip (fastest)   1.67 GB (1,799,850,005 bytes)     309 secs backup + 897 secs compression

Clearly, there is a lot of benefit here for anyone who uses compression in backups. Microsoft has provided something neat, convenient, and seamless.
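
To illustrate, here is a minimal sketch of a compressed full backup; the database name and backup path are hypothetical:

    -- Full backup with compression (WITH COMPRESSION also works for
    -- differential and transaction log backups)
    BACKUP DATABASE TestDB
    TO DISK = N'D:\Backup\TestDB_Full.bak'
    WITH COMPRESSION, STATS = 10;  -- STATS reports progress every 10 percent

    -- Optionally, make compression the server-wide default for all backups
    EXEC sp_configure 'backup compression default', 1;
    RECONFIGURE;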

Database Server Provisioning: Partitions & Logical Drives

People overlook the importance of coming up with the right number and combination of storage partitions when setting up a SQL Server machine. More often than not, they just create partitions equal to the number of disks or arrays on the server. For example:

Array 1, Raid 1, Disks 1-2, Partition 1 – Drive C: (OS)
Array 2, Raid 1, Disks 3-4, Partition 2 – Drive D: (Data)
Array 3, Raid 1, Disks 5-6, Partition 3 – Drive E: (Log)
Array 4, Raid 1, Disks 7-8, Partition 4 – Drive F: (TempDB)

This works perfectly, however, only if you are lucky enough to have a server with enough drives to give each intended file set its own partition. Most of us live in a world where funds are scarce and we have to deal with a limited amount of resources, like server hard drives.

So if you are one of those who weren't given enough funds for the right number of server hard drives, creating storage partitions as if you had them will save you some future work when you finally get a server with the right number of drives.

For example, if you only have two drives configured as RAID 1, instead of creating a single partition (drive C:) to house every file on the server, you may want to create partitions that mimic your best/ideal configuration even though the underlying resources are limited. For example:

Array 1, Raid 1, Disks 1-2, Partition 1 – Drive C: (OS)
Array 1, Raid 1, Disks 1-2, Partition 2 – Drive D: (Data)
Array 1, Raid 1, Disks 1-2, Partition 3 – Drive E: (Log)
Array 1, Raid 1, Disks 1-2, Partition 4 – Drive F: (TempDB)

When the bosses grant your wish for the server configuration you desire, rest assured that you will have fewer tweaks to make when migrating to the newer server. For example, on the newer server you might have the following:

Array 1, Raid 1, Disks 1-2, Partition 1 – Drive C: (OS)
Array 2, Raid 1+0, Disks 3-8, Partition 2 – Drive D: (Data)
Array 3, Raid 1+0, Disks 9-12, Partition 3 – Drive E: (Log)
Array 4, Raid 1+0, Disks 13-16, Partition 4 – Drive F: (TempDB)

By pre-configuring your servers with the ideal number of partitions, regardless of the underlying number of physical drives/arrays, you can simply restore a database backup on the new server. This saves you from reconfiguring file locations to take advantage of what the new server offers.
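
As a sketch of what that migration could look like, assuming the drive letters on the new server match the old layout (the database name and path here are hypothetical), the restore needs no WITH MOVE clauses:

    -- Restore on the new server; because the partition letters match the old
    -- layout, the data and log files land on the intended drives as-is
    RESTORE DATABASE MyDB
    FROM DISK = N'D:\Backup\MyDB_Full.bak'
    WITH RECOVERY, STATS = 10;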

Caution: The examples above are not meant to suggest best practices for server/database performance; they are merely simple examples to drive a point. Your ideal partition configuration may be more exotic or elaborate than the samples presented, based on your unique circumstances.

**************************************************

[Photo: Philippine Duck, Anas luzonica (endemic), photographed in Candaba, Pampanga]

**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Database Server Provisioning: How Many Disks Do I Need For RAID 10?

I have a new server box intended as a dedicated SQL Server 2012 server. With the new box, I am now faced with the question of what the optimal number of physical disks is for each RAID 10 array I will set up. Answering that question will greatly help me decide how many partitions to create to support the separation of SQL Server's data, transaction log, and tempdb.

To come up with an answer, I conducted some simple SQLIO tests to determine which array configuration is best suited for the read-query-intensive application I am about to set up. The tests should answer the following:

  • Is a 6-disk RAID 1+0 array better than a 4-disk RAID 1+0 array?
  • Is an 8-disk RAID 1+0 array better than a 6-disk RAID 1+0 array?

The Test

I carefully devised the following tests:

  • Run an SQLIO READ test using a 10GB file on a RAID 1+0 array with 4, 6, 8, and 10 disks.
  • Run an SQLIO WRITE test using the same configurations as the READ test.
  • Run each SQLIO test case 5 times and take the average of the results.
  • Prior to each test, destroy the RAID configuration using the server's RAID utilities, set it up again with full initialization, then create and format the partition.
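
For reference, a typical SQLIO invocation for the READ case might look like the sketch below. The exact parameters are assumptions on my part (8 KB I/Os at 8 outstanding requests are consistent with the IOs/sec-to-MBs/sec ratios in the results), and the test file path is hypothetical:

    REM 8 KB random reads for 300 seconds, 8 outstanding I/Os, latency stats on;
    REM the WRITE case swaps -kR for -kW
    sqlio -kR -frandom -s300 -b8 -o8 -LS -Fparam.txt

    REM param.txt declares the 10GB test file: <path> <threads> <mask> <size in MB>
    REM E:\testfile.dat 2 0x0 10240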

The Results

The following are numbers produced by SQLIO:

READ Test

Disks   IOs/sec    MBs/sec   Latency (ms) min / ave / max
4       1,323.70   10.34     0 / 23 / 544
6       1,580.82   12.35     0 / 19 / 562
8       1,817.43   14.19     0 / 17 / 498
10      1,878.32   14.37     0 / 16 / 495


WRITE Test

Disks   IOs/sec    MBs/sec   Latency (ms) min / ave / max
4       715.82     5.59      0 / 44 / 152
8       1,197.28   9.35      0 / 26 / 199

My Conclusion

  • I conducted the READ test first, and its results influenced my decision to run the full WRITE test on the 4-disk and 8-disk arrays only.
  • The READ test suggests that there is no significant performance difference between a 4-disk and a 6-disk array.
  • Going beyond 8 disks, performance wouldn't improve much.
  • With only 14 physical disks to utilize, I ended up with 2 RAID 1+0 arrays: one with 4 disks and the other with 8 disks.

Additional Information

  • I only ran each test once on the 6-disk and 10-disk arrays, so I have not included those figures in the results.
  • I tested the 10-disk array configuration just to confirm whether there would be a significant performance gain.
  • The disks used were ordinary 250GB SATA drives.

Caution

The results presented here do not, in any way, represent a recommendation. They merely show what I observed on my specific configuration. Results may differ on other server configurations, and I have no way of knowing the outcome of the same tests on different hardware. It is therefore highly recommended that you conduct your own testing when you get a new server.

**************************************************
[Photo: Mountain Province, Philippines]

**************************************************


Thoughts on Designing Databases for SQL Azure – Part 3

In the first article of this series, I raised an issue considered one of sharding's oddities: what would one do should a single tenant occupying a shard exceed the shard's capacity (e.g. in terms of storage or computing power)? The scenario I was referring to in the first article was that I opted to choose "country" as my way of defining a tenant (my sharding key). In this iteration, I'll once again attempt to share my thoughts on how I would approach the situation.

Off the bat, I'd probably blurt out the following when asked how to solve this issue:

  • Increase the size of the shard
  • Increase the computing power of the machine where the shard is situated

On on-premises sharding implementations, throwing in more hardware is easy to accomplish. However, doing the above when you are on SQL Azure is easier said than done. Here is why:

  • Microsoft limits SQL Azure database sizes to 1GB, 5GB, and 50GB chunks (see the sketch after this list).
  • The computing power of the instance where a shard resides in SQL Azure is finite as well.
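
For context, the size cap is declared when a SQL Azure database is created, and it can be raised only up to the 50GB ceiling. A minimal sketch, with a hypothetical database name:

    -- Create a shard at the largest size SQL Azure currently offers
    CREATE DATABASE Shard01 (EDITION = 'business', MAXSIZE = 50 GB);

    -- An existing shard can be grown, but never past the 50GB cap
    ALTER DATABASE Shard01 MODIFY (EDITION = 'business', MAXSIZE = 50 GB);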

I have heard unverified reports that Microsoft allows, on a case-to-case basis, increasing a SQL Azure database's size beyond 50GB, and will perhaps situate a shard on some fine special rig. This, however, leads to the question of how far Microsoft would extend such special treatment to each and every SQL Azure subscriber. And it could probably cost one a fortune to get things done this way.

However, there are ways to work around the issue at hand without getting special treatment. One can do the following:

  • You can tell your tenant not to grow big and consume too much computing power (Hey … Flickr does this. :P)
  • You can shard a shard. Things can get really complicated, but at any time of day a shard can be chopped into pieces. Besides, at this point, you have probably eaten sharding for breakfast, lunch, and dinner.

So how does one shard a shard?

In the first part of this series, I used sharding a database by country as an example. To refresh, here is an excerpt from the first article:

Server: ABC
Database: DB1
Table: userfiles

userfiles_pk   user   uploaded file   country_code
1              john   file1           grmy
2              john   file2           grmy
6              edu    file1           grmy

Server: ABC
Database: DB1
Table: country

country_code   country
grmy           germany
can            canada
ity            italy

Server: CDE
Database: DB1
Table: userfiles

userfiles_pk   user    uploaded file   country_code
3              allan   file1           can
4              allan   file2           can
5              allan   file3           can
9              jon     file1           can
10             jon     file2           can
11             jon     file3           can

Server: CDE
Database: DB1
Table: country

country_code   country
grmy           germany
can            canada
ity            italy

In the sample above, the first shard contains only data related to grmy (Germany) and the second shard contains only data related to can (Canada). To break a shard further into pieces, one needs to find another candidate key for sharding. If there is none, as in the case of our example, one should create one. We can think of splitting up a country by introducing regions within it (e.g. split by provinces, by cities, or by states). In this example, we can pick city as our sharding key. To illustrate, see the following shards:

Shard #1
Server: ABC1
Database: DB1
Table: userfiles

userfiles_pk   user   uploaded file   country_code   city_code
1              john   file1           grmy           berlin
2              john   file2           grmy           berlin

Shard #2
Server: ABC2
Database: DB1
Table: userfiles

userfiles_pk   user   uploaded file   country_code   city_code
6              edu    file1           grmy           hamburg

By deciding to further subdivide a country into cities, where each city becomes a shard, the following statements would be true (see the shard-map sketch after this list):

  • The new sharding key is now city_code.
  • Each shard would contain only data related to a single city.
  • Various shards can live on the same server; shards don't need to be on separate servers.
  • The increase in the number of shards also increases the amount we spend renting SQL Azure databases. According to Wikipedia, Germany alone has 2,062 cities, so this is some serious monthly spending. This example, however, is just for illustration, to convey the idea of sharding. One can always pick or create the most practical and cost-effective key for further sharding, addressing the issue of exceeding a shard's capacity without the spending overhead of poor design choices.
  • At a certain point in the future, we might exceed a shard's capacity once again, breaking our design.
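
To make the routing concrete, here is a minimal sketch of the kind of lookup ("shard map") table an application could consult to find the server and database that hold a tenant's data. This is my own illustration rather than an excerpt from the series, and all names are hypothetical:

    -- Hypothetical shard map: routes a city to the shard holding its data
    CREATE TABLE shardmap
    (
        city_code     varchar(32)  NOT NULL PRIMARY KEY,
        country_code  varchar(8)   NOT NULL,
        server_name   varchar(128) NOT NULL,
        database_name varchar(128) NOT NULL
    );

    INSERT INTO shardmap VALUES ('berlin',  'grmy', 'ABC1', 'DB1');
    INSERT INTO shardmap VALUES ('hamburg', 'grmy', 'ABC2', 'DB1');

    -- The application looks up the shard once, then connects to that
    -- server/database to work on the tenant's rows
    SELECT server_name, database_name
    FROM shardmap
    WHERE city_code = 'berlin';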

**************************************************

Thoughts on Designing Databases for SQL Azure – Part 2

In the first article, I showed an example of how a database's design can impact us technically and financially. Sharding isn't just about splitting up data; it also brings to the table a group of terrible monsters to slay. There are a lot of concerns that need to be considered when one attempts to shard a database, especially in SQL Azure.

NoSQL, NoRel, NoACID

In breaking things apart, one is bordering on clashing religions. One monster to slay is the issue of ACIDity. People discuss NoSQL, NoRel, and NoACID as among the trends out there, and some even swear that these approaches are better than SQL. In my case, I prefer to call it NoACID, and it is not by any means more or less than SQL. I have NoACID implementations in some of my past projects. And I love SQL. To simplify, I'll lump these trends together as NoX, as they commonly attempt to disengage from the realities of SQL.

For me, NoX is not a religion; it is simply a requirement. The nature of the app you build dictates whether you need to comply with the principles of ACID (Atomicity, Consistency, Isolation, Durability). If ACID is required, it is required regardless of your data and storage engine or your preferred religion, and you have to support it. Most of the big cloud apps we see, like Google and Facebook, could probably live with ACID being absent from their requirements list. Google is primarily read-only, so it makes sense to have data scattered across servers on every continent without the need for ACID; by nature, ACID there can be very minimal or absent. Facebook, on the other hand, is read/write intensive and seems to be driven by a massive, highly sophisticated message-queuing engine. Would ACID be required at Facebook? I am not sure about Facebook's implementation, but the way I look at it, ACID can be optional: it may well be present in operations concerning only one tenant, an FB account in this case, while outside of this, the absence of ACID could be compensated for by queuing and data synching.

If Facebook and Google decided to require ACID, they would face concerns around locking a lot of things, with latency as one of the consequences. It is therefore very important to establish up front whether ACID is a requirement. For a heavy transactional system, a sharded design presents a lot of obstacles to hurdle. In SQL Azure this is even harder, as SQL Azure does not support distributed transactions the way we are used to with SQL Server. This means that if your transaction spans multiple shards, there is no simple way to do it, and ACID can be compromised. SQL Azure does, however, support local transactions, so you can definitely perform ACIDic operations within a shard.
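
For illustration, here is a minimal sketch of a local transaction that stays within one shard; the table follows the userfiles example from Part 3, and the row values are hypothetical:

    -- Runs entirely inside one shard (one SQL Azure database), so a local
    -- transaction suffices; the same pattern cannot span two shards
    BEGIN TRANSACTION;

        INSERT INTO userfiles (userfiles_pk, [user], [uploaded file], country_code, city_code)
        VALUES (12, 'edu', 'file2', 'grmy', 'hamburg');

        UPDATE userfiles
        SET [uploaded file] = 'file1_v2'
        WHERE userfiles_pk = 6;

    COMMIT TRANSACTION;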

To be continued…

**************************************************