Some Major Changes to SQL Server Licensing in 2012 Version

Last month, Microsoft has published changes to SQL Server’s licensing model when the 2012 version hits gold. A number of major changes will likely catch everyone’s attention. Here are those that caught my while:

  • Editions now are streamlined to just three (3) flavors:
    • Enterprise
    • Business Intelligence (new)
    • Standard
  • Enterprise Edition will now be licensed per CPU core instead of the usual per CPU Socket (processor).
  • Server + CAL Licensing will only be available in the Standard and Business Intelligence flavors.
  • Core licenses will be sold in Packs of 2 cores.
  • Cost of per core license will be 1/4 the price of current SQL Server 2008R2’s per processor cost license
  • Cost of Server CAL Licensing has increased by more or less 25%
  • To license your server based on cores, Microsoft requires you to buy a minimum of 4 core licenses per CPU socket.

These changes will definitely impact your upgrading plans. To carefully plot your licensing strategies, you can visit SQL Server 2012 Licensing Overview.

Where I See These Changes Have An Impact

  • Scenario 1: Upgrading from Servers with dual cores or less. If you are upgrading to a very old server with 2 CPUs with single core for each, you need to buy 4 core licenses for each CPU. That means you have to buy 8 core licenses. It would be rather logical that when doing this, you might as well upgrade your hardware. Besides, your hardware upgrade seems to be long overdue.
  • Scenario 2: Upgrading from Servers with more than 4 cores. In the past, If one experiences the need to have more CPU power for SQL Server, one only has to buy a new server with as many cores per processor that money can get, reinstall your SQL Server and you are good to go … all without minding licensing issues and cost. With SQL Server 2012, you will be spending more as you go beyond 4 cores. Going 6 cores, you need 2 more core licenses. Going 8 and 10 core, you need to buy additional 4 and 6 core licenses respectively.
  • Scenario 3: Upgrading from Servers with Server + CAL Licensing. There is a 25% increase in cost for CALs.

For those of you who are still using versions 2000 and 2005 and are planning to upgrade but find these 2012 changes unpalatable, you may want to rush upgrading to SQL Server 2008 R2 instead and get stuck in that version’s perks and licensing model before your resellers remove the older SKUs with older licensing models from their sales portfolios and force you to buy SQL Server 2012.

Of course, SQL Server 2012 Express will still be FREE!

Juneau That You Are On Steroids With Denali?

Supporting multiple versions of SQL Server for a single application typically means headache. And that is what I have been in since I started making commercial applications (banking mostly on the benefits provided by T-SQL). We have not abstracted our databases with non-TSQL objects in the mid-tier for very valid and interesting reasons. In my current situation, I am dealing with SQL Server 2005, 2008, 2008/R2 supporting bits of features from each version. Thank heavens all our clients have upgraded already from version 2000. My world though is still 80% SQL Server 2005, and being still a very formidable version, I don’t think I will be able to part ways with this build anytime soon. Case in point, clients cannot just easily upgrade to higher versions every time one comes out.

As a result, I am mired in managing our SQL Server development assets using various tools. Anything that can help tract and manage everything. Ever since, I had always wished for a tool that will address most of my SQL Server development concerns if not all. All these years, I have heavily relied on SSMS and supported by various other tools. When Database Projects arrive with Visual Studio, it got me interested but it tanks out every time I load it up with what I have. The last one was with Visual Studio 2010. With the sheer number of objects that we have, capacity and performance was always the issue and each time, I had to resort back to SSMS without exploring further those tools.

When Juneau, now officially called SQL Server Developer Tools, or SSDT, was announced as one of the touted tool in the next version of SQL Server  codenamed “Denali”, I was pretty excited about the possibilities it will bring to my situation. After getting my hands on it with Denali CTP3, I say it really got me excited. Though I haven’t tried steroids, based on the effects that I have seen, SSDT (sounds like drug testing haha) gets my nod despite some issues that I have encountered. I am aware that what I got is preview build and for sure Microsoft will eventually iron out those. The benefits far outweigh the quirks that I have seen. It would be worth to upgrade even with SSDT alone.

So what is in SQL Server Developer Tools?

I barely scraped the surface yet but here is what I have found that will a big impact with my situation:

  • It is a unified environment/tool for SQL Server, Business Intelligence, SSIS. Everything I need is provided for within the SSDT shell. I dont have to switch back and forth between Visual Studio, SSMS, and BIDS. Creation of new reports and SQL Server objects such as tables, views, stored procedures, file groups, full text catalogs, indexes … all can be done within SSDT. Existing ones can be added too. When I had the chance to explore MySQL Workbench, I find it cool to have objects that I create scripted first before it is created physically. This capability is now available in SSDT.
  • “Intellisense support for TSQL” for SQL Server 2005. With SSDT, it has not actually upgraded SQL Server 2005 then gave it TSQL Intellisense. TSQL Intellisense is outright available within SSDT regardless of your targetted SQL Server platform. With SSDT, you model your database schema/objects without the tool actually being connected to a targetted version 2005 live database.  This way, you work with your TSQL with Intellisense provided by SSDT, and you can publish your project to an SQL Server 2005 database by choosing 2005 as your target platform. Build/error checks are done prior to the publishing of the database project.
  • SQL Server Platform Targetting. SSDT supports versions 2005, 2008, 2008 R2, Denali and SQL Azure. You do your thing and just select your target platform when you are done. During the build, you will be prompted for platform violations.
  • Refactoring (Rename, wildcard expansion, Fully qualified naming). While in TSQL writing mode, you can rename a field and SSDT does the rest to all affected objects (tables, stored procedures, etc). Neat. No more 3rd party stuff to achieve this. 🙂
  • Supports Incremental Project Publishing. You can keep a development version of your database and upgrade your live version incrementally either via script or while connected to the database. It can even prompt you when your changes/upgrades causes a data loss. Very cool.
  • SSDT detects changes done outside of SSDT. It prompts you during publishing of a database project when it detects objects created outside of the tool. This is necessary so you know if somebody is messing up with your databases. I do wish though that SSDT would have the capability to reverse engineer these detected objects when it is determined to be legit as sometimes, you get to tweak directly the database using other tools like SSMS.
  • Importing of Existing Database Schema (from 2005 to Denali) into a database development project. One need not start from an empty project, existing databases can be dealt with.
  • Scalability. I am glad SSDT successfully imported the more than 20,000 objects in our current database. The previous attempts I had using Visual Studios’ Database Project (both 2008 and 2010) just hanged midstream after hours of importing all the objects. Took SSDT more than 20 minutes to import our database using a lowly test Denali test machine.

This is all for now folks! Regardless of some unwanted quirks I have encountered when using SSDT, I can still say, it is worth looking at. I am just waiting for some decent Denali machine to use so I get to load up my projects in SSDT and take the next level from there.

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group (PHISSUG), a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Expertise Bug Leads to Major Database Design Blunder

Often, we hear software developers (myself included) say, “yeah … we can definitely do that!” or “yeah … you have come to the right place, anything you want .. we can build it for you”! Then often, we too hear stories of software projects fail here and there. And now I begin to wonder how things could be worked out properly so problems, or worst case, failures can be avoided.

What happened to me recently could probably be one of the reasons why software projects fail. For the past several months, I have been working on a system that concerns the health of people. The system is intended for use by doctors, dentists, physical therapists, nurses, or any practice or profession that deals with people’s health, etc. The system, when done, will handle quite an extensive amount of data gathered from a good number of processes and sources that are realized every minute of the day.

For the past several months, there was a good amount of communication between my group and those potential users of the system. A lot of time were spent in requirements discovery and gathering, analysis, and everybody even subject a lot of items to questions just to sort things out clearly so things come out fine. A near functional prototype has been established for several months and it looks like everybody is happy of everyone’s progress … including the state of the system. Beta testing was conducted, had 6 major beta builds in the last few months, and seemed not a thing was amiss. In fact, it is almost done that the potential users are so eager to have the system deployed already and pioneered as soon as possible (that would have been like last month). I was into the finishing touches and was like in the process of putting icing on the cake.

Then … it was KABLAAAAAAMMMMMM!!! As the project’s database designer, I thought of something that hurled everything back into the drawing board. I miss identifying one piece of information that should have been in the database design from the very start. With the set of people contributing to make this happen … this one thing never had any manifestation of being thought out. The medical guys involved in the project never thought of the item. The software development guys never had any clue. I never had any clue. And I have contemplated so deeply to analyze how I could miss something this important. People who played the analyst role came short in thinking about this (I am one of those). But I, being the database designer, am blaming of missing something so important.

After gathering myself, I came to a conclusion that the only time that I’d be able to easily identify or come across such piece of information, is probably when I am a doctor, a practicing one, and at the same time a database designer who had lots of databases and experience tucked under my belt. I caught the missing piece from a totally unrelated event, not even related to what I am doing.

To cut the story short, what happened thereafter was 1 table was added to the database structure with 1 new column that would serve as a reference for 70% of other tables. With the change, 70% of sql code were re-written, 50% of critical UI got revamped and lots of time lost and gained lots of sleepless nights.

The moral of my story, though completely lacking of juicy technical details as I cannot divulge those for fear of legal ramifications and of ridicule (ano ako hilo? hahahha), software development / technical expertise can only bring us to certain extents and domain expertise is clearly a desirable attribute one can have, especially if you are a database designer. The reality and funny thing though is that, I can’t picture myself as a doctor and a database person so I can eliminate this problem in the future and this is probably the reason why there are just too many software project that have failed. And I can still hear myself saying “yeah … of course, anything you want, I can build it for you!”.

The system I talked about here is almost done and is looking really good! And I don’t think I have missed some more that would screw up my day. How about you? Do you say, being considered as an expert, can do of anything that is asked of you? 😛

Thoughts on Designing Databases for SQL Azure – Part 3

In the first article of this series, I raised an issue considered to be one of sharding’s oddities. The issue raised was what would one do should a single tenant occupying a shard exceeds a shard’s capability (e.g. in terms of storage and computing power). The scenario I was referring in the first article was that I opted to choose “country” as my way of defining a tenant (or sharding key). In this iteration, I’ll once again attempt to share my thoughts on how I would approach the situation.

Off the bat, I’d probably blurt out the following when ask how to solve this issue:

  • Increase the size of the shard
  • Increase the computing power of the machine where the shard is situated

On in-premise sharding implementations, throwing in more hardware is easier to accomplish. However, doing the above suggestions when you are using SQL Azure, is easier said than done. Here is why:

  • Microsoft limits SQL Azure’s database sizes to 1GB, 5GB, and 50GB chunks.
  • The computing instance of where a shard can reside in SQL Azure is as finite as well

I have heard of unverified reports that Microsoft allows on a case-to-case basis to increase an SQL Azure’s database size to more than 50GB and probably situate a shard on some fine special rig. This however leads to a question on how much Microsoft allows each and every SQL Azure subscriber to avail of such special treatment. And it could probably cost one a fortune to get things done this way.

However, there are various ways to circumvent on the issue at hand without getting special treatment. One can also do the following:

  • You can tell your tenant not to grow big and consume much computing power (Hey … Flickr does this. :P)
  • You can probably shard a shard. Sometimes, things can really go complicated but anytime of the day, one can chop into pieces a shard. Besides, at this point, you could have probably eaten sharding for breakfast, lunch and dinner.

So how does one shard a shard?

In the first part of this series, I used as an example of sharding a database by Country. To refresh, here is an excerpt from the first article:

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
6 edu file1 grmy
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy
Server: CDE
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
3 allan file1 can
4 allan file2 can
5 allan file3 can
9 jon file1 can
10 jon file2 can
11 jon file3 can
Server: CDE
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

In the sample above, the first shard contains data related only to grmy (germany) and the second shard contains data related only to can (canada). To break the shard further into pieces, one needs to find a another candidate key for sharding. If there is none, as in the case of our example, one should create one. We can probably think of splitting up a country by introducing regions from within (e.g. split by provinces, by cities, or by states). In this example, we can probably pick city as a our sharding key. To illustrate how, see the following shards:

Shard #1
Server: ABC1
Database: DB1
Table: userfiles
userfiles_pk User uploaded file country_code city_code
1 John file1 grmy berlin
2 John file2 grmy berlin

 

Shard #2
Server: ABC2
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code city_code
6 edu file1 grmy hamburg

By deciding to further subdivide a country by cities where each city becomes a shard, the following statements would be true:

  • The new sharding key is now city_code.
  • Our shard would only occupy data related to a city.
  • Our shard would only occupy data related to a city.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • The increase in the number of shards would also increase the amount we spend on renting SQL Azure databases. According to Wikipedia, Germany alone have 2062 cities. This is some serious monthly spending that we have here. However this example is just for illustration purposes to convey the idea of sharding. One can always pick/create the most practical and cost-effective key for further sharding to address the issue of going beyond a shard’s capacity without the spending overhead due to poor design choices.
  • At a certain point in the future, we might exceed a shard’s capacity once again breaking our design.

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Thoughts on Designing Databases for SQL Azure – Part 2

In the first article, I showed an example how a database’s design could impact us technically and financially. And sharding isn’t all just about splitting up data. It also brings to the table a group of terrible monsters to slay. There are a lot of concerns that needs to be considered when one attempts to shard a database, especially in SQL Azure.

NoSQL, NoRel, NoACID

In breaking things apart, one is bordering on clashing religions. One monster to slay is the issue of ACIDity. People discuss NoSQL, NoRel, NoACID to be one of the trends out there. And most even swear to the fact that these approaches are better than SQL. In my case, I prefer to call it NoACID and it is not by any means more or less than SQL. I have NoACID implementations on some projects I had. And I love SQL.  To simplify, I’ll put in these trends in a NoX lump as they commonly attempt to disengage with the realities of SQL.

For me, NoX is not a religion, it is simply a requirement. The nature of the app you build will dictate if you need to comply to the principles of ACID (Atomicity, Consistency, Isolation, Durability). If ACID is required, it is required regardless of your data and storage engine or your prefered religion. If it is required, you have to support it. Most cloud apps that we see, like Google and Facebook, could probably have ACID to be absent in their requirements list. Google is primarily read only so it does make sense to have data scattered all over various servers in all continents without the need for ACID. By nature, ACID in this regard, can be very minimal or absent. Facebook on the otherhand is read/write intensive. Seems like it is driven by a massive highly sophisticated message queuing engine. Would ACID be required in Facebook? I am not quite sure about Facebook’s implementation but the way I look at it, ACID can be optional. ACID can well be present in operations concerned only to one tenant in case of an FB account. Outside of this, the absence of ACID could probably be compensated by queuing and data synching.

If Facebook and Google decided to require ACID, they could be facing concerns on locking a lot of things. While locked on, latency could be one of the consequences. It is therefore very important to lay out firsthand if ACID is a requirement or not. For a heavy transactional system, a sharded design presents a lot of obstacles to hurdle. In SQL Azure, this is even harder as SQL Azure does not support distributed transactions like we used to with SQL Server. This means, if your transaction spans across multiple shards, there is no simple way to do it as SQL Azure does not support it, thus ACID can be compromised. SQL Azure however does support local transactions. This means you can definitely perform ACIDic operations within a shard.

To be continued…

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Thoughts on Designing Databases for SQL Azure

Cloud has been the new darling in the world of computing and Microsoft has provided once again developers a new compeling platform to build their new generation applications on. In this picture enters SQL Azure, a cloud-based edition of Microsoft’s leading database product, Microsoft SQL Server. One would ask how one designs an RDBMS database for the cloud and specifically for the Azure platform. It is basically easy to create a simple database in SQL Azure but database creation and database design are two different life forms. So we ask, how are databases be in the cloud and in SQL Azure? In this blog, I hope to provide an answer.

Scale Out

First and foremost, the cloud has been one of the primary solutions to consider when an application is in the realm of a massive scalability issue. Or to put it in another perspective, the Cloud was both a catalyst of the emergence of extremely large applications and a result of the webification of a lot of things. Design considerations need to factor in unheard of scales that can span from almost everywhere. Bandwidth needed is of the global scale, hard disk space are in the petabytes, and users are in the millions. When you talk about building an application to handle this scale, one can’t do anymore the usual stuff any one of us used to do. One now needs to think massively in terms of design. E.g. No single piece of hardware can handle the kind of scale I just mentioned. So how does one design a simple database that is intended to scale up to the degree that I just mentioned?

The Problem

Supposing, we have a great idea of developing the next big thing and we named it FreeDiskSpace. FreeDiskSpace allows us to upload massive amounts of files where files can be anything one can think of. The scale of this next big thing is thought to be in the range of having 500 million users scattered around the globe with each users allowed to upload an unlimited amount of files.

The Challenge

With such order of magnitude, the many challenges we will be facing will definitely lead and force us to question every bit of our knowledge and development capability that we might have been using when we are used to doing gigs in the few GBytes of databases with a few thousand users. If what is allowed makes each user upload files in the range of not less than 5GB, one would need to store 2,500 petabytes worth of files. How does one store 2,500 petabytes worth of files and manage the uploading/downloading of 500 million users?

In the Cloud, you are less likely to see a bunch of supercomputers handling this kind of scale. Rather, you are more likely to see hundreds of thousands of commodity servers woven together forming a cohesive unit. Making use of these mesh to behave like it is as if it is operating as a unit requires design patterns that are more appropriate for these environments.

Database Sharding in SQL Azure

First, what is database sharding. To put it simply, database sharding is a design pattern to split up data into shards or fragments where each shard can be run on its own on a separate instance running on a separate server located somewhere. Each shard contains the same structure as that of another but contains different subsets of data.

Current version of SQL Azure offers NO feature that handles automatic scaling. This will surely change in the near future as everybody doing applications handling these massive loads will somehow need an automatic way to allow the application to handle this amount of load and data easily.  One way that is guaranteed to be commonly used will be sharding databases. However for now, sharding in SQL Azure isn’t automatic. If one needs to shard, one needs to do it by hand. As I have said, in the future, Microsoft will be compelled to give us an easy way to practice sharding. Hopefully, that future is near.

Now we would ask, why do we need to shard databases in SQL Azure. The cloud and specifically with SQL Azure, with its limitations, forces us to use this design pattern. SQL Azure only allows us 50GB for a database’s maximum size on a virtual instance running on an off the  shelf commodity server. With that, we surely need a way to work around these constraints if we see our data to grow for like 500 terabytes. Second, if we foresee a huge number of users hitting our applications, doing this on a single piece of hardware would spawn concurrency nightmares that would choke our databases to death.

Though there are other approaches to solve our scaling challenges, there are other interesting motivating factors where sharding becomes a very compelling option. Firstly, it is more likely for a database server to process small databases faster than larger ones. Secondly, by utilizing multiple instances of computing power, we can spread processing load across these instances to more processes can be executed all at once.

How To Shard

Let us try to go back with our FreeDiskSpace killer app (assuming we provide each user up 1GB in database space) and practice some very simple sharding so we have a partial understanding of what sharding is all about. If we are to design the database using a single instance, the table would look like this:

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
3 allan file1 can
4 allan file2 can
5 allan file3 can
6 edu file1 grmy
7 roman file1 ity
8 roman file2 ity
9 jon file1 can
10 jon file2 can
11 jon file3 can
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

If we are to shard, we need to decide on how we split up our database into logical, manageable fragments. We then decide what data is going to occupy each fragment (we can probably call the occupier our “tenant”). Based on tables we see above, there are 2 fields that we can potentially use as keys for sharding. These fields are user and country. These fields should provide answers to a question: How to we sensibly split up our database? Do we have to split it up by user, or by country? Both are actually qualified to be used as basis for splitting up our database. We can investigate each though.

Sharding by field: User

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy
Server: CDE
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
3 allan file1 can
4 allan file2 can
5 allan file3 can
Server: CDE
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

By deciding to make each user as a tenant in our database, the following statements would be true:

  • Our shard would only occupy data related to a single user.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • It would render country table a bit useless and awkward. But we can always decide to get rid of it and just maintain a column indicating a user’s country of origin.
  • Now with a one-user-one-database structure, 1 million users mean 1 million separate databases. In SQL Azure’s terms, assuming a 1GB costs us 10$ a month, designing our shards by user means we have to spend $10 million monthly. Now that is no joke!

Sharding by field: Country

Server: ABC
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
1 john file1 grmy
2 john file2 grmy
6 edu file1 grmy
Server: ABC
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy
Server: CDE
Database: DB1
Table: userfiles
userfiles_pk user uploaded file country_code
3 allan file1 can
4 allan file2 can
5 allan file3 can
9 jon file1 can
10 jon file2 can
11 jon file3 can
Server: CDE
Database: DB1
Table: country
country_code country
grmy germany
can canada
ity italy

By deciding to make country as a tenant for each of our database, the following statements would be true:

  • Our shard would only occupy data related to a country.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • It would render country table unnecessary as each database represent a country. But we can always decide to get rid of this table.
  • Asnwer.com says there are only 196 countries in the world. With a one-country-one-database structure,  this means,  if we are getting the larger 10GB, 100$ a month SQL Azure databases, we only need to spend $19,600 a month.
  • This design decision though is pretty useful until we ran out of space in our 10GB design to contain all users related to a country. There are solutions to this problem though. However, I will try to discuss this in my future blogs.

It is pretty clear that picking the right key for sharding has obvious ramifications. It is very important for us to determine which ones are appropriate for our needs. It is obvious that picking country is the most cost-effective in this scenario while picking user could shield you much longer on the 10GB limit problem.

This article only presents partial concepts of sharding but nevertheless has given one the very basic idea of how database sharding can be applied, and how all these translate when you choose SQL Azure as a platform.

Thoughts on Designing Databases for SQL Azure – Part 2

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

My First Shot At Sharding Databases

Sometime in 2004, I was faced with a question whether to have our library system’s design, which I started way back in 1992 using Clipper on Novell Netware and later was ported to ASP/MSSQL, be torn apart and come up with a more advanced, scalable and flexible design. The usual problem I would encounter most often is that sometimes in an academic organization, there could be varying structures. In public schools, they have regional structures where libraries are shared by various schools in the region. In some organizations, a school have sister schools with several campuses each with one or more libraries in it but managed by only one entity. In one setup, a single campus can have several schools in it, with each having one or more libraries. These variations pose a lot of challenge in terms of programming and deployment. A possible design nightmare. Each school or library would often emphasize their independence and uniqueness against other schools and libraries, for example wanting to have their own library policies and control over their collections and users and customers and yet have that desire to share their resources to others and interoperate with one another. Even within inside a campus, one library can even operate on a different time schedule from the other library just a hallway apart. That presented a lot of challenge in terms of having a sound database design.

The old design from 1992 to 2004 was a single database with most tables have references to common tables called “libraries” and “organizations”. That was an attempt to partition content by libraries or organization (a school). Scaling though wasn’t a concern that time as even the largest library in the country won’t probably consume a few gigs of harddisk space. The challenge came as every query inside the system has to filter everything by library or school. As features of our library system grew in numbers and became more advanced and complex, it is apparent that the old design, though very simple when contained in a single database, would soon burst into a problem. Coincidentally though, I have been contemplating to advance the product in terms of feature set. Flexibility was my number one motivator, second was the possibility of doing it all over the web.

Then came the ultimate question, should I retain the design and improve on it, or should I be more daring and ambitious. I scoured over the Internet for guidance of a sound design and after a thorough assessment of current and possibly future challenges that would include scaling, I ended up with a decision to instead break things apart and abandon the single database mindset. The old design went into my garbage bin. Consequently, that was the beginning of my love of sharding databases to address issues of library organization, manageability and control and to some extent, scalability.

The immediate question was how I am gonna do the sharding. Picking up the old schema from the garbage bin, it was pretty obvious that breaking them apart by libraries is the most logical. I haven’t heard the concept of a “tenant” then, but I dont have to as the logic behind in choosing it is as ancient as it can be. There were other potential candidate for keys to shard the database like “schools” or “organization”, but the most logical is the “library”. It is the only entity that can stand drubbing and scrubbing. I went on to design our library system with each database containing only one tenant, the library. As of this writing, our library system have various configurations: one school have several libraries inside their campus, another have several campuses scattered all over metro manila with some campus having one or more libraries but everything sits on a VPN accessing a single server.

Our design though is yet to become fully sharded at all levels as another system acts as a common control for all the databases. This violates the concept of a truly sharded design where there should be no shared entity among shards. Few tweaks here and there though would fully comply with the concept. Our current design though is 100% sharded at the library level.

So Why Sharding?

The advent of computing in the cloud present to us new opportunities, especially with ISVs. With things like Azure, we will be forced to rethink our design patterns. The most challenging perhaps is on how to design not only to address concerns of scalability, but to make our applications tenant-aware and tenant-ready. This challenge is not only present in the cloud, but a lot of on-premise applications can be designed this way. This could help in everyone’s eventual transition to the cloud. But cloud or not, we could benefit a lot on sharding. In our case, we can pretty much support any configuration out there. We also got to mimic the real world operation of libraries. And it eases up on a lot of things like security and control.

Developing Sharded Applications

Aside from databases, applications need to be fully aware that it is not anymore accessing a single database where it can easily query everything with ease without minding other data exists somewhere. Could be on a different database, sitting on another server. Though the application will be a bit more complex in terms of design, often, it is easier to comprehend and develop if you have an app instance mind only a single tenant as oppose to an app instance trying to filter out other tenants just to get the information set of just one tenant.

Our Library System

Currently our library system runs on sharded mode both on premise and on cloud-like hosted environments. You might want to try its online search:

Looking at SQL Azure

Sharding isn’t automatic to any Microsoft SQL Server platform including SQL Azure. One needs to do it by hand and from ground up. This might change in the future though. I am quite sure Microsoft will see this compelling feature. SQL Azure is the only Azure based product that currently does not have natural/inherent support for scaling out.  If I am Microsoft, they should offer a full SQL Server service like shared windows hosting sites do along side SQL Azure so it eases up adoption. Our systems database design readiness (being currently sharded) would allow us to easily embrace the new service. But I understand, it would affect, possibly dampen their SQL Azure efforts if they do it. But I would try to reconsider it than offering a very anemic product.

As of now, though we may have wanted to take our library system to Azure with few minor tweaks, we just can’t in this version of SQL Azure for various reasons as stated below:

  • SQL Server Full Text Search. SQL Azure does not support this in its current version.
  • Database Backup and Restore. SQL Azure does not support this in its current version.
  • Reporting Services. SQL Azure does not support this in its current version.
  • $109.95 a month on Azure/SQL Azure versus $10 a month shared host with a full-featured IIS7/SQL Server 2008 DB

My Design Paid Off

Now I am quite happy that the potentials of a multi-tenant, sharded database design, though is as ancient, it is beginning to get attention with the advent of cloud computing. My 2004 database design is definitely laying the groundwork for cloud computing adoption. Meanwhile, I have to look for solutions to address what’s lacking in SQL Azure. There could be some easy work around.

I’ll find time on the technical aspects of sharding databases in my future blogs. I am also thinking that PHISSUG should have one of this sharding tech-sessions.

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Speed Matters : Subquery vs Table Variable vs Temporary Table

This has been perhaps written quite a number of times by a lot of TSQL gurus out there but earlier I was presented with a TSQL code that prompted me to recheck my notions on subqueries, table variables and temporary tables. The TSQL code presented to me is something built dynamically that depends heavily on user inputs. The issue presented was performance. The code used to run acceptably fine until earlier when things are annoyingly slow.

Upon seeing the code, I said to myself, problem solved. Instantly, I thought I was able to spot what is causing the performance issue. The code was written using TSQL’s temporary tables. Based from experience, I would automatically shoot down any attempt to use temporary tables when executing a TSQL batch. And though Microsoft has introduced table variables as a better alternative to temporary tables, better in terms of technicality and programmability, delving with the two options would always be my last resort. In most cases, I’d rather tackle difficult scenarios requiring handling of temporal data using subqueries. In fact, I immediately ruled a rewrite of the code to minimize the use of temporary tables, and at least replace it with table variables which were supposed to be a lot better. But upon further investigation, I ruled a rewrite because of some other things not related to the topics at hand here. On the other hand, am trying to figure out how things can be converted using subqueries which I would primarily prefer.

In my investigation though, something interesting came up and I thought I’d better time the executions of my experiment while I make the templates of how things are done using subqueries. The results now intrigues me a bit, and probably would take a second look at the code I was shown earlier. I might be looking at the wrong place. Nevertheless, the results of my experiment might interest others out there. So here it is:

The Experiment 

I always love to use one of our company’s product, the library system, to test my experiments with anything to do with TSQL as its database is as diverse in design and available test data is in great abundance. I picked one small database sample for my experiment and this is how it went:

  • Test 1 is intended to check how different approaches fare on small sets of data. I have to query all library materials that were written in ‘Filipino’ (which would result into 974 rows out of the 895701 rows in a single table). Based on the result, I have to search for a string pattern “bala” in all of the materials’ titles using the LIKE operator which would eventually result to 7 rows.
  • Test 2 is basically the same query against the same environment with different query conditions to see how the three approaches fare on large sets of data. So I queried all library materials that were written in ‘English’ (which would result into 43896 rows out of the 895701 rows). Based on the result, I have to search for a string pattern “life” in all of the materials’ titles using the LIKE operator which would eventually result to 440 rows.

I wrote three queries that each would use a specific approach to query my desired output in this experiment.

First query using SUB QUERIES:

Second query uses TABLE VARIABLES:

The third and last query uses TEMPORARY TABLES:

I run all three queries (3x each) after a service restart to clean-up tempdb and I got the following results. Figures are presented in the number of seconds:

  Total Rows Temporal Data Count Final Result Sub Queries Table Variables Temporary Tables
Test 1 895701 rows 974 rows 7 rows 2 secs 4 secs 2 secs
Test 2 895701 rows 43896 rows 440 rows 3 secs 23 secs 4 secs

 

The results pretty much proved interesting. I thought, based on the queries I wrote that temporary tables would be slightly slower of the three approaches. The execution plan for table variables and temporary tables are pretty much the same. I was a bit surprised that the execution time of the table variable is almost twice that of a temporary table on small sets of data but temporary tables trumps table variables by a large margin in large set of data. I was expecting a slight difference only as, though both are a bit different in terms of scoping and technicality, and to some extent purpose, both uses TEMPDB as their memory (contrary to the notion that table variables use in-memory) when handling data. What is notable though is that, disk I/O is costlier with Table Variables. I am not sure at this point what causes this cost. Will try to dig deeper when I have more time.

Nevertheless, this particular experiment does not provide a general conclusion that one of the three approaches is better than among the three. I still subscribe to the general notions:

  • That sub queries are neat and would perform better in most if not all cases
  • That temporary tables would be slower on environments with heavy concurrency as sql server would handle more locks especially when using it within a transaction. I have not tested this scenario here though.
  • That temporary tables may be a faster solution if constraints/indexes are needed in between queries, especially when a large set of data is expected, statistics on table variables columns arent created and it can’t have indexes. But surprisingly, this experiment shows Table Variables performed poorly among the three with handling large data set as being its slowest.
  • That table variables are a good option technically/programmatically

You just be the judge. It would be prudent to conduct tests. I can’t say my experiment here is conclusive at best, but from hereon, I would avoid using Table Variables whenever I can. For very simple queries like in the tests, it performed poorly. I can’t imagine how things will be with larger datasets and more complex queries.

I am on to my next experiment!

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Rationalizing Cloud Computing

For the past couple of years, people have been hot on cloud computing bandwagon. And as in any case of a birth of a new trend, people tend to get into misconceptions that eventually led them to get burned. Some just plainly embrace something without bothering to find out what is in it for them.

As a co-owner of a very small ISV trying to grapple with trends in computing and making things work for clients and stakeholders, I would always submit to the pressures of looking into the possibility of taking advantage of what a new trend can offer, cloud computing included. And during these times, I have significantly looked into what cloud computing brings to the table and how my company can take advantage of them. I have at least looked into Azure and non-Azure offerings and have attempted to assemble a design for our company’s products that can take advantage of the good stuff that are in the cloud.

Though my company is definitely a cloud computing enthusiast as some efforts are actually being done to take advantage of the cloud, as far as I am concerned, it won’t be for everybody. But recently, there is a rise of people who are investigating their possibilities with the cloud. This is maybe due to the advent of Microsoft’s farm betting Azure or perhaps there are just too many misconceptions peddled here and there. In most cases, I always see some wrong assumptions being tossed into every cloud discussion that I have had. Clients too are asking me about possibilities of being in the cloud based on wrong assumptions fed to them.

But before I get into those misconceptions, I would highlight the traits that made me and my company embrace cloud computing:

  • Cloud is the great equalizer. For very small ISVs like my company, I can see cloud as a leveling of playing field against the big guys, at least on the issue on having to spend upfront on system infrastructure to offer something for my market. For under $100 a month, a small ISV can start big.
  • Reach. My company would be able to increase reach without much overhead.
  • Scalability. The fact that I would be able to scale as the need arises, cloud computing is definitely for companies like mine.
  • Administration. Being a very small ISV, system administration tasks such as deployment, support and maintenance takes too much toll on our internal resources. Having an easily accessible uniform environment like the cloud, it would allow my company to increase the number of clients to attend to without adding too much stress on our internal resource.

There are other niceties that made me embrace cloud computing, but the above items mentioned are the major factors that I believe can only be achieved through the cloud.

As convinced as I am that cloud computing is one area that my company should invest, I am also convinced that cloud computing is definitely not for everybody and definitely not for every situation out there, at least for its current incarnation. I’d probably list down 5 misconceptions that I often encounter. My rationalization of these misconceptions will be based on what I know based on the efforts my company is doing in the cloud. These includes mostly Azure based clouds as well as few non Azure ones.

Top 5 Misconceptions I Often Encounter

  1. When I move my existing application from on-premise to cloud, my app will become scalable. Well, NOT without rewriting your apps to scale-out and be cloud-ready. In the cloud, scaling is not all automatic. For example, If you have an application that is not designed to scale out, it will not scale-out in the cloud. By scaling out, you have to know how to partition your storage in multiple storage instances. And instances can be spread geographically across continents. To scale-out, one has to start from scale-out design patterns, probably from scratch. Lucky you are if your current application is designed to scale-out.
  2. I can easily move current on-premise application to the cloud. Contrary to peddled notion that you can easily move current on-premise application (legacy) to the cloud, it is really based on what you actually expect to benefit from the cloud. Like in Azure for example, it supports moving legacy applications to the cloud by using virtual machines to host legacy apps (e.g. hosting a Win Forms based app). But this approach for me is inefficient unless all you want to do is transfer from on-premise to the cloud to perhaps get rid of system administration issues while ignoring other implications and issues like performance, latency, security, etc. You would have to figure out new issues as simple as backing up your data in the cloud. If your legacy app supports features available in on-premise environments, you might encounter problems running them in the cloud as current generation clouds do not support all features that are mostly staple in the on-premise environments.
  3. I have been designing on-premise apps for years, it is enough skill set to get me to the cloud. It is if you were designing on-premise apps using technologies and patterns similar to that available in the cloud. If not, you have to take a look at the cloud differently. If you want to write apps for the cloud, you have to think cloud. Scaling is probably the one issue that would force you to think cloud. If we are used to design our apps with a single instance mindset, it is about time to think differently and think multiple. Though of course, no one is stopping us to write apps to the cloud that way, but that is not real cloud apps are supposed to. Single instance design won’t scale even if you run it over current generation cloud. In SQL Azure, we have a 10GB database size limit. If we exceed the 10GB ceiling, what would our apps do? This seemingly easy to answer question could probably discourage anyone to embrace cloud.
  4. I just have to design apps like I use to and deploy it in the cloud, and the technologies behind it takes care of the rest. This can’t be. If you were into scale-out designs by now, chances are, you wouldn’t think of writing an app and deploying it over the cloud. Chances are, you might have deployed a cloud app already or you probably didn’t know that your design approaches are very much like those implemented over the cloud. But for most, I doubt that it’s the case. Most on-premise apps are designed not to scale-out but to scale-up. For example, most SQL Server implementations just take advantage of new hardware to scale as added load comes. Intrinsically, Microsoft SQL Server is not designed to automatically load balance, for example, a single query statement. One has to learn how to partition tables into a single view with underlying partitions spread across a federation of servers so SQL Server load balances a query. This is not supported in SQL Azure though. However, even if this is supported in SQL Azure in the future, this is not the scaling wonders the cloud provides, your application still sees only a single instance of a view. With today’s multi-million users accessing just one web application at one point in time, you can only scale as much. With cloud computing, your scaling possibilities are like crazy. You don’t have limits in terms of servers, of CPUs, of storage. Having resources like this would force you to rethink of the way you design your applications. The cloud isn’t about the technologies behind. For me it is one big design consideration. It is about the way you design your applications to take advantage of the unlimited resources found in the cloud.
  5. Any apps can or should be placed in the cloud. No one will stop you if you place anything regardless of how it will end up in the cloud. However, there could be applications that are better off off the cloud. For example, if you have an app with one massive synchronous process that you think has no way of being broken down into several pieces, you might as well stick to scaling-up on premise (unless your cloud offers hybrid services like allowing you to scale-up with a very powerful server). For apps requiring heavy informational security and if you aren’t comfortable with the notion that someone from your cloud provider might get into your highly secured data, you might as well have things on premise. There could be a lot of other applications that may not be practical in placed in the cloud.

It is very important that the cloud is understood clearly before everyone gets excited. While I am embracing it, I am welcoming it with a cautious attitude. Being relatively new, there are just too many considerations to think of, and too many questions that don’t even have clear answers for the time being.

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Why Am I Still Stuck with T-SQL?

Not a day pass without something new coming out from software companies like Microsoft. And it has been a challenge to keep up for application developers like me. I happen to start my all-Microsoft stack effort during the heydays of Visual Basic 3 and Access. Prior to that, I was a Borland kind of kid mesmerized at how neat my code was when printed on reams of continuous paper.

I was really fast then in absorbing new software technologies for application development and related products but I am no longer a kid I used to be. I am now a slow slumbering oldie when it comes to absorbing software development technologies. I actually envy those guys now who are doing and using technologies that just came out of the baking oven. I feel they are so smart to figure out new things so fast. Isn’t that great?

And every time I get to mingle with these developers, they often wonder why I am still stuck with Transact-SQL (T-SQL). Every time .. and it never fails. It now makes me wonder why I am stuck with T-SQL when there are a lot of new alternatives. This makes me beg to question if I am still relevant with the times. Well, let’s see.

Am I in? Or Out?

My first serious brush with T-SQL was when I was contracted to develop a school library system. I thought of using VB3 + Access but being a fast kid then, I opted to choose what was the latest and ended up using VB4 and SQL Server 6.5. I was mostly doing VB code and using DAO/ADO to connect to my databases and still doing client-side cursors when going through my tables here and there. I had trouble adapting to sql’s set-based processing and mindset with my Clipper-heavy and Access background. In no time, I was able to absorb T-SQL and began moving some of my data manipulation code to stored procedures.

When VB5 came out I decided to upgrade the school library application with an improved database structure with all data manipulation in stored procedures. This time, no more non-TSQL code for my data. I was able to re-use some TSQL code taken from the app’s previous version.

VB6 came out and Microsoft touted a better data access component in RDO. Around that time, I was able to get more libraries to use my system so I virtually kept up with anything new from Microsoft. I upgraded the front-end portion of my library system while I was able to re-use all my stored procedures.

Shortly after, the Web’s irresistible force dawned on me and I took up ASP and VB Script and migrated a portion of my library application to face the web. During this time, I also upgraded to SQL Server 7.0. I had some inline SQL codes which were prone to SQL Injection but I was able to retain all my stored procedures.

When .NET came out, I had my library system upgraded to an entirely new schema, platform and language (ASP.NET, Windows Forms, ADO.NET, SQL Server 2000/2005). This time, I hired somebody else to code it for me.

Never Obsolete

With all the changes I made to the library system, the only technology that remained constant was T-SQL. In most cases, I was able to re-use code. In all cases, I was able to take advantage of the benefits of re-using my T-SQL experience .. all these while I managed to bulk up my knowledge on T-SQL.

VB4 is gone; my T-SQL is still here. VB5 is gone; my T-SQL is still here. VB6 is gone; my T-SQL is still here. ASP is gone; still my T-SQL is here. DAO, ADO, and RDO are gone but my T-SQL remained. I moved from VB to C#, yet I am still using T-SQL.

Today we have ADO.NET, we have LINQ. Soon they will be gone (I have heard LINQ-To-SQL is now deprecated). And tomorrow I can probably move from ASP.NET to Silverlight or some else new, or from ASP.NET to PHP, but I have the feeling I still be using T-SQL. Microsoft is even realizing in going back to something as basic as tables and queues with Azure but it can’t ignore T-SQL, thus we have SQL Azure.

Betting My Future

I am inclined to think that my investments on T-SQL have already been paid back immensely and I am still reaping its benefits. With the advent of the relatively new cloud computing, and having various players offering various cloud computing technologies and services, I can’t help the urge in identifying which part of the cloud computing technology stack will survive against the onslaught of constant change and would manage to stay relatively stable. I am afraid some of our current investments to other technologies wont be as useful in the cloud but I am betting my future once again with SQL.

What about you?

**************************************************
Toto Gamboa is a consultant specializing on databases, Microsoft SQL Server and software development operating in the Philippines. He is currently a member and one of the leaders of Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter and is one of Microsoft’s MVP for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com