Thoughts on Designing Databases for SQL Azure – Part 3

In the first article of this series, I raised an issue considered to be one of sharding's oddities: what would one do should a single tenant occupying a shard exceed the shard's capability (e.g. in terms of storage and computing power)? The scenario I was referring to in the first article was that I opted to choose "country" as my way of defining a tenant (or sharding key). In this iteration, I'll once again attempt to share my thoughts on how I would approach the situation.

Off the bat, I'd probably blurt out the following when asked how to solve this issue:

  • Increase the size of the shard
  • Increase the computing power of the machine where the shard is situated

In on-premise sharding implementations, throwing in more hardware is easier to accomplish. However, doing the above when you are using SQL Azure is easier said than done. Here is why:

  • Microsoft limits SQL Azure's database sizes to 1 GB, 5 GB, and 50 GB chunks.
  • The computing power of the instance where a shard can reside in SQL Azure is just as finite.

I have heard unverified reports that Microsoft allows, on a case-to-case basis, increasing a SQL Azure database's size to more than 50 GB, and probably situating a shard on some fine special rig. This, however, leads to the question of how far Microsoft would let each and every SQL Azure subscriber avail of such special treatment. And it could probably cost one a fortune to get things done this way.

However, there are various ways to work around the issue at hand without getting special treatment. One can also do the following:

  • You can tell your tenant not to grow big and consume much computing power (Hey … Flickr does this. :P)
  • You can shard a shard. Things can get really complicated, but any time of the day one can chop a shard into pieces. Besides, at this point, you have probably eaten sharding for breakfast, lunch and dinner.

So how does one shard a shard?

In the first part of this series, I used sharding a database by country as an example. To refresh, here is an excerpt from the first article:

Server: ABC
Database: DB1
Table: userfiles

  userfiles_pk   user    uploaded file   country_code
  1              john    file1           grmy
  2              john    file2           grmy
  6              edu     file1           grmy

Server: ABC
Database: DB1
Table: country

  country_code   country
  grmy           germany
  can            canada
  ity            italy

Server: CDE
Database: DB1
Table: userfiles

  userfiles_pk   user    uploaded file   country_code
  3              allan   file1           can
  4              allan   file2           can
  5              allan   file3           can
  9              jon     file1           can
  10             jon     file2           can
  11             jon     file3           can

Server: CDE
Database: DB1
Table: country

  country_code   country
  grmy           germany
  can            canada
  ity            italy

In the sample above, the first shard contains data related only to grmy (Germany) and the second shard contains data related only to can (Canada). To break the shard further into pieces, one needs to find another candidate key for sharding. If there is none, as in the case of our example, one should create one. We can probably think of splitting up a country by introducing regions from within (e.g. split by provinces, by cities, or by states). In this example, we can pick city as our sharding key. To illustrate how, see the following shards:

Shard #1
Server: ABC1
Database: DB1
Table: userfiles

  userfiles_pk   user   uploaded file   country_code   city_code
  1              john   file1           grmy           berlin
  2              john   file2           grmy           berlin

Shard #2
Server: ABC2
Database: DB1
Table: userfiles

  userfiles_pk   user   uploaded file   country_code   city_code
  6              edu    file1           grmy           hamburg

By deciding to further subdivide a country by cities where each city becomes a shard, the following statements would be true:

  • The new sharding key is now city_code.
  • Our shard would only occupy data related to a city.
  • Various shards can be in the same server. Shards don’t need to be in separate servers.
  • The increase in the number of shards would also increase the amount we spend on renting SQL Azure databases. According to Wikipedia, Germany alone has 2,062 cities. That is some serious monthly spending. However, this example is just for illustration purposes, to convey the idea of sharding. One can always pick or create the most practical and cost-effective key for further sharding to address the issue of going beyond a shard's capacity without the spending overhead caused by poor design choices.
  • At a certain point in the future, we might exceed a shard's capacity once again, breaking our design.
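To make the routing side of this concrete, here is a minimal sketch (in Python, for illustration only) of how an application might look up the right shard once city_code joins the sharding key. The shard map, server names, and helper functions below are hypothetical; a real implementation would keep the map in a catalog database or configuration store rather than hard-coded.

    # Hypothetical shard map: (country_code, city_code) -> server that owns the shard.
    # In practice this lookup would live in a small catalog database or config store.
    SHARD_MAP = {
        ("grmy", "berlin"):  "ABC1",
        ("grmy", "hamburg"): "ABC2",
        ("can",  "toronto"): "CDE1",   # invented entry, just to show a second country
    }

    def shard_server(country_code, city_code):
        """Return the server hosting this tenant's shard, or fail loudly."""
        try:
            return SHARD_MAP[(country_code, city_code)]
        except KeyError:
            raise LookupError("No shard registered for %s/%s" % (country_code, city_code))

    def shard_connection_string(country_code, city_code, database="DB1"):
        # Assumed connection-string shape; adjust to your driver and credentials.
        return "Server=%s;Database=%s;" % (shard_server(country_code, city_code), database)

    # Every query for a tenant is issued only against that tenant's shard.
    print(shard_connection_string("grmy", "berlin"))   # Server=ABC1;Database=DB1;

The same lookup is also the natural place to detect when a city has outgrown its shard and needs to be split yet again.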

**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com


Thoughts on Designing Databases for SQL Azure – Part 2

In the first article, I showed an example of how a database's design could impact us technically and financially. And sharding isn't all just about splitting up data. It also brings to the table a group of terrible monsters to slay. There are a lot of concerns that need to be considered when one attempts to shard a database, especially in SQL Azure.

NoSQL, NoRel, NoACID

In breaking things apart, one is bordering on clashing religions. One monster to slay is the issue of ACIDity. People discuss NoSQL, NoRel, and NoACID as some of the trends out there, and most even swear that these approaches are better than SQL. In my case, I prefer to call it NoACID, and it is not by any means more or less than SQL. I have NoACID implementations in some projects I have done. And I love SQL. To simplify, I'll lump these trends together as NoX, as they commonly attempt to disengage from the realities of SQL.

For me, NoX is not a religion; it is simply a requirement. The nature of the app you build will dictate if you need to comply with the principles of ACID (Atomicity, Consistency, Isolation, Durability). If ACID is required, it is required regardless of your data and storage engine or your preferred religion. If it is required, you have to support it. Most cloud apps that we see, like Google and Facebook, could probably live without ACID on their requirements list. Google is primarily read-only, so it does make sense to have data scattered over various servers in all continents without the need for ACID. By nature, ACID in this regard can be very minimal or absent. Facebook, on the other hand, is read/write intensive. It seems to be driven by a massive, highly sophisticated message queuing engine. Would ACID be required in Facebook? I am not quite sure about Facebook's implementation, but the way I look at it, ACID can be optional. ACID could well be present in operations confined to one tenant, in this case an FB account. Outside of this, the absence of ACID could probably be compensated for by queuing and data synching.
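To illustrate the queuing idea in that last sentence, here is a minimal, purely hypothetical sketch in Python: the write the user cares about is committed locally right away, while a queued message lets other shards or replicas catch up later. The in-memory dictionaries and queue are stand-ins, not any particular product.

    import queue

    sync_queue = queue.Queue()   # stand-in for a real message queue

    def post_update(local_store, user, text):
        # The tenant's own shard is updated immediately (locally "ACID enough") ...
        local_store.setdefault(user, []).append(text)
        # ... while propagation to other shards/replicas is deferred to a queue.
        sync_queue.put({"user": user, "text": text})

    def sync_worker(remote_store):
        # A background worker drains the queue and brings other copies up to date.
        while not sync_queue.empty():
            msg = sync_queue.get()
            remote_store.setdefault(msg["user"], []).append(msg["text"])

    local, remote = {}, {}
    post_update(local, "john", "hello")
    sync_worker(remote)   # eventually, remote matches local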

If Facebook and Google decided to require ACID, they could be facing concerns about locking a lot of things, and while things are locked, latency could be one of the consequences. It is therefore very important to lay out up front whether ACID is a requirement or not. For a heavy transactional system, a sharded design presents a lot of obstacles to hurdle. In SQL Azure this is even harder, as SQL Azure does not support distributed transactions the way we are used to with SQL Server. This means that if your transaction spans multiple shards, there is no simple way to do it, and ACID can be compromised. SQL Azure does, however, support local transactions. This means you can definitely perform ACIDic operations within a shard.
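As a small illustration of that last point, here is a hedged sketch of an ACIDic operation confined to one shard, written in Python with the pyodbc driver purely as an example. The connection string, the audit_log table, and the function name are assumptions for illustration, not part of any actual design.

    import pyodbc  # assumes a SQL Server ODBC driver is installed

    def reassign_file(shard_conn_str, userfiles_pk, new_owner):
        """Two statements against ONE shard, committed atomically.
        SQL Azure supports this local transaction; the same pair of statements
        could not be wrapped in a distributed transaction across two shards."""
        conn = pyodbc.connect(shard_conn_str, autocommit=False)
        try:
            cur = conn.cursor()
            cur.execute("UPDATE userfiles SET [user] = ? WHERE userfiles_pk = ?",
                        new_owner, userfiles_pk)
            # audit_log is a hypothetical table assumed to live in the same shard.
            cur.execute("INSERT INTO audit_log (userfiles_pk, action) VALUES (?, ?)",
                        userfiles_pk, "reassigned")
            conn.commit()      # both statements commit together ...
        except Exception:
            conn.rollback()    # ... or the shard is left untouched.
            raise
        finally:
            conn.close()

Anything that has to touch two shards would instead be broken into per-shard steps coordinated by the application, for example through a queue, which is exactly the compensation described earlier.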

To be continued…

**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Going Down The Decision Making Lane: Azure vs Hosted vs On-premise

Cloud computing has brought new possibilities and challenges to ISVs like my company. It has also brought along a lot of seemingly difficult questions that have no direct answers. However, for anyone looking at the possibility of investing its future in new technologies, a lot of questions need some answers.

In this blog, I have come up with a list of concerns, cases, and questions that are usually essential when making this kind of decision. To make things a lot clearer, I will use a real-world example based on my own circumstances to weigh competing platforms that are all worth exploring.

My Case

In my last cloud computing related blog (http://totogamboa.com/2010/12/11/my-first-shot-at-sharding-databases/), I mentioned one of my company's products that is being considered for a move beyond the usual realm of on-premise applications. In this blog, I continue to explore new possibilities for this application as we enter a new era of cloud computing.

The obvious question for our company now is where to go from here as far as our library system is concerned.

My Options

I have with me several platforms to choose from. All are begging to be explored further, all are wanting attention, to be leveraged, used and abused. Since our library system is heavily glued to Microsoft technologies, my shortlist naturally explores the platforms where Microsoft technologies claim market and mindshare dominance.

  1. On-Premise. Microsoft boasts a powerful array of on-premise technologies that anyone can take advantage of and have their products comfortably sit on. I would be looking at a typical setup of Windows Server, Internet Information Services/ASP.NET, and SQL Server sitting in our client's datacenter.
  2. Hosted. Microsoft has also proven strongly that their technologies can be set up to serve as hosts over the Internet, as shown by the large number of hosting providers that offer Microsoft technologies. Hosting providers typically offer the same set of technologies one can set up on-premise. A typical setup would be Windows Server, Internet Information Services/ASP.NET, and SQL Server hosted in the provider's datacenters.
  3. Azure. The relatively new kid on the block is the stack of Microsoft technologies designed mainly as services over the Internet. The stack is composed of Azure technologies, and the entirety is touted to be the platform of the future. The new setup includes Windows Azure, a web role type supporting IIS/ASP.NET, and SQL Azure. All of these are managed by Microsoft and situated in their humongous datacenters scattered around the globe.

My Possibilities

Picking a choice isn't always easy, especially in matters of choosing the right software solution for one's case. But I would like to attempt to see where I would go from here as far as our library system goes. With today's options, and the complexity that comes along with each of them, sound technology decisions are harder to come by than before.

In this exercise, I prepared important points that need to be clarified to come up with the right, if not the best, solution. Do take note, though, that I am not generalizing here by focusing only on technologies that Microsoft provides. Remember, I am working on answers based only on my circumstances and published specifications pulled from the web.

Here is a compilation of my personal list of concerns in trying to go through a decision making process that would shape up the future of our library system.

Concerns | Azure (as of Dec 2010) | Hosted | On-Premise
What hardware do I have to provide? | Front side only | Front side only | Front and end sides
Is redundancy possible? | Yes | Yes | Yes
Is scaling up possible? | Up to 8 CPUs (8 x 1.6 GHz) only, 16 GB RAM | Yes, by special arrangement | Best money can buy
Is scaling out possible? | Yes | Yes | Yes
Is scaling down possible? | Yes | No | No (1)
Is Full Text Search available? | No | Yes | Yes
Is database backup & restore for SQL Server available? | No; only possible via SSIS | Yes | Yes
Need to rewrite library system? | Yes | No | No
How do I scale out? | SQL Azure / sharding | SQL Server / sharding | SQL Server / sharding / horizontal partitioning
Administration/operation of infrastructure | Done by Microsoft | Done by provider | Done by client/us
Visual Studio support | Yes | Yes | Yes
ASP.NET WebForms or MVC support | Yes | Yes | Yes
SQL Server support | SQL Azure | SQL Server | SQL Server
SQL Server degree of capability | Subset of SQL Server only | Full feature set | Full feature set
Platform portability | If I write for Azure, it will run only on Azure, unless a small-scale on-premise Azure can be deployed in the future | If I write for a hosted environment, it can also run on-premise | If I write for on-premise, it can also run hosted

(1) Hardware resources are wasted if the system is not designed to scale down; the hardware stays when capacity is scaled down and simply remains underutilized.

Summing It Up

At this point in time, the best option for my company is to remain in hosted environments, as the table of concerns above shows. All of the features I need from SQL Server are available there. It is simply that there are still missing pieces in SQL Azure that prevent us from fully adopting it. The company should also start distancing itself from on-premise implementations. This doesn't mean, though, that the Azure and on-premise options are no good. Azure is shaping up to be a formidable platform of choice, especially for projects that are yet to be developed. I am quite sure full-text search capability will be supported in SQL Azure soon. When that happens, I will have to go through this same exercise all over again.

SQL Azure

In the end, as a database person, it has always come down to what an RDBMS can provide for me on the various platforms out there. For now, it is still the non-cloud version of SQL Server that seals the deal for me. SQL Azure isn't far behind. It is just a full-text search away! 🙂

**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

My First Shot At Sharding Databases

Sometime in 2004, I was faced with the question of whether to tear apart our library system's design, which I started way back in 1992 using Clipper on Novell NetWare and which was later ported to ASP/MSSQL, and come up with a more advanced, scalable and flexible design. The problem I would encounter most often is that academic organizations can have varying structures. In public schools, there are regional structures where libraries are shared by various schools in the region. In some organizations, a school has sister schools with several campuses, each with one or more libraries, all managed by only one entity. In one setup, a single campus can have several schools in it, each having one or more libraries. These variations pose a lot of challenges in terms of programming and deployment. A possible design nightmare. Each school or library would often emphasize its independence and uniqueness against other schools and libraries, for example wanting its own library policies and control over its collections, users and customers, and yet have that desire to share its resources and interoperate with others. Even inside a campus, one library can operate on a different time schedule from another library just a hallway apart. That presented a lot of challenges in terms of having a sound database design.

The old design, from 1992 to 2004, was a single database with most tables having references to common tables called "libraries" and "organizations". That was an attempt to partition content by library or organization (a school). Scaling, though, wasn't a concern at the time, as even the largest library in the country would probably not consume more than a few gigs of hard disk space. The challenge came from every query inside the system having to filter everything by library or school. As the features of our library system grew in number and became more advanced and complex, it became apparent that the old design, though very simple when contained in a single database, would soon burst into a problem. Coincidentally, I had been contemplating advancing the product in terms of feature set. Flexibility was my number one motivator; second was the possibility of doing it all over the web.

Then came the ultimate question: should I retain the design and improve on it, or should I be more daring and ambitious? I scoured the Internet for guidance on a sound design, and after a thorough assessment of current and possibly future challenges, scaling included, I ended up deciding to instead break things apart and abandon the single-database mindset. The old design went into my garbage bin. Consequently, that was the beginning of my love of sharding databases to address issues of library organization, manageability and control and, to some extent, scalability.

The immediate question was how I was going to do the sharding. Picking up the old schema from the garbage bin, it was pretty obvious that breaking it apart by library was the most logical. I hadn't heard of the concept of a "tenant" then, but I didn't have to, as the logic behind choosing one is as ancient as it can be. There were other potential candidate keys to shard the database by, like "school" or "organization", but the most logical was the "library". It is the only entity that can stand drubbing and scrubbing. I went on to design our library system with each database containing only one tenant, the library. As of this writing, our library system runs in various configurations: one school has several libraries inside its campus; another has several campuses scattered all over Metro Manila, some campuses having one or more libraries, but everything sits on a VPN accessing a single server.

Our design, though, is yet to become fully sharded at all levels, as another system acts as a common control for all the databases. This violates the concept of a truly sharded design, where there should be no shared entity among shards. A few tweaks here and there, though, would make it fully comply with the concept. Our current design is 100% sharded at the library level.

So Why Sharding?

The advent of computing in the cloud presents new opportunities, especially for ISVs. With things like Azure, we will be forced to rethink our design patterns. The most challenging part, perhaps, is how to design not only to address concerns of scalability, but to make our applications tenant-aware and tenant-ready. This challenge is not only present in the cloud; a lot of on-premise applications can be designed this way too, which could help in everyone's eventual transition to the cloud. But cloud or not, we could benefit a lot from sharding. In our case, we can pretty much support any configuration out there. We also get to mimic the real-world operation of libraries. And it eases up a lot of things like security and control.

Developing Sharded Applications

Aside from the databases, the application needs to be fully aware that it is no longer accessing a single database where it can easily query everything without minding that other data exists somewhere else, possibly in a different database sitting on another server. Though the application will be a bit more complex in terms of design, it is often easier to comprehend and develop if an app instance minds only a single tenant, as opposed to an app instance trying to filter out other tenants just to get the information set of one tenant.
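To show the difference, here is a minimal sketch of the occasional cross-tenant operation: a report that has to fan out over every shard and merge the results in the application. It is written in Python with pyodbc only as an example driver; the connection strings and the query are illustrative placeholders.

    import pyodbc  # example driver; any SQL Server client would do

    # Placeholder list of shard connection strings (one tenant per shard/database).
    SHARD_CONNECTIONS = [
        "Server=ABC;Database=DB1;",
        "Server=CDE;Database=DB1;",
    ]

    def count_uploads_across_all_tenants():
        """Fan the same query out to every shard and aggregate in the app."""
        total = 0
        for conn_str in SHARD_CONNECTIONS:
            conn = pyodbc.connect(conn_str)
            try:
                row = conn.cursor().execute("SELECT COUNT(*) FROM userfiles").fetchone()
                total += row[0]
            finally:
                conn.close()
        return total

The single-tenant path, by contrast, resolves one connection string and never needs to know the other shards exist, which is why it tends to be the simpler code to write and reason about.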

Our Library System

Currently, our library system runs in sharded mode both on-premise and in cloud-like hosted environments. You might want to try its online search.

Looking at SQL Azure

Sharding isn't automatic on any Microsoft SQL Server platform, including SQL Azure. One needs to do it by hand and from the ground up. This might change in the future, though; I am quite sure Microsoft will see this as a compelling feature. SQL Azure is the only Azure-based product that currently does not have natural, inherent support for scaling out. If I were Microsoft, I would offer a full SQL Server service alongside SQL Azure, like shared Windows hosting sites do, so it eases adoption. Our system's database design readiness (being currently sharded) would allow us to easily embrace such a service. I understand it would affect, possibly dampen, their SQL Azure efforts if they did it, but I would rather they consider it than offer a very anemic product.

As of now, though we may have wanted to take our library system to Azure with a few minor tweaks, we just can't with this version of SQL Azure, for the reasons stated below:

  • SQL Server Full Text Search. SQL Azure does not support this in its current version.
  • Database Backup and Restore. SQL Azure does not support this in its current version.
  • Reporting Services. SQL Azure does not support this in its current version.
  • $109.95 a month on Azure/SQL Azure versus $10 a month shared host with a full-featured IIS7/SQL Server 2008 DB

My Design Paid Off

Now I am quite happy that the potential of a multi-tenant, sharded database design, ancient as it is, is beginning to get attention with the advent of cloud computing. My 2004 database design is definitely laying the groundwork for cloud computing adoption. Meanwhile, I have to look for solutions to address what's lacking in SQL Azure. There could be some easy workarounds.

I'll find time to cover the technical aspects of sharding databases in future blogs. I am also thinking that PHISSUG should have one of these sharding tech sessions.

**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com

Rationalizing Cloud Computing

For the past couple of years, people have been hot on the cloud computing bandwagon. And as with the birth of any new trend, people tend to fall into misconceptions that eventually lead them to get burned. Some just plainly embrace something without bothering to find out what is in it for them.

As a co-owner of a very small ISV trying to grapple with trends in computing and making things work for clients and stakeholders, I always submit to the pressure of looking into the possibility of taking advantage of what a new trend can offer, cloud computing included. During these times, I have looked closely into what cloud computing brings to the table and how my company can take advantage of it. I have at least looked into Azure and non-Azure offerings and have attempted to assemble a design for our company's products that can take advantage of the good stuff in the cloud.

Though my company is definitely a cloud computing enthusiast, with some efforts actually being made to take advantage of the cloud, as far as I am concerned it won't be for everybody. Recently, though, there has been a rise in people investigating their possibilities with the cloud. This may be due to the advent of Microsoft betting the farm on Azure, or perhaps there are just too many misconceptions peddled here and there. In most cases, I see wrong assumptions being tossed into every cloud discussion I have had. Clients, too, are asking me about possibilities of being in the cloud based on wrong assumptions fed to them.

But before I get into those misconceptions, I would highlight the traits that made me and my company embrace cloud computing:

  • Cloud is the great equalizer. For very small ISVs like my company, I see the cloud as leveling the playing field against the big guys, at least on the issue of having to spend upfront on system infrastructure to offer something to my market. For under $100 a month, a small ISV can start big.
  • Reach. My company would be able to increase reach without much overhead.
  • Scalability. Because I would be able to scale as the need arises, cloud computing is definitely for companies like mine.
  • Administration. Being a very small ISV, system administration tasks such as deployment, support and maintenance take too much of a toll on our internal resources. An easily accessible, uniform environment like the cloud would allow my company to increase the number of clients we attend to without adding too much stress on our internal resources.

There are other niceties that made me embrace cloud computing, but the above items mentioned are the major factors that I believe can only be achieved through the cloud.

As convinced as I am that cloud computing is one area my company should invest in, I am also convinced that cloud computing is definitely not for everybody and definitely not for every situation out there, at least in its current incarnation. I'll list down five misconceptions that I often encounter. My rationalization of these misconceptions is based on what I know from the efforts my company is doing in the cloud, mostly Azure-based clouds as well as a few non-Azure ones.

Top 5 Misconceptions I Often Encounter

  1. When I move my existing application from on-premise to cloud, my app will become scalable. Well, NOT without rewriting your app to scale out and be cloud-ready. In the cloud, scaling is not all automatic. If you have an application that is not designed to scale out, it will not scale out in the cloud. Scaling out means you have to know how to partition your storage across multiple storage instances, and instances can be spread geographically across continents. To scale out, one has to start from scale-out design patterns, probably from scratch (see the sketch after this list). Lucky you if your current application is already designed to scale out.
  2. I can easily move my current on-premise application to the cloud. Contrary to the peddled notion that you can easily move a current on-premise (legacy) application to the cloud, it really depends on what you actually expect to gain from the cloud. Azure, for example, supports moving legacy applications to the cloud by using virtual machines to host them (e.g. hosting a WinForms-based app). But this approach, for me, is inefficient unless all you want is to transfer from on-premise to the cloud to get rid of system administration issues, while ignoring other implications and issues like performance, latency, security, etc. You would have to figure out new issues as simple as backing up your data in the cloud. If your legacy app depends on features available in on-premise environments, you might encounter problems running it in the cloud, as current-generation clouds do not support all the features that are staples on-premise.
  3. I have been designing on-premise apps for years; that is enough skill set to get me to the cloud. It is, if you were designing on-premise apps using technologies and patterns similar to those available in the cloud. If not, you have to look at the cloud differently. If you want to write apps for the cloud, you have to think cloud. Scaling is probably the one issue that will force you to think cloud. If we are used to designing our apps with a single-instance mindset, it is about time to think differently and think multiple. Of course, no one is stopping us from writing apps for the cloud that way, but that is not what real cloud apps are supposed to be. A single-instance design won't scale even if you run it over a current-generation cloud. In SQL Azure, we have a 10GB database size limit. If we exceed the 10GB ceiling, what would our apps do? This seemingly easy-to-answer question could probably discourage anyone from embracing the cloud.
  4. I just have to design apps like I used to and deploy them in the cloud, and the technologies behind it take care of the rest. This can't be. If you were into scale-out designs by now, chances are you wouldn't think of just writing an app and dropping it in the cloud. Chances are, you might have deployed a cloud app already, or you probably didn't realize that your design approaches are very much like those implemented over the cloud. But for most, I doubt that is the case. Most on-premise apps are designed not to scale out but to scale up. For example, most SQL Server implementations just take advantage of new hardware to scale as load is added. Intrinsically, Microsoft SQL Server is not designed to automatically load-balance, for example, a single query statement. One has to learn how to partition tables into a single view, with the underlying partitions spread across a federation of servers, so SQL Server can load-balance a query. This is not supported in SQL Azure, though. And even if it were supported in SQL Azure in the future, this is not the scaling wonder the cloud provides; your application still sees only a single instance of a view. With today's multi-million users accessing a single web application at one point in time, you can only scale up so much. With cloud computing, your scaling possibilities are like crazy. You don't have limits in terms of servers, CPUs, or storage. Having resources like this will force you to rethink the way you design your applications. The cloud isn't about the technologies behind it. For me it is one big design consideration. It is about the way you design your applications to take advantage of the unlimited resources found in the cloud.
  5. Any app can or should be placed in the cloud. No one will stop you from placing anything in the cloud, regardless of how it ends up. However, there could be applications that are better off out of the cloud. For example, if you have an app with one massive synchronous process that has no way of being broken down into several pieces, you might as well stick to scaling up on-premise (unless your cloud offers hybrid services that let you scale up with a very powerful server). For apps requiring heavy information security, if you aren't comfortable with the notion that someone from your cloud provider might get into your highly secured data, you might as well keep things on-premise. There could be a lot of other applications that may not be practical to place in the cloud.
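As a concrete illustration of the storage partitioning mentioned in the first misconception above, here is a minimal sketch (in Python) of the kind of decision the application itself has to make: which of N storage instances a given key lands on. The instance names are placeholders; the point is that this mapping has to be designed in, as it does not appear by itself once the app is in the cloud.

    import hashlib

    # Placeholder names for several storage instances (databases, blob stores, etc.).
    STORAGE_INSTANCES = ["store-0", "store-1", "store-2", "store-3"]

    def pick_instance(partition_key):
        """Map a key to one instance with a stable hash, so the same key
        always lands on the same instance across requests and deployments."""
        digest = hashlib.md5(partition_key.encode("utf-8")).digest()
        return STORAGE_INSTANCES[digest[0] % len(STORAGE_INSTANCES)]

    print(pick_instance("user:john"))   # e.g. "store-2"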

It is very important that the cloud is understood clearly before everyone gets excited. While I am embracing it, I am welcoming it with a cautious attitude. The cloud being relatively new, there are just too many considerations to think of, and too many questions that don't yet have clear answers.

**************************************************
Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft's MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com