SQL Server: Real Time Data Processing? When Not To …

In these times, when the likes of Google, Yahoo, eBay, and CNN run their massive Internet applications and services in mammoth data centers filled with thousands of powerful servers, processing millions of user requests and streaming data through high-bandwidth networks, it is almost automatic for us developers and database enthusiasts to think of processing data in real time whenever somebody asks how we can handle massive amounts of data in a very constrained, highly distributed, and unreliable environment. Is there such an environment? Let us see.

Let us imagine the rural banks in the country. These rural banks are small and can be found in the most remote of barangays. They are not particularly technology savvy, and they still use traditional means of communication. Perhaps only a few have large servers, while the rest use only personal computers. To make it more constrained and limited, let us imagine that these rural banks use varying forms of networking technology; perhaps some use unreliable dial-up connections and low-bandwidth leased lines. To make it even more pitiful, let us imagine that some can only afford decades-old floppy diskettes to transport data from one branch to another.

Here is the big question: can we come up with an affordable solution that would allow them to experience secure nationwide bank-to-bank interoperation?

Our edge in technology might convince us that the exercise would be futile: how can we interconnect them when not all of them are connected to a WAN, and we have no assurance of a reliable network connection? We might also wonder how we will integrate those that still use diskettes.

Yes, absolutely, we can cook up some system for them … but there is one simple question we need to ask: Do We Always Have To Process Data in Real Time?

To some, yes. For me? NO.

Anyway, let us not delve too much into what that solution would look like. Instead, let us take a look at an old, reliable technology that has made an appearance in some of the latest software offerings out there and that can help us solve our rural bank challenge.

I am referring to SQL Server 2005’s Service Broker. For those familiar with MSMQ, you will easily comprehend this new feature of SQL Server. Service Broker is designed around the basic functions of sending and receiving messages (just like email). In its advanced form, messages can be secured, queued, and routed to specific destinations regardless of whether the communication channel is online or offline (again, just like email).

Imagine one rural branch (the Quiapo Branch) sends a money transfer message to the Sulo Branch to notify it that X amount of money can be released to an authorized recipient. The Quiapo Branch teller just clicks “Send Message” without knowing that their DSL line has gone down. Using Service Broker, the teller won’t even notice the outage. Service Broker simply queues and keeps the message until the DSL line comes back up. Once online, Service Broker sends the message to its intended destination. If the DSL interruption lasts about 1 hour, the transaction is delayed by just a bit more than an hour.
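To give a feel for what the sending side might look like in T-SQL, here is a minimal sketch. All service, contract, and message type names here are hypothetical; the point is that SEND succeeds locally and Service Broker takes care of delivery whenever the line comes back:

```sql
-- Minimal sketch (object names are made up for illustration):
-- queue a money-transfer notice on the sending side. The message
-- sits in the local transmission queue until the remote branch
-- is reachable again.
DECLARE @dialog UNIQUEIDENTIFIER;

BEGIN DIALOG CONVERSATION @dialog
    FROM SERVICE [//Bank/QuiapoBranchService]
    TO SERVICE   '//Bank/SuloBranchService'
    ON CONTRACT  [//Bank/MoneyTransferContract]
    WITH ENCRYPTION = ON;   -- messages are secured in transit

SEND ON CONVERSATION @dialog
    MESSAGE TYPE [//Bank/MoneyTransferNotice]
    (N'<transfer amount="5000" recipient="Juan dela Cruz" />');
```

The SEND completes as part of the local transaction whether or not the network is up, which is exactly why the teller never notices the outage.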

With this, we are giving the rural banks the means to operate as if they were always online. They need not be bothered when they are offline; the system simply resumes normal operations automatically. So having unreliable and very slow communication/network lines won’t be much of an issue.

So who says that we always need to process data in real time?

I’ll try to find time to come up with something that tells more about Service Broker. Perhaps somebody out there would want that 🙂

Additional Information: Service Broker can be used in a lot of scenarios. It can even be used to fulfill complex data replication requirements. It can split data paths into multiple channels so that messages can be distributed across as many queues as possible (or queued to a file and transferred onto floppy disks), thereby increasing capacity and the chances of still transporting data. This also means that Service Broker can be used to scale applications. Service Broker is also an excellent solution if you want to integrate loosely coupled applications.
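To complete the picture, a hypothetical receiving side at the destination branch might look like the sketch below. The queue name is, again, made up for illustration:

```sql
-- Hypothetical receiving side at the Sulo Branch: pull the next
-- queued message; WAITFOR blocks until one arrives or the
-- timeout (in milliseconds) expires.
DECLARE @msg XML, @dialog UNIQUEIDENTIFIER;

WAITFOR (
    RECEIVE TOP (1)
        @dialog = conversation_handle,
        @msg    = CAST(message_body AS XML)
    FROM dbo.MoneyTransferQueue
), TIMEOUT 5000;

-- ... release the funds to the authorized recipient here,
-- then end our side of the conversation.
END CONVERSATION @dialog;
```

In practice this would run inside an activation procedure or a polling job, so messages are processed as soon as they arrive.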

Toto Gamboa is a consultant specializing in databases, Microsoft SQL Server, and software development, operating in the Philippines. He is currently a member and one of the leaders of the Philippine SQL Server Users Group, a Professional Association for SQL Server (PASS) chapter, and is one of Microsoft’s MVPs for SQL Server in the Philippines. You may reach him by sending an email to totogamboa@gmail.com


Responsible Software Architecture

Lately, I have had an overload of information regarding “software architecture”. I have been doing software for a decade or more and have seen enough to know what works out there and what is just a passing fad. Today, developers, programmers, or whatever you want to call them, are so enamored with discussing anything about software architecture, with a vigor and gusto I have not seen before.

In 2000, when I was doing some software architecture work as an external consultant on a really large project, the technical project manager cautioned me not to “over-architect”. It occurred to me that she had probably experienced something that made her say that. I saw the wisdom of that cautious remark and have instilled it into every design work I have done since. Until now, I think I have been a responsible individual in doing software architecture work.

I recently asked that technical project manager why the warning. She said something to this effect:

– The first time you piece together the super design of your mammoth killer-app, you make all the mistakes of a first timer.
– The second time you piece them all together, you tend to over-compensate on the shortcomings of the first.
– The third time you piece them together, you now have the confidence of having a near perfect architecture.

Perhaps this was the reason why there is a saying in the software development space that “you can’t make it right until you reach version 3.0”. Or was I referring to Microsoft apps of the past?

The big question for us right now is: how many of us have managed to stay on to reach version 3.0?

Pundits have advocated that this could be avoided and that we could get things right on the first try; thus, we now see everywhere guides to the best patterns and practices in various colors and flavors. And you see everywhere heated debates on almost everything: scrutinizing each approach and questioning why a particular practice is best to implement while others are not. Some worked, some failed. Some were misapplied, some were near perfect. But one thing is conclusive … none of these is a SILVER BULLET that would solve all our woes in software architecture and the business of software design.

What is very noticeable nowadays is that developers tend to believe almost everything they read out there. They are enamored with the fact that they can articulate and converse proficiently on any technical subject at hand, and they bask in the feeling of knowing the differences between subjects. However, I also happen to notice that these same software architects and designers build their software around the capabilities and limitations of whichever technology architecture they happen to believe in (or perhaps the one easiest for them), instead of first understanding the requirements and adopting something that fits what is needed within the limits of what they know. Often, they blame the lack of support from their preferred architecture when they encounter something that is really hard to accomplish.

In almost all cases, too, software architects and designers seem to spend too much time architecting a solution, some without regard to known constraints such as time, financial resources, or experience (or the lack of it) with the proposed solution. Some would apply the complexities of a multi-tiered architecture to even the simplest of apps just to be known to peers as architecture savvy.

So I ask, what do others do? Do they understand the requirements first, then find and apply appropriate patterns and practices that can deliver the requirements and expectations of users? Or do they instead try to fit the user requirements to their preferred architecture? Do they pick several of the best out there and find ways and means to glue things together to come up with something deliverable?

As for me, I tend to absorb the requirements first and know my constraints (time, manpower, logistics, technical capability, experience, etc.). Then I immerse myself in what the software will be like in its final form. Once I get a clear picture of that, I start doing technical work. This involves picking the right architecture to adopt. It is not always necessary for me to check on what is new. If what I have known and tested to work fits, given all the scenarios and what-ifs, I pick that architecture regardless of how relevant or how obsolete it has become. Sometimes I see people who have a knack for choosing whatever is new and untested out there, as long as it is popular. If I find my stock of architectures insufficient for the work at hand, that is when I start investigating other possible options. This can be really hard sometimes, since in most cases you don’t have the benefit of experience. If I am not careful and responsible in choosing the right stuff, the consequences could be unimaginable.

By and large, I think having too much architecture can be as bad as having none or too little. I also strongly believe that software designers and architects should focus on responsibly knowing how much architecture is needed for a particular piece of software, not on producing the most technically elegant or advanced architecture for any software just to prove we are architecture savvy.

Since there is really no silver bullet when it comes to software architecture, and since we will forever be avid learners of what is new and cool out there, and forever adapting to changes as dictated by our users, I am espousing that we be responsible in selecting and applying what we think is the right architecture.

Avoid LIKE Like A Plague

What is the best thing about using the LIKE operator in T-SQL? It is very convenient for doing a bit of pattern searching.

Many of us use LIKE for this reason alone. For example, running the following query

SELECT first_name
FROM _name
WHERE first_name LIKE '%electra%'

on a table of several thousand rows would easily give you a result in the blink of an eye (assuming that a non-clustered index was created on the first_name column).

But let us try to go deeper as to how SQL Server executes the LIKE queries:

-- Contains Search
SELECT first_name
FROM _name
WHERE first_name LIKE '%electra%'

Execution Plan Used = Index Scan

-- Ends With, Where Last Character Can Be Any Character
SELECT first_name
FROM _name
WHERE first_name LIKE '%ann_'

Execution Plan Used = Index Scan

-- Begins With Search
SELECT first_name
FROM _name
WHERE first_name LIKE 'electra%'

Execution Plan Used = Index Seek

We know that every time SQL Server executes a query, its Query Optimizer tries its best to determine the best execution plan that it can do. It will always attempt to use an Index Seek wherever/whenever possible.

In the 1st and 2nd examples, SQL Server failed to find an appropriate way to use the index, so it opted for an Index Scan. In an Index Scan, SQL Server scans all the rows in the index: it has to examine each row to determine whether it qualifies for your search criteria. If you are searching for just 3 rows within a million-row table, and those rows happen to be the 8th, 9th, and 10th, SQL Server still has to scan all the way to the millionth entry of the index before it can return all the rows you need. Now that is quite unacceptable to most of us. However, an Index Scan isn’t bad at all; there are times when an Index Scan is faster than an Index Seek. Every time you see in your execution plan that SQL Server is using an Index Scan, take a look at the number of rows it is processing. If it is just a few thousand rows, then you are fine, and the Index Scan can be really fast.

In the 3rd example, SQL Server determined that it could use the index effectively with an Index Seek. An Index Seek means that SQL Server’s Query Optimizer found a useful index to locate the records we are searching for. If we are searching for 3 records in a million-row table, SQL Server will return the rows in the blink of an eye.
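If you want to observe the difference yourself, here is a small sketch, assuming the _name table from the examples above; the index name is hypothetical:

```sql
-- Hypothetical setup to compare the two plans side by side:
-- the leading wildcard forces a scan of the whole index, while
-- the prefix search lets the optimizer seek straight to 'electra'.
CREATE NONCLUSTERED INDEX ix_first_name ON _name (first_name);

SET STATISTICS IO ON;   -- shows logical reads per query

SELECT first_name FROM _name WHERE first_name LIKE '%electra%'; -- Index Scan
SELECT first_name FROM _name WHERE first_name LIKE 'electra%';  -- Index Seek
```

On a well-populated table, the logical-read counts reported by STATISTICS IO should make the cost gap between the two queries obvious.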

So how would this information help us?

Often, when we develop apps, we tend to use the LIKE operator a lot. During this stage, we seldom encounter performance problems related to LIKE because our tables are typically not yet populated with large amounts of data. In most cases, we have only a few hundred rows or fewer in our tables during the development stage. Perhaps we think that since we have created an index, we are okay.

We will only notice the performance problems associated with LIKE when the tables get heavily populated with real data.

Hope this helps us understand when and how to use the LIKE operator. Hey, you may also want to explore SQL Server’s Full-Text Search.
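As a teaser, here is a minimal sketch of what a Full-Text alternative to the contains search above might look like. The catalog name and key index name are hypothetical, and note that CONTAINS matches whole words (with options like prefix and inflectional forms) rather than arbitrary substrings:

```sql
-- Hypothetical Full-Text setup for the _name table.
-- pk__name is an assumed unique key index on the table.
CREATE FULLTEXT CATALOG NameCatalog;

CREATE FULLTEXT INDEX ON _name (first_name)
    KEY INDEX pk__name ON NameCatalog;

-- Word-based search: avoids the index scan that
-- LIKE '%electra%' would trigger.
SELECT first_name
FROM _name
WHERE CONTAINS(first_name, 'electra');
```

For name lookups where users type whole words or prefixes, this scales far better than a leading-wildcard LIKE.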