The Hidden Cost of Scaling with NoSQL

The Hidden Cost of Scaling with NoSQL

Today’s applications are big. With hundreds of thousands of users, many of whom are uploading content regularly, the size of the data store can grow massive in short order. Not surprisingly, for data architects, scale is a real concern.

At companies with global internet reach, such as Google and Facebook, NoSQL—a new breed of highly scalable non-relational data stores—has occasionally been chosen over relational database technology. The fact that NoSQL has been used in such high-profile applications has turned out to be excellent advertising for the benefits of this new database paradigm, catching the attention of many application architects.

The temptation to replace traditional relational database systems with one designed for petabyte scale is understandable. Who wouldn’t want to scale to petabytes of data without the threat of performance degradation? But relational database technology has not stood still in recent years. Between the tradeoffs of the NoSQL architecture and major advances in the scale and manageability of relational databases, many companies, especially ISVs, will find a relational database is still the best choice for their business.

The Non-relational Tradeoff

The excitement that NoSQL products brought to the IT industry has been met with an equal amount of skepticism. Critics point out that, depending on what you need your application to do, a non-relational database can be ill-advised. While NoSQL databases are not all alike, there are certain tradeoffs common to them all.

Data integrity—In order to achieve high performance despite massive size, non-relational database systems compromise data correctness guarantees. The traditional rules about writing data are loosened, making it far more likely that data can be lost or overwritten. Thus the best applications for a non-relational approach are those that have low-to-medium requirements for data integrity, for example, social media applications. Any application whose data integrity requirements are absolute requires a relational database; NoSQL is a non-starter.

Flexible indexing—Relational databases are very good at letting users query data from multiple perspectives. Joins and indexes are not weaknesses of relational databases, they are strengths. To achieve speed and scale, NoSQL technology relies on assumptions about how data will need to be viewed; it can be extremely difficult or even impossible to achieve an alternate view of data.

Interactive updating of data—Many NoSQL solutions are  designed for bulk updates and quick reads. They are not optimized for applications requiring fine-grain updates and rapid saves.

Concurrency guarantees—When many people are accessing a database it can be important to define and guarantee when and how updates are revealed to concurrent users. NoSQL generally provides no guarantees for the propagation of updates.

The ISV Dilemma

For many companies the decisions to go relational vs. non-relational will be fairly obvious. Sadly, for ISVs the decision can be a bit murky. Social networking, an example of where NoSQL is an obvious fit, is not the bread and butter of most ISVs. Yet scale is still a top-level priority, as most ISV applications are intended to be sold to hundreds of enterprise and small business customers; across the whole ecosystem of customers, the data volume can grow large.

Thus, ISVs find themselves in the middle: Needing data integrity and flexibility yet desperately desirous of a simple solution for achieving very large scale. A small risk of loss of data integrity can seem like a small price to pay. But the choice to use a single, “multi-tenant” NoSQL database may be a far bigger compromise than it seems.

What makes ISVs unique is that, while they are building applications that will ultimately be used by many thousands of users, those users are segregated by customer. Unlike Facebook, which is designed to let any user interact with any other user across a community of millions, ISV applications are used within companies, where the employees of Customer A will never need to read or write the data from Customer B. Thus, for any given instance of the application, the size of the data store should never reach the levels where NoSQL shows its strengths.

Why Isolate Data?

While ISVs can avoid the challenge of scale by isolating customer data, it is far from the only reason to favor that architecture. In addition to the technical issues, there are a number of excellent business reasons why NoSQL, with its multi-tenant database architecture, can be undesirable for ISVs.

Security—Not every organization is willing or able to accept the risk of letting its data reside in a shared repository. Indeed, for many enterprises, finding out that one solution eradicates the risk of a competitive data leak can be the selling point that wins the account.

Governance—Industry laws can limit organizations’ choice in terms of data storage. Many laws stipulate rules about the physical location of data and potential security breaches. For the organizations who are forced to comply with such laws—and there are many—a NoSQL solution would not suffice.

Customization—ISVs know better than anyone: Companies will inevitably find ways to optimize any software solution to better serve their specific business or workflows. Learning that your solution is difficult to customize due to its database architecture will win neither appreciation nor loyalty.

Best of Both Worlds

Of course, from a management perspective, bundling a standalone relational database within each customer’s application can seem daunting. Managing multiple databases is a challenging task. ISVs don’t have the benefit of installing a DBA at every customer. And smaller customers often will not have a DBA to do it for them.

But thanks to modern advances in relational database technology, management of many databases doesn’t have to be harrowing. Today’s RDBMS marketplace offers self-management and self-tuning features to help ease these challenges, providing assurance that ISVs can give customers exactly the database architecture they need. Automated, self-tuning databases offer ISVs the best of both worlds: a solution in which neither scale, management nor data reliability must be compromised.

For ISVs, the benefits of data isolation and data integrity are key selling points. By choosing a leading relational database system with features that complement the ISV application model, companies won’t have to scale their applications at the expense of scaling future revenue.

An Enigma cipher machine in an Excel spreadsheet, made by a non-programmer : programming

I just love Visual Basic, and the Excel / Access paradigm.  I just wish that Excel had basic RDBMS support.

This is interesting to me.. but it seems REALLY REALLY sad that this actually takes -any- time at all to execute.. I mean, how complex can the algorithm really be?

Maybe I’ll re-create this in .NET to see how it should perform?

I’m just excited.. I finally found the source code for this..

Honestly, I’m going to try to turn this into SQL this weekend 🙂

Download the spreadsheet here


True Cost of Indexes

March Madness – SQL Azure – sys.dm_db_partition_stats – Index Cost

Handing your money over to SQL Azure.Yesterday we looked at how you could determine the cost of a single row stored in SQL Azure. Part of my focus for doing so was to help you understand that a bad database design can lead to higher costs. Creating unnecessarily wide tables will result in more storage which then results in a higher cost. You will be able to provide a rough idea of just how much that 349 column customer table is costing your company.

Yesterday we focused on the size of a row on a page. For indexes there is a size limit: 900 bytes is all you get (this is for both SQL Azure and for on-premise versions of SQL Server). At first glance you may be thinking to yourself “self, why would I care about something that would be so small? And that is a fair question, so let me explain something to you about database design that I have seen repeatedly in my career as a production DBA:

People tend to create WAY more indexes than they need.

Creating indexes to solve performance problems is like crack: the first one is free but the last one is the one that causes your heart to come to a stop.

How many times have you come across tables that are over-indexed? How about tables with an index for every column (you know, just in case we need them). I’ve seen it throughout my days as a production DBA and today I am here to help show you how to find those costs in SQL Azure.

The code we used yesterday gave us the starting point. We will take that piece of code and instead of pointing it at the sys.objects view we will instead point at the sys.indexes. We can also filter for just the non-clustered indexes, since every table in SQL Azure has a clustered index defined we already know the costs for them (we did that yesterday). This code will show me the size of my non-clustered indexes:

SELECT OBJECT_NAME(sys.indexes.object_id)AS [TableName], AS [IndexName],
SUM(reserved_page_count) * 8.0 / 1024 as [SizeinMB]
FROM sys.dm_db_partition_stats
INNER JOIN sys.indexes
ON sys.dm_db_partition_stats.object_id = sys.indexes.object_id
AND sys.dm_db_partition_stats.index_id = sys.indexes.index_id
GROUP BY sys.indexes.object_id,

I can then take this information and break down the cost for my indexes, similar to what we did yesterday. I will use the same calculated monthly cost so if you need to know how I got that number you should go and read what I did yesterday. If I remove the filter for non-clustered indexes then we can see a summary of all objects:

SELECT OBJECT_NAME(sys.indexes.object_id)AS [TableName], AS [IndexName],
SUM(reserved_page_count) * 8.0 / 1024 as [SizeinMB],
row_count [Row Count],
(CASE row_count WHEN 0 THEN 0 ELSE
(sum(reserved_page_count) * 8192)/ row_count END) [Bytes Per Row],
(CASE row_count WHEN 0 THEN 0 ELSE
((sum(reserved_page_count) * 8192)/ row_count)
* (0.00000257749) END) [Monthly Cost Per Row]
FROM sys.dm_db_partition_stats
INNER JOIN sys.indexes
ON sys.dm_db_partition_stats.object_id = sys.indexes.object_id
AND sys.dm_db_partition_stats.index_id = sys.indexes.index_id
GROUP BY sys.indexes.object_id,, row_count
ORDER BY [Bytes Per Row] DESC

You could rewrite the code and sum up the amount of data based upon the tables in order to get an idea of the true cost for a table in SQL Azure. Or you could decided to break out the costs per day. You have a lot of options here, SQL Azure makes the information available for you to analyze.

The reason I wanted to show you all this? Mostly it has to do with helping to improve database design, same as yesterday. If you have ever come across a database that has WAY too many indexes you are going to be able to put a price tag on what those extra indexes would cost. Usually you aren’t able to attach a cost, and cost is also the biggest motivating factor for how and why things get done (or not get done).

mySql linked server

I think that Microsoft SQL Server is the best database in the world.

Sometimes, it is necessary to work with inferior 3rd party products.

Here is a walkthrough on setting up a Linked Server on mySQL. In other words- you can use this data as if it were located in God’s database- SQL Server.

In this article we going to look into creating a heterogeneous linked server. i.e. We are going to create a linked server to MYSQL database server from SQL Server environment and then going to query datas from MYSQL. First step in heterogeneous linked server is to create a ODBC connection to that RDBMS platform. For creating ODBC connection we need the ODBC drivers for that RDBMS platform. In this we need drivers for MYSQL, this is not included in windows you need to download it from MYSQL Website. Lets discuss on this step by step

Testing Environment

Windows Server 2008
SQL Server 2008
MySQL Version 5.0.51
MySQL ODBC Driver 5.1
MySQL DBName : sqlarticles

Implementation Steps

  • Download MySQL ODBC drivers from the link Get MySQL ODBC Driver
  • Install the MySQL drivers from the downloaded setup.
  • Once installation completed go to Run> Type odbcad32 and press enter
  • In the ODBC admin window go to System tab as shown below


  • When you click on Add button a list of ODBC drivers available in the system will be shown, select MySQL ODBC XX.X Driver (this will not be displayed if you havent installed MySQL Drivers, where XX is the version number) and then click finish button


  • Now you will be shown with the ODBC driver configuration, all the fields in the window are self explanatory hence fill all the details required and click on Test button to test the ODBC connection as show below and click on Ok to close it.



  • Now you have created the ODBC connection successfully, the next step will be to create a new linked server in SQL Server.
  • Open Management studio and connect to the SQL Server Instance
  • Expand Server Objects in object explorer and right click on Linked Servers folder and then click on New Linked server in the menu as shown below


  • New Linked server window will be popup, fill all the details required. Provide the name for the Linked server, Select Other Data Source Option button, Select Provider as Microsoft OLE DB Provider for ODBC Drivers and provide the ODBC name in the field Data Source.


  • Now click on the Server Options in the left pane to configure it as per your requirement. Ive configured it as below, once this done click on OK button to create the linked server.


  • Now we have created the linked server successfully however to retrieve the datas correctly from MySQL we need to configure the OLE DB provider in SQL Server correctly (you can also modify this with respect to your requirement).
  • Expand the folder Providers under Linked server directory in SSMS and go to properties of MSDASQL provider as shown below and configure it.


If the above provider is not configured properly you will thrown any of the error message below when querying through linked server hence dont forget to configure the provider properly.

OLE DB provider “MSDASQL” for linked server “MYSQL” returned message “[Microsoft][ODBC Driver Manager] Data source name not found and no default driver specified”.
Msg 7303, Level 16, State 1, Line 1
Cannot initialize the data source object of OLE DB provider “MSDASQL” for linked server “MYSQL”.

Msg 7313, Level 16, State 1, Line 1
An invalid schema or catalog was specified for the provider “MSDASQL” for linked server “MYSQL”.

Querying Data from MySQL Linked Server

Connect to SSMS and open a new query window and type the query to retrieve datas from MySQL linked server.



In the query above you cant change the database name when you use four part identifier format since the database name is already selected while configuring the ODBC. To select datas from different database using the same linked server can be accomplished by using OPENQUERY command. You can see from the below example that Im selecting datas from Information_Schema db and also from sqlarticles db using the same MySQL linked server

--Below is the catalog defined in ODBC
(MYSQL, 'SELECT * FROM sqlarticles.article')
--Using different catalog in the query


SQL Server 2012 introduces 14 new built-in functions.

14 New Functions and 1 Changed Function

SQL Server 2012 introduces 14 new built-in functions. These functions ease the path of migration for information workers by emulating functionality that is found in the expression languages of many desktop applications. However these functions will also be useful to experienced users of SQL Server.

The new functions are:

Conversion functions

Date and time functions

Logical functions

String functions