The Hidden Cost of Scaling with NoSQL
Today’s applications are big. With hundreds of thousands of users, many of them uploading content regularly, a data store can grow enormous in short order. Not surprisingly, scale is a top concern for data architects.
At companies with global internet reach, such as Google and Facebook, NoSQL (a newer breed of highly scalable, non-relational data stores) has sometimes been chosen over relational database technology. Its use in such high-profile applications has proved to be excellent advertising for this new database paradigm and has caught the attention of many application architects.
The temptation to replace a traditional relational database system with one designed for petabyte scale is understandable. Who wouldn’t want to scale to petabytes of data without the threat of performance degradation? But relational database technology has not stood still in recent years. Given the tradeoffs of the NoSQL architecture and major advances in the scale and manageability of relational databases, many companies, especially independent software vendors (ISVs), will find that a relational database is still the best choice for their business.
The Non-relational Tradeoff
The excitement that NoSQL products brought to the IT industry has been met with an equal amount of skepticism. Critics point out that, depending on what your application needs to do, a non-relational database can be ill-advised. Although NoSQL databases are not all alike, most share certain tradeoffs.
Data integrity—To achieve high performance at massive size, many non-relational systems relax the transactional (ACID) guarantees that relational databases provide. The traditional rules about writing data are loosened, making it more likely that data will be lost or silently overwritten. The best candidates for a non-relational approach are therefore applications with low-to-medium data integrity requirements, such as social media. Any application whose data integrity requirements are absolute needs a relational database; for it, NoSQL is a non-starter.
Flexible indexing—Relational databases are very good at letting users query data from multiple perspectives. Joins and indexes are not weaknesses of relational databases; they are strengths. To achieve speed and scale, NoSQL data is typically modeled around assumptions, made in advance, about how it will be queried. Getting an alternate view of the data later can be extremely difficult or even impossible.
Interactive updating of data—Many NoSQL solutions are designed for bulk updates and fast reads. They are not optimized for applications that require fine-grained updates and rapid saves.
Concurrency guarantees—When many people are accessing a database, it can be important to define and guarantee when and how updates become visible to concurrent users. Many NoSQL systems offer only eventual consistency, with weak or no guarantees about when an update will propagate to other readers.
The ISV Dilemma
For many companies, the decision between relational and non-relational will be fairly obvious. For ISVs, however, it can be murky. Social networking, where NoSQL is an obvious fit, is not the bread and butter of most ISVs. Yet scale is still a top priority, because most ISV applications are sold to hundreds of enterprise and small business customers, and across that whole ecosystem the data volume can grow large.
ISVs therefore find themselves in the middle: they need data integrity and flexibility, yet they badly want a simple path to very large scale. A small risk to data integrity can seem a small price to pay. But choosing a single, “multi-tenant” NoSQL database may be a far bigger compromise than it appears.
What makes ISVs unique is that, while their applications will ultimately be used by many thousands of users, those users are segregated by customer. Unlike Facebook, which is designed to let any user interact with any other user across a community of millions, ISV applications are used within companies, where the employees of Customer A never need to read or write Customer B’s data. As a result, the data store behind any single customer’s instance of the application is unlikely ever to reach the size at which NoSQL shows its strengths.
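The per-customer isolation described above can be illustrated with a short sketch. It is a minimal Python example under stated assumptions, not a production design: the `TenantRouter` class, the `invoices` table, and the tenant names are all hypothetical, and Python’s built-in `sqlite3` module stands in for a full relational database. Each customer gets its own database, so a query issued on behalf of Customer A has no way to reach Customer B’s rows.

```python
import sqlite3


class TenantRouter:
    """Hypothetical helper: routes each customer (tenant) to its own database."""

    def __init__(self) -> None:
        self._connections: dict[str, sqlite3.Connection] = {}

    def connection_for(self, tenant_id: str) -> sqlite3.Connection:
        # Lazily create one isolated database per tenant. Each ":memory:"
        # connection is a separate database; a real deployment would use a
        # separate file or server per tenant instead (assumption).
        if tenant_id not in self._connections:
            conn = sqlite3.connect(":memory:")
            conn.execute(
                "CREATE TABLE invoices (id INTEGER PRIMARY KEY, amount REAL NOT NULL)"
            )
            self._connections[tenant_id] = conn
        return self._connections[tenant_id]

    def add_invoice(self, tenant_id: str, amount: float) -> None:
        conn = self.connection_for(tenant_id)
        # The connection context manager commits on success and rolls back
        # on an exception, so each write is transactional within its tenant.
        with conn:
            conn.execute("INSERT INTO invoices (amount) VALUES (?)", (amount,))

    def invoice_total(self, tenant_id: str) -> float:
        conn = self.connection_for(tenant_id)
        # This query can only see the calling tenant's own database.
        row = conn.execute("SELECT COALESCE(SUM(amount), 0) FROM invoices").fetchone()
        return row[0]


# Usage: two customers, each with fully separate data.
router = TenantRouter()
router.add_invoice("customer_a", 100.0)
router.add_invoice("customer_a", 50.0)
router.add_invoice("customer_b", 75.0)
print(router.invoice_total("customer_a"))  # 150.0
print(router.invoice_total("customer_b"))  # 75.0
```

Because each database holds only one customer’s data, it stays only as large as that customer, and the relational engine’s transactional guarantees still apply in full within each tenant.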
Why Isolate Data?
Isolating customer data lets ISVs sidestep the challenge of scale, but that is far from the only reason to favor this architecture. Beyond the technical issues, there are several sound business reasons why a single, multi-tenant NoSQL database can be undesirable for ISVs.
Security—Not every organization is willing or able to accept the risk of storing its data in a shared repository. Indeed, for many enterprises, learning that a solution removes the risk of their data leaking to a competitor through a shared store can be the selling point that wins the account.
Governance—Industry regulations can limit organizations’ choices for data storage. Many regulations dictate where data may physically reside and how potential security breaches must be handled. For organizations that must comply with such rules (and there are many), a shared, multi-tenant NoSQL store may not suffice.
Customization—ISVs know better than anyone that customers will inevitably find ways to tailor any software solution to their specific business and workflows. Learning that your solution is difficult to customize because of its database architecture will win neither appreciation nor loyalty.
Best of Both Worlds
Of course, from a management perspective, bundling a standalone relational database with each customer’s application can seem daunting. Managing many databases is challenging. ISVs cannot place a DBA at every customer site, and smaller customers often have no DBA of their own.
But thanks to modern advances in relational database technology, managing many databases doesn’t have to be harrowing. Many of today’s RDBMS products offer self-management and self-tuning features that ease these challenges, so ISVs can give customers exactly the database architecture they need. Automated, self-tuning databases offer ISVs the best of both worlds: a solution that compromises on none of scale, manageability, or data reliability.
For ISVs, data isolation and data integrity are key selling points. By choosing a leading relational database system with features that complement the ISV application model, ISVs won’t have to scale their applications at the expense of scaling future revenue.