Here’s a question we hear a lot – Should I use SQL databases or NoSQL databases? It’s a question that gets asked because often underlying it is another question – What’s broken in SQL databases that NoSQL databases fixes? The answer to that one is much easier. Nothing is broken because they are different approaches to creating databases in the same way that assembler and higher level languages are to creating applications.
Think of a typical high level language. It abstracts away all the ideas of machine code – of scheduling, memory management, interrupts, processor stacks and and buffers – into a different intellectual framework that is the language. You write a program in the language and a compiler or interpreter steps in and turns your code into digestible chunks of machine code (or intermediate code) to be run on some actual hardware. You don’t care about that though, all you care is that your code can go into any machine and right things happen.
You can think of this as akin to SQL; you write your high level query which is generally portable between different SQL databases and the database’s internal compiler or interpreter turns it into executable operations which it can then run to give the results you are expecting. There’s a whole query engine in your database that looks for the optimal way to turn your SQL query into the optimal set of operations to get your results. You usually only care about what it’s doing when your queries aren’t running as fast as you’d hope, in the same way that you only care about your compiler when it generates slow code for your application.
Now think of assembler. Assembler is unique to the processor family it runs on. These are the smallest operations the processor will let you program it with and they all run as fast as the processor can. They do exactly what they say and no more. High level language compilers convert programs into assembler (eventually) so they can be run, but writing in bare assembler can be even more efficient as long as you can take into account all the internal "moving parts" of the processor. The downside is that you can’t move your assembler code to a different processor family.
And now think of NoSQL like that. The query engine and low level operations of a database exposed through an API to give you a more intimate control of your database operations. For databases that’s something like find a record by a key, update a record with that key, construct a query from a chain of operands. These small operations can be combined by applications to create powerful applications.
NoSQL emerged in a world of SQL not to replace it but to allow people to experiment with new ways of working with databases and optimising databases to particular tasks. The same deal with assembler applies with NoSQL; you get direct control of the underlying system, you have to worry about managing that system a lot more – selecting indexes, creating reliable operations which don’t crash into each other, making sure you aren’t locking out other operations – these are things you will, at any scale, have to think about at some point. The good news is that NoSQL databases have matured so the underlying mechanisms are more resilient and reliable to these issues. NoSQL databases have also focused on particular data types or arrangements – JSON document, columnar storage, graphs – and on different architectures – in-memory, sharded, distributed, replicated – to create databases which are very powerful for particular use cases.
SQL is a language of general purpose utility. It sets out with a relational, table centric structure and you rely on the database to make optimal decisions in interpreting your intent and coming up with the best path to get your results. Because of that SQL also shaped how the underlying databases operated and how they developed over time.
To jump ship to another analogy temporarily, NoSQL is like RISC processors were to the CISC processors in the 80s and 90s. RISC processors gave chip designers a whole new way to approach problems of scale and moved the task of building optimised instruction pipelines up to the compilers used to create code for the RISC chips. Some even went as far as turning CISC instructions into RISC instructions on the fly. The two approaches often found themselves facing off over performance. Where are we now? The lessons learnt from RISC processors are embedded in CISC designs while a new class of more complex RISC chip is to be found optimized for power consumption in a billion devices – the biggest niche ever.
Here’s the cool part. Those billion RISC devices interoperate with all the CISC and other RISC devices out there over the internet and through the millions of servers in the cloud. It’s not an either/or choice. It’s a best-for-the-task selections. When you go out and buy a computer in 2016, you pick it for suitability for a task, not whether it has a RISC or CISC design philosophy at the heart of its CPU. In the same way, when picking a database, or databases, for a task you should select for suitability for that task.
Which brings us back to the assembler/higher level language analogy in this analogy inception. What this analogy offers us is a simple rule of thumb for thinking about how SQL and NoSQL impact on that decision of suitability. A NoSQL database will tend to be optimized for a class of problems and it’s important to understand what those problems are. SQL can always be, at least in theory, compiled into the operations of a NoSQL database and there are tools out there which will do this for you. You’ll usually find them in the Business Intelligence and Analytics aisle. Some NoSQL databases are internalising the same ideas to offer subsets of SQL too, raising the bar on NoSQL’s assembler-ness in this analogy to something closer to the capabilities of a high level language.
Opt for NoSQL and you get handed the keys to the database, along with a specially selected set of components and the freedom to assemble them how you wish. Opt for SQL, you’ll get access to an often feature rich semi-autonomous car which will take you from A to B efficiently every day. As a developer you’d never say "I’ll just use assembler for all my apps" or "I’ll use only this high level languages"; you would keep all options open.
The best solution? Opt for whatever is best for your task; not just one but as many as you need. If your application stack needs an in-memory database or messaging bus binding together applications using a document database for client facing applications, a database for backend analytics and a JSON document search database, then thats the architecture you should go for.
That and the ability to deploy production grade versions of all those databases whenever you need them.