NoSQL brings forth new technologies. Hadoop, MongoDB, Cassandra and many others have gained huge popularity.
Suddenly, it seems, everyone is doing big data and NoSQL these days. It was exactly what we were waiting for.
Now, there is no denying it: only a few companies have gathered so much data that a relational database can no longer cope.
Why use a relational database?
Before we discard relational databases because there is a new kid in town, we should at least know what is good about the previous, and clearly obsolete, technology.
- SQL is a widely known standard. Learn it once, use it many times.
- The ACID properties. Don’t know them? http://en.wikipedia.org/wiki/ACID (how are you a developer if you don’t know these?)
- Proper integrity constraints. Fight data that references junk. Relational databases do this for you.
- Type constraints. If your application tries to push “chicken soup recipe” into a number field, you’ll know.
- Indexing data. You have to admit, improving performance through indexes is pretty easy.
So in short, your database helps you out in a few ways.
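To make the integrity and type constraints above concrete, here is a minimal sketch using Python's built-in sqlite3 module. The company and invoice tables, their columns and the try_insert helper are all invented for illustration. One assumption worth flagging: SQLite is lenient about column types by default, so the sketch adds an explicit CHECK on typeof(). Stricter engines such as PostgreSQL reject the bad value based on the column type alone.

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.execute(
        "CREATE TABLE company (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        """
        CREATE TABLE invoice (
            id INTEGER PRIMARY KEY,
            -- integrity constraint: must point at an existing company
            company_id INTEGER NOT NULL REFERENCES company(id),
            -- type constraint: SQLite needs an explicit typeof() check
            amount_cents INTEGER NOT NULL
                CHECK (typeof(amount_cents) = 'integer')
        )
        """
    )
    conn.execute("INSERT INTO company (id, name) VALUES (1, 'Acme')")
    conn.commit()
    return conn


def try_insert(conn, company_id, amount_cents):
    """Attempt one insert and report whether the database accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO invoice (company_id, amount_cents) VALUES (?, ?)",
                (company_id, amount_cents),
            )
        return "ok"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"


if __name__ == "__main__":
    db = make_db()
    print(try_insert(db, 1, 1999))                   # valid row
    print(try_insert(db, 99, 1999))                  # references junk
    print(try_insert(db, 1, "chicken soup recipe"))  # wrong type
```

Only the first insert gets through. The other two are turned away by the database itself, without a single line of validation in the application.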
Not all is well in fairy land, however. The database’s validation of the entities you push into it is far from complete, so your application will need to do some validation as well.
Why use NoSQL alternatives?
Usually NoSQL databases can cope with huge amounts of data. You can use MapReduce. They are blazing fast.
The trick? Remove constraints and the schema definition altogether. Just push data in. This is demonstrated by the following image.
So the important thing is that your application (and not the database) has to ensure all data quality.
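What that looks like in practice can be sketched as follows. This is only an illustration under stated assumptions: OrderStore is a hypothetical in-memory stand-in for a schemaless collection (not the API of any real NoSQL client), and the customer_id and total fields are made up. The point is that every check the database no longer performs has to live somewhere in your own code.

```python
from numbers import Real


def validate_order(doc, known_customer_ids):
    """Return a list of problems; an empty list means the document is fine."""
    errors = []
    customer_id = doc.get("customer_id")
    # Hand-rolled replacement for a foreign key constraint.
    if not isinstance(customer_id, str) or customer_id not in known_customer_ids:
        errors.append("customer_id must reference an existing customer")
    total = doc.get("total")
    # Hand-rolled replacement for a column type (bool is excluded on purpose,
    # because Python treats True and False as numbers).
    if isinstance(total, bool) or not isinstance(total, Real) or total < 0:
        errors.append("total must be a non-negative number")
    return errors


class OrderStore:
    """Hypothetical stand-in for a schemaless collection.

    A real schemaless store would accept any dict; the validation happens
    here, in application code, before the document is written.
    """

    def __init__(self, known_customer_ids):
        self._known = set(known_customer_ids)
        self.docs = []

    def insert(self, doc):
        errors = validate_order(doc, self._known)
        if errors:
            raise ValueError("; ".join(errors))
        self.docs.append(dict(doc))


if __name__ == "__main__":
    store = OrderStore({"c1"})
    store.insert({"customer_id": "c1", "total": 12.5})
    try:
        store.insert({"customer_id": "c1", "total": "chicken soup recipe"})
    except ValueError as exc:
        print("rejected:", exc)
```

Forget to call the validation on one code path and the junk goes straight in, which is exactly the safety net you gave up.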
It should be mentioned however that the constraints on relational databases aren’t very strong. Data quality can seldom be guaranteed by database constraints alone. There are extremes here.
If the validation isn’t a full guarantee, then why bother? Strangely enough, constraints and simple typing have saved my neck multiple times. It’s just one less thing to worry about. It’s a safety net.
What is a huge amount of data?
Iff (http://en.wikipedia.org/wiki/If_and_only_if) your data volume is so big that it cannot be handled by conventional/relational databases, then it can be considered big data.
Why should I care about data quality?
If you store terabytes of data, your entire effort becomes useless if that data is unreliable. Being careless about the quality of your data can put you in a very nasty position. It isn’t uncommon for corrupt data to be so badly damaged that there is no way of using it.
Absurd situations in start-up land
There is a new trend out there. People start using these new technologies everywhere.
It turns out you can move really fast when you don’t have to care about a data schema.
There are 3 cases here:
1. If you are making a prototype, I’m completely with you! I mean, you don’t know what data is gonna be in there right?
2. If you are going to have terabytes of data in no time, yup, you probably made the right choice.
3. If you are using it because you want to move fast and sell a prototype then this may happen to you:
Sometimes somebody saves the day just before you reach rock bottom. But those heroes are uncommon and under-appreciated.
Don’t let the speed of development of a prototype mislead you: it is and always will be a prototype. Here are a few quotes from failed start-ups that couldn’t distinguish between a prototype and a product:
The customers are afraid to use the product. They have the feeling it will break any moment.
When the data got mingled between different companies, of which some were direct competitors, well that was the last straw.