Why do large social network projects switch to Cassandra instead of MySQL?
Most social networks (like Facebook) have stopped using MySQL as their main database and switched to Cassandra or another NoSQL database. We can consider this change a big boost for this new open-source data store, Cassandra, which was originally developed by Facebook to solve the inbox search problem and to be fast, reliable, and able to handle read and write requests at the same time.
So what is wrong with MySQL? Good question. For fast-growing projects, it becomes very difficult to build a high-performance, write-intensive application on a data set that is growing quickly, with no end in sight. These demands eliminate most of the value of a relational database, while still incurring all of its overhead.
Thus, many projects that deal with semi-real-time processing have opted for a new kind of data store, such as Cassandra, CouchDB, or MongoDB. For example, messaging at Facebook is so heavily used that it requires a system that can not only store data but also return search results at very high speed.
Stu Hood, the technical lead for the search team in the Email and Apps division of Rackspace, said:
"I think that distributed databases solve a problem that a lot of companies with large datasets have had to solve independently in the past. Cassandra has an approach that hybridizes the Bigtable and Dynamo models, where a lot of its competitors chose to take one path or the other. Over the Bigtable clones, Cassandra has huge high-availability advantages, and no single point of failure (possible because of the eventually consistent approach). When compared to the Dynamo adherents, Cassandra has the advantage of a more advanced datamodel, allowing for a single “row” to contain billions of column/value pairs: enough to fill a machine. You also get efficient range queries for the top level key, and even within your values."
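To make the data model in the quote a little more concrete, one way to picture a single wide row is as a map of column names to values that is kept sorted by column name, which is what makes range queries inside a row cheap. The sketch below is plain Python and not Cassandra's API; `WideRow`, `put`, `slice`, and the inbox data are hypothetical names invented for illustration.

```python
import bisect


class WideRow:
    """Toy model of one wide row: columns kept sorted by name."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def put(self, name, value):
        # Insert the name in sorted position only if the column is new.
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in order."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, end)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Hypothetical inbox row: column names are timestamps, values are message ids.
inbox = WideRow()
inbox.put("2012-03-20T19:42", "msg-1")
inbox.put("2012-04-10T16:11", "msg-2")
inbox.put("2012-08-07T07:07", "msg-3")
march_to_april = inbox.slice("2012-03-01", "2012-04-30")
```

Because the column names are sorted, the slice is two binary searches plus a contiguous scan, rather than a scan over every column in the row.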
In summary, Cassandra has the following characteristics:
- Supports high availability.
- Eventual Consistency (Trade off strong consistency in favor of high availability).
- Incremental scalability.
- Optimistic Replication.
- "Knobs" to tune tradeoffs between consistency, durability and latency.
- Low total cost of ownership.
- Minimal Administration.
- Querying by column or by range of keys.
- Writes are much faster than reads.
- Map/reduce is possible with Apache Hadoop.
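The "knobs" item in the list above is easiest to see with the quorum rule used by Dynamo-style stores: with N replicas, a write acknowledged by W of them and a read that consults R of them are guaranteed to share at least one replica when R + W > N. The following is a minimal sketch in plain Python, assuming that this is the trade-off the list refers to; the function names are hypothetical and nothing here calls Cassandra itself.

```python
from itertools import combinations


def quorums_overlap(n, w, r):
    """True if every set of w replicas shares a member with every set of r replicas."""
    replicas = range(n)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


def is_strongly_consistent(n, w, r):
    """The usual rule of thumb: reads see the latest write when r + w > n."""
    return r + w > n


# Illustrative settings for a replication factor of 3:
# (1, 1) favors latency, (2, 2) is a quorum on both sides, (3, 1) favors cheap reads.
for w, r in [(1, 1), (2, 2), (3, 1)]:
    assert quorums_overlap(3, w, r) == is_strongly_consistent(3, w, r)
```

Lowering W or R reduces latency and keeps the system available when replicas are down, at the cost of possibly reading stale data; raising them does the opposite.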
Simon Nadhem said: NoSQL use cases at 2012-03-20 19:42:07
Also you can consider the following as NoSQL use cases:
1- Massive data volumes:
- Massively distributed architecture required to store data
- Google, Amazon, Yahoo, Facebook: 10 - 100K servers
2- Extreme query workload:
- Impossible to efficiently do joins at that scale with an RDBMS
3- Schema evolution:
- Schema flexibility (migration) is not trivial at large scale
- Schema changes can be gradually introduced with NoSQL
Alex Miller said: NoSql stands for at 2012-03-20 19:45:12
By the way, I heard my friend saying "non-SQL" :D. NoSQL stands for "Not Only SQL".
Ken North said: at 2012-03-20 20:05:21
MySQL and InnoDB remain in use at Facebook. Here are recent posts to the "MySQL at Facebook" blog. http://on.fb.me/GEX1mM NoSQL
Boring Useless Guy said: nice fiction.. at 2012-04-10 16:11:36
Really nice story, though it fails to mention that Twitter dropped its plan to use Cassandra after initial tests and now uses it only for analytics, and has no plans to move tweets from MySQL to Cassandra or, for that matter, to any other SQL or NoSQL solution. Facebook used Cassandra for the inbox only; the rest was and still is MySQL, and even the messaging system has moved from Cassandra to HBase. So the only site to switch is Digg. Or is there anybody else making these so-called switches?
Alaa Alomari said: Re: Boring Useless Guy at 2012-04-10 17:01:47
Twitter started as a “content management platform not a messaging platform” so many optimizations were needed to change the initial model based on aggregated reads to the current messaging model where all users need to be updated with the latest tweets. The changes were done in three areas: cache, Message Queue and Memcached client.
For relationships between users, Twitter uses FlockDB (not MySQL).
Wobblemyhead said: Are you sure!!! at 2012-08-07 07:07:58
Twitter does not use Cassandra for its main DB. They went back to MySQL in 2010 and use Cassandra only for small new projects. Didn't you check the source?
Facebook does not use Cassandra, except for Inbox Search. But when the new version of Messages came out in 2010, they replaced Cassandra with HBase.
Please do some research... god
Alaa Alomari said: nobody said that twitter is using Cassandra at 2012-08-07 07:21:24
@Wobblemyhead: nobody said that Twitter is using Cassandra. I said that Twitter is using FlockDB for managing relations between users. As for Facebook, they used Cassandra from 2008 to 2010, then switched to HBase.