Feb 19, 2012

Scalability Day 1 - NoSQL - Learning From Others

With rapidly growing performance and availability requirements, contemporary web enterprises are having difficulty accommodating a huge user base and everything that comes with it. One way to address the issues of a relational database management system (RDBMS) is to abandon the relational data model in favor of the emerging non-relational paradigm, generally referred to in the industry as the NoSQL model. NoSQL is the industry’s response to the limitations inherent in the relational DNA.

The largest players set out to build their own solutions to their data management challenges. Often these solutions evolved into standalone products, which eventually became industry standards (BigTable, Dynamo, FlockDB, Cassandra, etc.). Not everyone was prepared to deal with explosive growth.
A great example of this is Twitter. Granted, Twitter’s original version was intended as an internal tool for Odeo, and it was not expected to grow as big as it did. To this day, Twitter periodically struggles with the wealth of traffic and data that crosses its cables.

As the Resultly idea began to take shape, it became apparent to us that over time we would need to deal with large volumes of data, data collection, and users. In our opinion, Twitter and many other startups we looked at handled this poorly. We understood that many of these problems lie in the design of the original Twitter structure and its reliance on a traditional relational database. Relational databases are inherently difficult to scale as a service or its data grows.

As we mentioned in our previous blog posts, we expect Resultly to grow rapidly. There is no doubt that there will be many pitfalls associated with our growth, but we can avoid some traps by learning from others’ mistakes. The nature of our application imposes severe data throughput requirements. Service availability will be crucial to our success, especially during the initial user acquisition phase. We learned Twitter’s lesson well, and we wanted to make the right choices from the start. It became obvious that NoSQL is the appropriate data management framework for our company.

Among the several contenders we reviewed, we chose Apache Cassandra. It is a free, open source data management system with many of the characteristics that Resultly expects to rely on. Resultly is written in C# on .NET, and the availability of a .NET client for Cassandra (Aquiles) was the deciding factor.

Similar to many other NoSQL solutions, Cassandra is built from the ground up to handle massive amounts of data efficiently. Cassandra is redundant, ensuring a high level of service availability with no single point of failure. Data distribution and replication among servers and clusters are handled automatically by the cluster. It is highly scalable: adding new servers takes a few commands and requires no modification to the application code.
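
To make the idea of automatic distribution more concrete, here is a minimal sketch of a token ring with replication, written in Python for brevity. It is a toy model and not Cassandra's actual implementation: the `Ring` class, the `token` function, the node names, and the use of MD5 are all our own assumptions for illustration, and Cassandra's real partitioners and replication strategies are considerably more involved.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a position on the ring (MD5 is used only for illustration)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Toy token ring: each node owns one token; a key is stored on the
    next `replication_factor` nodes found clockwise from the key's token."""

    def __init__(self, nodes, replication_factor=2):
        self.replication_factor = replication_factor
        self._tokens = []   # sorted list of node tokens
        self._owners = {}   # node token -> node name
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        """Place a new node on the ring; no key has to be re-hashed."""
        t = token(node)
        bisect.insort(self._tokens, t)
        self._owners[t] = node

    def replicas(self, key: str):
        """Return the names of the nodes that hold copies of `key`."""
        start = bisect.bisect_right(self._tokens, token(key))
        count = min(self.replication_factor, len(self._tokens))
        return [
            self._owners[self._tokens[(start + i) % len(self._tokens)]]
            for i in range(count)
        ]


if __name__ == "__main__":
    ring = Ring(["node-a", "node-b", "node-c"], replication_factor=2)
    print(ring.replicas("user:42"))
    ring.add_node("node-d")  # hypothetical new server joining the cluster
    print(ring.replicas("user:42"))
```

In this sketch, adding a node only affects the keys whose tokens sit next to the new node on the ring. Every other key stays where it was, and the application code that calls `replicas` never changes.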

The schema flexibility of NoSQL systems is, by far, the most important feature to us. The NoSQL approach does not impose a rigid, predefined data structure; in agile applications the structure may, and should, change dynamically. Changes to the data schema (known as refactoring) are expensive and should be avoided in the relational model. However, this becomes a non-issue in NoSQL systems, since there is no schema, at least in the relational sense.
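
As a rough illustration of what "no schema in the relational sense" means, the following Python sketch models a column family as a mapping from row keys to free-form sets of columns. The `ColumnFamily` class and the row and column names are hypothetical, made up for this example; this is not the Cassandra or Aquiles API.

```python
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> {column name: value}, with no fixed column list."""

    def __init__(self, name):
        self.name = name
        self._rows = defaultdict(dict)

    def insert(self, row_key, columns):
        """Add or overwrite columns on one row; other rows are untouched."""
        self._rows[row_key].update(columns)

    def get(self, row_key, column=None, default=None):
        """Return one column of a row, or a copy of the whole row if no column is named."""
        row = self._rows.get(row_key, {})
        if column is None:
            return dict(row)
        return row.get(column, default)


# Hypothetical data: two rows in the same column family with different columns.
users = ColumnFamily("users")
users.insert("alice", {"email": "alice@example.com", "city": "Chicago"})
users.insert("bob", {"email": "bob@example.com"})

# A new column appears on one row only; no migration and no change to other rows.
users.insert("bob", {"last_login": "2012-02-19"})
```

Each row carries only the columns it needs, so introducing `last_login` for one user requires nothing like an `ALTER TABLE` step. The trade-off is that the application itself must cope with rows that lack a given column.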

In place of an epilogue.
Resultly’s personal experience with Cassandra was an adventure and an intense brain workout. While learning about Cassandra and the realm of NoSQL in general, we encountered quite a few warnings on the interwebs about the difficulties one would face when switching from SQL-based thinking about data organization to the SQL-free approach. We had to learn from scratch, and the learning curve, we attest, is definitely steep. Despite a few shortcomings (paging, sorting, etc.), however, the benefits NoSQL offers are worth the challenge.
At the end of the day, we realized that Cassandra’s cold gaze is pointed not at the software developers trying to get acquainted with her, but rather at the emerging chaos in improperly organized information.