Apache Cassandra

Apache Cassandra is a scalable, distributed, partitioned row store optimized for fast writes. In CAP theorem terms it is an AP database with tunable consistency: it favors availability and partition tolerance, and you choose per operation how much consistency to trade for availability and latency by setting how many replicas must acknowledge a read or a write. It is a masterless database: every node plays the same role, and a new node joining the cluster can pick up information from any existing node, so there is no single point of failure and the database can be scaled horizontally.
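
The consistency tuning can be made concrete by counting replicas. The following is a minimal sketch, assuming a replication factor and read/write consistency levels expressed as the number of replicas that must acknowledge a request. The function names are illustrative only and are not part of any Cassandra driver. A read is guaranteed to overlap with the latest acknowledged write when the read and write replica counts together exceed the replication factor.

```python
def quorum(replication_factor: int) -> int:
    """Replicas required for a quorum: a strict majority of all replicas."""
    return replication_factor // 2 + 1


def reads_see_latest_write(read_replicas: int, write_replicas: int,
                           replication_factor: int) -> bool:
    """True when every read set must overlap every write set (R + W > RF)."""
    return read_replicas + write_replicas > replication_factor


def tolerated_failures(required_replicas: int, replication_factor: int) -> int:
    """Replicas that may be down while the request can still succeed."""
    return replication_factor - required_replicas


rf = 3
q = quorum(rf)

# Quorum reads and quorum writes: consistent, and one replica may be down.
print(q, reads_see_latest_write(q, q, rf), tolerated_failures(q, rf))

# One replica for reads and writes: most available, but reads may be stale.
print(reads_see_latest_write(1, 1, rf), tolerated_failures(1, rf))
```

Lowering the required replica count raises availability and reduces latency. Raising it strengthens consistency but leaves fewer replicas that can be down before requests start to fail.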

To build an efficient data model, it is important to understand some of the implementation internals. Rows are grouped into partitions by their partition key, which determines which nodes store the data, so the right choice of this key is essential for performance. It also shapes the data model: unlike relational modeling, data is modeled for how it will be read, not for how it is written. In practice you start from the queries and design the tables to serve them. Data is denormalized: the same data can exist in multiple tables, each optimized for one query.
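
The query-first approach can be sketched without a running cluster. The example below is a simplified illustration, assuming a hypothetical playlist application with two queries. Plain Python dictionaries stand in for tables, each keyed by its partition key. None of the names come from Cassandra itself, and a real model would be expressed as table definitions instead.

```python
from collections import defaultdict

# One "table" per query; each maps a partition key to the rows in that partition.
songs_by_playlist = defaultdict(list)  # query 1: songs in a given playlist
playlists_by_song = defaultdict(list)  # query 2: playlists containing a given song


def add_song(playlist_id: str, position: int, song_id: str, title: str) -> None:
    """Write the same fact to every table that serves a query (denormalization)."""
    partition = songs_by_playlist[playlist_id]
    partition.append((position, song_id, title))
    partition.sort()  # rows inside a partition are kept ordered by position

    playlists = playlists_by_song[song_id]
    playlists.append(playlist_id)
    playlists.sort()


def songs_in(playlist_id: str) -> list:
    """Query 1 reads a single partition, no join required."""
    return [title for _, _, title in songs_by_playlist.get(playlist_id, [])]


def playlists_with(song_id: str) -> list:
    """Query 2 reads a single partition of the second table."""
    return list(playlists_by_song.get(song_id, []))


add_song("road-trip", 2, "s2", "Second Song")
add_song("road-trip", 1, "s1", "First Song")
add_song("gym", 1, "s1", "First Song")

print(songs_in("road-trip"))
print(playlists_with("s1"))
```

Each write costs more because it touches two tables, but each read is answered from a single partition. That is the trade a write-optimized store is designed to make.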

Because of its high throughput, high availability and straightforward scale-out, it is well suited for:

  • recommendation engines
  • messaging
  • fraud detection
  • product catalogs and playlists
  • Internet of Things and sensor data