Brewer (2000) and Gilbert and Lynch (2012) concluded that a distributed shared-data system can have at most two of three properties: consistency, availability, and partition-tolerance (the CAP theorem). Gilbert and Lynch (2012) describe these three as akin to the safety of the data, the liveness of the data, and the reliability of the data. Thus, giving up
- consistency creates a system that needs expirations, conflict resolution, and optimistic locking (Brewer, 2000). A lack of consistency means that there is a chance that the data or processes will not return the right response to a request (Gilbert & Lynch, 2012).
- availability creates a system that needs pessimistic locking and must make some partitions unavailable (Brewer, 2000). A lack of availability means that there is a chance that a request will not get a response (Gilbert & Lynch, 2012).
- partition-tolerance creates a system that needs two-phase commit and cache validation protocols (Brewer, 2000). A lack of partition-tolerance means that there is a chance that messages between servers, tasks, or threads are lost forever and never committed (Gilbert & Lynch, 2012).
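The trade-off between the first two bullets can be sketched in a few lines. This is a toy model, not any real database's behavior: the `Replica` class, its `mode` labels, and the `Unavailable` exception are all hypothetical names introduced for illustration. It assumes a replica that has been cut off from its peers and must either refuse to answer (giving up availability) or answer from local state that may be out of date (giving up consistency).

```python
from dataclasses import dataclass, field


class Unavailable(Exception):
    """Raised when a replica refuses to answer rather than risk a wrong answer."""


@dataclass
class Replica:
    mode: str                      # "CP" or "AP" (hypothetical labels)
    data: dict = field(default_factory=dict)
    partitioned: bool = False      # True when cut off from its peers

    def read(self, key):
        if self.partitioned and self.mode == "CP":
            # Gives up availability: no response instead of a possibly stale one.
            raise Unavailable(f"cannot confirm latest value of {key!r}")
        # Otherwise answer from local state; for an AP replica during a
        # partition this value may be stale (consistency is given up).
        return self.data.get(key)


def demo():
    cp = Replica("CP", {"x": 1})
    ap = Replica("AP", {"x": 1})
    # Assume a newer write to "x" landed on the far side of the partition,
    # so neither replica has seen it.
    cp.partitioned = ap.partitioned = True
    results = {"AP": ap.read("x")}
    try:
        cp.read("x")
        results["CP"] = "answered"
    except Unavailable:
        results["CP"] = "unavailable"
    return results
```

Calling `demo()` shows the AP replica returning its local (possibly stale) value while the CP replica returns no response at all, which mirrors the two failure modes described by Gilbert and Lynch (2012).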
Therefore, in a NoSQL distributed database system (DDBS), partition-tolerance should exist, and administrators must then choose between consistency and availability (Gilbert & Lynch, 2012; Sakr, 2014). If administrators focus on availability, they can still aim for weak consistency; if they focus on consistency, they are planning for a strongly consistent system. An availability focus means having access to the data even during downtimes (Sakr, 2014). However, providing high levels of availability can cost money. Per the web application Uptime.is:
| Availability level | Monthly downtime | Yearly downtime |
| --- | --- | --- |
| 99.9% | 43m 49.7s | 8h 45m 57.0s |
| 99.99% | 4m 23.0s | 52m 35.7s |
| 99.999% | 26.3s | 5m 15.6s |
| 99.9999% | 2.6s | 31.6s |
Achieving high levels of availability means building a set of fail-safe systems for fault tolerance.
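The allowed downtime for a given availability level is simple arithmetic: the unavailable fraction multiplied by the length of the period. The sketch below assumes a mean year of 365.2425 days and a month of one twelfth of that; this is an assumption chosen because it reproduces the monthly figures in the table above, not a statement of how Uptime.is computes them. The function names are illustrative.

```python
SECONDS_PER_YEAR = 365.2425 * 24 * 60 * 60   # mean Gregorian year (assumption)
SECONDS_PER_MONTH = SECONDS_PER_YEAR / 12    # average month (assumption)


def downtime_seconds(availability_percent, period_seconds):
    """Seconds of permitted downtime in a period at the given availability."""
    return (1 - availability_percent / 100) * period_seconds


def format_duration(seconds):
    """Render seconds as e.g. '43m 49.7s', rounded to a tenth of a second."""
    tenths = round(seconds * 10)             # round once, then split exactly
    hours, rem = divmod(tenths, 36000)
    minutes, rem = divmod(rem, 600)
    parts = []
    if hours:
        parts.append(f"{hours}h")
    if hours or minutes:
        parts.append(f"{minutes}m")
    parts.append(f"{rem / 10:.1f}s")
    return " ".join(parts)


if __name__ == "__main__":
    for level in (99.9, 99.99, 99.999, 99.9999):
        monthly = format_duration(downtime_seconds(level, SECONDS_PER_MONTH))
        yearly = format_duration(downtime_seconds(level, SECONDS_PER_YEAR))
        print(f"{level}% | {monthly} | {yearly}")
```

Under these assumptions, 99.9% availability works out to 43m 49.7s of downtime per month and 8h 45m 57.0s per year, and each additional nine divides the budget by ten.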
As noted earlier, consistency can be either strong or weak. Strong consistency ensures that all copies of the data are updated in real time, whereas weak consistency means that all copies of the data will eventually be updated (Connolly & Begg, 2014; Sakr, 2014). Thus, stronger consistency carries a higher resource cost than weaker consistency because of how quickly the data must be updated (Gilbert & Lynch, 2012). Consequently, choosing weaker consistency is where the overhead savings come from in a NoSQL DDBS.
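The difference between the two consistency levels can be illustrated with a toy in-memory store. This is a minimal sketch under simplifying assumptions (a single primary copy, no failures, no concurrency); `ReplicatedStore` and its methods are hypothetical names, not the API of any real NoSQL system. In strong mode every copy is updated before the write returns; in weak mode only the primary is updated and the other copies catch up when `sync()` runs.

```python
class ReplicatedStore:
    """Toy store with one primary copy (index 0) and follower copies."""

    def __init__(self, n_replicas, strong):
        self.replicas = [{} for _ in range(n_replicas)]
        self.strong = strong
        self.pending = []          # queued updates, used in weak mode only

    def write(self, key, value):
        self.replicas[0][key] = value
        if self.strong:
            # Strong: pay the cost now by updating every copy before returning.
            for follower in self.replicas[1:]:
                follower[key] = value
        else:
            # Weak: return immediately and propagate later.
            self.pending.append((key, value))

    def sync(self):
        """Apply queued updates to all followers (the 'eventually' step)."""
        for key, value in self.pending:
            for follower in self.replicas[1:]:
                follower[key] = value
        self.pending.clear()

    def read(self, key, replica=0):
        return self.replicas[replica].get(key)
```

With `strong=False`, a read from a follower between `write()` and `sync()` returns the old value (or `None`), which is the window of inconsistency that weak consistency accepts in exchange for cheaper writes.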
Finally, the table below lists some NoSQL databases that are either AP or CP systems (Hurst, 2010).

| Availability & Partition Tolerance NoSQL systems | Consistency & Partition Tolerance NoSQL systems |
| --- | --- |
| Dynamo, Voldemort, Tokyo Cabinet, KAI, Riak, CouchDB, SimpleDB, Cassandra | Bigtable, MongoDB, Terrastore, Hypertable, HBase, Scalaris, Berkeley DB, MemcacheDB, Redis |
Resources
- Brewer, E. (2000). Towards robust distributed systems. Proceedings of the 19th Annual ACM Symposium on Principles of Distributed Computing (PODC '00), 7–10.
- Connolly, T., & Begg, C. (2014). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.). Pearson Learning Solutions. VitalBook file.
- Gilbert, S., & Lynch, N. A. (2012). Perspectives on the CAP theorem. Computer, 45(2), 30–36. doi: 10.1109/MC.2011.389
- Hurst, N. (2010). Visual guide to NoSQL systems. Retrieved from http://blog.nahurst.com/visual-guide-to-nosql-systems
- Sakr, S. (2014). Large Scale and Big Data (1st ed.). VitalBook file.
- Uptime. (n.d.). Uptime and downtime with 99.9% SLA. Retrieved from https://uptime.is/