Quantitative Analysis of Consistency in NoSQL Key-Value Stores

Authors Si Liu, Jatin Ganhotra, Muntasir Raihan Rahman, Son Nguyen, Indranil Gupta, José Meseguer



PDF
Thumbnail PDF

File

LITES-v004-i001-a003.pdf
  • Filesize: 3.19 MB
  • 26 pages

Document Identifiers

Author Details

Si Liu
  • Department of Computer Science, University of Illinois at Urbana-Champaign, IL, USA
Jatin Ganhotra
  • Department of Computer Science, University of Illinois at Urbana-Champaign, IL, USA
Muntasir Raihan Rahman
  • Microsoft, Redmond, WA, USA
Son Nguyen
  • Addepar Inc., Sunnyvale, CA, USA
Indranil Gupta
  • Department of Computer Science, University of Illinois at Urbana-Champaign, IL, USA
José Meseguer
  • Department of Computer Science, University of Illinois at Urbana-Champaign, IL, USA

Cite AsGet BibTex

Si Liu, Jatin Ganhotra, Muntasir Raihan Rahman, Son Nguyen, Indranil Gupta, and José Meseguer. Quantitative Analysis of Consistency in NoSQL Key-Value Stores. In LITES, Volume 4, Issue 1 (2017). Leibniz Transactions on Embedded Systems, Volume 4, Issue 1, pp. 03:1-03:26, Schloss Dagstuhl – Leibniz-Zentrum für Informatik (2017)
https://doi.org/10.4230/LITES-v004-i001-a003

Abstract

The promise of high scalability and availability has prompted many companies to replace traditional relational database management systems (RDBMS) with NoSQL key-value stores. This comes at the cost of relaxed consistency guarantees: key-value stores only guarantee eventual consistency in principle. In practice, however, many key-value stores seem to offer stronger consistency. Quantifying how well consistency properties are met is a non-trivial problem.  We address this problem by formally modeling key-value stores as probabilistic systems and quantitatively analyzing their consistency properties by both statistical model checking and implementation evaluation. We present for the first time a formal probabilistic model of Apache Cassandra, a popular NoSQL key-value store, and quantify how much Cassandra achieves various consistency guarantees under various conditions. To validate our model, we evaluate multiple consistency properties using two methods and compare them against each other. The two methods are: (1) an implementation-based evaluation of the source code; and (2) a statistical model checking analysis of our probabilistic model.

Subject Classification

ACM Subject Classification
  • Computer systems organization → Cloud computing
  • Information systems → Key-value stores
  • Software and its engineering → Model checking
  • Theory of computation → Rewrite systems
Keywords
  • NoSQL Key-value Store
  • Consistency
  • Statistical Model Checking
  • Rewriting Logic
  • Maude

Metrics

  • Access Statistics
  • Total Accesses (updated on a weekly basis)
    0
    PDF Downloads

References

  1. Gul A. Agha, José Meseguer, and Koushik Sen. Pmaude: Rewrite-based specification language for probabilistic object systems. Electr. Notes Theor. Comput. Sci., 153(2):213-239, 2006. URL: http://dx.doi.org/10.1016/j.entcs.2005.10.040
  2. Mustaque Ahamad, Gil Neiger, James E. Burns, Prince Kohli, and Phillip W. Hutto. Causal memory: Definitions, implementation, and programming. Distributed Computing, 9(1):37-49, 1995. URL: http://dx.doi.org/10.1007/BF01784241
  3. Musab AlTurki and José Meseguer. Pvesta: A parallel statistical model checking and quantitative analysis tool. In Andrea Corradini, Bartek Klin, and Corina Cîrstea, editors, Algebra and Coalgebra in Computer Science - 4th International Conference, CALCO 2011, Winchester, UK, August 30 - September 2, 2011. Proceedings, volume 6859 of Lecture Notes in Computer Science, pages 386-392. Springer, 2011. URL: http://dx.doi.org/10.1007/978-3-642-22944-2_28
  4. Peter Bailis, Alan Fekete, Ali Ghodsi, Joseph M. Hellerstein, and Ion Stoica. The potential dangers of causal consistency and an explicit solution. In Michael J. Carey and Steven Hand, editors, ACM Symposium on Cloud Computing, SOCC'12, San Jose, CA, USA, October 14-17, 2012, page 22. ACM, 2012. URL: http://dx.doi.org/10.1145/2391229.2391251
  5. Peter Bailis, Shivaram Venkataraman, Michael J. Franklin, Joseph M. Hellerstein, and Ion Stoica. Probabilistically bounded staleness for practical partial quorums.PVLDB, 5(8):776-787, 2012. URL: http://vldb.org/pvldb/vol5/p776_peterbailis_vldb2012.pdf.
  6. Sumita Barahmand and Shahram Ghandeharizadeh.BG:A benchmark to evaluate interactive social networking actions. In CIDR 2013, Sixth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 6-9, 2013, Online Proceedings. www.cidrdb.org, 2013. URL: http://www.cidrdb.org/cidr2013/Papers/CIDR13_Paper93.pdf.
  7. Enrico Barbierato, Marco Gribaudo, and Mauro Iacono. Performance evaluation of nosql big-data applications using multi-formalism models. Future Generation Comp. Syst., 37:345-353, 2014. URL: http://dx.doi.org/10.1016/j.future.2013.12.036
  8. Doug Beaver, Sanjeev Kumar, Harry C. Li, Jason Sobel, and Peter Vajgel. Finding a needle in haystack: Facebook's photo storage. In Remzi H. Arpaci-Dusseau and Brad Chen, editors, 9th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2010, October 4-6, 2010, Vancouver, BC, Canada, Proceedings, pages 47-60. USENIX Association, 2010. URL: http://www.usenix.org/events/osdi10/tech/full_papers/Beaver.pdf.
  9. Theophilus Benson, Aditya Akella, and David A. Maltz. Network traffic characteristics of data centers in the wild. In IMC, pages 267-280, 2010. Google Scholar
  10. David Bermbach and Stefan Tai. Eventual consistency: How soon is eventual? an evaluation of amazon s3's consistency behavior. In Karl M. Göschka, Schahram Dustdar, and Vladimir Tosic, editors, Proceedings of the 6th Workshop on Middleware for Service Oriented Computing, MW4SOC 2011, Lisbon, Portugal, December 12-16, 2011, page 1. ACM, 2011. URL: http://dx.doi.org/10.1145/2093185.2093186
  11. Eric A. Brewer. Towards robust distributed systems (abstract). In Gil Neiger, editor, Proceedings of the Nineteenth Annual ACM Symposium on Principles of Distributed Computing, July 16-19, 2000, Portland, Oregon, USA., page 7. ACM, 2000. URL: http://dx.doi.org/10.1145/343477.343502
  12. Cassandra, 2016.http://cassandra.apache.org.
  13. Manuel Clavel, Francisco Durán, Steven Eker, Patrick Lincoln, Narciso Martí-Oliet, José Meseguer, and Carolyn L. Talcott, editors. All About Maude - A High-Performance Logical Framework, How to Specify, Program and Verify Systems in Rewriting Logic, volume 4350 of Lecture Notes in Computer Science. Springer, 2007. Google Scholar
  14. Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. Benchmarking cloud serving systems with YCSB. In Joseph M. Hellerstein, Surajit Chaudhuri, and Mendel Rosenblum, editors, Proceedings of the 1st ACM Symposium on Cloud Computing, SoCC 2010, Indianapolis, Indiana, USA, June 10-11, 2010, pages 143-154. ACM, 2010. URL: http://dx.doi.org/10.1145/1807128.1807152
  15. James C. Corbett, Jeffrey Dean, Michael Epstein, Andrew Fikes, Christopher Frost, J. J. Furman, Sanjay Ghemawat, Andrey Gubarev, Christopher Heiser, Peter Hochschild, Wilson C. Hsieh, Sebastian Kanthak, Eugene Kogan, Hongyi Li, Alexander Lloyd, Sergey Melnik, David Mwaura, David Nagle, Sean Quinlan, Rajesh Rao, Lindsay Rolig, Yasushi Saito, Michal Szymaniak, Christopher Taylor, Ruth Wang, and Dale Woodford. Spanner: Google's globally-distributed database. In Chandu Thekkath and Amin Vahdat, editors, 10th USENIX Symposium on Operating Systems Design and Implementation, OSDI 2012, Hollywood, CA, USA, October 8-10, 2012, pages 261-264. USENIX Association, 2012. URL: https://www.usenix.org/conference/osdi12/technical-sessions/presentation/corbett.
  16. DB-Engines, 2016.http://db-engines.com/en/ranking.
  17. Akon Dey, Alan Fekete, Raghunath Nambiar, and Uwe Röhm.YCSB+T: benchmarking web-scale transactional databases. In Workshops Proceedings of the 30th International Conference on Data Engineering Workshops, ICDE 2014, Chicago, IL, USA, March 31 - April 4, 2014, pages 223-230. IEEE Computer Society, 2014. URL: http://dx.doi.org/10.1109/ICDEW.2014.6818330
  18. Jonas Eckhardt, Tobias Mühlbauer, Musab AlTurki, José Meseguer, and Martin Wirsing. Stable availability under denial of service attacks through formal patterns. In Juan de Lara and Andrea Zisman, editors, Fundamental Approaches to Software Engineering - 15th International Conference, FASE 2012, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2012, Tallinn, Estonia, March 24 - April 1, 2012. Proceedings, volume 7212 of Lecture Notes in Computer Science, pages 78-93. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-28872-2_6
  19. Jonas Eckhardt, Tobias Mühlbauer, José Meseguer, and Martin Wirsing. Statistical model checking for composite actor systems. In Narciso Martí-Oliet and Miguel Palomino, editors, Recent Trends in Algebraic Development Techniques, 21st International Workshop, WADT 2012, Salamanca, Spain, June 7-10, 2012, Revised Selected Papers, volume 7841 of Lecture Notes in Computer Science, pages 143-160. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-37635-1_9
  20. Andrea Gandini, Marco Gribaudo, William J. Knottenbelt, Rasha Osman, and Pietro Piazzolla. Performance evaluation of nosql databases. In András Horváth and Katinka Wolter, editors, Computer Performance Engineering - 11th European Workshop, EPEW 2014, Florence, Italy, September 11-12, 2014. Proceedings, volume 8721 of Lecture Notes in Computer Science, pages 16-29. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-10885-8_2
  21. Jon Grov and Peter Csaba Ölveczky. Formal modeling and analysis of google's megastore in real-time maude. In Shusaku Iida, José Meseguer, and Kazuhiro Ogata, editors, Specification, Algebra, and Software - Essays Dedicated to Kokichi Futatsugi, volume 8373 of Lecture Notes in Computer Science, pages 494-519. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-642-54624-2_25
  22. Jon Grov and Peter Csaba Ölveczky. Increasing consistency in multi-site data stores: Megastore-cgc and its formal analysis. In Dimitra Giannakopoulou and Gwen Salaün, editors, Software Engineering and Formal Methods - 12th International Conference, SEFM 2014, Grenoble, France, September 1-5, 2014. Proceedings, volume 8702 of Lecture Notes in Computer Science, pages 159-174. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-10431-7_12
  23. Leslie Lamport. Time, clocks, and the ordering of events in a distributed system. Commun. ACM, 21(7):558-565, 1978. URL: http://dx.doi.org/10.1145/359545.359563
  24. Si Liu, Son Nguyen, Jatin Ganhotra, Muntasir Raihan Rahman, Indranil Gupta, and José Meseguer. Quantitative analysis of consistency in nosql key-value stores. In Javier Campos and Boudewijn R. Haverkort, editors, Quantitative Evaluation of Systems, 12th International Conference, QEST 2015, Madrid, Spain, September 1-3, 2015, Proceedings, volume 9259 of Lecture Notes in Computer Science, pages 228-243. Springer, 2015. URL: http://dx.doi.org/10.1007/978-3-319-22264-6_15
  25. Si Liu, Peter Csaba Ölveczky, Muntasir Raihan Rahman, Jatin Ganhotra, Indranil Gupta, and José Meseguer. Formal modeling and analysis of RAMP transaction systems. In Sascha Ossowski, editor, Proceedings of the 31st Annual ACM Symposium on Applied Computing, Pisa, Italy, April 4-8, 2016, pages 1700-1707. ACM, 2016. URL: http://dx.doi.org/10.1145/2851613.2851838
  26. Si Liu, Muntasir Raihan Rahman, Stephen Skeirik, Indranil Gupta, and José Meseguer. Formal modeling and analysis of cassandra in maude. In Stephan Merz and Jun Pang, editors, Formal Methods and Software Engineering - 16th International Conference on Formal Engineering Methods, ICFEM 2014, Luxembourg, Luxembourg, November 3-5, 2014. Proceedings, volume 8829 of Lecture Notes in Computer Science, pages 332-347. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-11737-9_22
  27. Si Liu, Muntasir Raihan Rahman, Stephen Skeirik, Indranil Gupta, and José Meseguer. Formal modeling and analysis of Cassandra in Maude. Technical Report. Manuscript, 2014. URL: https://sites.google.com/site/siliunobi/icfem-cassandra.
  28. Wyatt Lloyd, Michael J. Freedman, Michael Kaminsky, and David G. Andersen. Don't settle for eventual: scalable causal consistency for wide-area storage with COPS. In Ted Wobber and Peter Druschel, editors, Proceedings of the 23rd ACM Symposium on Operating Systems Principles 2011, SOSP 2011, Cascais, Portugal, October 23-26, 2011, pages 401-416. ACM, 2011. URL: http://dx.doi.org/10.1145/2043556.2043593
  29. Haonan Lu, Kaushik Veeraraghavan, Philippe Ajoux, Jim Hunt, Yee Jiun Song, Wendy Tobagus, Sanjeev Kumar, and Wyatt Lloyd. Existential consistency: measuring and understanding consistency at facebook. In Ethan L. Miller and Steven Hand, editors, Proceedings of the 25th Symposium on Operating Systems Principles, SOSP 2015, Monterey, CA, USA, October 4-7, 2015, pages 295-310. ACM, 2015. URL: http://dx.doi.org/10.1145/2815400.2815426
  30. MongoDB, 2016.http://www.mongodb.org.
  31. Rasha Osman and Pietro Piazzolla. Modelling replication in nosql datastores. In Gethin Norman and William H. Sanders, editors, Quantitative Evaluation of Systems - 11th International Conference, QEST 2014, Florence, Italy, September 8-10, 2014. Proceedings, volume 8657 of Lecture Notes in Computer Science, pages 194-209. Springer, 2014. URL: http://dx.doi.org/10.1007/978-3-319-10696-0_16
  32. Muntasir Raihan Rahman, Wojciech M. Golab, Alvin AuYoung, Kimberly Keeton, and Jay J. Wylie. Toward a principled framework for benchmarking consistency. In Michael J. Freedman and Neeraj Suri, editors, Proceedings of the Eighth Workshop on Hot Topics in System Dependability, HotDep 2012, Hollywood, CA, USA, October 7, 2012. USENIX Association, 2012. URL: https://www.usenix.org/conference/hotdep12/workshop-program/presentation/rahman.
  33. Redis, 2016.http://redis.io.
  34. Koushik Sen, Mahesh Viswanathan, and Gul Agha. On statistical model checking of stochastic systems. In Kousha Etessami and Sriram K. Rajamani, editors, Computer Aided Verification, 17th International Conference, CAV 2005, Edinburgh, Scotland, UK, July 6-10, 2005, Proceedings, volume 3576 of Lecture Notes in Computer Science, pages 266-280. Springer, 2005. URL: http://dx.doi.org/10.1007/11513988_26
  35. Koushik Sen, Mahesh Viswanathan, and Gul A. Agha.VESTA:A statistical model-checker and analyzer for probabilistic systems. In Second International Conference on the Quantitative Evaluaiton of Systems (QEST 2005), 19-22 September 2005, Torino, Italy, pages 251-252. IEEE Computer Society, 2005. URL: http://dx.doi.org/10.1109/QEST.2005.42
  36. Stephen Skeirik, Rakesh B. Bobba, and José Meseguer. Formal analysis of fault-tolerant group key management using zookeeper. In 13th IEEE/ACM International Symposium on Cluster, Cloud, and Grid Computing, CCGrid 2013, Delft, Netherlands, May 13-16, 2013, pages 636-641. IEEE Computer Society, 2013. URL: http://dx.doi.org/10.1109/CCGrid.2013.98
  37. Doug Terry. Replicated data consistency explained through baseball. Commun. ACM, 56(12):82-89, 2013. URL: http://dx.doi.org/10.1145/2500500
  38. Douglas B. Terry, Vijayan Prabhakaran, Ramakrishna Kotla, Mahesh Balakrishnan, Marcos K. Aguilera, and Hussam Abu-Libdeh. Consistency-based service level agreements for cloud storage. In Michael Kaminsky and Mike Dahlin, editors, ACMSIGOPS 24th Symposium on Operating Systems Principles, SOSP'13, Farmington, PA, USA, November 3-6, 2013, pages 309-324. ACM, 2013. URL: http://dx.doi.org/10.1145/2517349.2522731
  39. Werner Vogels. Eventually consistent. Commun. ACM, 52(1):40-44, 2009. URL: http://dx.doi.org/10.1145/1435417.1435432
  40. Hiroshi Wada, Alan Fekete, Liang Zhao, Kevin Lee, and Anna Liu. Data consistency properties and the trade-offs in commercial cloud storage: the consumers' perspective. In CIDR 2011, Fifth Biennial Conference on Innovative Data Systems Research, Asilomar, CA, USA, January 9-12, 2011, Online Proceedings, pages 134-143. www.cidrdb.org, 2011. URL: http://www.cidrdb.org/cidr2011/Papers/CIDR11_Paper15.pdf.
  41. Brian White, Jay Lepreau, Leigh Stoller, Robert Ricci, Shashi Guruprasad, Mac Newbold, Mike Hibler, Chad Barb, and Abhijeet Joglekar. An integrated experimental environment for distributed systems and networks. In David E. Culler and Peter Druschel, editors, 5th Symposium on Operating System Design and Implementation (OSDI 2002), Boston, Massachusetts, USA, December 9-11, 2002. USENIX Association, 2002. URL: http://www.usenix.org/events/osdi02/tech/white.html.
  42. Martin Wirsing, Jonas Eckhardt, Tobias Mühlbauer, and José Meseguer. Design and analysis of cloud-based architectures with KLAIM and maude. In Francisco Durán, editor, Rewriting Logic and Its Applications - 9th International Workshop, WRLA 2012, Held as a Satellite Event of ETAPS, Tallinn, Estonia, March 24-25, 2012, Revised Selected Papers, volume 7571 of Lecture Notes in Computer Science, pages 54-82. Springer, 2012. URL: http://dx.doi.org/10.1007/978-3-642-34005-5_4
  43. Håkan L. S. Younes and Reid G. Simmons. Statistical probabilistic model checking with a focus on time-bounded properties. Inf. Comput., 204(9):1368-1409, 2006. URL: http://dx.doi.org/10.1016/j.ic.2006.05.002
Questions / Remarks / Feedback
X

Feedback for Dagstuhl Publishing


Thanks for your feedback!

Feedback submitted

Could not send message

Please try again later or send an E-mail