Analyzing Costs and Optimizations for an Elastic Key-Value Store on Amazon Web Services

##plugins.themes.academic_pro.article.main##

David Chiu
Travis Hall
Farhana Kabir
Apeksha Shetty
Gagan Agrawal

Abstract

Cloud computing has emerged to provide virtual, pay-as-you-go computing and storage services over the Internet, where the usage cost directly depends on consumption. One compelling feature in Clouds is elasticity, where a user can demand, and be immediately given access to, more (or relinquish) resources based on requirements. However, this feature introduces new challenges in developing application and services. In this paper, we focus on the challenges of elastic data management in Cloud environments. Particularly, we consider an elastic key-value store, which is used to cache intermediate results in a service-oriented system, and accelerate future queries by reusing the stored values. Such a key-value store can clearly benefit from the elasticity offered by Clouds, by expanding the cache during query-intensive periods. However, supporting an elastic key-value store involves many challenges, including selecting an appropriate indexing scheme, data migration upon elastic resource provisioning, and optimizations to remove certain overheads in the Cloud.This paper focuses on the design of an elastic key-value store. We consider three ubiquitous methods for indexing: B-Trees, Extendible Hashing, and Bloom Filters, and we show how these schemes can be modified to exploit elasticity in Clouds. We also evaluate various performance aspects associated with the use of these indexing schemes. Furthermore, we have developed a heuristic to request elastic compute resources for expanding the cache such that instance startup overheads are minimized in our scheme. Our evaluation studies show that the index selection depends on various application and system level parameters that we have identified. And while we confirm that B-Trees, which pervade many of today’s key-value systems, would scale well, we show cases when Extendible Hashing would outperform B-Trees.We also conduct an analysis which focuses on cost–performance tradeoffs of maintaining the cache. We have compared several Amazon Web Service (AWS Cloud) resources as possible cache placements and found that application dependent attributes such as unit-data size, total cache size, and persistence, have far reaching impli- cations on the cost of cache sustenance. Moreover, while instance-based caches expectedly yield higher cost, the performance that they afford may outweigh lower cost options.

##plugins.themes.academic_pro.article.details##

How to Cite
David Chiu, Travis Hall, Farhana Kabir, Apeksha Shetty, & Gagan Agrawal. (2011). Analyzing Costs and Optimizations for an Elastic Key-Value Store on Amazon Web Services. International Journal of Next-Generation Computing, 2(2), 159–179. https://doi.org/10.47164/ijngc.v2i2.90

References

  1. ARMBRUST, et al., M. 2009. Above the clouds: A berkeley view of cloud computing. Tech. Rep. UCB/EECS-2009-28, EECS Department, University of California, Berkeley. Feb.
  2. BAYER, R. AND MCCREIGHT, E. 1970. Organization and maintenance of large ordered indices. In SIGFIDET ’70: Proceedings of the 1970 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control. ACM, New York, NY, USA, 107–141.
  3. BONOMI, F., MITZENMACHER, M., PANIGRAH, R., SINGH, S., AND VARGHESE, G. 2006. Beyond bloom filters: from approximate membership checks to approximate state machines. In SIGCOMM ’06: Proceedings of the 2006 conference on Applications, technologies, architectures, and protocols for computer communications. ACM, New York, NY, USA, 315–326.
  4. CHANG, F., DEAN, J., GHEMAWAT, S., HSIEH, W. C., WALLACH, D. A., BURROWS, M., CHANDRA, T., FIKES, A., AND GRUBER, R. E. 2006. Bigtable: a distributed storage system for structured data. In Proceedings of the 7th USENIX Symposium on Operating Systems Design and Implementation - Volume 7. USENIX Association, Berkeley, CA, USA, 15–15.
  5. CHIU, D. 2010. Auspice: Automatic service planning in cloud/grid environments. Ph.D. thesis, The Ohio State University, Columbus, OH.
  6. CHIU, D., SHETTY, A., AND AGRAWAL, G. 2010. Elastic cloud caches for accelerating service-oriented computations. In Proceedings of Supercomputing (SC’10).
  7. COMER, D. 1979. The ubiquitous b-tree. ACM Computing Surveys 11, 121–137.
  8. DAS, S., AGRAWAL, D., AND ABBADI, A. E. 2009. ElasTraS: An Elastic Transactional Data Store in the Cloud. In Proceedings of Workshop on Hot Topics in Cloud (HotCloud).
  9. DECANDIA, G., HASTORUN, D., JAMPANI, M., KAKULAPATI, G., LAKSHMAN, A., PILCHIN, A., SIVASUBRAMANIAN, S., VOSSHALL, P., AND VOGELS, W. 2007. Dynamo: amazon’s highly available key-value store. In Proceedings of twenty-first ACM SIGOPS symposium on Operating systems principles. SOSP ’07. ACM, New York, NY, USA, 205–220.
  10. DEJUN, J., PIERRE, G., AND CHI, C.-H. 2009. Ec2 performance analysis for resource provisioning of service-oriented applications. In Proceedings of the 3rd Workshop on Non-Functional Properties and SLA Management in Service-Oriented Computing.
  11. ELMASRI, R. AND NAVATHE, S. B. 2003. Fundamentals of Database Systems, Fourth Edition. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA.
  12. ENBODY, R. J. AND DU, H. C. 1988. Dynamic hashing schemes. ACM Comput. Surv. 20, 2, 850–113.
  13. EVANGELINOS, C. AND HILL, C. 2008. Cloud computing for parallel scientific hpc applications: Feasibility of running coupled atmosphere-ocean climate models on amazon’s ec2. In Cloud Computing and Its Applications. Chicago, IL.
  14. FAGIN, R., NIEVERGELT, J., PIPPENGER, N., AND STRONG, H. R. 1979. Extendible hashing - a fast access method for dynamic files. ACM Trans. Database Syst. 4, 315–344.
  15. FAN, L., CAO, P., ALMEIDA, J., AND BRODER, A. Z. 2000. Summary cache: a scalable wide-area web cache sharing protocol. IEEE/ACM Trans. Netw. 8, 3, 281–293.
  16. FITZPATRICK, B. 2004. Distributed caching with memcached. Linux J. 2004, 5–.
  17. HILL, Z. AND HUMPHREY, M. 2009. A quantitative analysis of high performance computing with amazon’s ec2 infrastructure: The death of the local cluster? In Proceedings of the 10th IEEE/ACM International Conference on Grid Computing. Banff, Alberta, Canada.
  18. JUVE, G., DEELMAN, E., VAHI, K., MEHTA, G., BERRIMAN, G. B., BERMAN, B. P., AND MAECHLING, P. 2010. Data sharing options for scientific workflows on amazon ec2. In SC. 1–9.
  19. KARGER, et al., D. 1997. Consistent hashing and random trees: Distributed caching protocols for relieving hot spots on the world wide web. In ACM Symposium on Theory of Computing. 654–663.
  20. KARGER, et al., D. 1999. Web caching with consistent hashing. In WWW’99: Proceedings of the 8th International Conference on the World Wide Web. 1203–1213.
  21. KIRSCH, A. AND MITZENMACHER, M. 2008. Less hashing, same performance: Building a better bloom filter. Random Struct. Algorithms 33, 2, 187–218.
  22. LAKSHMAN, A. AND MALIK, P. 2009. Cassandra: structured storage system on a p2p network. In Proceedings of the 28th ACM symposium on Principles of distributed computing. PODC ’09. ACM, New York, NY, USA, 5–5.
  23. LIM, H., BABU, S., AND CHASE, J. 2010. Automated Control for Elastic Storage. In Proceedings of International Conference on Autonomic Computing (ICAC). memcached. Memcachedb http://memcachedb.org/.
  24. OLSON, M. A., BOSTIC, K., AND SELTZER, M. 1999. Berkeley db. In Proceedings of the annual conference on USENIX Annual Technical Conference. USENIX Association, Berkeley, CA, USA, 43–43.
  25. PUTZE, F., SANDERS, P., AND SINGLER, J. 2009. Cache-, hash-, and space-efficient bloom filters. J. Exp. Algorithmics 14, 4.4–4.18.
  26. ROWSTRON, A. AND DRUSCHEL, P. 2001. Pastry: Scalable, distributed object location and routing for large-scale peer-to-peer systems. In IFIP/ACM International Conference on Distributed Systems Platforms (Middleware). 329–350.
  27. STOICA, I., MORRIS, R., KARGER, D., KAASHOEK, M. F., AND BALAKRISHNAN, H. 2001. Chord: A scalable peer-to-peer lookup service for internet applications. In Proceedings of the ACM SIGCOMM ’01 Conference. San Diego, California.
  28. ULLMAN, J. D., GARCIA-MOLINA, H., AND WIDOM, J. 2001. Database Systems: The Complete Book. Prentice Hall PTR, Upper Saddle River, NJ, USA.
  29. WANG, J., WU, S., GAO, H., LI, J., AND OOI, B. C. 2010. Indexing multi-dimensional data in a cloud system. In SIGMOD. Indianapolis, Indiana.
  30. ZHAO, B. Y., HUANG, L., STRIBLING, J., RHEA, S. C., JOSEPH, A. D., AND KUBIATOWICZ, J. D. 2004. Tapestry: A resilient global-scale overlay for service deployment. IEEE Journal on Selected Areas in Communications 22, 1 (Jan.), 41–53.