Simplifying the Development and Deployment of MapReduce Algorithms

Ferosh Jacob
Amber Wagner
Prateek Bahri
Susan Vrbsky
Jeff Gray

Abstract

This paper describes the challenges of creating MapReduce algorithms and how our approach reduces them. MapRedoop is a framework that transforms a program written in a domain-specific language (DSL) into a MapReduce implementation, which can be deployed and executed on a cloud platform such as Eucalyptus or Amazon’s Elastic Compute Cloud (EC2). Examples selected from various domains have been rewritten in the MapRedoop framework to demonstrate its expressiveness and usefulness. Our performance analysis shows that the advantages of our approach can be attained with execution times comparable to those of the methodologies currently in practice.

How to Cite
Ferosh Jacob, Amber Wagner, Prateek Bahri, Susan Vrbsky, & Jeff Gray. (2011). Simplifying the Development and Deployment of MapReduce Algorithms. International Journal of Next-Generation Computing, 2(2), 139–158. https://doi.org/10.47164/ijngc.v2i2.91
