References

Published 2024-08-24 16:53:17

  1. Jeffrey Dean and Sanjay Ghemawat: MapReduce: Simplified Data Processing on Large Clusters , at 6th USENIX Symposium on Operating Systems Design and Implementation (OSDI), December 2004.
  2. Joel Spolsky: The Perils of JavaSchools , joelonsoftware.com, December 25, 2005.
  3. Shivnath Babu and Herodotos Herodotou: Massively Parallel Databases and MapReduce Systems , Foundations and Trends in Databases, volume 5, number 1, pages 1–104, November 2013. doi:10.1561/1900000036
  4. David J. DeWitt and Michael Stonebraker: MapReduce: A Major Step Backwards , originally published at databasecolumn.vertica.com, January 17, 2008.
  5. Henry Robinson: The Elephant Was a Trojan Horse: On the Death of Map-Reduce at Google , the-paper-trail.org, June 25, 2014.
  6. The Hollerith Machine , United States Census Bureau, census.gov.
  7. IBM 82, 83, and 84 Sorters Reference Manual , Edition A24-1034-1, International Business Machines Corporation, July 1962.
  8. Adam Drake: Command-Line Tools Can Be 235x Faster than Your Hadoop Cluster , aadrake.com, January 25, 2014.
  9. GNU Coreutils 8.23 Documentation , Free Software Foundation, Inc., 2014.
  10. Martin Kleppmann: Kafka, Samza, and the Unix Philosophy of Distributed Data , martin.kleppmann.com, August 5, 2015.
  11. Doug McIlroy: Internal Bell Labs memo , October 1964. Cited in: Dennis M. Ritchie: Advice from Doug McIlroy , cm.bell-labs.com.
  12. M. D. McIlroy, E. N. Pinson, and B. A. Tague: UNIX Time-Sharing System: Foreword , The Bell System Technical Journal, volume 57, number 6, pages 1899–1904, July 1978.
  13. Eric S. Raymond: The Art of UNIX Programming . Addison-Wesley, 2003. ISBN: 978-0-13-142901-7
  14. Ronald Duncan: Text File Formats – ASCII Delimited Text – Not CSV or TAB Delimited Text , ronaldduncan.wordpress.com, October 31, 2009.
  15. Alan Kay: Is 'Software Engineering' an Oxymoron? , tinlizzie.org.
  16. Martin Fowler: InversionOfControl , martinfowler.com, June 26, 2005.
  17. Daniel J. Bernstein: Two File Descriptors for Sockets , cr.yp.to.
  18. Rob Pike and Dennis M. Ritchie: The Styx Architecture for Distributed Systems , Bell Labs Technical Journal, volume 4, number 2, pages 146–152, April 1999.
  19. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung: The Google File System , at 19th ACM Symposium on Operating Systems Principles (SOSP), October 2003. doi:10.1145/945445.945450
  20. Michael Ovsiannikov, Silvius Rus, Damian Reeves, et al.: The Quantcast File System , Proceedings of the VLDB Endowment, volume 6, number 11, pages 1092–1101, August 2013. doi:10.14778/2536222.2536234
  21. OpenStack Swift 2.6.1 Developer Documentation , OpenStack Foundation, docs.openstack.org, March 2016.
  22. Zhe Zhang, Andrew Wang, Kai Zheng, et al.: Introduction to HDFS Erasure Coding in Apache Hadoop , blog.cloudera.com, September 23, 2015.
  23. Peter Cnudde: Hadoop Turns 10 , yahoohadoop.tumblr.com, February 5, 2016.
  24. Eric Baldeschwieler: Thinking About the HDFS vs. Other Storage Technologies , hortonworks.com, July 25, 2012.
  25. Brendan Gregg: Manta: Unix Meets Map Reduce , dtrace.org, June 25, 2013.
  26. Tom White: Hadoop: The Definitive Guide, 4th edition. O'Reilly Media, 2015. ISBN: 978-1-491-90163-2
  27. Jim N. Gray: Distributed Computing Economics , Microsoft Research Tech Report MSR-TR-2003-24, March 2003.
  28. Márton Trencséni: Luigi vs Airflow vs Pinball , bytepawn.com, February 6, 2016.
  29. Roshan Sumbaly, Jay Kreps, and Sam Shah: The 'Big Data' Ecosystem at LinkedIn , at ACM International Conference on Management of Data (SIGMOD), July 2013. doi:10.1145/2463676.2463707
  30. Alan F. Gates, Olga Natkovich, Shubham Chopra, et al.: Building a High-Level Dataflow System on Top of Map-Reduce: The Pig Experience , at 35th International Conference on Very Large Data Bases (VLDB), August 2009.
  31. Ashish Thusoo, Joydeep Sen Sarma, Namit Jain, et al.: Hive – A Petabyte Scale Data Warehouse Using Hadoop , at 26th IEEE International Conference on Data Engineering (ICDE), March 2010. doi:10.1109/ICDE.2010.5447738
  32. Cascading 3.0 User Guide , Concurrent, Inc., docs.cascading.org, January 2016.
  33. Apache Crunch User Guide , Apache Software Foundation, crunch.apache.org.
  34. Craig Chambers, Ashish Raniwala, Frances Perry, et al.: FlumeJava: Easy, Efficient Data-Parallel Pipelines , at 31st ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI), June 2010. doi:10.1145/1806596.1806638
  35. Jay Kreps: Why Local State is a Fundamental Primitive in Stream Processing , oreilly.com, July 31, 2014.
  36. Martin Kleppmann: Rethinking Caching in Web Apps , martin.kleppmann.com, October 1, 2012.
  37. Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira: Hadoop Application Architectures . O'Reilly Media, 2015. ISBN: 978-1-491-90004-8
  38. Philippe Ajoux, Nathan Bronson, Sanjeev Kumar, et al.: Challenges to Adopting Stronger Consistency at Scale , at 15th USENIX Workshop on Hot Topics in Operating Systems (HotOS), May 2015.
  39. Sriranjan Manjunath: Skewed Join , wiki.apache.org, 2009.
  40. David J. DeWitt, Jeffrey F. Naughton, Donovan A. Schneider, and S. Seshadri: Practical Skew Handling in Parallel Joins , at 18th International Conference on Very Large Data Bases (VLDB), August 1992.
  41. Marcel Kornacker, Alexander Behm, Victor Bittorf, et al.: Impala: A Modern, Open-Source SQL Engine for Hadoop , at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015.
  42. Matthieu Monsch: Open-Sourcing PalDB, a Lightweight Companion for Storing Side Data , engineering.linkedin.com, October 26, 2015.
  43. Daniel Peng and Frank Dabek: Large-Scale Incremental Processing Using Distributed Transactions and Notifications , at 9th USENIX Symposium on Operating Systems Design and Implementation (OSDI), October 2010.
  44. Cloudera Search User Guide , Cloudera, Inc., September 2015.
  45. Lili Wu, Sam Shah, Sean Choi, et al.: The Browsemaps: Collaborative Filtering at LinkedIn , at 6th Workshop on Recommender Systems and the Social Web (RSWeb), October 2014.
  46. Roshan Sumbaly, Jay Kreps, Lei Gao, et al.: Serving Large-Scale Batch Computed Data with Project Voldemort , at 10th USENIX Conference on File and Storage Technologies (FAST), February 2012.
  47. Varun Sharma: Open-Sourcing Terrapin: A Serving System for Batch Generated Data , engineering.pinterest.com, September 14, 2015.
  48. Nathan Marz: ElephantDB , slideshare.net, May 30, 2011.
  49. Jean-Daniel (JD) Cryans: How-to: Use HBase Bulk Loading, and Why , blog.cloudera.com, September 27, 2013.
  50. Nathan Marz: How to Beat the CAP Theorem , nathanmarz.com, October 13, 2011.
  51. Molly Bartlett Dishman and Martin Fowler: Agile Architecture , at O'Reilly Software Architecture Conference, March 2015.
  52. David J. DeWitt and Jim N. Gray: Parallel Database Systems: The Future of High Performance Database Systems , Communications of the ACM, volume 35, number 6, pages 85–98, June 1992. doi:10.1145/129888.129894
  53. Jay Kreps: But the multi-tenancy thing is actually really really hard , tweetstorm, twitter.com, October 31, 2014.
  54. Jeffrey Cohen, Brian Dolan, Mark Dunlap, et al.: MAD Skills: New Analysis Practices for Big Data , Proceedings of the VLDB Endowment, volume 2, number 2, pages 1481–1492, August 2009. doi:10.14778/1687553.1687576
  55. Ignacio Terrizzano, Peter Schwarz, Mary Roth, and John E. Colino: Data Wrangling: The Challenging Journey from the Wild to the Lake , at 7th Biennial Conference on Innovative Data Systems Research (CIDR), January 2015.
  56. Paige Roberts: To Schema on Read or to Schema on Write, That Is the Hadoop Data Lake Question , adaptivesystemsinc.com, July 2, 2015.
  57. Bobby Johnson and Joseph Adler: The Sushi Principle: Raw Data Is Better , at Strata+Hadoop World, February 2015.
  58. Vinod Kumar Vavilapalli, Arun C. Murthy, Chris Douglas, et al.: Apache Hadoop YARN: Yet Another Resource Negotiator , at 4th ACM Symposium on Cloud Computing (SoCC), October 2013. doi:10.1145/2523616.2523633
  59. Abhishek Verma, Luis Pedrosa, Madhukar Korupolu, et al.: Large-Scale Cluster Management at Google with Borg , at 10th European Conference on Computer Systems (EuroSys), April 2015. doi:10.1145/2741948.2741964
  60. Malte Schwarzkopf: The Evolution of Cluster Scheduler Architectures , firmament.io, March 9, 2016.
  61. Matei Zaharia, Mosharaf Chowdhury, Tathagata Das, et al.: Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing , at 9th USENIX Symposium on Networked Systems Design and Implementation (NSDI), April 2012.
  62. Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia: Learning Spark. O'Reilly Media, 2015. ISBN: 978-1-449-35904-1
  63. Bikas Saha and Hitesh Shah: Apache Tez: Accelerating Hadoop Query Processing , at Hadoop Summit, June 2014.
  64. Bikas Saha, Hitesh Shah, Siddharth Seth, et al.: Apache Tez: A Unifying Framework for Modeling and Building Data Processing Applications , at ACM International Conference on Management of Data (SIGMOD), June 2015. doi:10.1145/2723372.2742790
  65. Kostas Tzoumas: Apache Flink: API, Runtime, and Project Roadmap , slideshare.net, January 14, 2015.
  66. Alexander Alexandrov, Rico Bergmann, Stephan Ewen, et al.: The Stratosphere Platform for Big Data Analytics , The VLDB Journal, volume 23, number 6, pages 939–964, May 2014. doi:10.1007/s00778-014-0357-y
  67. Michael Isard, Mihai Budiu, Yuan Yu, et al.: Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks , at European Conference on Computer Systems (EuroSys), March 2007. doi:10.1145/1272996.1273005
  68. Daniel Warneke and Odej Kao: Nephele: Efficient Parallel Data Processing in the Cloud , at 2nd Workshop on Many-Task Computing on Grids and Supercomputers (MTAGS), November 2009. doi:10.1145/1646468.1646476
  69. Lawrence Page, Sergey Brin, Rajeev Motwani, and Terry Winograd: "The PageRank"
  70. Leslie G. Valiant: A Bridging Model for Parallel Computation , Communications of the ACM, volume 33, number 8, pages 103–111, August 1990. doi:10.1145/79173.79181
  71. Stephan Ewen, Kostas Tzoumas, Moritz Kaufmann, and Volker Markl: Spinning Fast Iterative Data Flows , Proceedings of the VLDB Endowment, volume 5, number 11, pages 1268–1279, July 2012. doi:10.14778/2350229.2350245
  72. Grzegorz Malewicz, Matthew H. Austern, Aart J. C. Bik, et al.: Pregel: A System for Large-Scale Graph Processing , at ACM International Conference on Management of Data (SIGMOD), June 2010. doi:10.1145/1807167.1807184
  73. Frank McSherry, Michael Isard, and Derek G. Murray: Scalability! But at What COST? , at 15th USENIX Workshop on Hot Topics in Operating Systems (HotOS), May 2015.
  74. Ionel Gog, Malte Schwarzkopf, Natacha Crooks, et al.: Musketeer: All for One, One for All in Data Processing Systems , at 10th European Conference on Computer Systems (EuroSys), April 2015. doi:10.1145/2741948.2741968
  75. Aapo Kyrola, Guy Blelloch, and Carlos Guestrin: GraphChi: Large-Scale Graph Computation on Just a PC , at 10th USENIX Symposium on Operating Systems Design and Implementation (OSDI), October 2012.
  76. Andrew Lenharth, Donald Nguyen, and Keshav Pingali: Parallel Graph Analytics , Communications of the ACM, volume 59, number 5, pages 78–87, May 2016. doi:10.1145/2901919
  77. Fabian Hüske: Peeking into Apache Flink's Engine Room , flink.apache.org, March 13, 2015.
  78. Mostafa Mokhtar: Hive 0.14 Cost Based Optimizer (CBO) Technical Overview , hortonworks.com, March 2, 2015.
  79. Michael Armbrust, Reynold S. Xin, Cheng Lian, et al.: Spark SQL: Relational Data Processing in Spark , at ACM International Conference on Management of Data (SIGMOD), June 2015. doi:10.1145/2723372.2742797
  80. Daniel Blazevski: Planting Quadtrees for Apache Flink , insightdataengineering.com, March 25, 2016.
  81. Tom White: Genome Analysis Toolkit: Now Using Apache Spark for Data Processing , blog.cloudera.com, April 6, 2016.
