Hadoop and RDBMS
Hadoop is mainly used to process unstructured or semi-structured data. I want to use Hadoop to process large amounts of structured data.
Although Hadoop can read from a database (via DBInputFormat), this is not considered a scalable approach, because the number of database connections is limited.
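To make the connection-count concern concrete, here is a simplified Python model of the splitting strategy DBInputFormat uses, as I understand it: it counts the table's rows, cuts them into one contiguous chunk per configured map task, and each task fetches its chunk with a `LIMIT ... OFFSET ...` query over its own connection. This is a sketch, not Hadoop's code; `plan_splits`, `split_query`, the `orders` table and its columns are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Split:
    """A contiguous range of rows [start, end) assigned to one map task."""
    start: int
    end: int

    @property
    def length(self) -> int:
        return self.end - self.start


def plan_splits(row_count: int, num_maps: int) -> list[Split]:
    """Cut row_count rows into num_maps equal chunks; the last chunk also takes the remainder."""
    if num_maps < 1:
        raise ValueError("num_maps must be at least 1")
    chunk = row_count // num_maps
    splits = []
    for i in range(num_maps):
        start = i * chunk
        end = row_count if i == num_maps - 1 else start + chunk
        splits.append(Split(start, end))
    return splits


def split_query(table: str, columns: list[str], order_by: str, split: Split) -> str:
    """Build the query one map task would run over its own connection (hypothetical helper)."""
    return (
        f"SELECT {', '.join(columns)} FROM {table} ORDER BY {order_by} "
        f"LIMIT {split.length} OFFSET {split.start}"
    )


if __name__ == "__main__":
    # Hypothetical table "orders" with 1,000,003 rows read by 4 map tasks:
    # 4 concurrent tasks means 4 concurrent database connections.
    for s in plan_splits(row_count=1_000_003, num_maps=4):
        print(split_query("orders", ["id", "amount"], "id", s))
```

Running it prints four queries; the last one is `SELECT id, amount FROM orders ORDER BY id LIMIT 250003 OFFSET 750000`. Two consequences follow from this model: the number of concurrent database connections grows with the number of map tasks, so adding nodes stops helping once the database is saturated, and on many databases a large `OFFSET` still makes the server step over the skipped rows, so later splits cost more than earlier ones.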
Has anybody used Hadoop to read data from an RDBMS? What was the performance like? How many nodes could it support?
Thanks
Answer
You can use Sqoop to import data from an RDBMS into Hadoop.
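Sqoop is a command-line tool, so the sketch below only assembles a `sqoop import` argument list in Python. The flags (`--connect`, `--username`, `--password-file`, `--table`, `--split-by`, `--num-mappers`, `--target-dir`) are Sqoop 1 import options; the JDBC URL, account, password file, table, and target directory are hypothetical placeholders. It is relevant to the connection-limit worry in the question: `--num-mappers` caps how many map tasks, and therefore concurrent database connections, the import uses, and Sqoop divides the MIN/MAX range of the `--split-by` column (the primary key by default) among them instead of paging with offsets.

```python
from __future__ import annotations

import shlex


def sqoop_import_cmd(
    jdbc_url: str,
    username: str,
    password_file: str,
    table: str,
    split_by: str,
    target_dir: str,
    num_mappers: int = 4,
) -> list[str]:
    """Build the argv for a Sqoop 1 `import` job.

    num_mappers bounds the number of parallel map tasks, i.e. the number of
    concurrent database connections; split_by names the column whose value
    range is divided among those tasks.
    """
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,
        "--table", table,
        "--split-by", split_by,
        "--num-mappers", str(num_mappers),
        "--target-dir", target_dir,
    ]


if __name__ == "__main__":
    cmd = sqoop_import_cmd(
        jdbc_url="jdbc:mysql://db.example.com/sales",  # hypothetical database
        username="etl",                                # hypothetical account
        password_file="/user/etl/.db-password",        # hypothetical HDFS path
        table="orders",                                # hypothetical table
        split_by="id",
        target_dir="/data/raw/orders",                 # hypothetical output dir
    )
    # Print the command line; on a host with Sqoop installed it could be
    # executed with subprocess.run(cmd, check=True).
    print(shlex.join(cmd))
```

Keeping `num_mappers` as an explicit parameter makes the trade-off visible: raising it speeds up the import only as long as the database can serve that many connections at once.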
Hadoop shines at processing unstructured data because you push the constraints (imposing structure on the data) to the end. This also leaves room for creativity in choosing what structure to impose, and that choice determines the kind of information you can extract.
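The "structure at the end" idea (often called schema-on-read) can be illustrated with a toy, single-process Python sketch rather than an actual MapReduce job: the raw records are kept exactly as they arrived, and each analysis imposes its own structure only when it reads them. The records, field names, and functions below are hypothetical.

```python
import json
from collections import Counter

# Raw records stored exactly as they arrived; no schema is enforced at write time.
RAW_LINES = [
    '{"user": "alice", "action": "view", "item": "book-1"}',
    '{"user": "bob", "action": "buy", "item": "book-1", "price": 12.5}',
    "malformed line that a strict write-time schema would have rejected",
    '{"user": "alice", "action": "buy", "item": "pen-7", "price": 2.0}',
]


def parse(lines):
    """Yield the lines that decode to JSON objects; skip everything else."""
    for line in lines:
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(rec, dict):
            yield rec


def actions_per_user(lines):
    """Structure #1: treat each record as a (user, action) event and count per user."""
    return Counter(rec["user"] for rec in parse(lines) if "user" in rec)


def revenue_per_item(lines):
    """Structure #2: treat purchase records as sales and sum the price per item."""
    totals = {}
    for rec in parse(lines):
        if rec.get("action") == "buy" and "price" in rec:
            totals[rec["item"]] = totals.get(rec["item"], 0.0) + rec["price"]
    return totals


if __name__ == "__main__":
    print(actions_per_user(RAW_LINES))
    print(revenue_per_item(RAW_LINES))
```

Both readers work from the same raw data but extract different information, and each decides for itself how to treat the malformed line. With a relational table, the structure (and the decision to reject that line) would have been fixed before any of the data was loaded.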
That is not to say you cannot process structured data with Hadoop, but you get less mileage out of it: an RDBMS can process structured data just as efficiently.