Exporting Pig results to a database
Is there a way to export the results from Pig directly to a database like mysql?
Answers (5)
While keeping in mind what orangeoctopus said (beware of DDOS...) have you had a look to DBStorage?
The main problem I see is that each reducer is effectively going to insert into the database around the same time.
If you don't think this will be an issue, I suggest you write a custom Storage method that uses JDBC (or something similar) to insert into the database directly, writing nothing out to HDFS.
If you are afraid of performing a DDOS attack on your own database, perhaps collecting the data on HDFS and performing a separate bulk load into mysql would be better.
I'm currently experimenting with an embedded pig application which loads results into mysql via PigServer.OpenIterator and a JDBC connection. It's worked very well in testing, but I haven't tried it at scale yet. This is similar to the custom storage method already suggested, but runs from a single point, so no accidental DDOS attack. You effectively end up paying the network transfer cost twice (cluster -> staging machine, staging machine -> DB server) if you don't run the load off the DB server (I personally prefer to run nothing except the DB itself off the DB server), but that's no different than the "write the file out and bulk load it" option.
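A minimal sketch of that embedded approach (the script name, alias, table, and connection details below are assumptions; only `PigServer.openIterator` and plain JDBC batching come from the answer):

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.util.Iterator;

import org.apache.pig.PigServer;
import org.apache.pig.data.Tuple;

public class PigToMysql {
    public static void main(String[] args) throws Exception {
        // Run the Pig script, then pull the results of one alias back
        // to this single client machine (no reducer-side inserts).
        PigServer pig = new PigServer("mapreduce");
        pig.registerScript("report.pig");                    // hypothetical script
        Iterator<Tuple> rows = pig.openIterator("results");  // hypothetical alias

        try (Connection conn = DriverManager.getConnection(
                "jdbc:mysql://dbhost/mydb", "user", "pass")) {  // assumed credentials
            PreparedStatement ps = conn.prepareStatement(
                "INSERT INTO results (id, score) VALUES (?, ?)");
            int batched = 0;
            while (rows.hasNext()) {
                Tuple t = rows.next();
                ps.setLong(1, (Long) t.get(0));
                ps.setDouble(2, (Double) t.get(1));
                ps.addBatch();
                if (++batched % 1000 == 0) {
                    ps.executeBatch();  // flush periodically to limit round trips
                }
            }
            ps.executeBatch();
        }
    }
}
```

Because a single JVM drives all the inserts, the database sees one connection rather than one per reducer.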
Sqoop may be a good way to go, but it is difficult to set up (IMHO), as are all these Hadoop-related projects...
Pig's DBStorage is working fine (at least for storing).
Don't forget to register the PiggyBank and your MySQL driver:
Here is a sample call:
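The sample call did not survive in this copy; here is a hedged reconstruction, assuming a relation `results(id:long, score:double)` and typical jar locations (adjust paths and versions to your installation):

```pig
-- Register PiggyBank and the MySQL JDBC driver (paths are assumptions)
REGISTER /usr/lib/pig/piggybank.jar;
REGISTER /usr/lib/pig/lib/mysql-connector-java.jar;

-- DBStorage ignores the STORE location; rows go straight into MySQL
STORE results INTO 'ignored' USING
  org.apache.pig.piggybank.storage.DBStorage(
    'com.mysql.jdbc.Driver',
    'jdbc:mysql://dbhost/mydb', 'user', 'pass',
    'INSERT INTO results (id, score) VALUES (?, ?)');
```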
Try using Sqoop
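For the HDFS-to-MySQL direction the question asks about, the relevant Sqoop command is `sqoop export`. A minimal sketch, assuming the Pig output was stored tab-delimited under `/user/hadoop/results` and a matching `results` table already exists in MySQL:

```shell
sqoop export \
  --connect jdbc:mysql://dbhost/mydb \
  --username user --password pass \
  --table results \
  --export-dir /user/hadoop/results \
  --input-fields-terminated-by '\t'
```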