Exporting Pig results to a database

Published 2024-10-11 09:51:58

Is there a way to export the results from Pig directly to a database like mysql?


Comments (5)

固执像三岁 2024-10-18 09:51:58

While keeping in mind what orangeoctopus said (beware of DDOS...) have you had a look to DBStorage?

data = LOAD '...' AS (...);
...
STORE data INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://host/db', 'INSERT ...');

極樂鬼 2024-10-18 09:51:58

The main problem I see is that each reducer is effectively going to insert into the database around the same time.

If you don't think this will be an issue, I suggest you write a custom Storage method that uses JDBC (or something similar) to insert into the database directly, writing nothing out to HDFS.

If you are afraid of performing a DDOS attack on your own database, perhaps collecting the data on HDFS and performing a separate bulk load into mysql would be better.
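The "collect on HDFS, then bulk load" route can be sketched roughly as follows. This is a hypothetical illustration, not code from the thread: it assumes Pig has already written tab-separated part files (e.g. with `STORE ... USING PigStorage('\t')`) and a two-column `(name, cnt)` schema, and it uses Python's built-in sqlite3 as a stand-in for MySQL so the sketch is self-contained; against MySQL you would apply the same pattern with a real driver.

```python
import csv
import glob
import sqlite3

# Hypothetical bulk-load step: a separate process (not the reducers) reads the
# tab-separated part files Pig wrote and inserts them in batches over a single
# connection. sqlite3 stands in for MySQL here; table name and (name, cnt)
# schema are illustrative assumptions.

def bulk_load(db_path, table, part_glob, batch_size=1000):
    conn = sqlite3.connect(db_path)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} (name TEXT, cnt INTEGER)")
    batch = []
    for path in sorted(glob.glob(part_glob)):
        with open(path, newline="") as f:
            for row in csv.reader(f, delimiter="\t"):
                batch.append((row[0], int(row[1])))
                if len(batch) >= batch_size:
                    # Batched inserts keep round-trips (and DB load) bounded.
                    conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
                    batch.clear()
    if batch:
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
    conn.commit()
    return conn
```

Because only this one loader talks to the database, the cluster's parallelism never translates into a flood of concurrent connections.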

小鸟爱天空丶 2024-10-18 09:51:58

I'm currently experimenting with an embedded pig application which loads results into mysql via PigServer.OpenIterator and a JDBC connection. It's worked very well in testing, but I haven't tried it at scale yet. This is similar to the custom storage method already suggested, but runs from a single point, so no accidental DDOS attack. You effectively end up paying the network transfer cost twice (cluster -> staging machine, staging machine -> DB server) if you don't run the load off the DB server (I personally prefer to run nothing except the DB itself off the DB server), but that's no different than the "write the file out and bulk load it" option.
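The single-point pattern described here (one process pulls result tuples and pushes them over one database connection) might look like the sketch below. This is an assumption-laden illustration: a plain Python iterator stands in for `PigServer.openIterator`, sqlite3 stands in for the MySQL connection, and the two-column schema is made up.

```python
import sqlite3
from itertools import islice

# Hypothetical single-point loader: one staging process iterates over the
# job's result tuples (a plain iterator stands in for PigServer.openIterator)
# and inserts them in fixed-size batches over a single connection, so the
# database never sees more than one client.

def load_from_iterator(conn, table, tuples, batch_size=500):
    """Insert (name, cnt) tuples in fixed-size batches; returns rows inserted."""
    it = iter(tuples)
    total = 0
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            break
        conn.executemany(f"INSERT INTO {table} VALUES (?, ?)", batch)
        total += len(batch)
    conn.commit()
    return total
```

Batching bounds memory on the staging machine while still paying the network cost only once per row on the final hop to the database server.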

故事灯 2024-10-18 09:51:58

Sqoop may be the good way to go, but it is difficult to set-up (IMHO) as all these Hadoop related projects...

Pig's DBStorage is working fine (at least for storing).

Don't forget to register the PiggyBank and your MySQL driver:

-- Register Piggy bank
REGISTER /opt/cmr/pig/pig-0.10.0/lib/piggybank.jar;

-- Register MySQL driver
REGISTER /opt/cmr/mysql/drivers/mysql-connector-java-5.1.15-bin.jar;

Here is a sample call:

-- Store a relation into a SQL table
STORE relation INTO 'unused' USING org.apache.pig.piggybank.storage.DBStorage('com.mysql.jdbc.Driver', 'jdbc:mysql://<mysqlserver>/<database>', '<login>', '<password>', 'REPLACE INTO <table> (<column1>, <column2>) VALUES (?, ?)');

此刻的回忆 2024-10-18 09:51:58

Try using Sqoop
