How can I use Hive on top of Amazon Elastic MapReduce to process data in Amazon SimpleDB?

Posted on 2024-09-07 06:27:20

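I have a large amount of data in an Amazon SimpleDB domain. I want to start Hive on Elastic MapReduce (on top of Hadoop) and either import the data from SimpleDB somehow, or connect to SimpleDB and run HiveQL queries against it. I'm having trouble importing the data. Any pointers?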

浅紫色的梦幻 2024-09-14 06:27:20

As input to a streaming Hadoop job, you could have a sequence of select statements for SimpleDB.

For example, your input could contain (in a less verbose form):

collectionA between dates 123 and 234
collectionA between dates 235 and 559
collectionA between dates 560 and 3000
...
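
The abbreviated lines above could be expanded into full SimpleDB select expressions with a small helper. This is only a sketch: the attribute name `date`, the zero-padding width, and the helper name `build_select_statements` are assumptions, not something from the original post. Padding matters because SimpleDB compares values as strings.

```python
def build_select_statements(domain, attribute, ranges, width=10):
    """Return one SimpleDB select expression per (start, end) range.

    SimpleDB compares values lexicographically, so the bounds are
    zero-padded to a fixed width (an assumption about how the dates
    were stored in the domain).
    """
    statements = []
    for start, end in ranges:
        low = str(start).zfill(width)
        high = str(end).zfill(width)
        statements.append(
            "select * from `{0}` where `{1}` between '{2}' and '{3}'".format(
                domain, attribute, low, high
            )
        )
    return statements


# One statement per line becomes the input file of the streaming job.
ranges = [(123, 234), (235, 559), (560, 3000)]
input_lines = build_select_statements("collectionA", "date", ranges)
print("\n".join(input_lines))
```

Splitting the domain into ranges like this is what lets several mappers pull data from SimpleDB in parallel.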

Then you would implement a mapper script that performs the following transformation:
input_select_statement => execute_select_statement => output_results

This would be super easy with streaming, because you can use any library in any language you like and don't have to worry about implementing any of the complicated Hadoop Java stuff.
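
A mapper along these lines might look roughly like the following Python sketch. The original answer names no library. Here boto3's SimpleDB client is assumed to be installed on the task nodes, with credentials and region taken from the environment. The JSON-per-line output format and the function names are illustrative choices.

```python
import json
import sys


def boto3_select(expression):
    """Yield every item matching a SimpleDB select expression.

    Assumes boto3 is available and configured (an assumption of this
    sketch). Follows NextToken so large result sets are fully read.
    """
    import boto3  # imported lazily so the rest of the module has no dependency

    client = boto3.client("sdb")
    kwargs = {"SelectExpression": expression}
    while True:
        response = client.select(**kwargs)
        for item in response.get("Items", []):
            yield item
        token = response.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token


def flatten(item):
    """Turn a SimpleDB item into a plain dict; attribute values become lists."""
    attributes = {}
    for attr in item.get("Attributes", []):
        attributes.setdefault(attr["Name"], []).append(attr["Value"])
    return {"name": item["Name"], "attributes": attributes}


def run_mapper(lines, out, select=boto3_select):
    """Read select statements, execute each, and write one JSON object per item."""
    for line in lines:
        expression = line.strip()
        if not expression:
            continue
        for item in select(expression):
            out.write(json.dumps(flatten(item), sort_keys=True) + "\n")


def main():
    # Hadoop streaming feeds input lines on stdin and collects stdout.
    run_mapper(sys.stdin, sys.stdout)
```

The mapper script would end with a call to `main()`. Passing `select` as a parameter keeps the SimpleDB call replaceable, so the transformation can be tried locally with a stub before running it on a cluster.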

Hope this helps.

(The hacky way to do it would be to have a single script that you run locally that does the same as above but loads the results into S3. I run a script like that nightly for a lot of our database data.)
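
The local variant could be sketched as below. It is not the author's actual script: the function name `export_to_s3`, the key layout, and the JSON-lines format are assumptions. The `select` and `put_object` callables are passed in. boto3's `client("s3").put_object` accepts the `Bucket`, `Key` and `Body` keyword arguments used here.

```python
import json


def export_to_s3(statements, select, put_object, bucket, prefix):
    """Run each select statement and store its results as one S3 object.

    `select` returns an iterable of JSON-serializable items for a statement;
    `put_object` takes Bucket, Key and Body keyword arguments.
    Each statement's results are held in memory before upload, so the
    ranges should be kept reasonably small. Returns the keys written.
    """
    keys = []
    for index, statement in enumerate(statements):
        body = "".join(
            json.dumps(item, sort_keys=True) + "\n" for item in select(statement)
        )
        key = "{0}/part-{1:05d}.json".format(prefix, index)
        put_object(Bucket=bucket, Key=key, Body=body.encode("utf-8"))
        keys.append(key)
    return keys
```

Once the files are in S3, a Hive table on Elastic MapReduce can be pointed at that location.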
