One Reducer per HBase table
Basically, I need to route data to the right Reducer. Each Reducer is going to be a TableReducer.
I have the following file:
vendor1, user1, xxxx=n
vendor1, user2, xxxx=n
vendor2, user1, xxxx=n
vendor2, user2, xxxx=n
I need to insert it into the following HBase tables:
Table vendor1:
[user1] => {data:xxxx = n}
[user2] => {data:xxxx = n}
Table vendor2:
[user1] => {data:xxxx = n}
[user2] => {data:xxxx = n}
Format is [ROW_ID] => {[FAMILY]:[COLUMN] = [VALUE]}
- Each vendor has a different HBase table.
- Rows need to go to different HBase tables based on a value in the line.
Is there a way to do that? With Cascading? Or is there another workaround?
Thanks,
Federico
Comments (1)
I found a way: let the reducer handle the tables itself.
Instead of using a TableReducer, just use a plain Reducer.
In setup(), load the tables (keep them as properties of the reducer), set auto flush to false, and set a write buffer size.
In cleanup(), call flushCommits() on all the tables.
The Reducer output should be NullWritable for both key and value (unless you do want to output something). In reduce(), just call table1.put, table2.put, etc.
The TableReducer implementation does something like this under the hood for a single table.
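To make the steps above concrete, here is a minimal sketch of the lifecycle in plain Java. It is not real HBase code: VendorRecord, BufferedTable and MultiTableWriter are hypothetical names invented for illustration, and BufferedTable is only a stand-in for an HBase table handle with auto flush turned off. It assumes each input line looks like "vendor1, user1, xxxx=n" and that the column family is always "data", as in the question.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** One parsed input line such as "vendor1, user1, xxxx=n" (hypothetical helper). */
final class VendorRecord {
    final String table;      // vendor name, used as the target table name
    final String rowId;      // user, used as the row key
    final String qualifier;  // column name inside the "data" family
    final String value;

    private VendorRecord(String table, String rowId, String qualifier, String value) {
        this.table = table;
        this.rowId = rowId;
        this.qualifier = qualifier;
        this.value = value;
    }

    static VendorRecord parse(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        String[] kv = parts[2].trim().split("=", 2);
        if (kv.length != 2) {
            throw new IllegalArgumentException("Expected column=value: " + line);
        }
        return new VendorRecord(parts[0].trim(), parts[1].trim(), kv[0].trim(), kv[1].trim());
    }
}

/** Stand-in for a table handle with auto flush off: puts are buffered until flushed. */
final class BufferedTable {
    final List<String> buffer = new ArrayList<>();     // writes not yet sent
    final List<String> committed = new ArrayList<>();  // writes that reached the "table"

    void put(VendorRecord r) {
        buffer.add("[" + r.rowId + "] => {data:" + r.qualifier + " = " + r.value + "}");
    }

    void flushCommits() {
        committed.addAll(buffer);
        buffer.clear();
    }
}

/** Models the reducer: open tables in setup, route puts in reduce, flush in cleanup. */
final class MultiTableWriter {
    private final Map<String, BufferedTable> tables = new HashMap<>();

    void setup(String... tableNames) {
        for (String name : tableNames) {
            tables.put(name, new BufferedTable());
        }
    }

    void reduce(String line) {
        VendorRecord r = VendorRecord.parse(line);
        BufferedTable t = tables.get(r.table);
        if (t == null) {
            throw new IllegalStateException("No table configured for " + r.table);
        }
        t.put(r);  // the vendor field decides which table receives the row
    }

    void cleanup() {
        for (BufferedTable t : tables.values()) {
            t.flushCommits();
        }
    }

    BufferedTable table(String name) {
        return tables.get(name);
    }
}
```

In an actual job the three methods would be the setup, reduce and cleanup overrides of a Reducer whose output key and value are NullWritable, and each BufferedTable would be replaced by a real table handle. Nothing is written until cleanup runs, which is why the flush there matters.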