Using HBase for analytics

Posted on 2024-12-16 14:15:20

I'm almost completely new to HBase. I would like to move my current MySQL-based site tracking to HBase because MySQL simply doesn't scale anymore.

I'm totally lost at the first step...

I need to track different actions of users and need to be able to aggregate them by some aspects (date, country they come from, product they performed the action with, etc...)

The way I store it currently is that I have a table with a composite PK made up of all these aspects (country, date, product, ...), and the rest of the fields are counters for actions. When an action is performed, I insert a row into the table, incrementing the action's column by one if the key already exists (ON DUPLICATE KEY UPDATE...).

*date      | *country | *product | visited | liked | put_to_basket | purchased
2011-11-11 | US       | 123      | 2       | 1     | 0             | 0
2011-11-11 | GB       | 123      | 23      | 10    | 5             | 4
2011-11-12 | GB       | 555      | 54      | 0     | 10            | 2

I have a feeling that this is completely against the HBase way, that it doesn't really scale (inserts get more expensive as the number of keys grows), and that it isn't very flexible.

How can I track user actions and their attributes effectively in HBase? What should the table(s) look like? Where does MapReduce come into the picture?

Thanks for all suggestions!

Comments (2)

慕巷 2024-12-23 14:15:20

Lars George's "HBase: The Definitive Guide" explains a design very similar to what you want to achieve, in its introduction chapter.

月寒剑心 2024-12-23 14:15:20

This can be done as follows.

Use a unique row ID in HBase, built like this:

rowid = date + country + product ---> concatenate these into a single value and use it as the row key.
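As a rough sketch of that concatenation (not taken from the answer itself): HBase stores rows sorted by the raw bytes of the key, so fixed-width fields keep related rows next to each other. The `|` separator, the 10-digit zero padding, and the name `make_row_key` are all assumptions made for illustration.

```python
def make_row_key(date: str, country: str, product: int) -> bytes:
    """Join the three dimensions into one byte string usable as a row key."""
    # Assumed layout: ISO date (10 chars), two-letter country code,
    # product id zero-padded to 10 digits, separated by "|".
    return f"{date}|{country}|{product:010d}".encode("utf-8")


# Sorting the byte strings mimics the order in which rows would be stored.
keys = sorted(
    make_row_key(d, c, p)
    for d, c, p in [
        ("2011-11-12", "GB", 555),
        ("2011-11-11", "US", 123),
        ("2011-11-11", "GB", 123),
    ]
)

for key in keys:
    print(key.decode("utf-8"))
```

With the date as the leading field, all rows for one day sit together, which suits per-day reads. Putting a different field first would favour a different access pattern.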

Then keep the counters as columns. So when you receive an event, for example:

if (event == liked) {
    increment the "liked" column by 1 in the row whose key matches this date/country/product combination
}

and so on for other cases.
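The per-event logic above can be sketched in a few lines. This is only an in-memory model (a dict standing in for the table), not the HBase client API. The names `table`, `ACTIONS`, and `record_event`, and the `|` key separator, are hypothetical. Against a real cluster the increment would be an atomic server-side counter update, for example `Table.incrementColumnValue` in the HBase Java client.

```python
from collections import defaultdict

# In-memory stand-in for an HBase table: row key -> {column -> counter}.
# Hypothetical model for illustration only; it does not talk to HBase.
table = defaultdict(lambda: defaultdict(int))

# Counter columns taken from the question's MySQL table.
ACTIONS = {"visited", "liked", "put_to_basket", "purchased"}


def record_event(date, country, product, action, amount=1):
    """Increment the counter column for `action` on the matching row."""
    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action}")
    row_key = f"{date}|{country}|{product}"  # assumed key layout
    table[row_key][action] += amount
    return table[row_key][action]


record_event("2011-11-11", "US", 123, "visited")
record_event("2011-11-11", "US", 123, "visited")
record_event("2011-11-11", "US", 123, "liked")
```

One branch per event type is not needed: the action name selects the column directly, so a new action only means a new column name.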

Hope this helps!!
