Data Visualization and Databases

Published 2024-10-03 04:37:33

Greetings,

I have been looking through the questions on this site and I haven't found any related questions.

I have currently built a Flex/PHP/MySQL app where I take an extract from my Hadoop cluster and dump to a MySQL table. There are several problems with this as my data set continues to grow.

I am looking for a much more robust open-source solution, and therefore have started to examine HBase and how to leverage PHP or Java to extract my data to a visualization app.

Have any of you built any visualization platforms on top of Hadoop or HBase?

Thank you!

Comments (2)

弄潮 2024-10-10 04:37:33

I am not entirely sure whether you are referring to fetching information from HBase or not. I am assuming that you want to build an aggregation application that performs data-mining-like operations such as 'sum', 'count', and 'avg' on data stored in HBase to generate graphs/visualizations.

In that case the specific answer would depend on the nature of the data you are trying to analyze. One such application is OpenTSDB (http://opentsdb.net) from StumbleUpon.

It's pretty easy to write data summarizers on HBase, as this can be achieved through MapReduce:
http://hbase.apache.org/docs/r0.89.20100726/apidocs/org/apache/hadoop/hbase/mapred/package-summary.html
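The summarizer logic itself is simple. Here is a minimal sketch in plain Java of the map-then-reduce shape such a job takes (no Hadoop dependencies; the class name, sample keys, and the in-memory "shuffle" are illustrative stand-ins for what a real TableMapper/Reducer pair would do against HBase rows):

```java
import java.util.*;

// Sketch of the map/reduce shape of an HBase summarizer job:
// the "map" phase emits (key, value) pairs scanned from rows and
// groups them as the shuffle would; the "reduce" phase folds each
// key's values into sum/count/avg.
public class SummarizerSketch {

    // "Map" + shuffle: group emitted values by key.
    // Each input row is a hypothetical [key, value] pair.
    public static Map<String, List<Double>> map(List<String[]> rows) {
        Map<String, List<Double>> grouped = new HashMap<>();
        for (String[] row : rows) {
            grouped.computeIfAbsent(row[0], k -> new ArrayList<>())
                   .add(Double.parseDouble(row[1]));
        }
        return grouped;
    }

    // What a reducer would compute for one key: {sum, count, avg}.
    public static double[] reduce(List<Double> values) {
        double sum = 0;
        for (double v : values) sum += v;
        double count = values.size();
        double avg = count == 0 ? 0 : sum / count;
        return new double[] { sum, count, avg };
    }
}
```

In a real job the map phase would be a `TableMapper` fed by a `Scan`, but the aggregation arithmetic is exactly this.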

In our organization we use Solr to perform queries and aggregation functions for financial reports, and we then store the results in a CMS for rendering. This allows us to customize rendering for the same dataset. If you are interested in storing it into a CMS on HBase+Solr, the following will be interesting -

And if you are looking to access your data just as you would a persistent store and are interested in an ORM, you may find the following relevant; otherwise please ignore it. The following is copied from another answer of mine, Java ORM for HBase.

The strength of HBase as I see it is in keeping dynamic columns within static column families. From my experience developing applications with HBase, I find that determining cell qualifiers and values is not as easy as in SQL.

For example, a book has many authors. Depending on your access patterns, author edits, and app-layer cache implementation, you might choose to save the whole author in the book table (that is, the author resides in 2 tables, the author table and the book table) or just the author id. Furthermore, the collection of authors can be saved into one cell as XML/JSON, or into individual cells for individual authors.
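The two cell layouts can be sketched with a plain `Map` standing in for an HBase row's qualifier-to-value cells (the `authors:` family name, the comma-joined serialization, and the method names are all hypothetical; a real app might use JSON and `byte[]` values):

```java
import java.util.*;

// Two hypothetical layouts for a book row's "authors" column family,
// with a Map standing in for the row's qualifier -> value cells.
public class AuthorLayouts {

    // Layout A: the whole author collection serialized into one cell.
    public static Map<String, String> oneCell(List<String> authorIds) {
        Map<String, String> cells = new TreeMap<>();
        cells.put("authors:all", String.join(",", authorIds)); // could be JSON/XML
        return cells;
    }

    // Layout B: one cell per author, using a dynamic qualifier per id.
    public static Map<String, String> cellPerAuthor(List<String> authorIds) {
        Map<String, String> cells = new TreeMap<>();
        for (String id : authorIds) {
            cells.put("authors:" + id, id); // value could be a cached author name
        }
        return cells;
    }
}
```

Layout A reads the whole collection in one get but rewrites it on every edit; layout B lets you add or delete a single author cell independently, which is exactly the dynamic-columns-in-a-static-family trade-off described above.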

With this understanding I concluded that writing a full-blown ORM such as Hibernate would not only be very difficult but might not actually be conclusive. So I took a different approach, much as iBatis is to Hibernate.

Let me try to explain how it works. For this I will use source code from here and here.

  1. The first and foremost task is to implement an ObjectRowConverter interface, in this case SessionDataObjectConverter. The abstract class encapsulates basic best practices as discussed and learnt from the HBase community. The extension basically gives you 100% control over how to convert your object to an HBase row and vice versa. The only restriction from the API is that your domain objects must implement the PersistentDTO interface, which is used internally to create Put and Delete objects and to convert byte[] to id objects and vice versa.
  2. The next task is to wire the dependencies as done in HBaseImplModule. Please let me know if you are interested and I will go through the dependency injection.
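The contract described in step 1 might look roughly like the following sketch (the interface and class names follow the answer, but the simplified signatures, the `Map`-as-row stand-in for Put/Delete, and the `SessionData` fields are my guesses, not the real API):

```java
import java.util.*;

// Simplified sketch of the converter/DTO contract described above.
// Real Put/Delete/byte[] types are replaced with String ids and a
// Map standing in for the row's qualifier -> value cells.
interface PersistentDTO {
    String getId(); // used internally to build row keys (byte[] in the real API)
}

interface ObjectRowConverter<T extends PersistentDTO> {
    Map<String, String> toRow(T object);            // object -> HBase row cells
    T fromRow(String id, Map<String, String> row);  // row cells -> object
}

// A hypothetical domain object and its converter.
class SessionData implements PersistentDTO {
    final String id;
    final String user;
    SessionData(String id, String user) { this.id = id; this.user = user; }
    public String getId() { return id; }
}

class SessionDataObjectConverter implements ObjectRowConverter<SessionData> {
    public Map<String, String> toRow(SessionData s) {
        Map<String, String> row = new HashMap<>();
        row.put("data:user", s.user); // one cell per field, qualifiers are ours to choose
        return row;
    }
    public SessionData fromRow(String id, Map<String, String> row) {
        return new SessionData(id, row.get("data:user"));
    }
}
```

The point of the design is that the converter, not the framework, decides the qualifier layout, which is why the API can stay thin while still supporting the dynamic-column schemas discussed earlier.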

And that's it. How they are used is shown here. It basically uses CommonReadDao and CommonWriteDao to read and write data to and from HBase. The common read DAO implements multithreaded row-to-object conversion on queries, multithreaded get-by-ids and get-by-id, and has a Hibernate-Criteria-like API for querying HBase via Scan (no aggregation functions available). The common write DAO implements common write-related code with some added facilities, such as optimistic/pessimistic locking, cell override/merge, and checking entity (non-)existence on save, update, delete, etc.

This ORM has been developed for our internal purposes, and I have been up to my neck in work and hence have not yet been able to write documentation. But if you are interested, let me know and I will make time for documentation a priority.

坠似风落 2024-10-10 04:37:33

Check out metatron discovery: https://github.com/metatron-app/metatron-discovery. They use Druid and Hive for their OLAP engine and data store. It's open source, so you can check their code. It might be helpful.
