Fast querying with HBase

Posted on 2024-09-25 04:53:05


I am running a little test/poc here.

I need to load a few million rows into a database every day. It's not log-file data; I have comma-delimited rows (of columns) that would fit a relational database exactly.

After loading, I need to provide a very fast search mechanism. Having looked a bit at Google's Bigtable implementation and the structure around it, I originally thought of using Hive integrated with HBase, Hive because of its querying capabilities. The loading works out fine, better than RDBMS performance. However, the querying bottleneck, which was the reason for looking for alternatives to an RDBMS in the first place, persists with Hive too.

Testing Hive for querying does not show really blazing performance. Perhaps I need to look for alternatives...

Is there something else? Any other tool/solution/library that I can put on top of HBase? Or even without HBase? (I looked at HBase as an alternative to the RDBMS, moving towards distributed computing.)

Suggestions please...
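For reference, a rough sketch of the load side described above, assuming the plain HBase Java client (BufferedMutator) rather than the Hive HBase storage handler; the table name, column family, and row-key layout are placeholders, not part of the original setup:

```java
import java.io.BufferedReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class CsvLoad {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        byte[] cf = Bytes.toBytes("d"); // assumed column family

        try (Connection conn = ConnectionFactory.createConnection(conf);
             BufferedMutator mutator = conn.getBufferedMutator(TableName.valueOf("rows")); // assumed table
             BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                String[] cols = line.split(",", -1);
                // Row key chosen for the reads you need later, e.g. customer + date (placeholder layout).
                Put put = new Put(Bytes.toBytes(cols[0] + "#" + cols[1]));
                for (int i = 2; i < cols.length; i++) {
                    put.addColumn(cf, Bytes.toBytes("c" + i), Bytes.toBytes(cols[i]));
                }
                mutator.mutate(put); // buffered client-side, flushed in batches
            }
            mutator.flush();
        }
    }
}
```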


Comments (4)

奢望 2024-10-02 04:53:05


If you want general search capabilities, you may want to look at solutions like Solr or ElasticSearch instead. HBase works well if you prepare the data for the queries you need (key design), not for general search. You can also look at Lily, which combines Solr and HBase.
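A minimal SolrJ sketch of that kind of general search, assuming a SolrJ 8.x-style client and a local core named `rows`; the URL and field names (`customer_s`, `amount_d`) are placeholders, not a known schema:

```java
import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class SolrSearchSketch {
    public static void main(String[] args) throws Exception {
        // Assumed Solr URL and core name.
        try (SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/rows").build()) {
            // Index one row as a document (field names are placeholders).
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "row-0001");
            doc.addField("customer_s", "acme");
            doc.addField("amount_d", 42.5);
            solr.add(doc);
            solr.commit();

            // Query it back with an arbitrary field search.
            SolrQuery q = new SolrQuery("customer_s:acme");
            q.setRows(10);
            QueryResponse rsp = solr.query(q);
            for (SolrDocument d : rsp.getResults()) {
                System.out.println(d.getFieldValue("id") + " -> " + d.getFieldValue("amount_d"));
            }
        }
    }
}
```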

剪不断理还乱 2024-10-02 04:53:05


The problem you have is that Hive runs most of its queries as MapReduce programs, which are inherently slow.

If you write your own program to run appropriate scans and then do the grouping yourself, HBase can be very fast. If you want a query language, though, there are currently no solutions I am aware of.

It's hard to say more than that, as your description of the data and the kind of queries you want to run on it is very generic.
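A rough sketch of that approach, assuming the HBase 2.x Java client; the table, column family, qualifier, and key range are placeholders:

```java
import java.util.HashMap;
import java.util.Map;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanAndGroup {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        byte[] cf = Bytes.toBytes("d");          // assumed column family
        byte[] col = Bytes.toBytes("customer");  // assumed qualifier

        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("rows"))) { // assumed table
            Scan scan = new Scan();
            scan.withStartRow(Bytes.toBytes("2010-09-01")); // key range instead of a full table scan
            scan.withStopRow(Bytes.toBytes("2010-09-02"));
            scan.addColumn(cf, col);
            scan.setCaching(1000);                          // fetch rows from the server in batches

            // "Group it yourself": client-side count per customer.
            Map<String, Long> counts = new HashMap<>();
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result r : scanner) {
                    String customer = Bytes.toString(r.getValue(cf, col));
                    counts.merge(customer, 1L, Long::sum);
                }
            }
            counts.forEach((k, v) -> System.out.println(k + " = " + v));
        }
    }
}
```

The start/stop rows are where the key design pays off: the scan only touches the slice of the table you need instead of everything.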

原野 2024-10-02 04:53:05


It isn't unthinkable to use MySQL for this number of rows. You might try it with some test data and see if you can get away with it.

醉南桥 2024-10-02 04:53:05


Have you looked at a Solr or Lucene type of solution? It is not an SQL solution, but the query language is pretty flexible for some types of use, and it is extremely fast. There are also ways of distributing it over a cluster of servers for improved performance, scaling either the size of the index, the number of queries it can handle, or both.
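A small sketch of indexing and querying with Lucene in-process, assuming Lucene 8.x-style APIs; the index path and field names are placeholders (Solr and ElasticSearch put the same engine behind a server and a richer query API):

```java
import java.nio.file.Paths;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

public class LuceneSketch {
    public static void main(String[] args) throws Exception {
        try (Directory dir = FSDirectory.open(Paths.get("/tmp/row-index"))) { // placeholder path
            // Index one comma-delimited row as a document (field names are placeholders).
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(new StandardAnalyzer()))) {
                Document doc = new Document();
                doc.add(new StringField("id", "row-0001", Field.Store.YES));
                doc.add(new TextField("customer", "acme", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Query it back by term.
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(new TermQuery(new Term("customer", "acme")), 10);
                for (ScoreDoc hit : hits.scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("id"));
                }
            }
        }
    }
}
```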
