Opinions on Solr and spatial data structures for very large datasets
OK, I'll try to keep this simple. The technical aspects of the problem are not the issue...
I'm looking for some opinions on the Solr indexer. I have approximately 25 million items that need to be stored in some kind of spatial structure. These are all GIS records with coordinates stored as lat/lon. One of my co-workers has been using Solr for this task, but I'd like to run some performance tests against several spatial tree types. We're currently using MySQL to store all of our data and Solr for indexing, and I'm investigating Postgres with PostGIS. I'm assuming Solr uses some kind of multi-dimensional B-Tree, but I could be wrong and am looking for confirmation on this.
I'm also going to be running tests on this dataset using the following structures:
- B-Tree
- B-Tree (with Morton Hash Index)
- B-Tree (with Hilbert Hash Index)
- BSP-Tree
- Quad-Tree
- KD-Tree
- R-Tree
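To make the "B-Tree (with Morton Hash Index)" option concrete, here is a minimal sketch of how a lat/lon pair could be turned into a single 64-bit Z-order key that an ordinary B-Tree can index. The function names (`spread_bits`, `quantize`, `morton_key`) and the choice of 32 bits per axis are my own assumptions for illustration, not anything Solr, MySQL, or PostGIS prescribes.

```cpp
#include <cassert>
#include <cstdint>

// Spread the 32 bits of v so they land on the even bit positions
// (0, 2, 4, ...) of a 64-bit word, leaving the odd positions zero.
std::uint64_t spread_bits(std::uint32_t v) {
    std::uint64_t x = v;
    x = (x | (x << 16)) & 0x0000FFFF0000FFFFULL;
    x = (x | (x << 8))  & 0x00FF00FF00FF00FFULL;
    x = (x | (x << 4))  & 0x0F0F0F0F0F0F0F0FULL;
    x = (x | (x << 2))  & 0x3333333333333333ULL;
    x = (x | (x << 1))  & 0x5555555555555555ULL;
    return x;
}

// Map a coordinate in [lo, hi] onto the full unsigned 32-bit range.
// Hypothetical helper: the 32-bit resolution is an assumption.
std::uint32_t quantize(double value, double lo, double hi) {
    assert(value >= lo && value <= hi);
    const double scaled = (value - lo) / (hi - lo) * 4294967295.0;
    return static_cast<std::uint32_t>(scaled);  // truncates toward zero
}

// Interleave quantized longitude (even bits) and latitude (odd bits)
// into one Morton (Z-order) key.
std::uint64_t morton_key(double lat, double lon) {
    const std::uint32_t x = quantize(lon, -180.0, 180.0);
    const std::uint32_t y = quantize(lat, -90.0, 90.0);
    return spread_bits(x) | (spread_bits(y) << 1);
}
```

The key can be stored in an unsigned 64-bit column with a regular B-Tree index. Keep in mind that a bounding-box query generally maps to several key ranges along the curve rather than one contiguous range, so the test should measure the range decomposition step as well as the index lookups.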
I'm going to write either an Apache module or a MySQL/PostgreSQL extension in C++ for each of these tests, and follow up with a full implementation once we make our decision.
I'm just looking for opinions/suggestions on said implementation.
Thanks