MongoDB 对于层次类型查询的适用性

发布于 2024-10-30 23:54:57 字数 748 浏览 4 评论 0原文

我有一个特定的数据操作要求,我已经研究出了如何在 SQL Server 和 PostgreSQL 中实现。然而,我对速度不太满意,所以我正在研究 MongoDB。

描述查询的最佳方式如下。想象一下美国的分层数据:国家、州、县、城市。假设某个特定供应商可以为整个加利福尼亚州提供服务。另一个也许只能为洛杉矶提供服务。可能有数十万个供应商,他们都可以从该层次结构中的某个点提供服务。我不会将其与 Geo 混淆 - 我用它来说明需求。

使用递归查询,可以非常简单地获取可以为特定用户提供服务的所有供应商的列表。如果他位于加利福尼亚州洛杉矶的帕萨迪纳,我们将沿着层次结构向上查找适用的 ID,然后向下查询以查找供应商。

我知道这可以优化。再次强调,这只是一个简单的查询示例。

我知道 MongoDB 是一个文档存储。这非常适合我的其他需求。问题是它有多适合我描述的查询类型? (我知道它没有连接 - 这些是模拟的)。

我知道这是一个“一根绳子有多长”的问题。我只是想知道是否有人有 MongoDB 做此类事情的经验。从 0 到测试可能需要相当长的时间,如果 MongoDB 不适合这样做,我希望节省时间。

示例

当地一家电影商店“A”可以在斯普林菲尔德提供蓝光光盘。一家在全州范围内分销的连锁店“B”可以向整个伊利诺伊州提供蓝光光盘。按需下载商店“C”可以向全美国供应。

如果我们想要获取伊利诺伊州斯普林菲尔德的所有适用电影供应商,答案将是 [A、B、C]。

换句话说,有许多供应商附加在层次结构的不同级别上。

I have a particular data manipulation requirement that I have worked out how to do in SQL Server and PostgreSQL. However, I'm not too happy with the speed, so I am investigating MongoDB.

The best way to describe the query is as follows. Picture the hierarchical data of the USA: Country, State, County, City. Let's say a particular vendor can service the whole of California. Another can perhaps service only Los Angeles. There are potentially hundreds of thousands of vendors and they all can service from some point(s) in this hierarchy down. I am not confusing this with Geo - I am using this to illustrate the need.

Using recursive queries, it is quite simple to get a list of all vendors who could service a particular user. If he were in say Pasadena, Los Angeles, California, we would walk up the hierarchy to get the applicable IDs, then query back down to find the vendors.

I know this can be optimized. Again, this is just a simple query example.

I know MongoDB is a document store. That suits other needs I have very well. The question is how well suited is it to the query type I describe? (I know it doesn't have joins - those are simulated).

I get that this is a "how long is a piece of string" question. I just want to know if anyone has any experience with MongoDB doing this sort of thing. It could take me quite some time to go from 0 to tested, and I'm looking to save time if MongoDB is not suited to this.

EXAMPLE

A local movie store "A" can supply Blu-Rays in Springfield. A chain store "B" with state-wide distribution can supply Blu-Rays to all of IL. And a download-on-demand store "C" can supply to all of the US.

If we wanted to get all applicable movie suppliers for Springfield, IL, the answer would be [A, B, C].

In other words, there are numerous vendors attached at differing levels on the hierarchy.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

故事灯 2024-11-06 23:54:57

我意识到这个问题是大约一年前提出的,但从那时起 MongoDB 就有了针对这个问题的官方支持的解决方案,我只是使用了他们的解决方案。请参阅此处的文档: https://docs .mongodb.com/manual/tutorial/model-tree-structs-with-materialized-paths/

与您的问题最接近的概念称为“部分路径”。

虽然嵌入祖先数据可能会感觉有点繁重;这种方法是解决 MongoDB 问题的最合适方法。到目前为止,我遇到的唯一陷阱是,如果您将所有这些内容存储在一个文档中,那么在处理足够的数据时,截至目前,您可能会达到 16MB 文档大小限制(尽管,只有当您使用此结构来跟踪用户推荐(可能达到数百万)而不是美国城市(根据最新的美国人口普查,超过 26,000 人)时,我才能看到这种情况发生。


参考文献:

http://www.mongodb.org/display /DOCS/Schema+Design

http://www.census。 gov/geo/www/gazetteer/places2k.html


修改:

替换链接:http://www.mongodb.org/display/DOCS/Trees+in+MongoDB

I realize this question was asked nearly a year ago, but since then MongoDB has an officially supported solution for this problem, and I just used their solution. Refer to their documentation here: https://docs.mongodb.com/manual/tutorial/model-tree-structures-with-materialized-paths/

The concept relating closest to your question is named "partial path."

While it may feel a bit heavy to embed ancestor data; this approach is the most suitable way to solve your problem in MongoDB. The only pitfall to this, that I've experienced so far, is that if you're storing all of this in a single document you can hit the, as of this time, 16MB document size limit when working with enough data (although, I can only see this happening if you're using this structure to track user referrals [which could reach millions] rather than US cities [which is upwards of 26,000 according to the latest US Census]).


References:

http://www.mongodb.org/display/DOCS/Schema+Design

http://www.census.gov/geo/www/gazetteer/places2k.html


Modifications:

Replaced link: http://www.mongodb.org/display/DOCS/Trees+in+MongoDB

苦妄 2024-11-06 23:54:57

请注意,这个问题也在 google group 上被问到。请参阅 http://groups.google.com/group/mongodb-user/ browser_thread/thread/5cd5edd549813148 进行讨论。

一种选择是使用数组键。您可以将层次结构存储为
值数组(例如 ['US','CA','Los Angeles'])。那么你可以
根据该数组键中的各个元素查询记录
例如:
首先,存储一些文档,其中的数组值代表
确保

> db.hierarchical.save({ location: ['US','CA','LA'], name: 'foo'} ) 
> db.hierarchical.save({ location: ['US','CA','SF'], name: 'bar'} ) 
> db.hierarchical.save({ location: ['US','MA','BOS'], name: 'baz'} ) 

我们在位置字段上有一个索引,以便我们可以执行
针对其值的快速查询

> db.hierarchical.ensureIndex({'location':1}) 

查找加利福尼亚州的所有记录

> db.hierarchical.find({location: 'CA'}) 
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" } 
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" } 

查找马萨诸塞州的所有记录

> db.hierarchical.find({location: 'MA'}) 
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" } 

查找美国的所有记录

> db.hierarchical.find({location: 'US'}) 
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" } 
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" } 
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" } 

请注意,在此模型中,数组中的值需要是
独特的。举例来说,如果您在不同的州有“斯普林菲尔德”,
那么你需要做一些额外的工作来区分。

> db.hierarchical.save({location:['US','MA','Springfield'], name: 'one' }) 
> db.hierarchical.save({location:['US','IL','Springfield'], name: 'two' }) 
> db.hierarchical.find({location: 'Springfield'}) 
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" } 
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" } 

您可以通过使用 $all 运算符并指定更多来克服这个问题
层次结构的级别。例如:

> db.hierarchical.find({location: { $all : ['US','MA','Springfield']} }) 
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" } 
> db.hierarchical.find({location: { $all : ['US','IL','Springfield']} }) 
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" } 

Note that this question was also asked on the google group. See http://groups.google.com/group/mongodb-user/browse_thread/thread/5cd5edd549813148 for that disucssion.

One option is to use an array key. You can store the hierarchy as an
array of values (for example ['US','CA','Los Angeles']). Then you can
query against records based on individual elements in that array key
For example:
First, store some documents with the array value representing the
hierarchy

> db.hierarchical.save({ location: ['US','CA','LA'], name: 'foo'} ) 
> db.hierarchical.save({ location: ['US','CA','SF'], name: 'bar'} ) 
> db.hierarchical.save({ location: ['US','MA','BOS'], name: 'baz'} ) 

Make sure we have an index on the location field so we can perform
fast queries against its values

> db.hierarchical.ensureIndex({'location':1}) 

Find all records in California

> db.hierarchical.find({location: 'CA'}) 
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" } 
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" } 

Find all records in Massachusetts

> db.hierarchical.find({location: 'MA'}) 
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" } 

Find all records in the US

> db.hierarchical.find({location: 'US'}) 
{ "_id" : ObjectId("4d9f69cbf88aea89d1492c55"), "location" : [ "US", "CA", "LA" ], "name" : "foo" } 
{ "_id" : ObjectId("4d9f69dcf88aea89d1492c56"), "location" : [ "US", "CA", "SF" ], "name" : "bar" } 
{ "_id" : ObjectId("4d9f6a21f88aea89d1492c5a"), "location" : [ "US", "MA", "BOS" ], "name" : "baz" } 

Note that in this model, your values in the array would need to be
unique. So for example, if you had 'springfield' in different states,
then you would need to do some extra work to differentiate.

> db.hierarchical.save({location:['US','MA','Springfield'], name: 'one' }) 
> db.hierarchical.save({location:['US','IL','Springfield'], name: 'two' }) 
> db.hierarchical.find({location: 'Springfield'}) 
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" } 
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" } 

You can overcome this by using the $all operator and specifying more
levels of the hierarchy. For example:

> db.hierarchical.find({location: { $all : ['US','MA','Springfield']} }) 
{ "_id" : ObjectId("4d9f6b7cf88aea89d1492c5b"), "location" : [ "US", "MA", "Springfield"], "name" : "one" } 
> db.hierarchical.find({location: { $all : ['US','IL','Springfield']} }) 
{ "_id" : ObjectId("4d9f6b86f88aea89d1492c5c"), "location" : [ "US", "IL", "Springfield"], "name" : "two" } 
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文