Document databases: redundant data, references, etc. (MongoDB specifically)

Posted on 2024-09-27 20:48:29

It seems like I run into lots of situations where the appropriate way to build my data is to split it into two documents. Let's say it was for a chain of stores and you were saving which stores each customer had visited. Stores and Customers need to be independent pieces of data because they interact with plenty of other things, but we do need to relate them.

So the easy answer is to store the user's Id in the store document, or the store's Id in the user's document. Oftentimes, though, you want to access one or two other pieces of data for display purposes, such as the customer name or the store name, because Ids on their own aren't useful to show.

  1. Do you typically store a duplicate of the entire document, or just the pieces of data you need? Maybe it depends on the size of the doc versus how much of it you need.
  2. How do you handle the fact that you have duplicate data? Do you go hunt down data when it changes? Update the data at some interval when it's loaded? Only duplicate when you can afford stale data?

Would appreciate your input and/or links to any kind of 'best practices' or at least well-reasoned discussion of these topics.

Comments (3)

野侃 2024-10-04 20:48:29

There are basically two scenarios: fresh and stale.

Fresh data

Storing duplicate data is easy. Maintaining the duplicate data is the hard part. So the easiest thing to do is to avoid maintenance, by simply not storing any duplicate data to begin with. This is mainly useful if you need fresh data. Only store the references, and query the collections when you need to retrieve information.
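
To make the "only store the references" option concrete, here is a minimal sketch. It assumes plain in-memory Maps standing in for the two collections, so nothing here is MongoDB driver code; the field name `visitedStoreIds` and the sample documents are made up for illustration.

```javascript
// In-memory stand-ins for two collections (hypothetical sample data).
const stores = new Map([
  ["s1", { _id: "s1", name: "Safeway" }],
  ["s2", { _id: "s2", name: "Walmart" }],
]);
const customers = new Map([
  ["c1", { _id: "c1", name: "Testy Tester", visitedStoreIds: ["s1", "s2"] }],
]);

// Resolve a customer's store references at read time.
// This second lookup is the "extra query" overhead.
function visitedStoreNames(customerId) {
  const customer = customers.get(customerId);
  if (!customer) return [];
  return customer.visitedStoreIds
    .map((id) => stores.get(id))
    .filter((store) => store !== undefined) // skip dangling references
    .map((store) => store.name);
}

// A rename touches exactly one document, and every reader sees it at once.
stores.get("s1").name = "Safeway Downtown";
const names = visitedStoreNames("c1");
console.log(names);
```

The trade is visible in the shape of the code: writes are trivial, and every read pays for the lookup.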

In this scenario, you'll have some overhead due to the extra queries. The alternative is to track all locations of duplicate data, and update all instances on each update. This also involves overhead, especially in N-to-M relations like the one you mentioned. So either way, you will have some overhead, if you require fresh data. You can't have the best of both worlds.
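
The alternative, tracking every copy and rewriting it on each update, might look roughly like the sketch below. Again this assumes in-memory arrays in place of real collections, and `renameStore` is a hypothetical helper, not a MongoDB API.

```javascript
// Hypothetical sample data: each customer embeds a copy of the store name.
const stores = [
  { _id: "s1", name: "Safeway" },
  { _id: "s2", name: "Walmart" },
];
const customers = [
  {
    _id: "c1",
    name: "Testy Tester",
    stores: [
      { _id: "s1", name: "Safeway" },
      { _id: "s2", name: "Walmart" },
    ],
  },
  { _id: "c2", name: "Second Shopper", stores: [{ _id: "s1", name: "Safeway" }] },
];

// Rename a store and rewrite every embedded copy of its name.
// Returns how many copies had to be touched.
function renameStore(storeId, newName) {
  const store = stores.find((s) => s._id === storeId);
  if (!store) return 0;
  store.name = newName;
  let touched = 0;
  for (const customer of customers) {
    for (const copy of customer.stores) {
      if (copy._id === storeId) {
        copy.name = newName;
        touched += 1;
      }
    }
  }
  return touched;
}

const touched = renameStore("s1", "Safeway Downtown");
console.log(touched);
```

Reads are now a single document fetch, but one rename fans out to every customer who visited that store, which is where the N-to-M cost shows up.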

Stale data

If you can afford to have stale data, things get a lot easier. Storing duplicate data avoids the query overhead, and you can avoid the maintenance burden by not keeping that duplicate data up to date yourself. At least not actively.

In this scenario you'll also want to store only the references between documents. Then use a periodic map-reduce job to generate the duplicate data. You can then query the single map-reduce result, rather than separate collections. This way you avoid the query overhead, but you also don't have to hunt down data changes.
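
As a rough illustration of the periodic approach, the sketch below rebuilds a denormalized read view from the source data. It is an assumption-heavy stand-in: a plain function takes the place of the map-reduce job, arrays take the place of collections, and `buildCustomerStoreView` and `visitedStoreIds` are invented names.

```javascript
// Source-of-truth data: documents hold references only (hypothetical sample data).
const stores = [
  { _id: "s1", name: "Safeway" },
  { _id: "s2", name: "Walmart" },
];
const customers = [
  { _id: "c1", name: "Testy Tester", visitedStoreIds: ["s1", "s2"] },
];

// Rebuild the whole read view from scratch. In a real setup this would be
// run on a schedule; nothing keeps the view current between runs.
function buildCustomerStoreView(customerDocs, storeDocs) {
  const storeNameById = new Map(storeDocs.map((s) => [s._id, s.name]));
  return customerDocs.map((c) => ({
    _id: c._id,
    name: c.name,
    storeNames: c.visitedStoreIds
      .filter((id) => storeNameById.has(id))
      .map((id) => storeNameById.get(id)),
  }));
}

let view = buildCustomerStoreView(customers, stores);

// A rename after the job ran is not reflected until the next run.
stores[0].name = "Safeway Downtown";
const staleName = view[0].storeNames[0];

view = buildCustomerStoreView(customers, stores);
const freshName = view[0].storeNames[0];
console.log(staleName, freshName);
```

Nobody has to hunt down copies when a store changes; the price is that readers of the view see the old name until the job runs again.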

Summary

Only store references to other documents. If you can afford stale data, use periodic map-reduce jobs to generate duplicate data. Avoid maintaining duplicate data; it's complex and error-prone.

ι不睡觉的鱼゛ 2024-10-04 20:48:29

The answer here really depends on how current you need your data to be.

@Niels has a good summary here, but I think it's fair to note that you can "cheat".

Let's say that you want to display the Stores used by a User. The obvious problem here is that you can't "embed" the Store inside the User because the Store is too important on its own. But what you can do is embed some Store data in the User.

Just use the stuff you want for display like "Store Name". So your User object would look like this:

{
  _id : ObjectId(),
  name : "Testy Tester",
  // Each entry keeps the Store's _id plus only the fields needed for display.
  stores : [
    { _id : ObjectId(), name : "Safeway" },
    { _id : ObjectId(), name : "Walmart" },
    { _id : ObjectId(), name : "Best Buy" }
  ]
}

This way you can display the typical "grid" view, but require a link to get more data about the store.
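
Here is one way the grid-plus-link idea could play out, as a sketch only. The Map stands in for the Stores collection, and the `/stores/<id>` link format, the `address` field, and both helper functions are assumptions made up for the example.

```javascript
// Full Store documents live in their own collection (hypothetical sample data).
const stores = new Map([
  ["s1", { _id: "s1", name: "Safeway", address: "1 Example St" }],
  ["s2", { _id: "s2", name: "Walmart", address: "2 Example Ave" }],
]);

// The User document embeds only the store _id and display name.
const user = {
  _id: "u1",
  name: "Testy Tester",
  stores: [
    { _id: "s1", name: "Safeway" },
    { _id: "s2", name: "Walmart" },
  ],
};

// Grid rows come from the User document alone: no Store lookup needed.
function gridRows(userDoc) {
  return userDoc.stores.map((s) => ({
    label: s.name,
    detailLink: `/stores/${s._id}`, // hypothetical URL scheme
  }));
}

// Following the link is the only point where the full Store is read.
function storeDetail(storeId) {
  return stores.get(storeId) ?? null;
}

const rows = gridRows(user);
console.log(rows);
console.log(storeDetail("s1"));
```

The embedded name can still go stale if a store is renamed, so this fits best when the displayed fields rarely change.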

述情 2024-10-04 20:48:29

To answer your direct questions:

  1. No duplicates.
  2. No duplicates.

;)

The only duplicates you should ever have are "simple" values like weights (which may happen to be the same, but aren't any more efficient in either time or space to store separately), and ids referencing another object (which are duplicate values, but much smaller and more manageable than the duplicate object data they replace).

Now, to answer your scenario: what you want is a Many-to-Many relationship. The usual solution here is to make a third "through" or "bridge" table/collection, probably called StoreUsers:

StoreUsers
----------
storeuser_id
store_id
user_id

You add a record to this for each link between stores and users, whether it's for a different store, a different user, or a bunch of users in one store. You can then look this up independently, for either the Store, or the User. MongoDB advocates this approach too; it's not RDBMS-specific.
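
A small sketch of how such a bridge collection could be queried from either side. It assumes an in-memory array in place of the StoreUsers collection; the field names follow the layout above, while the helper functions and sample ids are hypothetical.

```javascript
// One record per link between a store and a user (hypothetical sample data).
const storeUsers = [
  { storeuser_id: 1, store_id: "s1", user_id: "u1" },
  { storeuser_id: 2, store_id: "s2", user_id: "u1" },
  { storeuser_id: 3, store_id: "s1", user_id: "u2" },
];

// Look up the relationship from the user side.
function storesForUser(userId) {
  return storeUsers.filter((l) => l.user_id === userId).map((l) => l.store_id);
}

// Look up the same relationship from the store side.
function usersForStore(storeId) {
  return storeUsers.filter((l) => l.store_id === storeId).map((l) => l.user_id);
}

// Add a link unless it already exists. Returns true if a record was added.
function linkUserToStore(userId, storeId) {
  const exists = storeUsers.some(
    (l) => l.user_id === userId && l.store_id === storeId
  );
  if (exists) return false;
  storeUsers.push({
    storeuser_id: storeUsers.length + 1, // simplistic id, fine for a sketch
    store_id: storeId,
    user_id: userId,
  });
  return true;
}

console.log(storesForUser("u1"));
console.log(usersForStore("s1"));
```

Neither the Store nor the User document changes when a visit is recorded, but showing names still takes a second lookup by id.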
