MongoDB: multiple indexes vs. a single index on an array of sub-documents?

Posted 2024-12-11 06:13:09

I'm wondering which technique would be more efficient for indexing the various timestamps I need to track on each document. My application is fairly write-heavy, but it reads enough that queries are too slow without indexes.

Is it better to have one field per timestamp and index each field, or to store the timestamps and their associated types in an array field and index the fields of that array's elements?

First option: separate fields, with an index on each:

{
    "_id" : "...",
    "Field1.Timestamp" : '2011-01-01 01:00.000',
    "Field2.Timestamp" : '2011-01-01 01:00.000',
    "Field3.Timestamp" : '2011-01-01 01:00.000',
    "Field4.Timestamp" : '2011-01-01 01:00.000',
    "Field5.Timestamp" : '2011-01-01 01:00.000',
    "Field6.Timestamp" : '2011-01-01 01:00.000',
    "Field7.Timestamp" : '2011-01-01 01:00.000',
    "Field8.Timestamp" : '2011-01-01 01:00.000',
    "Field9.Timestamp" : '2011-01-01 01:00.000',
}

db.mycollection.ensureIndex({ "Field1.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field2.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field3.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field4.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field5.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field6.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field7.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field8.Timestamp" : 1 });
db.mycollection.ensureIndex({ "Field9.Timestamp" : 1 });

Second option: an array of the timestamps and their types, with only a single index:

{
    "_id" : "...",
    "Timestamps" : [
        { "Type" : "Field1", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field2", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field3", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field4", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field5", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field6", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field7", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field8", "Timestamp" : '2011-01-01  01:00.000' },
        { "Type" : "Field9", "Timestamp" : '2011-01-01  01:00.000' },
    ]
}

db.mycollection.ensureIndex({ "Timestamps.Type" : 1, "Timestamps.Timestamp" : 1 });

Am I way off the mark here? If not, which would be the better approach?

Comments (1)

沧桑㈠ 2024-12-18 06:13:09

This basically boils down to whether 10 indexes of size N are more efficient than one index of size N * 10. If you look purely at reads, the separate indexes should always be faster: the associated B-tree walks examine a smaller key set.
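
As a back-of-the-envelope check on the size argument, the sketch below estimates B-tree height for N versus N * 10 entries. The collection size and the fanout of 100 keys per node are assumptions chosen for illustration, not MongoDB internals, and `estimatedHeight` is a hypothetical helper.

```javascript
// Rough estimate of B-tree height: the number of levels needed to address
// `entries` keys when every node holds about `fanout` keys.
// This is a simplified model, not how MongoDB sizes its index nodes.
function estimatedHeight(entries, fanout) {
  let levels = 1;
  let capacity = fanout;
  while (capacity < entries) {
    capacity *= fanout;
    levels += 1;
  }
  return levels;
}

const docs = 1e7;            // hypothetical collection size (N)
const timestampsPerDoc = 10; // timestamps tracked per document
const fanout = 100;          // assumed keys per B-tree node

// One index per timestamp field: each index holds N entries.
const separateHeight = estimatedHeight(docs, fanout);
// One index over the array: it holds N * 10 entries.
const combinedHeight = estimatedHeight(docs * timestampsPerDoc, fanout);
const extraLevels = combinedHeight - separateHeight;

console.log({ separateHeight, combinedHeight, extraLevels });
```

Under this model, multiplying the entry count by a factor no larger than the fanout adds at most one level to the tree.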

There are a few points to consider, though:

  • Indexes on array fields basically index each array element separately. As such, the lookup overhead will be at most 1-2 additional steps during the B-tree walk, which is a negligible performance hit. In other words, they'll be almost as fast.
  • Having 10 indexes may mean that each update/insert requires more than one index to be updated (depending on whether your indexes share a field or whether you update more than one timestamp at a time). This is a significant performance consideration.
  • Using an array index makes it a bit easier to add additional timestamps (e.g. Timestamp10).
  • There is a limit to the number of namespaces you can use per database (24k), and each index takes up one. If you make a separate index per field, this might become an issue.
  • Most importantly, the array index is far more straightforward and will simplify your code, and thus its maintainability. Given the limited performance difference, I'd say this is the strongest motivation to go for an array index here.
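
To make the first two points concrete, here is a rough sketch in plain JavaScript of the index keys a single document contributes under each layout. It is a toy model, not MongoDB's implementation: the helper names `separateIndexKeys` and `arrayIndexKeys` and the three-field sample documents are made up for illustration.

```javascript
// Toy model only: lists the index keys one document would contribute.
// Helper names and sample data are hypothetical.

// Layout 1: one field per timestamp, one single-field index per field.
// Returns { indexName: [keys] }, one property per index the document touches.
function separateIndexKeys(doc, fields) {
  const keysByIndex = {};
  for (const field of fields) {
    if (field in doc) {
      keysByIndex[field] = [doc[field]];
    }
  }
  return keysByIndex;
}

// Layout 2: an array of { Type, Timestamp } under one compound index.
// Each array element yields its own (Type, Timestamp) key in that index.
function arrayIndexKeys(doc) {
  return doc.Timestamps.map((entry) => [entry.Type, entry.Timestamp]);
}

const fields = ["Field1.Timestamp", "Field2.Timestamp", "Field3.Timestamp"];

const flatDoc = {
  _id: 1,
  "Field1.Timestamp": "2011-01-01T01:00:00Z",
  "Field2.Timestamp": "2011-01-01T02:00:00Z",
  "Field3.Timestamp": "2011-01-01T03:00:00Z",
};

const arrayDoc = {
  _id: 1,
  Timestamps: [
    { Type: "Field1", Timestamp: "2011-01-01T01:00:00Z" },
    { Type: "Field2", Timestamp: "2011-01-01T02:00:00Z" },
    { Type: "Field3", Timestamp: "2011-01-01T03:00:00Z" },
  ],
};

const flatKeys = separateIndexKeys(flatDoc, fields);
const arrayKeys = arrayIndexKeys(arrayDoc);

console.log("indexes written (separate fields):", Object.keys(flatKeys).length);
console.log("indexes written (array layout):", 1);
console.log("keys added to the array index:", arrayKeys.length);
```

In this model an insert writes the same number of keys either way; the difference is whether they land in several B-trees or in one.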