OpenSearch / Elasticsearch index mapping

Posted 2025-01-12 15:14:29

I have a system that ingests multiple scores per event, and we use OpenSearch (previously Elasticsearch) to compute the averages.

For example, an input would be similar to:

// event 1
{
  id: "foo1",
  timestamp: "some-iso8601-timestamp",
  scores: [
    { name: "arbitrary-name-1", value: 80 },
    { name: "arbitrary-name-2", value: 55 },
    { name: "arbitrary-name-3", value: 30 },
  ]
}

// event 2
{
  id: "foo2",
  timestamp: "some-iso8601-timestamp",
  scores: [
    { name: "arbitrary-name-1", value: 90 },
    { name: "arbitrary-name-2", value: 65 },
    { name: "arbitrary-name-3", value: 40 },
  ]
}

The score names are arbitrary and may change from time to time.

Ultimately, we would like to query the data to get the average value of each score:

[
  { name: "arbitrary-name-1", value: 85 },
  { name: "arbitrary-name-2", value: 60 },
  { name: "arbitrary-name-3", value: 35 },
]
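To pin down what "average scores" means here, this is a small client-side sketch in Python of the same computation: every value is grouped by its score name across all events and averaged. `average_scores` is a hypothetical helper used only to check the expected result, not anything OpenSearch provides; timestamps are left out because the overall average does not use them.

```python
from collections import defaultdict


def average_scores(events):
    """Average each named score across all events (client-side reference only)."""
    totals = defaultdict(float)  # running sum of values per score name
    counts = defaultdict(int)    # number of values seen per score name
    for event in events:
        for score in event["scores"]:
            totals[score["name"]] += score["value"]
            counts[score["name"]] += 1
    # dicts preserve insertion order, so names come out in first-seen order
    return [{"name": name, "value": totals[name] / counts[name]} for name in totals]


events = [
    {"id": "foo1", "scores": [
        {"name": "arbitrary-name-1", "value": 80},
        {"name": "arbitrary-name-2", "value": 55},
        {"name": "arbitrary-name-3", "value": 30},
    ]},
    {"id": "foo2", "scores": [
        {"name": "arbitrary-name-1", "value": 90},
        {"name": "arbitrary-name-2", "value": 65},
        {"name": "arbitrary-name-3", "value": 40},
    ]},
]

print(average_scores(events))
# [{'name': 'arbitrary-name-1', 'value': 85.0}, {'name': 'arbitrary-name-2', 'value': 60.0}, {'name': 'arbitrary-name-3', 'value': 35.0}]
```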

However, the only way we have managed this so far is to insert multiple documents: one for each score name/value pair in each event. This seems wasteful. The current search groups the documents by score name and timestamp interval, then computes a weighted average of the scores in each bucket.
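The current workaround can be sketched roughly as follows: each event is fanned out into one document per score, and every such document repeats the event's id and timestamp, so an event with N scores costs N documents. The field names `event_id`, `name` and `value` and the helper `explode` are assumptions, since the question does not show the exact shape of the per-score documents.

```python
def explode(event):
    """One document per score name/value pair, as in the current workaround."""
    return [
        {
            "event_id": event["id"],          # repeated in every document
            "timestamp": event["timestamp"],  # repeated in every document
            "name": score["name"],
            "value": score["value"],
        }
        for score in event["scores"]
    ]


event = {
    "id": "foo1",
    "timestamp": "some-iso8601-timestamp",
    "scores": [
        {"name": "arbitrary-name-1", "value": 80},
        {"name": "arbitrary-name-2", "value": 55},
        {"name": "arbitrary-name-3", "value": 30},
    ],
}

docs = explode(event)
print(len(docs))  # 3
```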

Is there a way to index the data so that this query pattern works with only one document per event/record in OpenSearch (rather than one document per score per event/record)? What might that look like?

Thanks!

Comments (1)

⒈起吃苦の倖褔 2025-01-19 15:14:29

Is this what you were trying to do?
I got a bit confused. ^^

DELETE /71397606

PUT /71397606
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "scores": {
        "type": "nested",
        "properties": {
          "name": {
            "type": "keyword"
          },
          "value": {
            "type": "long"
          }
        }
      },
      "timestamp": {
        "type": "text"
      }
    }
  }
}

POST /_bulk
{"index":{"_index":"71397606"}}
{"id":"foo1","timestamp":"some-iso8601-timestamp","scores":[{"name":"arbitrary-name-1","value":80},{"name":"arbitrary-name-2","value":55},{"name":"arbitrary-name-3","value":30}]}
{"index":{"_index":"71397606"}}
{"id":"foo2","timestamp":"some-iso8601-timestamp","scores":[{"name":"arbitrary-name-1","value":90},{"name":"arbitrary-name-2","value":65},{"name":"arbitrary-name-3","value":40}]}
{"index":{"_index":"71397606"}}
{"id":"foo3","timestamp":"some-iso8601-timestamp","scores":[{"name":"arbitrary-name-1","value":85},{"name":"arbitrary-name-x","value":65},{"name":"arbitrary-name-y","value":40}]}

GET /71397606/_search
{
  "size": 0,
  "query": {
    "match_all": {}
  },
  "aggs": {
    "nested": {
      "nested": {
        "path": "scores"
      },
      "aggs": {
        "pername": {
          "terms": {
            "field": "scores.name",
            "size": 10
          },
          "aggs": {
            "avg": {
              "avg": {
                "field": "scores.value"
              }
            }
          }
        }
      }
    }
  }
}
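To turn the response of the search above into the `[{ name, value }]` list from the question, the terms buckets only need to be flattened. A minimal Python sketch, assuming the aggregation names used above (`nested`, `pername`, `avg`); `averages_from_response` is a hypothetical helper, and `sample` is a hand-written response in the shape OpenSearch returns for a nested → terms → avg aggregation, with averages worked out by hand from the three bulk documents rather than captured from a real cluster.

```python
def averages_from_response(response):
    """Flatten a nested -> terms -> avg aggregation result into [{name, value}]."""
    terms = response["aggregations"]["nested"]["pername"]
    # A non-zero sum_other_doc_count means some score names did not fit in the
    # terms "size" and are missing from the buckets.
    if terms.get("sum_other_doc_count", 0) > 0:
        raise ValueError("terms size too small: some score names were left out")
    return [{"name": b["key"], "value": b["avg"]["value"]} for b in terms["buckets"]]


# Hand-written response in the shape returned for the search above (not real output).
sample = {
    "aggregations": {
        "nested": {
            "doc_count": 9,
            "pername": {
                "doc_count_error_upper_bound": 0,
                "sum_other_doc_count": 0,
                "buckets": [
                    {"key": "arbitrary-name-1", "doc_count": 3, "avg": {"value": 85.0}},
                    {"key": "arbitrary-name-2", "doc_count": 2, "avg": {"value": 60.0}},
                    {"key": "arbitrary-name-3", "doc_count": 2, "avg": {"value": 35.0}},
                    {"key": "arbitrary-name-x", "doc_count": 1, "avg": {"value": 65.0}},
                    {"key": "arbitrary-name-y", "doc_count": 1, "avg": {"value": 40.0}},
                ],
            },
        }
    }
}

print(averages_from_response(sample))
```

The `sum_other_doc_count` check matters because the score names are arbitrary: once there are more distinct names than the terms `size` (10 in the query above), the extra names are left out of `buckets` without any error.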

PS: If not, could you give an example?
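The search above averages over all events, while the question also groups by timestamp interval. One way to sketch that is to put a `date_histogram` on `timestamp` above the same nested → terms → avg chain. This assumes `timestamp` is remapped as `date` and the documents carry real ISO-8601 timestamps (the mapping above declares it `text`, which `date_histogram` cannot bucket); the daily interval, the terms size of 100 and the helper name `interval_averages_query` are illustrative choices, not part of the original answer.

```python
import json


def interval_averages_query(interval="1d", names_size=100):
    """Build a search body that averages each score name per time bucket.

    Assumes `timestamp` is mapped as `date` and `scores` as `nested`.
    """
    per_name = {
        "terms": {"field": "scores.name", "size": names_size},
        "aggs": {"avg": {"avg": {"field": "scores.value"}}},
    }
    return {
        "size": 0,  # only the aggregations are needed, not the hits
        "aggs": {
            "per_interval": {
                # one bucket per calendar interval of the event timestamp
                "date_histogram": {"field": "timestamp", "calendar_interval": interval},
                "aggs": {
                    # step into the nested score objects inside each time bucket
                    "nested": {"nested": {"path": "scores"}, "aggs": {"pername": per_name}},
                },
            }
        },
    }


# The printed JSON can be used as the body of GET /71397606/_search.
print(json.dumps(interval_averages_query(), indent=2))
```

`calendar_interval` accepts single calendar units such as `1h`, `1d` or `1M`; for multiples like `15m`, `fixed_interval` is the option to use instead.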
