Inserting data into multiple indexes with the OpenSearch Python bulk API

Posted on 2025-02-07 14:09:22

This document shows how to insert bulk data into multiple indexes with a POST request in curl: https://opensearch.org/docs/latest/opensearch/index-data/

If I have data in this format,

[
{ "index": { "_index": "index-2022-06-08", "_id": "<id>" } }
{ "A JSON": "document" }
{ "index": { "_index": "index-2022-06-09", "_id": "<id>" } }
{ "A JSON": "document" }
{ "index": { "_index": "index-2022-06-10", "_id": "<id>" } }
{ "A JSON": "document" }
]

the bulk request should take the index name from each action line, e.g. "_index": "index-2022-06-08".

I tried to do the same with the opensearch-py library, but I can't find any example snippet that does it. I am sending the request from AWS Lambda in this form:

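from opensearchpy import OpenSearch, RequestsHttpConnection, helpers

# host, awsauth, logs and index_name are defined elsewhere
client = OpenSearch(
    hosts=[{'host': host, 'port': 443}],
    http_auth=awsauth,
    use_ssl=True,
    verify_certs=True,
    connection_class=RequestsHttpConnection,
)

resp = helpers.bulk(client, logs, index=index_name, max_retries=3)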

Here I have to pass index_name as a parameter to the bulk call, so the index name is not taken from the data itself. If I leave index_name out, I get a 4xx error saying the index name is missing.

I was also looking into bulk api source code: https://github.com/opensearch-project/opensearch-py/blob/main/opensearchpy/helpers/actions.py#L373

It doesn't look like index_name is a mandatory parameter.

Can anyone help me figure out what I am missing?

Comments (2)

霓裳挽歌倾城醉 2025-02-14 14:09:22

I came across the same issue and found the solution in the elasticsearch-py bulk helpers documentation. It works when the documents are provided in the _source structure that the search endpoint returns.

The call to the bulk helper:

resp = helpers.bulk(
    self.opensearch,
    actions,
    max_retries=3,
)

Where actions is a list of dictionaries like this:

[{
    '_op_type': 'update',
    '_index': 'index-name',
    '_id': 42,
    '_source': {
        "title": "Hello World!",
        "body": "..."
    }
}]

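_op_type is an optional field that selects the action (index, update, delete, ...) to run for the document; it defaults to index when omitted. Per the same helper documentation, an update action carries its partial document under a doc key instead of _source.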

Hope this helps anyone coming across the same issue!
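
As a minimal sketch of the same idea applied to the question, the generator below routes each record to a daily index by setting _index per action. The function name build_actions, the timestamp and id fields, and the index- prefix are assumptions for illustration, not part of opensearch-py.

```python
from datetime import datetime


def build_actions(records, prefix="index-"):
    """Yield one bulk-helper action per record, choosing the index per record.

    Assumes (hypothetically) that every record has an ISO-8601 'timestamp'
    and an 'id' field; adapt the field names to your own data.
    """
    for record in records:
        day = datetime.fromisoformat(record["timestamp"]).date().isoformat()
        yield {
            "_op_type": "index",
            "_index": f"{prefix}{day}",
            "_id": record["id"],
            "_source": record,
        }


records = [
    {"id": "a1", "timestamp": "2022-06-08T10:15:00", "message": "first"},
    {"id": "b2", "timestamp": "2022-06-09T23:59:59", "message": "second"},
]
actions = list(build_actions(records))

# With a configured client this would be sent without an index argument:
# helpers.bulk(client, actions, max_retries=3)
```

Because every action names its own _index, the index argument of helpers.bulk can be left out; it would only act as a default for actions that lack one.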

貪欢 2025-02-14 14:09:22

The code below should let you index with the bulk helper. OpenSearch offers two ways to index documents, and the bulk method is one of them.

from opensearchpy import OpenSearch
from opensearchpy.helpers import bulk

client = OpenSearch(
    hosts=[{"host": "localhost", "port": 9200}],
    http_auth=("admin", "admin"),
    use_ssl=True,
    verify_certs=False,
    ssl_assert_hostname=False,
    ssl_show_warn=False,
)

inputtobeindexed = [
    {"index": {"_index": "index-2022-06-08", "_id": "<id>"}},
    {"A JSON": "document"},
    {"index": {"_index": "index-2022-06-09", "_id": "<id>"}},
    {"A JSON": "document"},
    {"index": {"_index": "index-2022-06-10", "_id": "<id>"}},
    {"A JSON": "document"},
]

# Pair each action line with the document line that follows it, and take
# the target index and id from the action metadata instead of a fixed name.
bulk_data = [
    {
        "_index": action["index"]["_index"],
        "_id": action["index"]["_id"],
        "_source": doc,
    }
    for action, doc in zip(inputtobeindexed[0::2], inputtobeindexed[1::2])
]

bulk(client, bulk_data)
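
If the data already comes as action/document pairs like inputtobeindexed, another option is the low-level client.bulk call, which takes the same newline-delimited body that curl sends. A minimal sketch, where to_ndjson is a hypothetical helper name, the ids are placeholders, and the client.bulk call is left commented out because it needs a live cluster:

```python
import json


def to_ndjson(lines):
    """Serialize a list of action and document dicts to a bulk request body.

    The bulk endpoint expects one JSON object per line and a trailing newline.
    """
    return "".join(json.dumps(line) + "\n" for line in lines)


pairs = [
    {"index": {"_index": "index-2022-06-08", "_id": "1"}},
    {"A JSON": "document"},
    {"index": {"_index": "index-2022-06-09", "_id": "2"}},
    {"A JSON": "document"},
]
body = to_ndjson(pairs)

# With a configured client, each document goes to the index named on its action line:
# resp = client.bulk(body=body)
# resp["errors"] is True if any item in the request failed.
```

This skips the helper's chunking and retries, so it probably suits small batches better than large log volumes.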
