How can Elasticsearch find words in CouchDB attachment files?

Posted on 2024-11-30 07:02:51

Hi, please give me some guidance.
I am using Elasticsearch 0.17.6 and CouchDB 1.1.0.

I have created two documents in CouchDB.
Each document has two string fields: name and message. The first one has a text file "test.txt" attached and the second one does not. The JSON generated by CouchDB looks like this:

{
  "_id": "ID1",
  "_rev": "6-e1ab4c5c65b98e9a0d91e5c8fc1629bb",
  "name": "Document1",
  "message": "Evaluate Elastic Search",
  "_attachments":   {
     "test.txt": {
       "content_type": "text/plain",
       "revpos": 5,
       "digest": "md5-REzvAVEZoSV69SLI/vaflQ==",
       "length": 86,
       "stub": true
     }
  }
}

{

 "_id": "ID2",
 "_rev": "2-72142ec18248cedb4dba67305d136aa8",
 "name": "Document2",
 "message": "test Elastic Search"
}

These two documents are in a database called my_test_couch_db.
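One detail in the first document is worth checking: the attachment entry carries "stub": true and no content. CouchDB returns attachment metadata only, unless the document is requested with attachments included. The sketch below is a minimal illustration of that check; the helper name stub_attachments is hypothetical and the two dictionaries simply mirror the documents shown above.

```python
# Documents as returned by CouchDB in the question (metadata copied verbatim).
doc1 = {
    "_id": "ID1",
    "name": "Document1",
    "message": "Evaluate Elastic Search",
    "_attachments": {
        "test.txt": {
            "content_type": "text/plain",
            "revpos": 5,
            "digest": "md5-REzvAVEZoSV69SLI/vaflQ==",
            "length": 86,
            "stub": True,
        }
    },
}
doc2 = {"_id": "ID2", "name": "Document2", "message": "test Elastic Search"}


def stub_attachments(doc):
    """Return the names of attachments that are stubs (metadata only, no data)."""
    attachments = doc.get("_attachments", {})
    return sorted(
        name
        for name, meta in attachments.items()
        if meta.get("stub") and "data" not in meta
    )


print(stub_attachments(doc1))
print(stub_attachments(doc2))
```

If an attachment is listed here, the text of the file is not part of the JSON at all, so anything that indexes only this JSON has no file content to search.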

I use Elasticsearch (ES) to index these documents with two plugins: river and mapper-attachments. For any given text, I expect ES to find matches not only in the document's fields but also in the attached *.txt file. This does not work. I have tried many approaches: creating the index manually, creating the mapping (both automatically and manually), configuring the river, and so on, but ES only finds words in the document's fields; it cannot find words in the *.txt attachment. I followed the instructions on http://www.elasticsearch.org, but that does not work either.

Thanks for your answers.

Here are my commands:

curl -X PUT "localhost:9200/test_idx_1"

curl -X PUT "localhost:9200/test_idx_1/test_mapping_1/_mapping" -d '{
  "test_mapping_1": {
    "properties": {
      "_attachments": {
        "type": "attachment",
        "index": "yes"
      }
    }
  }
}'

curl -XPUT 'http://localhost:9200/_river/test_river_1/_meta' -d '{
  "type": "couchdb",
  "couchdb": {
    "host": "localhost",
    "port": 5984,
    "db": "my_test_couch_db",
    "filter": null
  },
  "index": {
    "index": "test_idx_1",
    "type": "test_mapping_1"
  }
}'
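For context on the mapping above: the mapper-attachments plugin indexes a field whose value is the base64-encoded content of a file. The "_attachments" object that CouchDB produces is a map of file names to metadata, which is a different shape. The following is only a sketch of what such a field value looks like; the field name "file" and the helper attachment_payload are hypothetical and are not taken from the question.

```python
import base64
import json


def attachment_payload(field, raw_bytes):
    """Build a JSON body whose `field` holds the base64-encoded file content."""
    encoded = base64.b64encode(raw_bytes).decode("ascii")
    return json.dumps({field: encoded})


# Hypothetical example: the content of a small text file.
body = attachment_payload("file", b"test")
print(body)
```

A document of this shape, indexed into a type whose "file" field is mapped as "attachment", is the input the plugin is designed for; a stub entry without data gives it nothing to extract.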

Then I try to search:

curl -XPOST 'http://localhost:9200/my_test_couch_db/my_test_couch_db/_search'

(both documents are found correctly)

curl -XPOST 'http://localhost:9200/my_test_couch_db/my_test_couch_db/_search' -d '{
  "query": {
    "text": {
      "_all": "test"
    }
  }
}'

Here is the output:

{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.081366636,
    "hits": [
      {
        "_index": "my_test_couch_db",
        "_type": "my_test_couch_db",
        "_id": "ID2",
        "_score": 0.081366636,
        "_source": {
          "message": "test Elastic Search",
          "_rev": "2-72142ec18248cedb4dba67305d136aa8",
          "_id": "ID2",
          "name": "Document2"
        }
      }
    ]
  }
}
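When reading a response like this, it can help to list which index, type, and document each hit came from. Here the hit comes from the index my_test_couch_db, whereas the mapping shown earlier was put on test_idx_1, so it may be worth confirming which index the river actually writes to. The helper summarize_hits below is a hypothetical sketch that works on the response structure shown above.

```python
def summarize_hits(response):
    """Return (index, type, id) for every hit in a search response."""
    return [
        (hit["_index"], hit["_type"], hit["_id"])
        for hit in response["hits"]["hits"]
    ]


# Trimmed copy of the response shown in the question.
response = {
    "hits": {
        "total": 1,
        "hits": [
            {
                "_index": "my_test_couch_db",
                "_type": "my_test_couch_db",
                "_id": "ID2",
                "_score": 0.081366636,
            }
        ],
    }
}

print(summarize_hits(response))
```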

As you can see, ES only finds the word "test" in the message field; it does not find this word in the *.txt attachment file.

I tried other queries:

curl -XPOST 'http://localhost:9200/my_test_couch_db/my_test_couch_db/_search' -d '{
  "query": {
    "text": {
      "_attachments": "test"
    }
  }
}'

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

curl -XPOST 'http://localhost:9200/my_test_couch_db/my_test_couch_db/_search' -d '{
  "query": {
    "text": {
      "_attachments.fields.file": "test"
    }
  }
}'

{
  "took": 0,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": null,
    "hits": []
  }
}

Both queries return no hits. I tried other mappings, but they do not work either.

Why is that, and how can I solve this problem?

Comments (1)

爱你不解释 2024-12-07 07:02:51

Attachments are not yet loaded by the CouchDB river.
I have updated it, but I am still waiting for users to confirm that it works fine.

See https://github.com/dadoonet/elasticsearch-river-couchdb/tree/attachments
You can try it here: https://github.com/downloads/dadoonet/elasticsearch-river-couchdb/elasticsearch-river-couchdb-1.2.0-SNAPSHOT.zip

If it works fine for you, I can create the pull request.
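To illustrate what "loading attachments" involves on the CouchDB side: a document requested with the attachments=true query parameter comes back with each attachment's content inline as base64 in a "data" field, instead of a stub. The sketch below only builds such a URL and decodes inline data; it is not the code of the patched river, and the helper names doc_url and decode_attachments are hypothetical.

```python
import base64
from urllib.parse import quote, urlencode


def doc_url(host, port, db, doc_id, with_attachments=True):
    """Build the URL of a CouchDB document, optionally asking for inline attachments."""
    url = "http://%s:%d/%s/%s" % (host, port, quote(db, safe=""), quote(doc_id, safe=""))
    if with_attachments:
        url += "?" + urlencode({"attachments": "true"})
    return url


def decode_attachments(doc):
    """Return {file name: raw bytes} for attachments that carry inline base64 data."""
    return {
        name: base64.b64decode(meta["data"])
        for name, meta in doc.get("_attachments", {}).items()
        if "data" in meta
    }


print(doc_url("localhost", 5984, "my_test_couch_db", "ID1"))

# Hypothetical document with inline data, shaped like a response to the URL above.
sample = {"_attachments": {"test.txt": {"content_type": "text/plain", "data": "dGVzdA=="}}}
print(decode_attachments(sample))
```

Stub entries are skipped by decode_attachments, which matches the situation in the question: without the inline data there is no file text to hand to the indexer.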
