Elasticsearch query to return all records

Published on 2024-12-26 19:54:34

I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...

http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}

Can someone give me the URL you would use to accomplish this, please?

Comments (29)

桃酥萝莉 2025-01-02 19:54:34

I think Lucene syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*

size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items. (where BIGNUMBER equals a number you believe is bigger than your dataset)

However, the Elasticsearch documentation suggests using the scan search type for large result sets.

EG:

curl -XGET 'localhost:9200/foo/_search?search_type=scan&scroll=10m&size=50' -d '
{
    "query" : {
        "match_all" : {}
    }
}'

and then keep requesting as per the documentation link above suggests.

EDIT: scan was deprecated in 2.1.0.

scan does not provide any benefits over a regular scroll request sorted by _doc. Link to the elastic docs (spotted by @christophe-roussy)

唐婉 2025-01-02 19:54:34
http://127.0.0.1:9200/foo/_search/?size=1000&pretty=1
                                   ^

Note the size param, which increases the hits returned from the default (10) to 1000.

http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html

握住我的手 2025-01-02 19:54:34

elasticsearch(ES) supports both a GET or a POST request for getting the data from the ES cluster index.

When we do a GET:

http://localhost:9200/[your index name]/_search?size=[no of records you want]&q=*:*

When we do a POST:

http://localhost:9200/[your_index_name]/_search
{
  "size": [your value],          // default 10
  "from": [your start index],    // default 0
  "query": {
    "match_all": {}
  }
}

I would suggest using a UI plugin with elasticsearch: http://mobz.github.io/elasticsearch-head/
This will help you get a better feel for the indices you create and also let you test your indices.

瑶笙 2025-01-02 19:54:34

Note: this answer relates to an older version of Elasticsearch (0.90). Versions released since then have an updated syntax. Please refer to other answers, which may provide a more accurate answer for the latest versions.

The query below would return the NO_OF_RESULTS you would like to be returned.

curl -XGET 'localhost:9200/foo/_search?size=NO_OF_RESULTS' -d '
{
"query" : {
    "match_all" : {}
  }
}'

Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.

How do we know how many records exist in your index? Simply type the query below

curl -XGET 'localhost:9200/foo/_search' -d '

This would give you a result that looks like the one below

{
  "hits" : {
    "total" : 2357,
    "hits" : [
      {
        ..................

The total in the result tells you how many records are available in your index. So, that's a nice way to know the value of NO_OF_RESULTS

curl -XGET 'localhost:9200/_search' -d ' 

Search all types in all indices

curl -XGET 'localhost:9200/foo/_search' -d '

Search all types in the foo index

curl -XGET 'localhost:9200/foo1,foo2/_search' -d '

Search all types in the foo1 and foo2 indices

curl -XGET 'localhost:9200/f*/_search'

Search all types in any indices beginning with f

curl -XGET 'localhost:9200/_all/type1,type2/_search' -d '

Search types type1 and type2 in all indices

清醇 2025-01-02 19:54:34

This is the best solution I found using the Python client:

# Initialize the scroll
page = es.search(
    index='yourIndex',
    doc_type='yourType',
    scroll='2m',
    search_type='scan',
    size=1000,
    body={
        # Your query's body
    })
sid = page['_scroll_id']
scroll_size = page['hits']['total']

# Start scrolling
while scroll_size > 0:
    print("Scrolling...")
    page = es.scroll(scroll_id=sid, scroll='2m')
    # Update the scroll ID
    sid = page['_scroll_id']
    # Get the number of results returned by the last scroll
    scroll_size = len(page['hits']['hits'])
    print("scroll size: " + str(scroll_size))
    # Do something with the obtained page

https://gist.github.com/drorata/146ce50807d16fd4a6aa

Using the Java client:

import static org.elasticsearch.index.query.QueryBuilders.*;

QueryBuilder qb = termQuery("multi", "test");

SearchResponse scrollResp = client.prepareSearch(test)
        .addSort(FieldSortBuilder.DOC_FIELD_NAME, SortOrder.ASC)
        .setScroll(new TimeValue(60000))
        .setQuery(qb)
        .setSize(100).execute().actionGet(); //100 hits per shard will be returned for each scroll
//Scroll until no hits are returned
do {
    for (SearchHit hit : scrollResp.getHits().getHits()) {
        //Handle the hit...
    }

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId()).setScroll(new TimeValue(60000)).execute().actionGet();
} while(scrollResp.getHits().getHits().length != 0); // Zero hits mark the end of the scroll and the while loop.

https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html

眼泪淡了忧伤 2025-01-02 19:54:34

If it's a small dataset (e.g. 1K records), you can simply specify size:

curl localhost:9200/foo_index/_search?size=1000

The match all query isn't needed, as it's implicit.

If you have a medium-sized dataset, like 1M records, you may not have enough memory to load it, so you need a scroll.

A scroll is like a cursor in a DB. In Elasticsearch, it remembers where you left off and keeps the same view of the index (i.e. prevents the searcher from going away with a refresh, prevents segments from merging).

API-wise, you have to add a scroll parameter to the first request:

curl 'localhost:9200/foo_index/_search?size=100&scroll=1m&pretty'

You get back the first page and a scroll ID:

{
  "_scroll_id" : "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAADEWbmJlSmxjb2hSU0tMZk12aEx2c0EzUQ==",
  "took" : 0,
...

Remember that both the scroll ID you get back and the timeout are valid for the next page. A common mistake here is to specify a very large timeout (value of scroll), that would cover for processing the whole dataset (e.g. 1M records) instead of one page (e.g. 100 records).

To get the next page, fill in the last scroll ID and a timeout that should last until fetching the following page:

curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/_search/scroll' -d '{
  "scroll": "1m",
  "scroll_id": "DXF1ZXJ5QW5kRmV0Y2gBAAAAAAAAADAWbmJlSmxjb2hSU0tMZk12aEx2c0EzUQ=="
}'

If you have a lot to export (e.g. 1B documents), you'll want to parallelise. This can be done via sliced scroll. Say you want to export on 10 threads. The first thread would issue a request like this:

curl -XPOST -H 'Content-Type: application/json' 'localhost:9200/test/_search?scroll=1m&size=100' -d '{
  "slice": {
    "id": 0, 
    "max": 10 
  }
}'

You get back the first page and a scroll ID, exactly like a normal scroll request. You'd consume it exactly like a regular scroll, except that you get 1/10th of the data.

Other threads would do the same, except that id would be 1, 2, 3...
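The fan-out described above can be sketched as plain request-body construction in Python (a hypothetical helper; sending each body to a real cluster and consuming its scroll is left out):

```python
def sliced_scroll_bodies(num_slices, query=None):
    """Build one initial request body per slice.

    Worker i would POST its body to /<index>/_search?scroll=1m&size=100
    and then follow its own scroll ID independently of the others.
    """
    if query is None:
        query = {"match_all": {}}
    return [
        {"slice": {"id": i, "max": num_slices}, "query": query}
        for i in range(num_slices)
    ]

# Ten workers, each seeing roughly 1/10th of the data.
bodies = sliced_scroll_bodies(10)
print(bodies[0])  # {'slice': {'id': 0, 'max': 10}, 'query': {'match_all': {}}}
```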

梦里兽 2025-01-02 19:54:34

If you want to pull many thousands of records then... a few people gave the right answer of using 'scroll'. (Note: some people also suggested using "search_type=scan". This was deprecated, and removed in v5.0. You don't need it.)

Start with a 'search' query, but specify a 'scroll' parameter (here I'm using a 1 minute timeout):

curl -XGET 'http://ip1:9200/myindex/_search?scroll=1m' -d '
{
    "query": {
            "match_all" : {}
    }
}
'

That includes your first 'batch' of hits. But we are not done here. The output of the above curl command would be something like this:

{"_scroll_id":"c2Nhbjs1OzUyNjE6NU4tU3BrWi1UWkNIWVNBZW43bXV3Zzs1Mzc3OkhUQ0g3VGllU2FhemJVNlM5d2t0alE7NTI2Mjo1Ti1TcGtaLVRaQ0hZU0FlbjdtdXdnOzUzNzg6SFRDSDdUaWVTYWF6YlU2Uzl3a3RqUTs1MjYzOjVOLVNwa1otVFpDSFlTQWVuN211d2c7MTt0b3RhbF9oaXRzOjIyNjAxMzU3Ow==","took":109,"timed_out":false,"_shards":{"total":5,"successful":5,"failed":0},"hits":{"total":22601357,"max_score":0.0,"hits":[]}}

It's important to have the _scroll_id handy, as next you should run the following command:

    curl -XGET  'localhost:9200/_search/scroll'  -d'
    {
        "scroll" : "1m", 
        "scroll_id" : "c2Nhbjs2OzM0NDg1ODpzRlBLc0FXNlNyNm5JWUc1" 
    }
    '

However, passing the scroll_id around is not something designed to be done manually. Your best bet is to write code to do it, e.g. in Java:

    private TransportClient client = null;
    private Settings settings = ImmutableSettings.settingsBuilder()
                  .put(CLUSTER_NAME,"cluster-test").build();
    private SearchResponse scrollResp  = null;

    this.client = new TransportClient(settings);
    this.client.addTransportAddress(new InetSocketTransportAddress("ip", port));

    QueryBuilder queryBuilder = QueryBuilders.matchAllQuery();
    scrollResp = client.prepareSearch(index).setSearchType(SearchType.SCAN)
                 .setScroll(new TimeValue(60000))                            
                 .setQuery(queryBuilder)
                 .setSize(100).execute().actionGet();

    scrollResp = client.prepareSearchScroll(scrollResp.getScrollId())
                .setScroll(new TimeValue(timeVal))
                .execute()
                .actionGet();

Now loop on the last command, using the SearchResponse to extract the data.

筑梦 2025-01-02 19:54:34

Elasticsearch will get significantly slower if you just add some big number as the size; one method to get all documents is to use scan and scroll IDs.

https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html

In Elasticsearch v7.2, you do it like this:

POST /foo/_search?scroll=1m
{
    "size": 100,
    "query": {
        "match_all": {}
    }
}

The results from this would contain a _scroll_id, which you have to query to get the next chunk of 100.

POST /_search/scroll 
{
    "scroll" : "1m", 
    "scroll_id" : "<YOUR SCROLL ID>" 
}
弥繁 2025-01-02 19:54:34

You actually don't need to pass a body to match_all, it can be done with a GET request to the following URL. This is the simplest form.

http://localhost:9200/foo/_search

人海汹涌 2025-01-02 19:54:34

You can also use server:9200/_stats to get statistics about all your aliases, like the size and number of elements per alias; that's very useful and provides helpful information.

风蛊 2025-01-02 19:54:34

The best way to adjust the size is to use size=number in the URL:

curl -XGET "http://localhost:9200/logstash-*/_search?size=50&pretty"

Note: the maximum value that can be given for size is 10000. For any value above ten thousand, it expects you to use the scroll function, which minimises any chance of performance impact.
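That threshold can be made explicit in a small helper (a sketch; MAX_RESULT_WINDOW mirrors the default index.max_result_window setting, which your index may override):

```python
MAX_RESULT_WINDOW = 10_000  # default index.max_result_window

def fetch_strategy(total_docs):
    # Below the window, one request with size=total_docs is enough;
    # above it, Elasticsearch expects you to scroll instead.
    return "size" if total_docs <= MAX_RESULT_WINDOW else "scroll"

print(fetch_strategy(500))     # size
print(fetch_strategy(50_000))  # scroll
```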

梦里的微风 2025-01-02 19:54:34

You can use the _count API to get the value for the size parameter:

http://localhost:9200/foo/_count?q=<your query>

Returns {count:X, ...}. Extract value 'X' and then do the actual query:

http://localhost:9200/foo/_search?q=<your query>&size=X
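The two-step flow can be sketched as URL construction (hypothetical helper names; no requests are actually issued here):

```python
def count_url(host, index, query):
    # Step 1: ask the _count API how many documents match.
    return f"http://{host}/{index}/_count?q={query}"

def fetch_all_url(host, index, query, total):
    # Step 2: reuse the count as the size so one search returns everything.
    return f"http://{host}/{index}/_search?q={query}&size={total}"

print(count_url("localhost:9200", "foo", "*:*"))
print(fetch_all_url("localhost:9200", "foo", "*:*", 2357))
```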
若能看破又如何 2025-01-02 19:54:34

Using the Kibana console and my_index as the index to search, the following can be contributed. It asks the index to return only 4 fields, and you can also add size to indicate how many documents you want the index to return. As of ES 7.6 you should use _source rather than a filter; it will respond faster.

GET /address/_search
{
  "_source": ["streetaddress","city","state","postcode"],
  "size": 100,
  "query": {
    "match_all": {}
  }
}
冷情 2025-01-02 19:54:34

Simple! You can use the size and from parameters!

http://localhost:9200/[your index name]/_search?size=1000&from=0

Then change from gradually until you get all of the data.
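Walking from forward page by page can be sketched as offset arithmetic (a sketch only; each offset would go into the ?size=...&from=... of a real request):

```python
def page_offsets(total, page_size=1000):
    # The `from` value for each page needed to cover `total` documents.
    return list(range(0, total, page_size))

print(page_offsets(3500, 1000))  # [0, 1000, 2000, 3000]
```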

陌路终见情 2025-01-02 19:54:34

From Kibana DevTools it's:

GET my_index_name/_search
{
  "query": {
    "match_all": {}
  }
}
软糖 2025-01-02 19:54:34

http://localhost:9200/foo/_search/?size=1000&pretty=1

You will need to specify the size query parameter, as the default is 10.

千鲤 2025-01-02 19:54:34

The size param increases the hits displayed from the default (10) to 500.

http://localhost:9200/[indexName]/_search?pretty=true&size=500&q=*:*

Change from step by step to get all the data.

http://localhost:9200/[indexName]/_search?size=500&from=0
零度° 2025-01-02 19:54:34

A simple solution using the Python package elasticsearch-dsl:

from elasticsearch_dsl import Search
from elasticsearch_dsl import connections

connections.create_connection(hosts=['localhost'])

s = Search(index="foo")
response = s.scan()

count = 0
for hit in response:
    # print(hit.to_dict())  # be careful, it will print out every hit in your index
    count += 1

print(count)

See also https://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Search.scan .

反目相谮 2025-01-02 19:54:34

For Elasticsearch 6.x

Request: GET /foo/_search?pretty=true

Response: in hits -> total, which gives the count of the docs

    {
      "took": 1,
      "timed_out": false,
      "_shards": {
        "total": 5,
        "successful": 5,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": 1001,
        "max_score": 1,
        "hits": [
          {
转身以后 2025-01-02 19:54:34
curl -X GET 'localhost:9200/foo/_search?q=*&pretty' 
剩一世无双 2025-01-02 19:54:34

By default Elasticsearch returns 10 records, so the size should be provided explicitly.

Add size to the request to get the desired number of records.

http://{host}:9200/{index_name}/_search?pretty=true&size=(number of records)

Note:
The max page size cannot be more than the index.max_result_window index setting, which defaults to 10,000.

人心善变 2025-01-02 19:54:34

To return all records from all indices you can do:

curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'

Output:

{
  "took" : 866,
  "timed_out" : false,
  "_shards" : {
    "total" : 25,
    "successful" : 25,
    "failed" : 0
  },
  "hits" : {
    "total" : 512034694,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "grafana-dash",
      "_type" : "dashboard",
      "_id" : "test",
      "_score" : 1.0,
       ...
别挽留 2025-01-02 19:54:34

The maximum result that Elasticsearch will return by providing the size is 10000:

curl -XGET 'localhost:9200/index/type/_search?scroll=1m' -d '
{
   "size":10000,
   "query" : {
   "match_all" : {}
    }
}'

After that, you have to use the Scroll API to get the results: take the _scroll_id value and put it in scroll_id:

curl -XGET  'localhost:9200/_search/scroll'  -d'
{
   "scroll" : "1m", 
   "scroll_id" : "" 
}'
冷…雨湿花 2025-01-02 19:54:34

If someone is still looking to retrieve all the data from Elasticsearch, as I was for some use cases, here is what I did. Moreover, all the data means all the indexes and all the document types. I'm using Elasticsearch 6.3.

curl -X GET "localhost:9200/_search?pretty=true" -H 'Content-Type: application/json' -d'
{
    "query": {
        "match_all": {}
    }
}
'

Elasticsearch reference

安稳善良 2025-01-02 19:54:34

The official documentation provides the answer to this question! You can find it here.

{
  "query": { "match_all": {} },
  "size": 1
}

You simply replace size (1) with the number of results you want to see!

被翻牌 2025-01-02 19:54:34

This is the query to accomplish what you want
(I suggest using Kibana, as it helps to understand queries better):

GET my_index_name/my_type_name/_search
{
   "query": {
      "match_all": {}
   },
   "size": 20,
   "from": 3
}

To get all records you have to use the "match_all" query.

size is the number of records you want to fetch (a kind of limit).
By default, ES will only return 10 records.

from is like skip: it skips the first 3 records.

If you want to fetch exactly all the records, just use the value from the "total" field
in the result once you run this query from Kibana, and use it with "size".

情丝乱 2025-01-02 19:54:34

None except @Akira Sendoh has answered how to actually get ALL docs. But even that solution crashes my ES 6.3 service without logs. The only thing that worked for me using the low-level elasticsearch-py library was the scan helper that uses the scroll() api:

from elasticsearch.helpers import scan

doc_generator = scan(
    es_obj,
    query={"query": {"match_all": {}}},
    index="my-index",
)

# use the generator to iterate; don't try to make a list or you will run out of RAM
for doc in doc_generator:
    pass  # use each hit somehow

However, the cleaner way nowadays seems to be through elasticsearch-dsl library, that offers more abstract, cleaner calls, e.g: http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#hits

怪我入戏太深 2025-01-02 19:54:34

Using Elasticsearch 7.5.1

http://${HOST}:9200/${INDEX}/_search?pretty=true&q=*:*&scroll=10m&size=5000

You can also specify the size of your results with &size=${number}.

In case you don't know your index:

http://${HOST}:9200/_cat/indices?v
橙幽之幻 2025-01-02 19:54:34

You can use size=0; this will return all the documents.
Example:

curl -XGET 'localhost:9200/index/type/_search' -d '
{
   "size": 0,
   "query": {
      "match_all": {}
   }
}'