Elasticsearch query to return all records
I have a small database in Elasticsearch and for testing purposes would like to pull all records back. I am attempting to use a URL of the form...
http://localhost:9200/foo/_search?pretty=true&q={'matchAll':{''}}
Can someone give me the URL you would use to accomplish this, please?
I think Lucene syntax is supported, so:

http://localhost:9200/foo/_search?pretty=true&q=*:*
size defaults to 10, so you may also need &size=BIGNUMBER to get more than 10 items (where BIGNUMBER equals a number you believe is bigger than your dataset). BUT, for large result sets, the Elasticsearch documentation suggests the scan search type. EG:
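A minimal sketch of such a scan request (the index name foo, the page size, and the 1m keep-alive are assumptions):

```
curl -XGET 'http://localhost:9200/foo/_search?search_type=scan&scroll=1m&size=50' -d '
{
  "query": { "match_all": {} }
}'
```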
and then keep requesting as per the documentation link above.

EDIT: scan was deprecated in 2.1.0. scan does not provide any benefits over a regular scroll request sorted by _doc. Link to elastic docs (spotted by @christophe-roussy).
Note the size param, which increases the hits displayed from the default (10) to 1000 per shard.
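A minimal sketch of such a request (the index name foo is an assumption):

```
curl -XGET 'http://localhost:9200/foo/_search?size=1000&pretty=1'
```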
http://www.elasticsearch.org/guide/en/elasticsearch/reference/current/search-request-from-size.html
Elasticsearch (ES) supports both a GET and a POST request for getting the data from the ES cluster index.
When we do a GET:
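A sketch of the GET form (the index name foo and the size value are assumptions):

```
curl -XGET 'http://localhost:9200/foo/_search?size=1000&q=*:*&pretty=true'
```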
When we do a POST:
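The POST form carries the same query in the request body (again a sketch with assumed values):

```
curl -XPOST 'http://localhost:9200/foo/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "size": 1000,
  "query": { "match_all": {} }
}'
```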
I would suggest using a UI plugin with Elasticsearch: http://mobz.github.io/elasticsearch-head/
This will help you get a better feeling for the indices you create and also test your indices.
The query below would return the NO_OF_RESULTS you would like to have returned.
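A sketch of that query (NO_OF_RESULTS is a placeholder to fill in; the index name foo is an assumption):

```
curl -XGET 'http://localhost:9200/foo/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "size": NO_OF_RESULTS,
  "query": { "match_all": {} }
}'
```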
Now, the question here is that you want all the records to be returned. So naturally, before writing a query, you won't know the value of NO_OF_RESULTS.
How do we know how many records exist in your document? Simply type the query below:
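That is, a plain match_all with no size (a sketch, again assuming the index foo):

```
curl -XGET 'http://localhost:9200/foo/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} }
}'
```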
The total field under hits in the result tells you how many records are available in your document. That's a nice way to learn the value of NO_OF_RESULTS.
Search all types in all indices
Search all types in the foo index
Search all types in the foo1 and foo2 indices
Search all types in any indices beginning with f
Search types user and tweet in all indices
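The corresponding URL patterns (these follow the standard Elasticsearch multi-index, multi-type conventions) are:

```
/_search                  # all types in all indices
/foo/_search              # all types in the foo index
/foo1,foo2/_search        # all types in the foo1 and foo2 indices
/f*/_search               # all types in any indices beginning with f
/_all/user,tweet/_search  # types user and tweet in all indices
```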
This is the best solution I found using the Python client:
https://gist.github.com/drorata/146ce50807d16fd4a6aa
Using the Java client:
https://www.elastic.co/guide/en/elasticsearch/client/java-api/current/java-search-scrolling.html
If it's a small dataset (e.g. 1K records), you can simply specify size; the match_all query isn't needed, as it's implicit.
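A sketch of such a request (the index foo and the size value are assumptions):

```
curl -XGET 'http://localhost:9200/foo/_search?size=1000&pretty'
```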
If you have a medium-sized dataset, like 1M records, you may not have enough memory to load it, so you need a scroll.
A scroll is like a cursor in a DB. In Elasticsearch, it remembers where you left off and keeps the same view of the index (i.e. prevents the searcher from going away with a refresh, prevents segments from merging).
API-wise, you have to add a scroll parameter to the first request:
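Something like this, assuming a page size of 100 and a 1-minute keep-alive:

```
curl -XGET 'http://localhost:9200/foo/_search?scroll=1m&pretty' -H 'Content-Type: application/json' -d '
{
  "size": 100
}'
```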
You get back the first page, along with a scroll ID in the response.
Remember that both the scroll ID you get back and the timeout are valid for the next page. A common mistake here is to specify a very large timeout (the value of scroll) that would cover processing the whole dataset (e.g. 1M records) instead of one page (e.g. 100 records). To get the next page, fill in the last scroll ID and a timeout that should last until fetching the following page:
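A sketch of that follow-up request (the scroll ID is a placeholder for the value from the previous response):

```
curl -XGET 'http://localhost:9200/_search/scroll?pretty' -H 'Content-Type: application/json' -d '
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}'
```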
If you have a lot to export (e.g. 1B documents), you'll want to parallelise. This can be done via sliced scroll. Say you want to export on 10 threads. The first thread would issue a request like this:
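A sketch of the sliced-scroll request for the first of the 10 slices:

```
curl -XGET 'http://localhost:9200/foo/_search?scroll=1m&pretty' -H 'Content-Type: application/json' -d '
{
  "slice": { "id": 0, "max": 10 },
  "size": 100
}'
```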
You get back the first page and a scroll ID, exactly like a normal scroll request. You'd consume it exactly like a regular scroll, except that you get 1/10th of the data.
Other threads would do the same, except that id would be 1, 2, 3, and so on.
If you want to pull many thousands of records then... a few people gave the right answer of using 'scroll'. (Note: some people also suggested using "search_type=scan". This was deprecated and removed in v5.0. You don't need it.)
Start with a 'search' query, but specifying a 'scroll' parameter (here I'm using a 1 minute timeout):
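A sketch of that first request (the index foo and the batch size are assumptions):

```
curl -XGET 'http://localhost:9200/foo/_search?scroll=1m' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} },
  "size": 1000
}'
```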
That includes your first 'batch' of hits. But we are not done here. The output of the above curl command would be something like this:
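Schematically, with concrete values elided:

```
{
  "_scroll_id": "<long opaque scroll id>",
  "took": ...,
  "timed_out": false,
  "hits": {
    "total": ...,
    "hits": [ ... ]
  }
}
```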
It's important to have _scroll_id handy as next you should run the following command:
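Namely, a request to the scroll endpoint that passes the ID back (a sketch):

```
curl -XGET 'http://localhost:9200/_search/scroll' -H 'Content-Type: application/json' -d '
{
  "scroll": "1m",
  "scroll_id": "<_scroll_id from the previous response>"
}'
```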
However, passing the scroll_id around is not something designed to be done manually. Your best bet is to write code to do it, e.g. in Java.
Then LOOP on the last command, using the SearchResponse to extract the data.
Elasticsearch will get significantly slower if you just add some big number as the size; one method to get all documents is to use scan and scroll IDs.
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-scroll.html
In Elasticsearch v7.2, you do it like this:
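A sketch, assuming the index foo and pages of 100 documents:

```
curl -XPOST 'http://localhost:9200/foo/_search?scroll=1m' -H 'Content-Type: application/json' -d '
{
  "size": 100,
  "query": { "match_all": {} }
}'
```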
The results from this will contain a _scroll_id, which you have to query to get the next chunk of 100.
You actually don't need to pass a body for match_all; it can be done with a GET request to the following URL. This is the simplest form.

http://localhost:9200/foo/_search
You can also use server:9200/_stats to get statistics about all your aliases, like the size and number of elements per alias; that's very useful and provides helpful information.
The best way to adjust the size is by using size=number in the URL.

Note: the maximum value that can be defined for this size is 10,000. For any value above ten thousand it expects you to use the scroll function, which minimises any chance of impacting performance.
You can use the _count API to get the value for the size parameter: it returns {count: X, ...}. Extract the value X and then do the actual query, as sketched below.
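A sketch of the two requests (the index foo is an assumption; X stands for the returned count):

```
# Get the total number of matching documents
curl -XGET 'http://localhost:9200/foo/_count?q=*:*'
# Returns {"count": X, ...}; then fetch them all
curl -XGET 'http://localhost:9200/foo/_search?q=*:*&size=X'
```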
Using the Kibana console, with my_index as the index to search, the following can be contributed. It asks the index to return only 4 fields, and you can also add size to indicate how many documents you want the index to return. As of ES 7.6 you should use _source rather than filter; it will respond faster.
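A sketch of that console request (the four field names and the size value are assumptions):

```
GET /my_index/_search
{
  "_source": ["field1", "field2", "field3", "field4"],
  "size": 100,
  "query": { "match_all": {} }
}
```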
Simple! You can use the size and from parameters! Then you change from gradually until you get all of the data.
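For example, paging in steps of 100 (a sketch with an assumed index foo):

```
curl -XGET 'http://localhost:9200/foo/_search?size=100&from=0&pretty'
curl -XGET 'http://localhost:9200/foo/_search?size=100&from=100&pretty'
curl -XGET 'http://localhost:9200/foo/_search?size=100&from=200&pretty'
```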
From Kibana DevTools it's:
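A sketch (the index foo and the size value are assumptions):

```
GET foo/_search
{
  "size": 10000,
  "query": { "match_all": {} }
}
```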
http://localhost:9200/foo/_search/?size=1000&pretty=1
You will need to specify the size query parameter, as the default is 10.
The size param increases the hits displayed from the default (10) to 500.
Change from step by step to get all the data.
A simple solution uses the python package elasticsearch-dsl and its scan() helper.
See https://elasticsearch-dsl.readthedocs.io/en/latest/api.html#elasticsearch_dsl.Search.scan .
For Elasticsearch 6.x
Request:
GET /foo/_search?pretty=true
Response: in hits -> total, it gives the count of the docs.
By default Elasticsearch returns 10 records, so the size should be provided explicitly.
Add size to the request to get the desired number of records.

http://{host}:9200/{index_name}/_search?pretty=true&size=(number of records)

Note: the max page size can not be more than the index.max_result_window index setting, which defaults to 10,000.
To return all records from all indices you can do:
curl -XGET 'http://35.195.120.21:9200/_all/_search?size=50&pretty'
Output:
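The output has the usual search-response shape (concrete values elided):

```
{
  "took": ...,
  "hits": {
    "total": ...,
    "hits": [ ... ]
  }
}
```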
The maximum number of results Elasticsearch will return by providing a size is 10,000.
After that, you have to use the Scroll API to get the results: take the _scroll_id value from the response and pass it as scroll_id in the next request.
If someone is still looking to retrieve all the data from Elasticsearch, as I was for some use cases, here is what I did. Moreover, all the data means all the indexes and all the document types. I'm using Elasticsearch 6.3.
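The request was presumably something like this sketch (a match_all against the root _search endpoint covers every index and type):

```
curl -XGET 'http://localhost:9200/_search?pretty=true' -H 'Content-Type: application/json' -d '
{
  "query": { "match_all": {} }
}'
```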
Elasticsearch reference
The official documentation provides the answer to this question! You can find it here.
You simply replace size (1) with the number of results you want to see!
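The documentation's example is along these lines (a sketch):

```
GET /_search
{
  "size": 1,
  "query": { "match_all": {} }
}
```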
This is the kind of query that accomplishes what you want (I suggest using Kibana, as it helps to understand queries better):
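A sketch (the index foo and the size value are assumptions; from matches the skip-3 example described below):

```
GET foo/_search
{
  "query": { "match_all": {} },
  "size": 1000,
  "from": 3
}
```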
To get all records you have to use the "match_all" query.
size is the number of records you want to fetch (a kind of limit).
By default, ES will only return 10 records.
from is like skip; it skips the first 3 records here.
If you want to fetch exactly all the records, just use the value from the "total" field of the result once you hit this query from Kibana, and use it with "size".
None except @Akira Sendoh has answered how to actually get ALL docs. But even that solution crashes my ES 6.3 service without logs. The only thing that worked for me using the low-level elasticsearch-py library was the scan helper that uses the scroll() api. However, the cleaner way nowadays seems to be the elasticsearch-dsl library, which offers more abstract, cleaner calls, e.g.: http://elasticsearch-dsl.readthedocs.io/en/latest/search_dsl.html#hits
Using Elasticsearch 7.5.1, you can also specify the size of your array with &size=${number}, and in case you don't know your index you can list the indices; sketches of both follow.
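A sketch (HOST, INDEX, and the size value are placeholders/assumptions):

```
# Fetch documents from an index, specifying the size
curl -XGET "http://${HOST}:9200/${INDEX}/_search?pretty=true&q=*:*&size=5000"
# If you don't know your index, list them all
curl -XGET "http://${HOST}:9200/_cat/indices?v"
```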
You can use size=0; this will return you all the documents.