Akka stream stops processing data

Posted on 2025-02-10 09:27:43

When I run the stream below, it does not receive any subsequent data once it has started.

    final long HOUR = 3600000;
    final long PAST_HOUR = System.currentTimeMillis()-HOUR;

    private final static ActorSystem actorSystem = ActorSystem.create(Behaviors.empty(), "as");

    protected static ElasticsearchParams constructElasticsearchParams(
            String indexName, String typeName, ApiVersion apiVersion) {
        if (apiVersion == ApiVersion.V5) {
            return ElasticsearchParams.V5(indexName, typeName);
        } else if (apiVersion == ApiVersion.V7) {
            return ElasticsearchParams.V7(indexName);
        } else {
            throw new IllegalArgumentException("API version " + apiVersion + " is not supported");
        }
    }

    String queryStr = "{ \"bool\": {  \"must\" : [{\"range\" : {"+
            "\"timestamp\" : { "+
            "\"gte\" : "+PAST_HOUR
            +" }} }]}} ";

    ElasticsearchConnectionSettings connectionSettings =
            ElasticsearchConnectionSettings.create("****")
                    .withCredentials("****", "****");

    ElasticsearchSourceSettings sourceSettings =
            ElasticsearchSourceSettings.create(connectionSettings)
                    .withApiVersion(ApiVersion.V7);

    Source<ReadResult<Stats>, NotUsed> dataSource =
            ElasticsearchSource.typed(
                    constructElasticsearchParams("data", "_doc", ApiVersion.V7),
                    queryStr,
                    sourceSettings,
                    Stats.class);

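    // Note: these operators return a new Source instead of modifying dataSource,
    // so the two results below are discarded and do not affect the stream that is run.
    dataSource.buffer(10000, OverflowStrategy.backpressure());
    dataSource.backpressureTimeout(Duration.ofSeconds(1));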

    dataSource
            .log("error")
            .runWith(Sink.foreach(a -> System.out.println(a)), actorSystem);

This produces the output:

ReadResult(id=1656107389556,source=Stats(size=0.09471),version=)

Data is continually being written to the index, but the stream does not process any of it once it has started. Shouldn't the stream continually process data from the upstream source? In this case, the upstream source is an Elasticsearch index named "data".

I've tried amending the query to match all documents:

String queryStr =  "{\"match_all\": {}}";

but the result is the same.

紅太極 2025-02-17 09:27:43

The Elasticsearch source does not run continuously. It initiates a search, manages pagination (using the scroll API) and streams the results; when Elasticsearch reports no more results, the source completes.
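
To make that completion behaviour concrete, here is a minimal, library-free sketch of a paginated search over an in-memory list. `SnapshotSearch`, `nextPage` and `drain` are hypothetical names invented for illustration; this is not the Alpakka implementation. It models only two properties: the search sees the documents that existed when it started, and it is finished once a page comes back empty.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a paginated search; not part of Alpakka or Elasticsearch. */
class SnapshotSearch {
    private final List<String> snapshot; // documents visible when the search started
    private final int pageSize;
    private int position = 0;

    SnapshotSearch(List<String> index, int pageSize) {
        // Copy the index so that documents written later are not seen by this search.
        this.snapshot = new ArrayList<>(index);
        this.pageSize = pageSize;
    }

    /** Returns the next page; an empty page means the search is finished. */
    List<String> nextPage() {
        int end = Math.min(position + pageSize, snapshot.size());
        List<String> page = new ArrayList<>(snapshot.subList(position, end));
        position = end;
        return page;
    }

    /** Reads pages until an empty one arrives, the point at which a stream built on the search would complete. */
    List<String> drain() {
        List<String> all = new ArrayList<>();
        for (List<String> page = nextPage(); !page.isEmpty(); page = nextPage()) {
            all.addAll(page);
        }
        return all;
    }
}

class SnapshotSearchDemo {
    public static void main(String[] args) {
        List<String> index = new ArrayList<>(List.of("doc-1", "doc-2", "doc-3"));
        SnapshotSearch first = new SnapshotSearch(index, 2);
        index.add("doc-4"); // written after the first search started
        System.out.println(first.drain());                        // does not contain doc-4
        System.out.println(new SnapshotSearch(index, 2).drain()); // a fresh search does
    }
}
```

Only a new search picks up the later document, which is why a stream built on a single search goes quiet after its last page.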

You could do something like:

Source.repeat(Done.getInstance()).flatMapConcat(done -> ElasticsearchSource.typed(...))

This will run a new search immediately after the previous one finishes. Note that the downstream would then be responsible for filtering out duplicates.
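
Because each repeated search returns documents that an earlier search already emitted, the downstream needs some way to recognise them. One option is to remember recently seen document ids. Below is a minimal, library-free sketch using a bounded least-recently-used set. `RecentIds` and `firstSeen` are hypothetical names, and the capacity is an assumption you would have to size yourself: an id that has been evicted will be treated as new if it shows up again.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: remembers up to `capacity` recently seen ids. Not thread-safe. */
class RecentIds {
    private final Map<String, Boolean> seen;

    RecentIds(final int capacity) {
        // Access-ordered LinkedHashMap that evicts the least recently used id once it is over capacity.
        this.seen = new LinkedHashMap<String, Boolean>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Boolean> eldest) {
                return size() > capacity;
            }
        };
    }

    /** Returns true the first time an id is offered, false for a repeat that is still remembered. */
    boolean firstSeen(String id) {
        return seen.put(id, Boolean.TRUE) == null;
    }
}
```

In the stream this could sit directly after the `flatMapConcat`, for example as `.filter(r -> recentIds.firstSeen(r.id()))`, assuming the stream is materialized only once so that a single stage touches the set. Replacing `Source.repeat` with `Source.tick` and an interval would also avoid issuing searches back to back.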
