Correct way to use Spring WebClient in Spring AMQP
I have the following tech stack for a Spring AMQP application that consumes messages from RabbitMQ:
Spring boot 2.2.6.RELEASE
Reactor Netty 0.9.12.RELEASE
Reactor Core 3.3.10.RELEASE
The application is deployed on a 4-core RHEL host.
Below is some of the RabbitMQ configuration in use:
```java
@Bean
public CachingConnectionFactory connectionFactory() {
    CachingConnectionFactory cachingConnectionFactory = new CachingConnectionFactory();
    cachingConnectionFactory.setHost(<<HOST NAME>>);
    cachingConnectionFactory.setUsername(<<USERNAME>>);
    cachingConnectionFactory.setPassword(<<PASSWORD>>);
    cachingConnectionFactory.setChannelCacheSize(50);
    return cachingConnectionFactory;
}

@Bean
public SimpleRabbitListenerContainerFactory rabbitListenerContainerFactory() {
    SimpleRabbitListenerContainerFactory factory = new SimpleRabbitListenerContainerFactory();
    factory.setConnectionFactory(connectionFactory());
    factory.setMaxConcurrentConsumers(50);
    factory.setMessageConverter(new Jackson2JsonMessageConverter());
    factory.setDefaultRequeueRejected(false); // a DLQ is in place
    return factory;
}
```
The consumers call downstream APIs through Spring WebClient in synchronous (blocking) mode. Below is the WebClient configuration:
```java
@Bean
public WebClient webClient() {
    ConnectionProvider connectionProvider = ConnectionProvider
            .builder("fixed")
            .lifo()
            .pendingAcquireTimeout(Duration.ofMillis(200000))
            .maxConnections(16)
            .pendingAcquireMaxCount(3000)
            .maxIdleTime(Duration.ofMillis(290000))
            .build();
    HttpClient client = HttpClient.create(connectionProvider);
    client.tcpConfiguration(<<connection timeout, read timeout, write timeout is set here....>>);
    WebClient.Builder builder =
            WebClient.builder().baseUrl(<<base URL>>).clientConnector(new ReactorClientHttpConnector(client));
    return builder.build();
}
```
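One detail worth checking in the configuration above: Reactor Netty's `HttpClient` is immutable, and `tcpConfiguration(...)` returns a new instance instead of modifying `client`. If the real code matches the posted line, the connection/read/write timeouts never reach the `ReactorClientHttpConnector`, so a stalled request has no read timeout to end it. The usual pattern is to keep the returned instance, e.g. `HttpClient client = HttpClient.create(connectionProvider).tcpConfiguration(...)`. The JDK-only sketch below illustrates the pitfall with a hypothetical immutable `ClientConfig` class standing in for `HttpClient`; it is an analogy, not Reactor Netty code.

```java
import java.time.Duration;

// Hypothetical stand-in for an immutable client such as Reactor Netty's HttpClient:
// each "with" method returns a NEW instance and leaves the receiver untouched.
final class ClientConfig {
    final Duration readTimeout; // null means "no read timeout configured"

    private ClientConfig(Duration readTimeout) {
        this.readTimeout = readTimeout;
    }

    static ClientConfig create() {
        return new ClientConfig(null);
    }

    ClientConfig withReadTimeout(Duration timeout) {
        return new ClientConfig(timeout); // copy carrying the new setting
    }
}

class ImmutableConfigDemo {
    public static void main(String[] args) {
        ClientConfig client = ClientConfig.create();

        // Pitfall: the returned copy is thrown away, so 'client' still has no timeout.
        client.withReadTimeout(Duration.ofSeconds(10));
        System.out.println("discarded: " + client.readTimeout); // prints discarded: null

        // Fix: keep the returned instance (reassign, or chain the call onto create()).
        client = client.withReadTimeout(Duration.ofSeconds(10));
        System.out.println("kept: " + client.readTimeout); // prints kept: PT10S
    }
}
```

The same reasoning applies to every configuration method on Reactor Netty's `HttpClient` and `TcpClient`: each call has to be chained or reassigned to take effect.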
This WebClient is autowired into a @Service class:
```java
@Autowired
private WebClient webClient;
```
It is used in two places. The first is a single call:
```java
public DownstreamStatusEnum downstream(String messageid, String payload, String contentType) {
    return call(messageid, payload, contentType);
}

private DownstreamStatusEnum call(String messageid, String payload, String contentType) {
    // Blocking call on the RabbitMQ listener thread, with no timeout
    DownstreamResponse response = sendRequest(messageid, payload, contentType).block();
    return response; // as posted; the mapping to DownstreamStatusEnum is not shown
}

private Mono<DownstreamResponse> sendRequest(String messageid, String payload, String contentType) {
    return webClient
            .method(POST)
            .uri(<<URI>>)
            .contentType(MediaType.valueOf(contentType))
            .body(BodyInserters.fromValue(payload))
            .exchange()
            .flatMap(response -> response.bodyToMono(DownstreamResponse.class));
}
```
The second place makes parallel downstream calls and is implemented as follows:
```java
private Flux<DownstreamResponse> getValues(List<DownstreamRequest> reqList, String messageid) {
    return Flux
            .fromIterable(reqList)
            .parallel()
            .runOn(Schedulers.elastic())
            .flatMap(s -> {
                return webClient
                        .method(POST)
                        .uri(<<downstream url>>)
                        .body(BodyInserters.fromValue(s))
                        .exchange()
                        .flatMap(response -> {
                            if (response.statusCode().isError()) {
                                return Mono.just(new DownstreamResponse());
                            }
                            return response.bodyToMono(DownstreamResponse.class);
                        });
            }).sequential();
}

public List<DownstreamResponse> updateValue(List<DownstreamRequest> reqList, String messageid) {
    // Blocking call on the RabbitMQ listener thread, with no timeout
    return getValues(reqList, messageid).collectList().block();
}
```
The application has been working fine for the past year or so. Recently, however, one or more consumers get stuck holding the default prefetch count (250) of messages in the unacked state. The only way to recover is to restart the application.
We have not made any code changes recently, and there have been no infrastructure changes either.
When this happens, we take thread dumps, and the pattern is always similar: most consumer threads are in the TIMED_WAITING state, while one or two consumers are in the WAITING state with the following stacks:
```
"org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-13" waiting for condition ...
   java.lang.Thread.State: WAITING (parking)
    - parking to wait for ......
    at .......
    at .......
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(......
    at reactor.core.publisher.Mono.block(....
    at .........WebClientServiceImpl.call(...
```
And also:
```
"org.springframework.amqp.rabbit.RabbitListenerEndpointContainer#0-13" waiting for condition ...
   java.lang.Thread.State: WAITING (parking)
    - parking to wait for ......
    at .......
    at .......
    at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(......
    at reactor.core.publisher.Mono.block(....
    at .........WebClientServiceImpl.updateValue(...
```
I'm not sure whether this thread dump shows that the consumer threads are actually stuck at this `block()` call.
Please advise what the issue could be and what steps we need to take to fix it. Earlier we thought it might be a RabbitMQ/Spring AMQP issue, but based on the thread dump it looks like a problem with the WebClient `block()` call.
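The dumps are consistent with that reading. `Mono.block()` without a timeout ends up in `BlockingSingleSubscriber.blockingGet`, which parks the thread until the `Mono` signals, and a thread parked without a deadline is reported as WAITING rather than TIMED_WAITING. The listener thread, and the 250 messages prefetched for it, therefore stay put for as long as the WebClient call does not complete. Reactor also offers `Mono.block(Duration)`, which fails with an exception once the timeout elapses instead of waiting indefinitely. The JDK-only sketch below shows the same difference with `CompletableFuture.get()` versus `get(timeout, unit)`; `BoundedWaitDemo` is a hypothetical name and this is an analogy, not Reactor code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

class BoundedWaitDemo {
    // Waits at most 'millis' for a result instead of parking indefinitely
    // (the JDK analogue of Mono.block(Duration) versus Mono.block()).
    static String awaitWithTimeout(CompletableFuture<String> pending, long millis) {
        try {
            return pending.get(millis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // The caller gets control back and can decide what to do with the message.
            return "TIMED_OUT";
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return "INTERRUPTED";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        }
    }

    public static void main(String[] args) {
        // Never completed: models a response that never arrives, e.g. a request
        // still waiting for a pooled connection that is never handed back.
        CompletableFuture<String> neverCompletes = new CompletableFuture<>();
        System.out.println(awaitWithTimeout(neverCompletes, 100)); // prints TIMED_OUT

        CompletableFuture<String> done = CompletableFuture.completedFuture("OK");
        System.out.println(awaitWithTimeout(done, 100)); // prints OK

        // neverCompletes.get() with no timeout would park this thread for good,
        // which is what a thread dump reports as WAITING (parking).
    }
}
```

A timeout only bounds how long the listener thread waits; it does not remove whatever keeps the response from completing.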
After adding BlockHound, the following stack trace is printed in the log file:
```
Error has been observed at following site(s)
Checkpoint Request to POST https://....... [DefaultWebClient]
Stack Trace:
    at java.lang.Object.wait
    ......
    at java.net.InetAddress.checkLookupTable
    at java.net.InetAddress.getAddressFromNameService
    ......
    at io.netty.util.internal.SocketUtils$8.run
    ......
    at io.netty.resolver.DefaultNameResolver.doResolve
```
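The BlockHound trace points at a separate concern: host-name resolution goes through Netty's `DefaultNameResolver`, which delegates to the JDK's blocking `InetAddress` lookup (`getAddressFromNameService`), on a thread BlockHound treats as non-blocking. The `Object.wait` under `InetAddress.checkLookupTable` appears to be this thread waiting for another thread's in-flight lookup of the same host. Netty also ships a non-blocking DNS resolver in `netty-resolver-dns`; whether and how Reactor Netty 0.9 lets you plug it in is worth confirming in its documentation. The general pattern, keeping blocking calls off event-loop threads, is sketched below in plain JDK code: a hypothetical `OffloadedLookupDemo` runs the blocking lookup on a dedicated executor and returns a `CompletableFuture`.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class OffloadedLookupDemo {
    // Dedicated pool for blocking lookups, so the caller's (event-loop-like) thread
    // never waits on the name service itself.
    static final ExecutorService DNS_POOL = Executors.newFixedThreadPool(2, r -> {
        Thread t = new Thread(r, "dns-lookup");
        t.setDaemon(true); // do not keep the JVM alive
        return t;
    });

    static CompletableFuture<InetAddress> resolveAsync(String host) {
        return CompletableFuture.supplyAsync(() -> {
            try {
                return InetAddress.getByName(host); // blocking call, runs on DNS_POOL
            } catch (UnknownHostException e) {
                throw new IllegalStateException("cannot resolve " + host, e);
            }
        }, DNS_POOL);
    }

    public static void main(String[] args) {
        // An IP literal is parsed without a name-service query, so this runs offline.
        InetAddress addr = resolveAsync("127.0.0.1").join();
        System.out.println(addr.isLoopbackAddress()); // prints true
    }
}
```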
Comments (1)
Sorry, I just realized that the flatMap in the parallel Flux call actually looked like the one below.
So in error scenarios, I think the underlying connection was not being released properly. When I updated it as below, it seemed to fix the issue:
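That explanation fits the posted pool settings. With `exchange()`, Spring's Javadoc states that one of the response's body or entity methods must always be used so that resources are released; a response whose body is neither consumed nor released keeps its connection checked out. In the error branch as posted in the question, `Mono.just(new DownstreamResponse())` is returned without touching the body, so each error response would leak one of the 16 connections. Once all 16 are gone, every new request waits for a connection (up to `pendingAcquireTimeout`, 200 s here), and the listener thread parked in `block()` waits with it. The corrected code from this answer is not shown above; as an assumption, an error branch in Spring Framework 5.2 (the line Spring Boot 2.2 uses) could release the body with `response.releaseBody().then(Mono.just(new DownstreamResponse()))`. The JDK-only model below, a hypothetical `PoolLeakDemo` using a `Semaphore` as the pool, shows the effect of leaking versus releasing on the error path.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a fixed connection pool (cf. maxConnections(16)):
// each permit stands for one pooled connection.
class PoolLeakDemo {
    static final int MAX_CONNECTIONS = 16;

    // Sends 'requests' simulated calls that all come back with an error status.
    // releaseOnError=false mirrors an error branch that returns a fallback without
    // consuming or releasing the body, so the connection is never handed back.
    // Returns how many requests managed to obtain a connection.
    static int run(int requests, boolean releaseOnError) {
        Semaphore pool = new Semaphore(MAX_CONNECTIONS);
        int served = 0;
        for (int i = 0; i < requests; i++) {
            boolean acquired;
            try {
                // Bounded wait for a connection, like pendingAcquireTimeout (shortened to 10 ms).
                acquired = pool.tryAcquire(10, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
            if (!acquired) {
                continue; // no connection available for this request
            }
            served++;
            if (releaseOnError) {
                pool.release(); // body released on the error path: connection goes back
            }
            // else: fallback returned, connection leaked for good
        }
        return served;
    }

    public static void main(String[] args) {
        System.out.println(run(40, false)); // prints 16: pool exhausted after 16 errors
        System.out.println(run(40, true));  // prints 40: every connection is returned
    }
}
```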