I am having a hard time identifying the underlying issue behind the following latency pattern in the max percentile of my application:

This is a Gatling chart that shows 4 minutes of load testing. The first two minutes are a warm-up of the same scenario (that's why it has no latency graph).
Two triangles (sometimes more) with a nearly identical slope are clearly visible and reproducible across multiple test runs, no matter how many application instances we deploy behind our load balancer:

I am looking for more avenues to investigate, as I have a hard time googling for this pattern. It strikes me as particularly odd that the triangle is not "filled" but consists only of spikes. Furthermore, the triangle feels "inverted": if this were a scenario with ever-increasing load (which it isn't), I would expect this kind of triangle to manifest with an inverted slope - this slope just doesn't make any sense to me.
Technical context:
- This is for a Spring Boot application with a PostgreSQL database in AWS
- There are 6 pods deployed in our Kubernetes cluster; auto-scaling was disabled for this test
- Keep-alive is used by our Gatling test (see the answer below - it turns out this was a lie); a rough sketch of the Gatling setup follows this list
- The Kubernetes ingress configuration is left as-is, which implies keep-alive to each upstream if I read the defaults correctly
- Neither the database nor the CPU per pod is maxed out
- The network uplink of our load testing machine is not maxed out and the machine does nothing else besides running the load test
- The load (requests / sec) on the application is nearly constant and not changing after the warmup / during the measurement
- Garbage collection activity is low
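
To make the warm-up and keep-alive points concrete, here is a minimal sketch of the kind of simulation we run, written against Gatling's Java/Kotlin DSL. The class name, base URL, endpoint, and request rates below are placeholders rather than our real values; `shareConnections()` is shown because it is the DSL switch that controls connection reuse across virtual users, so it is one plausible place where a keep-alive assumption can break down.

```kotlin
import io.gatling.javaapi.core.CoreDsl.*
import io.gatling.javaapi.core.Simulation
import io.gatling.javaapi.http.HttpDsl.*
import java.time.Duration

class LatencyPatternSimulation : Simulation() {

    // Placeholder base URL and endpoint; the real service sits behind the
    // load balancer / Kubernetes ingress described above.
    private val httpProtocol = http
        .baseUrl("https://app.example.internal")
        // Reuse a shared connection pool across virtual users instead of
        // opening fresh connections per user (the keep-alive assumption).
        .shareConnections()

    private val scn = scenario("constant-load")
        .exec(http("request").get("/api/endpoint"))

    init {
        setUp(
            scn.injectOpen(
                // ~2 minutes of warm-up followed by ~2 minutes of measurement,
                // both at a constant request rate (rates are placeholders).
                constantUsersPerSec(50.0).during(Duration.ofMinutes(2)),
                constantUsersPerSec(50.0).during(Duration.ofMinutes(2))
            )
        ).protocols(httpProtocol)
    }
}
```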
Here is another image to demonstrate the "triangle" before we made some application-side optimizations to request latency:

This turned out to be a two-part issue:

1. Our Gatling test was not actually using keep-alive, contrary to what I stated in the question.
2. A second issue on the application side that we addressed with asContextElement().

While this does not explain the more than peculiar shape of the latency pattern, it did resolve the main issues we had, and the pattern is gone.
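
For context, asContextElement() is the kotlinx.coroutines mechanism for propagating a ThreadLocal (for example MDC-style request data) into a coroutine context so that it survives dispatcher thread switches. The snippet below is only an illustrative sketch of that mechanism with made-up names; it is not the actual fix from our codebase.

```kotlin
import kotlinx.coroutines.Dispatchers
import kotlinx.coroutines.asContextElement
import kotlinx.coroutines.launch
import kotlinx.coroutines.runBlocking

// Made-up thread-local standing in for per-request state (e.g. MDC-style data).
private val requestId = ThreadLocal<String?>()

fun main() = runBlocking {
    requestId.set("req-42")

    // Without asContextElement(), the launched coroutine may run on a different
    // dispatcher thread and see an empty or stale thread-local value.
    launch(Dispatchers.Default + requestId.asContextElement()) {
        // asContextElement() captured "req-42" at launch time and installs it
        // on whichever thread the coroutine happens to execute on.
        println("inside coroutine: ${requestId.get()}")
    }.join()
}
```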