Spring Boot 应用程序在负载下出现全面延迟
我正在 Java 8 (1.8.0_161-b12 / HotSpot 64 25.161-b12) 的生产环境中运行带有 Thymeleaf (2.1.4) 的旧版 Spring Boot (1.4.4) 应用程序。当应用程序经历中等负载(1K 请求/分钟)时,所有应用程序请求都会经历延迟(> 1 分钟),并且应用程序实例上的 CPU 会达到最大值。
如何准确查明性能瓶颈发生在哪里?如果这是代码/设计问题,我如何确定,或者我是否只需要分配更多资源?
我尝试过的事情:
- 扩展数据库并增加连接池大小( CPU 可以忽略不计)
- 禁用模板/表达式(Thymeleaf)和查询缓存(Hibernate/couchbase)并经历相同的延迟。
- 扩展集群 (3x t2.large)
其他需要注意的事项:
- JVM 堆未配置,但似乎未接近最大堆大小。
- JVM 监控显示常规(分配失败)垃圾回收在加载期间频繁发生 ParallelGCFailedAllocation
- 发生了一些需要黑白 1-3 秒的
Full GC
事件。 - 峰值延迟期间的堆转储显示超过 172 个阻塞线程,这些线程似乎正在等待 Thymeleaf 缓存 PUT 请求?!
"http-nio-8080-exec-168" - Thread t@488
java.lang.Thread.State: BLOCKED
at org.thymeleaf.cache.StandardCache$CacheDataContainer.putWithoutTracing(StandardCache.java:384)
- waiting to lock <4e409b1c> (a [Ljava.lang.Object;) owned by "http-nio-8080-exec-170" t@490
at org.thymeleaf.cache.StandardCache$CacheDataContainer.put(StandardCache.java:368)
at org.thymeleaf.cache.StandardCache.put(StandardCache.java:141)
at org.thymeleaf.spring4.expression.SpelVariableExpressionEvaluator.getExpression(SpelVariableExpressionEvaluator.java:184)
at org.thymeleaf.spring4.expression.SpelVariableExpressionEvaluator.evaluate(SpelVariableExpressionEvaluator.java:129)
at org.thymeleaf.standard.expression.VariableExpression.executeVariable(VariableExpression.java:154)
at org.thymeleaf.standard.expression.SimpleExpression.executeSimple(SimpleExpression.java:59)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:103)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:133)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:120)
at org.thymeleaf.standard.processor.attr.AbstractStandardLocalVariableDefinitionAttrProcessor.getNewLocalVariables(AbstractStandardLocalVariableDefinitionAttrProcessor.java:89)
at org.thymeleaf.processor.attr.AbstractLocalVariableDefinitionAttrProcessor.processAttribute(AbstractLocalVariableDefinitionAttrProcessor.java:62)
at org.thymeleaf.processor.attr.AbstractAttrProcessor.doProcess(AbstractAttrProcessor.java:87)
at org.thymeleaf.processor.AbstractProcessor.process(AbstractProcessor.java:212)
at org.thymeleaf.dom.Node.applyNextProcessor(Node.java:1017)
at org.thymeleaf.dom.Node.processNode(Node.java:972)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.Document.process(Document.java:93)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1155)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1060)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1011)
at org.thymeleaf.spring4.view.ThymeleafView.renderFragment(ThymeleafView.java:335)
at org.thymeleaf.spring4.view.ThymeleafView.render(ThymeleafView.java:190)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1271)
由于应用程序有很多层,有没有办法找出罪魁祸首?
更新:
好的,在挖掘更多内容之后,还有更多内容:
- 负载测试中另一个点期间的另一个线程转储显示了对数据库的休眠调用时的一堆阻塞线程。
java.lang.Thread.State: BLOCKED
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
- waiting to lock <3d1c45eb> (a java.lang.Object) owned by "http-nio-8080-exec-270" t@823
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:220)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at sun.security.ssl.CipherBox.createExplicitNonce(CipherBox.java:1015)
at sun.security.ssl.OutputRecord.encrypt(OutputRecord.java:248)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:869)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
- locked <7721ca63> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <2135117a> (a java.io.BufferedOutputStream)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3725)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2506)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1962)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at sun.reflect.GeneratedMethodAccessor254.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.github.gquintana.metrics.proxy.MethodInvocation.proceed(MethodInvocation.java:75)
at com.github.gquintana.metrics.sql.PreparedStatementProxyHandler.execute(PreparedStatementProxyHandler.java:53)
at com.github.gquintana.metrics.sql.AbstractStatementProxyHandler.invoke(AbstractStatementProxyHandler.java:50)
at com.github.gquintana.metrics.proxy.ProxyHandler.invoke(ProxyHandler.java:56)
at com.sun.proxy.$Proxy143.executeQuery(Unknown Source)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:79)
at org.hibernate.loader.Loader.getResultSet(Loader.java:2062)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1859)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1838)
at org.hibernate.loader.Loader.doQuery(Loader.java:906)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:348)
at org.hibernate.loader.Loader.doList(Loader.java:2550)
at org.hibernate.loader.Loader.doList(Loader.java:2536)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2366)
at org.hibernate.loader.Loader.list(Loader.java:2361)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:495)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:357)
at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:198)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1230)
at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:268)
at org.hibernate.ejb.criteria.CriteriaQueryCompiler$3.getResultList(CriteriaQueryCompiler.java:254)
at org.broadleafcommerce.common.i18n.dao.TranslationDaoImpl.readTranslation(TranslationDaoImpl.java:262)
at org.broadleafcommerce.common.i18n.service.TranslationServiceImpl.getTranslatedValue(TranslationServiceImpl.java:194)
at org.broadleafcommerce.common.i18n.service.DynamicTranslationProvider.getValue(DynamicTranslationProvider.java:47)
at org.broadleafcommerce.core.catalog.domain.CategoryImpl.getName(CategoryImpl.java:489)
at org.<myorg>.CatalogService.lambda$getSeries1ActiveTechniques$11(CatalogService.java:95)
通常,此调用 getSeries1ActiveTechniques
位于 hibernate 查询缓存后面,但禁用它后,SQL 数据库的流量会增加。
虽然我知道这个设计需要重构,但我仍然不清楚这种减速是否是由于整体 CPU 的负载、整体 GC 影响 CPU,还是这两个单独的争用导致 GC 和高 CPU 造成的?
- 测试期间的 CPU 采样显示:
热点显示某些调用的 CPU 占用率较高;跳出的是 org.apache.catalina.webresources.JarWarResourceSet.getArchiveEntries ,它可能是一个 已知问题?
但是我如何知道这些 CPU 时间是否确实具有代表性,或者这些方法是否只是由于 GC 或 apache/jdbc 级别的线程池限制而导致更普遍的速度减慢?
I'm running a legacy Spring Boot (1.4.4) application with Thymeleaf (2.1.4) in production on Java 8 (1.8.0_161-b12 / HotSpot 64 25.161-b12). When the application experiences moderate load (1K reqs/min), all app requests experience latency (> 1min), and CPU on the app instances max out.
How can I pinpoint exactly where this performance bottleneck is occurring? If it is a code / design issue how can I be certain, or if I just need to allocate more resources?
Things I've tried:
- Scaling up the DB and increase connection pool size ( CPU is negligible )
- Disabled the template/expression (Thymeleaf) and query cache (Hibernate/couchbase) and experienced same latency.
- Scale up cluster (3x t2.large)
Other things to note:
- JVM heap is not configured, but doesn't appear to approach max heap size.
- JVM monitoring shows regular (Allocation Failed) garbage collection during load w/ ParallelGCFailedAllocation occurring frequently
- There's a few
Full GC
events occurring that take b/w 1-3 seconds. - A heap dump during peak latency reveals over 172 blocked threads that appear to be waiting on Thymeleaf cache PUT requests?!
"http-nio-8080-exec-168" - Thread t@488
java.lang.Thread.State: BLOCKED
at org.thymeleaf.cache.StandardCache$CacheDataContainer.putWithoutTracing(StandardCache.java:384)
- waiting to lock <4e409b1c> (a [Ljava.lang.Object;) owned by "http-nio-8080-exec-170" t@490
at org.thymeleaf.cache.StandardCache$CacheDataContainer.put(StandardCache.java:368)
at org.thymeleaf.cache.StandardCache.put(StandardCache.java:141)
at org.thymeleaf.spring4.expression.SpelVariableExpressionEvaluator.getExpression(SpelVariableExpressionEvaluator.java:184)
at org.thymeleaf.spring4.expression.SpelVariableExpressionEvaluator.evaluate(SpelVariableExpressionEvaluator.java:129)
at org.thymeleaf.standard.expression.VariableExpression.executeVariable(VariableExpression.java:154)
at org.thymeleaf.standard.expression.SimpleExpression.executeSimple(SimpleExpression.java:59)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:103)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:133)
at org.thymeleaf.standard.expression.Expression.execute(Expression.java:120)
at org.thymeleaf.standard.processor.attr.AbstractStandardLocalVariableDefinitionAttrProcessor.getNewLocalVariables(AbstractStandardLocalVariableDefinitionAttrProcessor.java:89)
at org.thymeleaf.processor.attr.AbstractLocalVariableDefinitionAttrProcessor.processAttribute(AbstractLocalVariableDefinitionAttrProcessor.java:62)
at org.thymeleaf.processor.attr.AbstractAttrProcessor.doProcess(AbstractAttrProcessor.java:87)
at org.thymeleaf.processor.AbstractProcessor.process(AbstractProcessor.java:212)
at org.thymeleaf.dom.Node.applyNextProcessor(Node.java:1017)
at org.thymeleaf.dom.Node.processNode(Node.java:972)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.NestableNode.computeNextChild(NestableNode.java:695)
at org.thymeleaf.dom.NestableNode.doAdditionalProcess(NestableNode.java:668)
at org.thymeleaf.dom.Node.processNode(Node.java:990)
at org.thymeleaf.dom.Document.process(Document.java:93)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1155)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1060)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1011)
at org.thymeleaf.spring4.view.ThymeleafView.renderFragment(ThymeleafView.java:335)
at org.thymeleaf.spring4.view.ThymeleafView.render(ThymeleafView.java:190)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1271)
Since the application has many layers, is there a way to nail down the culprit?
UPDATE:
OK so after digging some more, a few more things:
- Another thread dump during another point in the load testing shows a bunch of blocked threads on hibernate calls to the database.
java.lang.Thread.State: BLOCKED
at sun.security.provider.NativePRNG$RandomIO.implNextBytes(NativePRNG.java:543)
- waiting to lock <3d1c45eb> (a java.lang.Object) owned by "http-nio-8080-exec-270" t@823
at sun.security.provider.NativePRNG$RandomIO.access$400(NativePRNG.java:331)
at sun.security.provider.NativePRNG.engineNextBytes(NativePRNG.java:220)
at java.security.SecureRandom.nextBytes(SecureRandom.java:468)
at sun.security.ssl.CipherBox.createExplicitNonce(CipherBox.java:1015)
at sun.security.ssl.OutputRecord.encrypt(OutputRecord.java:248)
at sun.security.ssl.SSLSocketImpl.writeRecordInternal(SSLSocketImpl.java:869)
at sun.security.ssl.SSLSocketImpl.writeRecord(SSLSocketImpl.java:857)
at sun.security.ssl.AppOutputStream.write(AppOutputStream.java:123)
- locked <7721ca63> (a sun.security.ssl.AppOutputStream)
at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
- locked <2135117a> (a java.io.BufferedOutputStream)
at com.mysql.jdbc.MysqlIO.send(MysqlIO.java:3725)
at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2506)
at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2677)
at com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2549)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeInternal(PreparedStatement.java:1861)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at com.mysql.jdbc.PreparedStatement.executeQuery(PreparedStatement.java:1962)
- locked <5c36dc6f> (a com.mysql.jdbc.JDBC4Connection)
at sun.reflect.GeneratedMethodAccessor254.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.github.gquintana.metrics.proxy.MethodInvocation.proceed(MethodInvocation.java:75)
at com.github.gquintana.metrics.sql.PreparedStatementProxyHandler.execute(PreparedStatementProxyHandler.java:53)
at com.github.gquintana.metrics.sql.AbstractStatementProxyHandler.invoke(AbstractStatementProxyHandler.java:50)
at com.github.gquintana.metrics.proxy.ProxyHandler.invoke(ProxyHandler.java:56)
at com.sun.proxy.$Proxy143.executeQuery(Unknown Source)
at org.hibernate.engine.jdbc.internal.ResultSetReturnImpl.extract(ResultSetReturnImpl.java:79)
at org.hibernate.loader.Loader.getResultSet(Loader.java:2062)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1859)
at org.hibernate.loader.Loader.executeQueryStatement(Loader.java:1838)
at org.hibernate.loader.Loader.doQuery(Loader.java:906)
at org.hibernate.loader.Loader.doQueryAndInitializeNonLazyCollections(Loader.java:348)
at org.hibernate.loader.Loader.doList(Loader.java:2550)
at org.hibernate.loader.Loader.doList(Loader.java:2536)
at org.hibernate.loader.Loader.listIgnoreQueryCache(Loader.java:2366)
at org.hibernate.loader.Loader.list(Loader.java:2361)
at org.hibernate.loader.hql.QueryLoader.list(QueryLoader.java:495)
at org.hibernate.hql.internal.ast.QueryTranslatorImpl.list(QueryTranslatorImpl.java:357)
at org.hibernate.engine.query.spi.HQLQueryPlan.performList(HQLQueryPlan.java:198)
at org.hibernate.internal.SessionImpl.list(SessionImpl.java:1230)
at org.hibernate.internal.QueryImpl.list(QueryImpl.java:101)
at org.hibernate.ejb.QueryImpl.getResultList(QueryImpl.java:268)
at org.hibernate.ejb.criteria.CriteriaQueryCompiler$3.getResultList(CriteriaQueryCompiler.java:254)
at org.broadleafcommerce.common.i18n.dao.TranslationDaoImpl.readTranslation(TranslationDaoImpl.java:262)
at org.broadleafcommerce.common.i18n.service.TranslationServiceImpl.getTranslatedValue(TranslationServiceImpl.java:194)
at org.broadleafcommerce.common.i18n.service.DynamicTranslationProvider.getValue(DynamicTranslationProvider.java:47)
at org.broadleafcommerce.core.catalog.domain.CategoryImpl.getName(CategoryImpl.java:489)
at org.<myorg>.CatalogService.lambda$getSeries1ActiveTechniques$11(CatalogService.java:95)
Normally this call getSeries1ActiveTechniques
is behind a hibernate query cache, but with it disabled, there's a lot more traffic to the SQL database.
While I'm aware this design needs refactored, I'm still unclear as to whether this slowdown is due specifically from load via overall CPU, overall GC impacting CPU, or are these two separate contentions causing GC and high CPU?
- CPU sampling during the test shows:
Hotspots show high CPU for certain calls; jumping out is org.apache.catalina.webresources.JarWarResourceSet.getArchiveEntries
which might be a known issue with embedded tomcat?
But how do I know if these CPU times are actually representative or if these methods are just suffering from greater general slowdown from GC or threadpool limiting at apache/jdbc level?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
您会发现免费使用较新版本的 Java 可以获得更好的性能,这将有助于解决您报告的一些 GC 延迟问题。如果延迟是一个问题,那么您应该考虑升级到 Java 11 或 Java 17。
但是,您突出显示的一组锁才是真正的问题。 Thymeleaf 缓存看起来每个请求都持有一个全局(或至少是高度竞争的)锁,并且所有内容都在其后面序列化。目前尚不清楚您所依赖的库是否有升级,这可能会解决该问题(2.1.6 是最新的,但也有 3.x 系列);或者,它们可能具有您可以选择/配置的不同缓存实现。看来 StandardCache 是罪魁祸首;也许您可以使用其他替代方案?
您可以尝试禁用缓存以查看是否会删除锁,尽管拥有缓存确实有意义。
https://www.thymeleaf.org/doc/tutorials/ 2.1/usingthymeleaf.html#template-cache
You’ll find that you’ll get better performance with newer versions of Java for free, which will help with some of the GC delays you are reporting. If latency is an issue then you should consider upgrading to either Java 11, or Java 17.
However the set of locks you have highlighted is the real problem. The Thymeleaf cache looks like it’s holding a global (or at least highly contended) lock per request, And everything is serialising behind that. It’s not clear if there is an upgrade to the library you depend on which might have fixed the issue (2.1.6 is latest, but there’s a 3.x series as well); alternatively they may have different cache implementations that you can choose/configure. It looks like the StandardCache is the culprit; maybe there an alternative you can use?
You could try and disable caching to e see if that removes the locks, though having a cache does make sense.
https://www.thymeleaf.org/doc/tutorials/2.1/usingthymeleaf.html#template-cache