Question: why did a crawler process hang in production?

Posted 2021-11-27 15:18:50 · 6,309 characters · 850 views · 7 comments

@黄亿华 Hi, I'd like to ask you something:

We ran into a puzzling problem in production and I'm hoping you can help.

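The crawling setup is as follows: two message listeners (crawler processes) are deployed on different servers, and both use RedisScheduler. A network failure cut them off from the database (the crawled content has to be persisted to the database), and the outage lasted quite a long time.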

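The logs show that after the network came back, one listener process recovered and carried on crawling. The listener process on the other server did not recover: it stopped crawling, even though the process was still alive and, according to the logs, the Spider was still in the Running state.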

I compared the thread dumps of the two listener processes (jstack -l pid). The process that recovered has the following two threads:

"pool-333-thread-1" prio=10 tid=0x00007f6fa8004800 nid=0x1840 waiting on condition [0x00007f6fcd4a6000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000f836fb80> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1431)
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1076)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:941)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:921)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:911)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:98)
at org.springframework.jdbc.datasource.DataSourceUtils.doGetConnection(DataSourceUtils.java:111)
at org.springframework.jdbc.datasource.DataSourceUtils.getConnection(DataSourceUtils.java:77)
at org.mybatis.spring.transaction.SpringManagedTransaction.openConnection(SpringManagedTransaction.java:80)
at org.mybatis.spring.transaction.SpringManagedTransaction.getConnection(SpringManagedTransaction.java:66)
at org.apache.ibatis.executor.BaseExecutor.getConnection(BaseExecutor.java:271)
at org.apache.ibatis.executor.SimpleExecutor.prepareStatement(SimpleExecutor.java:69)
at org.apache.ibatis.executor.SimpleExecutor.doUpdate(SimpleExecutor.java:44)
at org.apache.ibatis.executor.BaseExecutor.update(BaseExecutor.java:100)
at org.apache.ibatis.executor.CachingExecutor.update(CachingExecutor.java:75)
at org.apache.ibatis.session.defaults.DefaultSqlSession.update(DefaultSqlSession.java:148)
at sun.reflect.GeneratedMethodAccessor14.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.mybatis.spring.SqlSessionTemplate$SqlSessionInterceptor.invoke(SqlSessionTemplate.java:354)
at com.sun.proxy.$Proxy6.update(Unknown Source)
at org.mybatis.spring.SqlSessionTemplate.update(SqlSessionTemplate.java:250)
at org.apache.ibatis.binding.MapperMethod.execute(MapperMethod.java:49)
at org.apache.ibatis.binding.MapperProxy.invoke(MapperProxy.java:43)
at com.sun.proxy.$Proxy12.updateYoukuVideoUrl(Unknown Source)
at com.tcl.recipevideohunter.service.RecipeVideoService.updateYoukuVideoUrl(RecipeVideoService.java:95)
at com.tcl.recipevideohunter.service.RecipeVideoService$$FastClassByCGLIB$$a1b882c0.invoke(<generated>)
at org.springframework.cglib.proxy.MethodProxy.invoke(MethodProxy.java:204)
at org.springframework.aop.framework.CglibAopProxy$DynamicAdvisedInterceptor.intercept(CglibAopProxy.java:627)
at com.tcl.recipevideohunter.service.RecipeVideoService$$EnhancerByCGLIB$$2e3f9ae0.updateYoukuVideoUrl(<generated>)
at com.tcl.recipevideohunter.pipeline.YoukuRecipeVideoUrlUpdatePipeline.process(YoukuRecipeVideoUrlUpdatePipeline.java:37)
at us.codecraft.webmagic.Spider.processRequest(Spider.java:444)
at us.codecraft.webmagic.Spider$1.run(Spider.java:338)
at us.codecraft.webmagic.thread.CountableThreadPool$1.run(CountableThreadPool.java:74)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


   Locked ownable synchronizers:
- <0x00000000fa5e03a8> (a java.util.concurrent.ThreadPoolExecutor$Worker)


"Thread-342" prio=10 tid=0x00007f6fc0009000 nid=0x183f waiting on condition [0x00007f6fcd6a8000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000fa5e2360> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at us.codecraft.webmagic.thread.CountableThreadPool.execute(CountableThreadPool.java:61)
at us.codecraft.webmagic.Spider.run(Spider.java:334)
at java.lang.Thread.run(Thread.java:745)


   Locked ownable synchronizers:
- None


The other process has only one of these threads, and its stack is quite different:

"pool-332-thread-1" prio=10 tid=0x00007f78b8001800 nid=0x437a waiting on condition [0x00007f78e5a59000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000f870bb68> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at java.util.concurrent.LinkedBlockingQueue.take(LinkedBlockingQueue.java:442)
at java.util.concurrent.ThreadPoolExecutor.getTask(ThreadPoolExecutor.java:1068)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1130)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)


   Locked ownable synchronizers:
- None

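The dumps above were captured with jstack -l. When attaching jstack to a production host is inconvenient, roughly the same information (thread state, stack, and "Locked ownable synchronizers") can be collected from inside the JVM through the standard java.lang.management API. The following is a minimal sketch; the class name ThreadDumper and the output layout are my own choices, and the format only approximates jstack's.

```java
import java.lang.management.LockInfo;
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

// In-process approximation of `jstack -l <pid>` for the current JVM.
// Formats frames manually because ThreadInfo.toString() truncates long stacks.
final class ThreadDumper {
    static String dump() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        StringBuilder sb = new StringBuilder();
        // (true, true) = include locked monitors and locked ownable synchronizers;
        // a VM that cannot report them throws UnsupportedOperationException.
        for (ThreadInfo info : mx.dumpAllThreads(true, true)) {
            sb.append('"').append(info.getThreadName()).append("\" ")
              .append(info.getThreadState()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            LockInfo[] syncs = info.getLockedSynchronizers();
            sb.append("\tLocked ownable synchronizers: ")
              .append(syncs.length == 0 ? "None" : Arrays.toString(syncs))
              .append("\n\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(dump());
    }
}
```

A listener could, for example, log such a dump periodically so that the two hosts can be compared after an incident without anyone having to log in and run jstack in time.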

==========================

Do you know why one of the processes hung? Is there a way to prevent it?




Comments (7)

孤独患者 2021-12-03 18:05:44
Just add a socketTimeout and it will be fine.
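To make the socketTimeout suggestion concrete: with MySQL Connector/J, connectTimeout and socketTimeout can be passed as URL properties in milliseconds, and a value of 0 (the default) means no timeout, so a read on a dead connection can block indefinitely. The sketch below only builds such a URL; the helper name JdbcTimeouts.withTimeouts, the host and database names, and the timeout values are hypothetical. On the Druid side, DruidDataSource.setMaxWait(long) bounds how long a borrower waits for a free connection; whether both settings are needed here is an assumption to verify against the actual deployment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends MySQL Connector/J timeout properties to a JDBC URL.
// connectTimeout bounds the TCP connect; socketTimeout bounds each blocking socket read.
final class JdbcTimeouts {
    private JdbcTimeouts() {}

    static String withTimeouts(String baseUrl, long connectTimeoutMs, long socketTimeoutMs) {
        if (connectTimeoutMs <= 0 || socketTimeoutMs <= 0) {
            // 0 would mean "wait forever", which is exactly the failure mode to avoid.
            throw new IllegalArgumentException("timeouts must be positive");
        }
        Map<String, String> props = new LinkedHashMap<>(); // keeps insertion order
        props.put("connectTimeout", Long.toString(connectTimeoutMs));
        props.put("socketTimeout", Long.toString(socketTimeoutMs));

        StringJoiner query = new StringJoiner("&");
        props.forEach((k, v) -> query.add(k + "=" + v));

        // Extend an existing query string instead of adding a second '?'.
        String sep = baseUrl.contains("?") ? "&" : "?";
        return baseUrl + sep + query;
    }

    public static void main(String[] args) {
        // Prints: jdbc:mysql://db-host:3306/recipe?connectTimeout=5000&socketTimeout=30000
        System.out.println(withTimeouts("jdbc:mysql://db-host:3306/recipe", 5_000, 30_000));
    }
}
```

The socketTimeout has to be longer than the slowest legitimate query, otherwise healthy but slow statements will start failing as well.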
清晨说ぺ晚安 2021-12-03 17:12:30

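After digging into the source of the Spider run method, I finally understand why one of the crawler processes kept trying to connect to the MySQL server (i.e. it never got out of pipeline.process) and never hit a JedisConnectionException (i.e. it never reached the outer call to scheduler.poll).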

The data source is com.alibaba.druid.pool.DruidDataSource. Once the network failed, the thread trying to get a connection was blocked, as shown below:

"pool-1-thread-1" prio=10 tid=0x00007f40ac002000 nid=0x19c4 waiting on condition [0x00007f40ce6f2000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000db0f4318> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
at com.alibaba.druid.pool.DruidDataSource.takeLast(DruidDataSource.java:1431)
at com.alibaba.druid.pool.DruidDataSource.getConnectionInternal(DruidDataSource.java:1076)
at com.alibaba.druid.pool.DruidDataSource.getConnectionDirect(DruidDataSource.java:941)
at com.alibaba.druid.pool.DruidDataSource.getConnection(DruidDataSource.java:921)
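The takeLast frame above is the key detail. As far as I can tell from Druid's source, getConnectionInternal uses takeLast() (an await with no deadline) when maxWait is not positive, which is the default, and a timed pollLast otherwise; please check this against the Druid version in use. The difference can be shown with a simplified pool model built on ReentrantLock and Condition. This is NOT Druid's code, and the class name TinyPool is hypothetical.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Simplified model of a pool's borrow path (not Druid's implementation).
final class TinyPool<T> {
    private final Deque<T> idle = new ArrayDeque<>();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    void release(T item) {
        lock.lock();
        try {
            idle.addLast(item);
            notEmpty.signal(); // wake one waiting borrower
        } finally {
            lock.unlock();
        }
    }

    // Unbounded wait, like the takeLast() frame: if nothing is released, this never returns.
    T take() throws InterruptedException {
        lock.lock();
        try {
            while (idle.isEmpty()) {
                notEmpty.await();
            }
            return idle.pollLast();
        } finally {
            lock.unlock();
        }
    }

    // Bounded wait: returns null once the timeout elapses, so the caller can fail and move on.
    T poll(long timeout, TimeUnit unit) throws InterruptedException {
        long nanos = unit.toNanos(timeout);
        lock.lock();
        try {
            while (idle.isEmpty()) {
                if (nanos <= 0L) {
                    return null;
                }
                nanos = notEmpty.awaitNanos(nanos); // remaining time after waking
            }
            return idle.pollLast();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        TinyPool<String> pool = new TinyPool<>();
        System.out.println(pool.poll(100, TimeUnit.MILLISECONDS)); // null after ~100 ms
        pool.release("conn-1");
        System.out.println(pool.poll(100, TimeUnit.MILLISECONDS)); // conn-1
    }
}
```

With a bounded wait, the pipeline call fails with an error after the deadline instead of parking forever, so the crawler thread gets a chance to continue once the database is reachable again.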

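So pipeline.process could not finish, which means runnable.run() at line 74 of CountableThreadPool never returned.

As a result, the Spider run thread stayed blocked at condition.await() on line 61 of CountableThreadPool. Here is the thread snapshot: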

"Thread-1" prio=10 tid=0x00007f40f09b8800 nid=0x19c3 waiting on condition [0x00007f40ceb02000]
   java.lang.Thread.State: WAITING (parking)
at sun.misc.Unsafe.park(Native Method)
- parking to wait for  <0x00000000db090540> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
at java.util.concurrent.locks.LockSupport.park(LockSupport.java:186)
at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2043)
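The chain described above (a worker slot is freed only when runnable.run() returns, and execute() waits on a condition while every slot is busy) can be reproduced with plain JDK classes. This is a simplified sketch in the spirit of webmagic's CountableThreadPool, not its actual source; the class name BoundedSubmitter is hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Caps the number of in-flight tasks. execute() blocks the caller while all slots are
// busy; a slot is released only when the submitted Runnable returns. A task that never
// returns (e.g. stuck borrowing a DB connection) keeps the caller parked forever.
final class BoundedSubmitter {
    private final int maxThreads;
    private final AtomicInteger alive = new AtomicInteger();
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition slotFree = lock.newCondition();
    private final ExecutorService executor;

    BoundedSubmitter(int maxThreads) {
        this.maxThreads = maxThreads;
        this.executor = Executors.newFixedThreadPool(maxThreads);
    }

    void execute(Runnable task) throws InterruptedException {
        lock.lock();
        try {
            while (alive.get() >= maxThreads) {
                slotFree.await(); // the submitting thread (the Spider run thread) parks here
            }
            alive.incrementAndGet();
        } finally {
            lock.unlock();
        }
        executor.execute(() -> {
            try {
                task.run(); // if this never returns, the finally block never frees the slot
            } finally {
                lock.lock();
                try {
                    alive.decrementAndGet();
                    slotFree.signal();
                } finally {
                    lock.unlock();
                }
            }
        });
    }

    void shutdownNow() {
        executor.shutdownNow();
    }
}
```

With one slot, a first task that blocks is enough to leave a second caller of execute() in the WAITING state, which is the same shape as "Thread-1" in the snapshot.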

海之角 2021-12-03 14:14:34

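This isn't infinite recursion: if the call inside your exception handler fails again, the exception goes straight up to the caller, because nothing catches it a second time. Missing catch blocks around concurrent operations on shared resources may be what killed the thread. Be careful with the java.util.concurrent containers; when I used them before they felt less stable to me than the older classes, which I found more reliable.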
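The solid part of this comment is the point about exceptions escaping: anything thrown out of a thread's run() ends that thread. A common defensive pattern is to catch per iteration, record the error, back off, and keep looping. The sketch below assumes hypothetical names: poll stands in for scheduler.poll() and process for the download plus pipeline step; it is an illustration, not webmagic's Spider code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical crawl loop. A transient failure (e.g. a Redis connection error thrown
// by `poll`) is recorded and retried after a back-off instead of escaping run()
// and silently ending the thread.
final class ResilientLoop implements Runnable {
    private final Supplier<String> poll;
    private final Consumer<String> process;
    private final long backoffMillis;
    // Written only by the loop thread; safe to read from another thread after join().
    final List<RuntimeException> errors = new ArrayList<>();

    ResilientLoop(Supplier<String> poll, Consumer<String> process, long backoffMillis) {
        this.poll = poll;
        this.process = process;
        this.backoffMillis = backoffMillis;
    }

    @Override
    public void run() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                String request = poll.get();
                if (request == null) {
                    return; // demo only: a real crawler would wait and poll again
                }
                process.accept(request);
            } catch (RuntimeException e) {
                errors.add(e); // log instead of letting the exception terminate the thread
                try {
                    Thread.sleep(backoffMillis);
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the flag; the loop then exits
                }
            }
        }
    }
}
```

Catching RuntimeException (rather than Throwable) leaves genuine Errors such as OutOfMemoryError to propagate, which is usually the safer default.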

剑心龙吟 2021-12-03 06:09:03

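Quoting 黄亿华's comment:

"pool-332-thread-1" is just what a JDK ThreadPoolExecutor worker looks like when it has no task; you can't infer the cause from that alone.

Possibly the scheduler could no longer get new content.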

平定天下 2021-12-02 20:50:57

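"pool-332-thread-1" is just what a JDK ThreadPoolExecutor worker looks like when it has no task; you can't infer the cause from that alone.

Possibly the scheduler could no longer get new content.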
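This can be checked directly: a worker of a fixed-size ThreadPoolExecutor that has finished its task goes back to getTask() and parks in LinkedBlockingQueue.take(), which is exactly the stack of "pool-332-thread-1". A minimal sketch (the class name IdleWorkerDemo is hypothetical):

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Shows that an idle fixed-pool worker sits in WAITING (parked in the queue's take()),
// so a WAITING pool thread by itself only means "no task has been submitted".
final class IdleWorkerDemo {
    static Thread.State idleWorkerState() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Callable<Thread> whoAmI = Thread::currentThread;
            Thread worker = pool.submit(whoAmI).get(); // the worker that ran the task
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(2);
            // Give the worker time to loop back into getTask() and park.
            while (worker.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return worker.getState();
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(idleWorkerState()); // WAITING
    }
}
```

So the interesting question is not what this thread is doing, but why no thread is submitting work to it any more.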

深巷少女 2021-12-02 19:42:29

What caused it to exit?

眼眸里的那抹悲凉 2021-11-29 05:38:45

One process has two threads and the other has only one. Doesn't that tell you one of the threads has exited?
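That observation also explains how "the process is alive and the Spider is still Running" can coexist with a dead crawl thread: a status flag set when the loop starts is not reset if an unchecked exception escapes the thread. The sketch below is a hypothetical illustration (StatusVsThread, Status and startCrawler are invented names, not webmagic's API); it also shows a health check that looks at Thread.isAlive() instead of trusting the flag.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical illustration: the flag says RUNNING, but the thread is gone.
final class StatusVsThread {
    enum Status { INIT, RUNNING, STOPPED }

    static final AtomicReference<Status> status = new AtomicReference<>(Status.INIT);

    static Thread startCrawler(Runnable pollOnce) {
        Thread t = new Thread(() -> {
            status.set(Status.RUNNING);
            pollOnce.run();             // if this throws, the next line is skipped
            status.set(Status.STOPPED); // only reached on normal completion
        }, "crawler");
        // Log the failure; the thread still terminates after the handler runs.
        t.setUncaughtExceptionHandler((th, e) ->
                System.err.println(th.getName() + " died: " + e));
        t.start();
        return t;
    }

    // A watchdog check that looks at the thread itself, not only at the flag.
    static boolean reallyRunning(Thread crawler) {
        return status.get() == Status.RUNNING && crawler.isAlive();
    }
}
```

Resetting the flag in a finally block, or having a watchdog restart the Spider when reallyRunning returns false, are two possible remedies; which fits better depends on how the listener is deployed.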
