Restarting an agent after a crash

Posted on 2024-10-21 11:57:34

Consider a distributed bank application in which distributed agent machines modify the value of a global variable, say "balance".

The agents' requests are queued. Each request adds a value to the global variable on behalf of a particular agent, so the agent's code has the form:

    agent
    {
        look_queue();   // peek at the leftmost request on the queue without dequeuing it

        lock_global_variable(balance, agent_machine_id);
        ///////////////////// **POINT A**
        modify(balance, value);
        unlock_global_variable(balance, agent_machine_id);
        ///////////////////// **POINT B**
        dequeue();      // once the transaction is complete, the request can be dequeued
    }

Now, if an agent's code crashes at POINT B, the request obviously should not be processed again; otherwise the variable would be modified twice for the same request. To avoid this, we can make the code atomic:

    agent
    {
        look_queue();   // peek at the leftmost request on the queue without dequeuing it

        *atomic*
        {
            lock_global_variable(balance, agent_machine_id);
            modify(balance, value);
            unlock_global_variable(balance, agent_machine_id);
            dequeue();  // once the transaction is complete, the request can be dequeued
        }
    }

I am looking for answers to these questions:

How can one 'automatically' identify the points in code that need to be executed atomically?
If the code crashes during execution, how much will "logging the transaction and variable values" help? Are there other approaches to the problem of crashed agents?
Again, logging does not scale to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
In general, how can one identify such atomic blocks when agents work together? If one agent fails, do the others have to wait for it to restart? How can software testing help us identify potential cases in which an agent crash leaves the program in an inconsistent state?
How can the atomic blocks be made more fine-grained, to reduce performance bottlenecks?

哎呦我呸! 2024-10-28 11:57:34

Q> How can one 'automatically' identify the points in code that need to be executed atomically?
A> Any time something stateful is shared across different contexts. Not all parties need to be mutators; one is enough. In your case, the balance is shared between different agents.
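
As a minimal sketch of that rule, assuming the agents are threads in one JVM rather than separate machines: the read-modify-write on the shared balance is the critical section, so it is the part that gets guarded. `SharedBalance` and `runAgents` are hypothetical names, not taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the shared "balance" from the question.
final class SharedBalance {
    private long balance; // state shared between all agents

    // The read-modify-write is the critical section, so it is synchronized.
    synchronized void add(long value) {
        balance += value;
    }

    synchronized long get() {
        return balance;
    }

    // Starts `agents` threads that each apply `requests` deposits of 1,
    // waits for them to finish, and returns the final balance.
    static long runAgents(int agents, int requests) {
        SharedBalance shared = new SharedBalance();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < agents; i++) {
            Thread t = new Thread(() -> {
                for (int r = 0; r < requests; r++) {
                    shared.add(1);
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return shared.get();
    }
}
```

With `synchronized` removed from `add`, concurrent increments can be lost and the final balance may come out lower than `agents * requests`.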

Q> If the code crashes during execution, how much will "logging the transaction and variable values" help? Are there other approaches to the problem of crashed agents?
A> It can help, but it comes at a high cost: you need to roll back X entries, replay the scenario, and so on. A better approach is either to make everything transactional or to have an effective automatic rollback scenario.
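
One way to picture the "transactional" option is to record the request id in the same atomic step as the balance update, so that replaying a request after a crash at POINT B becomes a no-op. The sketch below is in-memory only and every name in it is hypothetical; a real system would presumably keep the balance and the set of applied ids in durable storage and update both in a single transaction.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.Set;

// Hypothetical agent that tolerates a crash between "modify" and "dequeue".
final class IdempotentAgent {
    private static final class Request {
        final String id;
        final long value;

        Request(String id, long value) {
            this.id = id;
            this.value = value;
        }
    }

    private final Deque<Request> queue = new ArrayDeque<>();
    private final Set<String> applied = new HashSet<>(); // ids already applied
    private long balance;

    synchronized void enqueue(String id, long value) {
        queue.addLast(new Request(id, value));
    }

    // Processes the head of the queue. When crashAtPointB is true the method
    // returns after the update but before the dequeue, imitating POINT B.
    synchronized void processHead(boolean crashAtPointB) {
        Request head = queue.peekFirst(); // look_queue()
        if (head == null) {
            return;
        }
        if (applied.add(head.id)) {       // true only the first time this id is seen
            balance += head.value;        // modify(balance, value)
        }
        if (crashAtPointB) {
            return;                       // simulated crash: request stays queued
        }
        queue.pollFirst();                // dequeue()
    }

    synchronized long balance() {
        return balance;
    }

    synchronized int pending() {
        return queue.size();
    }
}
```

After a simulated crash the request is still at the head of the queue, but the second `processHead` call finds its id in `applied`, skips the update and only dequeues it.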

Q> Again, logging does not scale to big applications with a large number of variables. What can we do in those cases, instead of restarting execution from scratch?
A> In some cases you can relax consistency. For example, CopyOnWriteArrayList performs each write on a fresh copy of its backing array and switches the data on for new readers only once the copy is ready. If the write fails, the copy can safely be discarded. There's also compare-and-swap. Also see the link for the previous question.
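
For the compare-and-swap part, a minimal sketch using `java.util.concurrent.atomic.AtomicLong` could look like this. It again assumes the agents share one JVM, and `CasBalance` is a hypothetical name.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical lock-free balance built on compare-and-swap.
final class CasBalance {
    private final AtomicLong balance = new AtomicLong();

    // Computes the new value from a snapshot and publishes it only if no
    // other agent changed the balance in the meantime; otherwise retries.
    long add(long value) {
        while (true) {
            long current = balance.get();
            long updated = current + value;
            if (balance.compareAndSet(current, updated)) {
                return updated;
            }
        }
    }

    long get() {
        return balance.get();
    }
}
```

No lock is ever held here, so an agent that dies between `get` and `compareAndSet` leaves the balance untouched and blocks nobody.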

Q> In general, how can one identify such atomic blocks when agents work together?
A> See your first question.

Q> If one agent fails, do the others have to wait for it to restart?
A> Most policies/APIs define a maximum timeout for critical-section execution; otherwise the system risks ending up in a perpetual deadlock.
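
A lease is one common shape for such a timeout: ownership of the lock simply expires, so a crashed agent cannot hold it forever. The sketch below is a hypothetical in-memory version with the clock passed in explicitly to keep it deterministic; it is not the API of any particular lock service.

```java
// Hypothetical lease-style lock: ownership expires after leaseMillis.
final class LeaseLock {
    private final long leaseMillis;
    private String holder;   // null while the lock is free
    private long expiresAt;  // time at which the current lease runs out

    LeaseLock(long leaseMillis) {
        this.leaseMillis = leaseMillis;
    }

    // Grants the lock if it is free, if the previous lease has expired,
    // or if the caller already holds it (which renews the lease).
    synchronized boolean tryAcquire(String agentId, long nowMillis) {
        boolean free = holder == null || nowMillis >= expiresAt;
        if (!free && !holder.equals(agentId)) {
            return false;
        }
        holder = agentId;
        expiresAt = nowMillis + leaseMillis;
        return true;
    }

    // Returns false when the caller no longer owns a live lease, in which
    // case it should treat its work as not committed.
    synchronized boolean release(String agentId, long nowMillis) {
        if (holder == null || !holder.equals(agentId) || nowMillis >= expiresAt) {
            return false;
        }
        holder = null;
        return true;
    }
}
```

If `agent-1` acquires the lock at time 0 with a 1000 ms lease and then crashes, `agent-2` is refused at 500 but succeeds at 1000. A slow (rather than dead) holder would additionally have to be stopped from writing after its lease expired, which this sketch does not attempt.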

Q> How can software testing help us identify potential cases in which an agent crash leaves the program in an inconsistent state?
A> It can, to a fair degree. However, testing concurrent code requires as much skill as writing the code itself, if not more.

Q> How can the atomic blocks be made more fine-grained, to reduce performance bottlenecks?
A> You have answered the question yourself :) If one atomic operation needs to modify 10 different shared state variables, there's not much you can do apart from trying to push the external contract down so that it no longer needs to modify as many. This is pretty much the reason why databases are not as scalable as NoSQL stores: they might need to modify dependent foreign keys, execute triggers, etc. Or try to promote immutability.
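
As an illustration of narrowing what one atomic operation touches, suppose (this is an assumption, since the question has a single global balance) that the state can be split per account so that each request updates exactly one entry. `ConcurrentHashMap.merge` then makes each update atomic per key; `PerAccountBalances` is a hypothetical name.

```java
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical per-account state: each request touches a single key.
final class PerAccountBalances {
    private final ConcurrentHashMap<String, Long> balances = new ConcurrentHashMap<>();

    // merge() is atomic for the given key, so updates to different accounts
    // generally do not contend on one global lock.
    long add(String accountId, long value) {
        return balances.merge(accountId, value, Long::sum);
    }

    long get(String accountId) {
        return balances.getOrDefault(accountId, 0L);
    }
}
```

An operation that must change several accounts together still needs a wider atomic scope, which is exactly the limit described in the answer above.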

If you are a Java programmer, I would definitely recommend reading this book. I'm sure there are good counterparts for other languages, too.
