在 Haskell 中管理有状态计算系统
因此,我有一个链接在一起的有状态处理器系统。例如,处理器可能会输出最后 10 个输入的平均值。它需要状态来计算这个平均值。
我想向系统提交值并获取输出。我也想跳回来,恢复过去随时的状态。 IE。我通过系统运行了 1000 个值。现在我想将系统“移”回到发送第 500 个值后的状态。然后我想从那时起再次“重播”系统。
我还需要能够将历史状态保存到磁盘,以便我可以在将来的某个时间再次恢复整个事情(并且仍然可以回退和重播功能)。当然,我需要使用千兆字节的数据来完成此操作,并且速度非常快:)
我一直在使用闭包来保持状态来接近它。但我想知道使用 monad 是否更有意义。我只读过 3 或 4 个 monad 的类比,所以还不太理解它们,所以请随意教育我。
如果每个处理器以保留其历史记录的方式修改其在 monad 中的状态,并且将其与每个处理步骤的 id 绑定在一起。然后,monad 能够以某种方式将其状态切换到过去的步骤 ID,并在该状态下以 monad 运行系统。并且 monad 将具有某种机制来将自身序列化(反序列化)以进行存储。
(考虑到数据的大小......它实际上不应该全部都在内存中,这意味着 monad 需要映射到磁盘、缓存等......)
是否有现有的库/机制/已经完成或协助完成我想做的事情的方法/概念?
So, I have a system of stateful processors that are chained together. For example, a processor might output the average of its last 10 inputs. It requires state to calculate this average.
I would like to submit values to the system, and get the outputs. I also would like to jump back and restore the state at any time in the past. Ie. I run 1000 values through the system. Now I want to "move" the system back to exactly as it was after I had sent the 500th value through. Then I want to "replay" the system from that point again.
I also need to be able to persist the historical state to disk so I can restore the whole thing again some time in the future (and still have the move back and replay functions work). And of course, I need to do this with gigabytes of data, and have it be extremely fast :)
I had been approaching it using closures to hold state. But I'm wondering if it would make more sense to use a monad. I have only read through 3 or 4 analogies for monads so don't understand them well yet, so feel free to educate me.
If each processor modifies its state in the monad in such a way that its history is kept and it is tied to an id for each processing step. And then somehow the monad is able to switch its state to a past step id and run the system with the monad in that state. And the monad would have some mechanism for (de)serializing itself for storage.
(and given the size of the data... it really shouldn't even all be in memory, which would mean the monad would need to be mapped to disk, cached, etc...)
Is there an existing library/mechanism/approach/concept that has already been done to accomplish or assist in accomplishing what I'm trying to do?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
首先,听起来您所拥有的不仅仅是“有状态处理器”,而是类似于有限状态机和/或传感器。这可能是一个很好的研究起点。
当然,这里最简单的方法是简单地保留所有先前状态的日志。但由于听起来您拥有大量数据,因此所需的存储很容易变得令人望而却步。我建议考虑如何以可以避免这种情况的方式构建处理器,例如:
显式状态是您的朋友。函数是表示活动状态机的便捷方法,但它们不能以任何简单的方式序列化。您希望将(基本上静态的)处理器网络与每个处理器用于计算下一步的一系列内部状态完全分离。
如果您的大多数处理器类似于有限状态传感器,并且您需要拥有接受各种类型的输入并产生不同类型的输出的处理器,那么您可能想要的实际上是 具有基于
Arrow
s 的结构,它为您提供了组成事物的抽象从某种意义上说,“类似功能”,例如,将一个处理器的输入连接到另一处理器的输出。此外,只要您避免使用 ArrowApply 类并确保您的状态机模型仅返回输出值和新状态,就可以保证避免隐式状态,因为(与函数不同)< code>箭头不会自动更高阶。
考虑到处理器网络的静态表示,它应该提供一个增量输入/输出系统来读取数据、序列化/反序列化状态以及写入任何输出也不是太困难。
作为一个快速、粗略的起点,下面是我上面概述的可能是最简单版本的示例,暂时忽略日志记录问题:
它是一个简单的通用处理器,具有
s,输入类型为
a
,输出类型为b
。 runTransducer 函数推送输入列表,手动更新状态值,并收集输出列表。PS——既然你问的是单子,你可能想知道我给出的例子是否是单子。事实上,它是多个常见单子的组合,尽管具体是哪一个取决于你如何看待它。 但是,我故意避免将其视为单子!问题是,单子仅捕获在某种意义上非常强大的抽象,但同样的能力也使它们在某些方面更能抵抗优化和静态分析。需要排除的主要问题是采用其他处理器作为输入并运行它们的处理器,这(正如您可以想象的)可以创建几乎无法分析的复杂逻辑。
因此,虽然处理器可能是单子,并且在某种逻辑意义上本质上是单子,但假装它们不是单子可能更有用;施加人为限制以使静态分析更简单。
First of all, it sounds like what you have are not just "stateful processors" but something like finite-state machines and/or transducers. This is probably a good place to start for research.
The simplest approach here, of course, is to simply keep a log of all prior states. But since it sounds like you have a great deal of data, the storage needed could easily become prohibitive. I would recommend thinking about how you might construct your processors in a way that could avoid this, e.g.:
Explicit state is your friend. Functions are a convenient way to represent active state machines, but they can't be serialized in any simple way. You want a clean separation of a (basically static) network of processors vs. a series of internal states used by each processor to calculate the next step.
If most of your processors resemble finite state transducers, and you need to have processors that take inputs of various types and produce different types of outputs, what you probably want is actually something with a structure based on
Arrow
s, which gives you an abstraction for things that compose "like functions" in some sense, e.g., connecting the input of one processor to the output of another.Furthermore, as long as you avoid the
ArrowApply
class and make sure that your state machine model only returns an output value and a new state, you'll be guaranteed to avoid implicit state because (unlike functions)Arrow
s aren't automatically higher-order.Given a static representation of your processor network, it shouldn't be too difficult to also provide an incremental input/output system that would read the data, serialize/deserialize the state, and write any output.
As a quick, rough starting point, here's an example of probably the simplest version of what I've outlined above, ignoring the logging issue for the moment:
It's a simple, generic processor with explicit internal state of type
s
, input of typea
, and output of typeb
. TherunTransducer
function shoves a list of inputs through, updating the state value manually, and collects a list of outputs.P.S. -- since you were asking about monads, you might want to know if the example I gave is one. In fact, it's a combination of multiple common monads, though which ones depends on how you look at it. However, I've deliberately avoided treating it as a monad! The thing is, monads capture only abstractions that are in some sense very powerful, but that same power also makes them more resistant in some ways to optimization and static analysis. The main thing that needs to be ruled out is processors that take other processors as input and run them, which (as you can imagine) can create convoluted logic that's nearly impossible to analyze.
So, while the processors probably could be monads, and in some logical sense intrinsically are, it may be more useful to pretend that they aren't; imposing an artificial limitation in order to make static analysis simpler.