C dataflow library
How can I do dataflow (pipes and filters, stream processing, flow based) in C? And not with UNIX pipes.
I recently came across stream.py.
Streams are iterables with a pipelining mechanism to enable data-flow programming and easy parallelization.
The idea is to take the output of a function that turns an iterable into another iterable and plug that as the input of another such function. While you can already do this using function composition, this package provides an elegant notation for it by overloading the >> operator.
I would like to duplicate a simple version of this kind of functionality in C. I particularly like the overloading of the >> operator to avoid the mess of nested function composition. Wikipedia points to this hint from a Usenet post in 1990.
Why C? Because I would like to be able to do this on microcontrollers and in C extensions for other high level languages (Max, Pd*, Python).
* (ironic given that Max and Pd were written, in C, specifically for this purpose – I'm looking for something barebones)
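For concreteness, here is a minimal sketch of the kind of interface I have in mind, in plain C. The `stream` struct and the stage names are just illustrative, not from any existing library, and since C has no operator overloading the `>>` chaining has to be spelled out by hand:

```c
#include <stdbool.h>
#include <stdio.h>

/* Illustrative iterator protocol: each stage pulls one value at a time
 * from the stage upstream of it. */
typedef struct stream {
    struct stream *upstream;                 /* NULL for a source stage        */
    bool (*next)(struct stream *, int *out); /* yields one value, false at end */
    int pos, limit;                          /* scratch fields for the source  */
} stream;

/* Source stage: counts 0 .. limit-1. */
static bool counter_next(stream *s, int *out) {
    if (s->pos >= s->limit) return false;
    *out = s->pos++;
    return true;
}

/* Filter stage: squares whatever the upstream stage produces. */
static bool square_next(stream *s, int *out) {
    int v;
    if (!s->upstream->next(s->upstream, &v)) return false;
    *out = v * v;
    return true;
}

int main(void) {
    stream src = { .next = counter_next, .limit = 5 };
    stream sq  = { .upstream = &src, .next = square_next }; /* "src >> square", wired by hand */

    int v;
    while (sq.next(&sq, &v))
        printf("%d\n", v);   /* prints 0 1 4 9 16 */
    return 0;
}
```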
3 Answers
I know that it's not a good answer, but you should make your own simple dataflow framework.
I've written a prototype DF server (together with a friend of mine) that still has several unimplemented features: it can only pass Integer and Trigger data in messages, and it does not support parallelism. I simply skipped that work: each component's producer port has a list of function pointers to consumer ports, set up at initialization, and it calls them (if the list is not empty). So, when an event fires, the components perform a tree-like walk of the dataflow graph. As they only work with Integers and Triggers, it's extremely quick.
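A minimal sketch of that wiring, with hypothetical `producer_port` and callback names (not the actual code of that prototype), could look like this:

```c
#include <stddef.h>

#define MAX_CONSUMERS 8

/* A consumer port is reached through a plain function pointer plus the
 * component it belongs to. */
typedef void (*consumer_fn)(void *component, int value);

typedef struct {
    consumer_fn consumers[MAX_CONSUMERS]; /* filled in at initialization              */
    void       *targets[MAX_CONSUMERS];   /* the component each consumer belongs to   */
    size_t      count;
} producer_port;

/* Wiring helper used during initialization. */
static void producer_connect(producer_port *p, consumer_fn fn, void *target) {
    if (p->count < MAX_CONSUMERS) {
        p->consumers[p->count] = fn;
        p->targets[p->count]   = target;
        p->count++;
    }
}

/* Called when a component fires: push the value to every wired consumer,
 * which continues the tree-like walk of the graph from this producer. */
static void producer_fire(producer_port *p, int value) {
    for (size_t i = 0; i < p->count; ++i)
        p->consumers[i](p->targets[i], value);
}
```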
I've also written a strange component that has one consumer and one producer port and simply passes the data through, but in another thread. Its consumer routine finishes quickly, as it just stores the data and sets a flag for the producer-side thread. Dirty, but it suits my needs: it detaches long processing from the tree walk.
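A rough sketch of that hand-off, using POSIX threads and a one-slot mailbox (the names and the condition-variable approach are my own assumptions, not the original code):

```c
#include <pthread.h>
#include <stdbool.h>

/* One-slot mailbox: the consumer routine drops a value here and returns
 * immediately; a worker thread picks it up and drives the producer side. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  ready;
    int             value;
    bool            has_value;
} mailbox;

static mailbox mb = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0, false };

/* Runs on the graph-walk thread: cheap, just store the data and signal. */
static void detach_consume(int value) {
    pthread_mutex_lock(&mb.lock);
    mb.value = value;
    mb.has_value = true;
    pthread_cond_signal(&mb.ready);
    pthread_mutex_unlock(&mb.lock);
}

/* Runs on the worker thread: wait for a value, then do the long work. */
static void *detach_worker(void *arg) {
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&mb.lock);
        while (!mb.has_value)
            pthread_cond_wait(&mb.ready, &mb.lock);
        int v = mb.value;
        mb.has_value = false;
        pthread_mutex_unlock(&mb.lock);
        (void)v; /* long-running downstream processing happens here, off the walk */
    }
    return NULL;
}
```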
So, as you may have recognized, it's a low-traffic asynchronous system for quick tasks, where the graph size does not matter.
Unfortunately, your problem differs from mine in as many points as one dataflow system can differ from another: you need a synchronous, parallel, stream-handling solution.
I think the biggest issue in a DF server is the dispatcher. Concurrency, collisions, threads, priorities... as I said, I just skipped the problem rather than solving it. You should skip it, too. And you should also skip other problems.
Dispatcher
In a synchronous DF architecture, all components must run once per cycle, except in special cases. They have a simple precondition: is the input data available? So you should just scan through the components and pass each one to a free caller thread if its data is available. After going through all of them, you will have N remaining components which haven't been processed. Process the list again. After the second pass you will have M remaining. If N == M, the cycle is over.
I think something along these lines will work as long as the number of components stays below roughly 100. A sketch of that dispatch loop follows.
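Here is a single-threaded sketch of that sweep-until-no-progress loop; the `ready`/`run` callbacks and the `component` layout are my own assumptions:

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct component {
    bool (*ready)(struct component *);  /* is the input data available? */
    void (*run)(struct component *);    /* process one message          */
    bool processed;                     /* already ran this cycle?      */
} component;

/* One cycle: keep sweeping the component list until a sweep makes no
 * progress (the remaining count stops shrinking, i.e. N == M). */
static void run_cycle(component **comps, size_t n) {
    for (size_t i = 0; i < n; ++i)
        comps[i]->processed = false;

    size_t remaining = n, prev;
    do {
        prev = remaining;
        remaining = 0;
        for (size_t i = 0; i < n; ++i) {
            component *c = comps[i];
            if (c->processed)
                continue;
            if (c->ready(c)) {
                c->run(c);      /* in a parallel version, hand off to a free worker thread */
                c->processed = true;
            } else {
                remaining++;
            }
        }
    } while (remaining > 0 && remaining < prev);
}
```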
Binding
Yep, the best way of binding is visual programming. Until the editor is finished, config-like code should be used instead, something like the sketch below:
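(The original listing is missing here; the following is a hypothetical reconstruction of such a config-style binding table, with made-up component names.)

```c
/* Config-style wiring: each entry binds one producer port to one consumer
 * port by name. "adc", "filter" and "dac" are illustrative components. */
typedef struct {
    const char *producer;   /* "component.port" of the data source */
    const char *consumer;   /* "component.port" of the data sink   */
} binding_cfg;

static const binding_cfg bindings[] = {
    { "adc.out",    "filter.in" },
    { "filter.out", "dac.in"    },
};
```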
It's easy to write and well readable; what more could you wish for?
Message
You should pass pure raw data packets between components' ports. You only need a list of bindings, each containing a pair of pointers to the producer and consumer ports, plus a processed flag that the "dispatcher" uses.
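In C that binding list can be as simple as the following (again with hypothetical type names):

```c
#include <stdbool.h>

typedef struct port port;   /* opaque producer/consumer port type */

/* One edge of the dataflow graph: a raw packet travels from *producer
 * to *consumer, and the dispatcher marks the edge once it has been served. */
typedef struct {
    port *producer;
    port *consumer;
    bool  processed;   /* set by the dispatcher within the current cycle */
} binding;
```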
Calling issue
The problem is that the producer should not call the consumer port but the component; all component (class) variables and firing logic live in the component. So the producer should either call the component's common entry point directly, passing the consumer port's ID to it, or it should call the port, which in turn calls a method of the component it belongs to.
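A sketch of the entry-point variant (hypothetical names again):

```c
/* Every component exposes one common entry point; the producer tells it
 * which of its consumer ports the packet is aimed at. */
typedef struct component component;

typedef void (*entry_fn)(component *self, int consumer_port_id, const void *packet);

struct component {
    entry_fn receive;   /* common entry point                */
    void    *state;     /* component-private variables, etc. */
};

/* Producer side: deliver a packet to a specific consumer port of a component. */
static void deliver(component *target, int consumer_port_id, const void *packet) {
    target->receive(target, consumer_port_id, packet);
}
```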
So, if you can live with some restrictions, I say go ahead and write your lite framework. It's a good task, and writing small components and seeing how smartly they can be wired together into a great app is the ultimate fun.
If you have further questions, feel free to ask; I often scan the "dataflow" keyword here.
Possibly you can figure out a simpler dataflow-ish model for your program.
This is cool: http://code.google.com/p/libconcurrency/
I'm not aware of any library for such a purpose. A friend of mine implemented something similar at university as a lab assignment. The main problems of such systems are low performance (really bad if the functions in long pipelines are smallish) and the potential need to implement scheduling (detecting deadlocks and boosting priorities to avoid overloading pipe buffers).
From my experience with similar data processing, error handling is quite burdensome. Since functions in the pipeline know little of the context (intentionally, for reusability), they can't produce sensible error messages. One can implement in-line error handling, passing errors down the pipe as data, but that requires special handling all over the place, especially at the output, since with streams it is not possible to correlate an error with the input it corresponds to.
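One way to pass errors down the pipe as data is a tagged item type, roughly like this (a hypothetical sketch, not from any particular library):

```c
/* Each item flowing through the pipe is either a value or an error that
 * downstream stages simply forward untouched. */
typedef enum { ITEM_VALUE, ITEM_ERROR } item_kind;

typedef struct {
    item_kind   kind;
    int         value;     /* valid when kind == ITEM_VALUE */
    const char *error;     /* valid when kind == ITEM_ERROR */
} item;

/* Example stage: doubles values, forwards errors as-is (the "special
 * handling" every stage ends up needing). */
static item double_stage(item in) {
    if (in.kind == ITEM_ERROR)
        return in;
    return (item){ ITEM_VALUE, in.value * 2, 0 };
}
```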
Considering the known performance problems of the approach, it is hard for me to imagine how it would fit microcontrollers. Performance-wise, nothing beats a plain function: one can create a function for every path through the data pipeline.
You could probably look for a Petri net implementation (simulator or code generator), as Petri nets are one of the theoretical bases for streams.