Processing large data in order with tbb
I'm working on a C++ app that uses tbb to process large amounts of quote data (e.g. MSFT, AMZN, etc.), and I was wondering how I should structure it. I've been looking at parallel_for, pipeline and concurrent_queue.
The process would basically parse the data, process it and write it to a file. Parsing and processing can be done in parallel, but the output for each symbol must stay in order.
E.g. input:
- Msg #1 - AMZN #1
- Msg #2 - AMZN #2
- Msg #3 - IBM #1
- Msg #4 - AMZN #3
- Msg #5 - CSCO #1
- Msg #6 - IBM #2
I would like to use a lock-free solution or minimal locking, but it seems like I have to keep the data in a concurrent_queue to preserve the order.
Any ideas would be helpful.
Thanks,
David
If you use the pipeline pattern (the tbb::pipeline class or the tbb::parallel_pipeline() function), you can use ordered filters to ensure the output appears in exactly the same order as the input was received. You will not need any locks in your code for ordering.
Does your quote data have either a timestamp or a sequence number?
If not, add a sequence number in the producer thread and, after parsing, sort the data by sequence number. The re-sorting can then be done either in a batch or just before the files are written.
You can create an output structure (a hash or a list) where the key is the position of the element to be output (1st, 2nd, ...) and the value is the data to be output. Then, when all the elements are ready, you can output the structure in the desired order.
This way it doesn't matter which thread finishes first.