Pub/sub with large files as message payloads
We have an existing system that processes a lot of files on an ongoing basis — roughly 3 million files a day, ranging in size from a few kilobytes to in excess of 50 MB. These files go through a few different stages of processing from the time they are received to when they are finished being consumed, depending on the path they take. Due to the content and format of these files, they can NOT be broken up into smaller chunks.
Currently, the workflow these files move through is rigid and dictated by code with fixed inputs and outputs (in many cases, one subscriber becomes the publisher for a new set of files). This lack of flexibility is starting to cause us issues, however, so I'm looking at some kind of pub/sub solution to handle new requirements.
Most traditional pub/sub solutions have the data within the actual payload, but the large potential file sizes exceed the limits of many messaging platforms. Furthermore, we have multiple platforms in play: files progress through both Linux and Windows tiers depending on their path.
Does anyone have any design and/or implementation recommendations with the following goals in mind?
1. Multiplatform for both pub and sub (Linux and Windows)
2. Persistent storage/store-and-forward support
3. Can handle large event payloads and appropriately cleans up once all subscribers have been serviced
4. Routing/workflow is done via configuration
5. Subscribers can subscribe to a filtered set of published events based on changing criteria (e.g. only give me files of a specific type)
I've done a bunch of digging into a number of service bus and MQ implementations, but haven't quite been able to firm up enough of a design approach to properly evaluate what tools make the most sense. Thanks for any input.
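To make requirements #4 and #5 concrete, here is a minimal sketch of what configuration-driven routing with metadata-based subscription filters could look like. This is purely illustrative — the route table, the `file_type` field, and the subscriber names are all invented — and a real broker would evaluate filters on message properties (e.g. JMS selectors) rather than in application code:

```python
# Hypothetical routing config: each subscription names a destination and a
# filter over message metadata. Routing lives in data, not in code, so new
# workflows mean editing this table rather than redeploying publishers.
ROUTES = [
    {"subscriber": "ocr-service",  "filter": {"file_type": "tiff"}},
    {"subscriber": "virus-scan",   "filter": {}},                    # empty filter matches everything
    {"subscriber": "pdf-indexer",  "filter": {"file_type": "pdf"}},
]

def matches(meta: dict, flt: dict) -> bool:
    """A message matches a filter if every filter key equals the metadata value."""
    return all(meta.get(k) == v for k, v in flt.items())

def route(meta: dict) -> list:
    """Return the subscribers whose filters match this event's metadata."""
    return [r["subscriber"] for r in ROUTES if matches(meta, r["filter"])]

# The event carries only a reference to the file, never its bytes.
event = {"path": r"\\fileserver\inbound\batch-0001.tiff", "file_type": "tiff"}
print(route(event))  # ['ocr-service', 'virus-scan']
```

Because the filter operates on metadata attached to the event (not the payload), it works equally well when the payload is a 50 MB file sitting on shared storage.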
A1. I developed a similar system at a previous job.
We didn't pass the multi-MB payload inside the message; instead we stored it on a file server and passed only the UNC file name (the messaging layer was Java RMI, but pretty much anything would work).
A2. I recently started using Windows Communication Foundation. Fortunately for me, I only support Windows and I don't need messages that large. However, the documentation says the protocol is platform-independent, and there's the option to pass large chunks of data using its streamed message transfer feature.
In both cases, I think you'll have to fulfill your #4 and #5 requirements in your own code.
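The pass-a-reference approach in A1 is the classic claim-check pattern, and it also gives you a natural hook for the cleanup in requirement #3: count how many subscribers still need the file and delete it when the count hits zero. A toy sketch (illustrative only — the store layout, method names, and in-memory ref counting are all invented; a real system would persist the counts):

```python
import os
import shutil
import tempfile
from pathlib import Path

class FileStore:
    """Toy claim-check store: the message carries a path, not the payload."""

    def __init__(self, root: Path):
        self.root = root
        self.refs = {}  # path -> number of subscribers still to be serviced

    def check_in(self, src: Path, subscriber_count: int) -> str:
        """Copy the payload into shared storage; return the reference to publish."""
        dst = self.root / src.name
        shutil.copy2(src, dst)
        self.refs[str(dst)] = subscriber_count
        return str(dst)

    def check_out(self, ref: str) -> bytes:
        """A subscriber reads the payload; the last one out deletes the file."""
        data = Path(ref).read_bytes()
        self.refs[ref] -= 1
        if self.refs[ref] == 0:
            os.remove(ref)
            del self.refs[ref]
        return data

# Usage: two subscribers consume the same event; the file survives the
# first check_out and is removed after the second.
with tempfile.TemporaryDirectory() as d:
    d = Path(d)
    (d / "store").mkdir()
    src = d / "in.bin"
    src.write_bytes(b"big payload")
    store = FileStore(d / "store")
    ref = store.check_in(src, subscriber_count=2)  # publish `ref`, not the bytes
    store.check_out(ref)                           # subscriber 1 done; file kept
    store.check_out(ref)                           # subscriber 2 done; file deleted
    assert not Path(ref).exists()
```

Note the subscriber count has to be fixed at check-in time (or tracked by the broker), which interacts with requirement #5: with dynamic filtered subscriptions, "all subscribers serviced" is only knowable by whatever component evaluates the filters.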
You may want to look into ActiveMQ if your clients are internal. ActiveMQ supports messages up to 2 GB (I think) and also supports blob messages. It guarantees delivery and processing (via transactions).
Hope this helps.
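What the "guaranteed delivery" above buys you (requirement #2, store-and-forward) can be illustrated with a toy persistent queue — this is a sketch of the semantics, not of how ActiveMQ is actually implemented: a message is written to durable storage before delivery and removed only on acknowledgment, so an unacked message survives a restart and can be redelivered.

```python
import json
import tempfile
import uuid
from pathlib import Path

class DiskQueue:
    """Toy store-and-forward queue: persist on publish, delete on ack."""

    def __init__(self, dirpath: Path):
        self.dir = dirpath
        self.dir.mkdir(parents=True, exist_ok=True)

    def publish(self, msg: dict) -> str:
        msg_id = uuid.uuid4().hex
        (self.dir / f"{msg_id}.json").write_text(json.dumps(msg))
        return msg_id

    def pending(self) -> list:
        """Message ids still awaiting acknowledgment."""
        return sorted(p.stem for p in self.dir.glob("*.json"))

    def read(self, msg_id: str) -> dict:
        return json.loads((self.dir / f"{msg_id}.json").read_text())

    def ack(self, msg_id: str) -> None:
        (self.dir / f"{msg_id}.json").unlink()

# Usage: an unacked message survives a "restart" (a fresh queue instance
# over the same directory) and is still deliverable afterward.
with tempfile.TemporaryDirectory() as d:
    q = DiskQueue(Path(d))
    mid = q.publish({"file": r"\\fileserver\inbound\a.tiff", "file_type": "tiff"})
    q2 = DiskQueue(Path(d))          # simulate broker restart
    assert q2.pending() == [mid]     # message was not lost
    assert q2.read(mid)["file_type"] == "tiff"
    q2.ack(mid)                      # consumer confirms processing
    assert q2.pending() == []
```

Combined with the claim-check approach from A1, only these small reference messages need broker persistence; the large payloads stay on the file server.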