Infinitely long pipes
There are two really simple ways to let one program send a stream of data to another:
- A Unix pipe, a TCP socket, or something similar. This requires constant attention from the consumer program, or the producer program will block. Even after increasing the buffers from their typically tiny defaults, this is still a huge problem.
- Plain files: the producer program appends with O_APPEND, and the consumer reads whatever new data has become available at its convenience. This doesn't require any synchronization (as long as disk space is available), but Unix files only support truncating at the end, not at the beginning, so the file will keep filling up the disk until both programs quit.
Is there a simple way to have it both ways, with data stored on disk until it gets read and then freed? Obviously the programs could communicate via a database server or something like that and not have this problem, but I'm looking for something that integrates well with normal Unix piping.
Comments (4)
A relatively simple hand-rolled solution:
You could have the producer create a file and keep writing to it until it reaches a certain size or number of records, whatever suits your application. The producer then closes the file and starts a new one, named according to an agreed naming scheme.
The consumer reads new records from a file; when it reaches the agreed maximum size, it closes and unlinks that file and opens the next one.
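A minimal sketch of that scheme in Python, treating the data as a plain byte stream so that every finished segment is exactly the agreed size. The names `segment_path`, `Producer`, `Consumer`, the `segment.NNNNNNNN` naming scheme, and the 1 MiB default are all assumptions made for illustration, not part of any existing tool.

```python
import os

SEGMENT_BYTES = 1 << 20  # agreed maximum segment size (assumed value)


def segment_path(directory, index):
    # Agreed naming scheme (hypothetical): segment.00000000, segment.00000001, ...
    return os.path.join(directory, f"segment.{index:08d}")


class Producer:
    def __init__(self, directory, segment_bytes=SEGMENT_BYTES):
        self.directory = directory
        self.segment_bytes = segment_bytes
        self.index = 0
        self.written = 0
        self.file = open(segment_path(directory, 0), "ab")

    def write(self, data):
        while data:
            # Fill the current segment up to the agreed size, no further.
            room = self.segment_bytes - self.written
            chunk, data = data[:room], data[room:]
            self.file.write(chunk)
            self.file.flush()
            self.written += len(chunk)
            if self.written == self.segment_bytes:
                # Segment is full: close it and start the next one.
                self.file.close()
                self.index += 1
                self.written = 0
                self.file = open(segment_path(self.directory, self.index), "ab")


class Consumer:
    def __init__(self, directory, segment_bytes=SEGMENT_BYTES):
        self.directory = directory
        self.segment_bytes = segment_bytes
        self.index = 0
        self.offset = 0
        self.file = None

    def read(self):
        """Return whatever new bytes are available right now (possibly b"")."""
        out = []
        while True:
            if self.file is None:
                try:
                    # Unbuffered, so data appended later is seen by later reads.
                    self.file = open(segment_path(self.directory, self.index),
                                     "rb", buffering=0)
                except FileNotFoundError:
                    break  # producer has not created this segment yet
            data = self.file.read(self.segment_bytes - self.offset)
            self.offset += len(data)
            if data:
                out.append(data)
            if self.offset < self.segment_bytes:
                break  # caught up with the producer for now
            # Whole segment consumed: free its disk space and move on.
            self.file.close()
            os.unlink(segment_path(self.directory, self.index))
            self.file = None
            self.index += 1
            self.offset = 0
        return b"".join(out)
```

Disk usage is then bounded by the unread backlog plus at most one partially read segment. The consumer still has to poll (or use something like inotify) to learn that new data has arrived.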
If your data can be split into blocks or transactions of some sort, you can use the file method with a serial number. The data producer would store the first megabyte of data in outfile.1, the next in outfile.2, and so on. The consumer can read the files in order and delete each one once it has been read. That gives you something like your second method, with cleanup along the way. You should probably wrap all this in a library, so that from the application's point of view it looks like a pipe of some sort.
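One way the library side of this might look, as a rough Python sketch. The helper names `put_block` and `get_blocks` are made up for illustration, and writing to a `.tmp` name before renaming is an added assumption (not something stated above) so that the consumer never sees a half-written block.

```python
import os


def put_block(directory, serial, data):
    """Store one block as outfile.<serial>, publishing it in a single step."""
    final = os.path.join(directory, f"outfile.{serial}")
    tmp = final + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
    # rename() is atomic on POSIX when both names are on the same filesystem,
    # so outfile.<serial> only ever appears with its complete contents.
    os.rename(tmp, final)


def get_blocks(directory, serial=1):
    """Yield (serial, data) for each consecutive block available, deleting it once read."""
    while True:
        path = os.path.join(directory, f"outfile.{serial}")
        try:
            with open(path, "rb") as f:
                data = f.read()
        except FileNotFoundError:
            return  # next block has not been published yet
        os.unlink(path)  # free the disk space as soon as the block is read
        yield serial, data
        serial += 1
```

The consumer has to remember the last serial number it processed and pass the next one to `get_blocks` when it polls again.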
You should read some documentation on socat. You can use it to bridge the gap between TCP sockets, FIFO files, pipes, stdio, and others.
If you're feeling lazy, there are some nice examples of useful commands.
I'm not aware of anything, but it shouldn't be too hard to write a small utility that takes a directory as an argument (or uses $TMPDIR) and uses select/poll to multiplex between reading from stdin, paging to a series of temporary files, and writing to stdout.
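A rough sketch of what such a utility could look like in Python, assuming a Unix system where `select` works on pipes. The function name `spool`, the `spool-` directory prefix, and the one-file-per-chunk layout are illustrative choices, not an existing tool.

```python
import os
import select
import tempfile

CHUNK = 65536  # bytes read from the input per step (assumed value)


def spool(in_fd, out_fd, directory=None):
    """Copy in_fd to out_fd, parking unread data in temporary files on disk."""
    # tempfile.gettempdir() honours $TMPDIR when no directory is given.
    spool_dir = tempfile.mkdtemp(prefix="spool-",
                                 dir=directory or tempfile.gettempdir())
    os.set_blocking(out_fd, False)  # never stall on a slow consumer
    head = tail = 0   # chunk files numbered head..tail-1 are on disk
    pending = b""     # rest of the oldest chunk, not yet written out
    eof = False
    while not eof or pending or head < tail:
        readers = [] if eof else [in_fd]
        writers = [out_fd] if (pending or head < tail) else []
        readable, writable, _ = select.select(readers, writers, [])
        if readable:
            data = os.read(in_fd, CHUNK)
            if data:
                # Page the new chunk out to its own temporary file.
                with open(os.path.join(spool_dir, str(tail)), "wb") as f:
                    f.write(data)
                tail += 1
            else:
                eof = True  # producer closed its end
        if writable:
            if not pending:
                # Load the oldest chunk and release its disk space.
                path = os.path.join(spool_dir, str(head))
                with open(path, "rb") as f:
                    pending = f.read()
                os.unlink(path)
                head += 1
            try:
                n = os.write(out_fd, pending)
                pending = pending[n:]  # keep whatever did not fit
            except BlockingIOError:
                pass  # consumer not ready after all; retry next round
    os.rmdir(spool_dir)
```

Calling `spool(0, 1)` from a small script would let it sit in the middle of an ordinary shell pipeline, so the producer only ever waits for the disk, never for the consumer. Note that `os.set_blocking` changes the mode of the underlying open file, which other processes sharing the same stdout would also see.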