Efficient transfer of console data, tar and gzip/bzip2 without creating intermediate files
Linux environment. We have a program, 't_show', which, when executed with an ID, writes the price data for that ID to the console. There is no other way to get this data.
I need to copy the price data for IDs 1-10,000 between two servers, using minimum bandwidth and a minimum number of connections. On the destination server the data should end up as a separate file for each ID, named:
<id>.dat
Something like this would be the long-winded solution:
source:
files=$(seq 1 10000)
for id in $files
do
    ./t_show "$id" > "$id"
done
tar cf - $files | nice gzip -c > dat.tar.gz
dest:
scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar
That is, write each output to its own file, compress & tar, send over network, extract.
It has the problem that I need to create a new file for each id. This takes up tonnes of space and doesn't scale well.
Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing compressed data directly across network, skipping tar)?
As I said, the archive would need to extract on the destination server into a separate file for each ID.
Thanks to anyone who takes the time to help.
Comments (6)
I don't think this can be done with a plain bash script. But you could have a look at the Archive::Tar module for Perl, or the equivalent in other scripting languages. The Perl module has a function, add_data, that creates a "file" on the fly and adds it to the archive, which can then be streamed across the network. The documentation is found here:
You can do better without tar:
The only difference is that you will not get the boundaries between different IDs.
Now put that in a script, say show_me_the_ids, and do this from the client:
And there they are!
Alternatively, you can specify the -C flag to compress the SSH connection and remove the gzip / gunzip calls altogether. If you are really into it, you may try ssh -C, gzip -9 and other compression programs. Personally, I'd bet on lzma -9.
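The commands for this answer did not survive on the page, so what follows is only a minimal sketch of the idea it describes: print every ID's data back to back and compress the stream. The t_show function is a stand-in for the real program, its output format is invented, and user@source is a placeholder host.

```shell
#!/bin/sh
# Stand-in for the real ./t_show, which is not available here.
# Its output format is an assumption made for this sketch.
t_show() {
    printf 'price data for id %s\n' "$1"
}

# Print the data for every ID from $1 to $2, back to back.
# As the answer notes, the boundaries between IDs are lost.
show_me_the_ids() {
    id=$1
    while [ "$id" -le "$2" ]; do
        t_show "$id"
        id=$((id + 1))
    done
}

# Local round trip: compress, then decompress, in one pipeline.
# Between two hosts the same idea might look like this:
#   ssh user@source './show_me_the_ids | gzip -9' | gunzip > all_ids.dat
#   ssh -C user@source './show_me_the_ids' > all_ids.dat   # ssh compresses instead
show_me_the_ids 1 3 | gzip -9 | gunzip
```

Because nothing marks where one ID ends and the next begins, this form only fits if the receiver does not need per-ID files or the data itself carries the ID.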
I would try this:
This will print "1: ValueOfID1" to standard output, which is transferred via ssh to the destination host, where you can start your import script or program, which reads the lines from standard input.
HTH
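The command this answer referred to is missing from the page. As a rough sketch of the "1: ValueOfID1" idea, the sender below prefixes every output line with its ID, and the receiver splits on the first ": " to rebuild <id>.dat. The function names and the t_show stand-in are hypothetical.

```shell
#!/bin/sh
# Stand-in for the real ./t_show (assumption: it may print several lines per ID).
t_show() {
    printf 'open %s.10\nclose %s.20\n' "$1" "$1"
}

# Sender: prefix every output line with "<id>: ".
send_prefixed() {
    id=$1
    while [ "$id" -le "$2" ]; do
        t_show "$id" | sed "s/^/$id: /"
        id=$((id + 1))
    done
}

# Receiver: split each line at the first ": " and append the
# remainder to <outdir>/<id>.dat.
receive_prefixed() {
    outdir=$1
    mkdir -p "$outdir"
    while IFS= read -r line; do
        id=${line%%: *}
        printf '%s\n' "${line#*: }" >> "$outdir/$id.dat"
    done
}
```

Locally, `send_prefixed 1 3 | receive_prefixed ./out` exercises both halves. Across hosts, the sender half would run behind something like `ssh -C user@source` with the receiver reading from that pipe. Repeating the ID on every line costs a few bytes per line, which compression largely absorbs.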
Thanks all,
I've taken the advice 'just send the data formatted in some way and parse it on the receiver', which seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.
Perl script. It breaks the IDs into groups of 1000. The IDs are source_id in a hash table. All data is sent via a single ssh connection, delimited by 'HEADER', so that it is written to the appropriate file. This is a lot more efficient:
You could just send the data formatted in some way and parse it on the receiver.
foo.sh on the sender:
On the receiver:
ssh -C compresses the data during transfer.
You can at least tar stuff over an ssh connection:
How to populate the archive without intermediary files, however, I don't know.
EDIT: OK, I suppose you could do it by writing the tar file manually. The header is specified here and doesn't seem too complicated, but that isn't exactly my idea of convenient...