Efficiently transferring console data with tar and gzip/bzip2, without creating intermediate files

Posted 2024-11-30 16:02:47

Linux environment. We have a program, 't_show', which when executed with an ID writes the price data for that ID to the console. There is no other way to get this data.

I need to copy the price data for IDs 1-10,000 between two servers, using minimum bandwidth and a minimum number of connections. On the destination server the data should end up as a separate file for each ID, named:

<id>.dat

Something like this would be the long-winded solution:

source:

files=`seq 1 10000`
for id in `echo $files`;
do
    ./t_show $id > $id
done
tar cf - $files | nice gzip -c  > dat.tar.gz

dest:

scp user@source:dat.tar.gz ./
gunzip dat.tar.gz
tar xvf dat.tar

That is, write each output to its own file, tar and compress, send over the network, and extract.

The problem is that I need to create a new file for each ID, which takes up tonnes of space and doesn't scale well.

Is it possible to write the console output directly to a (compressed) tar archive without creating the intermediate files? Any better ideas (maybe writing the compressed data directly across the network, skipping tar)?

As I said, the tar archive would need to extract on the destination server into a separate file for each ID.

Thanks to anyone who takes the time to help.

Comments (6)

南城追梦 2024-12-07 16:02:48

I don't think this will work with a plain bash script. But you could have a look at the Archive::Tar module for Perl or other scripting languages.

The Perl module has an add_data function to create a "file" on the fly and add it to the archive for streaming across the network.

The documentation can be found here:
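
As a rough illustration of that suggestion, here is a minimal sketch using Archive::Tar's add_data, assuming ./t_show prints one ID's data to stdout (the stream_tar.pl script name is made up; the <id>.dat naming comes from the question):

#!/usr/bin/perl
# Sketch only: stream a gzip-compressed tar of all t_show outputs to stdout,
# without creating any intermediate files on disk.
use strict;
use warnings;
use Archive::Tar;

my $tar = Archive::Tar->new;

for my $id (1 .. 10_000) {
    my $data = qx(./t_show $id);        # capture the console output for this ID
    $tar->add_data("$id.dat", $data);   # add it to the in-memory archive as <id>.dat
}

# Write the compressed archive to stdout so it can be piped straight into ssh,
# e.g.:  ./stream_tar.pl | ssh user@dest 'tar xzf -'
$tar->write(\*STDOUT, COMPRESS_GZIP);

Note that Archive::Tar builds the whole archive in memory before writing it out, so all 10,000 outputs are held in RAM at once.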

长梦不多时 2024-12-07 16:02:48

You can do better without tar:

#!/bin/bash
for id in `seq 1 10000`
do
    ./t_show $id
done | gzip

The only difference is that you will not get the boundaries between different IDs.

Now put that in a script, say show_me_the_ids, and run this from the client:

ssh user@source ./show_me_the_ids | gunzip

And there they are!

Alternatively, you can pass the -C flag to ssh to compress the connection and drop the gzip/gunzip steps altogether.

If you are really into it you may try ssh -C, gzip -9 and other compression programs.
Personally, I'd bet on lzma -9.

情痴 2024-12-07 16:02:48

I would try this:

(for ID in $(seq 1 10000); do echo $ID: $(./t_show $ID); done) | ssh user@destination "ImportscriptOrProgram"

This will print "1: ValueOfID1" to standard output, which is transferred via ssh to the destination host, where you can start your import script or program, which reads the lines from standard input.

HTH
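
The receiving end is left open above; here is a minimal sketch of such an import script in Perl, assuming each incoming line has the form "<id>: <data>" as produced by the echo loop (the <id>.dat naming comes from the question):

#!/usr/bin/perl
# Sketch only: read "ID: value" lines from stdin and write each one to <id>.dat.
use strict;
use warnings;

while ( my $line = <STDIN> ) {
    chomp $line;
    next unless $line =~ /^(\d+):\s*(.*)$/;    # split "123: price data..."
    my ( $id, $data ) = ( $1, $2 );
    open my $fh, '>', "$id.dat" or die "cannot write $id.dat: $!";
    print {$fh} "$data\n";
    close $fh;
}

It would run on the destination host in place of "ImportscriptOrProgram" in the pipeline above. Note that the echo $ID: $(./t_show $ID) construct collapses each ID's output onto a single line, so any original line breaks are lost.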

病女 2024-12-07 16:02:48

Thanks all,

I've taken the advice 'just send the data formatted in some way and parse it on the receiver', which seems to be the consensus. I'm skipping tar and using ssh -C for simplicity.

Perl script. It breaks the IDs into groups of 1,000; the IDs are the source_id keys of a hash table. All the data for a group is sent over a single ssh connection, delimited by 'HEADER' lines, so the receiver can write each chunk to the appropriate file. This is a lot more efficient:

sub copy_tickserver_files {
    my $self = shift;

    my $cmd = 'cd tickserver/ ; ';
    my $i   = 1;

    # Build one long remote command per batch of 1000 IDs: each ID contributes
    # an "echo HEADER <id>" marker followed by the output of t_show for that ID.
    while ( my ($source_id, $dest_id) = each( %{ $self->{id_translations} } ) ) {
        $cmd .= qq{ echo HEADER $source_id ; ./t_show $source_id ; };
        $i++;
        if ( $i % 1000 == 0 ) {
            # Run the batch on the source host over a single compressed ssh
            # connection and read its combined output back through a pipe.
            $cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
            $self->copy_tickserver_files_subset( $cmd );
            $cmd = 'cd tickserver/ ; ';
        }
    }

    # Send whatever remains in the final, partial batch.
    $cmd = qq{ssh -C dba\@$self->{source_env}->{tickserver} " $cmd " | };
    $self->copy_tickserver_files_subset( $cmd );
}

sub copy_tickserver_files_subset {
    my $self = shift;
    my $cmd  = shift;

    my $output = '';
    # $cmd ends in '|', so this open reads from the remote command's output.
    open TICKS, $cmd;
    while (<TICKS>) {
        if ( m{HEADER [ ] ([0-9]+) }mxs ) {
            # A new ID starts here: switch output to <tmp_dir>/<id>.ts.
            my $id = $1;
            $output = "$self->{tmp_dir}/$id.ts";
            close TICKSOP;
            open TICKSOP, '>', $output;
            next;
        }
        next unless $output;    # ignore anything before the first HEADER
        print TICKSOP "$_";
    }
    close TICKS;
    close TICKSOP;
}

回梦 2024-12-07 16:02:47

You could just send the data formatted in some way and parse it on the receiver.

foo.sh on the sender:

#!/bin/bash
for (( id = 1; id <= 10000; id++ ))
do
    data="$(./t_show $id)"
    size=$(wc -c <<< "$data")

    echo $id $size
    cat <<< "$data"
done

On the receiver:

ssh -C user@server 'foo.sh' | while read file size; do
    dd of="$file" bs=1 count="$size"
done

ssh -C compresses the data during transfer.

苍白女子 2024-12-07 16:02:47

You can at least tar stuff over an ssh connection:

tar -czf - inputfiles | ssh remotecomputer "tar -xzf -"

How to populate the archive without intermediate files, however, I don't know.

EDIT: Ok, I suppose you could do it by writing the tar file manually. The header is specified here and doesn't seem too complicated, but that isn't exactly my idea of convenient...
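
For the curious, here is a rough Perl sketch of that manual approach: it emits one 512-byte ustar header plus null-padded data per ID and streams the result to stdout, so nothing touches the disk. It reuses the ./t_show and <id>.dat assumptions from the question and is a sketch, not production code (no uname/gname, no error handling):

#!/usr/bin/perl
# Sketch only: hand-roll a ustar archive on stdout, one member per ID.
use strict;
use warnings;

# Build one archive member: a 512-byte ustar header followed by the data,
# null-padded to a multiple of 512 bytes.
sub tar_member {
    my ( $name, $data ) = @_;
    my $header = pack 'a100 a8 a8 a8 a12 a12 a8 a1 a100 a6 a2 a32 a32 a8 a8 a155 x12',
        $name,                              # file name
        sprintf( '%07o', 0644 ),            # mode
        sprintf( '%07o', 0 ),               # uid
        sprintf( '%07o', 0 ),               # gid
        sprintf( '%011o', length $data ),   # size (octal)
        sprintf( '%011o', time ),           # mtime (octal)
        ' ' x 8,                            # checksum placeholder (spaces)
        '0',                                # typeflag: regular file
        '',                                 # linkname
        "ustar\0", '00',                    # magic + version
        '', '', '', '',                     # uname, gname, devmajor, devminor
        '';                                 # prefix
    # The checksum is the byte sum of the header with the checksum field as spaces.
    my $sum = 0;
    $sum += ord for split //, $header;
    substr( $header, 148, 8 ) = sprintf "%06o\0 ", $sum;
    my $pad = ( 512 - length($data) % 512 ) % 512;
    return $header . $data . ( "\0" x $pad );
}

for my $id ( 1 .. 10_000 ) {
    my $data = qx(./t_show $id);
    print tar_member( "$id.dat", $data );
}
print "\0" x 1024;    # two zero blocks mark the end of the archive

Piped through gzip and ssh, e.g. ./handroll_tar.pl | gzip | ssh user@dest 'tar xzf -' (script name made up), this produces the same result as the intermediate-file version, though the Archive::Tar module suggested earlier is considerably less fiddly.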
