Mac Unix script question

Posted on 2024-08-22 07:29:13

I'm trying to write a script that breaks up a VERY large file into smaller pieces that are then sent to a script that runs in the background. The motivation is that if the script runs in the background, I can process the pieces in parallel.

Here is my code. ./seq works just like the normal seq command (which Mac doesn't have), and $1 is the huge file to be split.

echo "Splitting and Running Script"

for i in $(./seq 0 14000000 500000)
do
   awk ' { if (NR>='$i' && NR<'$(($i+500000))') { print $0 > "xPart'$i'" }  }' $1 
   python FastQ2Seq.py xPart$i &
done

wait

echo "Concatenating"

for k in *.out.seq
do
cat $k >> original.seq
done

for j in *.out.qul
do
cat $j >> original.qul
done

echo "Cleaning"
rm xPart*

My problem is that only xPart0 is made, and it only has 499995 lines in it before the program hangs. I put some debugging echoes in the script, and I know the awk statement is what stops the script. I just can't figure out what's going wrong.

Comments (3)

帅气称霸 2024-08-29 07:29:13

Check out the split command --

  split -- split a file into pieces

  Output  fixed-size  pieces of INPUT to PREFIXaa, PREFIXab, ...; default
  size is 1000 lines, and default PREFIX is `x'.  With no INPUT, or  when
  INPUT is -, read standard input.

Should be much faster, more reliable, and cleaner than running awk in a loop!
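
For example, a minimal sketch of what the splitting step could look like with split instead of the awk loop, assuming the 500000-line chunk size and the FastQ2Seq.py call from the question (split appends its own aa, ab, ... suffixes to the prefix, so the pieces come out as xPartaa, xPartab, ...):

echo "Splitting and Running Script"
split -l 500000 "$1" xPart       # creates xPartaa, xPartab, ...
for piece in xPart*
do
    python FastQ2Seq.py "$piece" &
done
wait    # let every background job finish before concatenating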

国粹 2024-08-29 07:29:13
echo "Splitting and Running Script"
# splits to smaller files each 50000 lines, if i understand your problem correctly
awk 'NR%50000==1{++c}{print $0 > "xPart"c".txt"}' file
# or use split -l 50000 
for file in xPart*
do
    python FastQ2Seq.py "$file" &
done
echo "Concatenating"
cat *.out.seq >> original.seq
cat *.out.qul >> original.qul
echo "Splitting and Running Script"
# splits to smaller files each 50000 lines, if i understand your problem correctly
awk 'NR%50000==1{++c}{print $0 > "xPart"c".txt"}' file
# or use split -l 50000 
for file in xPart*
do
    python FastQ2Seq.py "$file" &
done
echo "Concatenating"
cat *.out.seq >> original.seq
cat *.out.qul >> original.qul
煮酒 2024-08-29 07:29:13

If your seq truly works like the standard seq, you're calling it wrong. The proper command line for seq is:

seq FIRST INCREMENT LAST

So you would need to change your seq command line to:

seq 0 500000 14000000
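
For reference, here is a quick way to see the difference, assuming ./seq takes FIRST INCREMENT LAST arguments like GNU seq:

./seq 0 14000000 500000    # increment 14000000 overshoots the limit 500000, so only 0 is printed
./seq 0 500000 14000000    # prints 0, 500000, 1000000, ..., 14000000

With the original argument order the loop body runs only once, which is why only xPart0 ever gets created.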