Splitting a file and its lines under Linux/bash

Posted 2024-07-04 12:09:13 · 151 characters · 15 views · 0 comments

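I have a rather large file (150 million lines of 10 characters each). I need to split it into 150 files of 2 million lines each, where each output line is either the first 5 or the last 5 characters of the source line. I could do this in Perl rather quickly, but I was wondering whether there is an easy solution using bash. Any ideas?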


顾北清歌寒 2024-07-11 12:09:14

I think that something like this could work:

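out_file=1
out_pairs=0
# in_file must hold the path of the input file
while IFS= read -r line; do
    # start a new output file after 1,000,000 source lines (2,000,000 output lines)
    if [ "$out_pairs" -ge 1000000 ]; then
        out_file=$((out_file + 1))
        out_pairs=0
    fi
    printf '%s\n' "${line%?????}" >> "out${out_file}"   # first 5 characters
    printf '%s\n' "${line#?????}" >> "out${out_file}"   # last 5 characters
    out_pairs=$((out_pairs + 1))
done < "$in_file"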

Not sure if it's simpler or more efficient than using Perl, though.
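A pure-bash read loop tends to be slow on 150 million lines, so here is a rough sketch of the same logic done in a single awk pass. This swaps in awk for the shell loop; the function name `split_halves` and its three arguments are made up for illustration, and the input is assumed to consist of 10-character lines.

```shell
#!/usr/bin/env bash
# split_halves IN_FILE PAIRS_PER_FILE PREFIX   (hypothetical helper)
# Writes the first and last 5 characters of each line as two output lines,
# starting a new output file (PREFIX1, PREFIX2, ...) every PAIRS_PER_FILE
# source lines.
split_halves() {
    local in_file=$1 pairs_per_file=$2 prefix=$3
    awk -v n="$pairs_per_file" -v prefix="$prefix" '
        (NR - 1) % n == 0 {
            if (out != "") close(out)   # do not keep finished files open
            file_no++
            out = prefix file_no
        }
        {
            print substr($0, 1, 5) > out
            print substr($0, 6, 5) > out
        }
    ' "$in_file"
}
```

Something like `split_halves "$in_file" 1000000 out` would then produce out1, out2, ... with 2,000,000 lines each, matching the naming of the loop above.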



难忘№最初的完美 2024-07-11 12:09:14

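A variant for the first five characters of each line, assuming the large file is called x.txt and that files named x.txt.* can be created in the current directory: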

split -l 2000000 x.txt x.txt.out && (for splitfile in x.txt.out*; do outfile="${splitfile}.firstFive"; echo "$splitfile -> $outfile"; cut -c 1-5 "$splitfile" > "$outfile"; done)
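The question also asks for the last five characters, which the same approach can cover with `cut -c 6-10`. The sketch below assumes the chunks have already been produced by `split`; the helper name `halves_per_chunk` and the `.lastFive` suffix are invented here, not part of the answer.

```shell
#!/usr/bin/env bash
# halves_per_chunk CHUNK...   (hypothetical helper)
# For every already-split chunk file, write CHUNK.firstFive and CHUNK.lastFive.
halves_per_chunk() {
    local chunk
    for chunk in "$@"; do
        cut -c 1-5  "$chunk" > "${chunk}.firstFive"   # characters 1 to 5
        cut -c 6-10 "$chunk" > "${chunk}.lastFive"    # characters 6 to 10
    done
}

# Possible usage, with the file names assumed above:
#   split -l 2000000 x.txt x.txt.out && halves_per_chunk x.txt.out*
```

Note that this keeps the two halves in separate files rather than interleaving them, which may or may not be what the question intends.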


゛清羽墨安 2024-07-11 12:09:14

Why not just use the native Linux split command?

split -d -l 999999 input_filename

This will output new split files with names like x00 x01 x02...

For more information, see the manual:

man split
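The command above only cuts the file by line count; it does not halve the lines. For the roughly 150 output files in the question, a fixed three-digit numeric suffix keeps the names uniform and sortable. A minimal sketch, assuming GNU split (the `-d` option is not in POSIX) and a made-up wrapper name `split_numbered`:

```shell
#!/usr/bin/env bash
# split_numbered IN_FILE LINES_PER_FILE PREFIX   (hypothetical wrapper)
# Produces PREFIX000, PREFIX001, ... with LINES_PER_FILE lines each.
split_numbered() {
    # -d: numeric suffixes, -a 3: suffix length of three, -l: lines per file
    split -d -a 3 -l "$2" "$1" "$3"
}

# Possible usage:
#   split_numbered input_filename 2000000 part
```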



疯狂的代价 2024-07-11 12:09:14

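Homework? :-)

I would think that a simple pipe with sed (to split each line into two) and split (to split things up into multiple files) would be enough.

The man command is your friend.

Added after confirmation that it is not homework:

How about

sed 's/\(.....\)\(.....\)/\1\n\2/' input_file | split -l 2000000 - out-prefix-

?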
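One caveat: `\n` in the replacement part of `s///` is a GNU sed extension, so the pipeline above may not behave the same elsewhere. A possible alternative is sketched below using `fold -w 5`, which wraps every line at five columns and so turns each 10-character line into two. This is a swapped-in technique rather than the answer's own; it assumes plain ASCII lines without tabs, and the wrapper name `halve_and_split` is made up.

```shell
#!/usr/bin/env bash
# halve_and_split IN_FILE LINES_PER_FILE PREFIX   (hypothetical wrapper)
# Breaks each 10-character line into two 5-character lines, then writes
# the result in pieces of LINES_PER_FILE lines named PREFIXaa, PREFIXab, ...
halve_and_split() {
    fold -w 5 "$1" | split -l "$2" - "$3"
}

# Possible usage, mirroring the sed pipeline:
#   halve_and_split input_file 2000000 out-prefix-
```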




Author: 守不住的情
