Splitting a file and its lines under Linux/bash
I have a rather large file (150 million lines of 10 characters each). I need to split it into 150 files of 2 million lines each, with the output lines being alternately the first 5 characters and the last 5 characters of each source line (so each source line yields two output lines).
I could do this in Perl rather quickly, but I was wondering whether there is an easy solution using bash.
Any ideas?
Comments (4)
I think that something like this could work:
Not sure if it's simpler or more efficient than using Perl, though.
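A minimal sketch of one way to do this in the shell, not necessarily the approach this answer had in mind. It assumes every input line is exactly 10 single-byte characters; the function name split_halves, the input name x.txt, and the output prefix part. are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: split_halves INPUT LINES_PER_FILE PREFIX
# Assumes every input line is exactly 10 single-byte characters.
split_halves() {
    local input=$1 lines_per_file=$2 prefix=$3
    # fold -w 5 wraps each 10-character line into two 5-character lines.
    # split reads that stream from standard input ("-") and writes
    # chunks of LINES_PER_FILE lines to PREFIXaa, PREFIXab, ...
    fold -w 5 "$input" | split -l "$lines_per_file" - "$prefix"
}

# For the file in the question (assumed here to be named x.txt):
# split_halves x.txt 2000000 part.
```

With 150 million input lines this should produce 300 million output lines, that is, 150 files of 2 million lines each, which fits within the 676 names that split's default two-letter suffix allows.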
First-five-characters-of-each-line variant, assuming that the large file is called x.txt and that it's OK to create files in the current directory with names x.txt.*:
split -l 2000000 x.txt x.txt.out && (for splitfile in x.txt.out*; do outfile="${splitfile}.firstfive"; echo "$splitfile -> $outfile"; cut -c 1-5 "$splitfile" > "$outfile"; done)
Why not just use the native Linux split command? It writes the pieces to new files named xaa, xab, xac, ... by default, or x00, x01, x02, ... with GNU split's -d option.
For more info, see the manual.
Homework? :-)
I would think that a simple pipe with sed (to split each line into two) and split (to split things up into multiple files) would be enough.
The man command is your friend.
Added after confirmation that it is not homework:
How about
?
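As a hedged illustration of the sed-plus-split pipe described above (a sketch, not necessarily the command originally posted here): it assumes GNU sed, where \n in the replacement inserts a newline, and GNU split for the -d option. The function name halve_and_split, the input name x.txt, and the prefix out. are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: halve_and_split INPUT LINES_PER_FILE PREFIX
# Assumes GNU sed and GNU split, and input lines of exactly 10 characters.
halve_and_split() {
    local input=$1 lines_per_file=$2 prefix=$3
    # sed puts a newline between the first and last 5 characters of each line.
    # split -d -a 3 names the chunks PREFIX000, PREFIX001, ...
    sed -E 's/^(.{5})(.{5})$/\1\n\2/' "$input" |
        split -l "$lines_per_file" -d -a 3 - "$prefix"
}

# For the file in the question (assumed here to be named x.txt):
# halve_and_split x.txt 2000000 out.
```

Lines that are not exactly 10 characters long pass through sed unchanged, so malformed input would shift the chunk boundaries rather than cause an error.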