How can I count the number of rows in a large CSV file using Perl?
I have to use Perl in a Windows environment at work, and I need to be able to find out the number of rows that a large CSV file contains (about 1.4 GB).
Any idea how to do this with minimum waste of resources?
Thanks
PS This must be done within the Perl script and we're not allowed to install any new modules onto the system.
Comments (6)
Do you mean lines or rows? A cell may contain line breaks, which add lines to the file but not rows. If you are guaranteed that no cell contains a newline, then just use the technique in the Perl FAQ. Otherwise, you will need a proper CSV parser such as Text::xSV.
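For reference, the Perl FAQ technique reads the file in fixed-size chunks and counts newline characters with tr. The following is a minimal sketch along those lines, using only core Perl; the sub name count_lines_chunked and the 64 KB chunk size are choices made for this illustration, not part of the FAQ entry.

```perl
use strict;
use warnings;

# Count newline characters by reading fixed-size chunks, so memory use
# stays constant no matter how large the file or how long its lines are.
# The sub name and the chunk size are illustrative assumptions.
sub count_lines_chunked {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Can't open $path: $!";
    my ($lines, $buffer) = (0, '');
    while (sysread $fh, $buffer, 65536) {
        # tr/\n// with an empty replacement list only counts matches
        $lines += ($buffer =~ tr/\n//);
    }
    close $fh;
    return $lines;
}

# Print the count when a file name is given on the command line.
print count_lines_chunked($ARGV[0]), "\n" if @ARGV;
```

Because this counts newline characters, a final line with no trailing newline is not included in the total.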
Yes, don't use Perl.
Instead, use the simple utility for counting lines: wc.exe.
It's part of a suite of Windows utilities ported from Unix originals.
http://unxutils.sourceforge.net/
For example:
Where 12 == number of lines, 26 == number of words, 271 == number of characters.
If you really have to use Perl:
This only reads one line at a time, so it doesn't waste any memory unless each line is enormously long.
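A minimal sketch of a line-at-a-time count, assuming the file is addressed by a path passed to a hypothetical count_lines sub (the name is invented for this example). It uses only core Perl, so nothing needs to be installed.

```perl
use strict;
use warnings;

# Count lines by reading one line at a time; only the current line is
# held in memory. The sub name count_lines is an illustrative assumption.
sub count_lines {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close $fh;
    return $count;
}

# Print the count when a file name is given on the command line.
print count_lines($ARGV[0]), "\n" if @ARGV;
```

Unlike a pure newline count, this also counts a last line that lacks a trailing newline.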
This one-liner handles newlines within the rows:
It uses the awesome flip-flop operator.
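As a rough sketch of the same goal (counting rows rather than lines), here is a core-Perl counter. Note that it is not the flip-flop one-liner: it swaps in a simpler quote-parity check, and it assumes the CSV quotes fields with double quotes and escapes embedded quotes by doubling them. The sub name count_csv_rows is hypothetical.

```perl
use strict;
use warnings;

# Count CSV rows, treating a newline inside a double-quoted field as part
# of the row. Assumption: embedded quotes are escaped by doubling (""),
# so an odd number of quote characters on a line toggles the in-quotes state.
sub count_csv_rows {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    my ($rows, $in_quotes) = (0, 0);
    while (my $line = <$fh>) {
        my $quotes = ($line =~ tr/"//);    # count quote characters
        $in_quotes ^= 1 if $quotes % 2;    # odd count flips the state
        $rows++ unless $in_quotes;         # row ends outside quotes
    }
    close $fh;
    return $rows;
}

# Print the count when a file name is given on the command line.
print count_csv_rows($ARGV[0]), "\n" if @ARGV;
```

The header line, if any, is counted as a row, and a record whose quoted field is still open at end of file is not counted. Files that use a different quoting convention would need a real CSV parser.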
Consider:
wc is not going to work. It's awesome for counting lines, but not CSV rows. Use Text::CSV or some similar standard package for proper handling.
EDIT: It slipped my mind that this was Windows:
The weird thing is that The Broken OS' shell interprets && as the OS conditional exec, and I couldn't do anything to change its mind! If I escaped it, it would just pass it that way to perl.
Upvote for edg's answer. Another option is to install Cygwin to get wc and a bunch of other handy utilities on Windows.
I was being idiotic; the simple way to do it in the script is: