How do I handle lines with CRLF, NEL line terminators?
I need to process a file with shift_jis encoding. However, the line terminators are in a format that I'm not familiar with.
> file record.CSV
record.CSV: Non-ISO extended-ASCII text, with CRLF, NEL line terminators
I'm using the usual idiom:
open my $CSV_FILE, "<:encoding(shift_jis)", $filename or die "Could not open: $filename : $!";
while (<$CSV_FILE>) {
    chomp;
    # do stuff
}
However, it still leaves a CR at the end of each record.
What is the correct way to handle line terminators in files of this type?
Why not do
$_ =~ s/\r//
manually? Edit: apparently, you can also remove the NEL (Next Line, U+0085) characters.
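As a rough sketch of the manual approach suggested above, the helper below (a hypothetical `strip_eol`, not something from the thread) removes any trailing run of CR, LF, or NEL characters from an already decoded line, so it could stand in for `chomp`. One assumption worth checking: as far as I know `file` does not recognise Shift_JIS, and byte 0x85 is a valid Shift_JIS lead byte, so the NEL report may simply reflect double-byte characters rather than real U+0085 terminators. In that case only the CR needs stripping after decoding.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip any trailing run of
# CR, LF, or NEL (U+0085) characters from a decoded line.
sub strip_eol {
    my ($line) = @_;
    $line =~ s/[\r\n\x{85}]+\z//;
    return $line;
}

# Inside the read loop from the question this would replace chomp:
#   while (my $line = <$CSV_FILE>) {
#       $line = strip_eol($line);
#       # do stuff
#   }

my $clean = strip_eol("a,b,c\r\n");
print "[$clean]\n";
```

Anchoring the pattern with `\z` means a CR that appears in the middle of a field is left alone; only the terminator at the end of the record is removed.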
You need to consider who's consuming the data and learn more about the environment which produced these files. If it's a plain-vanilla CSV output file you're after in the end, use any old string manipulation you like to get rid of them (and produce CRLF terminators in their stead) and you'll be fine.
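If the goal really is a plain CSV with uniform CRLF terminators, the replacement described above might look something like this. It is only a sketch: `to_crlf` is a made-up name, and it assumes the text has already been decoded from shift_jis and that every CR, LF, or NEL in it is a terminator rather than field data.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite every line terminator in a decoded string
# as CRLF. The CRLF alternative is tried first so that an existing CRLF
# pair is treated as one terminator, not two.
sub to_crlf {
    my ($text) = @_;
    $text =~ s/\r\n|[\r\n\x{85}]/\r\n/g;
    return $text;
}

my $mixed      = "a,b\r\nc,d\x{85}e,f\n";
my $normalised = to_crlf($mixed);

# Count the records by splitting on the now uniform terminator.
my @records = split /\r\n/, $normalised;
print scalar(@records), " records\n";
```

Whether CRLF is the right target depends on whatever consumes the file, which is the point made above about knowing the environment first.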