How do I check whether a file with a UTF-16 filename exists in Perl?

Posted on 2024-08-02 13:01:13

I have a text file encoded in UTF-16. Each line contains a number of columns separated by tabs. For those who care, the file is a playlist TXT export from iTunes. Column #27 contains a filename.

I am reading it using Perl 5.8.8 in Linux using code similar to:

binmode STDIN, ":encoding(UTF-16)";
while(<>)
{
    chomp;
    my @cols = split /\t/, $_;
    my $filename = $cols[26];   # Column #27 contains the filename
    print "File exists!" if (-e "$filename");
}

(Please note: I've shortened this code snippet. In my actual code I do some substitutions to convert the absolute Windows filename used by iTunes into a filename valid on my Linux box.)

Even though the files exist, the (-e) file test does not return true. I believe it has something to do with the string being in UTF-16 but cannot figure out what the problem is. The actual filename uses only ASCII characters. And the filename prints correctly if I print the $filename variable.

Can filenames in Perl be in UTF-16? Any ideas on how to get this code snippet to work?

别把无礼当个性 2024-08-09 13:01:13

The UTF-16 text is processed by the :encoding layer. By the time it gets into $_, there's no way to tell that it was ever UTF-16. I don't think that's your issue.

My guess would be that you've either got some whitespace in your filename (that you didn't notice when you tried printing it out) or you're not in the directory you think you are.

Try

if (-e $filename) { print "File exists!\n" }
else { print "File <$filename> not found\n" }

and check the filename carefully. You might also use Cwd; and print out the current directory.
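As a minimal sketch of that check, the snippet below prints the current directory and the filename with control characters made visible. The helper name `show_invisible` and the sample path are made up for illustration; only `Cwd::getcwd` and the `-e` test come from standard Perl.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: render anything outside printable ASCII as \x{NN}
# so that stray whitespace such as a trailing carriage return shows up.
sub show_invisible {
    my ($str) = @_;
    (my $out = $str) =~ s/([^\x20-\x7E])/sprintf('\\x{%02X}', ord $1)/ge;
    return $out;
}

# Made-up filename carrying a hidden trailing \r, for demonstration only.
my $filename = "/home/user/song.mp3\r";

print "cwd: ", getcwd(), "\n";
print "File <", show_invisible($filename), "> ",
      ((-e $filename) ? "exists" : "not found"), "\n";
```

Printing the raw string to a terminal would hide the carriage return, which is why the escaped form is more useful when a file test fails unexpectedly.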

望笑 2024-08-09 13:01:13

I figured out the solution:

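Column 27 is the last column, and the file is encoded with 0d0a (\r\n) line endings. chomp was only removing 0a (\n), because on Linux it strips the input record separator, which defaults to \n. That left a trailing \r on the filename. Not sure why I didn't see this before, but it doesn't have anything to do with UTF-16.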

Adding:

s/\r$//;

after chomp fixes the problem.

Thanks for your help - sorry to send you down a rabbit trail.
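For readers who want the fix in context, here is a rough sketch that strips both line-ending characters in one step before splitting. The helper name `filename_from_line` and the sample line are assumptions for illustration, not the asker's actual code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 27th tab-separated column of a line,
# after removing any trailing CR and/or LF characters.
sub filename_from_line {
    my ($line) = @_;
    $line =~ s/[\r\n]+\z//;              # handles \r\n, \n, or a lone \r
    my @cols = split /\t/, $line, -1;    # -1 keeps trailing empty columns
    return $cols[26];                    # undef if the line is too short
}

# Made-up sample line: 26 filler columns, a filename, and a CRLF ending.
my $sample   = join("\t", ('x') x 26, '/home/user/song.mp3') . "\r\n";
my $filename = filename_from_line($sample);

print "File <$filename> ",
      ((-e $filename) ? "exists" : "not found"), "\n";
```

In the original loop, the same helper would be called on each line read from the handle that has the `:encoding(UTF-16)` layer applied.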

寂寞美少年 2024-08-09 13:01:13

If, as you say, the actual filename uses only ASCII characters, wouldn't

$filename =~ s/\0//g;

work? Anyway, xxd should help the next time you run into something like this:

[sinan@archardy ~]$ xxd /mnt/c/Documents\ and\ Settings/sinan/Desktop/test.txt
0000000: fffe 2f00 6800 6f00 6d00 6500 2f00 7300  ../.h.o.m.e./.s.
0000010: 6900 6e00 6100 6e00 2f00 7400 6500 7300  i.n.a.n./.t.e.s.
0000020: 7400 6d00 6500 2e00 7400 7800 7400 0d00  t.m.e...t.x.t...
0000030: 0a00                                     ..

I see that you have solved your problem in the time it took me to create a test file and reboot into Linux. Oh well.
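The same kind of inspection can be done from inside Perl. The sketch below uses the core `Encode` module to build the UTF-16LE bytes for a short sample string, dumps them in xxd-style groups, and then decodes them again. The helper name `hex_groups` and the sample text are made up for illustration.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: hex-dump a byte string in 2-byte groups, like xxd.
sub hex_groups {
    my ($bytes) = @_;
    return join ' ', unpack('(H4)*', $bytes);
}

# Bytes as they would sit in a UTF-16LE file (no BOM in this sample).
my $raw = encode('UTF-16LE', "txt\r\n");
print hex_groups($raw), "\n";

# After decoding, the string holds characters, not 2-byte units.
my $text = decode('UTF-16LE', $raw);
print "NUL characters after decoding: ", ($text =~ tr/\0//), "\n";
print "still ends with CR LF\n" if $text =~ /\r\n\z/;
```

Once the `:encoding(UTF-16)` layer has decoded the input, there should be no NUL characters left to strip, but the carriage return is still part of the decoded text.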
