HTML::Tidy newline problem on Windows
When using HTML::Tidy on Windows to clean the output of HTML::Element's as_HTML method, I get the wrong type of newline. If I don't specify the newline in the HTML::Tidy constructor, my lines are terminated by CRCRLF. If I specify 'LF' termination, I get 'CRLF', and if I specify 'CRLF', I get the original CRCRLF termination. I suspect this is a bug in the HTML Tidy library. It is easy enough to work around by explicitly specifying Unix (LF) termination and getting DOS (CRLF) output, which pretty much any decent editor on any platform can read.
Following the answer below, I resolved the issue by using binmode ':raw:utf8' on the output handle to disable \n translation:
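use IO::File;
use HTML::Tidy;

# ':raw' removes the Windows :crlf layer, so the CRLFs Tidy already emits
# are not expanded to CRCRLF; ':utf8' writes the Unicode content as UTF-8.
my $output = IO::File->new($ARGV[1], 'w') or die "Cannot open $ARGV[1]: $!";
$output->binmode(':raw:utf8');
print $output HTML::Tidy->new({
    wrap              => 80,
    indent            => 'auto',
    'wrap-attributes' => 'yes',
})->clean($tree->as_HTML());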
The setup is pretty generic, but apart from general complaints about the HTML Tidy library's bugginess, I can't find anyone else reporting this problem. Has anyone dealt with this and can confirm it's a library bug? I'd be surprised if it were, since the library has been around for ages, and I want to confirm before filing a report.
Edit: I updated the code to show the filehandle creation. Setting the filehandle's binmode to raw resolves the newline issue, but then I have problems with the Unicode content of the HTML. Is there a way to resolve it without introducing other issues?
Edit 2: I should note that I originally saw this as an HTML::Tidy issue because printing $tree->as_HTML() directly to the filehandle, with any binmode, produced the correct EOL characters. The issue only appeared once I passed the scalar HTML::Element output through HTML::Tidy.
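The symptoms fit the output layer, not Tidy itself: a Windows text-mode handle has a :crlf layer that turns every "\n" into "\r\n", so a string whose lines already end in CRLF gains an extra CR, while a string with bare LF line endings gets exactly one CRLF. The core-Perl sketch below illustrates this under stated assumptions: the two sample strings are hypothetical stand-ins (assuming Tidy on Windows returns CRLF line endings and as_HTML returns bare LF, as the described symptoms suggest), and the helper names `bytes_written_with` and `show` are made up for illustration. It pushes :crlf explicitly so it behaves the same on Unix, and it uses :encoding(UTF-8) rather than :utf8; on output both write UTF-8, but :encoding(UTF-8) is generally preferred because it also validates input.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample strings (not real program output):
# - $tidy_like mimics Tidy's result on Windows, lines already ending in CRLF;
# - $as_html_like mimics as_HTML(), lines ending in a bare LF.
# Both contain U+00E9 to exercise the encoding layer.
my $tidy_like    = "<p>caf\x{e9}</p>\r\n";
my $as_html_like = "<p>caf\x{e9}</p>\n";

# Print $text through the given PerlIO layers, then return the bytes on disk.
sub bytes_written_with {
    my ($layers, $text) = @_;
    my ($fh, $path) = tempfile(UNLINK => 1);
    binmode($fh, $layers) or die "binmode($layers) failed: $!";
    print {$fh} $text;
    close($fh) or die "close failed: $!";

    open(my $in, '<:raw', $path) or die "open $path failed: $!";
    local $/;                     # slurp the whole file
    my $bytes = <$in>;
    close($in);
    return $bytes;
}

# Make CR and LF visible when printing.
sub show {
    my ($s) = @_;
    $s =~ s/\r/\\r/g;
    $s =~ s/\n/\\n/g;
    return $s;
}

# ':crlf' is what a Windows text-mode handle uses by default; it turns every
# "\n" into "\r\n", so an existing "\r\n" becomes "\r\r\n".
my $tidy_text    = bytes_written_with(':crlf:encoding(UTF-8)', $tidy_like);
my $as_html_text = bytes_written_with(':crlf:encoding(UTF-8)', $as_html_like);
# ':raw' drops CRLF translation; ':encoding(UTF-8)' still encodes characters.
my $tidy_raw     = bytes_written_with(':raw:encoding(UTF-8)',  $tidy_like);

print 'Tidy-like, text mode:    ', show($tidy_text),    "\n";
print 'as_HTML-like, text mode: ', show($as_html_text), "\n";
print 'Tidy-like, raw mode:     ', show($tidy_raw),     "\n";
```

The first line ends in \r\r\n (the CRCRLF from the question), the second in a single \r\n (why printing as_HTML directly looked correct), and the third keeps Tidy's own \r\n while the é is still written as UTF-8 bytes.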
Answer:
Try putting the output filehandle in binary mode with binmode. I had a similar issue with Template Toolkit output.