How do I add an entity declaration via XML::Twig programmatically?

Posted on 2024-09-29 03:18:04

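For the life of me I cannot understand the XML::Twig documentation for entity handling.

I've got some XML I'm generating with HTML::Tidy. The call is as follows: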

my $tidy = HTML::Tidy->new({
    'indent'          => 1,
    'break-before-br' => 1,
    'output-xhtml'    => 0,
    'output-xml'      => 1,
    'char-encoding'   => 'raw',
});

$str = "foo &nbsp; bar";
$xml = $tidy->clean("<xml>$str</xml>");

which produces:

<html>
  <head>
    <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
    <title></title>
  </head>
  <body>foo &nbsp; bar</body>
</html>

XML::Twig (understandably) barfs at the &nbsp;. I want to do some transformations, running the document through XML::Twig:

my $twig = XML::Twig->new(
  twig_handlers => {... handlers ...}
);

$twig->parse($xml);

The $twig->parse line barfs on the &nbsp;, but I can't figure out how to add the &nbsp; entity programmatically. I tried things like:

my $entity = XML::Twig::Entity->new("nbsp", "&#160;");
$twig->entity_list->add($entity);
$twig->parse($xml);

... but no joy.

Please help =)

Comments (3)

泼猴你往哪里跑 2024-10-06 03:18:04

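A dirty but effective trick in a case like this is to add a fake DTD declaration.

XML::Parser, which does the parsing, will then assume that the entity is defined in the DTD and won't barf on it.

To get rid of the fake DTD declaration, you can output the root of the twig. If you need a different declaration, create it and replace the current one: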

#!/usr/bin/perl 

use strict;
use warnings;

use XML::Twig;

my $fake_dtd= '<!DOCTYPE head SYSTEM "foo"[]>'; # foo may not even exist

my $xml='<html>
  <head>
    <meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" />
    <title></title>
  </head>
  <body>foo &nbsp; bar</body>
</html>';

XML::Twig->new->parse( $fake_dtd . $xml)->root->print;
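
One simple way to end up with a different declaration is sketched below: serialize only the root element with sprint and prepend whatever DOCTYPE you want. This is a minimal sketch, not the only approach; the replacement DOCTYPE string and the shortened sample document are assumptions made for illustration.

```perl
#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

my $fake_dtd = '<!DOCTYPE head SYSTEM "foo"[]>';   # only there to keep the parser quiet
my $xml      = '<html><body>foo &nbsp; bar</body></html>';

my $twig = XML::Twig->new->parse( $fake_dtd . $xml );

# sprint on the root returns just the element tree as a string,
# so the fake DOCTYPE is left out of the result.
my $new_doctype = '<!DOCTYPE html>';               # assumed replacement declaration
my $out = $new_doctype . "\n" . $twig->root->sprint;

print $out, "\n";
```

Building the output as a string keeps the fake declaration purely a parsing aid and leaves the final prolog entirely under your control.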

暗恋未遂 2024-10-06 03:18:04
use strict;
use XML::Twig;

my $doctype = '<?xml version="1.0" encoding="utf-8"?><!DOCTYPE html [<!ENTITY nbsp "&#160;">]>';
my $xml = '<html><head><meta content="tidyp for Linux (v1.02), see www.w3.org" name="generator" /><title></title></head><body>foo &nbsp; bar</body></html>';

my $xTwig = XML::Twig->new();

$xTwig->safe_parse($doctype . $xml) or die "Failure to parse XML : $@";

print $xTwig->sprint();
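
If more than one entity needs declaring, the DOCTYPE prefix used above can be generated rather than hand-written. The following is a minimal sketch in core Perl; the helper name doctype_with_entities and the extra copy entity are hypothetical and only serve the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: build a DOCTYPE whose internal subset declares
# each named entity as a numeric character reference.
sub doctype_with_entities {
    my ( $root, %entities ) = @_;
    my $decls = join '',
        map { sprintf '<!ENTITY %s "&#%d;">', $_, $entities{$_} }
        sort keys %entities;
    return "<!DOCTYPE $root [$decls]>";
}

# Entity name => Unicode code point
my $doctype = doctype_with_entities( 'html', nbsp => 160, copy => 169 );

print "$doctype\n";
# prints: <!DOCTYPE html [<!ENTITY copy "&#169;"><!ENTITY nbsp "&#160;">]>
```

The resulting string can then be prepended to the Tidy output before calling parse or safe_parse, in the same way as $doctype in the snippet above.
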
ぃ弥猫深巷。 2024-10-06 03:18:04

There may be a better way, but the code below worked for me:

my $filter = sub {
    my $text  = shift;
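    my $ascii = "\x{a0}";    # non-breaking space character (U+00A0)
    my $nbsp  = '&nbsp;';    # entity reference to emit in its place
    $text =~ s/$ascii/$nbsp/g;    # /g so every occurrence is replaced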
    return $text;
};

XML::Twig->new( output_filter => $filter )
         ->parse_html( $xml )
         ->print;
