XML::Twig: possible bug with twig_roots when using the 'map_xmlns' option while looking for children of top-level tags

Posted on 2025-01-19 10:38:27

I have encountered the following strange behaviour while testing the Perl module XML::Twig with the twig_handlers and twig_roots arguments together with the 'map_xmlns' option (Perl version: 5.30.2, XML::Twig version: 3.52).

BACKGROUND:

I have been using the Perl module XML::Twig for a few years now. In particular, I have used the twig_roots and twig_handlers arguments to extract data from large XML files where loading the entire file into memory is not practical.
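For readers unfamiliar with this pattern, here is a minimal sketch of chunk-wise extraction with twig_roots. The sample data and the @rows array are made up for illustration, and a real script would more likely call parsefile() on a large file than parse() on a string.

```perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical sample data standing in for a large XML file.
my $xml = <<'XML';
<items>
    <item><data1>data1A</data1><data2>data2A</data2></item>
    <item><data1>data1B</data1><data2>data2B</data2></item>
</items>
XML

my @rows;
XML::Twig->new(
    # Only <item> subtrees are built; the handler runs once each one is complete.
    twig_roots => {
        'item' => sub {
            my ($t, $elt) = @_;
            push @rows, {
                data1 => $elt->first_child_text('data1'),
                data2 => $elt->first_child_text('data2'),
            };
            $t->purge;    # free the memory used by the subtree just processed
        },
    },
)->parse($xml);

printf "%s %s\n", $_->{data1}, $_->{data2} for @rows;
```

Because each subtree is purged after its handler has run, memory use should stay roughly proportional to one item rather than to the whole document.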

Over the past few months I have discovered the 'map_xmlns' option, which can be combined with the aforementioned twig_roots and twig_handlers arguments. This option is very useful when XML files of the same type use different tag prefixes for the same namespace.
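As a small illustration of why this is useful, the sketch below parses two made-up documents that bind the same namespace URI to different prefixes ('a' and 'b') and handles both with a single 'x:item' handler. The documents and the %seen hash are assumptions for the example only; the handler is deliberately registered with twig_handlers on a non-root tag, which is one of the combinations that behaves as expected in the tests below.

```perl
use strict;
use warnings;
use XML::Twig;

my $uri = 'http://schemas.testdata/info';

# Two hypothetical documents using different prefixes for the same namespace URI.
my %docs = (
    a => qq{<a:items xmlns:a="$uri"><a:item/><a:item/></a:items>},
    b => qq{<b:items xmlns:b="$uri"><b:item/><b:item/></b:items>},
);

my %seen;
for my $name (sort keys %docs) {
    XML::Twig->new(
        # Map the namespace URI to a prefix of our own choosing.
        map_xmlns     => { $uri => 'x' },
        twig_handlers => {
            # The handler is written against the mapped prefix 'x',
            # whatever prefix the document itself uses.
            'x:item' => sub {
                my ($t, $elt) = @_;
                push @{ $seen{$name} }, $elt->tag;
            },
        },
    )->parse($docs{$name});
}

print "$_: @{ $seen{$_} }\n" for sort keys %seen;
```

The same handler code fires for both documents, so the extraction logic does not need to know which prefix a given file happens to use.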

The strange behaviour that I have found is outlined in the following piece of code.

use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my %map_xmlns_hash;
my $prefix_val='x';
$map_xmlns_hash{'http://schemas.testdata/info'}=$prefix_val;
my $tag_to_look_forA='/' . $prefix_val . ':items';
my $tag_to_look_forB='/' . $prefix_val . ':items/' . $prefix_val . ':item';
        
my $dataA = '
<items>
    <item>
        <data1>data1A</data1>
        <data2>data2A</data2>
    </item>
    <item>
        <data1>data1B</data1>
        <data2>data2B</data2>
    </item>
</items>';
        
my $dataB = '
<x:items xmlns:x="http://schemas.testdata/info">
    <x:item>
        <x:data1>data1C</x:data1>
        <x:data2>data2C</x:data2>
    </x:item>
    <x:item>
        <x:data1>data1D</x:data1>
        <x:data2>data2D</x:data2>
    </x:item>
</x:items>';
#
# 1a.) twig_handlers test when no xmlns mapping used on root (top level) tag '/items'
#
my  @Array1h;
my $t1h = XML::Twig->new(
    pretty_print => 'indented',
    twig_handlers => {  
        'items' => sub {Get_children_data(@_,\@Array1h)}})->parse($dataA);
print Dumper \@Array1h;
#
# 1b.) twig_roots test when no xmlns mapping used on root (top level) tag '/items'
#
my  @Array1r;
my $t1r = XML::Twig->new(
    pretty_print => 'indented',
    twig_roots => {
        'items' => sub {Get_children_data(@_,\@Array1r)}})->parse($dataA);
print Dumper \@Array1r;
#
# 2a.) twig_handlers test with xmlns mapping used on root (top level) tag '/items'
#
my  @Array2h;
my $t2h = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_handlers => {
        $tag_to_look_forA => sub {Get_children_data(@_,\@Array2h)}})->parse($dataB);
print Dumper \@Array2h;
#
# 2b.) twig_roots test with xmlns mapping used on root (top level) tag '/items'
#
my  @Array2r;
my $t2r = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_roots => {
        $tag_to_look_forA => sub {Get_children_data(@_,\@Array2r);}})->parse($dataB);
print Dumper \@Array2r;
#
# 3a.) twig_handlers test with xmlns mapping used on tag '/items/item'
#
my  @Array3h;
my $t3h = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_handlers => {
        $tag_to_look_forB => sub {Get_children_data(@_,\@Array3h)}})->parse($dataB);
print Dumper \@Array3h;
#
# 3b.) twig_roots test with xmlns mapping used on tag '/items/item'
#
my  @Array3r;
my $t3r = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix =>1,
    pretty_print => 'indented',
    twig_roots => {
        $tag_to_look_forB => sub {Get_children_data(@_,\@Array3r);}})->parse($dataB);
print Dumper \@Array3r;
#
#
#
# Shared handler helper: record the tag names of the element's direct
# children in the supplied array, then purge the twig to free memory.
sub Get_children_data {
    my ($t, $elt, $Array1) = @_;
    push @$Array1, $_->name() for $elt->children();
    $t->purge;
}

The results of the above code are:

$VAR1 = [
      'item',
      'item'
    ];
$VAR1 = [
      'item',
      'item'
    ];
$VAR1 = [
      'x:item',
      'x:item'
    ];
$VAR1 = [];
$VAR1 = [
      'x:data1',
      'x:data2',
      'x:data1',
      'x:data2'
    ];
$VAR1 = [
      'x:data1',
      'x:data2',
      'x:data1',
      'x:data2'
    ];

This shows that test 2b.) incorrectly returned no child data: its array is empty, whereas 2a.) returned both 'x:item' children.

Replacing 'twig_roots' [2b.)] with 'twig_handlers' [2a.)] generates the desired results.

The last two examples [3a.) and 3b.)] show that both the 'twig_roots' and 'twig_handlers' arguments work as expected with the 'map_xmlns' option when the tag to look for is not the root (top-level) tag.

I cannot work out what the problem is. Is there a bug with twig_roots when using the 'map_xmlns' option while looking for child tags of top level tags?
