Possible bug in XML::Twig twig_roots when using the 'map_xmlns' option to look for children of the top-level tag
I have encountered the following strange behaviour while testing the Perl module XML::Twig with the twig_handlers and twig_roots arguments together with the 'map_xmlns' option (Perl version: 5.30.2, XML::Twig version: 3.52).
BACKGROUND:
I have been using the Perl module XML::Twig for a few years now. In particular, I have used the twig_roots and twig_handlers arguments to extract data from large XML files where loading the entire file into memory is not practical.
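For readers less familiar with this streaming approach, here is a minimal sketch of the twig_roots pattern. It assumes a small inline string in place of a large file (in practice `parsefile()` would be called on a path); the sample data and variable names are illustrative only. Only the matching `item` subtrees are built, and `purge` discards each one once its data has been read.

```perl
use strict;
use warnings;
use XML::Twig;

# Small stand-in for a large file (illustrative data only).
my $xml = '<items>'
        . '<item><data1>data1A</data1><data2>data2A</data2></item>'
        . '<item><data1>data1B</data1><data2>data2B</data2></item>'
        . '</items>';

my @values;
my $twig = XML::Twig->new(
    # Only <item> subtrees are built; content outside them is not kept.
    twig_roots => {
        'item' => sub {
            my ($t, $item) = @_;
            # Read what we need from the completely parsed <item> ...
            push @values, $item->first_child_text('data1');
            # ... then discard everything parsed so far to keep memory flat.
            $t->purge;
        },
    },
);
$twig->parse($xml);

print join(',', @values), "\n";    # prints data1A,data1B
```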
Over the past few months I have discovered the 'map_xmlns' option, which I use together with the twig_roots and twig_handlers arguments mentioned above. This option is very useful when XML files of the same type contain different tag prefixes.
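A minimal sketch of how 'map_xmlns' normalises prefixes, assuming two hypothetical documents that bind the prefixes `a:` and `b:` to the same namespace URI. Both prefixes are mapped to `x`, so a single handler written as `x:data1` should match in either document. It uses twig_handlers, the combination that the 2a.) and 3a.) tests in the code below show working with 'map_xmlns'.

```perl
use strict;
use warnings;
use XML::Twig;

# Two hypothetical documents using different prefixes for the same namespace URI.
my @docs = (
    '<a:items xmlns:a="http://schemas.testdata/info"><a:item><a:data1>data1C</a:data1></a:item></a:items>',
    '<b:items xmlns:b="http://schemas.testdata/info"><b:item><b:data1>data1D</b:data1></b:item></b:items>',
);

my @seen;
for my $xml (@docs) {
    my $twig = XML::Twig->new(
        # Replace whatever prefix is bound to this URI with "x" ...
        map_xmlns     => { 'http://schemas.testdata/info' => 'x' },
        # ... so one handler using the "x" prefix fires for both documents.
        twig_handlers => {
            'x:data1' => sub {
                my ($t, $elt) = @_;
                push @seen, $elt->text;
            },
        },
    );
    $twig->parse($xml);
}

print "$_\n" for @seen;    # prints data1C, then data1D
```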
The strange behaviour that I have found is outlined in the following piece of code.
```perl
use strict;
use warnings;
use XML::Twig;
use Data::Dumper;

my %map_xmlns_hash;
my $prefix_val='x';
$map_xmlns_hash{'http://schemas.testdata/info'}=$prefix_val;
my $tag_to_look_forA='/' . $prefix_val . ':items';
my $tag_to_look_forB='/' . $prefix_val . ':items/' . $prefix_val . ':item';

my $dataA = '
<items>
<item>
<data1>data1A</data1>
<data2>data2A</data2>
</item>
<item>
<data1>data1B</data1>
<data2>data2B</data2>
</item>
</items>';

my $dataB = '
<x:items xmlns:x="http://schemas.testdata/info">
<x:item>
<x:data1>data1C</x:data1>
<x:data2>data2C</x:data2>
</x:item>
<x:item>
<x:data1>data1D</x:data1>
<x:data2>data2D</x:data2>
</x:item>
</x:items>';

#
# 1a.) twig_handlers test when no xmlns mapping used on root (top level) tag '/items'
#
my @Array1h;
my $t1h = XML::Twig->new(
    pretty_print  => 'indented',
    twig_handlers => {
        'items' => sub {Get_children_data(@_,\@Array1h)}})->parse($dataA);
print Dumper \@Array1h;

#
# 1b.) twig_roots test when no xmlns mapping used on root (top level) tag '/items'
#
my @Array1r;
my $t1r = XML::Twig->new(
    pretty_print => 'indented',
    twig_roots   => {
        'items' => sub {Get_children_data(@_,\@Array1r)}})->parse($dataA);
print Dumper \@Array1r;

#
# 2a.) twig_handlers test with xmlns mapping used on root (top level) tag '/items'
#
my @Array2h;
my $t2h = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix => 1,
    pretty_print         => 'indented',
    twig_handlers        => {
        $tag_to_look_forA => sub {Get_children_data(@_,\@Array2h)}})->parse($dataB);
print Dumper \@Array2h;

#
# 2b.) twig_roots test with xmlns mapping used on root (top level) tag '/items'
#
my @Array2r;
my $t2r = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix => 1,
    pretty_print         => 'indented',
    twig_roots           => {
        $tag_to_look_forA => sub {Get_children_data(@_,\@Array2r);}})->parse($dataB);
print Dumper \@Array2r;

#
# 3a.) twig_handlers test with xmlns mapping used on tag '/items/item'
#
my @Array3h;
my $t3h = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix => 1,
    pretty_print         => 'indented',
    twig_handlers        => {
        $tag_to_look_forB => sub {Get_children_data(@_,\@Array3h)}})->parse($dataB);
print Dumper \@Array3h;

#
# 3b.) twig_roots test with xmlns mapping used on tag '/items/item'
#
my @Array3r;
my $t3r = XML::Twig->new(
    map_xmlns => {
        %map_xmlns_hash
    },
    keep_original_prefix => 1,
    pretty_print         => 'indented',
    twig_roots           => {
        $tag_to_look_forB => sub {Get_children_data(@_,\@Array3r);}})->parse($dataB);
print Dumper \@Array3r;

#
# Push the names of all children of $elt onto the array ref, then purge the twig.
#
sub Get_children_data{
    my( $t, $elt,$Array1)= @_;
    my @children_list=$elt->children();
    for my $iChild (0 .. scalar @children_list-1){
        push @$Array1,$children_list[$iChild]->name();
    }
    $t->purge;
}
```
The results of the above code, printed in the order 1a.) to 3b.), are:
```
$VAR1 = [
          'item',
          'item'
        ];
$VAR1 = [
          'item',
          'item'
        ];
$VAR1 = [
          'x:item',
          'x:item'
        ];
$VAR1 = [];
$VAR1 = [
          'x:data1',
          'x:data2',
          'x:data1',
          'x:data2'
        ];
$VAR1 = [
          'x:data1',
          'x:data2',
          'x:data1',
          'x:data2'
        ];
```
This shows that 2b.) incorrectly returned no child data!
Replacing 'twig_roots' [2b.)] with 'twig_handlers' [2a.)] produces the desired result.
The last two examples [3a.) and 3b.)] show that both the 'twig_roots' and 'twig_handlers' arguments work as expected with the 'map_xmlns' option when the tag being looked for is not the root (top-level) tag.
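Since 3b.) works, one possible workaround, offered as a sketch rather than a confirmed fix, is to root the twig at the children of the top-level tag instead of at the top-level tag itself. It assumes that all children share a known tag name (`x:item` here), reuses the mapping and `keep_original_prefix` setting from 3b.), and collects the child names that 2b.) was meant to return. `@child_names` is a hypothetical name used for illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Same structure as $dataB in the question, with whitespace removed.
my $xml = '<x:items xmlns:x="http://schemas.testdata/info">'
        . '<x:item><x:data1>data1C</x:data1><x:data2>data2C</x:data2></x:item>'
        . '<x:item><x:data1>data1D</x:data1><x:data2>data2D</x:data2></x:item>'
        . '</x:items>';

my @child_names;
my $twig = XML::Twig->new(
    map_xmlns            => { 'http://schemas.testdata/info' => 'x' },
    keep_original_prefix => 1,
    # Root the twig at each child of the top-level element (the 3b.) path)
    # rather than at the top-level element itself (the failing 2b.) path).
    twig_roots => {
        '/x:items/x:item' => sub {
            my ($t, $item) = @_;
            # Each call receives one complete child of <x:items>.
            push @child_names, $item->name;
            $t->purge;
        },
    },
);
$twig->parse($xml);

print "@child_names\n";    # prints x:item x:item
```

As far as I understand twig_roots, rooting the twig at the top-level element builds the whole document in memory before the handler fires, so rooting at the children is also closer to the memory-saving goal described in the background.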
I cannot work out what the problem is. Is there a bug in twig_roots when the 'map_xmlns' option is used to look for the children of the top-level tag?