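Parsing large XML files?

Posted 2024-10-02 13:21:02

I have two XML files, one 115 MB in size and the other 34 MB.

While reading file A, there is a field called desc that relates it to file B: I retrieve the id field from file B where desc in file A is equal to name in file B.

File A is already very large, and for every record in it I then have to search inside file B, so the run takes a very long time to complete.

How can I speed up this process, or what would be a better approach?

The code I am currently using: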

#!/usr/bin/perl

use strict;
use warnings;

use XML::Simple qw(:strict XMLin);

my $npcs = XMLin('Client/client_npcs.xml', KeyAttr => { }, ForceArray => [ 'npc_client' ]);
my $strings = XMLin('Client/client_strings.xml', KeyAttr => { }, ForceArray => [ 'string' ]);

my ($nameid,$rank);

open (my $fh, '>>', 'Output/npc_templates.xml') or die "Cannot open Output/npc_templates.xml: $!";
print $fh "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<npc_templates xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:noNamespaceSchemaLocation=\"npcs.xsd\">\n";
foreach my $npc ( @{ $npcs->{npc_client} } ) {
        $nameid = "";   # reset so an unmatched desc does not reuse the previous id
        if (defined $npc->{desc}) {
                foreach my $string (@{$strings->{string}}) {
                        if (defined $string->{name} && $string->{name} =~ /$npc->{desc}/i) {
                                $nameid = $string->{id};
                                last;
                        }
                }
        } else {
                $nameid = "";
        }

        if (defined $npc->{hpgauge_level} && $npc->{hpgauge_level} > 25 && $npc->{hpgauge_level} < 28) {
            $rank = 'LEGENDARY';
        } elsif (defined $npc->{hpgauge_level} && $npc->{hpgauge_level} > 21 && $npc->{hpgauge_level} < 23) {
            $rank = 'HERO';
        } elsif (defined $npc->{hpgauge_level} && $npc->{hpgauge_level} > 10 && $npc->{hpgauge_level} < 15) {
            $rank = 'ELITE';
        } elsif (defined $npc->{hpgauge_level} && $npc->{hpgauge_level} > 0 && $npc->{hpgauge_level} < 11) {
            $rank = 'NORMAL';
        } else {
            $rank = $npc->{hpgauge_level} // '';   # was $gauge, which is never declared and fails under "use strict"
        }

        print $fh qq|\t<npc_template npc_id="$npc->{id}" name="$npc->{name}" name_id="$nameid" height="$npc->{scale}" rank="$rank" tribe="$npc->{tribe}" race="$npc->{race_type}" hp_gauge="$npc->{hpgauge_level}"/>\n|;
}
print $fh "</npc_templates>\n";
close($fh);

Example of file A.xml:

<?xml version="1.0" encoding="utf-16"?>
<npc_clients>
  <npc_client>
    <id>200000</id>
    <name>SkillZone</name>
    <desc>STR_NPC_NO_NAME</desc>
    <dir>Monster/Worm</dir>
    <mesh>Worm</mesh>
    <material>mat_mob_reptile</material>
    <show_dmg_decal>0</show_dmg_decal>
    <ui_type>general</ui_type>
    <cursor_type>none</cursor_type>
    <hide_path>0</hide_path>
    <erect>1</erect>
    <bound_radius>
      <front>1.200000</front>
      <side>3.456000</side>
      <upper>3.000000</upper>
    </bound_radius>
    <scale>10</scale>
    <weapon_scale>100</weapon_scale>
    <altitude>0.000000</altitude>
    <stare_angle>75.000000</stare_angle>
    <stare_distance>20.000000</stare_distance>
    <move_speed_normal_walk>0.000000</move_speed_normal_walk>
    <art_org_move_speed_normal_walk>0.000000</art_org_move_speed_normal_walk>
    <move_speed_normal_run>0.000000</move_speed_normal_run>
    <move_speed_combat_run>0.000000</move_speed_combat_run>
    <art_org_speed_combat_run>0.000000</art_org_speed_combat_run>
    <in_time>0.100000</in_time>
    <out_time>0.500000</out_time>
    <neck_angle>90.000000</neck_angle>
    <spine_angle>10.000000</spine_angle>
    <ammo_bone>Bip01 Head</ammo_bone>
    <ammo_fx>skill_stoneshard.stoneshard.ammo</ammo_fx>
    <ammo_speed>50</ammo_speed>
    <pushed_range>0.000000</pushed_range>
    <hpgauge_level>3</hpgauge_level>
    <magical_skill_boost>0</magical_skill_boost>
    <attack_delay>2000</attack_delay>
    <ai_name>SummonSkillArea</ai_name>
    <tribe>General</tribe>
    <pet_ai_name>Pet</pet_ai_name>
    <sensory_range>15.000000</sensory_range>
  </npc_client>
</npc_clients>

Example of file B.xml:

<?xml version="1.0" encoding="utf-16"?>
<strings>
  <string>
    <id>350000</id>
    <name>STR_NPC_NO_NAME</name>
    <body> </body>
  </string>
</strings>



风蛊 2024-10-09 13:21:03

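Here is an example of XML::Twig usage. Its main advantage is that it does not hold the whole file in memory, which keeps memory use low. The lookup hash built from file B also replaces the nested loop over every string, and that loop is where most of the run time goes. The code below tries to emulate what the script in the question does.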

use strict;
use warnings;
use XML::Twig;

# Pass 1: index file B by lower-cased <name> so each lookup is a single hash access.
my %strings;
XML::Twig->new(
    twig_handlers => {
        'strings/string' => sub {
            my ( $twig, $string ) = @_;
            $strings{ lc $string->first_child_text('name') }
                = $string->first_child_text('id');
            $twig->purge;    # release the elements parsed so far
        },
    }
)->parsefile('B.xml');

print "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<npc_templates xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\" xsi:noNamespaceSchemaLocation=\"npcs.xsd\">\n";

# Pass 2: stream file A one <npc_client> at a time.
XML::Twig->new(
    twig_handlers => {
        'npc_client' => sub {
            my ( $twig, $npc ) = @_;

            # first_child_text returns '' when the child element is missing
            my $nameid = $strings{ lc $npc->first_child_text('desc') } // '';

            # calculate rank as needed (simplified from the ranges in the question)
            my $hpgauge_level = $npc->first_child_text('hpgauge_level');
            my $level         = $hpgauge_level || 0;    # treat a missing value as 0
            my $rank = $level >= 28 ? 'ERROR'
                     : $level  > 25 ? 'LEGENDARY'
                     : $level  > 21 ? 'HERO'
                     : $level  > 10 ? 'ELITE'
                     : $level  >  0 ? 'NORMAL'
                     :                $hpgauge_level;

            my $npc_id    = $npc->first_child_text('id');
            my $name      = $npc->first_child_text('name');
            my $tribe     = $npc->first_child_text('tribe');
            my $scale     = $npc->first_child_text('scale');
            my $race_type = $npc->first_child_text('race_type');
            print
                qq|\t<npc_template npc_id="$npc_id" name="$name" name_id="$nameid" height="$scale" rank="$rank" tribe="$tribe" race="$race_type" hp_gauge="$hpgauge_level"/>\n|;
            $twig->purge;    # drop this record before the next one is parsed
        }
    }
)->parsefile('A.xml');
print "</npc_templates>\n";
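To see why the lookup hash matters more than the choice of parser, the following sketch compares the two strategies on made-up in-memory records. It uses core Perl only. The record contents and counts are hypothetical, and it assumes an exact, case-insensitive match is what is wanted (the regex in the question also matches substrings).

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the parsed files: 1000 strings and 1000 NPCs.
my @strings = map { +{ id => 350000 + $_, name => "STR_NPC_$_" } } 0 .. 999;
my @npcs    = map { +{ id => 200000 + $_, desc => "str_npc_$_" } } 0 .. 999;

# Strategy 1: nested loop, as in the question (scan file B for every NPC).
my $comparisons = 0;
my %slow;
for my $npc (@npcs) {
    for my $string (@strings) {
        $comparisons++;
        if ( lc $string->{name} eq lc $npc->{desc} ) {
            $slow{ $npc->{id} } = $string->{id};
            last;
        }
    }
}

# Strategy 2: index file B once, then do one hash lookup per NPC.
my %id_for;
$id_for{ lc $_->{name} } = $_->{id} for @strings;

my %fast;
for my $npc (@npcs) {
    $fast{ $npc->{id} } = $id_for{ lc $npc->{desc} } // '';
}

printf "nested loop: %d comparisons; hash: %d lookups\n",
    $comparisons, scalar @npcs;
```

The nested loop grows with the product of the two record counts, while the hash version grows with their sum, so the gap widens quickly at the file sizes mentioned in the question.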



傻比既视感 2024-10-09 13:21:03

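Grab all the interesting "desc" fields from file A and put them in a hash. You only have to parse the file once, but if that still takes too long, have a look at XML::Twig.
Parse file B once and extract the items you need, using the hash.

It looks like you only need parts of the XML files. XML::Twig can parse just the elements you are interested in and throw away the rest, using the "twig_roots" parameter. XML::Simple is easier to get started with.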
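The two steps above could be sketched with twig_roots roughly as follows. This is an illustration under assumptions rather than a tested solution for the real files: the XML is a shortened, made-up stand-in for files A and B that is parsed from strings, and the hash names %wanted and %id_for are hypothetical.

```perl
use strict;
use warnings;
use XML::Twig;

# Tiny stand-ins for files A and B (hypothetical data in the question's layout).
my $file_a = <<'XML';
<npc_clients>
  <npc_client>
    <id>200000</id>
    <name>SkillZone</name>
    <desc>STR_NPC_NO_NAME</desc>
    <scale>10</scale>
  </npc_client>
</npc_clients>
XML

my $file_b = <<'XML';
<strings>
  <string><id>350000</id><name>STR_NPC_NO_NAME</name><body> </body></string>
  <string><id>350001</id><name>STR_UNUSED</name><body>x</body></string>
</strings>
XML

# Step 1: collect every <desc> in file A; elements outside the roots are not built.
my %wanted;
XML::Twig->new(
    twig_roots => {
        desc => sub {
            my ( $twig, $desc ) = @_;
            $wanted{ lc $desc->text } = 1;
            $twig->purge;    # free what has been parsed so far
        },
    },
)->parse($file_a);

# Step 2: one pass over file B, keeping only the ids that file A refers to.
my %id_for;
XML::Twig->new(
    twig_roots => {
        string => sub {
            my ( $twig, $string ) = @_;
            my $name = lc $string->first_child_text('name');
            $id_for{$name} = $string->first_child_text('id') if $wanted{$name};
            $twig->purge;
        },
    },
)->parse($file_b);

print "$_ => $id_for{$_}\n" for sort keys %id_for;
```

For the real files, parse($file_a) would become parsefile() with the actual path. A final pass over file A is still needed to write the output records, looking each desc up in %id_for.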


三寸金莲 2024-10-09 13:21:03

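Although I can't help you with the specifics of your Perl code, there are some general guidelines for dealing with large volumes of XML data. Broadly speaking, there are two kinds of XML API: DOM-based and stream-based. DOM-based APIs (such as XML DOM) parse the entire XML document into memory before the user-level API becomes available, whereas with a stream-based API (such as SAX) the implementation does not need to parse the whole document first. One benefit of stream-based parsers is that they typically use much less memory, because they never have to hold the entire document in memory at once, which is clearly a good thing when dealing with large XML documents. Looking at the XML::Simple documentation, it seems SAX support may be available. Have you tried it?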
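As a rough illustration of the stream-based style, the sketch below builds the name-to-id lookup with event handlers and never holds a document tree. It uses XML::Parser, an event-driven parser in the same spirit as SAX. That module choice is an assumption made here, not something the answer names, and the sample XML and variable names are hypothetical.

```perl
use strict;
use warnings;
use XML::Parser;

# Small stand-in for file B; a real run would call parsefile() on the file path.
my $xml = <<'XML';
<strings>
  <string><id>350000</id><name>STR_NPC_NO_NAME</name><body> </body></string>
  <string><id>350001</id><name>STR_NPC_WORM</name><body>Worm</body></string>
</strings>
XML

my %id_for;    # lower-cased name => id
my %record;    # fields of the <string> element currently being read
my $field;     # name of the child element we are inside, if it is one we keep

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ( $expat, $tag ) = @_;
            %record = () if $tag eq 'string';
            $field = ( $tag eq 'id' || $tag eq 'name' ) ? $tag : undef;
        },
        Char => sub {
            my ( $expat, $text ) = @_;
            # Character data can arrive in several chunks, so append.
            $record{$field} .= $text if defined $field;
        },
        End => sub {
            my ( $expat, $tag ) = @_;
            $field = undef;
            $id_for{ lc $record{name} } = $record{id}
                if $tag eq 'string' && defined $record{name};
        },
    },
);
$parser->parse($xml);

print "$_ => $id_for{$_}\n" for sort keys %id_for;
```

Only one record's worth of state is alive at any time, which is the memory benefit described above. The cost is that the code has to track by hand which element it is inside.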
