How to eliminate tag names in an XML file using Perl

Posted on 2024-12-17 06:02:19

I have multiple XML files in a folder, so I wrote a script like this to combine them into one XML file:

#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;
use Carp;
use File::Find;
use File::Spec::Functions qw( canonpath );
use XML::LibXML::Reader;
use Digest::MD5 'md5';

# Fall back to a default directory when no path is given.
if ( @ARGV == 0 ) {
    push @ARGV, "c:/main/work";
    warn "Using default path $ARGV[0]\n  Usage: $0  path ...\n";
}

open( my $allxml, '>', "all_xml_contents.combined.xml" )
    or die "can't open output xml file for writing: $!\n";
print $allxml '<?xml version="1.0" encoding="UTF-8"?>',
    "\n<Shiporder xmlns:xsi=\"http://www.w3.org/2001/XMLSchema-instance\">\n";

my %shipto_md5;    # digests of the fragments already written

# Process every regular file whose name ends in _stc.xml.
find(
    sub {
        return unless ( /(_stc\.xml)$/ and -f );
        extract_information();
        return;
    },
    @ARGV
);

print $allxml "</Shiporder>\n";
close($allxml) or die "can't close output xml file: $!\n";

# Copy each <data> element of the current file to the output,
# skipping fragments that have already been written.
sub extract_information {
    my $path = $_;
    if ( my $reader = XML::LibXML::Reader->new( location => $path ) ) {
        while ( $reader->nextElement('data') ) {
            my $elem = $reader->readOuterXml();
            my $md5  = md5($elem);
            # Reuse $elem instead of serializing the node a second time.
            print $allxml $elem unless ( $shipto_md5{$md5}++ );
        }
    }
    return;
}

It prints all the XML files into one XML file like this:

all_xml.combined.xml

<?xml version="1.0" encoding="UTF-8"?>
<student specification xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <student>
    <name>johan</name>
  </student>

  <student>
    <name>benny</name>
  </student>

  <student>
    <name>kent</name>
  </student>

</student specification>

But one of the XML files contains one more node, and I tried to extract its content inside the while loop like this:

    $reader->nextElement( 'details' );
    $information = $reader->readInnerXml();

How can I add this information to the output file? Please help me with this problem.

Comments (3)

像极了他 2024-12-24 06:02:19

Three obvious points.

  1. You're loading the XML::LibXML module but not making any use of it.
  2. The problematic XML declaration is always the first line of the input files. So why not just skip the first line?
  3. The file you will end up with will not be valid XML. An XML document needs a single root element. So you'll need to create another element (perhaps <students>) that surrounds all of the data from the other files.
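Points 2 and 3 can be sketched with the streaming reader the question already uses. This is only a minimal illustration under stated assumptions: the two inline XML strings stand in for the real input files, the `<students>` root name is the one suggested above, and the `student` and `details` element names are taken from the question. Because only the matching elements are copied, the XML declaration of each input is never carried over, so there is no first line to skip.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML::Reader;

# Hypothetical sample inputs standing in for the real files.
my @inputs = (
    '<?xml version="1.0"?><root><student><name>johan</name></student></root>',
    '<?xml version="1.0"?><root><student><name>johan</name></student>'
      . '<student><name>benny</name></student><details>term 1</details></root>',
);

my %seen;    # fragments already written, used to drop exact duplicates

# One declaration and one root element for the whole combined document.
my $out = qq{<?xml version="1.0" encoding="UTF-8"?>\n<students>\n};

for my $xml (@inputs) {
    # One pass per element name keeps the sketch close to the original loop.
    for my $tag (qw(student details)) {
        my $reader = XML::LibXML::Reader->new( string => $xml )
            or die "cannot create a reader for the input\n";
        while ( $reader->nextElement($tag) == 1 ) {
            my $fragment = $reader->readOuterXml();
            $out .= "  $fragment\n" unless $seen{$fragment}++;
        }
    }
}

$out .= "</students>\n";
print $out;
```

With real files, `string => $xml` would become `location => $path`, and `$out` would be written to the output file handle instead of being printed.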

逐鹿 2024-12-24 06:02:19

Would it be possible for you to switch to XML::Twig? It provides an excellent way of handling tags.

You probably need something like this:

 my $twig = XML::Twig->new(
     twig_handlers => {
         # "student with specification" is a placeholder: replace it with
         # the element name or condition that matches your data
         'student with specification' => sub { $_->delete },    # remove matching elements
     },
 );

You need to adapt the "student with specification" placeholder to your data. Sorry, I don't have much time; otherwise I would have written the complete code.
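For readers who want something runnable, here is a small hedged sketch of the same idea. The input string and the `wrapper` element name are made up for illustration and are not taken from the question. It uses `erase` rather than `delete`: `delete` removes the element together with its content, while `erase` removes only the tag and leaves the children in place, which seems closer to "eliminating the tag name".

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Twig;

# Hypothetical input; "wrapper" is an assumed tag name used only for illustration.
my $xml = '<students>'
        . '<wrapper><student><name>johan</name></student></wrapper>'
        . '<student><name>benny</name></student>'
        . '</students>';

my $twig = XML::Twig->new(
    twig_handlers => {
        # Inside a handler, $_ is the element that has just been parsed.
        # erase() drops the tag itself and keeps its children in place.
        wrapper => sub { $_->erase },
    },
);

$twig->parse($xml);

# sprint() returns the serialized document as a string.
my $result = $twig->sprint;
print "$result\n";
```

To process a file instead of a string, `parsefile` can be used in place of `parse`.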

流心雨 2024-12-24 06:02:19

Here's some PHP code that does it using DOMDocument.

Overall:
1) Create a parent document from a string or similar.
2) Load each file, import its root element, and append it.
3) Save the result.

In XML programming it's usually better to use XML parser functions than string manipulation.

Good luck.

function loadXMLString( $strXML ) {
    $xmlDoc = new DOMDocument();
    $xmlDoc->formatOutput = true; 
    $xmlDoc->loadXML( $strXML );
    return $xmlDoc;
}

// Load an XML file; fall back to $defaultXML when the file does not exist.
function loadXMLFile( $strFileName, $defaultXML=null ) {
    $xmlDoc = new DOMDocument();
    if( file_exists( $strFileName )  ){
        $xmlDoc->load( $strFileName );
    } else {
        if( $defaultXML === null  ) {
            throw new Exception( "Cannot locate file: " . $strFileName . " no default specified." );
        } else {
            // create it, if default XML is supplied
            // (plain function call: $this is not available outside a class)
            return loadXMLString( $defaultXML );
        }
    }
    return $xmlDoc;
}


$xmlMain = loadXMLString( "<xmlparent/>" );

$xmlChild = loadXMLFile( "test1.xml" );
$ndTemp = $xmlMain->importNode( $xmlChild->documentElement, true );
$xmlMain->documentElement->appendChild( $ndTemp );

$xmlChild = loadXMLFile( "test2.xml" );
$ndTemp = $xmlMain->importNode( $xmlChild->documentElement, true );
$xmlMain->documentElement->appendChild( $ndTemp );

$xmlMain->save( "all.xml" );
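
Since the question is about Perl, the same three steps can be sketched with XML::LibXML's DOM interface. This is an illustrative port under assumptions, not a drop-in solution: the inline strings stand in for the files, and the `students` root name is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Hypothetical inputs; real code would use load_xml( location => $file ).
my @sources = (
    '<student><name>johan</name></student>',
    '<student><name>kent</name></student>',
);

# 1) Create the parent document with a single root element.
my $main = XML::LibXML::Document->new( '1.0', 'UTF-8' );
my $root = $main->createElement('students');
$main->setDocumentElement($root);

# 2) Load each source, import its root element, and append it.
for my $xml (@sources) {
    my $child = XML::LibXML->load_xml( string => $xml );
    my $node  = $main->importNode( $child->documentElement );
    $root->appendChild($node);
}

# 3) Serialize the result (1 = indented output).
my $combined = $main->toString(1);
print $combined;
```

To write the result to disk rather than print it, `$main->toFile( $filename, 1 )` can be used for step 3.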