How to efficiently search/replace certain strings in a file in Perl?

Posted on 2024-12-15 07:29:40


My file looks like this:

<MAIN>  
  <SUB_MAIN>one</SUB_MAIN>  
  <VER>version#</VER>  
  (OTHER STUFF...)  
  <LOCATION>PATH</LOCATION>  
</MAIN>

<MAIN>  
  <SUB_MAIN>two</SUB_MAIN>  
  <VER>version#</VER>  
  (OTHER STUFF...)  
  <LOC>PATH</LOC>  
</MAIN>

What I want to do is search for a value of SUB_MAIN, let's say "one", and if I find it, look up the value of LOCATION. Then I go to that location, do some syncing, get the new version from there, and update the VER information.

My current code has three nested loops and is ugly. The skeleton looks like this:

$value = "one|two|three";
$lineNum = 0;

# for each line in file
while ($lineNum < @FileDat) {

    # see if it is a sub module?
    # (?:...) keeps the alternation between the tags
    if ( $FileDat[$lineNum] =~ /\<SUB_MAIN\>(?:$value)\<\/SUB_MAIN\>/ )
    {
        $found_it = 0;

        while (!$found_it)
        {
            $lineNum++;
            if ( $FileDat[$lineNum] =~ /\<VER\>\d+\<\/VER\>/ )
            {
                $currIndex = $lineNum;

                while(1)
                {
                    $lineNum++;
                    if ( $FileDat[$lineNum] =~ /\<LOC\>(.+)\<\/LOC\>/ )
                    {   #DO SOME STUFF...
                        $found_it = 1;
                        last;
                    }
                }
                # replace version #
                $FileDat[$currIndex] = "    <VER>$latestChangeList</VER>\n";
            }
        }
    }
    $lineNum++;
}

# write the modified array to new file
print NEWCFGFILEPTR @FileDat;

close(NEWCFGFILEPTR);

How can I make it better?
Thank you.
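For comparison, the three nested loops can be collapsed into one by reading a whole <MAIN>...</MAIN> record at a time (setting $/ to '</MAIN>') and doing every lookup inside that record. This is a minimal regex-based sketch, not an XML parser: it assumes each MAIN block holds one SUB_MAIN, one VER and one LOC or LOCATION element (the sample data shows both names), and get_version is a hypothetical placeholder for the real sync step.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical placeholder for "go to LOCATION, sync, read the new version".
sub get_version {
    my ($location) = @_;
    return "new.version.$location";
}

# Copy records from $in to $out, updating VER in every MAIN block
# whose SUB_MAIN matches the alternation in $wanted.
sub update_versions {
    my ($in, $out, $wanted) = @_;
    local $/ = '</MAIN>';    # each read returns one block ending in </MAIN>
    while (my $block = <$in>) {
        # (?:...) keeps the alternation between the tags; without it,
        # /<SUB_MAIN>one|two|three<\/SUB_MAIN>/ would match a bare "two" anywhere.
        if ($block =~ m{<SUB_MAIN>(?:$wanted)</SUB_MAIN>}
            && $block =~ m{<(LOC(?:ATION)?)>(.*?)</\1>}s)
        {
            my $location = $2;                      # path from LOC or LOCATION
            my $version  = get_version($location);
            $block =~ s{<VER>.*?</VER>}{<VER>$version</VER>}s;
        }
        print {$out} $block;    # unmatched blocks and trailing text pass through
    }
}

my $data = <<'XML';
<MAIN>
  <SUB_MAIN>one</SUB_MAIN>
  <VER>1</VER>
  <LOCATION>/path/one</LOCATION>
</MAIN>
<MAIN>
  <SUB_MAIN>four</SUB_MAIN>
  <VER>1</VER>
  <LOC>/path/four</LOC>
</MAIN>
XML

# In-memory filehandles stand in for the real input and output files.
open my $in,  '<', \$data   or die "Cannot read data: $!";
my $result = '';
open my $out, '>', \$result or die "Cannot write result: $!";
update_versions($in, $out, 'one|two|three');
close $out;
print $result;
```

Only the first block changes here, because "four" is not in the list; with real files you would open the config file for reading and a new file for writing instead of the in-memory handles.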


尐偏执 2024-12-22 07:29:40

Use XML::Simple. There is no need to reinvent the wheel unless you intend to make it better, and I highly doubt that is your task.


[浮城] 2024-12-22 07:29:40


Actually, using an XML parser is a bit more complex than just "use an XML module", since what you have is NOT well-formed XML. A well-formed XML file has a single root element, so all the MAIN elements would have to be wrapped in one enclosing element.

There is a relatively simple way to fake it, though: reference your file as an external XML entity and wrap that entity reference in a proper top-level element.

Also, your example data has a LOCATION element in the first MAIN but a LOC element in the second; I assume that is a cut-and-paste error.

Here is a way to do this with XML::Twig. It works with an input file of any size (including one too big to fit in memory) and writes the result to standard output.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

binmode( STDOUT, ':utf8'); # if your input file is in UTF-8

my $file= shift @ARGV;
# wrap the content of the file in <data>...</data> so it becomes well-formed XML
my $xml= qq{<?xml version="1.0"?>
            <!DOCTYPE data [ <!ENTITY file SYSTEM "$file">]>
            <data>&file;</data>
           };

XML::Twig->new( twig_handlers => { MAIN => \&main },
                keep_spaces => 1,
              )
         ->parse( $xml);

exit;

sub main
  { my( $t, $main)= @_;
    my $location= $main->field( 'LOCATION');
    $main->set_field( VER => get_version( $location));
    $main->print;
    $main->purge; # if the file is big and you want to free the memory
  }

sub get_version
  { my( $location)= @_;
    return "new.version.$location"; # the real code might be different!
  }

If your input file is NOT in UTF-8, you may need to change the wrapper to add the proper encoding to the XML declaration. If it is pure ASCII, you are fine (and if UTF-8 characters are added later, it will still work).

If you don't want to use XML::Twig, the same technique applies to create proper XML that can be read by XML::Simple or whatever other module you want to use.
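If the module you pick does not expand the external entity, a variant of the same idea is to do the wrapping in memory. This is a sketch under stated assumptions, not part of the original answer: it slurps the whole file, so unlike the XML::Twig version above it is limited by available memory, and it assumes the file has no <?xml ...?> declaration of its own. The helper names slurp and wrap_in_root are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Read the whole file into one string. Assumes the file has no
# <?xml ...?> declaration of its own (the sample in the question has none).
sub slurp {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;    # undefined record separator: read everything at once
    my $content = <$fh>;
    close $fh;
    return $content;
}

# Wrap the top-level MAIN elements in a single <data> root element,
# which is what makes the text well-formed XML.
sub wrap_in_root {
    my ($content) = @_;
    return "<data>\n$content</data>\n";
}

# Demo: write a small sample file, then build the well-formed string.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<MAIN><SUB_MAIN>one</SUB_MAIN></MAIN>\n",
             "<MAIN><SUB_MAIN>two</SUB_MAIN></MAIN>\n";
close $tmp;

my $xml = wrap_in_root(slurp($path));
print $xml;
# $xml can now be passed to a parser as a string,
# e.g. XML::Simple's XMLin($xml, ForceArray => ['MAIN']).
```

The trade-off is memory for simplicity: no DOCTYPE or entity is involved, so any parser that accepts a string will see a single root.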
