How can I efficiently search for and replace certain strings in a file in Perl?

Posted on 2024-12-15 07:29:40

My file looks like this:

<MAIN>  
  <SUB_MAIN>one</SUB_MAIN>  
  <VER>version#</VER>  
  (OTHER STUFF...)  
  <LOCATION>PATH</LOCATION>  
</MAIN>

<MAIN>  
  <SUB_MAIN>two</SUB_MAIN>  
  <VER>version#</VER>  
  (OTHER STUFF...)  
  <LOC>PATH</LOC>  
</MAIN>

I want to search for a given SUB_MAIN value, say one, and if I find it, look up the value of LOCATION in the same block. I then go to that location, do some syncing, get a new version from there, and update the VER information.

My current code has about three nested loops and is ugly. The skeleton looks like this:

$value = "one|two|three";
$lineNum = 0;

# for each line in file
while ($lineNum < @FileDat) {

    # see if it is a sub module? (the alternation is grouped so the tags apply to every value)
    if ( $FileDat[$lineNum] =~ /\<SUB_MAIN\>(?:$value)\<\/SUB_MAIN\>/ )
    {
        $found_it = 0;

        # scan forward for the VER line, stopping at the end of the file
        while (!$found_it && $lineNum < $#FileDat)
        {
            $lineNum++;
            if ( $FileDat[$lineNum] =~ /\<VER\>\d+\<\/VER\>/ )
            {
                $currIndex = $lineNum;

                # scan forward for the location line, stopping at the end of the file
                while ($lineNum < $#FileDat)
                {
                    $lineNum++;
                    if ( $FileDat[$lineNum] =~ /\<LOC\>(.+)\<\/LOC\>/ )
                    {   #DO SOME STUFF...
                        $found_it = 1;
                        last;
                    }
                }
                # replace version #
                $FileDat[$currIndex] = "    <VER>$latestChangeList</VER>\n";
            }
        }
    }
    $lineNum++;
}

# write the modified array to new file
print NEWCFGFILEPTR @FileDat;

close(OPEN_FILES);

How can I make it better?
Thank you.

Comments (3)

尐偏执 2024-12-22 07:29:40

Use XML::Simple. There is no need to reinvent the wheel unless you are planning on making it better, which I highly doubt is your task.
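
To make that suggestion concrete, here is a minimal sketch of how it might look. It assumes the MAIN blocks are already wrapped in a single root element (called data here, a made-up name), that the version values and paths are invented sample data, and that latest_version is a hypothetical stand-in for the real syncing code. Note that XML::Simple does not preserve element order or mixed text content, so it only fits if the "(OTHER STUFF...)" part is plain child elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Hypothetical sample data, already wrapped in a single <data> root element.
my $xml = <<'XML';
<data>
  <MAIN>
    <SUB_MAIN>one</SUB_MAIN>
    <VER>12</VER>
    <LOCATION>/path/one</LOCATION>
  </MAIN>
  <MAIN>
    <SUB_MAIN>two</SUB_MAIN>
    <VER>7</VER>
    <LOCATION>/path/two</LOCATION>
  </MAIN>
</data>
XML

# ForceArray keeps MAIN as a list even when there is only one block;
# KeyAttr => [] switches off XML::Simple's automatic array folding.
my $data = XMLin($xml, ForceArray => ['MAIN'], KeyAttr => []);

# The SUB_MAIN values we care about.
my %wanted = map { $_ => 1 } qw(one two three);

for my $main (@{ $data->{MAIN} }) {
    next unless $wanted{ $main->{SUB_MAIN} };
    $main->{VER} = latest_version($main->{LOCATION});
}

# NoAttr writes hash values as child elements instead of attributes.
my $out = XMLout($data, RootName => 'data', NoAttr => 1);
print $out;

# Placeholder: the real code would sync the location and look up the version.
sub latest_version {
    my ($location) = @_;
    return "new.version.$location";
}
```

The three nested loops collapse into one pass over a list of hashes, because the parser has already grouped each block's fields together.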

[浮城] 2024-12-22 07:29:40

Actually, this is a bit more complex than just picking an XML module, since what you have is NOT well-formed XML. A well-formed XML document has a single root element, so all the MAIN elements would have to be wrapped in one enclosing element.

There is a relatively simple way to fake it, though: reference your file as an external entity from a small wrapper document, inside a proper top-level element.

Also, your example data has a LOCATION element in the first MAIN but a LOC element in the second; I assume that is a cut-and-paste error.

Here is a way to do this with XML::Twig that works with an input file of any size (including one too big to fit in memory) and writes to standard output.

#!/usr/bin/perl

use strict;
use warnings;

use XML::Twig;

binmode( STDOUT, ':utf8'); # if your input file is in UTF-8

my $file= shift @ARGV;
# wrap the content of the file in <data>...</data> so it becomes well-formed XML
my $xml= qq{<?xml version="1.0"?>
            <!DOCTYPE data [ <!ENTITY file SYSTEM "$file">]>
            <data>&file;</data>
           };

XML::Twig->new( twig_handlers => { MAIN => \&main },
                keep_spaces => 1,
              )
         ->parse( $xml);

exit;

sub main
  { my( $t, $main)= @_;
    my $location= $main->field( 'LOCATION');
    $main->set_field( VER => get_version( $location));
    $main->print;
    $main->purge; # if the file is big and you want to free the memory
  }

sub get_version
  { my( $location)= @_;
    return "new.version.$location"; # the real code might be different!
  }

If your input file is NOT in UTF-8, you may need to change the wrapper to add the proper encoding to the XML declaration. If it is pure ASCII, you're fine (and it will still work if UTF-8 characters are added later).

If you don't want to use XML::Twig, the same technique applies to create proper XML that can be read by XML::Simple or whatever other module you want to use.
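
As a rough illustration of that last point, the sketch below wraps the file in memory rather than through an external entity: it reads the whole file into a string and adds a root element around it before handing it to XML::Simple. This is a variation on the technique above, not the same mechanism, and it only suits files small enough to load whole. The root name data, the helper wrap_in_root, and the sample content are assumptions made for the example; the temporary file just stands in for the real input.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use XML::Simple qw(XMLin);

# Write a small rootless sample file (hypothetical content) to work on.
my ($fh, $file) = tempfile(UNLINK => 1);
print {$fh} <<'XML';
<MAIN>
  <SUB_MAIN>one</SUB_MAIN>
  <VER>12</VER>
  <LOCATION>/path/one</LOCATION>
</MAIN>

<MAIN>
  <SUB_MAIN>two</SUB_MAIN>
  <VER>7</VER>
  <LOCATION>/path/two</LOCATION>
</MAIN>
XML
close $fh or die "Cannot close $file: $!";

# Slurp the file and wrap its content in a single <data> root element.
sub wrap_in_root {
    my ($path) = @_;
    open my $in, '<', $path or die "Cannot open $path: $!";
    local $/;                      # slurp mode: read the whole file at once
    my $content = <$in>;
    close $in;
    return "<data>\n$content</data>\n";
}

my $data = XMLin(wrap_in_root($file), ForceArray => ['MAIN'], KeyAttr => []);

print "$_->{SUB_MAIN}: $_->{VER} ($_->{LOCATION})\n" for @{ $data->{MAIN} };
```

The wrapped string is ordinary well-formed XML, so any other parser could be swapped in at the XMLin call.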

甩你一脸翔 2024-12-22 07:29:40

You have an XML file. Rather than parsing that with regular expressions (which is generally considered to be a Bad Idea), try using one of the existing XML parsing modules, like XML::Parser. There are many other modules like it, which you can find by searching for xml on search.cpan.org, but that's a good one.
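
For a sense of what XML::Parser looks like in practice, here is a small sketch using its Start, Char and End handlers to collect one hash per MAIN block. The sample document, the data root element, and the variable names are assumptions for illustration; the input is assumed to be well-formed, and the MAIN children are assumed to be simple text elements.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::Parser;

# Hypothetical sample data with a single <data> root element.
my $xml = <<'XML';
<data>
<MAIN>
  <SUB_MAIN>one</SUB_MAIN>
  <VER>12</VER>
  <LOCATION>/path/one</LOCATION>
</MAIN>
<MAIN>
  <SUB_MAIN>two</SUB_MAIN>
  <VER>7</VER>
  <LOCATION>/path/two</LOCATION>
</MAIN>
</data>
XML

my %current;       # fields of the MAIN block being read
my @records;       # one hash reference per completed MAIN block
my $text = '';     # character data collected since the last tag

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $element) = @_;
            %current = () if $element eq 'MAIN';   # a new block begins
            $text = '';
        },
        Char => sub {
            my ($expat, $string) = @_;
            $text .= $string;                      # text may arrive in pieces
        },
        End => sub {
            my ($expat, $element) = @_;
            if ($element eq 'MAIN') {
                push @records, { %current };       # store a copy of the block
            }
            elsif ($element ne 'data') {
                $current{$element} = $text;        # e.g. SUB_MAIN, VER, LOCATION
            }
            $text = '';
        },
    },
);
$parser->parse($xml);

# Report the blocks whose SUB_MAIN is one of the values of interest.
for my $record (grep { $_->{SUB_MAIN} =~ /^(?:one|two|three)$/ } @records) {
    print "$record->{SUB_MAIN}: version $record->{VER} at $record->{LOCATION}\n";
}
```

XML::Parser is event-based and lower level than XML::Simple or XML::Twig, so reading is straightforward, but writing the modified document back out is left to you.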
