How can I efficiently search/replace some strings in a file in Perl?
My file looks like this:
<MAIN>
<SUB_MAIN>one</SUB_MAIN>
<VER>version#</VER>
(OTHER STUFF...)
<LOCATION>PATH</LOCATION>
</MAIN>
<MAIN>
<SUB_MAIN>two</SUB_MAIN>
<VER>version#</VER>
(OTHER STUFF...)
<LOC>PATH</LOC>
</MAIN>
What I want to do is search for the value of SUB_MAIN, let's say one, and if I find it, look up the value of LOCATION. Then I go to that location, do some syncing, get a new version from there, and update the VER information.
My current code has like three loops and is ugly. The skeleton is like this:
$value = "one|two|three";
# for each line in file
while ($line < @FileDat) {
    # see if it is a sub module?
    if ( $line =~ /\<SUB_MAIN\>$value\<\/SUB_MAIN\>/ )
    {
        $found_it = 0;
        while (!$found_it)
        {
            $lineNum++;
            if ( $FileDat[$lineNum] =~ /\<VER\>\d+\<\/VER\>/ )
            {
                $currIndex = $lineNum;
                while(1)
                {
                    $lineNum++;
                    if ( $FileDat[$lineNum] =~ /\<LOC\>(.+)\<\/LOC\>/ )
                    { #DO SOME STUFF...
                        $found_it = 1;
                        last;
                    }
                }
                #replace version #
                $FileDat[$currIndex] = " <VER>$latestChangeList</VER>\n";
            }
        }
    }
    $lineNum++;
}
# write the modified array to new file
print NEWCFGFILEPTR @FileDat;
close(OPEN_FILES);
How can I make it better?
Thank you.
Comments (3)
Use XML::Simple. There is no need to reinvent the wheel unless you are planning on making it better, which I highly doubt is your task.
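For what it's worth, here is a rough sketch of what that could look like. It is only an illustration under a few assumptions: the sample data is made up to resemble the question, the <wrapper> root is added by hand because XMLin expects a single root element, and latest_version_for is a hypothetical stand-in for the real sync step.

```perl
use strict;
use warnings;
use XML::Simple qw(XMLin XMLout);

# Made-up sample data shaped like the file in the question.
my $fragment = <<'XML';
<MAIN>
<SUB_MAIN>one</SUB_MAIN>
<VER>1</VER>
<LOCATION>/path/one</LOCATION>
</MAIN>
<MAIN>
<SUB_MAIN>four</SUB_MAIN>
<VER>1</VER>
<LOCATION>/path/four</LOCATION>
</MAIN>
XML

# Sub-module names we care about.
my %wanted = map { $_ => 1 } qw(one two three);

# Hypothetical placeholder: the real code would sync $location
# and return the new version found there.
sub latest_version_for {
    my ($location) = @_;
    return 42;
}

# Wrap the fragment in an assumed root element so it parses as one document.
# ForceArray keeps MAIN as a list even if there is only one entry;
# KeyAttr => [] turns off XML::Simple's folding of lists into hashes.
my $data = XMLin(
    "<wrapper>$fragment</wrapper>",
    ForceArray => ['MAIN'],
    KeyAttr    => [],
);

for my $main (@{ $data->{MAIN} }) {
    next unless $wanted{ $main->{SUB_MAIN} };
    $main->{VER} = latest_version_for( $main->{LOCATION} );
}

# NoAttr makes XMLout write the values as child elements, not attributes.
my $out = XMLout($data, RootName => 'wrapper', NoAttr => 1, KeyAttr => []);
print $out;
```

Note that XMLout writes the child elements in sorted key order rather than the original order, and free text sitting between elements (like the "OTHER STUFF" in the question) does not map cleanly onto XML::Simple's data structure, so this fits best when each MAIN holds only simple child elements.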
Actually, using an XML parser is a bit more complex than just using an XML module, since what you have is NOT well-formed XML. A well-formed XML file would have a single root, so all the MAIN elements would be wrapped in a single element.
There is a relatively simple way to fake it though, which is to wrap your file, referenced in an XML entity, in a proper high-level element.
Also, in your example data, you have a LOCATION element in the first MAIN, then a LOC element in the second MAIN; I assume it's a cut-and-paste error.
Here is a way to do this with XML::Twig that would work with an input file of any size (including too big to fit in memory), and that would output to the standard output.
If your input file is NOT in UTF-8, you may need to change the wrapper to add the proper encoding to the XML declaration. If it is pure ASCII, then you're good (and should UTF-8 characters be added, it will still work).
If you don't want to use XML::Twig, the same technique applies to create proper XML that can be read by XML::Simple or whatever other module you want to use.
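A minimal sketch of the wrapping idea with XML::Twig follows. It is a simplified variant, not the entity-based wrapper described above: it reads the fragment into a string and wraps it in a root element directly, so it does not have the "any size" property. The sample data, the <wrapper> element name, and latest_version_for are all assumptions made for the illustration.

```perl
use strict;
use warnings;
use XML::Twig;

# Made-up sample data shaped like the file in the question.
my $fragment = <<'XML';
<MAIN>
<SUB_MAIN>one</SUB_MAIN>
<VER>1</VER>
(OTHER STUFF...)
<LOCATION>/path/one</LOCATION>
</MAIN>
<MAIN>
<SUB_MAIN>four</SUB_MAIN>
<VER>1</VER>
(OTHER STUFF...)
<LOCATION>/path/four</LOCATION>
</MAIN>
XML

# Sub-module names we care about.
my %wanted = map { $_ => 1 } qw(one two three);

# Hypothetical placeholder: the real code would sync $location
# and return the new version found there.
sub latest_version_for {
    my ($location) = @_;
    return 42;
}

# Called once for each complete MAIN element.
sub update_main {
    my ($twig, $main) = @_;
    return unless $wanted{ $main->field('SUB_MAIN') };
    my $ver = $main->first_child('VER') or return;
    $ver->set_text( latest_version_for( $main->field('LOCATION') ) );
}

my $twig = XML::Twig->new(
    twig_handlers => { MAIN => \&update_main },
    keep_spaces   => 1,    # leave the original whitespace alone
);

# Wrap the fragment in an assumed root element so it is well-formed.
$twig->parse("<wrapper>$fragment</wrapper>");

# xml_string returns the content of the root without the wrapper tags.
my $updated = $twig->root->xml_string;
print $updated;
```

Because the handler receives a whole MAIN element at a time, the three nested loops from the question collapse into a lookup of SUB_MAIN, LOCATION and VER on that element.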
You have an XML file. Rather than parsing that with regular expressions (which is generally considered to be a Bad Idea), try using one of the existing XML parsing modules, like XML::Parser. There are many other modules like it, which you can find by searching for xml on search.cpan.org, but that's a good one.
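As a rough idea of how that might look, here is a small sketch that uses XML::Parser's event handlers to collect the fields of each MAIN block. It only reads the data; writing the file back out is left aside. The sample data and the <wrapper> root (needed because the parser expects a single root element) are assumptions made for the illustration.

```perl
use strict;
use warnings;
use XML::Parser;

# Made-up sample data shaped like the file in the question.
my $fragment = <<'XML';
<MAIN>
<SUB_MAIN>one</SUB_MAIN>
<VER>1</VER>
<LOCATION>/path/one</LOCATION>
</MAIN>
<MAIN>
<SUB_MAIN>four</SUB_MAIN>
<VER>1</VER>
<LOCATION>/path/four</LOCATION>
</MAIN>
XML

my @records;    # one hash per MAIN block
my $current;    # hash being filled while inside a MAIN
my $field;      # name of the child element being read, if any

my $parser = XML::Parser->new(
    Handlers => {
        Start => sub {
            my ($expat, $name) = @_;
            if ($name eq 'MAIN') {
                $current = {};
            }
            elsif ($current) {
                $field = $name;
                $current->{$field} = '';
            }
        },
        Char => sub {
            my ($expat, $text) = @_;
            # Text may arrive in several chunks, so append.
            $current->{$field} .= $text if $current && defined $field;
        },
        End => sub {
            my ($expat, $name) = @_;
            if ($name eq 'MAIN') {
                push @records, $current;
                undef $current;
            }
            undef $field;
        },
    },
);

# Wrap the fragment in an assumed root element so it is well-formed.
$parser->parse("<wrapper>$fragment</wrapper>");

for my $rec (@records) {
    print "$rec->{SUB_MAIN}: version $rec->{VER} at $rec->{LOCATION}\n";
}
```

XML::Parser is fairly low-level, so you keep track of where you are in the document yourself; modules built on top of it can take care of that bookkeeping for you.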