Parsing irregular text files in Perl
I am new to Perl programming and would like to know how to parse text files with Perl.
I have a text file with irregular formatting that I would like to parse into three parts.
Basically, the file contains text similar to this:
;out;asoljefsaiouerfas'pozsirt'z
mysql_query("SELECT * FROM Table WHERE (value='true') OR (value2='true') OR (value3='true') ");
1234 434 3454
4if[9put[e]9sd=09q]024s-q]3-=04i
select ta.somefield, tc.somefield
from TableA ta INNER JOIN TableC tc on tc.somefield=ta.somefield
INNER JOIN TableB tb on tb.somefield=ta.somefield
ORDER by tb.somefield
234 4536 234
The list goes on in this format.
So what I need to do is parse each entry into three parts: the first is the line on top, which holds the hash checks; the second is the MySQL query; and the third is the line of three numbers. For some reason I cannot work out how to do this. I use Perl's 'open' function to read the data from the text file, and then I tried to use the 'split' function on line breaks, but it turns out the queries are not always on a single line and do not follow a fixed pattern, so I cannot use it the way I had planned.
Comments (3)
Assumptions:
With that in mind:
By changing $RS ($/ or $INPUT_RECORD_SEPARATOR) to double newlines, we change how records are read in. This is not so bizarre: in my years with Perl I have had to set the record separator to some pretty interesting strings, and sometimes that is all it takes to read in just the chunk that you want.
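A minimal sketch of the record-separator idea, under assumptions that are not confirmed by the question: records are separated by a blank line, the first line of each record is the hash line, and the last line holds the three numbers. The names parse_record and read_records are hypothetical helpers introduced only for illustration.

```perl
use strict;
use warnings;

# Split one blank-line-separated chunk into its three parts.
# Assumes: first line = hash, last line = three numbers, the rest = query.
sub parse_record {
    my ($chunk) = @_;
    my @lines = grep { /\S/ } split /\n/, $chunk;    # drop empty lines
    return unless @lines >= 3;                       # skip incomplete chunks
    my $hash    = shift @lines;
    my @numbers = split ' ', pop @lines;
    return { hash => $hash, query => join( "\n", @lines ), numbers => \@numbers };
}

# Read every record from a filehandle, one chunk per read.
sub read_records {
    my ($fh) = @_;
    local $/ = "\n\n";    # same variable as $RS / $INPUT_RECORD_SEPARATOR under "use English"
    my @records;
    while ( my $chunk = <$fh> ) {
        my $rec = parse_record($chunk);
        push @records, $rec if $rec;
    }
    return @records;
}

# Demo on an in-memory file; a real script would open the text file instead.
my $sample = <<'END';
;out;asoljefsaiouerfas'pozsirt'z
mysql_query("SELECT * FROM Table WHERE (value='true') OR (value2='true') OR (value3='true') ");
1234 434 3454

4if[9put[e]9sd=09q]024s-q]3-=04i
select ta.somefield, tc.somefield
from TableA ta INNER JOIN TableC tc on tc.somefield=ta.somefield
ORDER by tb.somefield
234 4536 234
END

open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
for my $rec ( read_records($fh) ) {
    print "hash:    $rec->{hash}\n";
    print "query:   $rec->{query}\n";
    print "numbers: @{ $rec->{numbers} }\n\n";
}
close $fh;
```

Using local keeps the changed separator confined to read_records, so other reads in the program still work line by line. Setting $/ to the empty string instead selects Perl's paragraph mode, which treats any run of blank lines as a single separator.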
Oh, oh GOD.
The algorithm I see is:
With that in mind, I present the following code:
Modify, of course, to suit boundary conditions.
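One possible line-by-line approach can be sketched as follows. It is an illustration under stated assumptions rather than a definitive implementation: a line consisting of exactly three integers closes a record, the next non-blank line is taken as the hash line of the following record, and everything in between is treated as the query. The name parse_lines is hypothetical.

```perl
use strict;
use warnings;

# Walk the lines once, collecting (hash, query, numbers) records.
# Assumption: a line of exactly three integers ends the current record.
sub parse_lines {
    my @lines = @_;
    my ( @records, $hash, @query );

    for my $line (@lines) {
        $line =~ s/\s+\z//;          # drop the newline and trailing spaces
        next unless length $line;    # ignore blank lines

        if ( !defined $hash ) {
            $hash = $line;           # first line of a new record
        }
        elsif ( $line =~ /^\s*(\d+)\s+(\d+)\s+(\d+)$/ ) {
            push @records,
              { hash => $hash, query => join( "\n", @query ), numbers => [ $1, $2, $3 ] };
            undef $hash;             # the next non-blank line starts a new record
            @query = ();
        }
        else {
            push @query, $line;      # part of a (possibly multi-line) query
        }
    }
    return @records;
}

# Demo on the sample from the question, split into lines with newlines kept.
my @sample = split /^/, <<'END';
;out;asoljefsaiouerfas'pozsirt'z
mysql_query("SELECT * FROM Table WHERE (value='true') OR (value2='true') OR (value3='true') ");
1234 434 3454
4if[9put[e]9sd=09q]024s-q]3-=04i
select ta.somefield, tc.somefield
from TableA ta INNER JOIN TableC tc on tc.somefield=ta.somefield
INNER JOIN TableB tb on tb.somefield=ta.somefield
ORDER by tb.somefield
234 4536 234
END

for my $rec ( parse_lines(@sample) ) {
    print "hash:    $rec->{hash}\n";
    print "query:   $rec->{query}\n";
    print "numbers: @{ $rec->{numbers} }\n\n";
}
```

To run this against a real file, the lines could come from a filehandle opened with open and read in list context. A query line that happens to consist of three bare integers would end the record early, which is one of the boundary conditions that would need handling.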
The following seems to work:
Here is an explanation for the regex:
The first group will always be exactly one line: because it is lazy, as soon as the first line break is encountered the regex tries to start matching the second group. At that point, if the rest of the match can be completed, the second group will contain all subsequent lines before the numbers.
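A regex that behaves as described can be sketched like this. It is an assumed reconstruction for illustration, not necessarily the pattern from the answer; $record_re and parse_with_regex are hypothetical names, and the pattern assumes every record ends with a line of exactly three integers.

```perl
use strict;
use warnings;

# Assumed pattern: lazy one-line hash, lazy multi-line query, three integers.
my $record_re = qr{
    ^(\S.*?)\n                          # group 1: hash line, lazy, ends at the first newline
    (.+?)\n                             # group 2: query, lazy, may span several lines
    \h*(\d+)\h+(\d+)\h+(\d+)\h*$        # groups 3 to 5: the three numbers on their own line
}msx;

# Apply the pattern repeatedly to the whole file contents.
sub parse_with_regex {
    my ($text) = @_;
    my @records;
    while ( $text =~ /$record_re/g ) {
        push @records, { hash => $1, query => $2, numbers => [ $3, $4, $5 ] };
    }
    return @records;
}

# Demo on part of the sample from the question.
my $text = <<'END';
;out;asoljefsaiouerfas'pozsirt'z
mysql_query("SELECT * FROM Table WHERE (value='true') OR (value2='true') OR (value3='true') ");
1234 434 3454
4if[9put[e]9sd=09q]024s-q]3-=04i
select ta.somefield, tc.somefield
from TableA ta INNER JOIN TableC tc on tc.somefield=ta.somefield
ORDER by tb.somefield
234 4536 234
END

for my $rec ( parse_with_regex($text) ) {
    print "hash:    $rec->{hash}\n";
    print "query:   $rec->{query}\n";
    print "numbers: @{ $rec->{numbers} }\n\n";
}
```

The /s modifier lets the dot match newlines so the query group can cross lines, /m lets ^ and $ match at line boundaries, and /x permits the inline comments. If a record is missing its numbers line, the lazy query group will run on into the next record, so the input format needs to be reliable for this approach.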