Parsing multiline data in Perl
I have some data that I need to analyze. The data is multiline, and each block is separated from the next by a blank line. It looks something like this:
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
I need to select the data blocks that have some particular Property present: for example, only those that have Property 4, or only those that have both Property 3 and Property 6. I might also need to select based on the values of these properties, for example only those blocks where Property 3 has the value 'an'.
How would I do this in Perl? I tried splitting on "\n", but that didn't seem to work properly. Am I missing something?
Comments (8)
The secret to making this task simple is to use the $/ variable to put Perl into "paragraph mode". That makes it easy to process your records one at a time. You can then filter them with something like grep.
In that example I'm processing each record as plain text. For more complex work, I'd probably turn each record into a hash.
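A minimal sketch of the paragraph-mode approach described above, treating each record as plain text. The helper name read_records and the in-memory sample data are assumptions made for illustration; with a real file you would open it by name instead.

```perl
use strict;
use warnings;

# Sample input from the question; blocks are separated by blank lines.
my $data = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

# Read every blank-line-separated record from a filehandle.
# (read_records is a hypothetical helper name.)
sub read_records {
    my ($fh) = @_;
    local $/ = '';          # paragraph mode: each read returns one record
    my @records = <$fh>;
    chomp @records;         # in paragraph mode chomp removes all trailing newlines
    return @records;
}

# An in-memory filehandle keeps the sketch self-contained.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @records = read_records($fh);
close $fh;

# Records that contain a "Property 4" line (/m lets ^ match at any line start).
my @with_p4 = grep { /^Property 4:/m } @records;

# Records in which Property 3 has exactly the value 'an'.
my @p3_is_an = grep { /^Property 3: an$/m } @records;

print "$_\n\n" for @with_p4, @p3_is_an;
```

Setting $/ with local inside the sub keeps the change from leaking into other code that reads line by line.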
Depending on the size of each property set and how much memory you have...
I'd use a simple state machine that scans sequentially through the file - with a line-by-line sequential scan, not multiline - adding each property/id/value to a hash keyed on id. When you get a blank line or end-of-file, determine whether the elements of the hash should be filtered in or out, and emit them as necessary, then reset the hash.
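The state machine described above could be sketched roughly as follows. The names filter_blocks and wanted, and the choice of filter (blocks having both Property 3 and Property 6), are assumptions for illustration only.

```perl
use strict;
use warnings;

my $data = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

# Hypothetical filter: keep blocks that have both Property 3 and Property 6.
sub wanted {
    my ($props) = @_;
    return exists $props->{'Property 3'} && exists $props->{'Property 6'};
}

# Scan line by line, collecting one block at a time into a hash.
sub filter_blocks {
    my ($fh, $keep) = @_;
    my (%props, @kept);

    # Decide what to do with the current block, then reset the hash.
    my $flush = sub {
        push @kept, +{ %props } if %props && $keep->(\%props);
        %props = ();
    };

    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^\s*$/) {                   # a blank line ends a block
            $flush->();
        }
        elsif ($line =~ /^([^:]+):\s*(.*)$/) {    # "name: value"
            $props{$1} = $2;
        }
    }
    $flush->();                                   # end of file ends the last block
    return @kept;
}

open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
my @kept = filter_blocks($fh, \&wanted);
close $fh;

for my $block (@kept) {
    print "$_: $block->{$_}\n" for sort keys %$block;
    print "\n";
}
```

Only one block is held in memory at a time, which is the point of scanning sequentially when the file is large.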
Quick and dirty:
Let's say this script is called propertyParser.pl and you have a file containing the properties and values called properties.txt. You could call this as follows:
Once you have populated $propertyRef with all your data, you can then loop through the elements and filter them based on whatever rules you need to apply, such as certain key and/or value combinations:
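The script and its invocation are not shown here, so the following is only a sketch of how $propertyRef might be populated and filtered. It assumes $propertyRef is a reference to an array of hashes, one hash per block; load_properties is a hypothetical helper, and the sample data is read from memory rather than from properties.txt.

```perl
use strict;
use warnings;

my $sample = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

# Build a reference to an array of hashes, one hash per block.
# (load_properties is a hypothetical helper name.)
sub load_properties {
    my ($fh) = @_;
    local $/ = '';                       # paragraph mode
    my @blocks;
    while (my $record = <$fh>) {
        my %props;
        for my $line (split /\n/, $record) {
            my ($name, $value) = split /:\s*/, $line, 2;
            next unless defined $value;  # skip lines without a colon
            $props{$name} = $value;
        }
        push @blocks, \%props if %props;
    }
    return \@blocks;
}

# To read properties.txt instead, open that file here.
open my $fh, '<', \$sample or die "Cannot open in-memory data: $!";
my $propertyRef = load_properties($fh);
close $fh;

# Example rule: blocks where Property 3 exists and has the value 'an'.
my @matches = grep {
    exists $_->{'Property 3'} && $_->{'Property 3'} eq 'an'
} @$propertyRef;

for my $block (@matches) {
    print "$_: $block->{$_}\n" for sort keys %$block;
}
```

Once every block is a hash, each filter rule becomes an ordinary exists or eq test rather than a regular expression over raw text.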
Your record separator should be "\n\n". Every line ends with a single newline, and a block is delimited by a double newline. Using this idea, it was rather easy to pick out the blocks with Property 4.
Both expressions use the multiline ('m') flag, so that ^ matches at the start of any line. The last one also uses the 's' flag, so that '.' matches newlines, and the extended syntax ('x') flag which, among other things, ignores whitespace within the expression.
If the data is rather small, you could process it all in one go like:
Which shows the result to be:
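As a rough sketch of this idea (not the original expressions, which are not shown here), the following sets $/ to "\n\n" and uses one pattern with the 'm' flag and one with 'm', 's' and 'x'. The pattern names and the helper blocks_matching are assumptions for illustration.

```perl
use strict;
use warnings;

my $data = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

# Block has a Property 4 line; /m lets ^ match at the start of any line.
my $has_p4 = qr/^Property 4:/m;

# Block has both Property 3 and Property 6, in either order.
# /s lets '.' cross newlines; /x ignores whitespace in the pattern,
# so the literal space has to be written as [ ].
my $has_p3_and_p6 = qr/
    \A
    (?= .* ^Property[ ]3: )
    (?= .* ^Property[ ]6: )
/msx;

# Return the blocks read from $fh that match $re.
sub blocks_matching {
    my ($fh, $re) = @_;
    local $/ = "\n\n";           # a double newline ends each record
    my @found;
    while (my $block = <$fh>) {
        chomp $block;            # strips the trailing "\n\n" when present
        push @found, $block if $block =~ $re;
    }
    return @found;
}

for my $re ($has_p4, $has_p3_and_p6) {
    open my $fh, '<', \$data or die "Cannot open in-memory data: $!";
    print "$_\n\n" for blocks_matching($fh, $re);
    close $fh;
}

# For small inputs, the whole text can be split and filtered in one go.
my @all_at_once = grep { $_ =~ $has_p4 } split /\n\n/, $data;
```

One difference worth knowing: with $/ set to "\n\n", three or more consecutive newlines leave a stray leading newline on the next block, whereas paragraph mode ($/ set to the empty string) treats any run of blank lines as a single separator.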
Check what the $/ variable (the input record separator, documented in perlvar) will do for you. You can set the 'end of line' separator to be whatever you please. You could try setting it to "\n\n" (in double quotes, so that the escapes are interpolated).
As your data elements seem to be delimited by blank lines, this will read each group of property lines one by one.
You could also read the entire file into an array and process it from memory:
my @lines = <DATA>;
Assuming that your data is stored in a file (let's say mydata.txt), you could write the following Perl script (let's call it Bob.pl):
Next, you just have to launch
perl Bob.pl < mydata.txt
and voilà!
In relation to the first part of your question, you can read records in "paragraph mode" using perl's -00 command-line option, for example:
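The example that followed is not shown here. One way it could look is given in the comment at the top of the sketch below, with the explicit equivalent written out underneath; data.txt is a hypothetical file name, and the filter on Property 4 is just an illustration.

```perl
use strict;
use warnings;

# Command-line form (data.txt is a hypothetical file name):
#   perl -00 -ne 'print if /^Property 4:/m' data.txt
# -00 sets $/ to "" (paragraph mode); -n wraps the code in a read loop.
# The loop below spells out the same logic on in-memory sample data.

my $data = <<'END';
Property 1: 1234
Property 2: 34546
Property 3: ACBGD

Property 1: 1234
Property 4: 4567

Property 1: just
Property 3: an
Property 5: simple
Property 6: example
END

open my $in, '<', \$data or die "Cannot open in-memory data: $!";

my @printed;    # kept only so the result can be inspected afterwards
{
    local $/ = "";                       # what -00 does
    while (<$in>) {                      # what -n does
        next unless /^Property 4:/m;     # the one-liner's condition
        print;
        push @printed, $_;
    }
}
close $in;
```

The one-liner is handy for quick checks from the shell; the explicit loop is easier to extend once the filter rules grow.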