How can I extract columns from a fixed-width format in Perl?
I'm writing a Perl script to run through and grab various data elements such as:
1253592000
1253678400 86400 6183.000000
1253764800 86400 4486.000000
1253851200 36.000000 86400 10669.000000
1253937600 0.000000 86400 9126.000000
1254024000 0.000000 86400 2930.000000
1254110400 0.000000 86400 2895.000000
1254196800 0.000000 8828.000000
I can grab each line of this text file without a problem.
I have a working regex to grab each of those fields. Once I have the line in a variable, e.g. $line, how can I grab each of those fields and place them into their own variables, even though they have different delimiters?
This example illustrates how to parse the line either with whitespace as the delimiter (split) or with a fixed-column layout (unpack). With unpack, if you use upper-case template letters (A10 etc.), trailing whitespace will be removed for you. Note: as brian d foy points out, the split approach does not work well when fields are missing (for example, the second line of data), because the information about which position a field came from is lost; unpack is the way to go here, unless we are misunderstanding your data.
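A minimal sketch of the split-versus-unpack comparison, assuming a hypothetical layout of 10, 12, 10 and 14 characters per column (the real widths in the file may differ, so measure them first). The template letter A strips only trailing blanks, so the leading blanks of right-justified numbers are trimmed separately; the helper name parse_fixed is illustrative.

```perl
use strict;
use warnings;

# Hypothetical layout: timestamp 10, value 12, interval 10, total 14 chars.
my $template = 'A10 A12 A10 A14';
my $width    = 10 + 12 + 10 + 14;

sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    # Pad short lines (e.g. one holding only a timestamp) to the full width.
    my @fields = unpack $template, sprintf('%-*s', $width, $line);
    # 'A' strips trailing blanks only; trim the leading blanks of
    # right-justified numbers ourselves.
    s/^\s+// for @fields;
    return @fields;
}

# A line whose second column is empty, built with known column positions.
my $with_gap = sprintf '%-10s%12s%10s%14s', '1253678400', '', '86400', '6183.000000';

my @by_split  = split ' ', $with_gap;    # blank column silently disappears
my @by_unpack = parse_fixed($with_gap);  # blank column kept as ''

print scalar(@by_split),  " fields via split:  @by_split\n";
print scalar(@by_unpack), " fields via unpack: ", join('|', @by_unpack), "\n";
```

With this layout, split returns three fields for the sample line and shifts 86400 into the second slot, while unpack returns four fields with an empty string for the missing column.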
Use my module DataExtract::FixedWidth. It is the most full-featured and well-tested module for working with fixed-width columns in Perl. If this isn't fast enough, you can pass in an unpack_string and eliminate the need for heuristic detection of boundaries.
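To illustrate what "heuristic detection of boundaries" means, here is a small pure-Perl sketch that guesses an unpack template by treating any character position that is blank on every sample line as a gap between columns. This is not DataExtract::FixedWidth's API or algorithm; guess_unpack_string and the sample layout are hypothetical. Once you know the template, you can hard-code it and skip the guessing step, which is the same trade-off the answer describes for unpack_string.

```perl
use strict;
use warnings;

# Guess an unpack template from sample lines: a position that is blank on
# every line is treated as a gap between fields. Illustration only; this is
# not DataExtract::FixedWidth's own code.
sub guess_unpack_string {
    my @lines = @_;
    my @used;    # $used[$i] is true if any line has a non-blank at position $i
    for my $line (@lines) {
        while ($line =~ /\S/g) {
            $used[ pos($line) - 1 ] = 1;
        }
    }
    # A field starts where a used position follows an unused one.
    my @starts = grep { $used[$_] && ($_ == 0 || !$used[ $_ - 1 ]) } 0 .. $#used;
    return 'A*' unless @starts;

    my @parts;
    push @parts, "x$starts[0]" if $starts[0] > 0;    # skip blanks before field 1
    for my $k (0 .. $#starts - 1) {
        push @parts, 'A' . ($starts[ $k + 1 ] - $starts[$k]);
    }
    push @parts, 'A*';                                # last field takes the rest
    return join ' ', @parts;
}

# Hypothetical sample lines with right-justified columns.
my @sample = map { sprintf('%-10s%12s%10s%14s', @$_) } (
    [ '1253592000', '',          '',      ''             ],
    [ '1253678400', '',          '86400', '6183.000000'  ],
    [ '1253851200', '36.000000', '86400', '10669.000000' ],
    [ '1254196800', '0.000000',  '',      '8828.000000'  ],
);

my $template = guess_unpack_string(@sample);
print "guessed template: $template\n";

for my $line (@sample) {
    my @fields = unpack $template, $line;
    s/^\s+// for @fields;    # 'A' only strips trailing blanks
    print join('|', @fields), "\n";
}
```

For these four sample lines the guessed template is A13 A14 A7 A*. The heuristic breaks down if a column is blank on every sample line or if two columns are never separated by a blank, so check the guess against real data.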
I'm unsure of the column names and formatting, but you should be able to adjust this recipe to your liking using Text::FixedWidth.
You can split the line. It appears that your delimiter is just whitespace? If so, you can split on whitespace into an array (say, @line); this will match all whitespace. You can then do bounds checking and access each field via $line[0], $line[1], etc.
split can also take a regular expression rather than a string as the delimiter, and that might do the same thing.
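A short sketch of both forms of split. The sample line, including its leading blanks, is hypothetical; it shows the one practical difference between them: split ' ' (a single-space string) discards leading whitespace, while split /\s+/ returns an empty first field. Neither form can tell you which column a value came from when a field is blank.

```perl
use strict;
use warnings;

# Hypothetical input line; the leading blanks are there to show the difference.
my $line = "  1253851200 36.000000 86400 10669.000000\n";
chomp $line;

# split ' ' is a special case: it splits on runs of whitespace and
# ignores leading whitespace (like awk).
my @awk = split ' ', $line;

# split /\s+/ also splits on runs of whitespace, but leading
# whitespace produces an empty first field.
my @regex = split /\s+/, $line;

printf "split ' '    -> %d fields: %s\n", scalar @awk,   join('|', @awk);
printf "split /\\s+/ -> %d fields: %s\n", scalar @regex, join('|', @regex);

# Bounds-check before trusting positional fields.
if (@awk == 4) {
    my ($time, $value, $interval, $total) = @awk;
    print "time=$time total=$total\n";
}
```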
If all fields have the same fixed width and are padded with spaces, you can split the line into chunks of N characters, where N is the width of a field. This will yield spaces for each empty field instead of dropping it.
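A minimal sketch of equal-width chunking, assuming every field is exactly $N = 12 characters wide (a hypothetical value; the columns in the question are not all the same width). It uses a global regex match rather than split, and shows the unpack equivalent with a repeated group.

```perl
use strict;
use warnings;

my $N = 12;    # hypothetical: every field is exactly $N characters wide

# Sample line with right-justified fields and an empty second field.
my $line = join '', map { sprintf('%*s', $N, $_) } '1253678400', '', '86400', '6183.000000';

# Global match: consecutive chunks of up to $N characters; a blank field
# comes back as $N spaces rather than disappearing.
my @chunks = $line =~ /(.{1,$N})/g;

# unpack with a repeated group; 'A' strips trailing blanks only, so trim
# the leading blanks of the right-justified values as well.
my @fields = unpack "(A$N)*", $line;
s/^\s+// for @fields;

print scalar(@chunks), " chunks; fields: ", join('|', @fields), "\n";
```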
Fixed width delimiting can be done like this:
My Perl is very rusty, so I am sure there are syntax errors there, but that is the gist of it.
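A minimal sketch of fixed-width extraction with substr, assuming hypothetical column offsets and widths (measure them from your file). Each field is cut out at its position and trimmed, so a blank column becomes an empty string instead of shifting the fields after it; parse_line and the column names are illustrative.

```perl
use strict;
use warnings;

# Hypothetical column layout: [name, offset, width].
my @columns = (
    [ time     =>  0, 10 ],
    [ value    => 10, 12 ],
    [ interval => 22, 10 ],
    [ total    => 32, 14 ],
);

sub parse_line {
    my ($line) = @_;
    chomp $line;
    my %rec;
    for my $col (@columns) {
        my ($name, $offset, $width) = @$col;
        # substr starting past the end of the string returns undef and warns,
        # so short lines get an empty string for the missing columns.
        my $raw = $offset < length($line) ? substr($line, $offset, $width) : '';
        $raw =~ s/^\s+|\s+$//g;    # trim the padding on both sides
        $rec{$name} = $raw;
    }
    return \%rec;
}

my $rec = parse_line(sprintf '%-10s%12s%10s%14s', '1254196800', '0.000000', '', '8828.000000');
print "$_=$rec->{$_}\n" for qw(time value interval total);
```

Keeping the layout in one table makes it easy to adjust when the real widths are known, and returning a hash gives each field its own named slot.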