How do I extract columns of data using Perl?
I have strings of this kind:
NAME1          NAME2         DEPTNAME    POSITION
JONH MILLER    ROBERT JIM    CS          ASST GENERAL MANAGER
I want the output to be NAME1, NAME2 and POSITION. How can I do this using split, regex, trim, etc., without using CPAN modules?
It depends on whether those are fixed-length fields or tab-separated fields. The easiest case (using split) is if they are tab separated.
If they're fixed length, and assuming they are all, say, 10 characters long, you can parse each line by character position.
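The original code for this answer did not survive, so here is a minimal sketch of both cases using only core Perl. The 10-character width comes from the assumption above; the sub names `parse_tabbed` and `parse_fixed` and the sample values are made up for illustration.

```perl
use strict;
use warnings;

# Tab-separated case: split on each tab and keep NAME1, NAME2 and POSITION.
sub parse_tabbed {
    my ($line) = @_;
    chomp $line;
    my @fields = split /\t/, $line;
    return @fields[0, 1, 3];
}

# Fixed-width case. Assumption: the first three columns are exactly
# 10 characters wide and POSITION takes the rest of the line.
sub parse_fixed {
    my ($line) = @_;
    chomp $line;
    # 'A10' reads 10 characters and strips trailing spaces; 'A*' reads the remainder.
    my ($name1, $name2, $dept, $position) = unpack 'A10 A10 A10 A*', $line;
    return ($name1, $name2, $position);
}

# Build a fixed-width sample line (hypothetical data) and parse it both ways.
my $fixed = sprintf '%-10s%-10s%-10s%s', 'J MILLER', 'R JIM', 'CS', 'ASST GENERAL MANAGER';
print join('|', parse_fixed($fixed)), "\n";
print join('|', parse_tabbed("J MILLER\tR JIM\tCS\tASST GENERAL MANAGER")), "\n";
```

With real data the widths in the `unpack` template would have to match the actual column widths of the file.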
If your input data comes in as an array of strings (@strings), you can extract and trim the needed fields from each string.
(The '|' characters in the result were added by me to make it easier to read.)
Regards
rbo
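The code and output from this answer are missing, so the following is only a sketch of one way to do it, not necessarily what rbo posted. It assumes the columns are separated by two or more spaces; the sub name `extract_fields` and the sample `@strings` are illustrative.

```perl
use strict;
use warnings;

# Return one [NAME1, NAME2, POSITION] array ref per input string.
# Assumption: columns are separated by runs of two or more whitespace characters.
sub extract_fields {
    my @strings = @_;
    my @rows;
    for my $s (@strings) {
        # Capture columns 1, 2 and 4; column 3 (DEPTNAME) is matched but not kept.
        # The leading and trailing \s* trim the line.
        if ($s =~ /^\s*(.+?)\s{2,}(.+?)\s{2,}.+?\s{2,}(.+?)\s*$/) {
            push @rows, [$1, $2, $3];
        }
    }
    return @rows;
}

my @strings = (
    "NAME1          NAME2         DEPTNAME    POSITION",
    "JONH MILLER    ROBERT JIM    CS          ASST GENERAL MANAGER  ",
);

# The '|' marks show where each trimmed field starts and ends.
print '|', join('|', @$_), "|\n" for extract_fields(@strings);
```

Strings that do not contain four such columns are skipped silently here; a stricter version could warn about them.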
Assuming the amount of space between the fields is not fixed, split the string on two or more spaces, so that a name like JONH MILLER is not broken into two parts.
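A minimal sketch of that split, using only core Perl. The sub name `split_fields` is made up for illustration, and the sample line assumes the spacing described above.

```perl
use strict;
use warnings;

# Split one line into columns on runs of two or more whitespace characters.
sub split_fields {
    my ($line) = @_;
    $line =~ s/^\s+|\s+$//g;        # trim leading and trailing whitespace
    return split /\s{2,}/, $line;   # single spaces stay inside a field
}

my $line = "JONH MILLER    ROBERT JIM    CS          ASST GENERAL MANAGER";

# 'undef' in the list skips the DEPTNAME column.
my ($name1, $name2, undef, $position) = split_fields($line);
print "$name1, $name2, $position\n";
```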
Judging from the sample, a single space belongs to the data, but 2 or more contiguous spaces do not. So you can easily split on 2 or more spaces. The only thing I add to this is the use of List::MoreUtils::mesh to pair each header name with its value.
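Since List::MoreUtils is a CPAN module and the question asked for none, here is a sketch that swaps in a core-only hash slice for mesh. It is not the answerer's code; the sub name `parse_table` and the sample lines are illustrative.

```perl
use strict;
use warnings;

# Turn a header line plus data lines into a list of hash refs keyed by column name.
# Assumption: columns are separated by two or more whitespace characters.
sub parse_table {
    my ($header, @lines) = @_;
    my @headers = split /\s{2,}/, $header;
    my @rows;
    for my $line (@lines) {
        my @values = split /\s{2,}/, $line;
        my %row;
        # Hash slice: pairs each header with the value in the same position.
        # With List::MoreUtils installed, "my %row = mesh @headers, @values;"
        # should build the same hash.
        @row{@headers} = @values;
        push @rows, \%row;
    }
    return @rows;
}

my @rows = parse_table(
    "NAME1          NAME2         DEPTNAME    POSITION",
    "JONH MILLER    ROBERT JIM    CS          ASST GENERAL MANAGER",
);
print "$_->{NAME1} | $_->{NAME2} | $_->{POSITION}\n" for @rows;
```

Keying by header name means the code keeps working if the columns are reordered, as long as the header line is reordered with them.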
Consider using autosplit in a Perl one-liner from your command line.
The one-liner splits on two or more consecutive spaces and prints the first, second and fourth fields, which correspond to the NAME1, NAME2 and POSITION fields.
Of course, this will break if only a single space separates the NAME1 and NAME2 entries, but more information about your file is needed to determine the best course of action.
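The one-liner itself is missing from this answer. The comment below shows what such a command might look like (the file name data.txt is hypothetical), followed by a rough script equivalent of what the -n, -l, -a and -F switches do. The sub name `print_columns` is made up for illustration.

```perl
use strict;
use warnings;

# A one-liner along these lines (an assumption, not the answerer's exact command):
#   perl -F'\s{2,}' -lane 'print join "|", @F[0,1,3]' data.txt
#
# Rough script equivalent: read each line, strip the newline, split into @F
# on two or more whitespace characters, and print fields 1, 2 and 4.
sub print_columns {
    my ($in, $out) = @_;
    while (my $line = <$in>) {
        chomp $line;
        my @F = split /\s{2,}/, $line;
        print {$out} join('|', @F[0, 1, 3]), "\n";
    }
    return;
}

# Demo on an in-memory "file" so the sketch runs without any input file.
my $data = "NAME1          NAME2         DEPTNAME    POSITION\n"
         . "JONH MILLER    ROBERT JIM    CS          ASST GENERAL MANAGER\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";
print_columns($in, \*STDOUT);
close $in;
```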
To split on whitespace, use split with a regex as the separator.
This will split $string into a list of substrings. The separator will be the regex \s+, which means one or more whitespace characters. This includes spaces, tabs, and (unless I'm mistaken) newlines.
Edit: I see that one of the requirements is not to split on only one space, but to split on two or more. I modified the regex accordingly.
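To make the difference between the two separators concrete, here is a small sketch that splits the same string both ways. The sub name `fields_on` is hypothetical, and `\s{2,}` is one way to write "two or more whitespace characters"; the answer's own modified regex is not shown in the text.

```perl
use strict;
use warnings;

# Split $string on the given precompiled separator regex.
sub fields_on {
    my ($separator, $string) = @_;
    return split $separator, $string;
}

my $string = "JONH MILLER    ROBERT JIM    CS    ASST GENERAL MANAGER";

my @words  = fields_on(qr/\s+/,    $string);   # breaks on every space
my @fields = fields_on(qr/\s{2,}/, $string);   # breaks only on runs of 2 or more

printf "%d pieces with one-or-more, %d fields with two-or-more\n",
    scalar @words, scalar @fields;
print join('|', @fields), "\n";
```

With `\s+` the names and the position are cut into single words, which is why the two-or-more version is the one that fits the sample data.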