Perl: Parsing a string with one or more fields
I have a string I need to parse. It meets the following requirements:
- It is comprised of 0 or more key->value pairs.
- The key is always 2 letters.
- The value is one or more digits.
- There will not be a space between the key and value.
- There may or may not be a space between individual pairs.
Example strings I may see:
- AB1234 //One key->value pair (Key=AB, Value=1234)
- AB1234 BC2345 //Two key->value pairs, separated by space
- AB1234BC2345 //Two key->value pairs, not separated by space
- //Empty string, no key->value pairs
- AB12345601BC1234CD1232PE2343 //Lots of key->value pairs, no space
- AB12345601 BC1234 CD1232 PE2343 //Lots of key->value pairs, with spaces
I need to build a Perl hash of this string. If I could guarantee it was 1 pair I would do something like this:
$string =~ /([A-Z][A-Z])([0-9]+)/;
$key = $1;
$value = $2;
$hash{$key} = $value;
For multiple pairs, I could potentially do something where, after each match of the above regex, I take a substring of the original string (excluding the first match) and then search again. However, I'm sure there's a more clever, Perl-esque way to achieve this.
Wishing I didn't have such a crappy data source to deal with-
Jonathan
Comments (3)
In list context with the global flag (/g), a regex match returns all captured substrings:
%parts = ($str =~ /([A-Z][A-Z])(\d+)/g);
For greater opacity, remove the parentheses around the pattern matching:
%parts = $str =~ /([A-Z][A-Z])(\d+)/g;
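To see the list-context match working end to end, here is a small self-contained sketch. The helper name parse_pairs is only for illustration; the test strings are the examples from the question.

```perl
use strict;
use warnings;

# Hypothetical helper name; returns a list of key => value pairs.
sub parse_pairs {
    my ($str) = @_;
    # In list context, /g returns the captures from every match in order,
    # i.e. (key1, value1, key2, value2, ...), which assigns straight into a hash.
    my %parts = $str =~ /([A-Z][A-Z])(\d+)/g;
    return %parts;
}

for my $str ('AB1234', 'AB1234 BC2345', 'AB12345601BC1234CD1232PE2343', '') {
    my %parts = parse_pairs($str);
    print "[$str] ", join(', ', map { "$_=$parts{$_}" } sort keys %parts), "\n";
}
```

An empty string simply yields an empty hash, because a failed /g match in list context returns an empty list. If the same key appears twice in one string, the later value overwrites the earlier one.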
You are already there:
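A minimal sketch of one way to build on the regex from the question (not necessarily what this answer originally showed): run it with /g in a while loop, so each iteration picks up the next pair. The sample string and variable names follow the question.

```perl
use strict;
use warnings;

my $string = 'AB12345601 BC1234 CD1232 PE2343';
my %hash;

# In scalar context, each /g match resumes where the previous one ended,
# so the loop walks the string one pair at a time until no match is left.
while ($string =~ /([A-Z][A-Z])([0-9]+)/g) {
    $hash{$1} = $2;
}

print "$_ => $hash{$_}\n" for sort keys %hash;
```

This avoids taking substrings by hand: Perl tracks the match position for you, and whitespace between pairs is skipped because the pattern is not anchored.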
Assuming your strings are definitely going to match your scheme (i.e. there won't be any strings of the form A122 or ABC123), then this should work:
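If that assumption cannot be guaranteed, one option is to validate the whole string before extracting the pairs. The following is only a sketch under that reading; parse_strict is a hypothetical helper name, and rejecting the entire string on any malformed piece is a design choice, not something the question requires.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a hash reference, or undef if the string
# contains anything other than well-formed pairs and optional whitespace.
sub parse_strict {
    my ($str) = @_;
    # The whole string must be zero or more pairs (two capitals, then digits),
    # each optionally preceded by whitespace.
    return unless $str =~ /\A(?:\s*[A-Z]{2}[0-9]+)*\s*\z/;
    my %pairs = $str =~ /([A-Z]{2})([0-9]+)/g;
    return \%pairs;
}

for my $s ('AB1234 BC2345', 'A122', 'ABC123') {
    my $pairs = parse_strict($s);
    my $shown = $pairs
        ? join(' ', map { "$_=$pairs->{$_}" } sort keys %$pairs)
        : 'rejected';
    print "$s: $shown\n";
}
```

Without the validation step, an unanchored match would quietly read ABC123 as the pair BC=123 and A122 as nothing at all, which may or may not be what you want from a messy data source.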