Splitting varying strings with Perl
I have a bunch of strings in Perl that all look like this:
10 NE HARRISBURG
4 E HASWELL
2 SE OAKLEY
6 SE REDBIRD
PROVO
6 W EADS
21 N HARRISON
What I need to do is remove the numbers and letters that come before the city names. The problem is that the prefix varies a lot from city to city, and some lines have no prefix at all. Is it possible to strip this prefix off and keep it in a separate string?
Comments (5)
Try this:
You can test the array size to determine the number of fields:
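A minimal sketch of the field-count idea, assuming the city is always the last whitespace-separated field and is a single word. `split_prefix_and_city` is a hypothetical helper name used only for illustration; it is not the original answer's code.

```perl
use strict;
use warnings;

# Hypothetical helper: split on whitespace and use the number of
# fields to decide whether a prefix is present.
sub split_prefix_and_city {
    my ($line) = @_;
    my @fields = split ' ', $line;      # ' ' also skips leading whitespace
    return ('', '') unless @fields;     # blank line: nothing to report
    return ('', $fields[0]) if @fields == 1;   # city only, e.g. "PROVO"
    my $city = pop @fields;             # assumption: last field is the city
    return (join(' ', @fields), $city); # remaining fields form the prefix
}

for my $line ('10 NE HARRISBURG', 'PROVO', '6 W EADS') {
    my ($prefix, $city) = split_prefix_and_city($line);
    printf "prefix=%s city=%s\n", $prefix, $city;
}
```

The prefix is returned as its own string, so it can be stored separately from the city name.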
Following hoobs's comment, I changed the regex.
Output:
For shinjuo: if you want to process only one string, you can do:
And to avoid a warning about an uninitialized value, you have to test whether $beg is defined:
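A sketch of how the single-string case with the defined check might look. The regex and the helper name `split_prefix` are assumptions made for illustration; only the variable name `$beg` comes from the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (prefix, city) for one string.
sub split_prefix {
    my ($line) = @_;
    # Optional group: digits, whitespace, one or two compass letters,
    # then whitespace. Everything after it is treated as the city.
    my ($beg, $city) = $line =~ /^\s*(?:(\d+\s+[NSEW]{1,2})\s+)?(.+?)\s*$/;
    # $beg is undef when the optional group did not take part in the
    # match (e.g. "PROVO"), so test it before using it.
    $beg = '' unless defined $beg;
    return ($beg, $city);
}

my ($beg, $city) = split_prefix('6 W EADS');
printf "prefix=%s city=%s\n", $beg, $city;
```

Without the `defined` test, printing `$beg` for a line such as `PROVO` would trigger a "Use of uninitialized value" warning under `use warnings`.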
Looks like you always want the very last element in the result of split(). Or you can go with m/(\S+)$/.
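Both variants can be sketched as follows, assuming single-word city names with no trailing whitespace. The helper names `city_via_split` and `city_via_regex` are hypothetical.

```perl
use strict;
use warnings;

# Variant 1: take the last element of the split() result.
sub city_via_split {
    my ($line) = @_;
    my $city = (split ' ', $line)[-1];
    return $city;
}

# Variant 2: capture the last run of non-whitespace characters.
sub city_via_regex {
    my ($line) = @_;
    my ($city) = $line =~ m/(\S+)$/;
    return $city;
}

for my $line ('21 N HARRISON', 'PROVO') {
    printf "%s | %s\n", city_via_split($line), city_via_regex($line);
}
```

Neither variant keeps the prefix, and both return only `FRANCISCO` for a line ending in `SAN FRANCISCO`.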
Can't we assume there is always a city name and that it appears last on a line? If that's the case, split the line and keep the last portion of it. Here's a one-liner command-line solution:
Output:
Update 1
This solution won't work if you have multi-word city names like SAN FRANCISCO (a case spotted in a comment below).
Where is your input data coming from? If you have generated it yourself, you should add delimiters. If someone generated it for you, ask them to regenerate it with delimiters. Parsing it will then become child's play.
Regex Solution
Solution 1: Keep everything (vol7ron's emailed solution)
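One way a keep-everything regex could look, as a sketch only: it captures the distance, the direction, and the city as three separate values. The pattern and the name `parse_location` are assumptions for illustration and are not vol7ron's code.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (distance, direction, city); missing
# parts come back as empty strings.
sub parse_location {
    my ($line) = @_;
    my ($num, $dir, $city) =
        $line =~ /^\s*(\d+)?\s*([NSEW]{1,2}(?=\s))?\s*(.+?)\s*$/;
    # The lookahead (?=\s) stops a city such as "EADS" or "SAN JOSE"
    # from being mistaken for a direction.
    $num = '' unless defined $num;
    $dir = '' unless defined $dir;
    return ($num, $dir, $city);
}

for my $line ('10 NE HARRISBURG', 'PROVO', '5 SAN JOSE') {
    my ($num, $dir, $city) = parse_location($line);
    printf "num=%s dir=%s city=%s\n", $num, $dir, $city;
}
```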
Solution 2: Strip off what you don't need
Update:
After making the changes from vol7ron's suggestion and example, using the repetition operator worked. This strips off the leading digits and the direction, and won't break if the digits or the direction (or both) are missing.
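A sketch of the strip-off approach, using the `?` quantifier to make the digits and the direction optional. The exact pattern and the name `strip_prefix` are assumptions, not the code from the update above.

```perl
use strict;
use warnings;

# Hypothetical helper: removes the leading digits and direction from a
# copy of the line and returns (prefix, city).
sub strip_prefix {
    my ($line) = @_;
    my $prefix = '';
    # Each part is optional, so the substitution also succeeds (removing
    # nothing) when the digits, the direction, or both are missing.
    if ($line =~ s/^\s*((?:\d+\s+)?(?:[NSEW]{1,2}\s+)?)//) {
        ($prefix = $1) =~ s/\s+$//;   # keep the prefix, minus trailing space
    }
    return ($prefix, $line);
}

for my $line ('2 SE OAKLEY', 'PROVO', 'SAN FRANCISCO') {
    my ($prefix, $city) = strip_prefix($line);
    printf "prefix=%s city=%s\n", $prefix, $city;
}
```

Requiring whitespace after the direction letters keeps a multi-word name such as SAN FRANCISCO intact, since `S` is followed by `A` rather than a space.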