Python regular expression: match a string as a pattern and return a number
I have some lines that represent some data in a text file. They are all of the following format:
s = 'TheBears SUCCESS Number of wins : 14'
They all begin with the name then whitespace and the text 'SUCCESS Number of wins : ' and finally the number of wins, n1. There are multiple strings each with a different name and value. I am trying to write a program that can parse any of these strings and return the name of the dataset and the numerical value at the end of the string. I am trying to use regular expressions to do this and I have come up with the following:
import re

def winnumbers(s):
    pattern = re.compile(r"""(?P<name>.*?) #starting name
                         \s*SUCCESS #whitespace and success
                         \s*Number\s*of\s*wins #whitespace and strings
                         \s*\:\s*(?P<n1>.*?)""",re.VERBOSE)
    match = pattern.match(s)
    name = match.group("name")
    n1 = match.group("n1")
    return (name, n1)
So far, my program can return the name, but the trouble comes after that. They all have the text "SUCCESS Number of wins : " so my thinking was to find a way to match this text. But I realize that my method of matching an exact substring isn't correct right now. Is there any way to match a whole substring as part of the pattern? I have been reading quite a bit on regular expressions lately but haven't found anything like this. I'm still really new to programming and I appreciate any assistance.
Eventually, I will use float() to return n1 as a number, but I left that out because it doesn't properly find the number in the first place right now and would only return an error.
Comments (3)
Try this one out:
These are the results:
If you don't need the full string, just remove the surrounding parentheses.
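The snippet this answer refers to is not shown above, so what follows is only a sketch of a pattern that fits the description, not the answerer's own code. The group name `full` and the choice of `\S+` for the name (which assumes names contain no whitespace) are assumptions. It also addresses the problem in the question: the final group `(?P<n1>.*?)` is lazy and nothing follows it, so it is satisfied by the empty string. Matching the digits explicitly avoids that.

```python
import re

# Hypothetical pattern: the outer group "full" captures the whole line,
# "name" the leading token, and "n1" the trailing number.
PATTERN = re.compile(
    r"""(?P<full>
        (?P<name>\S+)               # starting name, assumed to have no spaces
        \s+SUCCESS                  # whitespace and the literal SUCCESS
        \s+Number\s+of\s+wins       # literal words separated by whitespace
        \s*:\s*                     # colon with optional whitespace around it
        (?P<n1>\d+(?:\.\d+)?)       # integer or decimal number
    )""",
    re.VERBOSE,
)


def winnumbers(s):
    """Return (name, wins) for one line, or None if the line does not match."""
    match = PATTERN.match(s)
    if match is None:
        return None
    return match.group("name"), float(match.group("n1"))


print(winnumbers("TheBears SUCCESS Number of wins : 14"))  # ('TheBears', 14.0)
```

Dropping the outer `(?P<full> ... )` pair leaves `name` and `n1` unchanged and only removes the capture of the whole line.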
I believe there is no actual need to use a regex here. If it is acceptable for you, you can use the following code (note that I have posted it so that you have another option):
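The code originally posted here is not shown, so this is only a minimal sketch of a regex-free approach. It assumes the name is the first whitespace-delimited token and the number is the last one, with at least one space between the colon and the number.

```python
def winnumbers(s):
    """Return (name, wins) from one line without using a regex."""
    parts = s.split()  # no argument: split on any run of whitespace
    if len(parts) < 2:
        raise ValueError(f"unexpected line: {s!r}")
    # First token is taken as the name, last token as the number.
    return parts[0], float(parts[-1])


print(winnumbers("TheBears   SUCCESS Number of wins :   14"))  # ('TheBears', 14.0)
```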
Or, if you are sure that all words are separated by single spaces:
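Again as a sketch rather than the original snippet: with exactly one space between words, splitting on a literal space and unpacking the first and last fields is enough. The function name `winnumbers_single_space` is made up for this example.

```python
def winnumbers_single_space(s):
    """Return (name, wins), assuming words are separated by single spaces."""
    # Starred unpacking keeps the first and last fields and ignores the rest.
    name, *_middle, number = s.split(" ")
    return name, float(number)


print(winnumbers_single_space("TheBears SUCCESS Number of wins : 14"))  # ('TheBears', 14.0)
```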
If the text in the middle is always constant, there is no need for a regular expression. The inbuilt string processing functions will be more efficient and easier to develop, debug and maintain. In this case, you can just use the inbuilt split() function to get the pieces, and then clean the two pieces as appropriate:

Note that I have output the number of wins as an integer (as presumably this will always be a whole number), but you can easily substitute float() - or any other conversion function - for int() if you desire.

Edit: Obviously this will only work for single lines - if you call the function with several lines it will give you errors. To process an entire file, I'd use map():

Also, I'm not sure of your end use for this code, but you might find it easier to work with the outputs as a dictionary:
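The snippets from this answer are not shown above, so the following is a sketch of the three steps it describes (split on the constant text, map over many lines, build a dictionary), not the answerer's own code. The constant `SEPARATOR`, the second sample line (`TheLions`), and the use of a list in place of a real file are assumptions made for illustration.

```python
SEPARATOR = "SUCCESS Number of wins :"  # constant middle text from the question


def winnumbers(line):
    """Split one line on the constant text and clean both pieces."""
    name, wins = line.split(SEPARATOR)
    return name.strip(), int(wins.strip())


# Stand-in for the lines of a file (second line is made-up sample data);
# an open file object can be passed to map() in the same way.
lines = [
    "TheBears SUCCESS Number of wins : 14\n",
    "TheLions SUCCESS Number of wins : 9\n",
]

results = list(map(winnumbers, lines))
print(results)  # [('TheBears', 14), ('TheLions', 9)]

# A list of (name, wins) pairs converts directly into a dictionary.
wins_by_name = dict(results)
print(wins_by_name["TheBears"])  # 14
```

Lines that do not contain the separator exactly once (blank lines, for example) raise a ValueError in this sketch, so a real file may need filtering first.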