Python regular expressions: matching a literal string as part of a pattern and returning a number

Posted on 2024-11-15 14:09:49

I have some lines that represent some data in a text file. They are all of the following format:

s = 'TheBears      SUCCESS Number of wins : 14'

They all begin with the name then whitespace and the text 'SUCCESS Number of wins : ' and finally the number of wins, n1. There are multiple strings each with a different name and value. I am trying to write a program that can parse any of these strings and return the name of the dataset and the numerical value at the end of the string. I am trying to use regular expressions to do this and I have come up with the following:

import re
def winnumbers(s):
    pattern = re.compile(r"""(?P<name>.*?)     #starting name
                             \s*SUCCESS        #whitespace and success
                             \s*Number\s*of\s*wins  #whitespace and strings
                             \s*\:\s*(?P<n1>.*?)""",re.VERBOSE)
    match = pattern.match(s)

    name = match.group("name")
    n1 = match.group("n1")

    return (name, n1)

So far, my program can return the name, but the trouble comes after that. They all have the text "SUCCESS Number of wins : " so my thinking was to find a way to match this text. But I realize that my method of matching an exact substring isn't correct right now. Is there any way to match a whole substring as part of the pattern? I have been reading quite a bit on regular expressions lately but haven't found anything like this. I'm still really new to programming and I appreciate any assistance.

Eventually, I will use float() to return n1 as a number, but I left that out because it doesn't properly find the number in the first place right now and would only return an error.

Comments (3)

々眼睛长脚气 2024-11-22 14:09:49

Try this one out:

((\S+)\s+SUCCESS Number of wins : (\d+))

These are the results:

>>> import re
>>> string = u'TheBears SUCCESS Number of wins : 14'
>>> regex = re.compile(r"((\S+)\s+SUCCESS Number of wins : (\d+))")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0xc827cf478a56b350>
>>> regex.match(string)
<_sre.SRE_Match object at 0xc827cf478a56b228>

# List the groups found
>>> r.groups()
(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')

# List the named dictionary objects found
>>> r.groupdict()
{}

# Run findall
>>> regex.findall(string)
[(u'TheBears SUCCESS Number of wins : 14', u'TheBears', u'14')]
# So you can do this for the name and number:
>>> fullstring, name, number = r.groups()

If you don't need the full string, just remove the surrounding parentheses.
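
For what it's worth, the trouble in the question's pattern seems to be the trailing lazy group `(?P<n1>.*?)`: nothing follows it, so it is free to match the empty string. Below is a minimal sketch that keeps the question's named groups and re.VERBOSE layout but uses a digit pattern for the number. It assumes the name contains no whitespace (as `\S+` above does); the optional decimal part and returning None for a non-matching line are my own choices, not something from the question.

```python
import re

# Same idea as the question's pattern, keeping named groups and re.VERBOSE.
# The number group requires at least one digit instead of a lazy .*?
WIN_PATTERN = re.compile(r"""
    (?P<name>\S+)             # team name, assumed to contain no whitespace
    \s+SUCCESS                # whitespace, then the literal word SUCCESS
    \s+Number\s+of\s+wins     # the literal words, separated by whitespace
    \s*:\s*                   # colon with optional whitespace around it
    (?P<n1>\d+(?:\.\d+)?)     # the number, with an optional decimal part
    """, re.VERBOSE)


def winnumbers(s):
    """Return (name, wins) for one line, or None if the line doesn't match."""
    match = WIN_PATTERN.match(s)
    if match is None:
        return None
    return match.group("name"), float(match.group("n1"))


print(winnumbers('TheBears      SUCCESS Number of wins : 14'))
```

Checking for None before calling group() avoids the AttributeError you would otherwise get on a line that doesn't fit the format.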

我喜欢麦丽素 2024-11-22 14:09:49

I believe there is no real need to use a regex here. If it is acceptable to you, you can use the following code instead (I'm posting it so that you have another option):

dict((line[:line.lower().index('success')].strip(), line[line.lower().index('wins :') + 6:].strip()) for line in text.split('\n') if 'success' in line.lower())

Or, if you are sure that all words are separated by single spaces:

output = {}
for line in text.split('\n'):
    if 'success' in line.lower():
        words = line.strip().split(' ')
        output[words[0]] = words[-1]
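
A self-contained sketch of the same first-word/last-word idea, in case it helps. It assumes `text` is the whole file content as one string; `parse_wins` and `sample` are names made up for the illustration. Calling split() with no argument splits on any run of whitespace, so the number of spaces between words no longer matters.

```python
def parse_wins(text):
    """Map each team name to its win count (kept as a string)."""
    output = {}
    for line in text.splitlines():
        if 'success' in line.lower():
            words = line.split()  # no argument: split on any run of whitespace
            output[words[0]] = words[-1]
    return output


# Hypothetical sample data in the format described in the question.
sample = (
    "TheBears      SUCCESS Number of wins : 14\n"
    "OtherTeam SUCCESS Number of wins : 6"
)
print(parse_wins(sample))
```

The values stay strings here; wrap words[-1] in int() or float() if you need numbers.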

眼睛会笑 2024-11-22 14:09:49

If the text in the middle is always constant, there is no need for a regular expression. The built-in string methods will be more efficient and easier to develop, debug and maintain. In this case, you can just use the built-in split() method to get the two pieces, and then clean each one up as appropriate:

>>> def winnumber(s):
...     parts = s.split('SUCCESS Number of wins : ')
...     return (parts[0].strip(), int(parts[1]))
... 
>>> winnumber('TheBears      SUCCESS Number of wins : 14')
('TheBears', 14)

Note that I have output the number of wins as an integer (as presumably this will always be a whole number), but you can easily substitute float(), or any other conversion function, for int() if you desire.

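Edit: Obviously this will only work for single lines; if you call the function with several lines at once it will raise an error. To process an entire file, I'd use map() (wrapped in list() so that it also prints as a list on Python 3, where map() returns an iterator):

>>> list(map(winnumber, open(filename, 'r')))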
[('TheBears', 14), ('OtherTeam', 6)]

Also, I'm not sure of your end use for this code, but you might find it easier to work with the outputs as a dictionary:

>>> dict(map(winnumber, open(filename, 'r')))
{'OtherTeam': 6, 'TheBears': 14}
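
If the file might contain blank or unrelated lines, something like the following sketch may be more forgiving. It swaps split() for str.partition(), which reports whether the separator was found at all. The `read_wins` helper, the `sep` parameter and the sample data are assumptions added for the illustration, and io.StringIO merely stands in for an open file.

```python
import io


def winnumber(s, sep='SUCCESS Number of wins : '):
    """Split one line at the constant text; return None if it is absent."""
    name, found, wins = s.partition(sep)
    if not found:
        return None
    return name.strip(), int(wins)  # int() tolerates the trailing newline


def read_wins(lines):
    """Build {name: wins} from an iterable of lines, skipping other lines."""
    results = {}
    for line in lines:
        parsed = winnumber(line)
        if parsed is not None:
            name, wins = parsed
            results[name] = wins
    return results


# io.StringIO stands in for an open file here; with a real file you would
# write: with open(filename) as f: read_wins(f)
fake_file = io.StringIO(
    "TheBears      SUCCESS Number of wins : 14\n"
    "\n"
    "OtherTeam SUCCESS Number of wins : 6\n"
)
print(read_wins(fake_file))
```

Using a with block for the real file also makes sure it gets closed, which the one-line map() version leaves to the interpreter.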