Python re.compile with a string containing variables and numbers

Posted on 2024-11-03 14:51:58, 554 characters, 6 views, 0 comments

Hi, I want to get a match for the following:

test = re.compile(r' [0-12](am|pm) [1-1000] days from (yesterday|today|tomorrow)')

with this match:

print(test.match(" 3pm 2 days from today"))

It returns None. What am I doing wrong? I am just getting into regex, and from reading the docs I thought this should work! Any help appreciated.
chrism

--------------------------------------------------------------------------------------

I am asking a new question about the design of a system that uses a similar process to the one above in NLP here.

Comments (7)

猥琐帝 2024-11-10 14:51:58

Here is my hat in the ring. Careful study of this regex will teach a few lessons:

import re
reobj = re.compile(
    r"""# Loosely match a date/time reference
    ^                    # Anchor to start of string.
    \s*                  # Optional leading whitespace.
    (?P<time>            # $time: military or AM/PM time.
      (?:                # Group for military hours options.
        [2][0-3]         # Hour is either 20, 21, 22, 23,
      | [01]?[0-9]       # or 0-9, 00-09 or 10-19
      )                  # End group of military hours options.
      (?:                # Group for optional minutes.
        :                # Hours and minutes separated by ":"
        [0-5][0-9]       # 00-59 minutes
      )?                 # Military minutes are optional.
    |                    # or time is given in AM/PM format.
      (?:1[0-2]|0?[1-9]) # 1-12 or 01-12 AM/PM options (hour)
      (?::[0-5][0-9])?   # Optional minutes for AM/PM time.
      \s*                # Optional whitespace before AM/PM.
      [ap]m              # Required AM or PM (case insensitive)
    )                    # End group of time options.
    \s+                  # Required whitespace.
    (?P<offset> \d+ )    # $offset: count of time increments.
    \s+                  # Required whitespace.
    (?P<units>           # $units: units of time increment.
      (?:sec(?:ond)?|min(?:ute)?|hour|day|week|month|year|decade|century)
      s?                 # Time units may have optional plural "s".
    )                    # End $units: units of time increment.
    \s+                  # Required whitespace.
    (?P<dir>from|before|after|since) # $dir: Time offset direction.
    \s+                  # Required whitespace.
    (?P<base>yesterday|today|tomorrow|(?:right\s+)?now)  # $base: reference point.
    \s*                  # Optional whitespace before end.
    $                    # Anchor to end of string.""", 
    re.IGNORECASE | re.VERBOSE)
match = reobj.match(' 3 pm 2 days from today')
if match:
    print('Time:       %s' % (match.group('time')))
    print('Offset:     %s' % (match.group('offset')))
    print('Units:      %s' % (match.group('units')))
    print('Direction:  %s' % (match.group('dir')))
    print('Base time:  %s' % (match.group('base')))
else:
    print("No match.")

Output:

r"""
Time:       3 pm
Offset:     2
Units:      days
Direction:  from
Base time:  today
"""

This regex illustrates a few lessons to be learned:

  • Regular expressions are very powerful (and useful)!
  • This regex does validate the numbers, but as you can see, doing so is cumbersome and difficult (and thus not recommended; I'm showing it here to demonstrate why not to do it this way). It is much easier to simply capture the numbers with a regex and then validate the ranges using procedural code.
  • Named capture groups ease the pain of plucking multiple data sub-strings from larger text.
  • Always write regexes using free-spacing, verbose mode with proper indentation of groups and lots of descriptive comments. This helps while writing the regex and later during maintenance.
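One thing to keep in mind when following the verbose-mode advice is that re.VERBOSE ignores unescaped whitespace in the pattern. A literal space therefore has to be written inside a character class, escaped, or expressed as \s. The short check below uses only the standard re module. The pattern names are illustrative.

```python
import re

# Under re.VERBOSE, whitespace in the pattern is ignored unless it is escaped,
# inside a character class, or written as \s.
literal_space = re.compile(r"right now", re.VERBOSE)      # really means "rightnow"
class_space = re.compile(r"right [ ] now", re.VERBOSE)    # [ ] keeps one literal space
any_space = re.compile(r"right \s+ now", re.VERBOSE)      # one or more whitespace chars

for name, pattern in [("literal_space", literal_space),
                      ("class_space", class_space),
                      ("any_space", any_space)]:
    # Compare a string with a space against one without.
    print(name,
          bool(pattern.fullmatch("right now")),
          bool(pattern.fullmatch("rightnow")))
```

Only the first pattern accepts "rightnow" and rejects "right now". This is easy to miss when a multi-word alternative is written inside a verbose regex.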

Modern regular expressions comprise a rich and powerful language. Once you learn the syntax and develop the habit of writing verbose, properly indented, well-commented code, even complex regexes such as the one above are easy to write, read and maintain. It is unfortunate that they have acquired a reputation for being difficult, unwieldy and error-prone (and thus unsuitable for complex tasks).

Happy regexing!

红ご颜醉 2024-11-10 14:51:58

What about:

test = re.compile(r' ([0-9]|1[012])(am|pm) \d+ days from (yesterday|today|tomorrow)')

The hours part should match 0, 1, ..., 9 or 10, 11, 12,
but not 13, 14, ..., 19.

You can limit the days part to 1, ..., 1000 in a similar way, e.g. (1000|[1-9]\d{0,2}). A plain \d{1,3} would also let 0 through.
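The alternation idea above can be checked against a few boundary values. This is a sketch only. It assumes the input has no leading space and uses fullmatch (Python 3.4+), so the whole string must match. The HOURS and DAYS names are illustrative.

```python
import re

# Hour 0-12 and day count 1-1000, each written as an alternation of digit patterns.
HOURS = r"(1[012]|[0-9])"         # 10, 11, 12 or a single digit 0-9
DAYS = r"(1000|[1-9][0-9]{0,2})"  # 1000, or 1-999 without leading zeros

pattern = re.compile(HOURS + r"(am|pm) " + DAYS + r" days from (yesterday|today|tomorrow)")

for text in ["3pm 2 days from today", "12am 1000 days from tomorrow",
             "13pm 2 days from today", "3pm 0 days from today",
             "3pm 1001 days from today"]:
    print(text, "->", bool(pattern.fullmatch(text)))
```

The first two strings match. The other three are rejected because 13, 0 and 1001 fall outside the ranges encoded by the alternations.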

终止放荡 2024-11-10 14:51:58

Try this:

test = re.compile(r' \d+(am|pm) \d+ days from (yesterday|today|tomorrow)')
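Because \d+ places no limit on the numbers, this pattern matches the question's input but also accepts out-of-range values. Any range checking would have to happen elsewhere, for example in code after the match. A quick check:

```python
import re

# The loose pattern from this answer: \d+ accepts any run of digits.
test = re.compile(r' \d+(am|pm) \d+ days from (yesterday|today|tomorrow)')

print(bool(test.match(" 3pm 2 days from today")))    # True: the question's input
print(bool(test.match(" 99pm 0 days from today")))   # True: no range check at all
```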
捎一片雪花 2024-11-10 14:51:58

Try this:

import re

test = re.compile(r'^\s[0-1]?[0-9]pm \d+ days from (today|yesterday|tomorrow)')
print(test.match(" 12pm 2 days from today"))

The problem you're having is that you can't specify multi-digit numeric ranges inside a regex character class (as far as I know), so you have to treat the digits as individual characters.

Sample here
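The point about character classes can be checked directly. Inside [...] each character stands for itself, and a-b is a range of single characters. This makes [0-12] the set {0, 1, 2} and [1-1000] the set {0, 1}. A small check with the standard re module:

```python
import re

# [0-12] is the range 0-1 plus the character 2; [1-1000] is the range 1-1 plus 0, 0, 0.
hour_class = re.compile(r'[0-12]')
day_class = re.compile(r'[1-1000]')

print([c for c in "0123456789" if hour_class.fullmatch(c)])  # ['0', '1', '2']
print([c for c in "0123456789" if day_class.fullmatch(c)])   # ['0', '1']

# Each class matches exactly one character, and "3" is not in [0-12],
# which is why the question's pattern returns None for " 3pm ...".
question = re.compile(r' [0-12](am|pm) [1-1000] days from (yesterday|today|tomorrow)')
print(question.match(" 3pm 2 days from today"))  # None
```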

余生一个溪 2024-11-10 14:51:58

If you want to extract the parts of the match individually, you can label the groups with (?P<name>[match]). For example:

import re

pattern = re.compile(
    r'\s*(?P<time>1?[0-9])(?P<ampm>am|pm)\s+'
    r'(?P<days>[1-9]\d*)\s+days\s+from\s+'
    r'(?P<when>yesterday|today|tomorrow)\s*')

for time in range(0, 13):
    for ampm in ('am', 'pm'):
        for days in range(1, 1000):
            for when in ('yesterday', 'today', 'tomorrow'):
                text = ' %d%s %d days from %s ' % (time, ampm, days, when)
                match = pattern.match(text)
                assert match is not None
                keys = sorted(match.groupdict().keys())
                assert keys == ['ampm', 'days', 'time', 'when']

text = ' 3pm 2 days from today '
print(pattern.match(text).groupdict())

Output (Python 3.7+, where dicts keep insertion order):

{'time': '3', 'ampm': 'pm', 'days': '2', 'when': 'today'}
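Once the groups are named, each part can also be read individually and converted to a number. The sketch below reuses the pattern from this answer. Indexing a match object with m['name'] assumes Python 3.6 or later.

```python
import re

pattern = re.compile(
    r'\s*(?P<time>1?[0-9])(?P<ampm>am|pm)\s+'
    r'(?P<days>[1-9]\d*)\s+days\s+from\s+'
    r'(?P<when>yesterday|today|tomorrow)\s*')

match = pattern.match(' 3pm 2 days from today ')
# Read individual groups by name and convert the numeric ones to int.
hour = int(match.group('time'))
days = int(match['days'])          # same as match.group('days')
print(hour, match['ampm'], days, match['when'])   # 3 pm 2 today
```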
无需解释 2024-11-10 14:51:58
test = re.compile(r' 1?\d[ap]m \d{1,3} days? from (?:yesterday|today|tomorrow)')

EDIT

Having read the discussion between Rumple Stiltskin and Demian Brecht, I noticed that my proposal above is poor. It detects a certain string structure, but it doesn't precisely validate that the string is a good "time-pattern" string; for example, it accepts " 18pm 2 days from today".

So I now propose a pattern that precisely detects a string satisfying your requirements. It also flags every string that has the same structure as a valid one but lacks the values required for a valid "time-pattern" string:

import re

regx = re.compile(r"(?<= )"  # better than a blank as first character
                  r"(?:(1[012]|\d)([ap]m) (?!0 )(\d{1,3}|1000)"  # valid values
                  r"|"
                  r"(\d+)([ap]m) (\d+))"  # same structure, any values
                  r" days? from (yesterday|today|tomorrow)")  # shared part

for ch in (" 12pm 2 days from today",
           " 4pm 1 day from today",
           " 12pm 0 days from today",
           " 12pm 1001 days from today",
           " 18pm 2 days from today",
           " 1212pm 2 days from today",
           " 12pm five days from today"):

    print(ch)
    mat = regx.search(ch)
    if mat:
        if mat.group(1):
            print(mat.group(1, 2, 3, 7), '\n# time-pattern-VALIDATED string #')
        else:
            print(mat.group(4, 5, 6, 7), '\n* SIMILI-time-pattern STRUCTURED string*')
    else:
        print('- NO STRUCTURED STRING in the text -')
    print()

Result:

 12pm 2 days from today
('12', 'pm', '2', 'today') 
# time-pattern-VALIDATED string #

 4pm 1 day from today
('4', 'pm', '1', 'today') 
# time-pattern-VALIDATED string #

 12pm 0 days from today
('12', 'pm', '0', 'today') 
* SIMILI-time-pattern STRUCTURED string*

 12pm 1001 days from today
('12', 'pm', '1001', 'today') 
* SIMILI-time-pattern STRUCTURED string*

 18pm 2 days from today
('18', 'pm', '2', 'today') 
* SIMILI-time-pattern STRUCTURED string*

 1212pm 2 days from today
('1212', 'pm', '2', 'today') 
* SIMILI-time-pattern STRUCTURED string*

 12pm five days from today
- NO STRUCTURED STRING in the text -

If you only need a regex that detects validated time-pattern strings, you can use just:

regx = re.compile(r"(?<= )(1[012]|\d)([ap]m) (?!0 )(\d{1,3}|1000) days?"
                  r" from (yesterday|today|tomorrow)")
孤独患者 2024-11-10 14:51:58

It is easier (and more readable) to check integer ranges after the match:

import re

m = re.match(r' (\d+)(?:pm|am) (\d+) days from (yesterday|today|tomorrow)',
             " 3pm 2 days from today")
assert m and int(m.group(1)) <= 12 and 1 <= int(m.group(2)) <= 1000
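Python removes assert statements when it runs with -O, so for validating real input an explicit check is usually safer. Below is a minimal sketch of the same match-then-check approach wrapped in a function. The name parse_offset is hypothetical. The am/pm part is captured here so it can be returned.

```python
import re

PATTERN = re.compile(r' (\d+)(pm|am) (\d+) days from (yesterday|today|tomorrow)')

def parse_offset(text):
    """Return (hour, ampm, days, base) if text is valid, else None.

    The regex only checks the shape; the numeric ranges are checked in Python.
    """
    m = PATTERN.match(text)
    if m is None:
        return None
    hour, ampm, days, base = int(m.group(1)), m.group(2), int(m.group(3)), m.group(4)
    if not (hour <= 12 and 1 <= days <= 1000):
        return None
    return hour, ampm, days, base

print(parse_offset(" 3pm 2 days from today"))    # (3, 'pm', 2, 'today')
print(parse_offset(" 13pm 2 days from today"))   # None
```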

Or you could use an existing library, e.g. pip install parsedatetime:

import parsedatetime as pdt

cal = pdt.Calendar()
print(cal.parse("3pm 2 days from today"))

Output (the resulting date depends on the day you run it):

((2011, 4, 26, 15, 0, 0, 1, 116, -1), 3)