Regex: how to match a sequence of key-value pairs at the end of a string

Posted on 2024-10-22 11:45:44

I am trying to match key-value pairs that appear at the end of (long) strings. The strings look like this (I have written the "\n" characters as real line breaks):

my_str = """lots of blah
key1: val1-words
key2: val2-words
key3: val3-words"""

so I expect the matches "key1: val1-words", "key2: val2-words" and "key3: val3-words".

  • The set of possible key names is known.
  • Not all possible keys appear in every string.
  • At least two keys appear in every string (if that makes it easier to match).
  • val-words can be several words.
  • Key-value pairs should only be matched at the end of the string.
  • I am using the Python re module.
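A minimal sketch of one way to respect all of these constraints at once, assuming each pair sits on its own line as in the sample strings. The helper name trailing_pairs and the key list are illustrative assumptions, not part of the question: the first pattern finds the run of known-key lines that ends the string (anchored with \Z), and the second splits that run into (key, value) tuples.

```python
import re

def trailing_pairs(text, keys):
    """Return (key, value) tuples from the run of 'key: value' lines that
    ends text. Assumes one pair per line, as in the question's samples."""
    alternation = '|'.join(re.escape(key) for key in keys)
    # One or more consecutive lines that start with a known key,
    # anchored so the run must reach the very end of the string (\Z).
    block_re = re.compile(r'(?:^(?:%s):[^\n]*(?:\n|\Z))+\Z' % alternation,
                          re.MULTILINE)
    # Splits one line into its key and its (possibly multi-word) value.
    line_re = re.compile(r'^(%s):[ \t]*(.*)$' % alternation, re.MULTILINE)
    block = block_re.search(text)
    if block is None:
        return []
    return line_re.findall(block.group(0))

keys = ['key1', 'key2', 'key3']  # hypothetical known key names
my_str = "lots of blah\nkey1: val1-words\nkey2: val2-words\nkey3: val3-words"
print(trailing_pairs(my_str, keys))
# [('key1', 'val1-words'), ('key2', 'val2-words'), ('key3', 'val3-words')]
```

A known-key line that is followed by ordinary prose is not returned, because the run of key lines has to reach the end of the string.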

I was thinking

re.compile('(?:tag1|tag2|tag3):')

plus some lookahead assertion would be a solution, but I can't get it right. How do I do it?
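The alternation-plus-lookahead idea can be sketched as follows, assuming the known keys are key1 to key3 and each key starts a line. The lazy .*? stops at the first point where the positive lookahead sees a newline followed by another known key, or the end of the string; re.DOTALL lets a value continue onto later lines. On its own this does not restrict matches to the end of the string, so it covers only part of the requirements.

```python
import re

keys = ['key1', 'key2', 'key3']  # the known key names (assumed)
alternation = '|'.join(re.escape(key) for key in keys)
# A known key at the start of a line, then the shortest value that is
# followed either by a newline plus another known key, or by the end (\Z).
pair_re = re.compile(
    r'^((?:%s):.*?)(?=\n(?:%s):|\Z)' % (alternation, alternation),
    re.MULTILINE | re.DOTALL,  # DOTALL lets a value continue onto later lines
)

my_str = "lots of blah\nkey1: val1-words\nkey2: val2-words\nkey3: val3-words"
print(pair_re.findall(my_str))
# ['key1: val1-words', 'key2: val2-words', 'key3: val3-words']
```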

Thank you.

/David

Real example string:

my_str = u'ucourt métrage pour kino session volume 18\nThème: O sombres héros\nContraintes: sous titrés\nAuthor: nicoalabdou\nTags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise\nPosted: 06 June 2009\nRating: 1.3\nVotes: 3'

EDIT:

Based on Mikel's solution I am now using the following:


import re

# my_tags = [r'\S+']                    # alternative: match every tag
my_tags = ['Tags', 'Author', 'Posted']  # only the selected tags
regex = re.compile(r'''
    \n                     # all key-value pairs are on separate lines
    (                      # start group to return
       (?:{0}):            # placeholder for the tags; r'\S+' matches all
        \s                 # the space between ':' and value
       .*                  # the value (up to the end of the line)
    )                      # end group to return
    '''.format('|'.join(my_tags)), re.VERBOSE)

regex.sub('', my_str)    # returns my_str without the matching key-value lines
regex.findall(my_str)    # returns the matching key-value lines
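For reuse, the edited pattern could be wrapped in a small helper. The name build_tag_regex is hypothetical; passing tags=None reproduces the '\S+' (all tags) case, and literal tag names go through re.escape so characters such as '.' or '+' in a tag would not be read as regex syntax. A sketch, checked against a shortened, made-up sample string:

```python
import re

def build_tag_regex(tags=None):
    """Hypothetical helper wrapping the pattern above. tags=None matches any
    non-whitespace tag; otherwise each tag name is escaped and matched literally."""
    if tags is None:
        tag_pattern = r'\S+'
    else:
        tag_pattern = '|'.join(re.escape(tag) for tag in tags)
    return re.compile(r'''
        \n              # each key-value pair starts on a new line
        (               # group returned by findall()
           (?:{0}):     # one of the tags, then a colon
           \s           # the space between ':' and the value
           .*           # the value, up to the end of the line
        )
        '''.format(tag_pattern), re.VERBOSE)

my_str = "lots of blah\nAuthor: nicoalabdou\nPosted: 06 June 2009\nVotes: 3"
regex = build_tag_regex(['Author', 'Posted'])
print(regex.findall(my_str))
# ['Author: nicoalabdou', 'Posted: 06 June 2009']
print(repr(regex.sub('', my_str)))
# 'lots of blah\nVotes: 3'
```

Because the match includes the leading "\n", sub() removes each matching line together with the line break before it.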

Comments (1)

温凉少女 2024-10-29 11:45:44

The negative zero-width lookahead is (?!pattern).

It's mentioned part-way down the re module documentation page.

(?!...)

Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
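A quick check of the documented example: the lookahead is zero-width, so it inspects the text after 'Isaac ' without making that text part of the match.

```python
import re

# 'Isaac ' matches only where the text right after it is not 'Asimov'.
print(re.findall(r'Isaac (?!Asimov)', 'Isaac Newton, Isaac Asimov'))
# ['Isaac ']

# The lookahead is zero-width: the match ends after the space, before 'Newton'.
print(re.search(r'Isaac (?!Asimov)', 'Isaac Newton').span())
# (0, 6)
```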

So you could use it to match any number of value words after a key without also swallowing the next key, using something like (?!\S+:)\S+.
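The building block on its own, applied to a made-up one-line fragment: each match is one whitespace character plus one word, and the lookahead skips any word that looks like a key.

```python
import re

# Whitespace followed by one word, but only if that word does not look
# like a key (a run of non-space characters followed by ':').
value_word = re.compile(r'\s(?!\S+:)\S+')
print(value_word.findall('key1: val1 words key2: val2'))
# [' val1', ' words', ' val2']
```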

And the complete code would look like this:

import re

regex = re.compile(r'''
    \S+:                  # a key (any word followed by a colon)
    (?:
        \s                # then a space in between
        (?!\S+:)\S+       # then a value word that does not look like a key
    )+                    # match multiple values if present
    ''', re.VERBOSE)

matches = regex.findall(my_str)
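One behavior of this pattern worth knowing, illustrated with a made-up sample string rather than the question's data: because the lookahead treats any word containing a colon as a key, a value such as the time 12:30 makes the value stop early.

```python
import re

regex = re.compile(r'''
    \S+:              # a key (any word followed by a colon)
    (?:
        \s            # then a space in between
        (?!\S+:)\S+   # then a value word that does not look like a key
    )+                # one or more value words
    ''', re.VERBOSE)

# Made-up sample: the time 12:30 contains a colon, so the lookahead
# rejects it as a value word and the Posted value stops before it.
sample = "Posted: 06 June 2009 12:30\nRating: 1.3"
print(regex.findall(sample))
# ['Posted: 06 June 2009', 'Rating: 1.3']
```

When each pair is on its own line, matching the value up to the end of the line (as the question's edited pattern does) avoids this.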

Which gives

['key1: val1-words ', 'key2: val2-words ', 'key3: val3-words']

If you print the key/value pairs using:

for match in matches:
    print(match)

It will print:

key1: val1-words
key2: val2-words
key3: val3-words

Or using your updated example, it would print:

Thème: O sombres héros 
Contraintes: sous titrés 
Author: nicoalabdou 
Tags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise 
Posted: 06 June 2009 
Rating: 1.3 
Votes: 3

You could turn the key/value pairs into a dictionary (stripping the space that follows each colon) using something like this:

pairs = {key: value.strip() for key, value in (match.split(':', 1) for match in matches)}

which would make it easier to look up only the keys (and values) you want.
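If only a few keys are needed, a variant of the same pattern with two capturing groups lets findall() return (key, value) tuples directly, so no split() is needed. The sample string and the wanted list below are made up for illustration:

```python
import re

# The answer's pattern with two capturing groups: group 1 is the key,
# group 2 the value words (including their leading whitespace).
pair_re = re.compile(r'(\S+):((?:\s(?!\S+:)\S+)+)')

my_str = "lots of blah\nAuthor: nicoalabdou\nPosted: 06 June 2009\nVotes: 3"
pairs = {key: value.strip() for key, value in pair_re.findall(my_str)}

wanted = ['Author', 'Posted']  # illustrative choice of keys
selected = {key: pairs[key] for key in wanted if key in pairs}
print(selected)
# {'Author': 'nicoalabdou', 'Posted': '06 June 2009'}
```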

More info:

