Regular expressions: how to match a sequence of key-value pairs at the end of a string

Posted on 2024-10-22 11:45:44

I am trying to match key-value pairs that appear at the end of (long) strings. The strings look like this (I have replaced each "\n" with a real line break):

my_str = "lots of blah
          key1: val1-words
          key2: val2-words
          key3: val3-words"

so I expect matches "key1: val1-words", "key2: val2-words" and "key3: val3-words".

The set of possible key names is known.
Not all possible keys appear in every string.
At least two keys appear in every string (if that makes it easier to match).
val-words can be several words.
key-value pairs should only be matched at the end of string.
I am using Python re module.

I was thinking

re.compile('(?:tag1|tag2|tag3):')

plus some look-ahead assertion would be a solution, but I can't get it right. How should I do this?

Thank you.

/David

Real example string:

my_str = u'ucourt métrage pour kino session volume 18\nThème: O sombres héros\nContraintes: sous titrés\nAuthor: nicoalabdou\nTags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise\nPosted: 06 June 2009\nRating: 1.3\nVotes: 3'

EDIT:

Based on Mikel's solution I am now using the following:


import re

my_tags = [r'\S+']  # gets all tags
my_tags = ['Tags', 'Author', 'Posted']  # selected tags
regex = re.compile(r'''
    \n                     # all key-value pairs are on separate lines
    (                      # start group to return
       (?:{0}):            # placeholder for tags to detect; '\S+' == all
        \s                 # the space between ':' and value
       .*                  # the value
    )                      # end group to return
    '''.format('|'.join(my_tags)), re.VERBOSE)

regex.sub('', my_str)   # return my_str without the matching key-value lines
regex.findall(my_str)   # return the matched key-value lines
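
Note that the regex above matches key lines anywhere in the string, while the question asks for pairs at the end only. A minimal sketch of one way to enforce that, assuming each pair sits on its own line and the trailing block runs to the very end of the string. KNOWN_KEYS and split_trailing_pairs are hypothetical names used for illustration, not part of the original post:

```python
import re

# Hypothetical key list; the question only says the set of keys is known.
KNOWN_KEYS = ['Tags', 'Author', 'Posted', 'Rating', 'Votes']

# Escape each key so it is matched literally inside the alternation.
key_alt = '|'.join(re.escape(k) for k in KNOWN_KEYS)

# One key-value line: newline, known key, colon, one space, rest of the line.
pair_line = r'\n(?:{0}):[ ][^\n]*'.format(key_alt)

# The trailing block: one or more such lines followed by end of string (\Z).
trailing_block = re.compile(r'(?:{0})+\Z'.format(pair_line))

# Used to pull the individual pairs back out of the trailing block.
pair_regex = re.compile(r'\n((?:{0}):[ ][^\n]*)'.format(key_alt))


def split_trailing_pairs(text):
    """Return (body, pairs); pairs are taken only from the end of text."""
    block = trailing_block.search(text)
    if block is None:
        return text, []
    return text[:block.start()], pair_regex.findall(block.group(0))


sample = 'blah\nAuthor: not at the end\nmore blah\nAuthor: someone\nVotes: 3'
body, pairs = split_trailing_pairs(sample)
print(body)   # the text before the trailing block, earlier 'Author:' line included
print(pairs)  # ['Author: someone', 'Votes: 3']
```

The earlier 'Author:' line stays in the body because it is followed by a line that is not a key-value pair, so the block it starts cannot reach the end of the string.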


↙温凉少女 2024-10-29 11:45:44

The negative zero-width lookahead is (?!pattern).

It's mentioned part-way down the re module documentation page.

(?!...)

Matches if ... doesn’t match next. This is a negative lookahead assertion. For example, Isaac (?!Asimov) will match 'Isaac ' only if it’s not followed by 'Asimov'.
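
A small sketch of that documentation example, to show that the lookahead only checks what follows and does not consume it:

```python
import re

# 'Isaac ' followed by anything except 'Asimov'.
pattern = re.compile(r'Isaac (?!Asimov)')

newton = pattern.search('Isaac Newton')
asimov = pattern.search('Isaac Asimov')

# The lookahead is zero-width: only 'Isaac ' is part of the match.
matched_text = newton.group(0)

print(repr(matched_text))  # 'Isaac '
print(asimov)              # None
```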

So you could use it to match any number of words after a key, while refusing to match a word that is itself a key, using something like (?!\S+:)\S+.

And the complete code would look like this:

import re

regex = re.compile(r'''
    \S+:                  # a key (any word followed by a colon)
    (?:
    \s                    # then a single whitespace character in between
        (?!\S+:)\S+       # then a value word (any word that is not itself a key)
    )+                    # match multiple value words if present
    ''', re.VERBOSE)

matches = regex.findall(my_str)

For the first example string, this gives

['key1: val1-words', 'key2: val2-words', 'key3: val3-words']

If you print the key/values using:

for match in matches:
    print(match)

It will print:

key1: val1-words
key2: val2-words
key3: val3-words

Or using your updated example, it would print:

Thème: O sombres héros
Contraintes: sous titrés
Author: nicoalabdou
Tags: wakatanka productions court métrage kino session humour cantat bertrand noir désir sombres héros mer medine marie trintignant femme droit des femmes nicoalabdou pute soumise
Posted: 06 June 2009
Rating: 1.3
Votes: 3
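
To check this yourself, here is a self-contained sketch that runs the same pattern on a shortened version of the real example string (the shortening is mine, to keep the example readable):

```python
import re

regex = re.compile(r'''
    \S+:                  # a key (any word followed by a colon)
    (?:
    \s                    # a single whitespace character
        (?!\S+:)\S+       # a value word that is not itself a key
    )+
    ''', re.VERBOSE)

# Shortened from the question's real example string.
my_str = ('court métrage pour kino session volume 18\n'
          'Thème: O sombres héros\n'
          'Posted: 06 June 2009\n'
          'Rating: 1.3\n'
          'Votes: 3')

matches = regex.findall(my_str)
for match in matches:
    print(match)
```

Each match stops at the line break because the next word is a key, which the lookahead rejects. One caveat: the lookahead also rejects any value word that contains a colon, such as a time like 12:30, so values of that shape would need a different pattern.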

You could collect the key/value pairs into a dictionary using something like this:

pairs = dict([match.split(':', 1) for match in matches])

which would make it easier to look up only the keys (and values) you want.
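
A slightly longer sketch of the same idea, assuming matches holds strings of the form 'key: value'. It strips the space left after the colon and then keeps only selected keys. The wanted list is a hypothetical selection, not something from the question:

```python
# Example input in the shape returned by regex.findall above.
matches = ['Author: nicoalabdou', 'Posted: 06 June 2009', 'Rating: 1.3']

pairs = {}
for match in matches:
    # Split on the first colon only, so values may contain colons.
    key, value = match.split(':', 1)
    pairs[key.strip()] = value.strip()

# Hypothetical selection of keys; keys that are absent are skipped.
wanted = ['Author', 'Posted', 'Tags']
selected = {key: pairs[key] for key in wanted if key in pairs}

print(selected)  # {'Author': 'nicoalabdou', 'Posted': '06 June 2009'}
```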

