Python: translate/replace in a string for words that aren't the ones you want

Posted on 2024-09-30 05:19:03

Basically, I've got a bunch of phrases and I'm only interested in the ones that contain certain words. What I want to do is 1) find out if that word is there and if it is, 2) erase all the other words. I could do this with a bunch of if's and for's but I was wondering if there'd be a short/pythonic approach to it.


Comments (2)

无力看清 2024-10-07 05:19:03

A suggested algorithm:

  • For each phrase
    1. Find whether the interesting word is there
    2. If it is, erase all other words
    3. Otherwise, just continue to the next phrase

Yes, implementing this would take "a bunch of ifs and fors", but you would be surprised how easily and cleanly such logic translates to Python.
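As a rough illustration of the steps above, the explicit loop version might look like the sketch below. The name filter_phrases and the INTERESTING word set are made up for this example, and it assumes words are separated by whitespace, so a word with attached punctuation such as "words." would not be recognised.

```python
# Hypothetical set of words we care about (an assumption for this sketch).
INTERESTING = {"interesting", "words"}

def filter_phrases(phrases, interesting=INTERESTING):
    """Keep only the interesting words of phrases that contain any."""
    result = []
    for phrase in phrases:
        kept = []
        for word in phrase.split():      # naive whitespace tokenisation
            if word in interesting:
                kept.append(word)
        if kept:                         # step 2: erase all other words
            result.append(" ".join(kept))
        # step 3: otherwise just move on to the next phrase
    return result

print(filter_phrases(["A lot of interesting and boring words", "nothing here"]))
```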

A more succinct way to achieve the same would be to use a list comprehension, which flattens this logic somewhat. Given that phrases is a list of phrases:

phrases = [process(p) if isinteresting(p) else p for p in phrases]

This assumes suitable definitions of the process and isinteresting functions.
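The answer leaves process and isinteresting undefined. One possible pair of definitions is sketched below; the KEEP set and the whitespace-based splitting are assumptions made for illustration, not part of the original answer.

```python
# Hypothetical set of words to keep (an assumption for this sketch).
KEEP = {"interesting", "words"}

def isinteresting(phrase):
    """True if the phrase contains at least one word from KEEP."""
    return any(word in KEEP for word in phrase.split())

def process(phrase):
    """Drop every word that is not in KEEP, preserving word order."""
    return " ".join(word for word in phrase.split() if word in KEEP)

phrases = ["A lot of interesting and boring words", "nothing to see"]
phrases = [process(p) if isinteresting(p) else p for p in phrases]
print(phrases)
```

Note that this comprehension passes uninteresting phrases through unchanged. To discard them instead, the filtering form [process(p) for p in phrases if isinteresting(p)] would do.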

说不完的你爱 2024-10-07 05:19:03

A regex-based solution:

>>> import re
>>> phrase = "A lot of interesting and boring words"
>>> regex = re.compile(r"\b(?!(?:interesting|words)\b)\w+\W*")
>>> clean = regex.sub("", phrase)
>>> clean
'interesting words'

The regex works as follows:

\b             # start the match at a word boundary
(?!            # assert that it's not possible to match
 (?:           # one of the following:
  interesting  # "interesting"
  |            # or
  words        # "words"
 )             # add more words if desired...
 \b            # assert that there is a word boundary after our needle matches
)              # end of lookahead
\w+\W*         # match the word plus any non-word characters that follow.
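
If the list of words to keep is not fixed, the same pattern could be built at run time. The sketch below is one way to do that; build_filter and keep_only are hypothetical helper names, and it assumes keep_words is non-empty and contains plain words.

```python
import re

def build_filter(keep_words):
    """Compile a regex matching every word that is NOT in keep_words."""
    # re.escape protects against regex metacharacters inside the words.
    alternatives = "|".join(re.escape(w) for w in keep_words)
    return re.compile(r"\b(?!(?:" + alternatives + r")\b)\w+\W*")

def keep_only(phrase, keep_words):
    """Remove all words except keep_words (assumes keep_words is non-empty)."""
    # strip() removes the trailing separator left behind when the
    # last kept word was followed by removed words.
    return build_filter(keep_words).sub("", phrase).strip()

print(keep_only("A lot of interesting and boring words", ["interesting", "words"]))
```

Because each match also consumes the non-word characters after a removed word, a kept word followed only by removed words ends up with a trailing space, which is why the sketch calls strip() on the result.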