Python: translate/replace the words in a string that are not the ones you want

Posted 2024-09-30 05:19:03

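Basically, I have a bunch of phrases, and I am only interested in the ones that contain certain words. What I want to do is 1) find out whether such a word is present and, if it is, 2) erase all the other words. I could do this with a bunch of ifs and fors, but I was wondering whether there is a short/Pythonic way to do it.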

无力看清 2024-10-07 05:19:03

A suggested algorithm:

For each phrase
1. Find whether the interesting word is there
2. If it is, erase all other words
3. Otherwise, just continue to the next phrase

Yes, implementing this would take "a bunch of ifs and fors", but you would be surprised how easily and cleanly such logic translates to Python.
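As a rough illustration, the three steps above can be written as a plain loop. This is only a sketch: the `interesting` set and the sample `phrases` are hypothetical, since the question does not say which words matter, and words are split on whitespace, so punctuation attached to a word would prevent a match.

```python
# Hypothetical inputs, chosen only for illustration.
interesting = {"interesting", "words"}
phrases = ["A lot of interesting and boring words", "nothing to see here"]

result = []
for phrase in phrases:
    # Step 1: collect the interesting words found in this phrase.
    kept = []
    for word in phrase.split():
        if word in interesting:
            kept.append(word)
    if kept:
        # Step 2: an interesting word is present, so drop all other words.
        result.append(" ".join(kept))
    else:
        # Step 3: nothing interesting here, keep the phrase unchanged.
        result.append(phrase)

print(result)  # ['interesting words', 'nothing to see here']
```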

A more succinct way to achieve the same would be to use a list comprehension, which flattens this logic somewhat. Given that phrases is a list of phrases:

phrases = [process(p) if isinteresting(p) else p for p in phrases]

Here process and isinteresting are functions that you define to suit your data.
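One possible pair of definitions is sketched below. The answer leaves both functions open, so the `INTERESTING` set, the whitespace splitting, and the exact-match comparison are all assumptions made for the example.

```python
# Hypothetical word list; the question does not name specific words.
INTERESTING = {"interesting", "words"}

def isinteresting(phrase):
    """Return True if the phrase contains at least one interesting word."""
    return any(word in INTERESTING for word in phrase.split())

def process(phrase):
    """Keep only the interesting words, preserving their original order."""
    return " ".join(word for word in phrase.split() if word in INTERESTING)

phrases = ["A lot of interesting and boring words", "nothing to see here"]
phrases = [process(p) if isinteresting(p) else p for p in phrases]
print(phrases)  # ['interesting words', 'nothing to see here']
```

Phrases without an interesting word pass through unchanged, which mirrors step 3 of the algorithm.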

说不完的你爱 2024-10-07 05:19:03

A regex-based solution:

>>> import re
>>> phrase = "A lot of interesting and boring words"
>>> regex = re.compile(r"\b(?!(?:interesting|words)\b)\w+\W*")
>>> clean = regex.sub("", phrase)
>>> clean
'interesting words'

The regex works as follows:

\b             # start the match at a word boundary
(?!            # assert that it's not possible to match
 (?:           # one of the following:
  interesting  # "interesting"
  |            # or
  words        # "words"
 )             # add more words if desired...
 \b            # assert that there is a word boundary after our needle matches
)              # end of lookahead
\w+\W*         # match the word plus any non-word characters that follow.
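If the list of words to keep is only known at run time, the same pattern can be assembled from it. The helper name `keep_only` is hypothetical and not part of the answer above; `re.escape` guards against words containing regex metacharacters, and the final `strip()` removes the space left behind when the last kept word is followed by removed ones. The sketch assumes a non-empty word list.

```python
import re

def keep_only(phrase, wanted):
    """Remove every word from phrase that is not in wanted (a non-empty list)."""
    # Build the alternation, e.g. "interesting|words", escaping each word.
    alternatives = "|".join(re.escape(w) for w in wanted)
    regex = re.compile(r"\b(?!(?:" + alternatives + r")\b)\w+\W*")
    return regex.sub("", phrase).strip()

print(keep_only("A lot of interesting and boring words", ["interesting", "words"]))
# interesting words
print(keep_only("interesting and boring", ["interesting", "words"]))
# interesting
```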
