Removing non-English words from a sentence in Python

Posted on 2024-09-29 11:31:39

I have written code that sends queries to Google and returns the results. I extract the snippets (summaries) from these results for further processing. However, these snippets sometimes contain non-English words that I don't want. For example:

/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/ 

I only want the word "unstressed" from this sentence.
How can I do that?
Thanks

Comments (3)

水染的天色ゝ 2024-10-06 11:31:40

PyEnchant might be a simple option for you. I don't know how fast it is, but you can do things like:

>>> import enchant
>>> d = enchant.Dict("en_US")
>>> d.check("Hello")
True
>>> d.check("Helo")
False
>>>

A tutorial is available, and PyEnchant can also return suggestions, which you could reuse for another query. In addition, you could check whether your result is in Latin-1 (is_utf8() exists; I do not know whether an is_latin-1() counterpart does too), or use something like Enca, which detects the encoding of text files based on knowledge of their language.
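
Applied to the snippet from the question, the dictionary check could be wrapped in a small filter. This is only a sketch: keep_english_words and enchant_checker are hypothetical helper names, it assumes the \uXXXX escapes have already been decoded into real characters, and it assumes PyEnchant and an en_US dictionary are installed.

```python
import re


def keep_english_words(text, is_english):
    """Return the tokens of `text` that `is_english` accepts.

    `is_english` is any callable taking a word and returning True/False,
    for example the `check` method of an enchant.Dict.
    """
    # Runs of letters only; this pattern also matches non-ASCII letters
    # such as IPA symbols, so they end up in their own tokens.
    tokens = re.findall(r"[^\W\d_]+", text)
    # Drop tokens containing non-ASCII characters before the dictionary
    # lookup, then keep only the words the dictionary knows.
    return [t for t in tokens if t.isascii() and is_english(t)]


def enchant_checker(lang="en_US"):
    """Build a checker from PyEnchant (imported lazily; must be installed)."""
    import enchant
    return enchant.Dict(lang).check


# Possible usage, assuming PyEnchant is available:
# snippet = "/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/"
# print(keep_english_words(snippet, enchant_checker()))
```

Passing the checker in as an argument keeps the filtering logic independent of the dictionary backend, so a word list or another spell checker could be substituted.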

南薇 2024-10-06 11:31:40

You can compare the words you receive against a dictionary of English words, for example /usr/share/dict/words on a BSD system.

I would guess that Google's results are, for the most part, grammatically correct, but if not, you might have to look into stemming in order to match against your dictionary.
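
A rough sketch of the word-list comparison follows. The helper names load_wordlist and filter_by_wordlist are made up for illustration, the default path is the one mentioned above and may not exist on every system, and no stemming is attempted, so inflected forms missing from the list would be dropped.

```python
import string


def load_wordlist(path="/usr/share/dict/words"):
    """Read a one-word-per-line file into a set of lowercase words."""
    with open(path, encoding="utf-8", errors="ignore") as fh:
        return {line.strip().lower() for line in fh if line.strip()}


def filter_by_wordlist(text, words):
    """Keep the whitespace-separated tokens of `text` found in `words`."""
    kept = []
    for token in text.split():
        # Trim surrounding punctuation such as the slashes in the example.
        word = token.strip(string.punctuation)
        if word and word.lower() in words:
            kept.append(word)
    return kept


# Possible usage, assuming the word list file is present:
# words = load_wordlist()
# print(filter_by_wordlist("/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/", words))
```

Loading the list into a set once makes each lookup cheap, which matters if many snippets are processed.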

榆西 2024-10-06 11:31:40

You can use PyWordNet, a Python interface to WordNet. Just split your sentence on whitespace and check whether each word is in the dictionary.
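
The split-and-check idea might look like the sketch below. Note that it swaps in NLTK's WordNet corpus reader rather than PyWordNet itself, and it assumes nltk and its wordnet data are installed; in_wordnet and words_in_dictionary are hypothetical names. WordNet covers only nouns, verbs, adjectives and adverbs, so function words such as "the" would be filtered out as well.

```python
import string


def in_wordnet(word):
    """True if NLTK's WordNet corpus has at least one synset for `word`."""
    # Lazy import: requires nltk plus the downloaded wordnet corpus.
    from nltk.corpus import wordnet
    return bool(wordnet.synsets(word))


def words_in_dictionary(sentence, lookup=in_wordnet):
    """Split on whitespace and keep the words that `lookup` recognizes."""
    # Trim punctuation from both ends of each token before the lookup.
    candidates = (tok.strip(string.punctuation) for tok in sentence.split())
    return [w for w in candidates if w and lookup(w)]


# Possible usage, assuming nltk and the wordnet corpus are available:
# print(words_in_dictionary("/\u02b0w\u025bn w\u025bn unstressed \u02b0w\u0259n w\u0259n/"))
```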
