NLP algorithm to "fill out" search terms

Posted on 2024-12-07 20:46:19

I'm trying to write an algorithm (which I'm assuming will rely on natural language processing techniques) to 'fill out' a list of search terms. There is probably a name for this kind of thing which I'm unaware of. What is this kind of problem called, and what kind of algorithm will give me the following behavior?

Input:

    docs = [
        "I bought a ticket to the Dolphin Watching cruise",
        "I enjoyed the Dolphin Watching tour",
        "The Miami Dolphins lost again!",
        "It was good going to that Miami Dolphins game",
    ]
    search_term = "Dolphin"

Output:

    ["Dolphin Watching", "Miami Dolphins"]

It should basically figure out that if "Dolphin" appears at all, it's virtually always either in the bigrams "Dolphin Watching" or "Miami Dolphins". Solutions in Python preferred.
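A naive baseline that encodes this criterion directly, offered as a sketch rather than a known algorithm: for every word that starts with the search term (so "Dolphins" also counts), count the bigrams it forms with its neighbours and keep the bigrams that cover at least a given share of the term's occurrences. The stopword list, the regex tokenizer, the `fill_out` name and the 0.5 threshold are all hypothetical choices.

```python
import re
from collections import Counter

# Hypothetical, hand-picked stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "the", "to", "that", "of", "in", "on", "and"}


def fill_out(docs, search_term, min_share=0.5):
    """Return bigrams that contain a word starting with search_term and that
    cover at least min_share of the term's occurrences (hypothetical helper)."""
    prefix = search_term.lower()
    occurrences = 0
    bigram_counts = Counter()
    for doc in docs:
        tokens = re.findall(r"\w+", doc)  # crude tokenizer: runs of word characters
        for i, tok in enumerate(tokens):
            if not tok.lower().startswith(prefix):  # prefix match, so "Dolphins" counts too
                continue
            occurrences += 1
            # Pair the match with its left and right neighbours, skipping stopwords.
            if i > 0 and tokens[i - 1].lower() not in STOPWORDS:
                bigram_counts[(tokens[i - 1], tok)] += 1
            if i + 1 < len(tokens) and tokens[i + 1].lower() not in STOPWORDS:
                bigram_counts[(tok, tokens[i + 1])] += 1
    if occurrences == 0:
        return []
    # most_common() orders by count, keeping first-seen order for ties.
    return [
        " ".join(bigram)
        for bigram, count in bigram_counts.most_common()
        if count / occurrences >= min_share
    ]


docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]
print(fill_out(docs, "Dolphin"))  # ['Dolphin Watching', 'Miami Dolphins']
```

Without the stopword filter, "the Dolphin" would also reach the 0.5 threshold on this data, which is the kind of noise an association measure is meant to discount.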

Comments (2)

因为看清所以看轻 2024-12-14 20:46:19

It should basically figure out that if "Dolphin" appears at all, it's virtually always either in the bigrams "Dolphin Watching" or "Miami Dolphins".

Sounds like you want to determine the collocations in which "Dolphin" occurs. There are various methods for collocation finding; the most popular is to compute pointwise mutual information (PMI) between terms in your corpus, then select the terms with the highest PMI with "Dolphin". You might remember PMI from the sentiment analysis algorithm that I suggested earlier.
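A minimal from-scratch sketch of the PMI approach, assuming a crude regex tokenizer, lowercasing, prefix matching (so "dolphin" also matches "dolphins") and a minimum bigram count of 2; the function name `pmi_collocations` is hypothetical. PMI is computed as log2(P(w1 w2) / (P(w1) P(w2))), with every probability estimated as a count divided by the total number of tokens.

```python
import math
import re
from collections import Counter


def pmi_collocations(docs, search_term, min_count=2):
    """Rank bigrams containing a word that starts with search_term by PMI."""
    # Lowercase and tokenize each document separately so no bigram spans two documents.
    token_lists = [[t.lower() for t in re.findall(r"\w+", d)] for d in docs]
    unigrams = Counter(t for toks in token_lists for t in toks)
    bigrams = Counter(pair for toks in token_lists for pair in zip(toks, toks[1:]))
    n = sum(unigrams.values())  # total tokens, used as the denominator for every probability
    prefix = search_term.lower()

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:  # PMI is unreliable for rare pairs, so skip them
            continue
        if not (w1.startswith(prefix) or w2.startswith(prefix)):
            continue
        # PMI = log2(P(w1 w2) / (P(w1) P(w2))) = log2(count * n / (c(w1) * c(w2)))
        pmi = math.log2(count * n / (unigrams[w1] * unigrams[w2]))
        scored.append((f"{w1} {w2}", pmi))
    return sorted(scored, key=lambda item: item[1], reverse=True)


docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]
for phrase, score in pmi_collocations(docs, "Dolphin"):
    print(f"{phrase}: {score:.2f}")
```

On the four example documents, "dolphin watching" and "miami dolphins" both score log2(14.5) ≈ 3.86, just ahead of "the dolphin" at log2(58/6) ≈ 3.27. PMI penalizes "the" for being frequent everywhere, but on a corpus this small the margin is thin, so a frequency cutoff or stopword filter is still worth having.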

A Python implementation of various collocation-finding methods is included in NLTK as nltk.collocations. The area is covered in some depth in Manning and Schütze's Foundations of Statistical Natural Language Processing (FSNLP, 1999; old, but still current for this topic).
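The same ranking using nltk.collocations, as a sketch that assumes NLTK is installed (`pip install nltk`) and reuses the lowercasing, regex tokenization and prefix-matching assumptions from above. `BigramCollocationFinder`, `BigramAssocMeasures`, `apply_freq_filter`, `apply_ngram_filter` and `nbest` are NLTK's own API; the filtering choices are illustrative.

```python
import re

from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]
search_term = "dolphin"  # lowercased and matched as a prefix, so "dolphins" also counts

# One token list per document; from_documents keeps bigrams from spanning documents.
token_docs = [[t.lower() for t in re.findall(r"\w+", d)] for d in docs]
finder = BigramCollocationFinder.from_documents(token_docs)

finder.apply_freq_filter(2)  # drop bigrams seen fewer than 2 times
# apply_ngram_filter removes every bigram for which the function returns True,
# so this keeps only the bigrams that contain the search term.
finder.apply_ngram_filter(
    lambda w1, w2: not (w1.startswith(search_term) or w2.startswith(search_term))
)

measures = BigramAssocMeasures()
print(finder.nbest(measures.pmi, 2))
```

Because NLTK breaks score ties alphabetically, this should print `[('dolphin', 'watching'), ('miami', 'dolphins')]`. Other association measures, such as `measures.likelihood_ratio`, can be passed to `nbest` the same way.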

哥,最终变帅啦 2024-12-14 20:46:19

I used the Natural Language Toolkit in my NLP class in university with decent success. I think it's got some taggers that can help you determine which are the nouns, and help you parse it into a tree. I don't remember much, but I'd start there.
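A rough sketch of the tagging-and-chunking route, assuming NLTK is installed. The POS tags below are hand-written (an assumption: a real tagger such as nltk.pos_tag needs a downloaded model and might, for example, tag "Watching" as VBG rather than NNP), the NAME grammar is a hypothetical choice, and `names_containing` is a hypothetical helper. `nltk.RegexpParser` groups consecutive proper nouns into NAME subtrees of a parse tree, and the helper keeps the chunks that contain the search term.

```python
import nltk

# Hand-tagged sentences (assumed tags, for illustration only). A real pipeline would
# tag tokens with a trained tagger such as nltk.pos_tag, whose tags may differ.
tagged_docs = [
    [("The", "DT"), ("Miami", "NNP"), ("Dolphins", "NNPS"), ("lost", "VBD"),
     ("again", "RB"), ("!", ".")],
    [("I", "PRP"), ("enjoyed", "VBD"), ("the", "DT"), ("Dolphin", "NNP"),
     ("Watching", "NNP"), ("tour", "NN")],
]

# Chunk grammar: a NAME is one or more consecutive proper-noun tags (NNP or NNPS).
parser = nltk.RegexpParser("NAME: {<NNP.*>+}")


def names_containing(tagged_sentence, search_term):
    """Return proper-noun chunks that contain a word starting with search_term."""
    tree = parser.parse(tagged_sentence)  # an nltk.Tree whose NAME subtrees are the chunks
    phrases = []
    for subtree in tree.subtrees(filter=lambda t: t.label() == "NAME"):
        words = [word for word, tag in subtree.leaves()]
        if any(w.lower().startswith(search_term.lower()) for w in words):
            phrases.append(" ".join(words))
    return phrases


for sent in tagged_docs:
    print(names_containing(sent, "Dolphin"))
```

This approach needs no corpus statistics, but its output depends entirely on the quality of the tags.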
