NLP algorithm to "fill out" search terms
I'm trying to write an algorithm (which I'm assuming will rely on natural language processing techniques) to 'fill out' a list of search terms. There is probably a name for this kind of thing which I'm unaware of. What is this kind of problem called, and what kind of algorithm will give me the following behavior?
Input:
docs = [
"I bought a ticket to the Dolphin Watching cruise",
"I enjoyed the Dolphin Watching tour",
"The Miami Dolphins lost again!",
"It was good going to that Miami Dolphins game"
]
search_term = "Dolphin"
Output:
["Dolphin Watching", "Miami Dolphins"]
It should basically figure out that if "Dolphin" appears at all, it's virtually always either in the bigrams "Dolphin Watching" or "Miami Dolphins". Solutions in Python preferred.
Comments (2)
Sounds like you want to determine the collocations that "Dolphin" occurs in. There are various methods for collocation finding, the most popular being to compute pointwise mutual information (PMI) between terms in your corpus, then select the terms with the highest PMI for "Dolphin". You might remember PMI from the sentiment analysis algorithm that I suggested earlier.
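To make the PMI idea concrete, here is a minimal sketch in plain Python. It is not how any particular library does it, just one way to score pairs under a few assumptions: a naive regex tokenizer, prefix matching so that "Dolphin" also matches "Dolphins", a tiny hand-picked stopword list, and a minimum pair count of 2. The names `collocations_for`, `STOPWORDS`, and `min_count` are made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "the", "to", "of", "in", "that", "i", "it", "was"}


def tokenize(text):
    # Naive word tokenizer; a real tokenizer may split text differently.
    return re.findall(r"\w+", text)


def collocations_for(docs, search_term, min_count=2):
    """Return bigrams containing a token that starts with search_term,
    ranked by PMI (highest first, ties broken alphabetically)."""
    unigrams = Counter()
    bigrams = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        unigrams.update(tokens)
        # Pair adjacent tokens within one document only.
        bigrams.update(zip(tokens, tokens[1:]))

    n_words = sum(unigrams.values())
    n_bigrams = sum(bigrams.values())
    prefix = search_term.lower()

    scored = []
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # PMI is unreliable for rare pairs
        if w1.lower() in STOPWORDS or w2.lower() in STOPWORDS:
            continue  # drop pairs such as "the Dolphin"
        if not (w1.lower().startswith(prefix) or w2.lower().startswith(prefix)):
            continue  # keep only pairs that involve the search term
        # PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) )
        p_pair = count / n_bigrams
        p_w1 = unigrams[w1] / n_words
        p_w2 = unigrams[w2] / n_words
        pmi = math.log2(p_pair / (p_w1 * p_w2))
        scored.append((pmi, f"{w1} {w2}"))

    scored.sort(key=lambda item: (-item[0], item[1]))
    return [phrase for _, phrase in scored]


docs = [
    "I bought a ticket to the Dolphin Watching cruise",
    "I enjoyed the Dolphin Watching tour",
    "The Miami Dolphins lost again!",
    "It was good going to that Miami Dolphins game",
]
print(collocations_for(docs, "Dolphin"))
# ['Dolphin Watching', 'Miami Dolphins']
```

On this tiny corpus, "the Dolphin" would tie with the other two pairs if the stopword filter were removed, so in practice PMI is commonly combined with a frequency or stopword filter like the one sketched here.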
A Python implementation of various collocation finding methods is included in NLTK as nltk.collocations. The area is covered in some depth in Manning and Schütze's Foundations of Statistical Natural Language Processing (FSNLP; 1999, but still current for this topic).
I used the Natural Language Toolkit in my NLP class in university with decent success. I think it's got some taggers that can help you determine which are the nouns, and help you parse it into a tree. I don't remember much, but I'd start there.