What is needed for NLP?
Assuming that I know nothing about anything and that I'm starting programming TODAY, what would you say I need to learn in order to start working with Natural Language Processing?
I've been struggling with some string parsing methods, but so far they are just annoying me and making me write ugly code. I'm looking for fresh ideas on how to build something like the Remember The Milk API that parses a user's input, so that I can offer a fast data-entry form based not on fields but on simple one-line phrases.
EDIT: RTM is a to-do list system. To enter a task, you don't need to type into each field to fill in values (task name, due date, location, etc.). You can simply type a phrase like "Dentist appointment monday at 2PM in WhateverPlace" and it will parse it and fill in all the fields for you.
I don't have any technical constraints, since this is going to be a personal project, but I'm more familiar with the .NET world. Actually, I'm not sure this is a matter of language, but if necessary I'm more than willing to learn a new language to do it.
My project is related to personal finances, so the phrases are more like "Spent 10USD on Coffee last night with my girlfriend", and it would fill in the location, amount of $$$, tags, and other fields.
Thanks a lot for any kind of directions that you might give me!
Comments (2)
This does not appear to require full NLP. Simple pattern-based information extraction will probably suffice. The basic idea is to tokenize the text, then recognize/classify certain keywords, and finally recognize patterns/phrases.
In your example, tokenizing gives you "Dentist", "appointment", "monday", "at", "2PM", "in", "WhateverPlace". Your tool will recognize that "monday" is a day of the week, "2PM" is a time, etc. Finally, you can find patterns like [at] [TIME] and [in] [Place] and use those to fill in the fields.
A framework like GATE may help, but even that may be a larger hammer than you really need.
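The three steps above (tokenize, classify keywords, match patterns) can be sketched in a few lines of plain Python. This is only an illustration under simple assumptions: the function names `classify` and `parse_task`, the field names, and the rule that everything after "in" is the place are all hypothetical choices, not part of GATE or any other library.

```python
import re

DAYS = {"monday", "tuesday", "wednesday", "thursday",
        "friday", "saturday", "sunday"}
# Assumed time format: 2PM, 11am, 2:30PM
TIME_RE = re.compile(r"^\d{1,2}(:\d{2})?(am|pm)$", re.IGNORECASE)


def classify(token):
    """Label one token as DAY, TIME, a keyword (AT/IN), or plain WORD."""
    lower = token.lower()
    if lower in DAYS:
        return "DAY"
    if TIME_RE.match(token):
        return "TIME"
    if lower in ("at", "in"):
        return lower.upper()
    return "WORD"


def parse_task(text):
    """Tokenize on whitespace, classify tokens, then apply simple patterns."""
    tokens = text.split()
    labels = [classify(t) for t in tokens]
    fields = {"name": [], "day": None, "time": None, "place": None}
    i = 0
    while i < len(tokens):
        label = labels[i]
        nxt = labels[i + 1] if i + 1 < len(tokens) else None
        if label == "DAY":
            fields["day"] = tokens[i].lower()
        elif label == "AT" and nxt == "TIME":
            # Pattern [at] [TIME]
            fields["time"] = tokens[i + 1].upper()
            i += 1
        elif label == "TIME":
            fields["time"] = tokens[i].upper()
        elif label == "IN" and nxt == "WORD":
            # Pattern [in] [Place]: assume the rest of the phrase is the place
            fields["place"] = " ".join(tokens[i + 1:])
            break
        else:
            # Anything unrecognized becomes part of the task name
            fields["name"].append(tokens[i])
        i += 1
    fields["name"] = " ".join(fields["name"])
    return fields


result = parse_task("Dentist appointment monday at 2PM in WhateverPlace")
```

For the example phrase, `result` holds the name "Dentist appointment", the day "monday", the time "2PM", and the place "WhateverPlace". Real input will need more patterns (dates, "tomorrow", multi-word places before a time), which is where the hand-written rules start to grow.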
Have a look at NLTK; it's a good resource for beginner programmers interested in NLP.
http://www.nltk.org/
It is written in Python, which is one of the easier programming languages.
Now that I understand your problem, here is my solution:
You can define a kind of restricted vocabulary in which every amount must end with a $ sign and every time must be written in the form 00:00 and/or end with AM/PM. For detecting items, you can use a list of objects from an ontology such as OpenCyc, which can provide a list of objects such as beer, coffee, bread, and milk. This will help you detect objects in the short phrase. Still, it would be a very fuzzy approach.
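A minimal sketch of that restricted vocabulary, in plain Python with regular expressions, might look like the following. The assumptions are mine: amounts are digits immediately followed by `$`, times are written as HH:MM followed by AM or PM, and the small `KNOWN_ITEMS` set is a hypothetical stand-in for an object list exported from an ontology; nothing here calls OpenCyc itself.

```python
import re

# Assumed restricted forms: "10$" or "4.50$" for amounts, "09:30 PM" for times
AMOUNT_RE = re.compile(r"\b(\d+(?:\.\d{1,2})?)\$")
TIME_RE = re.compile(r"\b(\d{1,2}:\d{2})\s?(AM|PM)\b", re.IGNORECASE)

# Hypothetical stand-in for a list of objects taken from an ontology
KNOWN_ITEMS = {"beer", "coffee", "bread", "milk"}


def parse_expense(text):
    """Extract amount, time, and known items from a one-line expense phrase."""
    amount = AMOUNT_RE.search(text)
    time = TIME_RE.search(text)
    # Lowercase alphabetic words, matched against the known item list
    words = re.findall(r"[A-Za-z]+", text.lower())
    items = [w for w in words if w in KNOWN_ITEMS]
    return {
        "amount": float(amount.group(1)) if amount else None,
        "time": f"{time.group(1)} {time.group(2).upper()}" if time else None,
        "items": items,
    }


expense = parse_expense("Spent 10$ on Coffee at 09:30 PM with my girlfriend")
```

Here `expense` contains the amount 10.0, the time "09:30 PM", and the item list `["coffee"]`. Anything outside the agreed forms (for example "10USD" or "last night") is simply not recognized, which is the trade-off of a restricted vocabulary.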