How do you think the "Quick Add" feature in Google Calendar works?
I'm thinking about a project which might use functionality similar to how "Quick Add" parses natural language into something that can be understood with some level of semantics. I'm interested in understanding this better and wondered what your thoughts were on how this might be implemented.
If you're unfamiliar with what "Quick Add" is, check out Google's KB about it.
6/4/10 Update
Additional research on "Natural Language Parsing" (NLP) yields results that are MUCH broader than what I feel is actually implemented in something like "Quick Add". Given that this feature expects specific types of input rather than true free-form text, I'm thinking this is a much narrower implementation of NLP. If anyone could suggest narrower topics that I could research rather than the entire breadth of NLP, it would be greatly appreciated.
That said, I've found a nice collection of resources about NLP including this great FAQ.
Comments (2)
I would start by deciding on a standard way to represent all the information I'm interested in: event name, start/end time (and date), guest list, location. For example, I might use an XML notation like this:
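The answer's own XML example is not reproduced here. As a stand-in, here is a minimal Python sketch of such a representation, assuming a hypothetical `Event` class and hypothetical element names (`event`, `name`, `start`, `end`, `location`, `guests`, `guest`); none of these names come from the answer or from Google.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
import xml.etree.ElementTree as ET


@dataclass
class Event:
    """Hypothetical container for the fields the answer lists."""
    name: str
    start: datetime
    end: Optional[datetime] = None
    location: Optional[str] = None
    guests: List[str] = field(default_factory=list)

    def to_xml(self) -> str:
        # Element names are illustrative assumptions, not a known schema.
        root = ET.Element("event")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "start").text = self.start.isoformat()
        if self.end is not None:
            ET.SubElement(root, "end").text = self.end.isoformat()
        if self.location:
            ET.SubElement(root, "location").text = self.location
        if self.guests:
            guests_el = ET.SubElement(root, "guests")
            for guest in self.guests:
                ET.SubElement(guests_el, "guest").text = guest
        return ET.tostring(root, encoding="unicode")


event = Event(
    name="Lunch",
    start=datetime(2010, 6, 4, 12, 0),
    end=datetime(2010, 6, 4, 13, 0),
    location="Cafe Roma",
    guests=["Alice"],
)
print(event.to_xml())
```

Optional fields are simply omitted from the output when they are unknown, which keeps hand annotation short for inputs that mention only a name and a time.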
I'd then aim to build up a corpus of diary entries about dates, annotated with their XML forms. How would I collect the data? Well, if I was Google, I'd probably have all sorts of ways. Since I'm me, I'd probably start by writing down all the ways I could think of to express this sort of stuff, then annotating it by hand. If I could add to this by going through friends' e-mails and whatnot, so much the better.
Now that I've got a corpus, it can serve as a set of unit tests. I need to code a parser to fit the tests. The parser should translate a string of natural language into the logical form of my annotation. First, it should split the string into its constituent words. This is called tokenising, and there is off-the-shelf software available to do it. (For example, see NLTK.) To interpret the words, I would look for patterns in the data: for example, text following 'at' or 'in' should be tagged as a location; 'for X minutes' means I need to add that number of minutes to the start time to get the end time. Statistical methods would probably be overkill here - it's best to create a series of hand-coded rules that express your own knowledge of how to interpret the words, phrases and constructions in this domain.
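A rough sketch of what such hand-coded rules could look like in Python, with a tiny hand-annotated corpus doubling as the tests. Everything here is an assumption made for illustration: the function names `tokenise` and `parse_event`, the 24-hour `HH:MM` time format, the extra rule that 'at' followed by a time sets the start time rather than a location, and the two corpus entries. A regular expression stands in for a real tokeniser such as NLTK's so the example stays self-contained.

```python
import re
from datetime import datetime, timedelta

TIME_RE = re.compile(r"\d{1,2}:\d{2}")
TOKEN_RE = re.compile(r"\d{1,2}:\d{2}|\w+")


def tokenise(text):
    """Split text into word and HH:MM tokens (stand-in for a real tokeniser)."""
    return TOKEN_RE.findall(text)


def parse_event(text, day):
    """Apply hand-coded rules; `day` supplies the date, since none is parsed."""
    tokens = tokenise(text)
    slots = {"name": [], "location": []}
    current = "name"  # slot that plain words are appended to
    start = None
    minutes = None
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        after = tokens[i + 2].lower() if i + 2 < len(tokens) else ""
        if word == "at" and TIME_RE.fullmatch(nxt):
            # Rule: 'at HH:MM' gives the start time.
            hour, minute = map(int, nxt.split(":"))
            start = day.replace(hour=hour, minute=minute, second=0, microsecond=0)
            i += 2
        elif word == "for" and nxt.isdecimal() and after in ("minute", "minutes"):
            # Rule: 'for X minutes' gives the duration.
            minutes = int(nxt)
            i += 3
        elif word in ("at", "in"):
            # Rule: text following 'at' or 'in' is tagged as the location.
            current = "location"
            i += 1
        else:
            slots[current].append(tokens[i])
            i += 1
    end = None
    if start is not None and minutes is not None:
        end = start + timedelta(minutes=minutes)
    return {
        "name": " ".join(slots["name"]) or None,
        "start": start,
        "end": end,
        "location": " ".join(slots["location"]) or None,
    }


# Hypothetical hand-annotated corpus: (input text, expected logical form).
DAY = datetime(2010, 6, 4)
CORPUS = [
    ("Lunch with Bob at 12:30 at Cafe Roma for 45 minutes",
     {"name": "Lunch with Bob", "start": datetime(2010, 6, 4, 12, 30),
      "end": datetime(2010, 6, 4, 13, 15), "location": "Cafe Roma"}),
    ("Dentist in Springfield at 9:00",
     {"name": "Dentist", "start": datetime(2010, 6, 4, 9, 0),
      "end": None, "location": "Springfield"}),
]

for text, expected in CORPUS:
    assert parse_event(text, DAY) == expected, text
```

Rules this simple misfire easily (an event named "Check in" would have its second word read as a location marker), which is the reason for growing the corpus first: each new failing entry shows which rule needs refining.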
It would seem that there's really no narrow approach to this problem. I wanted to avoid having to pull along the entirety of NLP to figure out a solution, but I haven't found any alternative. I'll update this if I find a really great solution later.