How do you think the "Quick Add" feature in Google Calendar works?

Posted on 2024-09-04 16:56:16

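I'm considering a project that might use functionality similar to "Quick Add", which parses natural language into something that can be understood with some degree of semantics. I'd like to understand this better, and I'm curious what your thoughts are on how it might be implemented.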

If you're unfamiliar with "Quick Add", see Google's knowledge base article about it.

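Update, 4 June 2010
Additional research on "Natural Language Parsing" (NLP) turns up results that are much broader than what I think is actually implemented in something like "Quick Add". Since this feature expects specific types of input rather than truly free-form text, I think it is a much narrower application of NLP. If anyone could suggest a narrower topic I could research, rather than the entire breadth of NLP, I would greatly appreciate it.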

That said, I've found a nice collection of resources about NLP, including this great FAQ.


小姐丶请自重 2024-09-11 16:56:16


I would start by deciding on a standard way to represent all the information I'm interested in: event name, start/end time (and date), guest list, location. For example, I might use an XML notation like this:

<event>
    <name>meet Sam</name>
    <starttime>16:30 07/06/2010</starttime>
    <endtime>17:30 07/06/2010</endtime>
</event>
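As a rough illustration of that representation, here is a minimal Python sketch of an event record that serialises to the XML form above. The class name `Event`, the `<guest>` and `<location>` tag names, and the reading of the timestamp as `HH:MM DD/MM/YYYY` are my own assumptions, not something fixed by the example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional
import xml.etree.ElementTree as ET

# Assumption: timestamps in the XML are written as "HH:MM DD/MM/YYYY".
TIME_FORMAT = "%H:%M %d/%m/%Y"


@dataclass
class Event:
    """Hypothetical container for the fields listed above."""
    name: str
    start: datetime
    end: datetime
    guests: List[str] = field(default_factory=list)
    location: Optional[str] = None

    def to_xml(self) -> str:
        """Serialise the event into the <event> notation."""
        root = ET.Element("event")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "starttime").text = self.start.strftime(TIME_FORMAT)
        ET.SubElement(root, "endtime").text = self.end.strftime(TIME_FORMAT)
        # <guest> and <location> are illustrative tag names only.
        for guest in self.guests:
            ET.SubElement(root, "guest").text = guest
        if self.location is not None:
            ET.SubElement(root, "location").text = self.location
        return ET.tostring(root, encoding="unicode")


event = Event("meet Sam", datetime(2010, 6, 7, 16, 30), datetime(2010, 6, 7, 17, 30))
print(event.to_xml())
```

Keeping the times as `datetime` objects internally and only formatting them on output makes the later "add X minutes" arithmetic straightforward.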

I'd then aim to build up a corpus of diary entries about dates, annotated with their XML forms. How would I collect the data? Well, if I were Google, I'd probably have all sorts of ways. Since I'm me, I'd probably start by writing down all the ways I could think of to express this sort of stuff, then annotating it by hand. If I could add to this by going through friends' e-mails and whatnot, so much the better.
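A hand-annotated corpus of this kind could be as simple as a list of (text, XML) pairs. The sketch below is only one possible layout: the two entries are invented examples, their annotations assume the entries were written on 07/06/2010, and `annotation_to_dict` and `REQUIRED_FIELDS` are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Invented example entries, annotated by hand.
# Assumption: both were written on 07/06/2010, so that is the implied date.
CORPUS = [
    ("meet Sam at 16:30 for 60 minutes",
     "<event><name>meet Sam</name>"
     "<starttime>16:30 07/06/2010</starttime>"
     "<endtime>17:30 07/06/2010</endtime></event>"),
    ("lunch at 12:00 for 45 minutes",
     "<event><name>lunch</name>"
     "<starttime>12:00 07/06/2010</starttime>"
     "<endtime>12:45 07/06/2010</endtime></event>"),
]

REQUIRED_FIELDS = ("name", "starttime", "endtime")


def annotation_to_dict(xml_text):
    """Turn one XML annotation into a dict, checking it is complete."""
    root = ET.fromstring(xml_text)
    fields = {child.tag: child.text for child in root}
    missing = [name for name in REQUIRED_FIELDS if name not in fields]
    if missing:
        raise ValueError(f"annotation is missing fields: {missing}")
    return fields


# Each pair becomes (input text, expected fields) for later testing.
test_cases = [(text, annotation_to_dict(xml_text)) for text, xml_text in CORPUS]
for text, expected in test_cases:
    print(text, "->", expected)
```

Validating the annotations when they are loaded catches typos in the hand-written XML before they turn into confusing parser failures.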

Now that I've got a corpus, it can serve as a set of unit tests. I need to code a parser to fit the tests. The parser should translate a string of natural language into the logical form of my annotation. First, it should split the string into its constituent words. This is called tokenising, and there is off-the-shelf software available to do it. (For example, see NLTK.) To interpret the words, I would look for patterns in the data: for example, text following 'at' or 'in' should be tagged as a location; 'for X minutes' means I need to add that number of minutes to the start time to get the end time. Statistical methods would probably be overkill here - it's best to create a series of hand-coded rules that express your own knowledge of how to interpret the words, phrases and constructions in this domain.
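To make the rule-based idea concrete, here is a small Python sketch under several assumptions of my own: times are written as 24-hour `HH:MM`, the date is supplied by the caller, a regular expression stands in for an off-the-shelf tokeniser such as NLTK's, and 'at' followed by a time is treated as the start time rather than a location. The names `tokenise` and `parse` and the two test sentences are hypothetical.

```python
import re
from datetime import datetime, timedelta

TIME_RE = re.compile(r"^(\d{1,2}):(\d{2})$")


def tokenise(text):
    """Regex stand-in for a real tokeniser; keeps HH:MM as one token."""
    return re.findall(r"\d{1,2}:\d{2}|\w+", text)


def parse(text, day):
    """Apply hand-coded rules to one entry; `day` supplies the date."""
    tokens = tokenise(text)
    name, location = [], []
    start, minutes = None, None
    collecting = name  # free-text tokens go to the name until a location starts
    i = 0
    while i < len(tokens):
        tok = tokens[i].lower()
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        nxt2 = tokens[i + 2] if i + 2 < len(tokens) else ""
        if tok == "at" and TIME_RE.match(nxt):
            # Rule: 'at HH:MM' gives the start time.
            hour, minute = map(int, TIME_RE.match(nxt).groups())
            start = day.replace(hour=hour, minute=minute)
            i += 2
        elif tok == "for" and nxt.isdigit() and nxt2.lower() in ("minute", "minutes"):
            # Rule: 'for X minutes' gives the duration.
            minutes = int(nxt)
            i += 3
        elif tok in ("at", "in"):
            # Rule: other text following 'at' or 'in' is a location.
            collecting = location
            i += 1
        else:
            collecting.append(tokens[i])
            i += 1
    end = None
    if start is not None and minutes is not None:
        end = start + timedelta(minutes=minutes)
    return {"name": " ".join(name), "location": " ".join(location) or None,
            "start": start, "end": end}


# The annotated corpus doubles as a set of unit tests.
DAY = datetime(2010, 6, 7)
TESTS = [
    ("meet Sam at 16:30 for 60 minutes in the library",
     {"name": "meet Sam", "location": "the library",
      "start": datetime(2010, 6, 7, 16, 30), "end": datetime(2010, 6, 7, 17, 30)}),
    ("lunch at the cafe at 12:00 for 45 minutes",
     {"name": "lunch", "location": "the cafe",
      "start": datetime(2010, 6, 7, 12, 0), "end": datetime(2010, 6, 7, 12, 45)}),
]
for text, expected in TESTS:
    assert parse(text, DAY) == expected, text
print("all corpus tests pass")
```

Each new phrasing that turns up in the corpus either passes already or points at the next rule to add, which is what keeps a hand-coded approach manageable for a narrow domain like this.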
