How does Remember the Milk's string matching work?

Posted on 2024-12-05 20:39:09

I'm interested in developing a solution similar to RTM's Smart Add feature.

For those who don't know Remember the Milk, here's how it works: tasks are added through an input box that accepts a string and parses out different parameters such as task name, due date, priority, tags, etc. The parameters are usually preceded by special symbols (^, #, &, etc.). RTM also accepts variations like 'Tennis on Wednesday'.

My basic question is: how would you design a system that can intelligently discern the different parts of a string? Will I have to look into natural language processing?

Thus far I'm using a simple regular expression that looks for the special preceding symbols (^, #, &, etc.) and then parses out the different parts of the string. This gets increasingly difficult as more and more unordered parameters are added. Maybe that stems from my lack of regex expertise.
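For illustration, here is a minimal sketch of the prefix-symbol approach described above. It splits on the symbols rather than matching the whole string at once, so the parameters can appear in any order after the task name. The `PREFIXES` mapping and the `parse_task` helper are hypothetical; the meaning given to each symbol is an assumption for the example, not RTM's actual syntax.

```python
import re

# Hypothetical mapping of prefix symbols to field names (an assumption for
# illustration; not RTM's actual syntax).
PREFIXES = {"^": "due", "#": "tags", "&": "priority"}

# Split wherever a prefix symbol starts the string or follows whitespace.
# The capture group makes re.split keep the symbol in the result list.
SPLIT_RE = re.compile(r"(?:^|\s)([\^#&])")


def parse_task(text):
    """Split a task string into a name and its prefix-marked parameters."""
    parts = SPLIT_RE.split(text)
    # parts looks like [name, symbol, value, symbol, value, ...]
    fields = {"name": parts[0].strip(), "due": None, "priority": None, "tags": []}
    for symbol, value in zip(parts[1::2], parts[2::2]):
        field = PREFIXES[symbol]
        if field == "tags":
            fields["tags"].append(value.strip())  # tags may repeat
        else:
            fields[field] = value.strip()         # last occurrence wins
    return fields


print(parse_task("Buy milk ^tomorrow 5pm #shopping #home &1"))
```

One limitation of this sketch: everything after a symbol, up to the next symbol, is taken as that parameter's value, so the task name has to come first.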

A similar problem arises when trying to convert different formats of due dates ('27 May 2008 16:00', '27th May 2008', '16th June 16:00', 'June 16th 12:00', 'today 12:00am', etc.) into datetime objects. I'm currently using Python and regular expressions. My method is basically to run through a long list of possible date and time combinations and convert the matching expression with datetime.strptime. I found this approach hard to maintain: lots of false positives, leftover strings, etc. You can look at my code here: https://gist.github.com/1233786 It's not pretty, you have been warned.
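As a rough sketch of the format-list approach described above (this is not the code from the linked gist), the formats can be kept in one ordered table and tried against the whole string. Because `datetime.strptime` raises `ValueError` unless the entire string matches, this avoids leftover text by construction. The `FORMATS` table, the `parse_due` helper, and the `default_year` parameter are hypothetical names, and month names are assumed to be in an English locale.

```python
import re
from datetime import datetime

# Candidate formats, most specific first. They cover only the absolute dates
# quoted in the question; the list is illustrative, not exhaustive.
FORMATS = [
    "%d %B %Y %H:%M",  # 27 May 2008 16:00
    "%d %B %Y",        # 27 May 2008
    "%d %B %H:%M",     # 16 June 16:00 (no year)
    "%B %d %H:%M",     # June 16 12:00 (no year)
]

# Ordinal suffixes directly after a digit: 27th, 1st, 2nd, 3rd.
ORDINAL_RE = re.compile(r"(?<=\d)(st|nd|rd|th)\b", re.IGNORECASE)


def parse_due(text, default_year):
    """Return a datetime for text, or None if no format matches the whole string."""
    cleaned = ORDINAL_RE.sub("", text.strip())
    for fmt in FORMATS:
        candidate, pattern = cleaned, fmt
        if "%Y" not in fmt:
            # Supply the year explicitly instead of relying on strptime's
            # default year of 1900.
            candidate, pattern = f"{cleaned} {default_year}", f"{fmt} %Y"
        try:
            return datetime.strptime(candidate, pattern)
        except ValueError:
            continue  # this format does not fit; try the next one
    return None


for sample in ["27 May 2008 16:00", "27th May 2008", "16th June 16:00", "June 16th 12:00"]:
    print(sample, "->", parse_due(sample, default_year=2011))
```

Relative expressions such as 'today 12:00am' are deliberately left out of this sketch; they would need a separate rule that resolves the word against the current date before the time is parsed.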

I'd appreciate any pointer in the right direction for approaching this topic. Coding a date parser was really fun, but before I hunt down all the bugs in hundreds of different use cases, I thought I'd check whether there's a more elegant design pattern.

P.S.: I would love some code samples to sink my teeth into. Preferably Python :)

Comments (1)

完美的未来在梦里 2024-12-12 20:39:09

I assume they have some grammars for parsing the input sentence. Such grammars can express a variety of NLP structures, such as entity extraction. To write them, one can use GATE JAPE (http://gate.ac.uk/sale/tao/splitch8.html#chap:jape) or Gexp (http://code.google.com/p/graph-expression/).
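To give a feel for rule-based entity extraction in plain Python, here is a very small sketch. It is not JAPE or Gexp and does not use their rule syntax; the `RULES` table and the `extract_entities` helper are hypothetical, and the patterns are assumptions chosen to cover inputs like 'Tennis on Wednesday'.

```python
import re

# Hypothetical annotation rules as (label, pattern) pairs. Group 1 of each
# pattern is the entity value; optional lead-in words such as "on" or "at"
# are matched so they can be removed together with the entity.
RULES = [
    ("weekday", r"\b(?:on\s+)?(monday|tuesday|wednesday|thursday|friday|saturday|sunday)\b"),
    ("time", r"\b(?:at\s+)?(\d{1,2}:\d{2})\b"),
    ("relative_day", r"\b(today|tomorrow)\b"),
]


def extract_entities(text):
    """Annotate known entities; whatever text is left over becomes the task name."""
    entities = {}
    remainder = text
    for label, pattern in RULES:
        match = re.search(pattern, remainder, re.IGNORECASE)
        if match:
            entities[label] = match.group(1).lower()
            # Cut the matched span out so later rules and the name do not see it.
            remainder = remainder[:match.start()] + remainder[match.end():]
    entities["name"] = " ".join(remainder.split())
    return entities


print(extract_entities("Tennis on Wednesday at 16:00"))
```

Each rule only labels a span; turning 'wednesday' or '16:00' into an actual datetime would be a separate step. Keeping recognition and interpretation apart is the main design idea this sketch borrows from the grammar-based approach.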
