Identifying tense in PHP
I'm looking for a way to analyze a string of text and find out which tense it was written in, for example: "I'm going to the store" == present, "I bought a car" == past, etc.
Any tips on how I could get this done?
Comments (6)
You can find a basic Brill tagger (a part-of-speech tagger) implementation for PHP on Ian Barber's PHP/ir site. The algorithm tags each word with its part of speech.
If you enter the words "I think", the result will be:
I/NN think/VBP
NN = noun,
VBP = verb, present tense
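To illustrate how tagged output like this could be turned into a tense guess, here is a minimal sketch. It is written in Python rather than PHP purely for brevity; the logic is a couple of array lookups and should port directly. It assumes the tagger emits `word/TAG` pairs with Penn Treebank style tags, as in the output above. The function name `guess_tense` and the tag-to-tense table are my own assumptions, not part of the PHP/ir code.

```python
# Penn Treebank verb tags mapped to a coarse tense (illustrative mapping, an assumption).
TAG_TENSE = {
    "VBD": "past",     # past tense, e.g. "bought"
    "VBP": "present",  # non-3rd-person present, e.g. "think"
    "VBZ": "present",  # 3rd-person present, e.g. "thinks"
}

def guess_tense(tagged):
    """Return a coarse tense for a string of word/TAG pairs, or None if no finite verb is found."""
    pairs = [tok.rsplit("/", 1) for tok in tagged.split() if "/" in tok]
    for word, tag in pairs:
        # In this simplified scheme a modal "will"/"shall" marks the future.
        if tag == "MD" and word.lower() in ("will", "shall"):
            return "future"
        # Otherwise the first finite verb decides.
        if tag in TAG_TENSE:
            return TAG_TENSE[tag]
    return None

print(guess_tense("I/NN think/VBP"))
print(guess_tense("I/PRP bought/VBD a/DT car/NN"))
```

This only looks at the first finite verb, so compound forms such as "I have bought" or "I'm going to buy" would come out as present. Handling those needs rules over sequences of tags rather than single tags.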
Yes, this is going to be extremely difficult. I had started on something similar as what was meant to be a quick weekend project, until I realized that. Nonetheless, here is a resource I found helpful.
Download the source code of WordNet 3.0 from Princeton, which includes a database of English words. The file /dict/index.verb is a list of present-tense English verbs that you should be able to import into your database as a CSV without too much trouble. From there you're on your own, and you will need to figure out how to handle the weirdness that is the English language.
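As a starting point for the import, here is a small Python sketch (Python only for brevity; the same loop is easy to write in PHP). It assumes the layout described in WordNet's database format notes: fields separated by single spaces, the lemma in the first field, multi-word lemmas joined with underscores, and licence header lines that begin with whitespace. The function name `load_verbs` and the sample lines are made up for illustration and are not copied from the real file.

```python
def load_verbs(lines):
    """Collect verb lemmas from lines in the assumed WordNet index-file layout."""
    verbs = set()
    for line in lines:
        # Header/licence lines begin with whitespace; skip them and blank lines.
        if not line.strip() or line[0].isspace():
            continue
        lemma = line.split(" ", 1)[0]
        # Multi-word entries use underscores, e.g. "give_up".
        verbs.add(lemma.replace("_", " "))
    return verbs

# Made-up sample lines in the assumed layout (not real WordNet data).
sample = [
    "  1 This is a licence header line",
    "buy v 1 1 @ 1 0 00000001",
    "give_up v 1 1 @ 1 0 00000002",
]
print(sorted(load_verbs(sample)))

# With the real file (the path is an assumption about where WordNet was unpacked):
# with open("dict/index.verb", encoding="utf-8") as fh:
#     verbs = load_verbs(fh)
```

Note that this only gives you a list of verb entries. Mapping an inflected form such as "bought" back to its entry is a separate problem.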
This could be a rather taxing process. How detailed do you want to get? Do you want to consider only past, present, and future? Or do you want to distinguish Simple Present, Present Progressive, Simple Past, etc.?
In any case, you'll also have to evaluate the affirmative, negative, and question forms. A great online chart that can help is at http://www.ego4u.com/en/cram-up/grammar/tenses
Note the rules and signal words.
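The signal-word idea can be sketched in a few lines. This is Python for brevity and only a rough illustration: the `SIGNAL_WORDS` table and the function name `signal_tenses` are hypothetical, and the handful of words listed is my own pick, not taken from the chart.

```python
import re

# A few illustrative signal words per tense (assumed; a real list would be much longer).
SIGNAL_WORDS = {
    "past": {"yesterday", "ago"},
    "present": {"now", "currently"},
    "future": {"tomorrow", "soon"},
}

def signal_tenses(sentence):
    """Return the set of tenses whose signal words occur in the sentence."""
    words = set(re.findall(r"[a-z']+", sentence.lower()))
    return {tense for tense, signals in SIGNAL_WORDS.items() if words & signals}

print(signal_tenses("I bought a car yesterday"))
print(signal_tenses("I am going to the store now"))
```

Signal words are hints rather than proof, and many sentences contain none at all, so this is best used to confirm or break ties for a verb-based guess.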
Tokenize the text, find the action words by looking them up in a database or file (or at least guess from the ending, e.g. *th = past), then count the hits per tense?
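A rough sketch of that pipeline, in Python for brevity (it ports to PHP with `preg_match_all` and plain arrays). The `VERB_TENSE` table is a tiny hypothetical stand-in for the database or file, the function names are made up, and I use the regular "-ed" ending as the fallback suffix guess, which is my own assumption.

```python
import re
from collections import Counter

# Tiny hypothetical lookup table; in practice this would come from a database or file.
VERB_TENSE = {
    "bought": "past", "went": "past", "was": "past",
    "am": "present", "is": "present", "go": "present",
    "will": "future",
}

def count_tense_hits(text):
    """Tokenize text and count how many tokens point to each tense."""
    hits = Counter()
    for token in re.findall(r"[a-z]+", text.lower()):
        if token in VERB_TENSE:
            hits[VERB_TENSE[token]] += 1
        elif token.endswith("ed") and len(token) > 3:
            # Fallback suffix guess: regular past forms end in -ed.
            # This misfires on non-verbs such as "hundred".
            hits["past"] += 1
    return hits

def dominant_tense(text):
    """Return the tense with the most hits, or None if nothing matched."""
    hits = count_tense_hits(text)
    return hits.most_common(1)[0][0] if hits else None

print(dominant_tense("I bought a car"))
print(dominant_tense("I walked home and went to bed"))
```

Counting hits is crude: "I'm going to the store" produces no hits with this table, and a sentence that mixes tenses simply returns whichever tense matched most often.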
For such a task, I believe regular expressions won't be enough: it's a pretty difficult task.
Either you won't get anything good at all from regex, or you'll end up with some kind of super-monster regex that not even you will be able to understand and maintain.
This probably requires more than regex. Something like a "linguistic engine", I suppose.
If you actually need it and aren't just playing around, you might take a look at NLTK (the Natural Language Toolkit for Python). Parsing is a complex matter. Parsing natural languages is even more complex. And parsing a highly irregular language, such as English, is even worse. If you can narrow the problem scope down, you stand a better chance of finding a solution.
What do you need it for?