Parsing data and POS with Treetop and Stanford NLP
I'm trying to parse event data (concerts, movies, etc.) in Ruby and can't decide which tool to use.
I initially thought the Stanford Parser was the way to go, but then heard of Treetop.
I'm struggling with both: getting the Stanford Parser to work with Ruby on Windows has taken two-plus days of searching and struggling, with no end of errors just to get it installed.
Treetop installed without a problem, but the documentation is very limited, and from what I can gather, Treetop is better at dealing with grammar structure than with the actual content. Maybe I'm just not fully understanding Treetop's capabilities.
One nice thing (I think) is that I have a large database/corpus(?) of band and movie names, and only a fairly limited set of details that I'm looking to retrieve.
For instance, one listing is
The Tragically Hip with Guest Hey Rosetta!, Friday Jul 15th, 7:30pm, Deer Lake Park
Another listing is
07/08/11 - Tacoma Dome, New Kids on the Block & Backstreet Boys w/ Matthew Morrison, 7:30pm, Tacoma, WA
With each listing I'm trying to grab a rather specific group of details: who/what, date, time, city, and venue.
Seeing as I already have a dataset of band names, and a list of city names should be fairly easy to get, it should be 'fairly' easy to pick out the other details. I'm just not sure which tool I should dedicate my time to, or whether there is a better way to do this.
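To make the lookup idea concrete, here is a rough sketch using only plain Ruby, with neither Treetop nor the Stanford Parser. Everything named here is an assumption for illustration: `KNOWN_ACTS` and `KNOWN_CITIES` stand in for the real band/movie database and a city list, the date and time patterns only cover the two formats in the sample listings, and `extract_event` is a hypothetical helper. It splits a listing into fields and classifies each field by lookup or pattern.

```ruby
# Hypothetical stand-ins for the real band/movie database and a city list.
KNOWN_ACTS = [
  "The Tragically Hip", "Hey Rosetta!", "New Kids on the Block",
  "Backstreet Boys", "Matthew Morrison"
].freeze
KNOWN_CITIES = ["Tacoma"].freeze

# These patterns only cover the formats seen in the two sample listings.
NUMERIC_DATE = /\A\d{1,2}\/\d{1,2}\/\d{2,4}\z/
NAMED_DATE   = /\b(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*\.?\s+\d{1,2}(?:st|nd|rd|th)?\b/
DATE_PATTERN = Regexp.union(NUMERIC_DATE, NAMED_DATE)
TIME_PATTERN = /\A\d{1,2}:\d{2}\s*(?:am|pm)\z/i

# Split a listing on commas and " - ", then classify each field.
# Fields that match nothing (venue, state, ...) are collected in :other.
def extract_event(listing)
  event = { who: [], date: nil, time: nil, city: nil, other: [] }
  listing.split(/,\s*|\s+-\s+/).map(&:strip).each do |field|
    acts = KNOWN_ACTS.select { |name| field.include?(name) }
    if !acts.empty?
      event[:who].concat(acts)
    elsif field =~ DATE_PATTERN
      event[:date] = field
    elsif field =~ TIME_PATTERN
      event[:time] = field
    elsif KNOWN_CITIES.include?(field)
      event[:city] = field
    else
      event[:other] << field
    end
  end
  event
end

p extract_event("The Tragically Hip with Guest Hey Rosetta!, Friday Jul 15th, 7:30pm, Deer Lake Park")
p extract_event("07/08/11 - Tacoma Dome, New Kids on the Block & Backstreet Boys w/ Matthew Morrison, 7:30pm, Tacoma, WA")
```

In this sketch the venue is not identified directly: it ends up among the leftover fields in `:other` (together with things like the state abbreviation), so telling those apart would still need a venue list or some further rule.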
Any suggestions?
Comments (1)
No, Treetop is used to parse more structured languages (like computer languages). For natural language processing (NLP), you're better off using the Stanford Parser or something like it. Have a look at this blog entry about NLP in combination with Ruby: http://mendicantbug.com/2009/09/13/nlp-resources-for-ruby/