Opinion Mining - What Type of Database?
I am starting an opinion mining project (Data Mining -> Web Mining -> Opinion Mining) to determine the semantic orientation of the words that web pages contain. We will use a crawler to collect the pages that express opinions. My question is: what type of database (object-oriented, relational, hierarchical, etc.) is best suited to this kind of project?
I know this is a specific question and I don't expect everybody to respond, but a reply from someone who has already done this would help.
Regards!
Answers (2)
If you need something large-scale and responsive, you will probably have to go with Google's BigTable or something of that nature. At the prototype level, I am sure you can use a traditional relational database, but at some point you will hit a performance wall. See Brewer's CAP theorem, which states that a distributed data store cannot guarantee consistency, availability, and partition tolerance all at the same time.
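The Bigtable paper describes its data model as a sparse, sorted map indexed by row key, column key, and timestamp. The toy, in-memory sketch below only illustrates that model; the class name, the `opinion:orientation` column, and the sample keys are hypothetical. Row keys follow the reversed-hostname convention from the paper's web-table example, so pages from one site sort next to each other.

```python
from bisect import bisect_left, insort
from collections import defaultdict


class SortedWideColumnStore:
    """Toy, single-process illustration of a Bigtable-style data model:
    (row key, column key, timestamp) -> value, with row keys kept sorted."""

    def __init__(self):
        # row key -> {column key -> [(timestamp, value), ...] newest first}
        self._rows = {}
        # Sorted row keys, so that contiguous key ranges can be scanned.
        self._row_keys = []

    def put(self, row, column, timestamp, value):
        if row not in self._rows:
            self._rows[row] = defaultdict(list)
            insort(self._row_keys, row)
        cells = self._rows[row][column]
        cells.append((timestamp, value))
        # Keep multiple versions per cell, newest timestamp first.
        cells.sort(key=lambda cell: cell[0], reverse=True)

    def get(self, row, column):
        """Return the newest value of a cell, or None if it does not exist."""
        cells = self._rows.get(row, {}).get(column)
        return cells[0][1] if cells else None

    def scan(self, start, end):
        """Return the row keys in the half-open range [start, end), sorted."""
        lo = bisect_left(self._row_keys, start)
        hi = bisect_left(self._row_keys, end)
        return self._row_keys[lo:hi]
```

For example, after storing two versions of `contents:` under `com.example.www/review/1`, `get` returns the newer one, and `scan("com.example.", "com.example/")` returns every row of that site in key order. The real system also splits these sorted row ranges across many servers, which this sketch does not attempt.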
In my experience with this kind of scenario, a relational database can serve your purpose well. Be extra careful about how you store the web content itself: decide whether you want to keep it in the database at all, or whether something as simple as the file system will do. BLOBs in particular require extra care and increase your maintenance work.
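A minimal sketch of that split, assuming Python's built-in `sqlite3` as the relational store: page metadata and per-word orientation scores live in tables, while the raw HTML is written to files named by its SHA-256 hash and only the file path is stored in SQL, so no BLOBs are involved. The table names, column names, and the sign convention for `orientation` are hypothetical, not taken from the answer.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path

SCHEMA = """
CREATE TABLE IF NOT EXISTS page (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL UNIQUE,
    file_path  TEXT NOT NULL,   -- raw HTML lives on disk, not in a BLOB
    sha256     TEXT NOT NULL,
    crawled_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS word_orientation (
    page_id     INTEGER NOT NULL REFERENCES page(id),
    word        TEXT NOT NULL,
    orientation REAL NOT NULL,  -- hypothetical scale: > 0 positive, < 0 negative
    PRIMARY KEY (page_id, word)
);
"""


def store_page(conn, content_dir, url, html):
    """Write the page body to a content-addressed file; record only its path in SQL."""
    digest = hashlib.sha256(html.encode("utf-8")).hexdigest()
    path = Path(content_dir) / f"{digest}.html"
    if not path.exists():  # identical pages share one file
        path.write_text(html, encoding="utf-8")
    cur = conn.execute(
        "INSERT INTO page (url, file_path, sha256) VALUES (?, ?, ?)",
        (url, str(path), digest),
    )
    return cur.lastrowid


def record_orientation(conn, page_id, word, orientation):
    """Insert or overwrite the orientation score of one word on one page."""
    conn.execute(
        "INSERT OR REPLACE INTO word_orientation (page_id, word, orientation) "
        "VALUES (?, ?, ?)",
        (page_id, word, orientation),
    )


def load_page(conn, url):
    """Return the stored HTML for a URL, or None if it was never crawled."""
    row = conn.execute("SELECT file_path FROM page WHERE url = ?", (url,)).fetchone()
    return None if row is None else Path(row[0]).read_text(encoding="utf-8")


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as content_dir:
        conn = sqlite3.connect(":memory:")
        conn.executescript(SCHEMA)
        pid = store_page(conn, content_dir, "http://example.com/review/1", "<p>great phone</p>")
        record_orientation(conn, pid, "great", 1.0)
        conn.commit()
        print(load_page(conn, "http://example.com/review/1"))
        conn.close()
```

Keeping only paths and hashes in the database keeps it small and easy to back up, at the cost of keeping the file directory and the `page` table consistent yourself.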
Also, given the nature of the project, you will certainly be using many existing components, and many of them already support, or are easy to extend to use, a relational database as their data store.