Database for web analytics
What database should I choose to store information about site visits? Key characteristics: a large amount of data, many page requests per second, and different reports for presenting the data. I'm thinking of using MySQL. Any suggestions?
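For concreteness, here is a minimal sketch of what the relational approach might look like. The table name, columns, and helper function are my own illustration rather than anything from the question, and sqlite3 stands in for MySQL only so the snippet runs without a server; the same DDL carries over to MySQL with minor type changes.

# Hypothetical page-visit schema; sqlite3 stands in for MySQL so the
# sketch is self-contained. Table and column names are illustrative only.
import sqlite3
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE page_visits (
        id         INTEGER PRIMARY KEY,
        visited_at TEXT NOT NULL,      -- DATETIME in MySQL
        url        TEXT NOT NULL,
        visitor_ip TEXT,
        user_agent TEXT,
        referrer   TEXT
    )
""")

def record_visit(url, visitor_ip, user_agent=None, referrer=None):
    """Insert one visit row; in production this write happens on every request."""
    conn.execute(
        "INSERT INTO page_visits (visited_at, url, visitor_ip, user_agent, referrer) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), url, visitor_ip, user_agent, referrer),
    )

record_visit("/home", "203.0.113.7", "Mozilla/5.0")
# A typical report: hits per URL
for url, hits in conn.execute(
        "SELECT url, COUNT(*) FROM page_visits GROUP BY url ORDER BY 2 DESC"):
    print(url, hits)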
3 Answers
Consider letting the server log the requests and parsing them asynchronously. You don't need ACID for analytics, and you don't need to process the data while talking to a client.
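A minimal sketch of that pattern, assuming the web server appends one line per request to a flat access log (the log path, line format, and function names are illustrative): the request handler only does a cheap append, and a separate job parses and aggregates the file later, for example from cron.

# Sketch: the server only appends a log line per request; a separate
# batch job parses the log afterwards. Path and format are assumptions.
import collections
from datetime import datetime, timezone

LOG_PATH = "access.log"  # hypothetical location

def log_request(url, visitor_ip):
    """Called from the request handler: cheap file append, no database work."""
    with open(LOG_PATH, "a") as f:
        f.write(f"{datetime.now(timezone.utc).isoformat()}\t{visitor_ip}\t{url}\n")

def aggregate_log(path=LOG_PATH):
    """Run asynchronously (cron, worker queue): parse the log and count hits per URL."""
    hits = collections.Counter()
    with open(path) as f:
        for line in f:
            _, _, url = line.rstrip("\n").split("\t")
            hits[url] += 1
    return hits  # results could then be bulk-loaded into MySQL for reporting

log_request("/home", "203.0.113.7")
print(aggregate_log())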
Most mainstream databases are good for that (including MySQL, Postgres, Oracle, etc.). MySQL is fine, though, especially if you've used it before.
Be sure to look at the licenses as well: MySQL is GPL (both the database and the connectors), Postgres is BSD, and Oracle (and a few others) you need to pay for.
Most web analytics companies use some kind of distributed file system to store logs, such as HDFS, QFS, etc., because the data is too big for a traditional database.
Analytics reports are generated via MapReduce jobs.
If you want to do an ad-hoc query, you normally use something like Hive/Pig/Sawzall.
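As a toy illustration of what such a MapReduce job computes (the real job would run on Hadoop over logs stored in HDFS; the log line format and field positions here are assumptions), the map step emits (url, 1) pairs and the reduce step sums them per URL:

# Toy map/reduce over log lines; in practice this runs as a distributed
# job over logs in HDFS. The log format is assumed for illustration.
from itertools import groupby

log_lines = [
    "2024-01-01T00:00:00Z\t203.0.113.7\t/home",
    "2024-01-01T00:00:01Z\t198.51.100.3\t/pricing",
    "2024-01-01T00:00:02Z\t203.0.113.7\t/home",
]

def map_phase(line):
    # Emit (url, 1) for each request, like a mapper would.
    url = line.split("\t")[2]
    yield (url, 1)

def reduce_phase(pairs):
    # Group by key and sum the counts, like a reducer would.
    pairs = sorted(pairs)  # stand-in for the shuffle/sort step
    return {key: sum(c for _, c in group)
            for key, group in groupby(pairs, key=lambda p: p[0])}

mapped = [pair for line in log_lines for pair in map_phase(line)]
print(reduce_phase(mapped))  # {'/home': 2, '/pricing': 1}

The equivalent ad-hoc query in Hive would be roughly a GROUP BY count over the same log table.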