Analytics dashboard strategy
We are currently developing an API and we want to provide an analytics dashboard for our clients to view metrics about their calls per month/day/hour.
The current strategy we have in mind is to save each call to a separate per-client table (e.g. calls_{client_id}) for historical reasons, and to have a summary table (e.g. calls_summary) containing the number of calls for a given hour of a day for each client.
Then, each day a cron job will create an XML file with the summary of the previous day's calls for each client, and the dashboard will use those files instead of the database. So the only analytics task that will use the database will be the cron job.
For infrastructure, we are thinking of MySQL replication, with the slave serving as the analytics database.
Is that strategy useful and valid for real web statistics?
Can you propose any tuning on that or even a totally different one?
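As a sketch of what the daily roll-up in the cron job could look like (table and column names such as `calls`, `calls_summary`, `client_id` and `called_at` are assumptions, not your actual schema; sqlite3 stands in for MySQL here):

```python
import sqlite3

# Minimal sketch of the hourly roll-up a daily cron job could run.
# Schema names are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL,
    called_at TEXT    NOT NULL            -- ISO-8601 timestamp
);
CREATE TABLE calls_summary (
    client_id INTEGER NOT NULL,
    day       TEXT    NOT NULL,
    hour      INTEGER NOT NULL,
    n_calls   INTEGER NOT NULL,
    PRIMARY KEY (client_id, day, hour)
);
""")

# Sample traffic: client 1 makes 3 calls in hour 09, client 2 makes 1.
conn.executemany(
    "INSERT INTO calls (client_id, called_at) VALUES (?, ?)",
    [(1, "2023-05-01T09:05:00"),
     (1, "2023-05-01T09:20:00"),
     (1, "2023-05-01T09:55:00"),
     (2, "2023-05-01T09:10:00")],
)

# The roll-up: one summary row per (client, day, hour).
conn.execute("""
INSERT INTO calls_summary (client_id, day, hour, n_calls)
SELECT client_id,
       date(called_at),
       CAST(strftime('%H', called_at) AS INTEGER),
       COUNT(*)
FROM calls
GROUP BY client_id, date(called_at), strftime('%H', called_at)
""")

print(conn.execute(
    "SELECT client_id, n_calls FROM calls_summary ORDER BY client_id"
).fetchall())  # [(1, 3), (2, 1)]
```

The dashboard can then read `calls_summary` (or files generated from it) without ever touching the large per-call table.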
2 Answers
No. Don't break the rules of normalization unless you've got good reason. It won't improve performance and could actually be very detrimental. It'll certainly make your code more complex and therefore less reliable.
It might be worth archiving off old records on a period-by-period basis, but unless you know that you're going to run into performance problems, I'd advise against this.
By all means pre-consolidate the data into another table (provided you are getting a reduction in the number of rows of at least 95%). But don't bother converting it to XML unless and until you need the data in that format.
As for how you pre-consolidate: either use period-based consolidations (e.g. roll up by date) or use flagging to record which records have already been consolidated.
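The flagging approach mentioned above might look like the following sketch (a `consolidated` column and all other names are assumptions; sqlite3 stands in for MySQL):

```python
import sqlite3

# Flagging-based consolidation: each call row carries a `consolidated`
# flag so the roll-up only touches rows it has not yet counted.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (
    id           INTEGER PRIMARY KEY,
    client_id    INTEGER NOT NULL,
    called_at    TEXT    NOT NULL,
    consolidated INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE calls_summary (
    client_id INTEGER NOT NULL,
    day       TEXT    NOT NULL,
    n_calls   INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (client_id, day)
);
""")

conn.executemany(
    "INSERT INTO calls (client_id, called_at) VALUES (?, ?)",
    [(1, "2023-05-01T09:05:00"), (1, "2023-05-01T10:00:00")],
)

def consolidate(conn):
    """Roll un-consolidated rows into the summary, then flag them."""
    with conn:  # one transaction, so a row can never be counted twice
        conn.execute("""
            INSERT INTO calls_summary (client_id, day, n_calls)
            SELECT client_id, date(called_at), COUNT(*)
            FROM calls WHERE consolidated = 0
            GROUP BY client_id, date(called_at)
            ON CONFLICT(client_id, day)
            DO UPDATE SET n_calls = n_calls + excluded.n_calls
        """)
        conn.execute(
            "UPDATE calls SET consolidated = 1 WHERE consolidated = 0")

consolidate(conn)
# A later call arrives; the next consolidation run picks up only it.
conn.execute(
    "INSERT INTO calls (client_id, called_at) VALUES (1, '2023-05-01T10:30:00')")
consolidate(conn)
print(conn.execute(
    "SELECT day, n_calls FROM calls_summary").fetchall())
# [('2023-05-01', 3)]
```

Because the roll-up is incremental, it can run at whatever frequency balances the contention concerns described below.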
The less frequently you run the consolidation the more the impact on performance. But run it too frequently and you'll have problems with contention / locking.
Without knowing a lot about the structure and volume of data, or the constraints in terms of budget, availability and timeliness, it's hard to provide an optimal solution. But if it were me, I'd probably go with 3 mysqld tiers: one providing the transactional write facility, one replicating this data and generating the consolidated data, and one to provide read access to the consolidated data (master <-> master <-> slave).
Performance-wise, it is a bad idea to create a separate table for each client. The classic approach is a single `calls` table with a `client_id` column; retrieving all the calls of a given client is then a simple filtered (and indexed) query rather than a query against a per-client table.
I don't see any reason for using XML; your cron could just update the calls_summary table.
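A minimal sketch of that classic layout (table and column names are assumptions; sqlite3 stands in for MySQL):

```python
import sqlite3

# The "classic" normalized layout: one calls table for all clients,
# distinguished by a client_id column. Names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE calls (
    id        INTEGER PRIMARY KEY,
    client_id INTEGER NOT NULL REFERENCES clients(id),
    called_at TEXT    NOT NULL
);
-- The index that makes per-client retrieval cheap.
CREATE INDEX idx_calls_client ON calls (client_id, called_at);
""")

conn.execute("INSERT INTO clients (id, name) VALUES (1, 'acme'), (2, 'globex')")
conn.executemany(
    "INSERT INTO calls (client_id, called_at) VALUES (?, ?)",
    [(1, "2023-05-01T09:05:00"), (2, "2023-05-01T09:10:00"),
     (1, "2023-05-01T09:20:00")],
)

# All calls of one client: a single indexed query, no per-client table.
rows = conn.execute(
    "SELECT called_at FROM calls WHERE client_id = ? ORDER BY called_at",
    (1,),
).fetchall()
print(len(rows))  # 2
```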