分析/报告 - 相同或单独的数据库,哪个数据库?

发布于 2024-10-09 23:22:40 字数 322 浏览 7 评论 0原文

我有一个具有一些业务功能的用户内容网站。所有表都在 1 个数据库中。现在,我根据活动和用户日志表添加部门报告分析 - 将其分解为按一年中的每一天、每种活动类型等进行部门报告。问题是:我是否创建一个单独的数据库用于分析(或人们称之为数据仓库)还是我只是将这些新表添加到现有数据库中?如果我必须为此创建一个单独的数据库,那么这意味着我需要将主数据库中的所有数据加载到分析数据库中的临时表中,然后将该数据加载到我假设的分析表中?

分析要求尽可能接近实时,因此基于此,如果我确实要使用单独的数据库,我不确定要使用哪个数据库。我使用的 MySQL 是否可以提供实时分析,即用户采取操作,下一秒如果他查看报告,数字将已经聚合?

I have a user content website with some business functionality. All tables are in 1 database. Now I am adding analytic with in dept reporting based on the activity and user log tables - breaking it down to have in dept reports by each day of the year, by each activity type, etc. The question is: do I create a separate database for analytic (or as people call it the data warhouse) or do I just add those new tables to existing database? If I have to create a separate DB for this, then that means I need to load in all the data from the main DB into temp tables in the Analyic DB then load that data into the analytic tables I assume?

The analytic requirements is as close to real time as possible so based on this I am not sure which DB to use if I do go for a separate one. Can MySQL that I use do the job of providing real time analytic, that is user takes an action and the next second if he views a report the numbers will be already aggregated?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

小苏打饼 2024-10-16 23:22:40

这取决于您期望的报告数量。事务处理数据库通常采用 3NF 设计,以实现高效插入。

由于所需的联接数量较多,报告更加复杂。此外,从报告中添加大量额外的 SELECT 事务可能会降低性能,这就是使用报告数据库的原因。

您可以权衡可能的报告负载和性能影响以及设置报告副本和 ETL 来填充它。您还需要确定是否有副本以及复制的频率。您可以使用一个论点来反对“实时”要求,即如果业务根据固定的数据快照(例如每日副本)进行报告,则业务报告可能会更加“一致”。

有关方法,请参阅填充报告/数据仓库数据库的策略将数据加载到报告数据库中。

It depends on the amount of reporting you're expecting. Transaction processing databases are generally designed in 3NF for efficient inserts.

Reporting is more complex due to the number of joins required. Also adding an significant extra number of SELECT transactions from reporting has the potential to degrade performance, hence why reporting databases are used.

It's up to you to weigh the likely reporting load and performance impact against setting up a reporting replica and ETL to populate it. Also you need to determine if you have a replica, how often to replicate. There is an argument you can use against the 'real time' requirement that the business reporting may be more 'consistent' if the business are reporting against a fixed snapshot of data (e.g. a daily copy).

See Strategies for populating a Reporting/Data Warehouse database for approaches to load data into a reporting database.

旧时光的容颜 2024-10-16 23:22:40

此时,一切都与硬件有关。如果您要将分析数据库与应用程序放置在同一系统(硬盘驱动器)上,那么无论您削减它,您都不会看到很大的性能改进。磁盘扫描会降低您的速度...无论数据库分离如何,一个磁盘的扫描速度都非常快。

但是,如果我必须实现实时报告...我会创建第二个数据库用于用户活动和监控。当用户执行活动时,我会插入到该数据库中。在报告过程中,我会跨数据库加入(这会减慢系统速度,但由于您在同一个盒子上,所以别无选择)。我会将用户活动放在一个单独的数据库中,这样随着时间的推移,备份和清理旧数据会更容易。

如果您能够将报告系统与应用程序系统分开,我会执行 15 ETL/同步作业,仅将您需要的表复制到不同系统上的报告数据库。然后我会报告该系统。显然,用户有 15 分钟的延迟,但这允许更快的报告。但是,这不是真正的数据仓库,而是满足您的特定需求的临时解决方案。

It's really all about the hardware at this point. If you are going to locate the analytical database on the same system(hard-drive) as the application, you are not going to see a whole lot of performance improvement anyway you cut it. Your speed is slowed by your disk scans...one disk is only going to scan so fast, regardless of database separation.

However, if I had to implement with real-time reporting... I would do a second database for user activity and monitoring. I'd insert into this database as users perform activities. During reporting, I'd join across databases (which will slow the system down, but since you are on the same box, you have no alternative). I would put the user activities on a separate database, so it's easier to backup and clean old data out as time goes by.

If you are able to separate your reporting system from your application system, I would do a 15 ETL/sync job which copies only the tables you need over to the reporting database on a different system. I would then report off that system. Obviously the users have a 15 min delay, but this allows swifter reporting. However, this would not be a true data warehouse, but an ad-hoc solution to meet your specific need.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文