Want to create a data warehouse... new database, or just pile the tables into the existing database?

Posted 2024-10-09 23:47:09

So I've got a MySQL database for a web community that is a potential stats goldmine. Currently I'm serving stats built via all sorts of nasty queries on my well-normalized database. I've run into the "patience limit" for such queries on my shared hosting, and would like to move to data warehousing and a daily cron job, thereby sacrificing instant updates for a 100-fold increase in statistical depth.

I've just started reading about data warehouses, and particularly the star schema, and it all seems pretty straightforward.

My question is essentially: should I toss all that crap into a new database, or just pile the tables into my existing MySQL database? The current database has 47 tables, the largest of which has 30k records. I realize this is paltry compared to your average enterprise application, but your average enterprise application does not (I hope!) run on shared hosting!

So, keeping my hardware limits in mind, which method would be better?

I really don't know much about this at all, but I assume reading Table A, calculating, then updating Table B is a lot easier in the same database than across databases, correct?

Should I even care how many tables my DB has?

Comments (2)

深海少女心 2024-10-16 23:47:09

If you just need to improve performance, you should just create a set of pre-cooked reporting tables. Low effort and big performance gains. With the data volume you described, this won't even have a noticeable impact on the users of your web community.
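As a rough illustration of what a pre-cooked reporting table could look like, here is a minimal sketch of the job a daily cron entry might run. It uses Python's built-in `sqlite3` module as a stand-in so the example is runnable anywhere; it is not MySQL, and the `users`, `posts` and `report_posts_per_user` tables and their columns are hypothetical, not taken from the question.

```python
import sqlite3


def rebuild_posts_per_user(conn):
    """Rebuild a summary table from the normalized source tables.

    The new table is built under a temporary name and then swapped in,
    so readers never see a half-filled report table.
    """
    conn.executescript(
        """
        DROP TABLE IF EXISTS report_posts_per_user_new;
        CREATE TABLE report_posts_per_user_new AS
            SELECT u.id              AS user_id,
                   u.name            AS user_name,
                   COUNT(p.id)       AS post_count,
                   MAX(p.created_at) AS last_post_at
            FROM users u
            LEFT JOIN posts p ON p.user_id = u.id
            GROUP BY u.id, u.name;
        DROP TABLE IF EXISTS report_posts_per_user;
        ALTER TABLE report_posts_per_user_new RENAME TO report_posts_per_user;
        """
    )
    conn.commit()


def demo():
    """Create hypothetical sample data, rebuild the report, and read it back."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE posts (
            id INTEGER PRIMARY KEY,
            user_id INTEGER NOT NULL REFERENCES users(id),
            created_at TEXT NOT NULL
        );
        INSERT INTO users (id, name) VALUES (1, 'alice'), (2, 'bob'), (3, 'carol');
        INSERT INTO posts (user_id, created_at) VALUES
            (1, '2024-10-01'), (1, '2024-10-03'), (2, '2024-10-02');
        """
    )
    rebuild_posts_per_user(conn)
    # The stats pages now read one small table instead of joining live data.
    rows = conn.execute(
        "SELECT user_name, post_count FROM report_posts_per_user ORDER BY user_name"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo())
```

The same build-then-swap idea should carry over to MySQL with the equivalent `CREATE TABLE ... SELECT` and `RENAME TABLE` statements, with the report tables living in the same database as the source tables.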

The different database approach has several benefits (see below) but I don't think you will gain any of them as you are on a shared database host.

  • You can support different SLAs for the DW and the web site
  • DW and Web databases can have different configurations
  • The DW database is basically read-only for a large portion of the day
  • DW and Web databases can have different release cycles (this is big)
  • Typical DW queries (large amounts of data) don't kill the cache for the web DB.

乱了心跳 2024-10-16 23:47:09

The number of tables in a particular database does not usually become a problem until you have thousands (or tens of thousands) of tables, and these problems usually come into play due to filesystem limits related to the maximum number of files in a directory.

You don't say what storage engine you are using. In general, you want the indexes in your database to fit into memory for good insert/update/delete performance, so the size of your key buffer or buffer pool must be large enough to hold the "hot" part of the index.
