MySQL table structure - one very large table or separate tables?

Posted on 2024-07-14 13:01:29

I'm working on a project that is similar in nature to website visitor analytics. It will be used by hundreds of websites, each averaging tens of thousands to hundreds of thousands of page views a day, so the volume of data will be very large.

Should I use a single table with a websiteid column, or a separate table for each website?

Making changes to a live service that has a separate table for each of hundreds of websites seems like a big problem. On the other hand, performance and scalability are likely to be a problem with this much data. Any suggestions, comments, or advice are most welcome.

Comments (5)

世界等同你 2024-07-21 13:01:29

How about one table, partitioned by the website FK?
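A rough sketch of what that could look like, kept as a DDL string in Python so it can sit next to application code. The table name, column names, and partition count are all hypothetical, and the DDL has not been run against a live server. Two MySQL rules shape it: every unique key (including the primary key) must contain the partitioning column, and partitioned InnoDB tables do not accept foreign key constraints, so "website FK" is read here as a plain website_id column. For an integer column, HASH partitioning picks the partition by taking the value modulo the number of partitions, which the small helper mirrors.

```python
# Hypothetical schema: names and the partition count are illustrative only.
NUM_PARTITIONS = 16

PAGE_VIEWS_DDL = f"""
CREATE TABLE page_views (
    id         BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    website_id INT UNSIGNED    NOT NULL,
    viewed_at  DATETIME        NOT NULL,
    url        VARCHAR(2048)   NOT NULL,
    -- The partitioning column must be part of every unique key,
    -- so website_id is included in the primary key.
    PRIMARY KEY (id, website_id),
    KEY idx_site_time (website_id, viewed_at)
) ENGINE=InnoDB
PARTITION BY HASH (website_id)
PARTITIONS {NUM_PARTITIONS};
"""


def partition_for(website_id: int, num_partitions: int = NUM_PARTITIONS) -> int:
    """Partition index selected by HASH partitioning: MOD(website_id, N)."""
    return website_id % num_partitions


if __name__ == "__main__":
    print(PAGE_VIEWS_DDL)
    print(partition_for(35))
```

Queries that filter on website_id can then be limited to a single partition, while the application still sees one table.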

白云不回头 2024-07-21 13:01:29

I would say use the design that makes the most sense given your data - in this case, one large table.

The records will all be of the same type, with the same columns, so from a database normalization standpoint it makes sense to keep them in the same table. An index makes selecting particular rows easy, especially when a whole query can be satisfied by the data in a single index (which is often the case).
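To illustrate the point about a query being answered from a single index, here is a minimal sketch. It uses Python's built-in sqlite3 module purely as a stand-in so that it runs without a MySQL server; the table, columns, and index name are assumptions, not something taken from the question. The same composite index on (website_id, viewed_at) could be declared in MySQL, where EXPLAIN would be the tool for checking the plan.

```python
import sqlite3

# In-memory SQLite database as a stand-in for MySQL (assumption for runnability).
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE page_views (
           id         INTEGER PRIMARY KEY,
           website_id INTEGER NOT NULL,
           viewed_at  TEXT    NOT NULL,
           url        TEXT    NOT NULL)"""
)
# Composite index: the site filter first, then the time range.
conn.execute("CREATE INDEX idx_site_time ON page_views (website_id, viewed_at)")

conn.executemany(
    "INSERT INTO page_views (website_id, viewed_at, url) VALUES (?, ?, ?)",
    [
        (1, "2024-07-01 10:00:00", "/"),
        (1, "2024-07-01 11:30:00", "/about"),
        (1, "2024-07-02 09:15:00", "/"),
        (2, "2024-07-01 10:05:00", "/"),
    ],
)

# Both columns in the WHERE clause are in the index and COUNT(*) needs
# nothing else, so the table rows themselves do not have to be read.
QUERY = """SELECT COUNT(*) FROM page_views
           WHERE website_id = ? AND viewed_at >= ? AND viewed_at < ?"""
PARAMS = (1, "2024-07-01 00:00:00", "2024-07-02 00:00:00")

views = conn.execute(QUERY, PARAMS).fetchone()[0]
# The last column of each EXPLAIN QUERY PLAN row is the readable description.
plan = " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY, PARAMS))

print(views)
print(plan)
```

With websiteid as the leading index column, one shared table can serve per-site queries without scanning other sites' rows.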

Note that visitor analysis will necessarily involve many operations that can only be computed by reading a large number of rows at once - for instance counts, sums, and averages. Resource-intensive statistics like these are typically pre-calculated and stored rather than computed live. It's something you will want to think about.
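One way to pre-calculate such statistics is a small rollup table that a scheduled job refreshes. The sketch below is only an illustration under assumptions: sqlite3 stands in for MySQL so it can run as-is, and the daily_stats table, its columns, and refresh_daily_stats are made-up names. INSERT OR REPLACE is SQLite syntax; in MySQL the equivalent step would more likely be INSERT ... ON DUPLICATE KEY UPDATE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
conn.executescript(
    """
    CREATE TABLE page_views (
        website_id INTEGER NOT NULL,
        viewed_at  TEXT    NOT NULL
    );
    -- Hypothetical rollup table: one row per website per day.
    CREATE TABLE daily_stats (
        website_id INTEGER NOT NULL,
        day        TEXT    NOT NULL,
        views      INTEGER NOT NULL,
        PRIMARY KEY (website_id, day)
    );
    """
)
conn.executemany(
    "INSERT INTO page_views VALUES (?, ?)",
    [
        (1, "2024-07-01 10:00:00"),
        (1, "2024-07-01 11:30:00"),
        (1, "2024-07-02 09:15:00"),
        (2, "2024-07-01 10:05:00"),
    ],
)


def refresh_daily_stats(conn, day):
    """Recompute the per-website view counts for one day (YYYY-MM-DD)."""
    conn.execute(
        """INSERT OR REPLACE INTO daily_stats (website_id, day, views)
           SELECT website_id, date(viewed_at), COUNT(*)
           FROM page_views
           WHERE viewed_at >= ? AND viewed_at < date(?, '+1 day')
           GROUP BY website_id, date(viewed_at)""",
        (day, day),
    )
    conn.commit()


refresh_daily_stats(conn, "2024-07-01")
stats = conn.execute(
    "SELECT website_id, views FROM daily_stats WHERE day = ? ORDER BY website_id",
    ("2024-07-01",),
).fetchall()
print(stats)
```

Reports then read from daily_stats, which stays small, instead of counting rows in the raw table on every request.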

递刀给你 2024-07-21 13:01:29

If the data is uniform, go with one table. If you ever need to SELECT across all websites, having multiple tables is a pain. However, if you write enough scripting, you can do it with multiple tables.
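To give an idea of the scripting involved, here is a small sketch that builds a UNION ALL query over per-website tables. It is plain UNION ALL, not the MERGE engine, and it runs against sqlite3 only so that it is self-contained; the page_views_N naming scheme and the union_all_counts helper are assumptions made for the example.

```python
import re
import sqlite3


def union_all_counts(table_names):
    """Build one query that returns (table name, row count) for each table."""
    parts = []
    for name in table_names:
        # Table names cannot be bound as parameters, so validate them strictly.
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            raise ValueError(f"unexpected table name: {name!r}")
        parts.append(f"SELECT '{name}' AS source, COUNT(*) AS views FROM {name}")
    return " UNION ALL ".join(parts)


conn = sqlite3.connect(":memory:")  # stand-in for MySQL (assumption)
tables = ["page_views_1", "page_views_2"]  # hypothetical one-table-per-site layout
for table in tables:
    conn.execute(f"CREATE TABLE {table} (viewed_at TEXT NOT NULL)")

conn.executemany(
    "INSERT INTO page_views_1 VALUES (?)",
    [("2024-07-01 10:00:00",), ("2024-07-01 11:30:00",)],
)
conn.execute("INSERT INTO page_views_2 VALUES (?)", ("2024-07-01 10:05:00",))

counts = dict(conn.execute(union_all_counts(tables)))
print(counts)
```

Every cross-site report needs this kind of generated SQL, and the query grows with the number of websites, which is the pain referred to above.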

You could use MySQL's MERGE storage engine to do SELECTs across the tables (but don't expect good performance, and watch out for the limit on the number of open files: on Linux you may have to use ulimit to raise it, while on Windows there is a hard limit that cannot be raised).

I have broken a huge table into many (hundreds of) tables and used MERGE to SELECT. I did this so that I could perform offline creation and optimization of each of the small tables (e.g. OPTIMIZE or ALTER TABLE...ORDER BY). However, the performance of SELECT with MERGE led me to write my own custom storage engine. (Described here: http://blog.coldlogic.com/categories/coldstore/)

夜灵血窟げ 2024-07-21 13:01:29

Use the single data structure. Once you start encountering performance problems, there are many solutions: you can partition your table by website ID (also known as horizontal partitioning), or you can use replication. It all depends on the ratio of reads to writes.

But to start with, keep things simple and use one table with proper indexing. You can also determine whether you need transactions. You can also take advantage of the different MySQL storage engines, such as MyISAM or NDB (in-memory clustering), to boost performance. Caching also plays a very good role in offloading work from the database. Data that is mostly read-only and can be computed easily is usually put in a cache, and the cache serves the request instead of the database, so only the necessary queries reach the database.
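As a rough illustration of the caching idea, here is a tiny in-process read-through cache with a time-to-live. TTLCache, get_or_compute, and count_views_from_db are hypothetical names, and the "database" is a stub that just records how often it is called; a real deployment would more likely put a shared cache such as memcached in front of MySQL.

```python
import time


class TTLCache:
    """Minimal read-through cache: entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no database access
        value = compute()  # cache miss or expired: go to the database
        self._entries[key] = (now + self._ttl, value)
        return value


db_calls = []  # records each simulated database hit


def count_views_from_db():
    """Stub standing in for an expensive COUNT(*) query."""
    db_calls.append("count")
    return 12345


cache = TTLCache(ttl_seconds=60)
key = ("views", 1, "2024-07-01")
first = cache.get_or_compute(key, count_views_from_db)
second = cache.get_or_compute(key, count_views_from_db)
print(first, second, len(db_calls))
```

The second lookup is served from memory, so only the first one reaches the database until the entry expires.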

書生途 2024-07-21 13:01:29

Use one table unless you have performance problems with MySQL.

Nobody here can answer performance questions for you; you should run performance tests yourself to find out whether one big table is sufficient.
