
Confused about the maximum number of records in a MySQL table

Posted 2024-09-30 05:54:27

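I am working on a website analyzer that will analyze our own site based on Tomcat's logs.

Currently we push the logs from Tomcat into a database (MySQL) every day, and that works well. However, I have found a potentially fatal problem.

So far we push the logs into a single table, but the number of log entries grows quickly, especially as we take on more users. A single table clearly cannot hold that many entries, and queries against such a large table would also perform poorly.

We use Hibernate as the persistence layer, and each row in the log table maps to a LogEntry Java object in the application.

I thought about creating a new table every month, but how would I map LogEntry to multiple tables and query across them?

Also, the number of logs may differ from month to month. As an extreme example, what happens if the number of logs (records in a table) exceeds the maximum capacity of a table in the database?

Then I thought of setting a property that caps how many log entries Hibernate pushes into a table. But in that case I don't know how to tell Hibernate to create a new table and query across tables automatically.

Any ideas?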

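Update for Sandy:

I see what you mean: the maximum size of a table is determined by the operating system, and if I use partitioning, that maximum can grow until it reaches the capacity of my disk. So with partitioning I apparently do not need to worry about the maximum table size, but a table that holds too many records will still perform poorly. (By the way, we have not decided to delete old logs yet.) Another approach I thought of is to create multiple tables with the same structure, but I am using Hibernate and all log inserts and queries go through it, so can one entity (POJO) be mapped to multiple tables?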


把人绕傻吧 2024-10-07 05:54:27

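"I thought about creating a new table every month, but how would I map LogEntry to multiple tables and query across them?"

Take a look at Hibernate Shards (database sharding is a method of horizontal partitioning). Although this subproject is not very active and has some limitations (see the documentation), it is stable and usable (Hibernate Shards was contributed by Max Ross of Google, who uses it internally).

"Also, the number of logs may differ from month to month. As an extreme example, what happens if the number of logs (records in a table) exceeds the maximum capacity of a table in the database?"

Monitor your database and tables, and anticipate the maintenance they will need.

"But in that case I don't know how to tell Hibernate to create a new table and query across tables automatically."

Hibernate will not do this automatically; it is part of maintaining the database and the sharding configuration (see also the documentation on virtual shards).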
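Since Hibernate will not pick a table or shard on its own, the application has to carry the routing rule. Below is a minimal sketch of such a rule for the "one table per month" idea, in plain Java with no Hibernate dependency. The class name `MonthlyLogTables` and the `log_entry_yyyy_MM` naming scheme are assumptions made for illustration; they do not come from the question or from Hibernate Shards.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: maps log dates to per-month table names. */
final class MonthlyLogTables {
    // Assumed naming scheme, e.g. log_entry_2024_09 (not from the original post).
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyy_MM");

    private MonthlyLogTables() {}

    /** Returns the table that should hold an entry logged on the given date. */
    static String tableFor(LocalDate logDate) {
        return "log_entry_" + logDate.format(SUFFIX);
    }

    /** Lists every monthly table a query over [from, to] (inclusive) has to touch. */
    static List<String> tablesBetween(LocalDate from, LocalDate to) {
        List<String> tables = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            tables.add(tableFor(m.atDay(1)));
        }
        return tables;
    }
}
```

A cross-table query would then run once per name returned by `tablesBetween` and merge the results in the application, which is roughly the work a sharding layer takes over.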


谁许谁一生繁华 2024-10-07 05:54:27


I think you should consider horizontal partitioning.

Horizontal Partitioning: this form of partitioning segments table rows so that distinct groups of physical row-based datasets are formed that can be addressed individually (one partition) or collectively (one-to-all partitions). All columns defined to a table are found in each set of partitions so no actual table attributes are missing. An example of horizontal partitioning might be a table that contains ten years worth of historical invoice data being partitioned into ten distinct partitions, where each partition contains a single year's worth of data.

Increased performance - during scan operations, the MySQL optimizer knows what partitions contain the data that will satisfy a particular query and will access only those necessary partitions during query execution. For example, a million row table may be broken up into ten different partitions in range style so that each partition contains 100,000 rows. If a query is issued that only needs data from one of the partitions, and a table scan operation is necessary, only 100,000 rows will be accessed instead of a million. Obviously, it is much quicker for MySQL to sample 100,000 rows than one million so the query will complete much sooner. The same benefit is derived should index access be possible as local partitioned indexes are created for partitioned tables. Finally, it is possible to stripe a partitioned table across different physical drives by specifying different file system/directory paths for specific partitions. This allows physical I/O contention to be reduced when multiple partitions are accessed at the same time.

Check out this article: Improving Database Performance with Partitioning
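For a log table, the range-style partitioning described in the quote would most naturally be done per month. The sketch below builds the MySQL `ALTER TABLE ... PARTITION BY RANGE` statement as a string. The table name `log_entry` and column name `created_at` used in the usage note are assumptions, as is the helper class itself. It also assumes the column is a DATE or DATETIME (so that `TO_DAYS` is usable as the partitioning expression) and that the column is part of every unique key on the table, including the primary key, which MySQL requires for partitioned tables.

```java
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: generates monthly RANGE partition DDL for MySQL. */
final class MonthlyPartitionDdl {
    private static final DateTimeFormatter NAME = DateTimeFormatter.ofPattern("yyyyMM");

    private MonthlyPartitionDdl() {}

    /**
     * Builds DDL with one partition per month starting at {@code first},
     * plus a catch-all partition for anything later.
     */
    static String rangePartitionDdl(String table, String dateColumn, YearMonth first, int months) {
        StringBuilder sql = new StringBuilder()
                .append("ALTER TABLE ").append(table)
                .append(" PARTITION BY RANGE (TO_DAYS(").append(dateColumn).append(")) (\n");
        for (int i = 0; i < months; i++) {
            YearMonth month = first.plusMonths(i);
            // The upper bound is exclusive: the first day of the following month.
            sql.append("  PARTITION p").append(month.format(NAME))
               .append(" VALUES LESS THAN (TO_DAYS('")
               .append(month.plusMonths(1).atDay(1))
               .append("')),\n");
        }
        // Rows newer than the last listed month land here until more partitions are added.
        sql.append("  PARTITION pmax VALUES LESS THAN MAXVALUE\n)");
        return sql.toString();
    }
}
```

Calling `rangePartitionDdl("log_entry", "created_at", YearMonth.of(2024, 9), 12)` yields a statement with twelve monthly partitions. Because the application still sees a single table, the Hibernate mapping for LogEntry would not need to change, and a query that filters on the date column only has to read the matching partitions.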

Update

"It seems that horizontal partitioning can handle the large table, but what if the number of records is greater than the maximum size of the table?"

Actually, the maximum size of a MySQL table is determined by operating system constraints. Have a look at this, and determine for yourself.
An alternative is to purge old log records periodically, but only if they are no longer required for analysis.
Create a cron job or any other scheduled task to do the deleting.
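If the purge is done from Java and not from a shell script, the scheduled task could look roughly like the sketch below. It uses plain JDBC and deletes in small batches so that no single statement holds locks for long. The table name `log_entry`, the column `created_at`, and the "keep N whole months" retention rule are all assumptions for illustration; the answer does not specify them.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.sql.Timestamp;
import java.time.LocalDate;

/** Hypothetical purge job for old log rows; names and retention rule are assumed. */
final class OldLogPurger {
    private OldLogPurger() {}

    /** Keeps the current month plus the previous {@code monthsToKeep} whole months. */
    static LocalDate cutoff(LocalDate today, int monthsToKeep) {
        return today.withDayOfMonth(1).minusMonths(monthsToKeep);
    }

    /** Deletes rows older than the cutoff in batches and returns the total removed. */
    static long purge(Connection conn, LocalDate cutoff, int batchSize) throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // MySQL accepts LIMIT on DELETE; the batch size is an int, so inlining it is safe.
        String sql = "DELETE FROM log_entry WHERE created_at < ? LIMIT " + batchSize;
        long total = 0;
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setTimestamp(1, Timestamp.valueOf(cutoff.atStartOfDay()));
            int deleted;
            do {
                deleted = ps.executeUpdate();
                total += deleted;
            } while (deleted == batchSize); // a short batch means nothing older is left
        }
        return total;
    }
}
```

With a retention of three months, a run on 2024-10-07 would remove everything logged before 2024-07-01. If the table is partitioned by month, dropping an entire old partition is usually far cheaper than deleting its rows one batch at a time.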
