Handling a huge MySQL table

Published 2024-11-16 18:02:52


Hope you are all doing great. We have a huge MySQL table called 'posts'. It has about 70,000 records and has grown to about 10GB in size.

My boss says that something has to be done to make this huge table easier for us to handle: if the table gets corrupted, it would take us a lot of time to recover it, and at times it is also slow.

What are the possible solutions so that handling this table becomes easier in all respects?

The structure of the table is as follows:

CREATE TABLE IF NOT EXISTS `posts` (
  `id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `thread_id` int(11) unsigned NOT NULL,
  `content` longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  `first_post` mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `publish` tinyint(1) NOT NULL,
  `deleted` tinyint(1) NOT NULL,
  `movedToWordPress` tinyint(1) NOT NULL,
  `image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  `video_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `video_image_src` varchar(500) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `thread_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `section_title` text CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  `urlToPost` varchar(280) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `posts` int(11) DEFAULT NULL,
  `views` int(11) DEFAULT NULL,
  `forum_name` varchar(50) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `subject` varchar(150) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  `visited` int(11) DEFAULT '0',
  `replicated` tinyint(4) DEFAULT '0',
  `createdOn` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  PRIMARY KEY (`id`),
  UNIQUE KEY `urlToPost` (`urlToPost`,`forum_name`),
  KEY `thread_id` (`thread_id`),
  KEY `publish` (`publish`),
  KEY `createdOn` (`createdOn`),
  KEY `movedToWordPress` (`movedToWordPress`),
  KEY `deleted` (`deleted`),
  KEY `forum_name` (`forum_name`),
  KEY `subject` (`subject`),
  FULLTEXT KEY `first_post` (`first_post`,`thread_title`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=78773 ;

Thank you.

UPDATED

Note: although I am grateful for the replies, almost all answers have been about optimizing the current database, not about how to handle large tables in general. While I can optimize the database based on the replies I got, that does not really answer the question about handling huge databases. Right now I am talking about 70,000 records, but over the next few months, if not weeks, we are going to grow by an order of magnitude. Each record can be about 300KB in size.


Comments (3)

半衬遮猫 2024-11-23 18:02:52


My answer's also an addition to two previous comments.

You've indexed half of your table. But if you take a look at some of the indexes (publish, deleted, movedToWordPress), you'll notice those columns only ever hold 1 or 0, so their selectivity (the number of distinct values divided by the number of rows) is very low. Those indexes are a waste of space.
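A quick way to verify this (a sketch against the schema posted in the question; run it on your own server) is to compute the selectivity directly and then drop the indexes that turn out to be useless:

```sql
-- Selectivity = distinct values / total rows. Values near 0 mean the
-- index rarely narrows a scan and mostly costs space and write time.
SELECT
  COUNT(DISTINCT publish)          / COUNT(*) AS publish_selectivity,
  COUNT(DISTINCT deleted)          / COUNT(*) AS deleted_selectivity,
  COUNT(DISTINCT movedToWordPress) / COUNT(*) AS moved_selectivity
FROM posts;

-- With 2 distinct values over ~70k rows these come back close to 0,
-- so the single-column indexes can usually be dropped:
ALTER TABLE posts
  DROP INDEX publish,
  DROP INDEX deleted,
  DROP INDEX movedToWordPress;
```

If a flag is only ever queried together with another column (e.g. `WHERE publish = 1 AND createdOn > ...`), a composite index led by the more selective column is the better replacement.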

Some things also make no sense.
tinyint(4) - that doesn't actually make it a 4 digit integer. Number there is display length. tinyint is 1 byte, so it's got 256 possible values. I'm assuming something went wrong there.

Also, 10 gigs in size for just 75k records? How did you measure the size? Also, what's the hardware you got?
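One reliable way to answer the "how did you measure" question is to ask the data dictionary itself, which reports data and index sizes separately (a sketch; run it in the schema that holds `posts`):

```sql
-- Reported on-disk size of the table, split into data vs. indexes.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'posts';
```

A large `index_mb` relative to `data_mb` would point at the FULLTEXT and redundant indexes rather than the row data itself.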

Edit in regards to your updated question:

There are many ways to scale databases. I'll link one SO question/answer so you can get the idea what you can do: here it is.
The other thing you might do is get better hardware. Usually, the reason why databases are slow when they increase in size is the HDD subsystem and available memory left to work with the dataset. The more RAM you have - the faster it all gets.

Another thing you could do is split your table into two in such a way that one table holds the textual data and the other holds the data relevant to what your system requires to perform certain searching or matching (you'd put integer fields there).
Using InnoDB, you'd gain a huge performance boost if the two tables were connected via a foreign key pointing to the primary key. Since primary-key lookups in InnoDB are fast, this opens several new possibilities for what you can do with your dataset. If your data keeps growing, you can add enough RAM and InnoDB will try to buffer the dataset in memory. There's also an interesting tool called HandlerSocket that does some neat magic on servers that have enough RAM and use InnoDB.
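A minimal sketch of that split, using the column names from the posted schema (the table name `posts_content` is made up for illustration; note that a real foreign key also requires `posts` itself to be converted to InnoDB):

```sql
-- Hot, small columns stay in `posts`; the large text bodies move to a
-- 1:1 child table keyed by the same id.
CREATE TABLE posts_content (
  post_id    int(11) unsigned NOT NULL,
  content    longtext CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL,
  first_post mediumtext CHARACTER SET utf8 COLLATE utf8_unicode_ci,
  PRIMARY KEY (post_id),
  CONSTRAINT fk_posts_content
    FOREIGN KEY (post_id) REFERENCES posts (id)
    ON DELETE CASCADE
) ENGINE=InnoDB;

-- Listings and searches scan only the small table; the body is fetched
-- by a fast primary-key lookup only when a single post is displayed.
SELECT p.id, p.subject, c.content
FROM posts p
JOIN posts_content c ON c.post_id = p.id
WHERE p.id = 42;
```

The payoff is that full scans, backups, and index rebuilds on the metadata table no longer have to drag 10GB of text along with them.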

In the end it really boils down to what you need to do and how you are doing it. Since you didn't mention that, it's hard to say here what you should do.
My first steps would definitely be to tune the MySQL instance and to back that big table up.

魔法少女 2024-11-23 18:02:52


I guess you have to change some columns.

You can start by reducing your varchar columns.

image_src/video_src/video_image_src VARCHAR(500) is a little too much, I think (VARCHAR(100) is probably enough).

thread_title is TEXT but should be a VARCHAR(200?) if you ask me;
same goes for section_title.

OK, here is your problem:
content longtext

Do you really need LONGTEXT here? LONGTEXT allows up to 4GB per value. Be careful, though: a text column only stores as much as each value actually contains, so switching to TEXT saves little by itself, and TEXT caps each value at 64KB. Given that your posts can be around 300KB, MEDIUMTEXT (~16MB) is the right-sized choice.

    TINYTEXT    256 bytes    
    TEXT    65,535 bytes    ~64kb
    MEDIUMTEXT   16,777,215 bytes   ~16MB
    LONGTEXT    4,294,967,295 bytes ~4GB
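A hedged sketch of the resulting ALTER, keeping the character set from the original definitions (the chosen lengths are suggestions, not measurements; verify your real maximum values with `MAX(LENGTH(...))` first):

```sql
-- Tighten oversized columns; MEDIUMTEXT rather than TEXT for `content`
-- because the question states posts can be ~300KB, which exceeds the
-- 64KB TEXT cap.
ALTER TABLE posts
  MODIFY image_src    varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT '',
  MODIFY video_src    varchar(100) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  MODIFY thread_title varchar(200) CHARACTER SET utf8 COLLATE utf8_unicode_ci DEFAULT NULL,
  MODIFY content      mediumtext   CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL;
```

Note that `thread_title` participates in the FULLTEXT index, which MyISAM also supports on VARCHAR, so the index definition can stay as it is.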

Edit: I see you use a FULLTEXT index. I am quite sure that index is taking up a lot, lot of space. You should consider using a dedicated mechanism for full-text search instead.

很酷不放纵 2024-11-23 18:02:52


In addition to what Michael has commented, the slowness can be an issue depending on how well the queries are optimized and whether proper indexes exist to match them. I would try to find some of the culprit queries that are taking longer than you'd like and post them here on S/O to see if someone can help with optimization options.
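Finding those culprit queries is usually a two-step process: turn on the slow query log, then run EXPLAIN on whatever it catches. A sketch (thresholds and the sample query are illustrative, not from the original post):

```sql
-- Step 1: log every statement slower than 1 second.
SET GLOBAL slow_query_log = ON;
SET GLOBAL long_query_time = 1;

-- Step 2: inspect a suspect query's plan. 'type: ALL' with a row
-- estimate near the table size means a full scan of the big table,
-- i.e. no usable index for this WHERE/ORDER BY combination.
EXPLAIN SELECT id, subject
FROM posts
WHERE forum_name = 'example-forum'
  AND publish = 1
ORDER BY createdOn DESC
LIMIT 20;
```

The EXPLAIN output (key, rows, Extra) tells you directly which of the existing indexes, if any, the optimizer picked, which is far more actionable than guessing from the schema alone.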
