How many rows of data is too many?

Posted 2024-07-15 10:59:38

Is there some hard and fast rule about how big is too big for a SQL table?

We are storing SCORM tracking data in a name/value pair format, and there could be anywhere from 4 to 12 rows per user per course. Down the road, is this going to be a bad thing, since there are hundreds of courses and thousands of users?
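For concreteness, the name/value pair layout being described is a classic entity-attribute-value (EAV) table. A minimal sketch, with illustrative table and column names (nothing here comes from the actual system):

```sql
-- A name/value (EAV) layout for SCORM tracking: each tracked attribute
-- becomes one row per user per course.
CREATE TABLE scorm_tracking (
    user_id   INT          NOT NULL,
    course_id INT          NOT NULL,
    name      VARCHAR(64)  NOT NULL,   -- e.g. 'cmi.core.lesson_status'
    value     VARCHAR(255) NOT NULL,   -- everything stored as a string
    PRIMARY KEY (user_id, course_id, name)
);
```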

Comments (10)

孤星 2024-07-22 10:59:38

The magic number is billions. Until you get to billions of rows of data, you're not talking about very much data at all.

Do the math.

4-12 rows per user per course,... hundreds of courses and thousands of users?

400,000 to 1,200,000 rows. Let's assume 1000 bytes per row.

That's 400 MB to 1.2 GB of data. You can buy 100 GB drives for $299 at the Apple store. You can easily spend more than $299 of billable time sweating over details that don't much matter any more.

Until you get to 1 TB of data (1,000 GB), you're not talking about much data at all.
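The arithmetic is easy to reproduce; here is a back-of-envelope query that runs in most SQL engines, using the answer's assumed 1000 bytes per row:

```sql
-- courses * users * rows per enrollment * assumed bytes per row
SELECT 100 * 1000 * 4  * 1000 AS low_estimate_bytes,   -- 400,000,000   (~400 MB)
       100 * 1000 * 12 * 1000 AS high_estimate_bytes;  -- 1,200,000,000 (~1.2 GB)
```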

思慕 2024-07-22 10:59:38

I personally have had tables in production with 50 million rows, and this is small compared with what I have heard. You might need to optimize your structure with partitioning, but until you test your system in your environment you shouldn't waste time doing that. What you described is pretty small, IMHO.

I should add that I was using SQL Server 2000 and 2005; each DBMS has its own sizing limitations.

貪欢 2024-07-22 10:59:38

100 (courses) * 1000 (users) * 10 (records) is only a million. That's the low end, but a decent database ought to handle it okay.

What sounds iffy are Name/Value pairs. That will limit your ability to correctly index things, which will be critical to good performance.
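To illustrate the indexing problem: in an EAV table every value is stored as a string, so even a numeric filter forces a cast that a plain index on value cannot serve (the total_time_seconds attribute name is hypothetical):

```sql
-- "Which users spent more than an hour?" -- the CAST on the stored
-- string defeats any ordinary index on the value column.
SELECT user_id
FROM scorm_tracking
WHERE name = 'total_time_seconds'
  AND CAST(value AS INTEGER) > 3600;
```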

旧时模样 2024-07-22 10:59:38

No hard and fast rule, but there is a hard and fast way to get a number.

Write a program to populate your table with dummy data roughly approximating the expected form of the actual data (e.g. similar regularity, characters, patterns, etc.). Run performance tests against it using actual queries with the dummy data, gradually increasing the number of rows in the table, perhaps in steps of 1000 or 10000 rows.

When the query performance (e.g. queries completed per second) becomes unacceptable, you'll have your "too big" number of rows.
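A sketch of such a populate-and-measure run, in PostgreSQL syntax (generate_series is Postgres-specific, and the table is the illustrative one sketched under the question):

```sql
-- 1000 dummy users x 100 courses x 10 tracked names = 1,000,000 distinct rows.
INSERT INTO scorm_tracking (user_id, course_id, name, value)
SELECT u, c, 'attr_' || n, 'dummy-value'
FROM generate_series(1, 1000) AS u,
     generate_series(1, 100)  AS c,
     generate_series(1, 10)   AS n;

-- At each size step, time a representative query.
EXPLAIN ANALYZE
SELECT name, value
FROM scorm_tracking
WHERE user_id = 42 AND course_id = 7;
```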

小红帽 2024-07-22 10:59:38

I once worked on a web form system with over 300 million rows in its name/value pair table. Many of the forms had over 300 rows per submission. Performance wasn't too bad actually, but it was a total PITA to query from! My SQL writing ability definitely improved over the life of this gig.

But IMHO, if you have any say, get rid of it in favor of a standard normalized table.

故人爱我别走 2024-07-22 10:59:38

Not really. It all depends on your business needs, and you'll have to buy the product that supports your estimated row count.

不再让梦枯萎 2024-07-22 10:59:38

No, there isn't really any hard rule about how many rows you can have in a table; it depends a lot on how much data there is in the rows and how well the data can be indexed.

A quick estimate from the figures you stated gives something like tens of millions of rows. That's certainly not too much, but it's enough that it could be a problem if you aren't a bit careful.

Perhaps the table could be normalized? Do the same names occur a lot, so that you could put the names in a separate table and reference them by id?
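That normalization might look like the sketch below (hypothetical names; SERIAL is PostgreSQL's auto-increment, spelled differently elsewhere):

```sql
-- Attribute names repeat constantly in a name/value table, so they
-- normalize well into a small lookup table referenced by id.
CREATE TABLE tracking_name (
    name_id SERIAL PRIMARY KEY,
    name    VARCHAR(64) NOT NULL UNIQUE
);

CREATE TABLE scorm_tracking_normalized (
    user_id   INT NOT NULL,
    course_id INT NOT NULL,
    name_id   INT NOT NULL REFERENCES tracking_name (name_id),
    value     VARCHAR(255) NOT NULL,
    PRIMARY KEY (user_id, course_id, name_id)
);
```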

对你再特殊 2024-07-22 10:59:38

I don't think there is really a limit here other than drive space. But please add good indexes while the table is small, because when it is huge, indexes will take a lot longer to add. Also, if you have bad indexes, queries will slow down as the table grows, and people will complain when there is really nothing wrong except a crappy index, or no index at all.
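For instance, an index matching a common access pattern is cheap to build now and much slower to add once the table is large (the column choice is illustrative, against the sketch table above):

```sql
-- Build this while the table is small; on a huge table the build takes
-- far longer and, in some engines, blocks writes.
CREATE INDEX idx_tracking_course_name
    ON scorm_tracking (course_id, name);
```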

澉约 2024-07-22 10:59:38

I've worked on databases where we tried to create tables with 2B rows of data - that didn't work; we got to 500M and redesigned. One of the biggest gotchas of working with such a large table was the time taken to do deletions - I often see the approach where old records are archived and then deleted from the main table. If the table is big enough, that deletion will run for many hours as the indexes are rebuilt.

Not sure where the cut-off is, but gut feel indicates a table > 10M rows is probably too big. Our approach was to partition data by date, so we ended up with a table for a week of data, another summary table for months, and another summary for years - very common in data warehousing. BTW, this was on SQL 7.0; interested to know if the DBs are better at this type of stuff yet?
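For what it's worth, engines have improved here since SQL Server 7.0; PostgreSQL, for example, now supports declarative range partitioning, sketched below with illustrative names:

```sql
-- Partition events by date so old data can be detached as a unit
-- instead of deleted row by row.
CREATE TABLE tracking_event (
    event_time TIMESTAMP NOT NULL,
    user_id    INT       NOT NULL,
    course_id  INT       NOT NULL,
    detail     TEXT
) PARTITION BY RANGE (event_time);

CREATE TABLE tracking_event_2024_07
    PARTITION OF tracking_event
    FOR VALUES FROM ('2024-07-01') TO ('2024-08-01');

-- Archiving a month becomes a quick metadata change, not a
-- multi-hour DELETE with index rebuilds.
ALTER TABLE tracking_event DETACH PARTITION tracking_event_2024_07;
```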

昔日梦未散 2024-07-22 10:59:38

Your question prompts more questions than answers.

  • What database engine are you using? It's hard to give you a good answer without this.
  • What is the table structure? How your table is laid out on disk will depend on your data types.
  • Why not store one record per user/course? Since you are storing SCORM data, I assume this means standard fields like completion, success, attempts, total time, etc. There's no need to create multiple rows for this.

I've built a few databases storing SCORM data, and I've never had to go with a tag/value system like you suggest.

One thing you want to remember, though, is that it's not the number of rows in the table, it's the size (in bytes) of the table. Simply:

table size = row size (avg) * number of rows

The question to ask is, "How big a table is too big?"
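The "one record per user/course" alternative this answer recommends would look something like the sketch below, with the standard SCORM fields it names as typed columns (the types are illustrative):

```sql
-- One row per user per course; each standard SCORM field is a real,
-- typed, indexable column instead of a name/value string pair.
CREATE TABLE scorm_attempt (
    user_id    INT NOT NULL,
    course_id  INT NOT NULL,
    completion VARCHAR(16),   -- e.g. 'completed' / 'incomplete'
    success    VARCHAR(16),   -- e.g. 'passed' / 'failed'
    attempts   INT,
    total_time INT,           -- seconds
    PRIMARY KEY (user_id, course_id)
);
```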
