Are these tables too large for SQL Server or Oracle?

Posted 2024-08-13 01:40:01


I'm not much of a database guru so I would like some advice.

Background

We have 4 tables that are currently stored in Sybase IQ. We don't currently have any choice over this, we're basically stuck with what someone else decided for us. Sybase IQ is a column-oriented database that is perfect for a data warehouse. Unfortunately, my project needs to do a lot of transactional updating (we're more of an operational database) so I'm looking for more mainstream alternatives.

Question

  1. Given these tables' dimensions, would anyone consider SQL Server or Oracle to be a viable alternative?

    • Table 1 : 172 columns * 32 million rows
    • Table 2 : 453 columns * 7 million rows
    • Table 3 : 112 columns * 13 million rows
    • Table 4 : 147 columns * 2.5 million rows
  2. Given the size of data what are the things I should be concerned about in terms of database choice, server configuration, memory, platform, etc.?


Comments (10)

灼痛 2024-08-20 01:40:01


Yes, both should be able to handle your tables (if your server is suited for it). But I would consider redesigning your database a bit. Even in a data warehouse where you denormalize your data, a table with 453 columns is not normal.

写下不归期 2024-08-20 01:40:01


It really depends on what's in the columns. If there are lots of big VARCHAR columns -- and they are frequently filled to near capacity -- then you could be in for some problems. If it's all integer data then you should be fine.

453 * 4 = 1812      # columns are 4 byte integers, row size is ~1.8k
453 * 255 = 115,515 # columns are VARCHAR(255), theoretical row size is ~112k

The rule of thumb is that row size should not exceed the disk block size, which is generally 8k. As you can see, your big table is not a problem in this regard if it consists entirely of 4-byte integers but if it consists of 255-char VARCHAR columns then you could be exceeding the limit substantially. This 8k limit used to be a hard limit in SQL Server but I think these days it's just a soft limit and performance guideline.

Note that VARCHAR columns don't necessarily consume memory commensurate with the size you specify for them. That is the max size, but they only consume as much as they need. If the actual data in the VARCHAR columns is always 3-4 chars long then size will be similar to that of integer columns regardless of whether you created them as VARCHAR(4) or VARCHAR(255).

The general rule is that you want row size to be small so that there are many rows per disk block; this reduces the number of disk reads necessary to scan the table. Once you get above 8k you have two reads per row.
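The arithmetic above can be turned into a quick back-of-the-envelope check. A minimal sketch in Python; the 8 KB page size and the per-type byte counts are rough assumptions, not exact storage-engine figures (real engines add per-row and per-column overhead):

```python
# Rough row-size estimator: how many rows fit in one 8 KB disk block?
# Byte counts are ballpark assumptions, not exact storage-engine overheads.

PAGE_SIZE = 8 * 1024  # typical disk block / page size (8k)

def estimate_row_size(columns):
    """columns: list of (type_name, declared_size) tuples."""
    fixed_sizes = {"int": 4, "bigint": 8, "float": 8}
    total = 0
    for type_name, declared in columns:
        if type_name == "varchar":
            total += declared  # worst case: filled to capacity
        else:
            total += fixed_sizes[type_name]
    return total

def rows_per_block(columns):
    return PAGE_SIZE // max(estimate_row_size(columns), 1)

# The 453-column table, assuming every column is a 4-byte integer:
all_ints = [("int", None)] * 453
print(estimate_row_size(all_ints))      # 1812 bytes
print(rows_per_block(all_ints))         # 4 rows per 8k block

# Same table, assuming every column is VARCHAR(255) filled to capacity:
all_varchars = [("varchar", 255)] * 453
print(estimate_row_size(all_varchars))  # 115515 bytes -> far beyond one block
```

This reproduces the two numbers worked out above and makes it easy to plug in a realistic mix of column types for your actual schema.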

Oracle has another potential problem which is that ANSI joins have a hard limit on the total number of columns in all tables in the join. You can avoid this by avoiding the Oracle ANSI join syntax. (There are equivalents that don't suffer from this bug.) I don't recall what the limit is or which versions it applies to (I don't think it's been fixed yet).

The number of rows you're talking about should be no problem at all, presuming you have adequate hardware.

迷迭香的记忆 2024-08-20 01:40:01


With suitably sized hardware and an I/O subsystem to meet your demands, both are quite adequate - whilst you have a lot of columns, the row counts are really very low - we regularly use datasets that are expressed in billions, not millions. (Just do not try it on SQL 2000 :) )

If you know your usage and I/O requirements, most I/O vendors will translate that into hardware specs for you. Memory, processors, etc. again depend on workloads that only you can model.

酒绊 2024-08-20 01:40:01


Oracle 11g has no problems with such data and structure.

More info at: http://neworacledba.blogspot.com/2008/05/database-limits.html

Regards.

诠释孤独 2024-08-20 01:40:01


Oracle limitations

SQL Server limitations

You might be close on SQL Server, depending on what data types you have in that 453 column table (note the bytes per row limitation, but also read the footnote). I know you said that this is normalized, but I suggest looking at your workflow and considering ways of reducing the column count.

Also, these tables are big enough that hardware considerations are a major issue with performance. You'll need an experienced DBA to help you spec and set up the server with either RDBMS. Properly configuring your disk subsystem will be vital. You will probably also want to consider table partitioning among other things to help with performance, but this all depends on exactly how the data is being used.

陌路终见情 2024-08-20 01:40:01


Based on your comments in the other answers I think what I'd recommend is:

1) Isolate which data is actually updated vs. which data is more or less read only (or infrequently)
2) Move the updated data to separate tables joined on an id to the bigger tables (deleting those columns from the big tables)
3) Do your OLTP transactions against the smaller, more relational tables
4) Use inner joins to hook back up to the big tables to retrieve data when necessary.
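The split in steps 1-4 can be sketched concretely. A minimal illustration using Python's built-in sqlite3 as a stand-in for whichever RDBMS gets chosen; the table and column names (`big_readonly`, `hot_updates`, `status`, `balance`) are made up for the example:

```python
import sqlite3

# Sketch of the vertical split: frequently updated columns move to a narrow
# table keyed by the same id; the wide table keeps the read-mostly columns.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- wide, read-mostly table (stands in for the 453-column table)
    CREATE TABLE big_readonly (id INTEGER PRIMARY KEY, col_a TEXT, col_b TEXT);
    -- narrow table holding only the columns OLTP work actually updates
    CREATE TABLE hot_updates (id INTEGER PRIMARY KEY, status TEXT, balance REAL);

    INSERT INTO big_readonly VALUES (1, 'static data', 'more static data');
    INSERT INTO hot_updates  VALUES (1, 'open', 100.0);
""")

# OLTP transactions touch only the narrow table...
conn.execute("UPDATE hot_updates SET status = 'closed', balance = 0 WHERE id = 1")

# ...and an inner join hooks back up to the wide table when data is needed.
row = conn.execute("""
    SELECT b.col_a, h.status, h.balance
    FROM big_readonly b
    INNER JOIN hot_updates h ON h.id = b.id
    WHERE b.id = 1
""").fetchone()
print(row)  # ('static data', 'closed', 0.0)
```

The update now rewrites a row of a few dozen bytes instead of a multi-kilobyte one, which is the point of step 2.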

As others have noted you are trying to make the DB do both OLTP and OLAP at the same time and that is difficult. Server settings need to be tweaked differently for either scenario.

Either SQL Server or Oracle should work. I use census data as well and my giganto table has around 300+ columns. I use SQL Server 2005, and it complains that if all the columns were filled to their capacity it would exceed the max possible size for a record. We use our census data in an OLAP fashion, so it isn't such a big deal to have so many columns.

女中豪杰 2024-08-20 01:40:01


Are all of the columns in all of those tables updated by your application?

You could consider having data marts (AKA operational or online data store) that are updated during the day, and then the new records are migrated into the main warehouse at night? I say this because rows with massive amounts of columns are going to be slower to insert and update, so you may want to consider tailoring your specific online architecture to your application's update requirements.
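The day/night pattern described here can be sketched in a few lines, again with sqlite3 standing in for both stores; the table names and the "copy rows the warehouse hasn't seen yet" rule are illustrative assumptions, not a prescription:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- operational data store: takes the daytime inserts/updates
    CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, amount REAL);
    -- main warehouse: refreshed from the ODS in a nightly batch
    CREATE TABLE dw_orders  (id INTEGER PRIMARY KEY, amount REAL);

    INSERT INTO dw_orders  VALUES (1, 10.0);                     -- migrated earlier
    INSERT INTO ods_orders VALUES (1, 10.0), (2, 20.0), (3, 30.0);
""")

def nightly_migrate(conn):
    # Copy only the ODS rows the warehouse has not seen yet.
    conn.execute("""
        INSERT INTO dw_orders (id, amount)
        SELECT o.id, o.amount FROM ods_orders o
        WHERE o.id NOT IN (SELECT id FROM dw_orders)
    """)
    conn.commit()

nightly_migrate(conn)
dw_count = conn.execute("SELECT COUNT(*) FROM dw_orders").fetchone()[0]
print(dw_count)  # 3
```

In practice the migration step would be a scheduled ETL job and the "new rows" test would use a timestamp or change-tracking column rather than a full anti-join, but the shape is the same.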

天暗了我发光 2024-08-20 01:40:01


Asking one DB to act as an operational and a warehouse system at the same time is still a bit of a tall order. I would consider using SQL Server or Oracle for the operational system and having a separate DW for reporting and analytics, probably keeping the system you have.

Expect some table re-design and normalization to happen on the operational side to fit the one-row-per-page limitations of row-based storage.

If you need to have fast updates of the DW, you may consider EP for ETL approach, as opposed to standard (scheduled) ETL.

Considering that you are at an early stage of this, take a look at Microsoft's project Madison, an auto-scalable DW appliance that goes up to 100s of TB. They have already shipped some installations.

离线来电— 2024-08-20 01:40:01


I would very carefully consider switching from a column oriented database to a relational one. Column oriented databases are indeed inadequate for operational work as updates are very slow, but they are more than adequate for reporting and business intelligence support.

More often than not one has to split the operational work into an OLTP database containing the current activity needed for operations (accounts, inventory, etc.) and use an ETL process to populate the data warehouse (history, trends). A column-oriented DW will beat a relational one hands down in almost any circumstance, so I wouldn't give up Sybase IQ so easily. Perhaps you can design your system to have an operational OLTP side using your relational product of choice (I would choose SQL Server, but I'm biased) and keep the OLAP part you have now.

心奴独伤 2024-08-20 01:40:01


Sybase have a product called RAP that combines IQ with an in-memory instance of ASE (their relational database) which is designed to help in situations such as this.

Your data isn't so vast that you couldn't consider moving to a row-oriented database but, depending on the structure of the data, you could end up using considerably more disk space and slowing down many kinds of queries.

Disclaimer: I do work for Sybase but not currently on the ASE/IQ/RAP side.
