将排序后的加载文件加载到新表中时是否需要 CLUSTER INDEX?

发布于 2024-09-12 14:30:55 字数 886 浏览 9 评论 0原文

INFORMIX-SE:

我的用户定期运行 SQL 脚本 [REORG.SQL],该脚本按排序顺序将表中的所有行卸载到两个单独的文件(活动文件和非活动文件),删除表,重新创建表,加载排序后的加载文件回到它,在我排序卸载文件的同一列上创建一个集群索引,创建其他支持索引并更新其统计信息。

(请参阅 REORG.SQL 脚本:SE:'bcheck -y'异常

(另请参阅: customer.pk_name 加入交易.fk_name 与 customer.pk_id [serial] 加入 transactions.fk_id [integer] 原因是集群索引是按名称而不是 pk_id[serial]=fk_id[int])

使用我的 REORG.SQL 脚本,我'一直存在索引文件一致性问题,因此我怀疑 CLUSTER INDEX 与此有关,并创建了没有集群的索引,然后问题就消失了!

现在我的问题是:如果我设法将所有交易数据(按客户全名排序)加载到新创建的表中,我是否真的有必要创建一个集群索引,而实际上行已经按相同的顺序排序集群将实现什么?..我知道随着新行的添加,集群索引开始失去集群,那么创建集群索引的优点是什么?..查询优化器是否利用集群与非集群的优势当行基本上处于相同的集群顺序时索引?..有人在集群表时遇到 IDX/DAT 文件问题吗?..也许我的 SQL 脚本有问题? (请检查我的 SQL 脚本代码,看看我是否做错了什么?)

INFORMIX-SE:

My users periodically run an SQL script [REORG.SQL] which unloads all rows from a table in sorted order to two separate files (actives and inactives), drops the table, re-creates the table, loads the sorted loadfiles back into it, creates a cluster index on the same column I sorted my unload files by, creates other supporting indexes and updates its statistics.

(See REORG.SQL script at: SE: 'bcheck -y' anomaly)

(Also see: customer.pk_name joining transactions.fk_name vs. customer.pk_id [serial] joining transactions.fk_id [integer] for reason why cluster index is by name and not pk_id[serial]=fk_id[int])

With my REORG.SQL script, I've been having index file consistency problems so I suspected the CLUSTER INDEX had something to do with it and created the index with no clustering and the problems went away!

Now my question is: If I manage to load all my transaction data, sorted by the customers full name into a newly created table, is it really necessary for me to create a CLUSTER INDEX when in fact the rows are already sorted in the same order that the clustering would accomplish?.. I know that a clustered index starts loosing its clustering as new rows are added, so what's the advantage of creating a cluster index?.. does the query optimizer take advantage of clustering vs. a non-clustered index when the rows are essentially in the same clustered order?.. Has anyone encountered IDX/DAT file problems when clustering a table?.. Perhaps my SQL script has something wrong with it? (PLEASE REVIEW MY SQL SCRIPT CODE TO SEE IF I'm DOING SOMETHING WRONG?)

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

御守 2024-09-19 14:30:55

该脚本将活动和非活动事务卸载到两个不同的文件,每个文件按客户名称排序。然后它将它们加载回表中,首先是活动事务,然后是非活动事务。然后根据客户名称创建聚集索引。问题是,在构建聚集索引时,数据库现在必须返回并按客户名称对物理行重新排序。尽管每个卸载文件都是按客户名称单独排序的,但是当将两个文件放在一起时,结果并不是按客户名称排序,从而导致数据库承担更多工作。除非其他地方需要活动和非活动事务的单独文件,否则您可以尝试将所有事务转储到单个文件中,按客户名称排序,然后从该单个文件重新加载表。此时,表中的数据将按客户名称排序,并且创建的聚集索引不必对数据进行重新排序。

至于是否真的需要聚集索引 - 如果您使用该列进行查询,则聚集索引可能很有价值,因为它应该有助于减少获取数据所需的 I/O 数量。通常聚集索引是在单调增加的列上创建的,因此也许 TRX_NUM 可以很好地用作聚集索引上命名的列。

分享并享受。

The script unloads the active and inactive transactions to two different files, with each file sorted by customer name. It then loads them back into the table, active transactions first, followed by inactive transactions. A clustered index is then created on customer name. The problem is that the database now has to go back and re-order the physical rows by customer name when building the clustered index. Although each of the unload files are separately ordered by customer name, when the two are put together the result is not ordered by customer name, causing more work for the database. Unless the separate files for active and inactive transactions are needed elsewhere you might try just dumping all the transactions to a single file, ordered by customer name, and then re-load the table from that single file. At that point the data in the table would be ordered by customer name and the clustered index create wouldn't have to do the re-ordering of the data.

As to whether or not the clustered index is really needed - a clustered index can be of value if you use that column to query with as it should help to reduce the number of I/O's needed to fetch the data. Usually clustered indexes are created on columns which increase monotonically so perhaps TRX_NUM would serve well as the column to be named on the clustered index.

Share and enjoy.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文