如何最好地减少主键值?

发布于 2024-12-02 06:54:24 字数 345 浏览 1 评论 0原文

我正在开发一个应用程序(.Net),支持Oracle、Sql Server 和Informix 作为数据存储库。 Informix 的一个问题是一个表(这是遗留的东西)有一个 2048 个字符的主键,并且 Informix 不允许使用这种宽度的 PK。所以我最初的解决方案是让应用程序从键值派生 MD5 值,并在插入或查找数据时将其用作主键。好吧,这可行,但给我留下了“升级”现有数据库中数据的直接问题,由于各种原因,这必须通过 Sql 脚本来完成。遗憾的是,Informix 没有内置 MD5 函数,因此我很难编写 Sql 脚本来创建新的 PK 列并从现有数据填充它。

所以我的问题是:任何人都可以提出一种更好的方法来显着压缩长字符串值,从而避免这个问题吗?

I am developing an application (.Net) that supports Oracle, Sql Server and Informix as the data repositories. A problem with Informix is that one table (this is legacy stuff) has a primary key of 2048 characters, and Informix will not allow a PK of this width. So my initial solution is to get the app to derive an MD5 value from the key value and use that as the primary key when inserting or looking up data. Okay that works, but leaves me the immediate problem of 'upgrading' the data in existing databases, which for various reasons has to be done by means of a Sql script. Sadly Informix does not have a built-in MD5 function so I will be hard pressed to write a Sql script to create the new PK column and populate it from the existing data.

So my question is: can anyone suggest a better way of significantly compressing a long string value, that will avoid this problem?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(3

樱娆 2024-12-09 06:54:24

您的方法存在缺陷,因为 PK 的定义必须是唯一的,并且 MD5 可能会产生冲突(重复)。

相反,请考虑使用代理 PK(例如身份或 GUID)。

任何人都可以建议一种更好的方法来显着压缩长字符串值,从而避免此问题

根据定义,您不能压缩任意字符串并保持唯一性。显然,如果字符串具有您了解的某种结构,您可以使用这些知识来创建特定于应用程序的压缩算法。

回应评论:

我也有代理键的问题,它与存储的日期无关 - 糟糕的数据库设计

我知道代理键与自然键是一个有争议的话题,但您提出的 MD5 哈希值本质上肯定是代理键?无论如何,“所有设计都是权衡的”,所以如果没有一些上下文,我不会将数据库设计描述为“糟糕”。恕我直言,如果没有
如果自然键短于 2048 个字符,则代理键可能是一个不错的选择。

还需要考虑性能权衡:使用 MD5 或 GUID 代理 PK,您可能会出现页面拆分,因为新行将插入到表的中间,而不是身份 PK 的末尾。

根据什么定义?

关键词是“任意”。 ZIP 等无损压缩算法不能保证所有输入都达到给定的压缩比 - 想象一下尝试对 ZIP 存档进行压缩。

Your approach is flawed as a PK must be definition be unique, and MD5 may generate collisions (duplicates).

Instead consider using a surrogate PK (e.g. identity or GUID).

can anyone suggest a better way of significantly compressing a long string value, that will avoid this problem

By definition you can't compress arbitrary strings and maintain uniqueness. Obviously if the strings have some structure that you know about, you could use this knowledge to create an application-specific compression algorithm.

In response to comments:

I also have a problem with surrogate keys, which bear no relation to the date being stored - bad database design

I know surrogate vs natural keys is a contentious subject, but surely your proposed MD5 hash is essentially a surrogate key? And in any case "all design is trade-off" so I wouldn't describe a database design as "bad" without some context. IMHO if there is no
natural key shorter than 2048 characters, a surrogate key may well be a good option.

There are also performance tradeoffs to consider: with an MD5 or GUID surrogate PK, you have the potential for page splits as new rows will be inserted in the middle of the table vs at the end for an Identity PK.

By what definition?

The key word is 'arbitrary'. A non-lossy compression algorithm such as ZIP does not guarantee to achieve a given compression ratio on all inputs - think of attempting to ZIP a ZIP archive.

胡渣熟男 2024-12-09 06:54:24

在 Informix 中,如果您创建具有大页面大小的 dbspace(需要使用 12、14 或 16 KiB 页面),则可以在该 dbspace 中最多大约 3 KiB 的键上创建索引(根据经验,5 个键值)必须适合一页索引页)。

但礼貌地说,这么大的钥匙可能效率不高。我很想知道 PK 中列的细分以及为什么它们必须如此之大以至于加起来达到 2 KiB。你不能使用某种代理吗?

In Informix, if you create a dbspace with large page sizes (you'd need to use 12, 14 or 16 KiB pages), you can create indexes on keys up to about 3 KiB in that dbspace (rule of thumb, 5 key values must fit on one index page).

But a key that big is probably not very efficient, to be polite about it. I'd be curious to see the breakdown of the columns in the PK and why they have to be so big that they add up to 2 KiB. Can you not use a surrogate of some sort?

行雁书 2024-12-09 06:54:24

我认为您可以将密钥拆分为两个部分并将该部分存储在两列中,例如“id1”,“id2”。然后就可以创建复合主键了。

I think you can split key on two parts and store that parts in two columns, something like "id1", "id2". And then you can create composite primary key.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文