Is there ever a reason not to use a primary key column?
I have a routine that will be creating individual tables (Sql Server 2008) to store the results of reports generated by my application (Asp.net 3.5). Each report will need its own table, as the columns for the table would vary based on the report settings. A table will contain somewhere between 10 and 5,000 rows, rarely more than 10,000.
The following usage rules will apply:
- Once stored, the data will never be updated.
- Whenever results for the table are accessed, all data will be retrieved.
- No other table will need to perform a join with this table.
Knowing this, is there any reason to create a PK index column on the table? Will doing so aid the performance of retrieving the data in any way, and if it would, would this outweigh the extra load of updating the index when inserting data? (I know that 10K records is a relatively small amount, but this solution needs to be able to scale.)
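For context, here is a rough sketch of the kind of table the routine generates (the table and column names are hypothetical; the real columns are chosen at runtime from the report settings), shown both as a plain heap and with a surrogate PK:

    -- Hypothetical dynamically generated report table, as a plain heap:
    -- written once, always read in full, never joined.
    CREATE TABLE dbo.Report_12345_Results (
        RegionName  varchar(100)  NOT NULL,
        TotalSales  decimal(18,2) NOT NULL,
        UnitsSold   int           NOT NULL
    );

    -- The same shape of table with a surrogate primary key: every insert now
    -- also maintains the clustered index, in exchange for a unique row identifier.
    CREATE TABLE dbo.Report_12346_Results (
        RowID       int IDENTITY(1,1) NOT NULL
            CONSTRAINT PK_Report_12346_Results PRIMARY KEY CLUSTERED,
        RegionName  varchar(100)  NOT NULL,
        TotalSales  decimal(18,2) NOT NULL,
        UnitsSold   int           NOT NULL
    );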
Update: Here are some more details on the data being processed, which go into the current design decision of one table per report:
- Tables will record a set of numeric values (set at runtime based on the report settings) that correspond to a different set of reference varchar values (also set at runtime based on the report settings).
- Whenever data is retrieved, some post-processing on the server will be required before the output can be displayed to the user (thus I will always be retrieving all values).
I would also be suspicious of someone claiming that they had to create a new table each time the report was run. However, given that a different set of columns (in number, name, and datatype) could conceivably be needed each time the report is run, I don't see a great alternative.
The only other thing I can think of is to have an ID column (identifying the ReportVersionID, corresponding to another table), a ReferenceValues column (a varchar field containing all reference values, in a specified order, separated by some delimiter), and a NumericValues column (same as ReferenceValues, but for the numbers). When I retrieve the results, I would put everything into specialized objects in the system, separating the values based on the defined delimiter. Does this seem preferable?
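For reference, a minimal sketch of that alternative layout (table and column names are only illustrative):

    -- Hypothetical single-table alternative: one row per report version,
    -- with the reference and numeric values packed into delimited strings
    -- that the application splits apart after retrieval.
    CREATE TABLE dbo.ReportResults (
        ReportVersionID int          NOT NULL,  -- points at the report-version table
        ReferenceValues varchar(max) NOT NULL,  -- e.g. 'North|South|East|West'
        NumericValues   varchar(max) NOT NULL   -- e.g. '100.5|98.2|120.0|87.3'
    );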
6 Answers
Primary keys are not a MUST for any and all data tables. True, they are usually quite useful and abandoning them is unwise. However, in addition to the primary mission of speed (which I agree would doubtfully be positively affected here), there is also that of uniqueness. To that end, and valuing the consideration you've already obviously taken, I would suggest that the only need for a primary key would be to govern the expected uniqueness of the table.
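If uniqueness is what matters, a natural key on the column(s) that must not repeat expresses the rule directly, without a surrogate; a hypothetical sketch:

    -- Hypothetical: if each row is identified by its reference value, a primary
    -- key on that column enforces uniqueness without an Identity column.
    CREATE TABLE dbo.Report_Results_Example (
        ReferenceValue varchar(100)  NOT NULL
            CONSTRAINT PK_Report_Results_Example PRIMARY KEY,
        NumericValue   decimal(18,2) NOT NULL
    );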
Update:
You mentioned in a comment that if you did add a PK, it would include an Identity column that presently does not exist and is not needed. In this case, I would advise against the PK altogether. As @RedFilter pointed out, surrogate keys never add any value.
I would keep it simple: just store the report results, converted to JSON or XML, in a VARCHAR(MAX) column.
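Something along these lines, presumably; the schema below is only an illustration, and since SQL Server 2008 has no native JSON support the application would do the serializing:

    -- Hypothetical single shared table: one row per report run, with the
    -- entire result set serialized by the application into VARCHAR(MAX).
    CREATE TABLE dbo.ReportRuns (
        ReportRunID int IDENTITY(1,1) NOT NULL PRIMARY KEY,
        ReportID    int               NOT NULL,
        RunDate     datetime          NOT NULL DEFAULT (GETDATE()),
        ResultData  varchar(max)      NOT NULL  -- JSON or XML produced by the app
    );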
One of the most useful and least (explicitly) emphasized benefits of data integrity (primary keys and foreign key references, to start with) is that it forces a 'design by contract' between your data and your application(s), which stops quite a lot of types of bugs from doing any damage to your data. This is such a huge win, and a thing that is implicitly taken for granted (it is not 'the database' that protects it, but the integrity rules you specify; by forsaking the rules you expose your data to various levels of degradation).
This seems unimportant to you (from the fact that you did not even discuss what a possible primary key would be), and your data seems quite unrelated to other parts of the system (from the fact that you will not do joins to any other tables); but still, if all things are equal, I would model the data properly, and then, if the primary keys (or other data integrity rules) are not being used and you are chasing every last bit of performance, I would consider dropping them in production (and test for any actual gains).
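For example, if measurement later shows the index maintenance cost to be significant, the constraint can be dropped without touching anything else (table and constraint names here are hypothetical):

    -- Hypothetical: once testing shows an actual gain, drop the primary key in
    -- production while keeping the rest of the model intact.
    ALTER TABLE dbo.SomeReportTable
        DROP CONSTRAINT PK_SomeReportTable;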
As for the comments that creating tables is a performance hit - that is true, but you did not tell us how temporary these tables are. Once created, will they be heavily used before being scrapped? Or do you plan to create tables for just a dozen read operations?
If you will use these tables heavily, and if you will provide a clean mechanism for managing them (removing them when not used, selecting from them, etc.), I think that dynamically creating the tables would be perfectly fine (you could have shared more details on the tables themselves; a use case would be nice).
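As a sketch of what such a management mechanism might look like (the naming convention and the log table below are assumptions, not anything from the question):

    -- Hypothetical cleanup job: build DROP TABLE statements for report tables
    -- that have not been read for 30 days, according to an assumed usage log.
    DECLARE @sql nvarchar(max);
    SET @sql = N'';

    SELECT @sql = @sql
        + N'DROP TABLE '
        + QUOTENAME(SCHEMA_NAME(t.schema_id)) + N'.' + QUOTENAME(t.name)
        + N';' + CHAR(13)
    FROM sys.tables AS t
    JOIN dbo.ReportTableLog AS l ON l.TableName = t.name  -- hypothetical log table
    WHERE t.name LIKE 'Report[_]%[_]Results'
      AND l.LastAccessed < DATEADD(DAY, -30, GETDATE());

    EXEC sys.sp_executesql @sql;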
Notes on other solutions:
- EAV model: this is horrible unless very specific conditions are met (for example, flexibility is paramount and automating DDL is too much of a hassle). Keep away from it (or be very, very good at anticipating what kinds of queries you will have to deal with, and rigorous in validating data on the front end).
- XML/BLOB approach: this might be the right thing for you if you will consume the data as XML/BLOBs at the presentation layer (you always read all of the rows, you always write the whole 'object', and your presentation layer likes XML/BLOBs).
EDIT:
Also, depending on the usage patterns, having a primary key can indeed increase the speed of retrieval, and if I can read the fact that the data will not be updated as 'it will be written once and read many times', then there is a good chance that this would indeed outweigh the cost of updating the index on inserts.
Will it be one table for every run of a given report, or one table for all runs of a given report? In other words, if you have Report #1 and you run it 5 times, each over a different range of data, will you produce 5 tables, or will all 5 runs of the report be stored in the same table?
If you are storing all 5 runs of the report in the same table, then you'll need to filter the data so that it is appropriate to the run in question. In this case, having a primary key will let you run the WHERE clause for that filter much faster.
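Under that shared-table assumption (names are hypothetical), a key that leads with the run identifier turns the per-run filter into an index seek instead of a full scan:

    -- Hypothetical table holding every run of one report; the clustered primary
    -- key leads with RunID, so filtering on a run only touches that run's pages.
    CREATE TABLE dbo.Report1_AllRuns (
        RunID   int           NOT NULL,
        RowNum  int           NOT NULL,
        Region  varchar(100)  NOT NULL,
        Total   decimal(18,2) NOT NULL,
        CONSTRAINT PK_Report1_AllRuns PRIMARY KEY CLUSTERED (RunID, RowNum)
    );

    -- Retrieving a single run becomes a seek on the clustered index.
    SELECT Region, Total
    FROM dbo.Report1_AllRuns
    WHERE RunID = 5;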
If you are creating a new table for every run of the report, then you don't need a primary key. However, you are going to run into other performance problems as the number of tables in your system grows... assuming you don't have something in place to drop old data / tables.
If you are really not using the tables for anything other than as a chunk of read-only data, you could just as well store all the reports in a single table, as XML values.
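A minimal sketch of that idea, using SQL Server's xml type (names are illustrative):

    -- Hypothetical: one row per report run, with the whole read-only
    -- result set stored as a single XML value.
    CREATE TABLE dbo.ReportResultsXml (
        ReportRunID int NOT NULL PRIMARY KEY,
        ResultXml   xml NOT NULL
    );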
What column or columns would the PK index be built on? If just a surrogate identity column, you'll have no performance hit when inserting rows, as they'd be inserted "in order". If it is not a surrogate key, then you have the admittedly minor but still useful assurance that you don't have duplicate entries.
Is the primary key used to control the order in which report rows are to be printed? If not, then how do you ensure proper ordering of the information? (Or is this just a data table that gets summed one way and another whenever a report is generated?)
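If ordering does matter, it has to come from an explicit column, since without an ORDER BY nothing guarantees the rows come back in insert order; a hypothetical example:

    -- Hypothetical: SortOrder records the intended row order at insert time,
    -- because SQL Server guarantees no ordering without an ORDER BY clause.
    SELECT ReferenceValue, NumericValue
    FROM dbo.SomeReportTable
    ORDER BY SortOrder;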
If you use a clustered primary key, you wouldn't use as much storage space as you would with a non-clustered index.
By and large, I find that while not every table requires a primary key, it does not hurt to have one present, and since proper relational database design requires primary keys on all tables, it's good practice to always include them.