Dynamic database / key-value / entity-key-value dilemma

Posted 2024-12-22 08:28:38

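I have been programming against relational databases for years, but I have now run into an unusual and tricky problem:

I am building an application in which entities need to be very quick and easy to define (by the user). Instances of these entities can then be created, updated and deleted.

I can think of two options.

Option 1 - Dynamically created tables

The first option is to write an engine that generates tables dynamically and inserts the data into them. However, this would get very tricky, because every query would also need to be dynamic, or at least rely on dynamically created stored procedures and the like.

Option 2 - The entity-key-value pattern

This is the only realistic option I can think of. I would have a five-table structure: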

EntityTypes
    EntityTypeID int
    EntityTypeName nvarchar(50)

Entities
    EntityID int
    EntityTypeID int

FieldTypes
    FieldTypeID int
    FieldTypeName nvarchar(50)
    SQLtype int

FieldValues
    EntityID int
    FieldID int
    Value nvarchar(MAX)

Fields
    FieldID int
    FieldName nvarchar(50)
    FieldTypeID int

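The "FieldValues" table would work a bit like a data-warehouse fact table, and all of my inserts/updates could be done by populating a "key/value" table-valued parameter and passing it to a SPROC (to avoid multiple inserts/updates).

All the tables would be heavily indexed, and I would end up doing many self-joins to get the data out.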
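
As an illustration of the read path described above, here is a minimal sketch of turning FieldValues rows back into one row per entity. It is not the poster's code: it uses Python's built-in sqlite3 as a stand-in for SQL Server, keeps only the Fields and FieldValues tables, and the function name `fetch_entities` and the sample data are made up. It also swaps the self-joins for conditional aggregation (one `MAX(CASE ...)` per field), which is one common way to pivot an entity-key-value table in a single pass.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Fields (FieldID INTEGER PRIMARY KEY, FieldName TEXT NOT NULL);
CREATE TABLE FieldValues (
    EntityID INTEGER NOT NULL,
    FieldID  INTEGER NOT NULL REFERENCES Fields(FieldID),
    Value    TEXT,
    PRIMARY KEY (EntityID, FieldID)
);
""")
# Hypothetical sample data: two entities, each with a Name and a Price.
conn.executemany("INSERT INTO Fields VALUES (?, ?)", [(1, "Name"), (2, "Price")])
conn.executemany(
    "INSERT INTO FieldValues VALUES (?, ?, ?)",
    [(100, 1, "Widget"), (100, 2, "9.50"), (101, 1, "Gadget"), (101, 2, "12.00")],
)


def fetch_entities(conn, field_names):
    """Pivot key/value rows into one row per entity with a single GROUP BY."""
    # One MAX(CASE ...) column per requested field; the field names are
    # bound as parameters, so no user text is spliced into the SQL.
    cols = ", ".join(
        "MAX(CASE WHEN f.FieldName = ? THEN v.Value END)" for _ in field_names
    )
    sql = (
        f"SELECT v.EntityID, {cols} "
        "FROM FieldValues v JOIN Fields f ON f.FieldID = v.FieldID "
        "GROUP BY v.EntityID ORDER BY v.EntityID"
    )
    return conn.execute(sql, list(field_names)).fetchall()


rows = fetch_entities(conn, ["Name", "Price"])
print(rows)  # [(100, 'Widget', '9.50'), (101, 'Gadget', '12.00')]
```

Note that every value comes back as text, because the single Value column is nvarchar in the proposed design; any numeric comparison would need a cast.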

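I have read a lot about how bad key/value databases are, but for this problem it still seems to be the best fit.

Now my questions!

Can anyone suggest another approach or pattern besides these two options?
Is option 2 viable for medium-sized datasets (up to 1 million rows)?
Are there any further optimizations I could apply to option 2?

Any guidance and advice is much appreciated!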

溺ぐ爱和你が 2024-12-29 08:28:38

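Personally, I would just use a "NoSQL" (key/value) database such as MongoDB.

But if you need to use a relational database, go with option 2. A good example of this kind of model is the Alfresco data dictionary (Alfresco is an enterprise content management system). Its design is similar to what you describe, although it has multiple field-value columns (one for each simple type available in the database). If you add a good caching system (Ehcache, for example), it should work fine.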
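
To make the "one value column per simple type" idea concrete, here is a rough sketch. It is not Alfresco's actual schema: the column names `ValueInt`, `ValueReal` and `ValueText`, the `COLUMN_FOR_TYPE` mapping and the `set_value` helper are all assumptions for illustration, and SQLite (via Python's sqlite3) stands in for the real database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE FieldValues (
    EntityID  INTEGER NOT NULL,
    FieldID   INTEGER NOT NULL,
    ValueInt  INTEGER,
    ValueReal REAL,
    ValueText TEXT,
    PRIMARY KEY (EntityID, FieldID)
)""")

# Hypothetical mapping from a Python type to the typed column that stores it.
COLUMN_FOR_TYPE = {int: "ValueInt", float: "ValueReal", str: "ValueText"}


def set_value(conn, entity_id, field_id, value):
    """Store a value in the column matching its type; the others stay NULL."""
    column = COLUMN_FOR_TYPE[type(value)]  # KeyError for unsupported types
    # The column name comes from the fixed dict above, never from user input.
    conn.execute(
        f"INSERT OR REPLACE INTO FieldValues (EntityID, FieldID, {column}) "
        "VALUES (?, ?, ?)",
        (entity_id, field_id, value),
    )


set_value(conn, 1, 10, "Widget")
set_value(conn, 1, 11, 3)
set_value(conn, 2, 11, 7)

# ValueInt is a real integer column, so the comparison needs no cast.
big = conn.execute(
    "SELECT EntityID FROM FieldValues WHERE FieldID = 11 AND ValueInt > 5"
).fetchall()
print(big)  # [(2,)]
```

The trade-off is a wider, mostly-NULL table in exchange for typed comparisons and the option of indexing each typed column separately.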

剩余の解释 2024-12-29 08:28:38

As others have suggested NoSQL, I'm going to say that, in my opinion, schemaless databases really are best suited for use-cases with no schema.

From the description, and the schema you came up with, it looks like your case is not in fact "no schema", but rather it seems to be "user-defined schema".

In fact, the schema you came up with looks very similar to the internal meta-schema of a relational database. (You're sort of building a relational database on top of a relational database, which in my experience is not a good idea, as this "meta-database" will have at least twice the overhead and complexity for any basic operation - tables will get very large, which doesn't scale well, and the data will be difficult to query and update, problems will be difficult to debug, and so on.)

For use-cases like that, you probably want DDL: Data Definition Language.

You didn't say which SQL database you're using, but most SQL databases (such as MySQL, PostgreSQL and MS-SQL) support some dialect of DDL extensions to SQL syntax, which let you manipulate the actual schema.

I've done this successfully for use-cases like yours in the past. It works well for cases where the schema rarely changes, and the data volumes are relatively low for each user. (For high volumes or frequent schema updates, you might want schemaless or some other type of NoSQL database.)

You might need some tables on the side for additional field information that doesn't fit in SQL schema - you may want to duplicate some schema information there as well, as this can be difficult or inefficient to read back from actual schema.

Ensuring atomic updates to your field information tables and the schema probably requires transactions, which may not be supported by your database engine - PostgreSQL at least does support transactional schema updates.

You have to be vigilant when it comes to security - you don't want to open yourself up to users creating, storing or deleting things they're not supposed to.
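
A minimal sketch of the last three points (a side table for field information, an atomic schema-plus-metadata update, and guarding against hostile input) might look like the following. It assumes Python's built-in sqlite3, chosen only because SQLite also allows DDL inside a transaction; the names `FieldInfo`, `create_entity_table`, `IDENTIFIER` and `ALLOWED_TYPES` are hypothetical and not from any particular system.

```python
import re
import sqlite3

# Assumption: user-chosen names must be plain identifiers of at most 50 chars.
IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]{0,49}$")
# Assumption: users pick from a fixed list of types, never raw SQL types.
ALLOWED_TYPES = {"int": "INTEGER", "float": "REAL", "text": "TEXT"}


def create_entity_table(conn, table, fields):
    """Create a user-defined table and record its fields in one transaction."""
    for name in [table, *fields]:
        if not IDENTIFIER.match(name):
            raise ValueError(f"invalid identifier: {name!r}")
    columns = ", ".join(f'"{n}" {ALLOWED_TYPES[t]}' for n, t in fields.items())
    conn.execute("BEGIN")
    try:
        conn.execute(f'CREATE TABLE "{table}" (id INTEGER PRIMARY KEY, {columns})')
        conn.executemany(
            "INSERT INTO FieldInfo (TableName, FieldName, FieldType) VALUES (?, ?, ?)",
            [(table, n, t) for n, t in fields.items()],
        )
        conn.execute("COMMIT")
    except Exception:
        # Undo both the CREATE TABLE and any metadata rows.
        conn.execute("ROLLBACK")
        raise


# isolation_level=None puts the connection in autocommit mode, so the
# explicit BEGIN/COMMIT above fully controls the transaction.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE FieldInfo (TableName TEXT, FieldName TEXT, FieldType TEXT, "
    "UNIQUE (TableName, FieldName))"
)
create_entity_table(conn, "Product", {"Name": "text", "Price": "float"})
```

Validating names against a strict pattern and mapping types through a fixed dictionary means no user-supplied text reaches the DDL statement unchecked, which is the main risk with dynamically generated schema.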

If it suits your use-case, consider using not only separate tables, but separate databases, which can also be created and destroyed on demand using DDL. This could be applicable if each customer has ownership of data collections that can't, shouldn't, or don't need to be queried across customers. (Arguably, these are rare - typically, you want at least analytics or something across customers, but there are cases where each customer "owns" an isolated, hosted wiki, shop or CMS/DMS of some sort.)

(I saw in your comment that you already decided on NoSQL, so just posting this option here for completeness.)

最美不过初阳 2024-12-29 08:28:38

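It sounds like this might be a solution looking for a problem. Is there any chance your domain could be refactored? If not, there is still hope.

The scalability of option 2 depends heavily on how wide the custom objects are. How many fields can be created dynamically? One million entities could get cumbersome when each entity has 100 fields... Efficient indexing could make the performance bearable.
As another option, you could have a single data table with a fixed set of string fields, double fields and integer fields: for example, a table with String1, String2, String3, Int1, Int2, Int3. A second table would hold rows that define the user objects and map "CustomObjectName" => String1, and so on. A stored procedure that reads INFORMATION_SCHEMA, together with some dynamic SQL, would be able to read the schema table and return strongly typed recordsets...
Yet another option (for recent versions of SQL Server) is to store rows with an id, a type name and an XML column holding an XML document with the object's data. In MS SQL Server this can be queried directly, and can even be validated against a schema.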
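
The generic-column option (String1..Int3 plus a mapping table) can be sketched roughly as below. This is only an illustration under assumptions: it uses Python's sqlite3 rather than SQL Server, does the mapping in application code instead of a stored procedure with dynamic SQL, and the names `CustomData`, `CustomFieldMap`, `field_map`, `insert_object` and `read_objects` are invented for the example.

```python
import sqlite3

# The only physical columns a user field may be mapped to.
SLOTS = {"String1", "String2", "String3", "Int1", "Int2", "Int3"}

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomData (
    RowID INTEGER PRIMARY KEY, ObjectName TEXT NOT NULL,
    String1 TEXT, String2 TEXT, String3 TEXT,
    Int1 INTEGER, Int2 INTEGER, Int3 INTEGER
);
CREATE TABLE CustomFieldMap (
    ObjectName TEXT NOT NULL, FieldName TEXT NOT NULL, ColumnName TEXT NOT NULL,
    PRIMARY KEY (ObjectName, FieldName)
);
""")


def field_map(conn, object_name):
    """Return {user field name: physical column} for one custom object."""
    mapping = dict(conn.execute(
        "SELECT FieldName, ColumnName FROM CustomFieldMap WHERE ObjectName = ?",
        (object_name,),
    ))
    if not mapping or not set(mapping.values()) <= SLOTS:
        raise ValueError(f"no usable field mapping for {object_name!r}")
    return mapping


def insert_object(conn, object_name, values):
    """Insert one instance; `values` must contain at least one mapped field."""
    mapping = field_map(conn, object_name)
    cols = [mapping[name] for name in values]  # KeyError for unknown fields
    marks = ", ".join("?" for _ in cols)
    conn.execute(
        f"INSERT INTO CustomData (ObjectName, {', '.join(cols)}) VALUES (?, {marks})",
        [object_name, *values.values()],
    )


def read_objects(conn, object_name):
    """Read instances back as dicts keyed by the user-facing field names."""
    mapping = field_map(conn, object_name)
    fields = list(mapping)
    cols = ", ".join(mapping[f] for f in fields)
    cur = conn.execute(
        f"SELECT {cols} FROM CustomData WHERE ObjectName = ? ORDER BY RowID",
        (object_name,),
    )
    return [dict(zip(fields, row)) for row in cur]


# Hypothetical object definition: "Product" with a name and a stock count.
conn.executemany(
    "INSERT INTO CustomFieldMap VALUES (?, ?, ?)",
    [("Product", "Name", "String1"), ("Product", "Stock", "Int1")],
)
insert_object(conn, "Product", {"Name": "Widget", "Stock": 4})
products = read_objects(conn, "Product")
print(products)
```

Only column names from the fixed `SLOTS` set are ever placed into the SQL text; user-facing field names are used purely as dictionary keys, so they cannot alter the query.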