Resolving related entities by unique string column values (SQL, Entity Framework Core)

Published 2025-01-21 17:49:29


While refactoring the code for a project I am currently working on, I have been wondering whether a certain kind of entity insert/update into an SQL database could be done more elegantly.

TL;DR of the module I am working on: a large amount of data is regularly synchronized from a weakly structured data store (Google Sheets) into a relational database. The source store is database-agnostic and contains no database IDs or foreign keys. The database (EF Core, code first) uses standard auto-incrementing integer IDs and foreign keys.

My problem: the sheets in the Google Sheets document largely resemble the database tables. Related entities are referenced by a string value ("Name") that is unique by design. These natural names need to be resolved to the artificial foreign keys when inserting into and updating the database. So far, this is done by loading the IDs and names of all related entities into memory, finding the matching entity for each row, and then saving the dependent entity with the found foreign keys to the database.
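Translated into code, the in-memory resolution described above has three steps. The sketch below mirrors them with Python's standard-library sqlite3 module instead of EF Core. This is an assumption for illustration, not the poster's actual code, and the sample names (Sword, Armory, ...) are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items     (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Locations (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE PlacedItems (
    Id         INTEGER PRIMARY KEY,
    LocationId INTEGER NOT NULL REFERENCES Locations(Id),
    ItemId     INTEGER NOT NULL REFERENCES Items(Id)
);
""")
conn.executemany("INSERT INTO Items (Name) VALUES (?)", [("Sword",), ("Shield",)])
conn.executemany("INSERT INTO Locations (Name) VALUES (?)", [("Armory",), ("Cellar",)])

# Hypothetical rows as read from the placed_items sheet: (LocationName, ItemName).
placed_rows = [("Armory", "Sword"), ("Cellar", "Shield"), ("Armory", "Shield")]

# Step 1: load every [Name, Id] pair of the related tables into memory.
item_ids = dict(conn.execute("SELECT Name, Id FROM Items"))
location_ids = dict(conn.execute("SELECT Name, Id FROM Locations"))

# Step 2: resolve names to foreign keys in memory (a KeyError means an unknown name).
resolved = [(location_ids[loc], item_ids[item]) for loc, item in placed_rows]

# Step 3: save the dependent rows with the resolved foreign keys.
conn.executemany("INSERT INTO PlacedItems (LocationId, ItemId) VALUES (?, ?)", resolved)
conn.commit()
```

Memory use here grows with the size of Items and Locations, not with the size of the sheet, and the Name indexes are never consulted.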

While this process works well, it requires either many more database round trips (if the related entities are looked up one by one) or a decent amount of memory to hold all [Name, Id] pairs. It also renders the database index on the Name column useless, since the search is done in memory.
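A middle ground between those two extremes is not part of the question. It is sketched here only as an alternative under stated assumptions: look up just the names that actually occur in the sheet, in batches, with `WHERE Name IN (...)`. Each batch costs one round trip and lets the database use the index on Name. Memory is bounded by the number of distinct referenced names rather than by table size. The helper `lookup_ids` and the batch size of 500 are hypothetical. The value 500 stays under SQLite's historical default limit of 999 bound parameters per statement.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Items (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE)")
conn.executemany("INSERT INTO Items (Name) VALUES (?)",
                 [(f"Item{n}",) for n in range(1, 1001)])  # 1000 rows, Ids 1..1000
conn.commit()


def lookup_ids(conn, table, names, batch_size=500):
    """Hypothetical helper: return {Name: Id} for only the given names.

    Distinct names are queried batch_size at a time with WHERE Name IN (...),
    so each batch costs one round trip and can use the index on Name.
    `table` is interpolated into the SQL and must be a trusted identifier."""
    unique = sorted(set(names))
    result = {}
    for start in range(0, len(unique), batch_size):
        batch = unique[start:start + batch_size]
        placeholders = ", ".join(["?"] * len(batch))
        sql = f"SELECT Name, Id FROM {table} WHERE Name IN ({placeholders})"
        result.update(conn.execute(sql, batch))  # rows are (Name, Id) pairs
    return result


# Names referenced by the sheet; unknown names are simply absent from the result.
item_ids = lookup_ids(conn, "Items", ["Item3", "Item42", "Item3", "Missing"])
```

This still moves IDs to the client, but only the ones the import needs.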


As an example, an excerpt from my data model:

Table Items, with columns

  • Id (Primary Key, unique, clustered index)
  • Name (String, unique, indexed)

Table Locations, with columns

  • Id (Primary Key, unique, clustered index)
  • Name (String, unique, indexed)

Table PlacedItems (Specific Items, which are placed in a specific location), with columns:

  • Id (Primary Key, unique, clustered index)
  • LocationId (Foreign Key referencing Locations, indexed)
  • ItemId (Foreign Key referencing Items, indexed)
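For experimenting, the data model above can be written out as DDL. The sketch uses SQLite through Python's sqlite3, which is an assumption; the real schema is generated by EF Core code first. The index names are hypothetical. In SQLite, an `INTEGER PRIMARY KEY` column aliases the rowid, which orders the table's B-tree. That is the closest analogue to a clustered index, and `UNIQUE` on Name creates an index automatically.

```python
import sqlite3

# SQLite stands in for the real database; column types are simplified (assumption).
SCHEMA = """
CREATE TABLE Items (
    Id   INTEGER PRIMARY KEY,          -- rowid alias, auto-assigned on insert
    Name TEXT NOT NULL UNIQUE          -- UNIQUE creates an index on Name
);
CREATE TABLE Locations (
    Id   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL UNIQUE
);
CREATE TABLE PlacedItems (
    Id         INTEGER PRIMARY KEY,
    LocationId INTEGER NOT NULL REFERENCES Locations(Id),
    ItemId     INTEGER NOT NULL REFERENCES Items(Id)
);
CREATE INDEX IX_PlacedItems_LocationId ON PlacedItems(LocationId);
CREATE INDEX IX_PlacedItems_ItemId     ON PlacedItems(ItemId);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.executescript(SCHEMA)

# Column 1 of each PRAGMA index_list row is the index name.
placed_indexes = {row[1] for row in conn.execute("PRAGMA index_list('PlacedItems')")}
```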

The Google Sheet which I import has the following sheets:

Sheet items, with column "Name" (string)

Sheet locations, with column "Name" (string)

Sheet placed_items, with columns "LocationName" (string) and "ItemName" (string)


My question:
Assuming all Items and Locations have already been imported and have been given auto-incremented IDs, how do I most elegantly / efficiently insert all the Placed Items?

The most intriguing kind of solution would be a way to resolve the Names to Ids entirely on the database side, i.e. in a single round trip, without having to load any Item / Location data into memory.
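One set-based way to do this is sketched below with Python's sqlite3 as a stand-in for the actual database. First, only the raw name pairs are sent to the database, bulk-loaded into a temporary staging table. Then a single `INSERT ... SELECT` joins them against Items and Locations on the indexed Name columns, so no Item or Location data is loaded into application memory. `StagingPlacedItems` and the sample names are hypothetical. If the target is SQL Server, the staging step would more commonly use `SqlBulkCopy` or a table-valued parameter, and the `INSERT ... SELECT` could be issued from EF Core via `Database.ExecuteSqlRaw`; that translation is not shown here.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items     (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Locations (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE PlacedItems (
    Id         INTEGER PRIMARY KEY,
    LocationId INTEGER NOT NULL REFERENCES Locations(Id),
    ItemId     INTEGER NOT NULL REFERENCES Items(Id)
);
INSERT INTO Items (Name)     VALUES ('Sword'), ('Shield');
INSERT INTO Locations (Name) VALUES ('Armory'), ('Cellar');
""")

# Hypothetical rows from the placed_items sheet, still names only.
placed_rows = [("Armory", "Sword"), ("Cellar", "Shield"),
               ("Armory", "Shield"), ("Attic", "Sword")]  # 'Attic' is unknown

with conn:  # one transaction: commit on success, roll back on error
    # 1. Bulk-load the raw name pairs into a temporary staging table.
    conn.execute("CREATE TEMP TABLE StagingPlacedItems "
                 "(LocationName TEXT NOT NULL, ItemName TEXT NOT NULL)")
    conn.executemany("INSERT INTO StagingPlacedItems VALUES (?, ?)", placed_rows)

    # 2. Report staging rows whose names match no existing entity.
    unresolved = conn.execute("""
        SELECT s.LocationName, s.ItemName
        FROM StagingPlacedItems AS s
        LEFT JOIN Locations AS l ON l.Name = s.LocationName
        LEFT JOIN Items     AS i ON i.Name = s.ItemName
        WHERE l.Id IS NULL OR i.Id IS NULL
    """).fetchall()

    # 3. Resolve names to IDs set-based, via joins on the indexed Name columns.
    conn.execute("""
        INSERT INTO PlacedItems (LocationId, ItemId)
        SELECT l.Id, i.Id
        FROM StagingPlacedItems AS s
        JOIN Locations AS l ON l.Name = s.LocationName
        JOIN Items     AS i ON i.Name = s.ItemName
    """)
    conn.execute("DROP TABLE StagingPlacedItems")
```

The inner joins in step 3 silently skip unresolved rows, which is why step 2 reports them separately before they are lost.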

To my naïve self, this seems like a type of problem that I am not the first to try to solve, and one that already has a smart solution I am just not seeing yet.


Additional points

  • I'd like to stick with the artificial IDs in the database. While the strings are unique, they are designed to be edited by humans, so readability is the prime concern for them, not the efficiency that DB-generated IDs provide. When reading and performing any further operations on the data, the IDs are the primary source of identity and uniqueness for records.
  • I don't expect Entity Framework Core to have a solution to this problem (but surprise me if it does!), so I am also very interested in solutions using raw SQL, stored procedures, and similar.
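With raw SQL, the staging table can be skipped entirely. The name pairs travel as bound parameters of an inline `VALUES` list, and one `INSERT ... SELECT` resolves and inserts them in a single statement. The sketch again uses SQLite via Python's sqlite3, which is an assumption; `insert_placed_items` is a hypothetical helper. SQLite caps the number of bound parameters (999 in older versions, at 2 per row here), so large sheets would need to be sent in batches. Inner joins silently drop pairs whose names do not resolve, so comparing the returned row count with the input size is a cheap consistency check.

```python
import sqlite3

# In-memory SQLite stands in for the real database (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items     (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE Locations (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL UNIQUE);
CREATE TABLE PlacedItems (
    Id         INTEGER PRIMARY KEY,
    LocationId INTEGER NOT NULL REFERENCES Locations(Id),
    ItemId     INTEGER NOT NULL REFERENCES Items(Id)
);
INSERT INTO Items (Name)     VALUES ('Sword'), ('Shield');
INSERT INTO Locations (Name) VALUES ('Armory'), ('Cellar');
""")


def insert_placed_items(conn, rows):
    """Hypothetical helper: resolve (LocationName, ItemName) pairs to IDs and
    insert them with ONE INSERT ... SELECT. Returns the number of rows inserted."""
    if not rows:
        return 0  # an empty VALUES list is not valid SQL
    values = ", ".join(["(?, ?)"] * len(rows))
    params = [name for pair in rows for name in pair]  # flatten the pairs
    # SQLite names the columns of a bare VALUES list column1, column2, ...
    sql = (
        "INSERT INTO PlacedItems (LocationId, ItemId) "
        "SELECT l.Id, i.Id "
        f"FROM (VALUES {values}) AS s "
        "JOIN Locations AS l ON l.Name = s.column1 "
        "JOIN Items AS i ON i.Name = s.column2"
    )
    with conn:  # commit on success, roll back on error
        return conn.execute(sql, params).rowcount


rows = [("Armory", "Sword"), ("Cellar", "Shield"), ("Attic", "Sword")]  # 'Attic' is unknown
inserted = insert_placed_items(conn, rows)
if inserted != len(rows):
    print(f"{len(rows) - inserted} row(s) had unresolved names")
```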
