Relational vs. non-relational data modeling

Posted 2024-11-07 06:29:05

I have a user database with the following properties for each user:

  • user
    • id
    • name
    • zip
    • city

In a relational database I would model it in a table user:

  • user
    • id
    • name
    • location_id
and have a second table called location:

  • location
    • id
    • zip
    • city

and location_id is a foreign key (reference) to an entry in the location table.

The advantage is that if the zip code for a certain city changes, I only have to change one entry.
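A minimal sketch of the relational version, using Python's built-in sqlite3 module as a stand-in RDBMS (an assumption; the question does not name one). The sample rows are hypothetical. Changing the zip code updates a single location row, and every user sees the new value through the foreign key:

```python
import sqlite3


def build_normalized_db():
    """Create the user/location schema from the question in an in-memory SQLite DB."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces REFERENCES only when enabled
    conn.execute("CREATE TABLE location (id INTEGER PRIMARY KEY, zip TEXT, city TEXT)")
    conn.execute(
        "CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT, "
        "location_id INTEGER REFERENCES location(id))"
    )
    # Hypothetical sample data: two users share one location row.
    conn.execute("INSERT INTO location VALUES (1, '10115', 'Berlin')")
    conn.executemany("INSERT INTO user VALUES (?, ?, ?)", [(1, "Alice", 1), (2, "Bob", 1)])
    return conn


conn = build_normalized_db()

# The zip code changes: one UPDATE on one row, no matter how many users reference it.
cur = conn.execute("UPDATE location SET zip = ? WHERE id = ?", ("10117", 1))
print(cur.rowcount)  # 1

# Reading name and zip together needs a join across the two tables.
rows = conn.execute(
    "SELECT user.name, location.zip FROM user "
    "JOIN location ON user.location_id = location.id ORDER BY user.id"
).fetchall()
print(rows)  # [('Alice', '10117'), ('Bob', '10117')]
```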

Now let's move to a non-relational database, say Google App Engine. Here I would model it as in the specification above. I have a kind User:

from google.appengine.ext import db

class User(db.Model):
    name = db.StringProperty()
    zip = db.StringProperty()
    city = db.StringProperty()

The advantage is that I don't need to join two tables, but the disadvantage is that if the zip code changes I have to run a script that goes through all user entries and updates the zip code, correct?
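A plain-Python sketch of such an update script might look like this. It is only a stand-in: a list of hypothetical dataclass objects plays the role of the User entities, and no App Engine APIs are used. The point is that the number of writes grows with the number of users holding a copy of the zip code:

```python
from dataclasses import dataclass


@dataclass
class User:
    # Denormalized: zip and city are copied into every user record.
    name: str
    zip: str
    city: str


# Hypothetical sample data standing in for User entities in the datastore.
users = [
    User("Alice", "10115", "Berlin"),
    User("Bob", "10115", "Berlin"),
    User("Carol", "80331", "Munich"),
]


def update_zip(all_users, city, old_zip, new_zip):
    """Scan every user and rewrite the copied zip; returns how many records were written."""
    written = 0
    for user in all_users:
        if user.city == city and user.zip == old_zip:
            user.zip = new_zip  # in a real datastore, each of these would be a separate write
            written += 1
    return written


print(update_zip(users, "Berlin", "10115", "10117"))  # 2
```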

There is another option in Google App Engine, which is to use ReferenceProperties. I could have two kinds, user and location:

from google.appengine.ext import db

class Location(db.Model):
    zip = db.StringProperty()
    city = db.StringProperty()

class User(db.Model):
    name = db.StringProperty()
    location = db.ReferenceProperty(Location)

I have exactly the same model as in the relational database.

Is what I just did wrong?

Does that destroy all the advantages of a non-relational database?

In order to get the values of zip and city I have to run a second query. But in the other case, to change the zip code I have to run through all existing users.

What are the implications of these two modeling possibilities in a non-relational database like Google's datastore?

What are typical use cases for each of them? When should I use one, and when the other?

Comments (3)

冬天的雪花 2024-11-14 06:29:05

In my experience, the biggest difference is that non-relational datastores force you to model based on how you'll query, because of the lack of joins, and how you'll write, because of the transaction restrictions. This of course results in very denormalized models. After a while, I started to define all the queries first, to avoid having to rethink the models later.

Because relational DBs are so flexible, you can think about each data family separately, create relations between them, and in the end query however you wish (abusing joins in many cases).

千紇 2024-11-14 06:29:05

Imagine that GAE had two modes for the Datastore: RDBMS mode and non-RDBMS mode.
Take your ReferenceProperty example with the aim of "list all the users and all their zip codes", and write some code to print all of them.

For the [fictional] RDBMS-mode Datastore it might look like:

for user in User.all().join("location"):
    print("name: %s zip: %s" % (user.name, user.location.zip))

Our RDBMS has handled the de-normalisation of the data behind the scenes and done a nice job of returning all the data we needed in one query. This query did have a little overhead, as it had to stitch our two tables together.

For the non-RDBMS Datastore our code might look like:

for user in User.all():
    location = Location.get(user.location)  # †
    print("name: %s zip: %s" % (user.name, location.zip))

Here the Datastore cannot help us join our data, and we must make an extra query for each and every user entity to fetch the location before we can print it.

This is in essence why you want to avoid overly normalised data on non-RDBMS systems.

Now, everybody logically normalizes their data to some extent, whether they are using an RDBMS or not; the trick is to find the trade-off between convenience and performance for your use case.

† This is not valid App Engine code; I'm just illustrating that user.location would trigger a db query. Also, no one should write code like my extreme example above: you can work around the repeated fetching of related entities by, say, fetching locations in batches up front.
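The batching workaround from the footnote can be sketched in plain Python. FakeDatastore, get and get_multi are hypothetical names invented for this illustration, not App Engine APIs; the counter simply records how many round trips each approach makes:

```python
from dataclasses import dataclass


@dataclass
class FakeDatastore:
    """Hypothetical in-memory key/value store that counts round trips."""

    entities: dict
    round_trips: int = 0

    def get(self, key):
        # One round trip per single-entity fetch.
        self.round_trips += 1
        return self.entities[key]

    def get_multi(self, keys):
        # One round trip for a whole batch of keys.
        self.round_trips += 1
        return [self.entities[k] for k in keys]


def list_users_one_by_one(store, users):
    # The "extreme example": an extra fetch for every user.
    return [(u["name"], store.get(u["location_key"])["zip"]) for u in users]


def list_users_batched(store, users):
    # Collect the distinct location keys up front, then fetch them in one batch.
    keys = sorted({u["location_key"] for u in users})
    by_key = dict(zip(keys, store.get_multi(keys)))
    return [(u["name"], by_key[u["location_key"]]["zip"]) for u in users]


# Hypothetical sample data: locations keyed by id, users holding a location key.
location_entities = {
    "loc1": {"zip": "10115", "city": "Berlin"},
    "loc2": {"zip": "80331", "city": "Munich"},
}
users = [
    {"name": "Alice", "location_key": "loc1"},
    {"name": "Bob", "location_key": "loc1"},
    {"name": "Carol", "location_key": "loc2"},
]

naive_store = FakeDatastore(location_entities)
naive = list_users_one_by_one(naive_store, users)
batched_store = FakeDatastore(location_entities)
batched = list_users_batched(batched_store, users)
print(naive == batched)  # True
print(naive_store.round_trips, batched_store.round_trips)  # 3 1
```

Both functions return the same rows, but the one-by-one version costs one round trip per user while the batched version costs one per batch, which is why prefetching related entities matters on a datastore without joins.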

If in a non-relational database I can model exactly what I can model in a relational database, why should I use a relational database at all?

Relational DBs excel at storing thousands to millions of rows of complex, interrelated data models, and at letting you perform incredibly intricate queries to reshape and access that data.

Non-relational DBs excel at storing billions of rows (or more) of simple data and letting you fetch that data with simpler queries.

The choice should really depend on your use case. The simpler structure of the non-relational model, and the design constraints that come with it, are among the main ways App Engine is able to promise to scale your app with demand.

今天小雨转甜 2024-11-14 06:29:05

Your understanding of the concept of the relational database is flawed. Relational databases organize their data in relations which contain a set of tuples of the same type. To rephrase, data is stored in tables with each row containing the same number of fields with the same types in the same order.
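To illustrate the definition, here is a small Python sketch that treats a relation as a set of NamedTuple rows sharing one heading. UserRow and conforms are hypothetical names, and Python's type checks are only a rough approximation of relational domains:

```python
from typing import NamedTuple, get_type_hints


class UserRow(NamedTuple):
    # The relation's heading: every tuple has these fields, types and order.
    id: int
    name: str
    zip: str
    city: str


# A relation is a set of such tuples: duplicates collapse and rows have no order.
user_relation = {
    UserRow(1, "Alice", "10115", "Berlin"),
    UserRow(2, "Bob", "10115", "Berlin"),
}
user_relation.add(UserRow(1, "Alice", "10115", "Berlin"))  # already present, set unchanged
print(len(user_relation))  # 2


def conforms(row, row_type=UserRow):
    """True if row has the same number of fields as row_type, each of the declared type."""
    hints = get_type_hints(row_type)
    return len(row) == len(hints) and all(
        isinstance(value, expected) for value, expected in zip(row, hints.values())
    )


print(conforms((3, "Carol", "80331", "Munich")))  # True
print(conforms((3, "Carol", 80331)))  # False: three fields instead of four
```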

The example you provided which utilizes a foreign key demonstrates database normalization. This is a concept that can apply to relational as well as other types of databases.

Sorry, I can't answer your questions about Google's storage system, but hopefully this will clarify your understanding enough to find out.
