预约和行项目

发布于 2024-09-06 23:49:36 字数 1524 浏览 11 评论 0原文

我正在构建一个管理应用程序来帮助管理我的移动汽车美容公司（希望还有其他公司）。我正在努力弄清楚如何对某些数据进行建模。

这个问题与我之前发布的问题相关，但我在下面转载了相关信息：数据库设计 - google app engine

在此应用程序中，存在“约会”和“约会”的概念“行项目。”

预约是员工提供服务所需的地点和时间。

行项目是服务、费用或折扣及其相关信息。可能进入约会的行项目的示例：

Name:                          Price: Commission: Time estimate   
Full Detail, Regular Size:        160       75       3.5 hours 
$10 Off Full Detail Coupon:       -10        0         0 hours 
Premium Detail:                   220      110       4.5 hours 
Derived totals(not a line item): $370     $185       8.0 hours

在我之前实施的此应用程序中，行项目包含在单个约会中。这在大多数情况下工作得很好，但有时会引起问题。举个例子，如果预约因下雨而中途中断，技术人员必须第二天回来完成工作。这种情况需要对同一行项目进行两次预约。在这种情况下，我会通过在第二次预约上设置“行项目”来读取诸如“完成”之类的内容来稍微捏造数据，然后成本将为 0 美元。

在下一个版本中，我正在考虑使行项目与多个约会相匹配，其表结构如下所示：

Appointment
 start_time
 etc...

Line_Item
 appointment_Key_List
 name
 price
 etc...

此结构的一个普遍问题是它很复杂，我什至不确定它是否适合将一个行项目与多个约会相匹配。如果行项目只能是一个约会的一部分，那么我实际上可以在每个约会中放入一个行项目列表，当我获得约会时，我已经获得了行项目。

一个更具体的问题是，我正在使用谷歌应用程序引擎，如果我想查询一组约会及其关联的行项目，我必须首先查询这组约会，然后对该行进行第二次查询items 使用 IN 运算符来测试 Line_Item 的任何约会键是否落入从先前查询返回的约会键集中。如果我有超过 30 个键要求我对查询进行分片，第二个查询将会失败。我可以对数据进行非规范化以避免这种复杂而广泛的读取查询，并且无论如何我可能都必须在某种程度上进行非规范化，但我宁愿在适当的情况下避免复杂性。

我的问题是这种情况通常是如何建模的？一个行项目与多个预约配对是否合适，或者简单地将行项目拆分为每个预约的单独行项目是否正常，例如“两天工作的第一半”和“两天工作的第二半” ”。类似的成功应用程序是如何做到这一点的？在这种情况下，经验法则是什么？哪些实施结果证明问题较少？

谢谢！

原文

I'm building a management application to help manage my mobile auto detailing company (and hopefully others). I'm struggling to figure out how to model some of the data.

This question is related to a previous question that I've posted, but I've reproduced the relevant information below:
Database design - google app engine

In this application, there are concepts of "Appointments" and "Line Items."

Appointments are a place and time where employees are expected to be in order to deliver a service.

Line Items are a service, fee or discount and its associated information. An example of line items that might go into an appointment:

Name:                          Price: Commission: Time estimate   
Full Detail, Regular Size:        160       75       3.5 hours 
$10 Off Full Detail Coupon:       -10        0         0 hours 
Premium Detail:                   220      110       4.5 hours 
Derived totals(not a line item): $370     $185       8.0 hours

In my previous implementation of this application, Line Items were contained by a single appointment. This worked fine most of the time, but caused problems sometimes. An example would be if an appointment got interrupted half-way through because of rain and the technician had to come back out the next day and finish up. This situation required two appointments for the same line item. In cases like this, I would just fudge the data a little by setting the "line item" on the second appointment to read something like "Finish Up" and then the cost would be $0.

In this next version, I am considering enabling Line Items to be matched with more than one appointment with a table structure that looks like this:

Appointment
 start_time
 etc...

Line_Item
 appointment_Key_List
 name
 price
 etc...

A general problem with this structure is that it is complicated and I'm not even sure if its appropriate to match one line item with multiple appointments. If Line Items can only be part of one Appointment, then I can actually just put a list of line items IN each Appointment, when I get Appointments, I'd already be getting Line Items.

A more specific problem is that I am using google app engine and if I want to query for a set of appointments and their associated line items, I'd have to first query for the set of appointments and then do a second query for the line items using the IN operator to test if any of the Line_Item's appointment keys fall into the set of appointment keys the were returned from the previous query. The second query will fail if I have more than 30 keys requiring me to shard the query. I could denormalize the data to avoid this complicated and extensive read query, and I will probably have to denormalize to some degree anyway, but I'd rather avoid complexity where appropriate.

My question is how is this type of situation usually modeled? Is it even appropriate for a Line Item to be paired with more than one appointment, or is it normal to simply split line items into separate ones for each appointment such as "1st half of 2 day job" and "2nd half of two day job." How do similar successful applications do this? What are the rules of thumb in this type of situation? What implementations have turned out to be less problematic?

Thanks!

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

似狗非友 2024-09-13 23:49:36

您建议的方法效果很好；您可以将订单项的“appointment_Key_list”建模为列表属性，它将按您的预期工作。您不必使用 IN 运算符 - 这是为了将数据存储中的单个值与您拥有的键列表进行匹配（例如，“WHERE datastore_column IN ('a', 'b', 'c')），而您正在执行相反的操作 - 将单个值与数据存储中的列表进行匹配，

但我建议相反的操作可能更适合您的任务：让每个约会都有一个行项目键列表。同样的方式，但要检索约会的所有数据，您首先要获取约会，然后使用约会实体中的键对行项目进行批量获取。如果您知道约会的键，那么您就可以了。从而避免了进行任何查询的需要，

我一直在试图向 Pindatjuh 解释为什么查询列表属性的效率并不比单值属性低，但显然需要更详细的描述，所以无需多说。，这里是...

关于 App Engine 数据存储区索引的简要入门

尽管 Python 和 Java 为数据存储区提供了各种高级接口，但数据存储区本身采用了较低级别的抽象，称为实体。实体由以下部分组成：

唯一的主键
（名称、值）对的列表

主键是您已经熟悉的数据存储区键。（名称、值）对的列表是 App Engine 对实体中数据的表示。到目前为止就这么简单。具有以下值的实体：

a_string = "Hello, world"
an_int = 123

将被序列化为类似以下内容：

[('a_string', 'Hello, world'), ('an_int', 123)]

但这如何与列表交互？嗯，列表被视为“多值”属性。也就是说，包含 n 个项目的列表存储为 n 个单独的属性。一个例子可能会让这一点更清楚：

a_string = "Hello, world"
an_int = 123
a_list_of_ints = [42, 314, 9]

将被序列化为：

[('a_string', 'Hello, world'), ('an_int', 123), ('a_list_of_ints', 42), ('a_list_of_ints', 314), ('a_list_of_ints', 9)]

如您所见，列表表示一系列值，所有值都具有相同的名称。当您从数据存储区加载数据时，SDK 会看到重复的值并将其转换为列表。

当它与索引交互时，这一点变得很重要。假设您在“a_string”和“an_int”上有一个索引。当您插入或修改一个值时，App Engine 会为其生成一组索引条目；对于上述索引和上述实体，它会在索引中生成一行，如下所示：（

('Hello, world', 123, a_key)

此处的“a_key”是原始实体的键的占位符。）当您执行使用此索引的查询时，它只需要对索引进行查找以查找具有适当前缀的行（例如，'SELECT * FROM Kind WHERE a_string = "Hello, world" ORDER BY an_int'）。

但是，当您为列表建立索引时，App Engine 会插入多个索引行。 “an_int”和“a_list_of_ints”上的索引将为上述实体生成这些行：

(123, 42, a_key)
(123, 314, a_key)
(123, 9, a_key)

同样，查询的工作方式与以前相同 - App Engine 只需在索引中查找具有正确前缀的行。列表中的条目数对查询速度没有影响，只影响生成和写入索引条目所需的时间。事实上，查询规划器不知道“a_list_of_ints”是一个多值属性 - 它只是将其视为任何其他索引条目。

简而言之：

在索引和查询术语中，包含一个元素的列表与单个属性之间没有实际区别。
索引列表的大小会影响索引所需的时间和空间，但不会影响查询所需的时间和空间。
您可以使用简单的等式过滤器执行查询，以匹配列表中具有给定值的任何实体。

The approach you're suggesting will work fine; you can model the line item's 'appointment_Key_list' as a list property and it will work as you expect. You don't have to use the IN operator - that's for matching a single value in the datastore against a list of keys you have (eg, "WHERE datastore_column IN ('a', 'b', 'c')), while you're doing the reverse - matching a single value against a list in the datastore.

I would suggest, though, that the reverse might be better suited to your task: Have each Appointment have a list of line item keys. This operates much the same way, but to retrieve all the data on an appointment, you instead first fetch the appointment, then do a bulk get on the line items, using the keys from the Appointment entity. If you know the key of the Appointment, you're thus avoiding the need to do any queries at all.

I've been trying to explain to Pindatjuh why querying a list property is no less efficient than a singly-valued one, but apparrently a more detailed description is required, so without any further ado, here is...

A brief primer on App Engine Datastore indexing

Although Python and Java provide various high level interfaces to the datastore, the datastore itself speaks a lower-level abstraction, called entities. An entity consist of the following:

A unique primary key
A list of (name, value) pairs

The primary key is the Datastore key you're already familiar with. The list of (name, value) pairs is App Engine's representation for the data in your entity. So far so straightforward. An entity with the following values:

a_string = "Hello, world"
an_int = 123

would be serialized to something resembling this:

[('a_string', 'Hello, world'), ('an_int', 123)]

But how does this interact with lists? Well, lists are treated as 'multiply valued' properties. That is, a list with n items is stored as n separate properties. An example probably makes this clearer:

a_string = "Hello, world"
an_int = 123
a_list_of_ints = [42, 314, 9]

will be serialized as:

[('a_string', 'Hello, world'), ('an_int', 123), ('a_list_of_ints', 42), ('a_list_of_ints', 314), ('a_list_of_ints', 9)]

As you can see, the list gets represented a series of values, all with the same name. When you load data from the datastore, the SDK sees the repeated value and turns it into a list.

Where this gets important is when it interacts with indexing. Suppose you have an index on 'a_string' and 'an_int'. When you insert or modify a value, App Engine generates a set of index entries for it; for the above index and the above entity, it generates a single row in the index that looks something like this:

('Hello, world', 123, a_key)

('a_key' here is a placeholder for the key of the original entity.) When you do a query that uses this index, it just needs to do a lookup on the index to find rows with the appropriate prefix (Eg, 'SELECT * FROM Kind WHERE a_string = "Hello, world" ORDER BY an_int').

When you index a list, though, App Engine inserts multiple index rows. An index on 'an_int' and 'a_list_of_ints' would generate these rows for the above entity:

(123, 42, a_key)
(123, 314, a_key)
(123, 9, a_key)

Again, querying works the same as it did previously - App Engine just has to look up the row with the correct prefix in the index. The number of entries in the list has no impact on how fast the query is - only on how long it took to generate and write the index entries. In fact, the query planner has no idea that 'a_list_of_ints' is a multiply valued property - it just treats it like any other index entry.

So in a nutshell:

There's no practical difference between a list with one element in it and an individual property, in indexing and querying terms
The size of an indexed list affects the time and space required for indexing, but not for querying.
You can do a query that matches any entity with a given value in a list using a simple equality filter.

回复收藏 0 原文

紫竹語嫣☆ 2024-09-13 23:49:36

解决此类问题的通常方法是对模型进行标准化，即采用第一范式。

您的模型（采用标准化形式）将有第三个表，其中引用了 Appointment 和 Line_Item 行：

Appointment
 start_time
 ...

Line_Item
 name
 price
 ...

Appointment_Line_Item
 appointment_key
 line_item_key

但是，有一个问题！由于您使用的是 Google App Engine，其数据存储区非常有限(" GQL 无法执行类似 SQL 的 JOIN”），并且大多需要非规范化。

您建议使用类似列表的字段。可以使用它，但是很难对其进行索引。在数据库中每行的列表中搜索键（appointment_key）并没有真正执行。我提出两种可能性：

重复Line_Item。
<前><代码>Line_Item
预约键
姓名
价格
完成的
...
当员工完成或未完成该项目时，Line_Item 应该具有完成 状态。如果员工尚未完成所有行项目，请将其标记为未完成，创建新约会并复制所有未完成的项目。您可以对所有 Line_Items 上的 appointment_key 字段建立索引，这是一件好事。但是，重复数据可能会出现问题。
Line_Item 的动态字段：
<前><代码>Line_Item
重复键
预约键
姓名
价格
完成的
...
为 Line_Item 创建一个新字段 duplicate_key，该字段指向另一个 Line_Item 或 null（保留此键！）。 Null 表示 Line_Item 是原始的，任何其他值表示此 Line_Item 是字段指向的 Line_Item 的重复项。 Line_Item 标记为重复项的所有字段都会继承原始 Line_Item 的字段（appointment_key 除外）：因此占用的存储空间更少。此外，此解决方案应该有 appointment_key 索引，以加快查找时间。这需要每个重复的 Line_Item 进行一次额外的查询，这可能会产生问题。

现在，这是一个明确的选择：要么更快的速度，要么更好的存储。我会选择第一个，因为它降低了模型的复杂性，并且存储对于现代系统来说从来不是问题。较低的复杂性通常意味着较少的错误和较低的开发/测试成本，这证明了存储需求的成本是合理的。

The usual solution for this kind of problems is normalizing the model, i.e. to the First Normal Form.

Your model, in normalized form, would have a third table, with references to the Appointment and Line_Item rows:

Appointment
 start_time
 ...

Line_Item
 name
 price
 ...

Appointment_Line_Item
 appointment_key
 line_item_key

There is a problem however! Since you are using Google App Engine, and their Datastore is quite limited ("GQL cannot perform an SQL-like JOIN") and mostly requires denormalization.

You suggested using a list-like field. It is a possiblity to use this, but it is very hard to index it. Searching for a key (the appointment_key) in a list per row in the database is not really performing. I propose two possiblities:

Duplicate Line_Item.
```
Line_Item
 appointment_key
 name
 price
 finished
 ...
```
A Line_Item should have a finished state, when the item was finished or not by the employee. If an employee hadn't finished all line items, mark them as unfinished, create a new appointment and copy all items that were unfinished. You can index on the appointment_key field on all Line_Items, which is a Good Thing. However, the duplicated data may be a problem.
Dynamic fields for Line_Item:
```
Line_Item
 duplicate_key
 appointment_key
 name
 price
 finished
 ...
```
Create a new field, duplicate_key, for Line_Item which points to another Line_Item or to null (reserve this key!). Null means that the Line_Item is original, any other value means that this Line_Item is a duplicate of the Line_Item the field points to. All fields of Line_Item marked as a duplicate inherit the fields of the original Line_Item, except the appointment_key: so it will take less storage. Also this solution should have appointment_key indexed, to speed up lookup times. This requires one additional query per duplicated Line_Item, which may be a problem.

Now, it's a clear choice: either better speed or better storage. I would go for the first, as it reduces complexity of your model, and storage is never a problem with modern systems. Less complexity generally means less bugs and less development/testing costs, which justifies the cost of the storage requirement.

回复收藏 0 原文

~没有更多了~