customer.pk_name joined to transactions.fk_name vs. customer.pk_id [serial] joined to transactions.fk_id [integer]

Published 2024-09-06 06:29:53


Pawnshop Application (any RDBMS):

one-to-many relationship where each customer (master) can have many transactions (detail).

customer(
id serial,
pk_name char(30), {PATERNAL-NAME MATERNAL-NAME, FIRST-NAME MIDDLE-NAME-INITIAL}
[...]
);
unique index on id;
unique cluster index on pk_name;


transaction(
fk_name char(30),
tran_type char(1), 
ticket_number serial,
[...]
);
dups cluster index on fk_name;
unique index on ticket_number; 

Several people have told me this is not the correct way to join master to detail. They said I should always join customer.id[serial] to transactions.id[integer].

When a customer pawns merchandise, the clerk queries the master using wildcards on the name. The query usually returns several customers; the clerk scrolls until locating the right name, enters 'D' to change to the detail transactions table, all transactions are automatically queried, and then the clerk enters 'A' to add a new transaction.

The problem with joining customer.id to transaction.fk_id is that although the customer table is maintained in sorted name order, clustering the transaction table by fk_id groups the transactions by fk_id, which is not the same order as the customer names. So when the clerk is scrolling through customer names in the master, the system has to jump all over the place to locate the clustered transactions belonging to each customer. As each new customer is added, the next id is assigned to that customer, but new customers don't show up in alphabetical order. I experimented with id joins and confirmed the decrease in performance.

The drawbacks of using name joins vs. id joins are that if you change a customer's name, the join with their transactions is severed, so I don't allow updating the name. Anyway, how often does one need to change a customer's name? The other drawback is that the name requires 30 chars where an id is an INT, so the .dat and .idx files are larger. Every morning an SQL procedure is executed which unloads customers and transactions in sorted name order, drops/re-creates the tables, loads the unloaded data, and re-creates all indexes, which keeps performance optimized.

How can I use id joins instead of name joins and still preserve the clustered transaction order by name if transactions has no name column?
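For comparison, here is a minimal sketch of the id-join design the commenters recommend, written against SQLite via Python rather than Informix (the detail table is named txn because transaction is a reserved word in SQLite; column names are illustrative). The detail rows carry only the integer customer_id, and a composite index on (customer_id, trx_date) keeps each customer's tickets adjacent in the index even though the rows themselves sit in insertion order:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Conventional master/detail schema: detail rows carry only the surrogate key.
cur.execute("""CREATE TABLE customer (
    id INTEGER PRIMARY KEY,
    pk_name CHAR(30) NOT NULL
)""")
cur.execute("""CREATE TABLE txn (
    ticket_number INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id),
    trx_date TEXT NOT NULL
)""")
# Composite index: all of one customer's tickets are adjacent in the
# index, so the engine does one range scan per customer, not N seeks.
cur.execute("CREATE INDEX txn_cust_idx ON txn (customer_id, trx_date)")

cur.executemany("INSERT INTO customer VALUES (?, ?)",
                [(2, "ACEVEDO BERMUDEZ, FRANCISCO J."),
                 (3, "ANDUJAR RODRIGUEZ, WILFREDO C."),
                 (1, "CASTILLO DIAZ, FRANKLIN J.")])
cur.executemany("INSERT INTO txn VALUES (?, ?, ?)",
                [(123456, 2, "2010-01-05"), (123789, 2, "2010-02-11"),
                 (101010, 3, "2010-01-07"), (121212, 3, "2010-03-02")])

# Wildcard search on the master, then detail lookup through the id join:
rows = cur.execute("""
    SELECT c.pk_name, t.ticket_number
      FROM customer c JOIN txn t ON t.customer_id = c.id
     WHERE c.pk_name LIKE 'ACEVEDO%'
     ORDER BY t.trx_date
""").fetchall()
print(rows)  # both of this customer's tickets, 123456 then 123789
```

The point of the sketch is that the sort order the clerk sees comes from the index, not from the physical order of the .dat file.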

The following is an example of how the data sits in customer.dat and transactions.dat when using pk/fk name, as described in the above schema:

customer.id customer.pk_name               transaction.fk_name            transaction.ticket_number
----------- ------------------------------ ------------------------------ -------------
          2|ACEVEDO BERMUDEZ, FRANCISCO J. ACEVEDO BERMUDEZ, FRANCISCO J.|123456
                                           ACEVEDO BERMUDEZ, FRANCISCO J.|123789

          3|ANDUJAR RODRIGUEZ, WILFREDO C. ANDUJAR RODRIGUEZ, WILFREDO C.|101010
                                           ANDUJAR RODRIGUEZ, WILFREDO C.|121212

          1|CASTILLO DIAZ, FRANKLIN J.     CASTILLO DIAZ, FRANKLIN J.    |232323
                                           CASTILLO DIAZ, FRANKLIN J.    |343434

So, when the clerk wildcard-queries by customer master name, the customer's transactions are automatically queried and quickly displayed as the clerk scrolls through the names returned in the current list, since they are in the same sorted order as the master.

Now, the following example is the same data using pk/fk id:

customer.pk_id customer.name                  transactions.fk_id transactions.ticket_#
-------------- ------------------------------ ------------------ ---------------------
             2|ACEVEDO BERMUDEZ, FRANCISCO J.                  1|232323
                                                               1|343434

             3|ANDUJAR RODRIGUEZ, WILFREDO C.                  2|123456
                                                               2|123789

             1|CASTILLO DIAZ, FRANKLIN J.                      3|101010
                                                               3|121212

OK, so now keep in mind that my PERFORM one-page screen includes all customer columns and all transaction columns, and there's a master/detail instruction: when the clerk queries by customer name, the first transaction row belonging to that customer is automatically displayed. Then the clerk will press 'D' to make transactions the active table and press 'A' to add a new transaction, or the clerk may scroll through all the customer's transactions to update one in particular or just provide the customer with info.

When using the pk/fk name method, as the clerk scrolls through customer names to locate the desired customer, response is immediate. Whereas when using the pk/fk id method, response time lags, even with supporting indexes, because the engine has to jump to different locations in the transaction table to locate the corresponding group of transactions belonging to each customer as the clerk scrolls through each customer name in the master!

So, it seems that having each customer's transaction rows grouped together, in the same sorted order as the customer rows, allows the indexing to locate the transactions quicker, as opposed to having to jump all over scattered groups of each customer's transactions.
If each customer could remember their customer ID number, then my issue would be academic, but in the real world, we even gave each customer an ID card with their customer number on it, and most of them lost their cards!

Here's an example of the daily reorg executed every morning before pawnshop opens for business:

 {ISQL-SE customer and transaction table reorg procedure - once daily,
  before start of business}

 unload to "U:\UNL\CUSTOMERS.UNL"
    select * from customer
  order by customer.pk_name; 

 unload to "U:\UNL\TRAN_ACTIVES.UNL" 
    select * from transaction where transaction.status = "A" 
  order by transaction.fk_name, transaction.trx_date; 

 unload to "U:\UNL\TRAN_INACTIVES.UNL" 
    select * from transaction
     where transaction.status != "A" 
       and transaction.trx_date >= (today - 365) 
  order by transaction.fk_name, transaction.trx_date desc; 

 unload to "U:\UNL\TRAN_HISTORIC.UNL" 
    select * from transaction 
     where transaction.status != "A" 
       and transaction.trx_date < (today - 365) 
  order by transaction.trx_date desc; 

 drop table customer;     

 drop table transaction;

 create table customer
 (
  id serial,
  pk_name char(30),
  [...]
 ) 
 in "S:\PAWNSHOP.DBS\CUSTOMER";


 create table transaction
 ( 
  fk_name char(30),
  ticket_number serial,
  tran_type char(1), 
  status char(1), 
  trx_date date, 
  [...]
 )
 in "S:\PAWNSHOP.DBS\TRANSACTION"; 

 load from "U:\UNL\CUSTOMERS.UNL"      insert into customer;    {>4800 nrows}
 load from "U:\UNL\TRAN_ACTIVES.UNL"   insert into transaction; {500:600 nrows avg.} 
 load from "U:\UNL\TRAN_INACTIVES.UNL" insert into transaction; {6500:7000 nrows avg.} 
 load from "U:\UNL\TRAN_HISTORIC.UNL"  insert into dss:historic;{>500K nrows} 

 create unique cluster index cust_pk_name_idx on customer(pk_name);
 create        cluster index tran_cust_idx    on transaction(fk_name); 

 {This groups each customer's transactions together: actives in
  oldest trx_date order first, then inactive transactions within the last year in
  most recent trx_date order. Inactives older than 1 year are loaded into the
  historic table in a separate database, on a separate hard disk. Historic table
  optimization is done on a weekly basis for DSS queries.}

 create unique index tran_ticket_num_idx on transaction(ticket_number); 
 create        index tran_trx_date_idx   on transaction(trx_date); 
 create        index tran_status_idx     on transaction(status); 
 [...;]

 [grant statements...;] 

 update statistics; 

If you have time, I CHALLENGE ANYONE TO TEST THIS! It's more noticeable when you have a large table.
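In the spirit of that challenge, here is a rough micro-benchmark sketch (SQLite via Python, synthetic data, illustrative names; the absolute timings mean nothing, only the ratio matters). It times detail lookups through an integer foreign key before and after a composite supporting index is created; the "jumping all over the place" cost largely disappears once the index exists:

```python
import random
import sqlite3
import string
import time

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, pk_name TEXT)")
cur.execute("CREATE TABLE txn (ticket INTEGER PRIMARY KEY,"
            " customer_id INTEGER, trx_date TEXT)")

random.seed(1)
names = ["".join(random.choices(string.ascii_uppercase, k=12))
         for _ in range(5000)]
cur.executemany("INSERT INTO customer VALUES (?, ?)", enumerate(names, 1))
cur.executemany("INSERT INTO txn VALUES (?, ?, ?)",
                [(t, random.randint(1, 5000), "2010-01-01")
                 for t in range(1, 50001)])

def probe():
    """Time 500 random per-customer detail lookups."""
    t0 = time.perf_counter()
    for cid in random.sample(range(1, 5001), 500):
        cur.execute("SELECT ticket FROM txn WHERE customer_id = ?",
                    (cid,)).fetchall()
    return time.perf_counter() - t0

before = probe()   # each lookup is a full scan of txn
cur.execute("CREATE INDEX txn_cust_idx ON txn (customer_id, ticket)")
after = probe()    # each lookup is one index range scan
print(f"no index: {before:.3f}s, with index: {after:.3f}s")
```

This doesn't reproduce the asker's Informix clustering scenario, but it shows how much of the gap a plain composite index closes on its own.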


3 Answers

一袭白衣梦中忆 2024-09-13 06:29:53


They're right. Joining on a CHAR(30) text field - particularly one containing person name data - will be slow, grossly inefficient and incredibly fragile. People do change their names (marriage is the obvious example), and multiple people can have the same name.

You want to create appropriate indexes on your tables to support the order you want data to appear in, and forget clustering. Your performance optimisation procedure sounds like a disaster looking for a place to happen. Sorry, but dropping/creating the tables like that is asking for trouble.

I would start with a UNIQUE INDEX on customer.id, a UNIQUE INDEX on transaction.ticket_number, and an INDEX (for performance rather than cardinality, so enforcing uniqueness is not terribly important) on transaction (id, ticket_number DESC), and take it from there. Data is returned from the transaction table in the order it appears in the index.

I would only consider clustering when all other avenues of query optimisation have been exhausted.
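A sketch of the index set this answer suggests, again in SQLite via Python rather than the asker's Informix (names are illustrative; the detail table is called txn because transaction is reserved in SQLite, and INTEGER PRIMARY KEY already supplies the two unique indexes). The composite index with a DESC key hands back each customer's tickets newest-first without a sort:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, pk_name TEXT)")
cur.execute("CREATE TABLE txn (ticket_number INTEGER PRIMARY KEY,"
            " customer_id INTEGER)")

# The performance index from the answer: per-customer tickets,
# newest ticket number first, straight off the index.
cur.execute("CREATE INDEX txn_cust_ticket_idx"
            " ON txn (customer_id, ticket_number DESC)")

cur.executemany("INSERT INTO txn VALUES (?, ?)",
                [(101010, 3), (121212, 3), (123456, 2), (123789, 2)])

tickets = [r[0] for r in cur.execute(
    "SELECT ticket_number FROM txn WHERE customer_id = ?"
    " ORDER BY ticket_number DESC", (3,))]
print(tickets)  # [121212, 101010]
```

No clustering is involved; the ORDER BY is satisfied by the index itself.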

薆情海 2024-09-13 06:29:53


You'll hit some problems with people with long names which won't fit in the CHAR(30), especially if you are including a full middle name.

I think you are overly concerned with clustering transactions by name. In the scenario you described, you pick a list of customers (so I can see some requirement for making customers easily accessible by name, though an index should be sufficient). Then the transactions are accessed for a particular customer, so it shouldn't matter whether they are clustered by the customer id or customer name.

橪书 2024-09-13 06:29:53


The number of records you have in your database is trivial for any of the products you mention. A properly structured database would have no problem returning the transactions by ID.

Properly structured in this case means that the ID column is a primary key in the customer table and a foreign key in the transaction table. In general foreign keys are automatically indexed, but if you're using a product in which that doesn't happen, the customer_id column in the transaction table must be indexed. Do not include the name field in the transaction table.

Assuming you're using an index, do not worry about the database "jumping all over the place". Databases are not such simple pieces of software that they operate in that fashion.
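On the point about foreign keys not being indexed automatically in every product: SQLite happens to be one engine that does not index them for you, and EXPLAIN QUERY PLAN makes the difference visible. A small hypothetical sketch (illustrative names, not the asker's Informix setup):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("PRAGMA foreign_keys = ON")
cur.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, pk_name TEXT)")
cur.execute("""CREATE TABLE txn (
    ticket_number INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customer(id)
)""")

# Without an index on the FK column, the lookup is a full table scan.
# (EXPLAIN QUERY PLAN rows end with a human-readable detail string.)
plan_before = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM txn WHERE customer_id = 7"
).fetchall()[0][-1]
print(plan_before)   # e.g. "SCAN txn"

cur.execute("CREATE INDEX txn_cust_idx ON txn (customer_id)")
plan_after = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM txn WHERE customer_id = 7"
).fetchall()[0][-1]
print(plan_after)    # e.g. a SEARCH using txn_cust_idx
```

So the answer's caveat applies here: on such a product the customer_id column must be indexed by hand.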
