Database design question - which is the best solution?

Posted on 2024-10-03 12:33:18

I'm using Firebird 2.1 and I'm looking for the best way to solve this issue.

I'm writing a calendaring application. Different users' calendar entries are stored in a big Calendar table. Each calendar entry can have a reminder set - only one reminder/entry.

Statistically, the Calendar table could grow to hundreds of thousands of records over time, while there are going to be far fewer reminders.

I need to query the reminders on a constant basis.

Which is the best option?

A) Store the reminders' info in the Calendar table (in which case I'm going to query hundreds of thousands of records for IsReminder = 1)

B) Create a separate Reminders table which contains only the ID of calendar entries which have reminders set, then query the two tables with a JOIN operation (or maybe create a view on them)

C) I can store all information about reminders in the Reminders table, then query only this table. The downside is that some information needs to be duplicated in both tables, like in order to show the reminder, I'll need to know and store the event's starttime in the Reminders table - thus I'm maintaining two tables with the same values.

What do you think?

And one more question: the Calendar table will contain the calendars of multiple users, separated only by a UserID field. Since there will be only 4-5 users, even if I put an index on this field, its selectivity is going to be very poor, which is not good for a table with hundreds of thousands of records. Is there a workaround here?

Thanks!


Comments (3)

哽咽笑 2024-10-10 12:33:18

There are advantages and drawbacks to all three choices. Which one is best depends on details you have not provided. In general, don't worry too much about selecting three or four entries out of a hundred thousand, provided the indexes you have set up allow the right retrieval strategy. If you don't understand indexing, you're likely to be in trouble no matter which of the three choices you make.

If it were me, I would go with choice B. I'd also store any attributes of a reminder in the table for reminders.
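A minimal sketch of what choice B could look like, assuming a schema the question does not spell out. SQLite (via Python's sqlite3 module) stands in for Firebird 2.1 here, so the DDL is illustrative rather than Firebird syntax, and every table and column name (calendar, reminders, minutes_before and so on) is hypothetical.

```python
import sqlite3

# SQLite stands in for Firebird; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (
        event_id   INTEGER PRIMARY KEY,
        user_id    INTEGER NOT NULL,
        title      TEXT    NOT NULL,
        start_time TEXT    NOT NULL
    );
    -- One row per entry that has a reminder; the primary key on event_id
    -- enforces "only one reminder per entry".
    CREATE TABLE reminders (
        event_id       INTEGER PRIMARY KEY REFERENCES calendar(event_id),
        minutes_before INTEGER NOT NULL
    );
""")
conn.executemany(
    "INSERT INTO calendar VALUES (?, ?, ?, ?)",
    [
        (1, 1, "Dentist", "2024-10-10 09:00"),
        (2, 1, "Lunch", "2024-10-10 12:00"),
        (3, 2, "Standup", "2024-10-11 10:00"),
    ],
)
conn.executemany("INSERT INTO reminders VALUES (?, ?)", [(1, 30), (3, 10)])

# The small reminders table drives the query; start_time is read from
# calendar through the JOIN instead of being copied.
rows = conn.execute("""
    SELECT c.event_id, c.user_id, c.start_time, r.minutes_before
    FROM reminders r
    JOIN calendar c ON c.event_id = r.event_id
    ORDER BY c.event_id
""").fetchall()
print(rows)
```

Reminder-only attributes such as minutes_before live in the reminders table, while facts about the event itself stay in calendar.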

Be very careful about whether you identify an event by EventId alone or by (UserId, EventId). If you choose the latter, it behooves you to use a compound primary key for the Event table. Don't worry too much about compound primary keys, especially with Firebird.
If you declare a compound primary key, be aware that declaring (UserId, EventId) will not have the same consequences as declaring (EventId, UserId). They are logically equivalent, but the structure of the automatically generated index will be different in the two cases.

This in turn will affect the speed of queries like "find all the reminders for a given user".
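The effect of column order can be observed in a query plan. The sketch below uses SQLite as a stand-in, so it shows SQLite's planner rather than Firebird's, and the table names are hypothetical. It declares the same compound key in both orders and asks for the plan of a lookup by user_id alone. With user_id as the leading column the key's index should support a direct search, while with event_id leading SQLite should fall back to scanning.

```python
import sqlite3

# SQLite stands in for Firebird; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE events_user_first (
        user_id  INTEGER NOT NULL,
        event_id INTEGER NOT NULL,
        PRIMARY KEY (user_id, event_id)
    );
    CREATE TABLE events_event_first (
        user_id  INTEGER NOT NULL,
        event_id INTEGER NOT NULL,
        PRIMARY KEY (event_id, user_id)
    );
""")


def plan(table):
    """Return SQLite's plan text for a lookup by user_id on the given table."""
    sql = f"EXPLAIN QUERY PLAN SELECT event_id FROM {table} WHERE user_id = ?"
    # The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
    return " | ".join(row[3] for row in conn.execute(sql, (1,)))


plan_user_first = plan("events_user_first")
plan_event_first = plan("events_event_first")
print(plan_user_first)
print(plan_event_first)
```

The exact wording of the plan text varies between SQLite versions, and Firebird's optimizer makes its own decisions, so the plan should be checked on the actual database as well.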

Again, if it were me, I'd avoid choice C. The introduction of harmful redundancy into a schema carries with it the responsibility for some very careful programming when you go to update the data. Otherwise, you can end up with a database that stores contradictory versions of the same fact in different places.
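To make that risk concrete, here is a small sketch of how choice C can drift out of sync. It again uses SQLite as a stand-in for Firebird, with hypothetical table and column names: the event's start time is copied into the reminders table, one code path updates only the calendar row, and a consistency query then finds the contradiction.

```python
import sqlite3

# SQLite stands in for Firebird; all names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (
        event_id   INTEGER PRIMARY KEY,
        start_time TEXT NOT NULL
    );
    -- Choice C: start_time is copied here so reminders can be read alone.
    CREATE TABLE reminders (
        event_id   INTEGER PRIMARY KEY REFERENCES calendar(event_id),
        start_time TEXT NOT NULL
    );
    INSERT INTO calendar  VALUES (1, '2024-10-10 09:00');
    INSERT INTO reminders VALUES (1, '2024-10-10 09:00');
""")

# The event is rescheduled, but this code path forgets the copy in reminders.
conn.execute(
    "UPDATE calendar SET start_time = '2024-10-10 11:00' WHERE event_id = 1"
)

# Consistency check: reminders whose copied start_time no longer matches.
stale = conn.execute("""
    SELECT r.event_id, c.start_time, r.start_time
    FROM reminders r
    JOIN calendar c ON c.event_id = r.event_id
    WHERE c.start_time <> r.start_time
""").fetchall()
print(stale)
```

Every update path would have to touch both tables (or a trigger would have to do it) to keep the two copies in agreement.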

And, if you really want to know the effect on performance, try all three ways, load with test data, and do your own benchmarks.

半世蒼涼 2024-10-10 12:33:18

I think you need to create realistic, fake user data and measure the difference with some typical queries you expect to run.

Indexing, query optimization and the types of query results you need can make a big difference,
so it's not easy to say what's best without knowing more.
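A rough sketch of such a measurement, under several assumptions: SQLite in memory stands in for Firebird, the table layout and names are hypothetical, and the 1% reminder rate is invented purely to produce "far fewer reminders than entries". It loads 100,000 fake entries for 5 users and times the option A query against the option B query. The timings say nothing about Firebird itself; the point is only the shape of the experiment.

```python
import random
import sqlite3
import time

random.seed(0)
N_EVENTS, N_USERS = 100_000, 5  # sizes loosely based on the question
REMINDER_RATE = 0.01            # assumed, not from the question

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE calendar (event_id INTEGER PRIMARY KEY, user_id INTEGER NOT NULL,"
    " start_time INTEGER NOT NULL, is_reminder INTEGER NOT NULL)"
)
conn.execute(
    "CREATE TABLE reminders (event_id INTEGER PRIMARY KEY REFERENCES calendar(event_id))"
)

# Fake entries: (event_id, user_id, start_time, is_reminder).
events = [
    (i, random.randrange(N_USERS), random.randrange(10**6),
     int(random.random() < REMINDER_RATE))
    for i in range(N_EVENTS)
]
conn.executemany("INSERT INTO calendar VALUES (?, ?, ?, ?)", events)
conn.executemany(
    "INSERT INTO reminders VALUES (?)", [(e[0],) for e in events if e[3]]
)
conn.execute("CREATE INDEX idx_calendar_reminder ON calendar (is_reminder)")


def timed(sql):
    """Run a query once and return its rows and the elapsed seconds."""
    start = time.perf_counter()
    rows = conn.execute(sql).fetchall()
    return rows, time.perf_counter() - start


# Option A: flag column in the big table.
rows_a, secs_a = timed(
    "SELECT event_id, start_time FROM calendar"
    " WHERE is_reminder = 1 ORDER BY event_id"
)
# Option B: separate reminders table joined back to calendar.
rows_b, secs_b = timed(
    "SELECT c.event_id, c.start_time FROM reminders r"
    " JOIN calendar c ON c.event_id = r.event_id ORDER BY c.event_id"
)
print(f"A: {len(rows_a)} rows in {secs_a:.4f}s, B: {len(rows_b)} rows in {secs_b:.4f}s")
```

A real benchmark would repeat each query many times, use the queries the application actually runs, and be executed against Firebird with realistic data volumes.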

关于从前 2024-10-10 12:33:18

When choosing Option (A) you should

  • provide an index on "IsReminder" (or a combined index on IsReminder, UserId, whatever fits best to your intended queries)
  • make sure your queries use this index
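The two points above can be sketched as follows, assuming a hypothetical calendar table and using SQLite as a stand-in for Firebird. The combined index is created on (is_reminder, user_id), and the query plan is then inspected to confirm that the index is actually picked up. In Firebird the corresponding check would be reading the PLAN the engine reports for the statement (for example with isql's SET PLAN).

```python
import sqlite3

# SQLite stands in for Firebird; table, column and index names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE calendar (
        event_id    INTEGER PRIMARY KEY,
        user_id     INTEGER NOT NULL,
        start_time  TEXT    NOT NULL,
        is_reminder INTEGER NOT NULL DEFAULT 0
    );
    CREATE INDEX idx_calendar_reminder_user ON calendar (is_reminder, user_id);
""")

query = (
    "SELECT event_id, start_time FROM calendar"
    " WHERE is_reminder = 1 AND user_id = ?"
)

# The fourth column of each EXPLAIN QUERY PLAN row is the readable detail.
plan_a = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query, (1,))
)
print(plan_a)

# True when the planner chose the combined index for this query.
uses_index = "idx_calendar_reminder_user" in plan_a
print(uses_index)
```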

Option B is preferable to A if you need to store more than a boolean flag for each reminder (for example, the number of minutes before the event at which the user should be notified). You should, however, estimate how often your program will have to JOIN the two tables.

If you can, avoid option C. If you don't want to benchmark all three cases, I suggest starting with A or B, depending on the circumstances described above; the solution you choose will probably be fast enough that you don't have to bother with the other cases.
