
When optimizing my "events" table, should I care more about the number of fields or the number of related tables?

Posted 2024-12-27 12:18:10


This question is a follow-up to a previous question I asked about how best to model different kinds of time quantities and timeframes:
In a database, how to store event occurrence dates and timeframes for fast/elegant querying?

Given a table of events, I'd like the simplest way to model and query events that have these kinds of occurrences:

One-time: XY Rock band has a show on Dec. 12, 2014 at the Rockhouse
Annually: Volunteer at the soup kitchen on Thanksgiving morning
Monthly: Free night at the MoMA every first Saturday
Weekly: Regular business hours

I've been kicking around doing a schema in this form:

Name
Description
start_datetime
end_datetime
frequency_type (string, e.g. 'Weekly', 'Monthly')
mon (boolean)
tues
wed
thu
fri
sat
sun (all booleans)
schedule (text)
frequency_description (text)

A common use case I foresee: on a given Tuesday, say 4/5/2016, I want to find everything that is happening on that day, including all businesses that are open on regular Tuesdays, anything that happens monthly on a Tuesday, and anything happening on that specific date.

So the pseudocode query would be something like:

SELECT * FROM events WHERE tues = TRUE OR DATE(start_datetime) = '2016-04-05';

At the application/controller level I could apply the necessary logic to exclude all "monthly" Tuesday events that don't happen on the first Tuesday, using a key/value store in frequency_description. (For discussion's sake, I'm going to ignore the "annual" edge case in which something happens every fourth Thursday of November or some such thing.) It'd be nice to do that exclusion in the query, but I'm not sure how to design the table to allow that and still keep a simple SELECT.
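
The application-level exclusion described above could look roughly like the following Python sketch. It is only an illustration under assumptions: the `nth` key stands in for whatever would be parsed out of frequency_description, the `start_date` key stands in for the date part of start_datetime, and the 'One-time' value for frequency_type is a guess (the question only gives 'Weekly' and 'Monthly' as examples).

```python
from datetime import date

# Column names from the proposed schema, in date.weekday() order (Monday = 0).
DAY_COLUMNS = ["mon", "tues", "wed", "thu", "fri", "sat", "sun"]


def week_of_month(d):
    """Return 1 if d is the first occurrence of its weekday in the month, 2 if the second, etc."""
    return (d.day - 1) // 7 + 1


def occurs_on(event, d):
    """Decide whether an event row (a dict here) actually happens on date d."""
    freq = event["frequency_type"]
    if freq == "One-time":
        # Hypothetical key: the date part of start_datetime.
        return event["start_date"] == d
    # Recurring events must have the boolean for this weekday set.
    if not event.get(DAY_COLUMNS[d.weekday()], False):
        return False
    if freq == "Weekly":
        return True
    if freq == "Monthly":
        # Hypothetical key: "nth" would come from frequency_description.
        return event["nth"] == week_of_month(d)
    return False  # annual and other cases are ignored, as in the question


cafe = {"frequency_type": "Weekly", "tues": True}
first_tuesday_meetup = {"frequency_type": "Monthly", "tues": True, "nth": 1}
show = {"frequency_type": "One-time", "start_date": date(2016, 4, 5)}
```

With these rows, `occurs_on(first_tuesday_meetup, date(2016, 4, 5))` is true, while the same call for `date(2016, 4, 12)` (the second Tuesday) is false, which is exactly the filtering the simple SELECT cannot do on its own.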

I'm also predicting that I won't need a query that finds all businesses open on Tuesday at 9AM. So the individual day fields can just be space-efficient booleans, with the schedule field serving as a store for my non-normalized, specific information. The application will have logic to parse and format it for display.

Is this overkill? Let's say 70% of my events will be one-time, which eliminates the need for the mon, tues, wed, etc. fields and for the schedule and frequency_description text key stores...

Should I instead have two tables? One for events, and one for some kind of event_relation in which the day_fields and key-store-textfields are joined?

That seems like a more efficient use of space...on the other hand, my query would have to be a SELECT and JOIN...which may be slower.
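
For concreteness, here is a minimal sketch of that two-table layout using Python's built-in sqlite3 module. The table name event_relation comes from the question; the exact column split, the sample rows, and the use of SQLite instead of the asker's actual database are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    start_datetime TEXT,
    end_datetime   TEXT
);
-- Only recurring events get a row here; one-time events have none.
CREATE TABLE event_relation (
    event_id              INTEGER PRIMARY KEY REFERENCES events(id),
    frequency_type        TEXT NOT NULL,
    mon  INTEGER DEFAULT 0, tues INTEGER DEFAULT 0, wed INTEGER DEFAULT 0,
    thu  INTEGER DEFAULT 0, fri  INTEGER DEFAULT 0, sat INTEGER DEFAULT 0,
    sun  INTEGER DEFAULT 0,
    schedule              TEXT,
    frequency_description TEXT
);
""")

# Made-up sample data.
conn.executemany(
    "INSERT INTO events (id, name, start_datetime) VALUES (?, ?, ?)",
    [
        (1, "One-off show", "2016-04-05 20:00:00"),
        (2, "Corner cafe", None),
        (3, "Saturday market", None),
    ],
)
conn.executemany(
    "INSERT INTO event_relation (event_id, frequency_type, tues, sat) VALUES (?, ?, ?, ?)",
    [(2, "Weekly", 1, 0), (3, "Weekly", 0, 1)],
)

# LEFT JOIN keeps one-time events, which have no event_relation row.
rows = conn.execute(
    """
    SELECT e.name
    FROM events AS e
    LEFT JOIN event_relation AS r ON r.event_id = e.id
    WHERE r.tues = 1 OR DATE(e.start_datetime) = ?
    ORDER BY e.name
    """,
    ("2016-04-05",),
).fetchall()
names = [name for (name,) in rows]
```

The query is only one JOIN longer than the single-table version, and the join is on a primary key, so the difference is mostly in how the schema reads rather than in the shape of the SELECT.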

When dealing with record counts on the order of 10k to 100k, and doing simple EC2 hosting, should I care more about efficient space usage in my database (not just pure data storage space, but all the overhead associated with text fields and numerous columns), or should I care more about simple SELECT statements?


两人的回忆 2025-01-03 12:18:10


You could just have your recurring events insert rows into the 'one-off' event table, each with a key referencing back to the master recurring event record (in a separate table).

While that's not very good for space usage, you can take some shortcuts: for events that occur "every Tuesday from now to the end of all time", the end time might default to, say, 200 years in the future. That means you're only populating about 10k records (52 * 200) for one event in this extreme case.

This would simplify your reading greatly as you would then just be looking for any 'event' that occurs on that date, and then you would do all your excludes based on the master recurring event table record.

So you have something like this:

Events table = Your current schema
Event occurrence table = {event_id, start_datetime, end_datetime}

Suppose you have 1000 weekly recurring events (and we assume you go with 200 years if there is no end date): that's going to be about 10M records (1000 * 52 * 200). You then index the start_datetime field of the Event occurrence table, and your query will be very quick even with many more records than this. Compare the costs of this (reduced performance on writes and more space used) with having to find every event for which today falls between its start date and end date and then calculating whether the event actually occurs today.
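
A rough sketch of this occurrence-table idea, again with Python's sqlite3 module: the table name event_occurrences, the index name, the 09:00 to 17:00 hours, and the sample event are all made up for illustration, and a real system would likely generate rows in batches rather than all at once.

```python
import sqlite3
from datetime import date, timedelta


def weekly_occurrences(start, end):
    """Yield every date from start to end (inclusive) in 7-day steps."""
    d = start
    while d <= end:
        yield d
        d += timedelta(days=7)


conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE events (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE event_occurrences (
    event_id       INTEGER NOT NULL REFERENCES events(id),
    start_datetime TEXT NOT NULL,
    end_datetime   TEXT NOT NULL
);
CREATE INDEX idx_occurrences_start ON event_occurrences (start_datetime);
""")

conn.execute("INSERT INTO events VALUES (1, 'Corner cafe, open Tuesdays')")

start = date(2016, 1, 5)                    # a Tuesday
end = start.replace(year=start.year + 200)  # the "200 years" default horizon
rows = [
    (1, f"{d} 09:00:00", f"{d} 17:00:00")   # str(date) is ISO 'YYYY-MM-DD'
    for d in weekly_occurrences(start, end)
]
conn.executemany("INSERT INTO event_occurrences VALUES (?, ?, ?)", rows)


def events_on(day):
    """Names of events with an occurrence starting on the given day."""
    # A range predicate on the bare column lets the index be used.
    lo = f"{day} 00:00:00"
    hi = f"{day + timedelta(days=1)} 00:00:00"
    cur = conn.execute(
        """
        SELECT e.name
        FROM event_occurrences AS o
        JOIN events AS e ON e.id = o.event_id
        WHERE o.start_datetime >= ? AND o.start_datetime < ?
        ORDER BY e.name
        """,
        (lo, hi),
    )
    return [name for (name,) in cur]


occurrence_count = len(rows)
tuesday_hits = events_on(date(2016, 4, 5))
wednesday_hits = events_on(date(2016, 4, 6))
```

One weekly event over 200 years produces a little over 10,400 rows here, in line with the 52 * 200 estimate, and the date lookup needs no day-of-week flags or recurrence logic at read time.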

In the end it all comes down to:

'how much does space cost you?'
'how often are you going to update records (and do you want to update all records including historical records for an event)?'
and 'how often are you going to want to run a select on a specific date?' (likely very often)
