Database schema for a simple statistics project

Published 2024-10-09 11:41:20


Backdrop:

I have a file hierarchy of CSV files for multiple locations, organized in folders named by the dates they cover, by month specifically. Each CSV file in a folder is named after its location.

e.g.,
folder name: 2010-feb

contains:
location1.csv
location2.csv

Each CSV file holds records like this:

2010-06-28, 20:30:00 , 0
2010-06-29, 08:30:00 , 0
2010-06-29, 09:30:00 , 0
2010-06-29, 10:30:00 , 0
2010-06-29, 11:30:00 , 0

meaning of record columns ( column names ):

Date, time, # of sessions

I have a Perl script that pulls the data from this mess. Originally I was going to store the results as JSON files, but I'm thinking a database might be more appropriate long term: comparing year-to-year trends, fun stuff like that.

Pt 2 - My question/problem:

So I now have a REST service that coughs up json with a test database. My question is [ I suck at db design ], how best to design a database backend for this?

I am thinking the following tables would suffice and keep it simple:

Location: (PK)location_code, name 
session: (PK)id, (FK)location_code, month, hour, num_sessions
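The two proposed tables can be sketched as SQLite DDL. This is a minimal illustration in Python's built-in `sqlite3` module (the asker's pipeline is Perl, so the language is only for demonstration); column types and the sample row are assumptions, not part of the original question.

```python
import sqlite3

# In-memory database for illustration; a real setup would use a file path.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE location (
    location_code TEXT PRIMARY KEY,
    name          TEXT NOT NULL
);
CREATE TABLE session (
    id            INTEGER PRIMARY KEY,
    location_code TEXT NOT NULL REFERENCES location(location_code),
    month         TEXT,    -- e.g. '2010-06'
    hour          TEXT,    -- e.g. '08:30'
    num_sessions  INTEGER NOT NULL
);
""")
conn.execute("INSERT INTO location VALUES ('LOC1', 'Location One')")
conn.execute("INSERT INTO session (location_code, month, hour, num_sessions) "
             "VALUES ('LOC1', '2010-06', '08:30', 0)")
rows = conn.execute(
    "SELECT location_code, hour, num_sessions FROM session").fetchall()
print(rows)  # [('LOC1', '08:30', 0)]
```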

I need to be able to average sessions (plus min and max) for each hour across days of the week, and also restrict that to a given month or months. I've been using Perl hashes to do this and am trying to decide how best to implement it with a database.
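The per-hour, per-day-of-week aggregation maps directly onto SQL `GROUP BY` with `AVG`/`MIN`/`MAX`. One possible shape, assuming a single `ts` timestamp column (a hypothetical layout, not the schema above) so that SQLite's `strftime()` can derive the day of week:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical single-table layout with a full timestamp column `ts`,
# which makes day-of-week math possible via strftime().
conn.execute("""
CREATE TABLE session (
    location_code TEXT,
    ts            TEXT,     -- ISO-8601: 'YYYY-MM-DD HH:MM:SS'
    num_sessions  INTEGER
)""")
conn.executemany(
    "INSERT INTO session VALUES (?, ?, ?)",
    [("LOC1", "2010-06-28 20:30:00", 4),    # a Monday
     ("LOC1", "2010-06-29 08:30:00", 2),    # a Tuesday
     ("LOC1", "2010-07-06 08:30:00", 6)])   # the following Tuesday
# Average / min / max sessions per (day-of-week, hour) slot.
rows = conn.execute("""
    SELECT strftime('%w', ts) AS dow,       -- 0=Sunday .. 6=Saturday
           strftime('%H', ts) AS hour,
           AVG(num_sessions), MIN(num_sessions), MAX(num_sessions)
      FROM session
     WHERE location_code = 'LOC1'
     GROUP BY dow, hour
     ORDER BY dow, hour
""").fetchall()
print(rows)  # [('1', '20', 4.0, 4, 4), ('2', '08', 4.0, 2, 6)]
```

Restricting to a month or range of months is then just an extra `WHERE strftime('%Y-%m', ts) = '2010-06'`-style condition; no stored procedures are required for this.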

Do you think stored procedures should be used?

As to the database, depending on info gathered here, it will be PostgreSQL or SQLite.
If there is no compelling reason for PostgreSQL, I'll stick with SQLite.

How and where should I compare the data to hours of operation? I am storing the hours of operation in a YAML file. I currently 'match' the hour in the data against a hash built from the YAML. Would a database open up simpler methods? I am thinking I would do this comparison as I do now, then insert the data. It can be recalled with:

SELECT hour, num_sessions FROM session WHERE location_code = 'LOC1';
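One way a database does simplify the YAML matching: load the opening hours into a small table once, and let a join do the filtering that the Perl hash lookup does now. The `open_hours` table and its columns are hypothetical, sketched here to mirror the YAML data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (location_code TEXT, hour TEXT, num_sessions INTEGER);
-- Hypothetical table mirroring the YAML hours-of-operation data.
CREATE TABLE open_hours (location_code TEXT, hour TEXT);
""")
conn.executemany("INSERT INTO session VALUES (?, ?, ?)",
    [("LOC1", "07:30", 1),    # before opening: should be filtered out
     ("LOC1", "08:30", 2)])
conn.executemany("INSERT INTO open_hours VALUES (?, ?)",
    [("LOC1", "08:30"), ("LOC1", "09:30")])
# The join replaces the Perl-side hash lookup against the YAML file.
rows = conn.execute("""
    SELECT s.hour, s.num_sessions
      FROM session s
      JOIN open_hours o
        ON o.location_code = s.location_code AND o.hour = s.hour
     WHERE s.location_code = 'LOC1'
""").fetchall()
print(rows)  # [('08:30', 2)]
```

Filtering at insert time (as the asker does now) also works; the join approach just keeps the raw data intact and applies the business rule at query time.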

Since only hours of operation are present in the data, I don't need to worry about it there.
Should I calculate all results as I do now and store them in a stats table for
different 'reports', rather than processing on demand? How would this look?

Anyway ...I ramble.

Thanks for reading!

Bubnoff


涫野音 2024-10-16 11:41:20


From what I've read of SQLite, it offers the functions you need for the analysis you want (sum, avg, etc.), and it looks like you'll be doing that at your own API level, as opposed to letting an end user do it themselves through an interface. So for your simple design and small dataset, I would look at getting all your data into SQLite. I'd also store it in a format SQLite understands natively, so that you can use its SQL functions without converting anything first, and without creating special functions inside SQL to do the conversion.

Aside from that, your design looks fine to me except for the month and hour fields. I would keep those as full date and time fields, or possibly combine them into one date_time field if there's an appropriate SQLite storage format for that, and put the full date/time data in them (in case you need it later). Then use SQLite's date/time functions to extract the month and hour from the full date/time field as needed. As a convenience, and if SQLite supports it, you could create calculated fields for month and hour in the session table, which would let a query return the data you're looking for directly, instead of explicitly calling time-extraction functions in every query that wants a month or an hour.
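The "calculated fields" idea can be approximated portably with a view over a full `date_time` column, so callers never invoke `strftime()` themselves (generated columns, available in SQLite 3.31+, are the other option). A sketch with a hypothetical `session_v` view:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE session (
    id            INTEGER PRIMARY KEY,
    location_code TEXT,
    date_time     TEXT,    -- full 'YYYY-MM-DD HH:MM:SS' timestamp
    num_sessions  INTEGER
);
-- A view plays the role of "calculated fields": month and hour are
-- derived on the fly, so queries never call strftime() themselves.
CREATE VIEW session_v AS
    SELECT id, location_code, date_time,
           strftime('%Y-%m', date_time) AS month,
           strftime('%H', date_time)    AS hour,
           num_sessions
      FROM session;
""")
conn.execute("INSERT INTO session (location_code, date_time, num_sessions) "
             "VALUES ('LOC1', '2010-06-29 08:30:00', 3)")
row = conn.execute("SELECT month, hour, num_sessions FROM session_v").fetchone()
print(row)  # ('2010-06', '08', 3)
```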

Also, don't forget to put indexes on fields that you set criteria on in queries. You may not notice a difference with small data sets, but as your db gets bigger, they could make a huge difference.
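Adding such an index is one statement, and `EXPLAIN QUERY PLAN` verifies it is actually used. The index and table names below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE session "
             "(location_code TEXT, date_time TEXT, num_sessions INTEGER)")
# Index the columns that appear in WHERE clauses; location_code first,
# because queries filter on it most often.
conn.execute("CREATE INDEX idx_session_loc_ts "
             "ON session (location_code, date_time)")
# EXPLAIN QUERY PLAN shows whether SQLite chose the index for this filter.
plan = conn.execute("EXPLAIN QUERY PLAN "
                    "SELECT * FROM session WHERE location_code = 'LOC1'"
                    ).fetchall()
print(plan[0][-1])  # plan detail string mentions idx_session_loc_ts
```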
