Database design for historical aggregation
I am trying to decide which data-warehouse-style design would be best. It will be used to find historical average prices of different items over different time periods using a Google-style search. For example, what was the average price of Stock A this month, and 3 months, 6 months, and 1 year ago? The issue is that I do not have an item name I can use; I only have description fields about the item.
This means that I can't aggregate items into views, since the same item may be listed 20 times, each time with a different description. Instead, I have to do a full-text search on the description field on the fly, grab the prices whose insert date is less than 3 months ago, and then take the average of those.
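A minimal in-memory sketch of that on-the-fly approach in Python, for illustration only. The `matches` helper is a hypothetical, naive stand-in for real full-text search (lower-cased whole-word matching, no stemming or punctuation handling), and the windows are assumed to be trailing 30/90/180/365 days counted back from a given `today`. `Listing` and `trailing_averages` are illustrative names, not part of any library.

```python
from datetime import date, timedelta
from statistics import mean
from typing import Iterable, NamedTuple, Optional


class Listing(NamedTuple):
    description: str
    price: float
    insert_date: date


def matches(description: str, query: str) -> bool:
    """Naive stand-in for full-text search: every query word must appear as a word."""
    words = set(description.lower().split())
    return all(term in words for term in query.lower().split())


def trailing_averages(listings: Iterable[Listing], query: str, today: date,
                      windows_days=(30, 90, 180, 365)) -> dict:
    """Average price of matching listings inserted in the last N days, per window."""
    hits = [item for item in listings if matches(item.description, query)]
    result: dict[int, Optional[float]] = {}
    for days in windows_days:
        cutoff = today - timedelta(days=days)
        prices = [item.price for item in hits if cutoff <= item.insert_date <= today]
        result[days] = mean(prices) if prices else None  # None when nothing matched
    return result
```

This scans every row for every search, which is exactly the cost the database design needs to contain once there are a few million rows.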
So is my best bet to have everything in one table like:
MAIN
----------------------------
ID | Description | Price | Date
or many tables:
DESCRIPTION
------------------
ID | Description |
PRICE
---------
ID | PRICE
and just join them to get the data I want? The database will contain a few million rows. If I had a way to get the real name of the item, I could consider pre-aggregating the data, but that is not an option for me. I appreciate any advice!
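A sketch of the single-table (MAIN) option using Python's built-in `sqlite3` module, assuming dates are stored as ISO-8601 `YYYY-MM-DD` text and that "this month" means the trailing month. One conditional-aggregation query returns all four windows in a single pass. `LIKE '%term%'` is only a placeholder for a real full-text index (for example SQLite FTS5 or the target database's own full-text search); with a leading wildcard it cannot use an ordinary B-tree index, so the date index is the only ordinary index that can narrow the scan. Table, column, and function names are hypothetical.

```python
import sqlite3


def create_main(conn: sqlite3.Connection) -> None:
    """Single-table design: one row per listing, dates stored as ISO-8601 text."""
    conn.execute("""
        CREATE TABLE main (
            id          INTEGER PRIMARY KEY,
            description TEXT NOT NULL,
            price       REAL NOT NULL,
            insert_date TEXT NOT NULL  -- 'YYYY-MM-DD' compares correctly as text
        )""")
    conn.execute("CREATE INDEX ix_main_insert_date ON main (insert_date)")


# Conditional aggregation: AVG ignores NULLs, so each CASE restricts one column
# to its own window while the WHERE clause keeps only the widest window.
WINDOW_QUERY = """
SELECT
    AVG(CASE WHEN insert_date >= date(:today, '-1 month')   THEN price END),
    AVG(CASE WHEN insert_date >= date(:today, '-3 months')  THEN price END),
    AVG(CASE WHEN insert_date >= date(:today, '-6 months')  THEN price END),
    AVG(CASE WHEN insert_date >= date(:today, '-12 months') THEN price END)
FROM main
WHERE insert_date >= date(:today, '-12 months')   -- widest window; can use the date index
  AND description LIKE '%' || :term || '%'         -- placeholder for full-text search
"""


def window_averages(conn: sqlite3.Connection, term: str, today: str) -> dict:
    """Return {'1m', '3m', '6m', '12m'} -> average price (None if no rows matched)."""
    row = conn.execute(WINDOW_QUERY, {"term": term, "today": today}).fetchone()
    return dict(zip(("1m", "3m", "6m", "12m"), row))
```

Note that substring matching is loose: `'stock a'` would also match a description such as "Stock Alpha", which is one reason a proper full-text index is preferable.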
Comments (2)
I'd say option 2: keep the top-level details in the "description" table and the historical data in the "price" table (albeit with a Date field added to capture the temporal value).
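A sketch of option 2 with the Date field this answer suggests, again using `sqlite3` with hypothetical names. The text filter runs against `description`, the time window against `price`, and the two are joined on the description's ID. The window is passed as an SQLite date modifier such as `'-3 months'`; `LIKE` is once more only a placeholder for full-text search.

```python
import sqlite3

SCHEMA = """
CREATE TABLE description (
    id          INTEGER PRIMARY KEY,
    description TEXT NOT NULL
);
CREATE TABLE price (
    id             INTEGER PRIMARY KEY,
    description_id INTEGER NOT NULL REFERENCES description (id),
    price          REAL    NOT NULL,
    price_date     TEXT    NOT NULL   -- the added Date field, ISO-8601 'YYYY-MM-DD'
);
-- Supports the time-window filter and the join back to description.
CREATE INDEX ix_price_date ON price (price_date, description_id);
"""

AVG_QUERY = """
SELECT AVG(p.price)
FROM description AS d
JOIN price AS p ON p.description_id = d.id
WHERE d.description LIKE '%' || :term || '%'   -- placeholder for full-text search
  AND p.price_date >= date(:today, :window)
"""


def average_price(conn: sqlite3.Connection, term: str, today: str, window: str):
    """Average price for matching descriptions dated on or after today + window."""
    params = {"term": term, "today": today, "window": window}
    return conn.execute(AVG_QUERY, params).fetchone()[0]
```

For example, `average_price(conn, "stock a", "2024-06-15", "-3 months")` averages prices recorded on or after 2024-03-15 for any description containing "stock a".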
As Joel suggested, option 2 is likely to give you more flexibility. I would suggest including additional dates in each table to accommodate slowly changing dimensions: the description and other attributes of a given item may change over time.
In the case of a brick-and-mortar retailer, you would quite likely include the Store ID as well, because the same item is often priced differently in different locations due to competition and the demographic makeup of the customers near each location.
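One way to sketch both suggestions is a Type 2 slowly changing dimension: each change to a description closes the current row (`valid_to`) and inserts a new version with its own surrogate key, while each price row carries a `store_id` and points at the version in effect when it was recorded. This assumes some stable per-listing reference exists to tie versions together (`item_ref` below is hypothetical, and the question suggests nothing like it may be available) and that prices are loaded as they arrive. All names are illustrative, and `sqlite3` stands in for the real warehouse.

```python
import sqlite3

SCHEMA = """
CREATE TABLE description (
    id          INTEGER PRIMARY KEY,  -- surrogate key: one row per version
    item_ref    TEXT NOT NULL,        -- hypothetical stable listing reference
    description TEXT NOT NULL,
    valid_from  TEXT NOT NULL,        -- ISO-8601 dates
    valid_to    TEXT                  -- NULL while this version is current
);
CREATE TABLE price (
    description_id INTEGER NOT NULL REFERENCES description (id),
    store_id       INTEGER NOT NULL,
    price          REAL    NOT NULL,
    price_date     TEXT    NOT NULL
);
"""


def record_description(conn, item_ref, text, as_of):
    """Type 2 change: close the current version (if any), open a new one, return its key."""
    conn.execute(
        "UPDATE description SET valid_to = ? WHERE item_ref = ? AND valid_to IS NULL",
        (as_of, item_ref),
    )
    cur = conn.execute(
        "INSERT INTO description (item_ref, description, valid_from) VALUES (?, ?, ?)",
        (item_ref, text, as_of),
    )
    return cur.lastrowid


def record_price(conn, item_ref, store_id, price, price_date):
    """Attach a price to the version that is current now (assumes in-order loading)."""
    (desc_id,) = conn.execute(
        "SELECT id FROM description WHERE item_ref = ? AND valid_to IS NULL",
        (item_ref,),
    ).fetchone()
    conn.execute("INSERT INTO price VALUES (?, ?, ?, ?)",
                 (desc_id, store_id, price, price_date))


PER_STORE_QUERY = """
SELECT p.store_id, AVG(p.price)
FROM price AS p
JOIN description AS d ON d.id = p.description_id
WHERE d.description LIKE '%' || :term || '%'   -- placeholder for full-text search
  AND p.price_date >= date(:today, :window)
GROUP BY p.store_id
ORDER BY p.store_id
"""


def average_by_store(conn, term, today, window):
    """List of (store_id, average price) for matching descriptions in the window."""
    params = {"term": term, "today": today, "window": window}
    return conn.execute(PER_STORE_QUERY, params).fetchall()
```

Because each price row points at the description version in effect when it was recorded, a search for the old wording still finds the older prices, while a search for the new wording finds only prices recorded after the change.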