在mysql股票交易数据库中选择主键、代理键、索引
数据库类型:mysql
列: 日期、时间、价格1、数量1、价格2、数量2 时间将以毫秒为单位 一个月的记录数约为 550 万条。
我无法选择日期作为主键,因为它不是唯一的,但可以选择组合的日期和时间,但这也不是一个好主意。
我将运行类似的查询 在“此日期和时间”和“该日期和时间”之间选择价格和数量,结果可能在数百万范围内。
什么是主键、索引和代理键的最佳选择,以及实现这一点的最佳方法是什么。我应该如何优化数据库。
Database type: mysql
Columns:
Date,time,price1,qty1,price2,qty2
time will be in milliseconds
number of records approx 5.5 million for a month.
I cant choose date as primary key as it is not unique, but can choose date and time as combined but that is also not a good idea.
i will be running queries like
select price and qty between 'this date and time' and 'that date and time' and result might be in millions range.
what could be the best choice in terms of primary key, index and surrogate key and what is the best way to implement this. how should i optimize the database.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(2)
不知道为什么你说选择日期和时间是一个坏主意(你反对复合键吗?)
对你来说更大的问题是时间不存储毫秒。有关详细信息,请参阅此错误:http://bugs.mysql.com/bug。 php?id=8523
此外,识别股票的键中似乎缺少某些内容,例如股票代码。由于股票代码会随着时间的推移而变化,因此引入一个替代项(例如 StockID)可能是一个好主意。您可以在名为 Stock 或类似的表中执行此操作。
然后,对于您的交易表,我建议使用 StockID、日期和时间(但将时间存储在 TIME 数据类型之外的其他数据中,以便您可以存储毫秒。如果您需要帮助,请询问另一个问题)。
PK 中密钥的顺序对于存储和检索都很重要。对于检索,您需要首先放置查询中最具选择性的键。因此,如果您倾向于一次访问一只股票(或一组股票)的所有数据,请将 StockID 放在第一位,以便可以使用索引快速找到它们。如果您倾向于访问给定时间间隔内的所有数据,请先放置日期,然后放置时间。
对于存储,最好是附加,因此首先添加日期和时间也是一个好主意。
如果您想主要在日期范围内访问,但有时想按股票访问,请在 StockID 上放置二级索引。
Not sure why you say choosing both date and time would be a bad idea (are you against composite keys?)
A bigger problem for you is that time does not store milliseconds. See this bug for more data on that: http://bugs.mysql.com/bug.php?id=8523
Also, there seems to be something missing from the key that identifies the Stock such as Ticker. Since the ticker can change over time, it might be a good idea to introduce a surrogate for it such as StockID. You would do this in a table called Stock or similar.
Then for your Trade table, I would suggest using StockID, Date and Time (but store the time in something other than the TIME datatype so you can store milliseconds. Ask another question if you need help with that).
The order of the keys in the PK is important for both storage and retreival. For retrieval, you want to put the most selective keys for your query first. So if you tend to access all the data for a stock at once (or for a set of stocks), put StockID first so the index can be used to find them quickly. If you tend to access all data for a given interval, put Date then Time first.
For storage, its better to be appending so having Date and Time first is a good idea here too.
In case you want to access mostly in date ranges, but sometimes by Stock, put a secondary index on StockID.
由于您没有自然键(因此每行中没有唯一的内容),因此您需要添加一个代理键(为了参数“transactionid”)。您仍然可以根据日期时间(实际上应该是单列)建立索引,以进行有效的周期扫描。
As you don't have a natural key (so nothing unique within each row), you'd need to add a surrogate key (for the sake of argument "transactionid"). You can still have your index based on date time (that really, really should be a single column) for efficient period scanning.