Database design question

Posted on 2024-07-10 13:26:41 · 564 characters · 2 views · 0 comments


I accumulated quite a lot of data in raw form (csv and binary) - 4GB per day for a few months, to be precise.

I decided to join the civilized world and use a database to access the data, and I wondered what the correct layout would be; the format is quite simple: a few rows for every time tick (bid, ask, timestamp, etc.) x up to 0.5 million/day x hundreds of financial instruments x months of data.

There is a MySQL server with MyISAM (which I understood would be the correct engine for this type of usage) running on commodity hardware (2 x 1GB RAID 0 SATA, Core 2 @ 2.7GHz).

What would be the correct layout of the database? What should the tables/indices look like? What are the general recommendations for this scenario? What pitfalls would you predict along the way?

Edit: my common usage will be simple queries to extract time-series information for a specific date and instrument, e.g.

SELECT (ask + bid) / 2
  FROM quotes  -- no FROM clause in the original; table name assumed
  WHERE instrument = 'GOOG'
  AND date = '2008-06-01'
  ORDER BY timeStamp;

Edit: I tried to stuff all my data into one table indexed by the timestamp, but it was way too slow - therefore I reckoned it would take a more elaborate scheme.


Comments (6)

素染倾城色 2024-07-17 13:26:41


You don't really say what your background is and how much you know about programming and database design. It sounds like you should do some reading. Conceptually though your design is fairly simple. Your description identifies a mere two entities:

  • Financial instrument; and
  • Quote.

So you then need to identify the attributes.

Financial instrument:

  • Security code;
  • Market;
  • etc.

Quote:

  • Timestamp;
  • Financial instrument;
  • Bid price; and
  • Ask price.

The reference to the financial instrument is what's called a foreign key. Each table also needs a primary key, probably just an auto-increment field.

Conceptually fairly simple.

CREATE TABLE instrument (
  id BIGINT NOT NULL AUTO_INCREMENT,
  code CHAR(4),
  company_name VARCHAR(100),
  PRIMARY KEY (id)
);

CREATE TABLE quote (
  id BIGINT NOT NULL AUTO_INCREMENT,
  instrument_id BIGINT NOT NULL,
  dt DATETIME NOT NULL,
  bid NUMERIC(8,3),
  ask NUMERIC(8,3),
  PRIMARY KEY (id)
);

CREATE INDEX instrument_idx1 ON instrument (code);

CREATE INDEX quote_idx1 ON quote (instrument_id, dt);

SELECT (bid + ask) / 2
FROM instrument i
JOIN quote q ON i.id = q.instrument_id
WHERE i.code = 'GOOG'
AND q.dt >= '2008-06-01' AND q.dt < '2008-06-02';

If your dataset is sufficiently large you might want to include (bid + ask) / 2 in the table so you don't have to calculate on the fly.

Ok, so that's the normalized view. After this you may need to start making performance optimizations. Consider this question about storing billions of rows in MySQL. Partitioning is a feature of MySQL 5.1+ (fairly new).

But another question to ask yourself is this: do you need to store all this data? The reason I ask is that I used to work in online broking, and we only stored all the trades for a very limited window; trades would be a smaller data set than quotes, which are what you seem to want.

Storing billions of rows of data is a serious problem and one you really need serious help to solve.
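The normalized schema and query above can be exercised end-to-end. The sketch below uses Python's built-in sqlite3 as a stand-in for MySQL (engine-specific types like AUTO_INCREMENT are adapted, and the sample rows are made up):

```python
import sqlite3

# SQLite stands in for MySQL here; the two-table layout mirrors the
# normalized design above (the sample data is invented).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE instrument (
        id INTEGER PRIMARY KEY,
        code TEXT,
        company_name TEXT
    );
    CREATE TABLE quote (
        id INTEGER PRIMARY KEY,
        instrument_id INTEGER NOT NULL,
        dt TEXT NOT NULL,
        bid REAL,
        ask REAL
    );
    CREATE INDEX instrument_idx1 ON instrument (code);
    CREATE INDEX quote_idx1 ON quote (instrument_id, dt);
""")

conn.execute("INSERT INTO instrument (id, code) VALUES (1, 'GOOG')")
conn.executemany(
    "INSERT INTO quote (instrument_id, dt, bid, ask) VALUES (1, ?, ?, ?)",
    [
        ("2008-06-01 09:30:00", 100.0, 100.5),
        ("2008-06-01 09:30:01", 100.5, 101.0),
        ("2008-06-02 09:30:00", 99.0, 99.5),   # outside the queried range
    ],
)

# Mid price for one instrument on one day, via a half-open date range.
mids = [
    row[0]
    for row in conn.execute("""
        SELECT (bid + ask) / 2
        FROM instrument i
        JOIN quote q ON i.id = q.instrument_id
        WHERE i.code = 'GOOG'
          AND q.dt >= '2008-06-01' AND q.dt < '2008-06-02'
        ORDER BY q.dt
    """)
]
print(mids)  # [100.25, 100.75]
```

Note how the half-open range (`>=` start, `<` end) picks out exactly one day's ticks without any date arithmetic, which is what makes the composite `(instrument_id, dt)` index effective.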

子栖 2024-07-17 13:26:41


What you need to do is to read up on database normalization. If you find that article too much, you should simply skim through a 3rd normal form tutorial.

回梦 2024-07-17 13:26:41


When storing data at tick level, many financial databases partition the data at least by instrument, as it is rare to want to run a query across instruments. So a table per instrument is normal. Some go further and also partition by date, giving a table per instrument/date combination. This can make querying a lot more difficult if queries across dates are the norm.

So two options:

  1. A tick-table per instrument, with a clustered index on timestamp
  2. A tick-table per instrument/date, with a clustered index on timestamp

It's a basic trade-off between speed of access and ease of querying.
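One caveat: MyISAM (which the question proposes) has no clustered index, so "a clustered index on timestamp" implies InnoDB, which clusters rows on the primary key. A hypothetical DDL generator for option 1 might look like this (all table and column names are illustrative, not from the thread):

```python
def tick_table_ddl(instrument: str) -> str:
    """Build CREATE TABLE DDL for option 1: one tick table per instrument.

    InnoDB clusters rows on the primary key, so keying on (ts, seq)
    keeps a day's ticks physically contiguous on disk; seq exists only
    to disambiguate ticks that share the same second.
    """
    return (
        f"CREATE TABLE ticks_{instrument} (\n"
        "  ts DATETIME NOT NULL,\n"
        "  seq INT NOT NULL,\n"
        "  bid DECIMAL(8,3),\n"
        "  ask DECIMAL(8,3),\n"
        "  PRIMARY KEY (ts, seq)\n"
        ") ENGINE=InnoDB;"
    )

print(tick_table_ddl("GOOG"))
```

Option 2 would simply fold a date suffix into the table name, trading easier pruning for harder cross-date queries.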

余生一个溪 2024-07-17 13:26:41


Or perhaps consider a star schema, dimensions and facts. Ralph Kimball has some nice stuff to tell you how to go about it.
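For reference, a minimal star schema for this data might look like the sketch below (sqlite3 as a stand-in; all table and column names are invented for illustration, not taken from Kimball):

```python
import sqlite3

# A minimal star schema: two dimension tables and one fact table keyed
# by them.  The fact table also carries a precomputed mid price.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE dim_instrument (
        instrument_key INTEGER PRIMARY KEY,
        code TEXT,
        market TEXT
    );
    CREATE TABLE dim_date (
        date_key INTEGER PRIMARY KEY,
        day TEXT,              -- e.g. '2008-06-01'
        weekday TEXT
    );
    CREATE TABLE fact_quote (
        instrument_key INTEGER NOT NULL REFERENCES dim_instrument,
        date_key INTEGER NOT NULL REFERENCES dim_date,
        ts TEXT NOT NULL,
        bid REAL,
        ask REAL,
        mid REAL               -- precomputed (bid + ask) / 2
    );
""")

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)  # ['dim_date', 'dim_instrument', 'fact_quote']
```

Aggregation queries then group the fact table by whichever dimension attributes the analysis needs (per weekday, per market, etc.).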

装迷糊 2024-07-17 13:26:41


Dani,
I've been working with tick-by-tick data for years and would be happy to collaborate on this. Email me: IanTebbutt at Hotmail. (BTW, I've checked and there's no way to do private email on StackOverflow, and Jeff seems way against it.)

Briefly, I've found partitioning by date and instrument to work pretty well. You could choose to put a month's worth of data for instrument X into a set of tables using a pattern like InstrumentX_YYDD. Then, when accessing the data, you need at the very least a table-name generator, but more likely a SQL generator that can decide which single table to use, or potentially use UNION to look at multiple tables.

Whichever way you look at this, those kinds of data volumes are not easy to deal with. This verges into data-warehouse territory, and there are a huge number of ways of skinning that cat. Like I said, happy to collaborate - I've probably got half your issues fixed already.
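The table-name and SQL generators mentioned above could be sketched like this (a per-month YYMM suffix is assumed for the naming pattern, and every name here is hypothetical):

```python
from datetime import date, timedelta

def table_name(instrument: str, d: date) -> str:
    # Per-instrument, per-month naming in the spirit of the answer's
    # InstrumentX_YYDD pattern (a YYMM suffix is assumed here).
    return f"{instrument}_{d:%y%m}"

def build_query(instrument: str, start: date, end: date) -> str:
    """Generate a UNION ALL query over every monthly table that the
    half-open [start, end) range touches."""
    names = []
    d = start.replace(day=1)
    while d < end:
        names.append(table_name(instrument, d))
        # Jump to the first day of the next month.
        d = (d.replace(day=28) + timedelta(days=4)).replace(day=1)
    selects = [
        f"SELECT ts, bid, ask FROM {name}"
        f" WHERE ts >= '{start}' AND ts < '{end}'"
        for name in names
    ]
    return "\nUNION ALL\n".join(selects) + "\nORDER BY ts"

q = build_query("GOOG", date(2008, 6, 1), date(2008, 8, 1))
print(q)  # two monthly tables: GOOG_0806 and GOOG_0807
```

A single-month range degenerates to one plain SELECT with no UNION, which is the fast path the answer is after.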

只为守护你 2024-07-17 13:26:41


Just some general observations:

  • Don't use a TIMESTAMP column, as it's automatically set based on the INSERT time. Since you're importing data, that's not what you want.
  • If you use the MySQL DATETIME column type, you can use the MySQL Date and Time functions on it.
  • MyISAM doesn't support FOREIGN KEY constraints and silently ignores them.
  • Indexes, indexes, indexes. Make sure you have them on columns you'll use for lookups. However, if you have columns with a lot of text, you may want to use FULLTEXT searches on them instead.
  • If you plan on turning this into a live database with INSERTs as well as SELECT queries, consider using InnoDB with transactions and row-level locking (SELECT ... FOR UPDATE).