MySQL: storing arbitrary data
Background:
I asked a question on Stack Overflow about creating tables on the fly, and this conversation ensued:
This smells like a terrible idea! In fact, it smells just like this one.
What in the world do you want to use this for? – deceze
@deceze: Very true. However, how else would you store the contents of these CSV files?
They must be stored in MySQL for indexing.
The only solid fact about them is that they all have a mobile column with a standard format.
A CSV can have an arbitrary number of columns and an arbitrary number of rows.
They can (with no exaggeration) range from a single-row, 35-column CSV to an 80k-row, single-column CSV. I am open to other ideas. – Hailwood
There are many solutions for this, from attribute-value schemas to JSON storage and NoSQL storage. Open a new question about it. Whatever you do though, don't dynamically create tables! – deceze
Question:
So my question is,
What would you say is the best way to store this data?
Are you in agreement with deceze about not creating dynamic tables?
Comments (4)
A very simple schema for storing arbitrarily long records is this:
So a CSV record can be stored like this:
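As a rough sketch of what such a schema could look like (not necessarily the one the answer had in mind): one row per CSV record, plus one row per additional cell. The table names `records` and `record_attributes`, the column names, the sample data, and the `store_csv` helper are all assumptions made for illustration, and Python's built-in sqlite3 is used only as a self-contained stand-in for MySQL.

```python
import csv
import io
import sqlite3

# Assumption: an in-memory SQLite database stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE records (
    id     INTEGER PRIMARY KEY,
    mobile TEXT NOT NULL            -- the one column every CSV is known to have
);
CREATE TABLE record_attributes (
    record_id INTEGER NOT NULL REFERENCES records(id),
    name      TEXT NOT NULL,        -- CSV column header
    value     TEXT,                 -- CSV cell, always stored as text
    PRIMARY KEY (record_id, name)
);
CREATE INDEX idx_attr_name_value ON record_attributes (name, value);
""")


def store_csv(conn, text):
    """Store each CSV row as one record plus one attribute row per extra column."""
    for row in csv.DictReader(io.StringIO(text)):
        cur = conn.execute("INSERT INTO records (mobile) VALUES (?)", (row["mobile"],))
        conn.executemany(
            "INSERT INTO record_attributes (record_id, name, value) VALUES (?, ?, ?)",
            [(cur.lastrowid, k, v) for k, v in row.items() if k != "mobile"],
        )
    conn.commit()


# Two CSVs with different shapes go into the same two tables.
store_csv(conn, "mobile,name,city\n0211234567,Alice,Auckland\n0217654321,Bob,Wellington\n")
store_csv(conn, "mobile\n0220000001\n")  # single-column CSV: no attribute rows needed

record_count = conn.execute("SELECT COUNT(*) FROM records").fetchone()[0]
attribute_count = conn.execute("SELECT COUNT(*) FROM record_attributes").fetchone()[0]
print(record_count, attribute_count)
```

No table is ever created per CSV; a file with 35 columns simply produces more attribute rows per record.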
An alternative is to store the record data as a JSON-packed blob in a single column. If you don't need to search the data, this is the most compact way, albeit not very RDBMS-like.
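A minimal sketch of the JSON variant, assuming a hypothetical `csv_rows` table that keeps the known mobile column as a real, indexed column and packs everything else into one text column. sqlite3 again stands in for MySQL, and the sample row is made up.

```python
import json
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL; names are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE csv_rows (
    id     INTEGER PRIMARY KEY,
    mobile TEXT NOT NULL,   -- kept as a real column so it can be indexed
    data   TEXT NOT NULL    -- every other cell, packed as one JSON object
)""")
conn.execute("CREATE INDEX idx_csv_rows_mobile ON csv_rows (mobile)")

row = {"mobile": "0211234567", "name": "Alice", "city": "Auckland"}
extra = {k: v for k, v in row.items() if k != "mobile"}
conn.execute(
    "INSERT INTO csv_rows (mobile, data) VALUES (?, ?)",
    (row["mobile"], json.dumps(extra)),
)

# Lookup goes through the indexed column; the blob is decoded in the application.
stored = conn.execute(
    "SELECT data FROM csv_rows WHERE mobile = ?", ("0211234567",)
).fetchone()[0]
decoded = json.loads(stored)
print(decoded)
```

The trade-off is the one the answer names: the database cannot filter on anything inside `data` without extra work.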
The best fit is probably a NoSQL database, if you have that option.
Could an XML file work with this kind of thing? Using XQuery makes it similar in language structure to SQL, and XML has no problems with dynamic adding of data.
You might also check out Amazon's SimpleDB. It's specifically engineered for this purpose. It allows you to add arbitrary attributes to your records, and it indexes all of them. I'm sure there are probably other solutions in the NoSQL realm as well.
I wanted to elaborate on deceze's records/record-attributes answer, but a comment didn't suffice...
This is reminiscent of SimpleDB's Items and Attributes model. If you come from the normal RDB world, look at the SimpleDB documentation to see some of the strangeness you'll need to account for, such as:
All values are stored as text, so to sort (or select value ranges) for non-string data types (usually numbers and dates), you need to take some unusual steps, including zero-padding and offsets so that data sorts correctly "lexicographically".
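To make the zero-padding and offset idea concrete, here is a small sketch. The helper names `encode_int` and `decode_int`, the width of 10 digits, and the offset of 10**9 are assumptions chosen for the example, not values taken from SimpleDB's documentation.

```python
def encode_int(value, width=10, offset=10**9):
    """Encode an integer as fixed-width text whose string order matches numeric order.

    The offset shifts negative numbers into the non-negative range;
    zero-padding makes every encoded string the same length.
    """
    shifted = value + offset
    if not 0 <= shifted < 10**width:
        raise ValueError("value outside the encodable range")
    return str(shifted).zfill(width)


def decode_int(text, offset=10**9):
    """Reverse encode_int."""
    return int(text) - offset


raw = [3, 25, -7, 100]

# Plain string comparison puts '100' before '25' and '25' before '3'.
naive_sorted = sorted(str(n) for n in raw)

# Encoded strings sort the same way the numbers do.
encoded_sorted = sorted(encode_int(n) for n in raw)
decoded = [decode_int(s) for s in encoded_sorted]

print(naive_sorted)
print(decoded)
```

The same concern explains why the query below compares against '003' rather than 3.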
Consider what your queries will look like. To get Items with Attributes like Color='Red' and Size>3, you might start with something like this:
SELECT Items.*, Sizes.Value AS Size
FROM Items
INNER JOIN Attributes AS Colors ON Items.ItemID=Colors.ItemID AND Colors.Name='Color'
INNER JOIN Attributes AS Sizes ON Items.ItemID=Sizes.ItemID AND Sizes.Name='Size'
WHERE Colors.Value='Red' AND Sizes.Value>'003'
You could structure this query in a couple of alternative ways, but the main things to note are:
The more Attributes you want to filter by, the more JOINs you need. Note that you CAN'T simply do: SELECT ... FROM Items INNER JOIN Attributes USING (ItemID) WHERE (Attributes.Name='Color' AND Attributes.Value='Red') AND (Attributes.Name='Size' AND Attributes.Value>'003') -- no single Attributes row can have both Name='Color' and Name='Size', so this returns nothing, which is self-evident once you see it written out.
If you want additional Attributes in the response, you'll need to add more joins (I included Size to show it's simple for one of the already JOINed Attributes). But what if you want to retrieve a response that has columns for some larger number of Attributes for the selected Items? The query will start to get more complex. At least SimpleDB handles this stuff for you transparently, so that the response to a query looks like what you'd expect, with columns for specified Attributes.
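The two points above can be checked with a small runnable sketch. The sample Items and Attributes rows and the `Label` column are made up, sqlite3 stands in for MySQL, and the third query (conditional aggregation with GROUP BY) is one possible way to get a column per Attribute, offered as an assumption rather than as the approach this answer recommends.

```python
import sqlite3

# Assumption: in-memory SQLite as a stand-in for MySQL, with invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (ItemID INTEGER PRIMARY KEY, Label TEXT);
CREATE TABLE Attributes (ItemID INTEGER, Name TEXT, Value TEXT);
INSERT INTO Items VALUES (1, 'shirt'), (2, 'hat'), (3, 'sock');
INSERT INTO Attributes VALUES
    (1, 'Color', 'Red'),  (1, 'Size', '005'),
    (2, 'Color', 'Red'),  (2, 'Size', '002'),
    (3, 'Color', 'Blue'), (3, 'Size', '009');
""")

# One join per filtered Attribute, each under its own alias.
joined = conn.execute("""
SELECT Items.ItemID, Sizes.Value AS Size
FROM Items
INNER JOIN Attributes AS Colors ON Items.ItemID = Colors.ItemID AND Colors.Name = 'Color'
INNER JOIN Attributes AS Sizes  ON Items.ItemID = Sizes.ItemID  AND Sizes.Name  = 'Size'
WHERE Colors.Value = 'Red' AND Sizes.Value > '003'
ORDER BY Items.ItemID
""").fetchall()

# Single join: each row is either a Color row or a Size row, never both.
naive = conn.execute("""
SELECT Items.ItemID
FROM Items
INNER JOIN Attributes ON Items.ItemID = Attributes.ItemID
WHERE (Attributes.Name = 'Color' AND Attributes.Value = 'Red')
  AND (Attributes.Name = 'Size'  AND Attributes.Value > '003')
""").fetchall()

# Pivot: one output column per Attribute without adding a join for each.
pivoted = conn.execute("""
SELECT ItemID,
       MAX(CASE WHEN Name = 'Color' THEN Value END) AS Color,
       MAX(CASE WHEN Name = 'Size'  THEN Value END) AS Size
FROM Attributes
GROUP BY ItemID
HAVING MAX(CASE WHEN Name = 'Color' THEN Value END) = 'Red'
   AND MAX(CASE WHEN Name = 'Size'  THEN Value END) > '003'
ORDER BY ItemID
""").fetchall()

print(joined, naive, pivoted)
```

Each extra Attribute in the pivot adds one more CASE expression, so the query still grows with the number of Attributes; it only avoids the extra joins.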
The point is that storing the data this way is fairly easy, but querying it becomes harder. And if your dataset gets big, you may need to give some thought as to the proper way to index Attributes.