
Dynamic data-entry value storage

Posted on 2024-09-03 13:42:53



I'm creating a data-entry application where users are allowed to create the entry schema.

My first version simply created one table per entry schema, with each entry spanning one or more columns (for complex types) of the appropriate data type. This allowed "fast" querying (on small datasets, as I didn't index all columns) and simple synchronization when data entry was distributed across several databases.

I'm not quite happy with this solution, though; its only advantage is simplicity.
I can only store a fixed number of columns, I need to create indexes on all columns, and I need to recreate the table whenever the schema changes.
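To make the pain points of this first design concrete, here is a minimal sketch using Python's built-in sqlite3 module. The type map SQL_TYPES and the helpers create_entry_table and change_schema are hypothetical names for illustration; the sketch assumes a schema is a simple {column: type} dict and that column names have already been validated, because identifiers cannot be bound as SQL parameters.

```python
import sqlite3

# Hypothetical mapping from user-facing field types to SQLite column types.
SQL_TYPES = {"text": "TEXT", "int": "INTEGER", "real": "REAL"}

def create_entry_table(conn, name, columns):
    """Create one physical table for an entry schema given as {column: type}."""
    # Identifiers are quoted by hand; real code must validate user-supplied names.
    cols = "".join(f', "{c}" {SQL_TYPES[t]}' for c, t in columns.items())
    conn.execute(f'CREATE TABLE "{name}" (id INTEGER PRIMARY KEY{cols})')
    # Every column the query language may filter on needs its own index.
    for c in columns:
        conn.execute(f'CREATE INDEX "ix_{name}_{c}" ON "{name}" ("{c}")')

def change_schema(conn, name, new_columns):
    """Rebuild the table for a new schema, keeping the columns that survive."""
    old = [r[1] for r in conn.execute(f'PRAGMA table_info("{name}")')]
    keep = ["id"] + [c for c in new_columns if c in old]
    col_list = ", ".join(f'"{c}"' for c in keep)
    rows = conn.execute(f'SELECT {col_list} FROM "{name}"').fetchall()
    conn.execute(f'DROP TABLE "{name}"')  # also drops the table's indexes
    create_entry_table(conn, name, new_columns)
    marks = ", ".join("?" for _ in keep)
    conn.executemany(f'INSERT INTO "{name}" ({col_list}) VALUES ({marks})', rows)

conn = sqlite3.connect(":memory:")
create_entry_table(conn, "survey", {"age": "int", "name": "text"})
conn.execute("INSERT INTO survey (age, name) VALUES (?, ?)", (42, "Ann"))
change_schema(conn, "survey", {"name": "text", "score": "real"})
print(conn.execute("SELECT id, name, score FROM survey").fetchall())  # [(1, 'Ann', None)]
```

Each field gets its own index, and change_schema copies every row, which is the cost that grows with thousands of columns and frequent schema changes. SQLite also caps a table at 2000 columns by default (SQLITE_MAX_COLUMN), one concrete source of the fixed-column limit.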

Some of my key design criteria are:

Very fast querying (using a simple domain-specific query language)
Writes don't have to be fast
Many concurrent users
Schemas will change often
Schemas might contain many thousands of columns
The data entries might be distributed and need synchronization
MySQL and SQLite are preferred; databases like DB2 and Oracle are out of the question
Using .Net/Mono

I've been thinking of a couple of possible designs, but none of them seems like a good choice.

Solution 1: Union-like table containing a Type column and one nullable column per type.

This avoids joins, but will definitely use a lot of space.
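A minimal sketch of Solution 1 in SQLite through Python's sqlite3, assuming just three value types (int, real, text); the table entry_value, its columns and the put helper are hypothetical names. Each field value is one row, and only the column matching value_type is filled.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# One row per (entry, field); only the value_* column named by value_type is used.
conn.execute("""
    CREATE TABLE entry_value (
        entry_id   INTEGER NOT NULL,
        field      TEXT    NOT NULL,
        value_type TEXT    NOT NULL CHECK (value_type IN ('int', 'real', 'text')),
        value_int  INTEGER,
        value_real REAL,
        value_text TEXT,
        PRIMARY KEY (entry_id, field)
    )""")
# Typed index so range filters on integer fields can use it.
conn.execute("CREATE INDEX ix_entry_value_int ON entry_value (field, value_int)")

# Python type -> (stored type tag, column that holds it).
COLUMN_FOR = {int: ("int", "value_int"), float: ("real", "value_real"), str: ("text", "value_text")}

def put(entry_id, field, value):
    """Store a value in the column that matches its Python type."""
    type_name, column = COLUMN_FOR[type(value)]
    conn.execute(
        f"INSERT INTO entry_value (entry_id, field, value_type, {column}) VALUES (?, ?, ?, ?)",
        (entry_id, field, type_name, value))

put(1, "age", 42)
put(1, "name", "Ann")
put(2, "age", 17)

# Typed comparison on a single table, no join needed.
adults = conn.execute(
    "SELECT entry_id FROM entry_value WHERE field = 'age' AND value_int >= 18").fetchall()
print(adults)  # [(1,)]
```

On the space concern: in SQLite's record format a NULL occupies only its one-byte type code in the record header, so the unused value_* columns may cost less than expected; measuring with realistic data is worthwhile.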

Solution 2: Key/value store. All values are stored as string and converted when needed.

This also uses a lot of space, and of course, I hate having to convert everything to strings.
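A sketch of Solution 2 under the same assumptions (SQLite through Python's sqlite3; the tables field_def and entry_kv, the PARSE map and the load helper are hypothetical). Field types live in a separate table so values can be converted back on read, and the last two queries show the main cost of storing everything as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE field_def (field TEXT PRIMARY KEY, type TEXT NOT NULL)")
conn.execute("CREATE TABLE entry_kv (entry_id INTEGER, field TEXT, value TEXT, "
             "PRIMARY KEY (entry_id, field))")

# Converters from the stored string back to the field's declared type.
PARSE = {"int": int, "real": float, "text": str}

conn.executemany("INSERT INTO field_def VALUES (?, ?)", [("age", "int"), ("name", "text")])
conn.executemany("INSERT INTO entry_kv VALUES (?, ?, ?)",
                 [(1, "age", str(42)), (1, "name", "Ann"), (2, "age", str(9))])

def load(entry_id):
    """Rebuild one entry as a dict, converting each string by its field type."""
    rows = conn.execute("""SELECT kv.field, kv.value, d.type
                           FROM entry_kv kv JOIN field_def d ON d.field = kv.field
                           WHERE kv.entry_id = ?""", (entry_id,))
    return {field: PARSE[t](value) for field, value, t in rows}

print(load(1))

# Comparisons on the TEXT column are string comparisons: '9' > '18', so entry 2 matches too.
naive = conn.execute("SELECT entry_id FROM entry_kv WHERE field = 'age' AND value > '18' "
                     "ORDER BY entry_id").fetchall()
print(naive)  # [(1,), (2,)]

# A numeric filter needs an explicit CAST, which a plain index on value cannot serve.
cast = conn.execute("SELECT entry_id FROM entry_kv WHERE field = 'age' "
                    "AND CAST(value AS INTEGER) > 18 ORDER BY entry_id").fetchall()
print(cast)  # [(1,)]
```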

Solution 3: Use an xml database or store values as xml.

Having no experience with it, I would expect this to be quite slow (at least for the relational model, unless there is some very good XPath support).
I would also like to avoid an XML database, as other parts of the application fit better with a relational model, and being able to join the data is helpful.
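For Solution 3, a small sketch of storing each entry as an XML document inside a relational table, using Python's sqlite3 and xml.etree.ElementTree; the element layout and the to_xml and query helpers are hypothetical. SQLite has no built-in XML functions, so a filter has to fetch and parse every document in the dataset, which is the slowness the question anticipates.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entry_xml (entry_id INTEGER PRIMARY KEY, dataset_id INTEGER, doc TEXT)")

# Converters keyed by the Python type name recorded in the "type" attribute.
PARSE = {"int": int, "float": float, "str": str}

def to_xml(values):
    """Serialize {field: value} as <entry><field name=".." type="..">..</field></entry>."""
    root = ET.Element("entry")
    for name, value in values.items():
        el = ET.SubElement(root, "field", name=name, type=type(value).__name__)
        el.text = str(value)
    return ET.tostring(root, encoding="unicode")

conn.execute("INSERT INTO entry_xml VALUES (1, 1, ?)", (to_xml({"age": 42, "name": "Ann"}),))
conn.execute("INSERT INTO entry_xml VALUES (2, 1, ?)", (to_xml({"age": 9, "name": "Bob"}),))

def query(dataset_id, field, predicate):
    """Full scan: every document is fetched and parsed; no index can help."""
    hits = []
    rows = conn.execute("SELECT entry_id, doc FROM entry_xml WHERE dataset_id = ? "
                        "ORDER BY entry_id", (dataset_id,))
    for entry_id, doc in rows:
        # Field names are assumed not to contain quotes (they are spliced into the path).
        el = ET.fromstring(doc).find(f"field[@name='{field}']")
        if el is not None and predicate(PARSE[el.get("type")](el.text)):
            hits.append(entry_id)
    return hits

print(query(1, "age", lambda v: v >= 18))  # [1]
```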

I can't help thinking that someone has already solved (some of) this, but I'm unable to find anything. I'm not quite sure what to search for, either...

I know market-research software does something like this for questionnaires, but there are few open-source implementations, and the ones I've found don't quite fit the bill.

PSPP has much of the logic I'm thinking of: primitive column types, many columns, many rows, fast querying and merging. Too bad it doesn't work against a database. And of course, I don't need 99% of the functionality it provides, and a lot of what I do need isn't included.

I'm not sure this is the right place to ask such a design-related question, but I hope someone here has some tips, knows of any existing work, or can point me to a better place to ask.

Thanks in advance!


山田美奈子 2024-09-10 13:42:53


Have you already considered the most trivial solution: having one table for each of your data types and storing the schema of your dataset in the database as well? The simplest version:

DATASET Table (Virtual "table")
ID - primary key
Name - Name for the dataset/table

COLUMNSCHEMA Table (specifies the columns for one "dataset")
DATASETID - int (reference to Dataset-table)
COLID - smallint (unique # of the column)
Name - varchar
DataType - ("varchar", "int", whatever)

Row Table 
DATASETID - int (reference to Dataset-table)
ID - Unique id for the "row"

ColumnData Table (one for each datatype)
ROWID - int (reference to Row-table)
COLID - smallint
DATA - (varchar/int/whatever)

To query a dataset (a virtual table), you must then dynamically construct a SQL statement using the schema information in COLUMNSCHEMA table.
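A minimal sketch of this schema and the dynamic query in SQLite via Python's sqlite3. Table and column names are adapted from the answer (the Row table is called data_row here, and ColumnData is split into column_data_int and column_data_text); DATA_TABLES and build_select are hypothetical helpers. The generated statement uses one LEFT JOIN per column, so missing values come back as NULL.

```python
import sqlite3

SCHEMA = """
CREATE TABLE dataset (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE column_schema (
    dataset_id INTEGER NOT NULL REFERENCES dataset(id),
    col_id     INTEGER NOT NULL,
    name       TEXT    NOT NULL,
    data_type  TEXT    NOT NULL,
    PRIMARY KEY (dataset_id, col_id));
CREATE TABLE data_row (id INTEGER PRIMARY KEY, dataset_id INTEGER NOT NULL REFERENCES dataset(id));
CREATE TABLE column_data_int (row_id INTEGER NOT NULL REFERENCES data_row(id),
    col_id INTEGER NOT NULL, data INTEGER, PRIMARY KEY (row_id, col_id));
CREATE TABLE column_data_text (row_id INTEGER NOT NULL REFERENCES data_row(id),
    col_id INTEGER NOT NULL, data TEXT, PRIMARY KEY (row_id, col_id));
CREATE INDEX ix_int_value ON column_data_int (col_id, data);
"""

# Which physical table holds each declared data type.
DATA_TABLES = {"int": "column_data_int", "text": "column_data_text"}

def build_select(conn, dataset_id):
    """Build SQL that presents one dataset as a regular table (one LEFT JOIN per column)."""
    cols = conn.execute(
        "SELECT col_id, name, data_type FROM column_schema WHERE dataset_id = ? ORDER BY col_id",
        (dataset_id,)).fetchall()
    select, joins = ["r.id"], []
    for i, (col_id, name, data_type) in enumerate(cols):
        a = f"c{i}"
        select.append(f'{a}.data AS "{name}"')  # column names must be validated in real code
        joins.append(f"LEFT JOIN {DATA_TABLES[data_type]} {a} "
                     f"ON {a}.row_id = r.id AND {a}.col_id = {int(col_id)}")
    return (f"SELECT {', '.join(select)} FROM data_row r {' '.join(joins)} "
            "WHERE r.dataset_id = ? ORDER BY r.id")

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO dataset VALUES (1, 'survey')")
conn.executemany("INSERT INTO column_schema VALUES (1, ?, ?, ?)",
                 [(1, "age", "int"), (2, "name", "text")])
conn.executemany("INSERT INTO data_row VALUES (?, 1)", [(1,), (2,)])
conn.executemany("INSERT INTO column_data_int VALUES (?, 1, ?)", [(1, 42), (2, 9)])
conn.execute("INSERT INTO column_data_text VALUES (1, 2, 'Ann')")

sql = build_select(conn, 1)
print(conn.execute(sql, (1,)).fetchall())  # [(1, 42, 'Ann'), (2, 9, None)]
```

Given the "many thousand columns" criterion, note that SQLite allows at most 64 tables in one join and MySQL 61, so a generated query should join only the columns it actually selects or filters on, not the whole schema.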

