How to design a generic database whose layout may change over time?
Here's a tricky one - how do I programmatically create and interrogate a database whose contents I can't really foresee?
I am implementing a generic input form system. The user can create PHP forms with a WYSIWYG layout and use them for any purpose he wishes. He can also query the input.
So, we have three stages:
- a form is designed and generated. This is a one-off procedure, although the form can be edited later. This designs the database.
- someone or several people make use of the form - say for daily sales reports, stock keeping, payroll, etc. Their input to the forms is written to the database.
- others, maybe management, can query the database and generate reports.
Since these forms are generic, I can't predict the database structure - other than to say that it will reflect HTML form fields and consist of the data input from a collection of edit boxes, memos, radio buttons and the like.
Questions and remarks:
A) How can I best structure the database, in terms of tables and columns? What about primary keys? My first thought was to use the control name to identify each column, then I realized that the user can edit the form and rename controls, so that maybe "name" becomes "employee" or "wages" becomes "salary". I am leaning towards a unique number for each column.
B) how best to key the rows? I was thinking of a timestamp to allow me to query and a column for the row Id from A)
C) I have to handle column rename/insert/delete. For deletion, I am unsure whether to delete the data from the database. Even if the user is not inputting it from the form any more, he may wish to query what was previously entered. Or there may be some legal requirement to retain the data. Any gotchas in column rename/insert/delete?
D) For the querying, I can have my PHP interrogate the database to get column names and generate a form with a list where each entry has a database column name, a checkbox to say if it should be used in the query and, based on column type, some selection criteria. That ought to be enough to build searches like "position = 'senior salesman' and salary > 50k".
E) I probably have to generate some fancy charts - graphs, histograms, pie charts, etc for query results of numerical data over time. I need to find some good FOSS PHP for this.
F) What else have I forgotten?
This all seems very tricky to me, but I am a database n00b - maybe it is simple to you gurus?
Edit: please don't tell me not to do it. I don't have any choice :-(
Edit: in real life I don't expect column rename/insert/delete to be frequent. However it is possible that after running for a few months a change to the database might be required. I am sure this happens regularly. I fear that I have worded this question badly and that people think that changes will be made willy-nilly every 10 minutes or so.
Realistically, my users will define a database when they lay out the form. They might get it right first time and never change it - especially if they are converting from paper forms. Even if they do decide to change, this might only happen once or twice ever, after months or years - and that can happen in any database.
I don't think that I have a special case here, nor that we should be concentrating on change. Perhaps better to concentrate on linkage - what's a good primary key scheme? Say, perhaps, for one text input, one numerical and a memo?
Comments (7)
Nope, it really is tricky. Fundamentally what you're describing is not a database application, it is a database application builder. In fact, it sounds as if you want to code something like Google App Engine or a web version of MS Access. Writing such a tool will take a lot of time and expertise.
Google has implemented flexible schemas by using its BigTable platform. It allows you to flex the schema pretty much at will. The catch is, this flexibility makes it very hard to write queries like "position = 'senior salesman' and salary > 50k".
So I don't think the NoSQL approach is what you need. You want to build an application which generates and maintains RDBMS schemas. This means you need to design a metadata repository from which you can generate dynamic SQL to build and change the users' schemas and also generate the front end.
Things your metadata schema needs to store
For schema generation:
For front-end generation
That last points to the potential complexity in your proposal: a regular form designer like Joe Soap is not going to be able to formulate the JS to (say) validate that an input value is between X and Y, so you're going to have to derive it using templated rules.
These are by no means exhaustive lists, it's just off the top of my head.
For primary keys I suggest you use a column of GUID datatype. Timestamps aren't guaranteed to be unique, although if you run your database on an OS whose timestamps are precise to six decimal places (i.e. not Windows) it's unlikely you'll get clashes.
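A minimal sketch of that keying scheme, assuming Python's standard `sqlite3` and `uuid` modules stand in for whatever PHP/RDBMS stack is actually used. The `submission` table and the `new_submission` helper are hypothetical names introduced only for illustration. The GUID is the key; the timestamp is kept as an ordinary column for date-range queries.

```python
import sqlite3
import uuid
from datetime import datetime, timezone

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE submission (
        id           TEXT PRIMARY KEY,  -- GUID stored as text (assumed layout)
        submitted_at TEXT NOT NULL      -- ISO-8601 UTC timestamp, for querying only
    )
    """
)


def new_submission(conn):
    """Insert one row keyed by a random GUID and return that key."""
    row_id = str(uuid.uuid4())
    now = datetime.now(timezone.utc).isoformat()
    conn.execute(
        "INSERT INTO submission (id, submitted_at) VALUES (?, ?)",
        (row_id, now),
    )
    return row_id


# Many rows inserted within the same instant may share a timestamp,
# but each still gets its own key.
ids = [new_submission(conn) for _ in range(1000)]
```

Because uniqueness never depends on the clock, two forms submitted in the same microsecond cannot collide on the key.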
Last word
I have built database schema generators before. They are hard going. One thing which can be tough is debugging the dynamic SQL. So make it easier on yourself: use real names for tables and columns. Just because the app user now wants to see a form titled HEADCOUNT it doesn't mean you have to rename the EMPLOYEES table. Hence the need to separate the displayed label from the schema object name. Otherwise you'll find yourself trying to figure out why this generated SQL statement failed:
That way madness lies.
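To illustrate the separation of displayed labels from schema object names, here is a rough sketch, assuming Python's `sqlite3` rather than the PHP the question targets. The `meta_column` table and the `create_table`, `relabel` and `labels` helpers are hypothetical, not part of any library. The metadata table drives the generated DDL, and a user-visible rename touches only the label.

```python
import re
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE meta_column (
        table_name  TEXT NOT NULL,  -- stable schema name, never shown to users
        column_name TEXT NOT NULL,  -- stable schema name, never renamed
        sql_type    TEXT NOT NULL,
        label       TEXT NOT NULL,  -- what the form displays; free to change
        PRIMARY KEY (table_name, column_name)
    )
    """
)

ALLOWED_TYPES = {"TEXT", "INTEGER", "REAL"}


def ident(name):
    """Identifiers cannot be bound as parameters, so validate them strictly."""
    if not re.fullmatch(r"[a-z][a-z0-9_]*", name):
        raise ValueError(f"unsafe identifier: {name!r}")
    return name


def create_table(conn, table, columns):
    """Generate CREATE TABLE from (column_name, sql_type, label) tuples."""
    defs = ["id TEXT PRIMARY KEY"]
    for col, sql_type, _label in columns:
        if sql_type not in ALLOWED_TYPES:
            raise ValueError(f"unsupported type: {sql_type!r}")
        defs.append(f"{ident(col)} {sql_type}")
    conn.execute(f"CREATE TABLE {ident(table)} ({', '.join(defs)})")
    conn.executemany(
        "INSERT INTO meta_column VALUES (?, ?, ?, ?)",
        [(table, col, sql_type, label) for col, sql_type, label in columns],
    )


def relabel(conn, table, column, new_label):
    """A rename on the form changes metadata only; the schema is untouched."""
    conn.execute(
        "UPDATE meta_column SET label = ? WHERE table_name = ? AND column_name = ?",
        (new_label, table, column),
    )


def labels(conn, table):
    """Map each stable column name to its current display label."""
    rows = conn.execute(
        "SELECT column_name, label FROM meta_column WHERE table_name = ?",
        (table,),
    )
    return dict(rows.fetchall())


create_table(conn, "employees", [("name", "TEXT", "Name"), ("wages", "REAL", "Wages")])
relabel(conn, "employees", "wages", "Salary")
physical = [row[1] for row in conn.execute("PRAGMA table_info(employees)")]
```

After the relabel, the form would show "Salary" while generated SQL still refers to `employees.wages`, which keeps the dynamic statements readable when they need debugging.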
In essence, you are asking how to build an application without specifications. Relational databases were not designed so that you can do this effectively. The common approach to this problem is an Entity-Attribute-Value design and for the type of system in which you want to use it, the odds of failure are nearly 100%.
It makes no sense, for example, that the column called "Name" could become "Salary". How would a report where you want the total salary work if the salary values could be "Fred", "Bob", 100K, 1000, "a lot"? Databases were not designed to let anyone put anything anywhere. Successful database schemas require structure, which means effort with respect to specifications on what needs to be stored and why.
Therefore, to answer your question, I would rethink the problem. The entire approach of trying to make an app that can store anything in the universe is not a recipe for success.
Like Thomas said, a relational database is not well suited to your problem. However, you may want to take a look at NoSQL databases like MongoDB.
See this article:
http://www.simple-talk.com/opinion/opinion-pieces/bad-carma/
for someone else's experience of your problem.
This is for A) & B). It is not something I have done myself, but I thought it was an interesting idea that Reddit put to use; see this link (look at Lesson 3):
http://highscalability.com/blog/2010/5/17/7-lessons-learned-while-building-reddit-to-270-million-page.html
Not sure about the database, but for the charts, instead of using PHP I recommend looking into JavaScript (http://www.reynoldsftw.com/2009/02/6-jquery-chart-plugins-reviewed/). The advantages are that some of the processing for chart display is offloaded to the client side, and the charts can be interactive.
The other respondents are correct that you should be very cautious with this approach because it is more complex and less performant than the traditional relational model - but I've done this type of thing to accommodate departmental differences at work, and it worked fine for the amount of use it got.
Basically I set it up like this, first - a table to store some information about the Form the user wants to create (obviously, adjust as you need):
Then a table to define the fields to be presented on the form, including any limits and the order and page they are to be presented on (my app presented the fields as a multi-page wizard type of flow).
Then my Perl code would loop through the fields in order for page 1 and put them on the "wizard form" ... and the "next" button would present the page 2 fields in order, etc.
I also had JavaScript functions to enforce the limits specified for each field ...
Then a table to hold the values entered by the users:
That would be a good starting point for what you want to do, but keep an eye on performance as it can really slow down any reports if they add 1000 custom fields. :-)
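The table definitions from this answer are not reproduced above, so the following is only a hypothetical layout in the same spirit, sketched with Python's `sqlite3`. Every table and column name here (`form`, `field`, `field_value`, `entry_id` and so on) is an assumption, not the answerer's schema. It also shows how a search like "position = 'senior salesman' and salary > 50k" needs one self-join per criterion in this design.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE form (
        form_id INTEGER PRIMARY KEY,
        title   TEXT NOT NULL
    );
    CREATE TABLE field (
        field_id   INTEGER PRIMARY KEY,  -- stable numeric id; survives label renames
        form_id    INTEGER NOT NULL REFERENCES form(form_id),
        label      TEXT NOT NULL,
        data_type  TEXT NOT NULL,        -- 'text' or 'number'
        page       INTEGER NOT NULL,     -- wizard page the field appears on
        sort_order INTEGER NOT NULL
    );
    CREATE TABLE field_value (
        entry_id   INTEGER NOT NULL,     -- one id per submitted form
        field_id   INTEGER NOT NULL REFERENCES field(field_id),
        text_value TEXT,
        num_value  REAL,
        PRIMARY KEY (entry_id, field_id)
    );
    """
)

conn.execute("INSERT INTO form VALUES (1, 'Payroll')")
conn.executemany(
    "INSERT INTO field VALUES (?, 1, ?, ?, 1, ?)",
    [(1, "position", "text", 1), (2, "salary", "number", 2)],
)

# Made-up sample entries, purely for illustration.
entries = {
    1: ("senior salesman", 60000),
    2: ("senior salesman", 40000),
    3: ("junior clerk", 70000),
}
for entry_id, (position, salary) in entries.items():
    conn.execute("INSERT INTO field_value VALUES (?, 1, ?, NULL)", (entry_id, position))
    conn.execute("INSERT INTO field_value VALUES (?, 2, NULL, ?)", (entry_id, salary))

# Each search criterion becomes another self-join on field_value.
matches = [
    row[0]
    for row in conn.execute(
        """
        SELECT p.entry_id
        FROM field_value AS p
        JOIN field_value AS s ON s.entry_id = p.entry_id
        WHERE p.field_id = 1 AND p.text_value = ?
          AND s.field_id = 2 AND s.num_value > ?
        ORDER BY p.entry_id
        """,
        ("senior salesman", 50000),
    )
]
```

Storing numbers in a separate `num_value` column keeps comparisons numeric. The growing number of joins as criteria are added is one reason reports slow down in this design.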