Practices for allowing systems to accommodate human error?
Systems sometimes have to accommodate the possibility of bad data from the real world. Consider that some data originates on paper forms, and forms inherently have limited means of validating data.
Example 1: On one form users are expected to enter an integer distance (in miles) into a blank. We capture the information as written as a string since we don't always end up getting integer values.
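A minimal sketch of capturing the value "as written", assuming a hypothetical `DistanceField` record: the raw text is always kept, and an integer is derived only when the text parses cleanly, so nothing the respondent wrote is lost.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DistanceField:
    """Hypothetical record: the raw text from the form plus an optional parsed value."""
    raw: str                     # exactly what was written in the blank
    miles: Optional[int] = None  # set only when raw is a clean integer


def capture_distance(raw: str) -> DistanceField:
    """Always keep the raw text; parse it to an int only if that is unambiguous."""
    try:
        return DistanceField(raw=raw, miles=int(raw.strip()))
    except ValueError:
        # e.g. "12.5", "about 10", "10-15": keep the text, leave miles unset
        return DistanceField(raw=raw)
```

Reports can then work with `miles` where it exists and list the records where it is `None` for someone to review.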
Example 2: On another form we capture a code. That code should map to one of the codes in our system. However, sometimes the code written on the form is incorrect. We capture the code and allow it to exist with an invalid value until it is resolved at some later time. That is, we temporarily allow bad data, since it's important to keep the record even if some of it is invalid.
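One way this could be modelled, sketched under assumptions: store the code exactly as written in its own field, and fill in a resolved code only once it matches a known value. `VALID_CODES`, `FormRecord`, `capture` and `resolve` are hypothetical names, not part of the system described above.

```python
from dataclasses import dataclass
from typing import Optional

VALID_CODES = {"A10", "B20", "C30"}  # hypothetical set of codes known to the system


@dataclass
class FormRecord:
    form_id: str
    code_as_written: str        # never overwritten: what the form actually said
    code: Optional[str] = None  # resolved code, set only once it is valid
    needs_review: bool = False


def capture(form_id: str, written_code: str) -> FormRecord:
    """Store the record even when the code is invalid, flagging it for later review."""
    rec = FormRecord(form_id=form_id, code_as_written=written_code)
    if written_code in VALID_CODES:
        rec.code = written_code
    else:
        rec.needs_review = True
    return rec


def resolve(rec: FormRecord, corrected_code: str) -> None:
    """Later resolution step: accept a corrected code only if it is valid."""
    if corrected_code not in VALID_CODES:
        raise ValueError(f"{corrected_code!r} is not a known code")
    rec.code = corrected_code
    rec.needs_review = False
```

In a relational schema the same idea might be a free-text column next to a nullable foreign key into the code table, so integrity is enforced on the resolved value while the written value stays unconstrained.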
I'm interested in learning more about how systems accommodate bad data, that is, human error. Databases are supposed to be bastions of data integrity, but the real world is messy and people make mistakes. Systems must allow us to reflect those mistakes.
What are some ways systems you've developed accommodate human error? What practices have you used? What lessons have you learned?
Any further reading on the topic? (I had trouble Googling it.)
I agree with you: whatever we do, there's no guarantee that we can get rid of bad or incorrect data, especially, but not only, when it comes to user input. In my experience the same problems exist in complex integration projects, in which you have to integrate and merge (often inconsistent) data retrieved from different systems.
A good strategy is to decouple the input from the operational system itself. First, place data provided by the user (or an external system) in a separate datastore (e.g. a different schema). In a second step, load this data into your operational datastore, but only if it conforms to strict rules (e.g. use address verification software to verify a given address). This Extract, Transform, Load (ETL) approach is fairly common in Data Warehousing (DWH) solutions, but in my experience it can be applied programmatically in transactional systems as well.
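A toy in-memory sketch of the staging-then-load step, assuming plain Python lists stand in for the staging schema and the operational store. `StagedRow`, `require` and `load` are hypothetical names, and the single rule kind is illustrative only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Row = Dict[str, str]
# A rule inspects a row and returns an error message, or None if the row passes.
Rule = Callable[[Row], Optional[str]]


@dataclass
class StagedRow:
    """A row in the staging area, with the errors from its last load attempt."""
    data: Row
    errors: List[str] = field(default_factory=list)
    loaded: bool = False


def require(field_name: str) -> Rule:
    """Hypothetical rule factory: the field must be present and non-blank."""
    def rule(row: Row) -> Optional[str]:
        if not row.get(field_name, "").strip():
            return f"{field_name} is missing"
        return None
    return rule


def load(staging: List[StagedRow], rules: List[Rule], operational: List[Row]) -> None:
    """Copy rows that pass every rule into the operational store; others stay staged."""
    for staged in staging:
        if staged.loaded:
            continue
        errors = []
        for rule in rules:
            message = rule(staged.data)
            if message is not None:
                errors.append(message)
        staged.errors = errors
        if not errors:
            operational.append(dict(staged.data))
            staged.loaded = True
```

Rows that fail stay in staging with their errors, so they can be corrected and `load` rerun; a real rule set might include external checks such as the address verification mentioned above.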
The above approach often leads to asynchronous processes in which the input is submitted first and (maybe) at a later time the external entity (user or system) retrieves feedback on whether its data was correct or not.
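A minimal in-memory sketch of that asynchronous flow, assuming hypothetical `submit`, `process_pending` and `check_status` functions and one made-up rule: the submitter gets a ticket immediately, a later job validates, and the submitter comes back for the verdict.

```python
import uuid
from typing import Dict, List

# Hypothetical in-memory store: ticket -> submission record.
_submissions: Dict[str, dict] = {}


def submit(data: dict) -> str:
    """Accept input immediately and hand back a ticket; no validation yet."""
    ticket = str(uuid.uuid4())
    _submissions[ticket] = {"data": data, "status": "pending", "errors": []}
    return ticket


def process_pending() -> None:
    """Background step (e.g. a scheduled job) that validates queued submissions."""
    for record in _submissions.values():
        if record["status"] != "pending":
            continue
        errors: List[str] = []
        if not str(record["data"].get("code", "")).strip():  # illustrative rule only
            errors.append("code is missing")
        record["errors"] = errors
        record["status"] = "rejected" if errors else "accepted"


def check_status(ticket: str) -> dict:
    """What the user or external system calls later to retrieve feedback."""
    record = _submissions[ticket]
    return {"status": record["status"], "errors": list(record["errors"])}
```

In practice the store would be a database table and `process_pending` a queued or scheduled job; the ticket is what lets the external entity retrieve its feedback later.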
EDIT: For further reading I recommend having a look at DWH concepts. Although you may not want to build such a thing, you could partially apply those concepts:
http://en.wikipedia.org/wiki/Extract,_transform,_load
http://en.wikipedia.org/wiki/Data_warehouse
http://en.wikipedia.org/wiki/Data_cleansing
A government department I worked in does a lot of surveys, most of which are (were) still paper based.
On a different note: at some point you want to validate the incoming data against whatever ranges you expect it to conform to. By rejecting it at the point of entry you give the user a chance to correct it; the trade-off is that every time you reject it, you increase the chance of them abandoning the whole process.
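A minimal sketch of a point-of-entry range check, assuming hypothetical survey fields and ranges. It reports every out-of-range answer in one pass rather than stopping at the first, which is one way to reduce the number of separate rejections a user sees.

```python
from typing import Dict, List, Tuple

# Hypothetical expected ranges for a few survey fields: (minimum, maximum), inclusive.
EXPECTED_RANGES: Dict[str, Tuple[int, int]] = {
    "age": (0, 120),
    "household_size": (1, 30),
}


def check_ranges(answers: Dict[str, int]) -> List[str]:
    """Return one message per out-of-range answer so the user can fix them all at once."""
    problems = []
    for name, (low, high) in EXPECTED_RANGES.items():
        value = answers.get(name)
        if value is not None and not (low <= value <= high):
            problems.append(f"{name}={value} is outside the expected range {low}-{high}")
    return problems
```

An empty list means the entry can be accepted; otherwise the messages go back to the entry screen.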
At some point in your system you need to specify the rules which will be used for validation. At the end of the day a system is only going to be as smart as those rules. You can develop these yourself into the code (probably the business logic) or you might use a 3rd party component.
Having flexible control over the validation rules is pretty important, as they are likely to change over time.
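A minimal sketch of keeping validation rules as data rather than code, assuming a JSON document with two made-up rule kinds (`regex` and `range`). The rule format and the `validate` function are hypothetical; a real system might keep the rules in a table or use a third-party rules component instead.

```python
import json
import re
from typing import Dict, List

# Hypothetical rule definitions kept as data (e.g. in a config file or a table)
# so they can change over time without changing code.
RULES_JSON = """
[
  {"field": "zip",   "kind": "regex", "pattern": "^[0-9]{5}$"},
  {"field": "miles", "kind": "range", "min": 0, "max": 500}
]
"""


def validate(record: Dict[str, str], rules: List[dict]) -> List[str]:
    """Apply each data-driven rule; return a list of human-readable failures."""
    failures = []
    for rule in rules:
        value = record.get(rule["field"])
        if value is None:
            continue  # presence checks could be another rule kind
        if rule["kind"] == "regex":
            if not re.fullmatch(rule["pattern"], value):
                failures.append(f"{rule['field']}: {value!r} does not match {rule['pattern']}")
        elif rule["kind"] == "range":
            try:
                number = float(value)
            except ValueError:
                failures.append(f"{rule['field']}: {value!r} is not a number")
                continue
            if not rule["min"] <= number <= rule["max"]:
                failures.append(f"{rule['field']}: {value} is outside {rule['min']}-{rule['max']}")
        else:
            failures.append(f"unknown rule kind {rule['kind']!r}")
    return failures
```

Changing a range or pattern then means editing data, not code; the cost is that the set of rule kinds has to be designed and kept small enough to reason about.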
To be honest with you, one point of migrating from paper-based systems to IT is to remove these errors and make sure all data is always correct. I doubt any correctly planned and developed IT system (especially business financial systems) would allow such errors. Not in the company I am working for anyway...
There are lots of software tools that address the kinds of problems you mention. There are platforms and tools that let you define rules for scrubbing and transforming data and handling validation errors. Those techniques are widely used for Data Integration and Business Intelligence applications. Google for "Data Quality" or "Data Integration".
The easiest thing to do (though it is not always possible) is to design the interface where users enter the data so that it limits, as much as possible, the amount of text they need to type. In my experience this seems to be where a lot of problems come from. One simple example is to provide a select or auto-complete select field.
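A minimal sketch of the suggestion logic behind an auto-complete field, assuming a hypothetical `KNOWN_CODES` list. It offers prefix matches first, then near misses found with the standard library's `difflib.get_close_matches`, so a mistyped "B2O" can still lead the user to "B20".

```python
import difflib
from typing import List

KNOWN_CODES = ["A10", "A12", "B20", "C30"]  # hypothetical list backing the select field


def suggest(typed: str, limit: int = 3) -> List[str]:
    """Offer known values: prefix matches first, then close (fuzzy) matches."""
    typed = typed.strip().upper()
    prefix = [code for code in KNOWN_CODES if code.startswith(typed)]
    fuzzy = difflib.get_close_matches(typed, KNOWN_CODES, n=limit, cutoff=0.6)
    # Merge the two lists, keeping order and dropping duplicates.
    seen, result = set(), []
    for code in prefix + fuzzy:
        if code not in seen:
            seen.add(code)
            result.append(code)
    return result[:limit]
```

Because the user picks from the returned values, what gets stored is always one of the known codes.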
One thing you can do is everything possible to determine whether the data is correct before it goes into the DB. I try to give the user entering the data as much feedback as possible so they can (ideally) fix some of the issues before the data is persisted. For example, checking whether the data being entered is of the correct type is very quick.
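A minimal sketch of that quick type check, assuming a hypothetical mapping from field names to parsers. Every field is checked before anything is saved, and the result maps each bad field to a message the form can show next to it.

```python
from datetime import date
from typing import Callable, Dict

# Hypothetical field -> parser map; each parser raises ValueError on bad input.
PARSERS: Dict[str, Callable[[str], object]] = {
    "miles": int,
    "visit_date": date.fromisoformat,  # expects YYYY-MM-DD
}


def type_feedback(form: Dict[str, str]) -> Dict[str, str]:
    """Check every field's type before saving; map each bad field to a message."""
    feedback = {}
    for name, parse in PARSERS.items():
        raw = form.get(name, "")
        try:
            parse(raw)
        except ValueError:
            feedback[name] = f"{raw!r} is not in the expected format"
    return feedback
```

An empty result means the form can be persisted; otherwise the user gets all the type problems in one round trip.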
I got started in legal systems before the PC era. Litigation support databases routinely have to accommodate factually incorrect, incomplete, and contradictory information. It takes a different way of thinking.
The short version . . .
Instead of recording a single fact, you record multiple assertions about a fact. It boils down to designing a database to store data from assertions like these.
that he was at home from 2011-01-02 20:00 until 2011-01-03 08:13.
that Neil Rimes came home at 2011-01-02 23:45.
Schlagel says that he saw Neil Rimes at Kroger at 2011-01-03 03:00.
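A minimal sketch of storing assertions instead of facts, under assumptions: the answer gives no schema, so the `Assertion` fields and the `conflicts` query are hypothetical. The example data treats "he" in the first statement as Neil Rimes, and uses placeholder source names ("source_1", "source_2") because the statements above omit who made the first two claims.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Tuple


@dataclass(frozen=True)
class Assertion:
    """One claim by one source; contradictory claims are stored side by side."""
    source: str      # who made the claim
    subject: str     # whom the claim is about
    location: str    # where the subject is claimed to be
    start: datetime  # claimed interval, inclusive
    end: datetime


def conflicts(assertions: List[Assertion]) -> List[Tuple[Assertion, Assertion]]:
    """Pairs that put the same subject in different places at overlapping times."""
    found = []
    for i, a in enumerate(assertions):
        for b in assertions[i + 1:]:
            overlaps = a.start <= b.end and b.start <= a.end
            if a.subject == b.subject and a.location != b.location and overlaps:
                found.append((a, b))
    return found


# Placeholder sources; only "Schlagel" comes from the statements above.
home = Assertion("source_1", "Neil Rimes", "home",
                 datetime(2011, 1, 2, 20, 0), datetime(2011, 1, 3, 8, 13))
arrival = Assertion("source_2", "Neil Rimes", "home",
                    datetime(2011, 1, 2, 23, 45), datetime(2011, 1, 2, 23, 45))
kroger = Assertion("Schlagel", "Neil Rimes", "Kroger",
                   datetime(2011, 1, 3, 3, 0), datetime(2011, 1, 3, 3, 0))
```

Nothing overwrites anything: `conflicts([home, arrival, kroger])` flags the "home all night" claim against the Kroger sighting, so the contradiction becomes something you can query rather than something the schema forbids.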