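How do I store cleaned raw data in a database?
Raw data is stored in a database (multiple tables). It needs to be checked and corrected manually. The checked data should be stored in the database along with the raw data. In this case, is it a good idea to create two separate databases (e.g. raw_data and checked_data)? Or should there be only one database? Thanks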
Comments (3)
Generally speaking, it is a lot easier to work within a single instance than across multiple instances. Distributed transactions are slower. They require more typing (you always have to add a database link). This is not just a matter of convenience but also of integrity. You may want to ensure that a given record is either in the RAW data set or the CLEANSED data set but not both. Checking this sort of thing is more manageable in a single database.
How you organize things in a single database depends to some extent on your chosen DBMS flavour and what it supports. You can have a single schema (user account) and use a naming convention such as a prefix, for example RAW_TABLE_1 and CLEAN_TABLE_1. Or you may want to use different schemas, which will allow you to retain the same table name, for example RAW_USER.TABLE_1 and CLEAN_USER.TABLE_1. Both approaches have advantages. It is always good to have a constant reminder of whether we are working with raw or clean data. On the other hand, we may want to use tools or applications that expect the normal table names. Synonyms can help in this regard.
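As a rough illustration of the single-database, prefix-based layout described above, here is a minimal sketch using Python's built-in sqlite3 module. The table names, the columns, and the helper functions are assumptions made for the example, not part of the original answer. SQLite has no synonyms, so a view named table_1 stands in for one here; other DBMSs offer real synonyms or separate schemas.

```python
import sqlite3


def build_single_database():
    """Create one in-memory database holding both raw and clean tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE raw_table_1   (id INTEGER PRIMARY KEY, value TEXT);
        CREATE TABLE clean_table_1 (id INTEGER PRIMARY KEY, value TEXT);
        -- Stand-in for a synonym: tools that expect "table_1" read clean data.
        CREATE VIEW table_1 AS SELECT id, value FROM clean_table_1;
        """
    )
    return conn


def move_to_clean(conn, record_id, corrected_value):
    """Move one record from raw to clean in a single local transaction."""
    with conn:  # commits on success, rolls back if either statement fails
        conn.execute(
            "INSERT INTO clean_table_1 (id, value) VALUES (?, ?)",
            (record_id, corrected_value),
        )
        conn.execute("DELETE FROM raw_table_1 WHERE id = ?", (record_id,))


def ids_in_both(conn):
    """Return ids that appear in both data sets (should be empty)."""
    rows = conn.execute(
        "SELECT id FROM raw_table_1 INTERSECT SELECT id FROM clean_table_1"
    ).fetchall()
    return [row[0] for row in rows]
```

Because both tables live in the same database, the "either raw or cleansed, but not both" rule can be enforced with an ordinary transaction and checked with a single query, with no database link involved.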
If your raw data and checked data are going to be very large, then use two different databases.
With normalization and procedures, you can maintain both in one database.
There is no recommended method here beyond your own preference. You can store the cleansed data in the same database as the raw data but in different tables, perhaps adding a prefix such as raw_ to the raw data tables.
Otherwise, you can have a separate database for each type of data. The benefit is separation, whereas the drawback is that joins and similar operations between the two become costlier.
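For comparison, the two-database option might look like the sketch below, again using Python's sqlite3 module. The database alias raw_data follows the question; the table, columns, sample rows, and function name are hypothetical. SQLite's ATTACH makes the cross-database join cheap on one connection; in a client/server DBMS the equivalent typically goes through a database link or a similar mechanism, which is presumably where the extra cost mentioned above would come from.

```python
import sqlite3


def compare_raw_and_checked():
    """Join raw and checked rows that live in two separate databases."""
    # The main connection plays the role of the checked_data database.
    conn = sqlite3.connect(":memory:")
    # A second, independent database attached under the alias raw_data.
    conn.execute("ATTACH DATABASE ':memory:' AS raw_data")

    # Same table name in both databases, qualified by the database alias.
    conn.execute("CREATE TABLE raw_data.table_1 (id INTEGER PRIMARY KEY, value TEXT)")
    conn.execute("CREATE TABLE main.table_1 (id INTEGER PRIMARY KEY, value TEXT)")

    # Hypothetical sample rows: record 1 has been checked, record 2 has not.
    conn.executemany(
        "INSERT INTO raw_data.table_1 (id, value) VALUES (?, ?)",
        [(1, " alice "), (2, "bob")],
    )
    conn.execute("INSERT INTO main.table_1 (id, value) VALUES (?, ?)", (1, "alice"))
    conn.commit()

    # Cross-database join: every table reference needs its database qualifier.
    rows = conn.execute(
        """
        SELECT r.id, r.value, c.value
        FROM raw_data.table_1 AS r
        JOIN main.table_1 AS c ON c.id = r.id
        ORDER BY r.id
        """
    ).fetchall()
    conn.close()
    return rows
```

Here the checked rows are kept alongside the raw ones rather than replacing them, so the join returns each checked record with its raw and corrected values side by side.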