Are there any tools for comparing the data in dump files?
This question is slightly similar to this one, but more specific. I would like to test an ETL process by visualizing the differences between two dump files. Each dump file contains the entire database. The differences will not be in the schema, since those are easy to compare manually, but in the data itself, and they will be slight.
Are there any tools for doing this? The visualization I imagine could be something like:
Column1 has 0.02% difference in 10 rows.
It should of course also be possible to switch to a verbose mode that shows the actual differences in each row.
Does such a tool exist?
Comments (1)
Text utilities are usually your best bet.
But if I were testing an ETL process, I wouldn't want to test the entire dump at once. (In my case, that would be millions of lines.) I'd rather automate dumping each table into a separate file. Then it's easy to tell whether two versions of the data from a table are identical.
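As a rough sketch of that per-table dump, assuming the data sits in a SQLite database (the question does not name a database engine) and that CSV is an acceptable file format: the hypothetical helper `dump_tables` below writes each table to its own file, sorted on every column so that two dumps of the same data come out byte-identical.

```python
import csv
import sqlite3
from pathlib import Path


def dump_tables(conn: sqlite3.Connection, out_dir: Path) -> list[Path]:
    """Write every user table to its own CSV file (hypothetical helper)."""
    out_dir.mkdir(parents=True, exist_ok=True)
    names = [
        row[0]
        for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        )
        if not row[0].startswith("sqlite_")  # skip SQLite's internal tables
    ]
    written = []
    for name in names:
        quoted = '"' + name.replace('"', '""') + '"'
        # Sort on every column position so the row order is deterministic.
        n_cols = len(conn.execute(f"PRAGMA table_info({quoted})").fetchall())
        order = ", ".join(str(i) for i in range(1, n_cols + 1))
        cursor = conn.execute(f"SELECT * FROM {quoted} ORDER BY {order}")
        path = out_dir / f"{name}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow([col[0] for col in cursor.description])
            writer.writerows(cursor)
        written.append(path)
    return written
```

Running this once against the source database and once against the ETL target gives two directories of files with matching names, which can then be compared table by table.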
cmp produces no output if the files are identical. diff will tell you where the differences are. I use Cygwin when I have to do this stuff under Windows.
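Where cmp and diff are not at hand, roughly the same two checks can be sketched with Python's standard library. This is a swapped-in equivalent, not the commands themselves: `filecmp.cmp` with `shallow=False` does a byte-by-byte comparison, and `difflib.unified_diff` produces diff-style output. The function names `files_identical` and `diff_lines` are made up for illustration, and the files are assumed to be UTF-8 text.

```python
import difflib
import filecmp
from pathlib import Path


def files_identical(old: Path, new: Path) -> bool:
    """Byte-by-byte check, similar in spirit to cmp."""
    return filecmp.cmp(old, new, shallow=False)


def diff_lines(old: Path, new: Path) -> list[str]:
    """Unified-diff lines for two text files, similar in spirit to diff -u.

    Lines are compared without their line endings, so a difference that is
    only in line endings shows up in files_identical but not here.
    """
    old_lines = old.read_text(encoding="utf-8").splitlines()
    new_lines = new.read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            old_lines, new_lines, fromfile=str(old), tofile=str(new), lineterm=""
        )
    )
```

Calling `files_identical` first keeps the common case cheap; `diff_lines` is only worth running on the tables that fail that check.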