Best practices for managing file changes within a script

Posted on 2024-12-17 14:22:35

I have a Bash script that performs many actions on a file, for example:

cp input.txt file.tmp1
sed (code) file.tmp1 > file.tmp2
sed (code) file.tmp2 > file.tmp3
sed (code) file.tmp3 > file.tmp4
sed (code) file.tmp4 > file.tmp5
sed (code) file.tmp5 > file.tmp6
sed (code) file.tmp6 > file.tmp7
cp file.tmp7 output.txt

In this way:

  • The original file is unchanged.
  • I can check the file changes at each stage, to make sure my code did not do anything wrong.

However, this does not seem like an ideal way to handle the files.

  • Is there a better way to do this?
  • Is there any tool which can help inspect the changes, just to see if anything unusual was introduced?

Comments (2)

沉睡月亮 2024-12-24 14:22:35

Working on a temporary file is a fine idea, but you should use mktemp(1) to make your temporary file safely.

While there's nothing wrong with using multiple files for multiple passes, consider using mktemp -d to create a temporary directory for all your files to ensure you never overwrite anything the user cares about.
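
A minimal sketch of the temporary-directory approach, assuming bash. The sed expressions, the stage file names, and the sample input are made up for illustration and stand in for the (code) placeholders in the question:

```shell
#!/usr/bin/env bash
set -euo pipefail

# Private scratch directory; mktemp -d picks a unique name for us.
workdir=$(mktemp -d)
# Remove the directory when the script exits.
# Comment this line out to keep the stage files for inspection.
trap 'rm -rf "$workdir"' EXIT

# Hypothetical sample input so the sketch runs on its own.
printf 'foo bar\nbaz foo\n' > "$workdir/input.txt"

# Each pass writes a new stage file; the input is never modified.
sed 's/foo/FOO/g' "$workdir/input.txt"  > "$workdir/stage1.txt"
sed 's/bar/BAR/g' "$workdir/stage1.txt" > "$workdir/stage2.txt"

# Print the final stage; a real script would copy it to its destination.
cat "$workdir/stage2.txt"
```

Because every intermediate file lives under the directory that mktemp created, nothing in the user's working directory can be overwritten by accident.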

But if you're never going to look at the intermediate files, multiple passes can be handled like this:

sed (code) input.txt | sed (code) | sed (code) | sed (code) | ...
    sed (code) > output.txt

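If one stage fails, the stages after it receive no further input, which can make for easier error handling. Note that by default bash reports only the exit status of the last command in a pipeline; with set -o pipefail, a failure in any stage makes the whole pipeline fail. There are no temporary files to remove when you're finished.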
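
As a rough sketch of how a script might detect a failed stage, assuming bash with its pipefail option. The sed expressions, file names, and sample input are hypothetical:

```shell
#!/usr/bin/env bash
set -o pipefail   # a pipeline now fails if any stage fails, not only the last

demo=$(mktemp -d)                                 # scratch directory for the demo
printf 'foo bar\nbaz foo\n' > "$demo/input.txt"   # hypothetical sample input

# Three passes chained together without intermediate files.
if sed 's/foo/FOO/g' "$demo/input.txt" | sed 's/bar/BAR/g' | sed 's/baz/BAZ/g' > "$demo/output.txt"
then
    status=ok
else
    status=failed
fi
echo "pipeline: $status"

# The same idea with a missing input file: the first sed fails, and
# pipefail makes the whole pipeline report that failure.
if sed 's/foo/FOO/g' "$demo/missing.txt" 2>/dev/null | sed 's/bar/BAR/g' > /dev/null
then
    missing_status=ok
else
    missing_status=failed
fi
echo "pipeline with missing input: $missing_status"
```

Without pipefail, the second pipeline would count as successful, because the last sed exits normally after reading empty input.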

If you would like to inspect the pipeline for errors, tee will help you. It copies everything it reads both to its standard output and to a file, and is used like this:

sed (code) input.txt | sed (code) | tee state-of-pipe.txt | sed (code) | ...
    sed (code) > output.txt
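
A concrete version of the tee idea might look like the following sketch, which saves a snapshot after each pass. The snapshot file names, sed expressions, and sample input are assumptions made for illustration:

```shell
#!/usr/bin/env bash
set -o pipefail

demo=$(mktemp -d)                                 # scratch directory for the demo
printf 'foo bar\nbaz foo\n' > "$demo/input.txt"   # hypothetical sample input

# tee writes a copy of the stream to a file and passes it on unchanged.
sed 's/foo/FOO/g' "$demo/input.txt" \
    | tee "$demo/after-pass1.txt" \
    | sed 's/bar/BAR/g' \
    | tee "$demo/after-pass2.txt" \
    | sed 's/baz/BAZ/g' > "$demo/output.txt"

# Show what each stage produced.
for f in after-pass1.txt after-pass2.txt output.txt; do
    echo "== $f"
    cat "$demo/$f"
done
```

This keeps the single-pipeline structure while still leaving a file per stage to inspect afterwards.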

You can inspect the changes by using diff -u input.txt output.txt. diff(1) is a line-wise differences program, and the -u unified output is pretty easy to read. wdiff(1) is a word-wise differences program, which might be more useful for some cases.
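
When diff runs inside a script, its exit status is worth handling explicitly: 0 means the files are identical, 1 means they differ, and anything greater signals an error. A small sketch with hypothetical file names and sample data:

```shell
#!/usr/bin/env bash
demo=$(mktemp -d)                                 # scratch directory for the demo
printf 'foo bar\nbaz foo\n' > "$demo/input.txt"   # hypothetical sample input
sed 's/foo/FOO/g' "$demo/input.txt" > "$demo/output.txt"

# Capture diff's exit status instead of letting it abort the script.
diff_status=0
diff -u "$demo/input.txt" "$demo/output.txt" > "$demo/changes.diff" || diff_status=$?

case $diff_status in
    0) echo "no changes" ;;
    1) echo "changes found:"; cat "$demo/changes.diff" ;;
    *) echo "diff itself failed" >&2 ;;
esac
```

Treating status 1 as a normal result matters in scripts that use set -e, where an unguarded diff on differing files would stop the script.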

And xxdiff(1) is a superb graphical tool for inspecting the differences between two files; it will go to some effort to show you individually changed characters. (It is also fantastic for handling CVS- and SVN-style conflict files, but that's another matter completely.)

假面具 2024-12-24 14:22:35

A more effective way would be to use pipes. E.g.:

cat input.txt | sed ... | ... | sed ... > output.txt

The problem is that you cannot check the changes made at the different stages.
