Delete consecutive, identical duplicate files

Posted on 2024-10-30 14:18:32


I have a server running Windows Server 2003 R2 Enterprise with directories that each hold anywhere from 50,000 to 250,000 1 KB text files. The filenames are sequential (e.g., MLLP000001.rcv, MLLP000002.rcv, etc.), and identical files are always consecutive. Once a subsequent file differs, I can expect that I won't receive another identical file.

I need a script that will do the following, but I don't know where to begin.

for each file in the target directory index 'i'
{
  for each file in the target directory index 'j' = i+1
  {
    compare the hash values of files i and j

    if the hashes are identical
      delete file j
    if the hashes differ
      set i = j // to skip past the files that are now deleted
      break
  }
}

I tried DOS batch scripts, but they are really cumbersome: I can't break out of the inner loop, and the script trips over itself because the outer loop works from a list of the files in the directory while that list keeps changing. As far as I know, VBScript doesn't have a hash function.


酒绊 2024-11-06 14:18:32

Since the files are only 1 KB each, why not compare them byte for byte and skip the hashing?
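
A minimal sketch of that idea in Python, assuming Python is available on the server (the question does not say so). The helper name `same_bytes` is made up for illustration; `filecmp.cmp` is from the standard library, and `shallow=False` asks it to compare file contents rather than rely on size and modification time alone.

```python
import filecmp


def same_bytes(path_a, path_b):
    """Return True if the two files have exactly the same contents."""
    # shallow=False forces a content comparison; with the default
    # (shallow=True) files with matching size and mtime are assumed equal.
    return filecmp.cmp(path_a, path_b, shallow=False)
```

For files this small, reading both files in full is cheap, so a direct comparison is likely no slower than hashing each one.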


耶耶耶 2024-11-06 14:18:32

Sounds like you could do something like:

Set Files to an array of files in a given directory.
Set PreviousHash to hash of the first file in the Files.

For each CurrentFile file after the first in Files,
    Set CurrentHash to hash of the CurrentFile.
    If CurrentHash is equal to PreviousHash, then delete CurrentFile.
    Else, set PreviousHash to CurrentHash.
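
The steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions that are not in the question: Python is installed on the server, the files end in `.rcv`, and the zero-padded names mean that sorting them alphabetically gives the sequential order. The function names and the choice of SHA-1 are arbitrary.

```python
import hashlib
import os


def file_hash(path):
    """Return the SHA-1 hex digest of a file's contents."""
    with open(path, "rb") as handle:
        return hashlib.sha1(handle.read()).hexdigest()


def delete_consecutive_duplicates(directory, suffix=".rcv"):
    """Delete every file whose contents match the last file that was kept.

    Returns the names of the deleted files.
    """
    # Take the listing once and sort it, so deleting files during the
    # loop cannot disturb the iteration order.
    names = sorted(n for n in os.listdir(directory) if n.endswith(suffix))
    deleted = []
    previous_hash = None
    for name in names:
        path = os.path.join(directory, name)
        current_hash = file_hash(path)
        if current_hash == previous_hash:
            # Same contents as the previously kept file: remove it.
            os.remove(path)
            deleted.append(name)
        else:
            # Contents changed: this file becomes the new reference.
            previous_hash = current_hash
    return deleted
```

Because the file list is captured before anything is deleted, this avoids the problem of the loop's list changing underneath it. It is worth trying on a copy of one directory first, for example `delete_consecutive_duplicates(r"C:\some\test\copy")` with a path of your own.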
