Delphi: comparing text file contents
We need to compare the contents of two (or more) text files to determine whether we need to create a backup. If they differ, we create a new backup.
I currently use the CRC value of each file to check for differences, but I was wondering if there is a more efficient or elegant way of detecting differences between two files.
// Use madZIP to calculate the CRC for the first file
GetUncompressedFileInfo(Filename_1, Size_1, NewCRC);
// Use madZIP to calculate the CRC for the second file
GetUncompressedFileInfo(Filename_2, Size_2, OldCRC);
// if ThisFileHash = ExistingFileHash then
if (OldCRC <> NewCRC) then
  CreateABackup;
Regards, Pieter.
Comments (4)
Actually, best practice to assure file identity is to store content hashes (e.g. CRC-32 or any other hash function) together with the file sizes. Doing so increases reliability by an order of magnitude. Regarding storage: there is no need to compute the hash more than once for contents known to be unchanged.
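A minimal sketch of the size-plus-hash idea, not taken from any answer in this thread. The names `TFileSignature`, `UpdateCrc`, `StreamSignature`, `FileSignature` and `SameSignature` are hypothetical, and the CRC-32 is computed bit by bit for brevity, so a table-driven or library routine would be the faster choice in real use.

```pascal
program FileSignatureDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, Classes;

type
  // Hypothetical record: what would be stored next to each backup.
  TFileSignature = record
    Size: Int64;
    Crc: LongWord;
  end;

// Fold Count bytes into a running CRC-32 (reflected polynomial $EDB88320).
// Bitwise for brevity; a table-driven or library routine would be faster.
function UpdateCrc(Crc: LongWord; const Buf: array of Byte; Count: Integer): LongWord;
var
  I, Bit: Integer;
begin
  Result := Crc;
  for I := 0 to Count - 1 do
  begin
    Result := Result xor Buf[I];
    for Bit := 1 to 8 do
      if (Result and 1) <> 0 then
        Result := (Result shr 1) xor $EDB88320
      else
        Result := Result shr 1;
  end;
end;

// Read the stream in 64 KB chunks so large files are never loaded whole.
function StreamSignature(Stream: TStream): TFileSignature;
var
  Buf: array[0..65535] of Byte;
  N: Integer;
  Crc: LongWord;
begin
  Result.Size := Stream.Size;
  Stream.Position := 0;
  Crc := $FFFFFFFF;
  repeat
    N := Stream.Read(Buf, SizeOf(Buf));
    Crc := UpdateCrc(Crc, Buf, N);
  until N = 0;
  Result.Crc := Crc xor $FFFFFFFF;
end;

function FileSignature(const FileName: string): TFileSignature;
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Result := StreamSignature(Stream);
  finally
    Stream.Free;
  end;
end;

// Sizes are compared first because that check is the cheapest.
function SameSignature(const A, B: TFileSignature): Boolean;
begin
  Result := (A.Size = B.Size) and (A.Crc = B.Crc);
end;

var
  Mem: TMemoryStream;
  Sig: TFileSignature;
  Sample: AnsiString;
begin
  Sample := '123456789';
  Mem := TMemoryStream.Create;
  try
    Mem.WriteBuffer(Sample[1], Length(Sample));
    Sig := StreamSignature(Mem);
    // CBF43926 is the standard CRC-32 check value for '123456789'.
    Writeln(Sig.Size, ' ', IntToHex(Int64(Sig.Crc), 8));
  finally
    Mem.Free;
  end;
end.
```

In a backup tool the signature of the last backed-up version would be kept on disk, so only the current file needs to be read, and its CRC only when the sizes already match.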
You should also consider using an incremental backup.
I've published some optimized file versioning functions for our SynProject Open Source tool. The TVersions class, in the ProjectVersioning unit, allows binary diff storage inside a zip container. Our proprietary but faster-than-zip SynLZ algorithm is used to store incremental differences. It works very well in practice.
See e.g. the TVersions.FillStrings method for retrieving a list of files to be updated. Be aware that you may discover a one-hour difference, depending on the current daylight saving time. Here is how we allow a per-date comparison:
We don't read the file content here. For backup purposes, it's enough to rely on the file date to mark the file as one to be compared. A diff is then performed between both versions of the file. If the file content is the same, only the date difference is stored.
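As an illustration only, and not the SynProject code, a date comparison that tolerates a one-hour daylight saving shift could look like the sketch below. `SameFileTime`, `FileTimesMatch` and the 2-second tolerance (chosen because FAT volumes keep write times to 2 seconds) are assumptions made for this example.

```pascal
program FileTimeDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils, DateUtils;

const
  // Hypothetical tolerance: FAT volumes keep write times to 2 seconds only.
  ToleranceSeconds = 2;
  SecondsPerHour = 3600;

// True when both timestamps match directly or once a one-hour DST shift
// is removed.
function SameFileTime(const A, B: TDateTime): Boolean;
var
  DiffSeconds: Int64;
begin
  // A TDateTime counts days, so scale the gap to whole seconds.
  DiffSeconds := Round(Abs(A - B) * 86400.0);
  Result := (DiffSeconds <= ToleranceSeconds) or
    (Abs(DiffSeconds - SecondsPerHour) <= ToleranceSeconds);
end;

// False when either file is missing or the timestamps really differ.
function FileTimesMatch(const FileA, FileB: string): Boolean;
var
  TimeA, TimeB: TDateTime;
begin
  Result := FileAge(FileA, TimeA) and FileAge(FileB, TimeB) and
    SameFileTime(TimeA, TimeB);
end;

var
  Base: TDateTime;
begin
  Base := EncodeDateTime(2011, 3, 1, 10, 0, 0, 0);
  Writeln(SameFileTime(Base, IncSecond(Base, 1)));   // within tolerance
  Writeln(SameFileTime(Base, IncHour(Base, 1)));     // one-hour DST shift
  Writeln(SameFileTime(Base, IncMinute(Base, 30)));  // a genuine change
end.
```

The trade-off is that an edit saved almost exactly one hour after the backup would be treated as unchanged, which is why a content diff on the flagged files remains useful.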
IMHO you should not use the proprietary madZIP container, but a standard one, like .zip. There are several around, including our version used in SynProject and our ORM. It's faster than madZIP, and decompression is in optimized asm. See the SynZip unit for low-level compression and a simple .zip reader and writer, and the more evolved classes in SynZipFiles (used in SynProject). For a pure Delphi version, like the madZIP one, check the PasZip unit, which is faster than madZIP (but PasZip won't compile with Unicode Delphi, whereas SynZip does).
CRC is probably more accurate, and pretty efficient. However, do you need to check the contents?
I'm assuming you're checking the CRC to see whether a modification has been made, so that you can back up the updated file again. In that case FileAge() would do just fine.
CRC is not a safe method to detect file changes - cryptographic hashes (like MD5 or SHA1) are much better.
Another approach (like the one used by build systems) is to compare file dates. If the file is newer than the backup, a new backup is needed.
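A rough sketch of that date check, assuming the backup keeps the modification time it was written with. `NeedsBackup` is a hypothetical helper name. It relies on the `FileAge` overload that returns a `TDateTime`, so it needs a compiler version that provides it.

```pascal
program NeedsBackupDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: True when the backup is missing or older than
// the source file.
function NeedsBackup(const SourceFile, BackupFile: string): Boolean;
var
  SourceTime, BackupTime: TDateTime;
begin
  if not FileAge(SourceFile, SourceTime) then
    Result := False  // no source file, so nothing to back up
  else
    Result := (not FileAge(BackupFile, BackupTime)) or
      (SourceTime > BackupTime);
end;

begin
  if ParamCount = 2 then
    Writeln(NeedsBackup(ParamStr(1), ParamStr(2)))
  else
    Writeln('Usage: NeedsBackupDemo <source> <backup>');
end.
```

This never opens either file, so it is cheap, but it will miss a change if a file is restored with an older timestamp.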