是否可以分片进行CRC-32计算?

发布于 2024-11-17 13:56:04 字数 967 浏览 5 评论 0原文

我使用这个简单的函数来计算给定文件的 CRC 校验和:

long i, j = 0;
int k = 0;
uint crc = 0xFFFFFFFF;
FileInfo file_info = new FileInfo(file);
byte[] file_buffer = new byte[32768];

FileStream file_stream = new FileStream(@file, FileMode.Open);
while ((i = file_stream.Read(file_buffer, 0, file_buffer.Count())) > 0)
{
    for (j = 0; j < i; j++)
    {
        uint before = crc;
        k = (int)((crc ^ file_buffer[j]) & 0x000000FFL);
        uint after = (uint)((crc >> 8) & 0x00FFFFFFL) ^ crc32_table[k];
        crc = after;
        uint test = (uint)((crc << 8) & 0x00FFFFFFL) ^ crc32_table[k];
        MessageBox.Show((~crc).ToString("X"));
    }
}
file_stream.Close();
return ~crc;

我的问题是:假设我有一个大文件,比如说 100MB。前 50MB 和最后 50MB 的 CRC-32 计算与 100MB 文件的 CRC-32 计算之间是否有任何联系?

我问的原因是我有一些非常大的文件(〜10GB给予或接受)需要一些时间才能生成,但是在生成它们时,大多数部分保持静态,但是中间的部分(已知点)和开头(标题,也称为部分/长度)。计算 10GB 文件的 CRC-32 校验和需要相当长的时间,所以我想知道是否有任何方法可以分块进行?

I use this trivial function to calculate the CRC checksum of a given file:

long i, j = 0;
int k = 0;
uint crc = 0xFFFFFFFF;
FileInfo file_info = new FileInfo(file);
byte[] file_buffer = new byte[32768];

FileStream file_stream = new FileStream(@file, FileMode.Open);
while ((i = file_stream.Read(file_buffer, 0, file_buffer.Count())) > 0)
{
    for (j = 0; j < i; j++)
    {
        uint before = crc;
        k = (int)((crc ^ file_buffer[j]) & 0x000000FFL);
        uint after = (uint)((crc >> 8) & 0x00FFFFFFL) ^ crc32_table[k];
        crc = after;
        uint test = (uint)((crc << 8) & 0x00FFFFFFL) ^ crc32_table[k];
        MessageBox.Show((~crc).ToString("X"));
    }
}
file_stream.Close();
return ~crc;

My question is this: Say I have a large file, of say 100MB. Is there any link between a CRC-32 calculation of the first 50MB and the last 50MB and the CRC-32 calculation of the 100MB file?

The reason I'm asking, is I have some very large files (~10GB give or take) which take some time to generate, but while they're being generated, most parts remain static, however, parts in the middle (known point) and right at the start (header, also known part/length). Calculating a CRC-32 checksum of a 10GB file takes quite some time, so I was wondering if there was any way to do it in chunks?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

客…行舟 2024-11-24 13:56:04

是的。请参阅 crc32_combine_() zlib 为例。

Yes. See crc32_combine_() in zlib for an example.

上课铃就是安魂曲 2024-11-24 13:56:04

确实可以并行化 CRC-32 计算,但细节很混乱,我需要花大约一天的时间来编写代码。

让我们看一下基本的 CRC 算法,其中没有求反,也没有位反转。

对于要计算 CRC 的字节串,我们将其称为消息。基本思想是,将消息视为 GF(2) 中的多项式,然后计算 CRC 多项式的余数。

基本 CRC 算法是加法/线性的。如果有两条长度相同的消息 a 和 b,则 CRC(a XOR b) = CRC(a) XOR CRC(b)。

此外,如果在消息右侧填充 n 个零,则新的 CRC 将是旧的 CRC 乘以 x^n mod CRC 多项式。

 

话虽如此,解决问题的唯一方法是真正理解 CRC 算法背后的数学原理并编写自己的自定义代码。这是 CRC 的一个很长但非常完整的解释:http://www.ross.net/crc /download/crc_v3.txt

It is indeed possible to parallelize the CRC-32 computation, but the details are messy and I'd need to spend about a day to come up with the code.

Let's look at the basic CRC algorithm, where there is no negation and no bit reversal.

For the string of bytes that you want to compute the CRC of, let's call it the message. The basic idea is that you treat the message as a polynomial in GF(2), and you compute its remainder modulo the CRC polynomial.

The basic CRC algorithm is additive/linear. If you have two messages a and b of the same length, then CRC(a XOR b) = CRC(a) XOR CRC(b).

Furthermore, if you pad the message on the right side with n zeros, the new CRC will be the old CRC times x^n mod the CRC polynomial.

 

With all that said, the only way to solve your problem is to really understand the mathematics behind the CRC algorithm and write your own custom code. Here's a long but very complete explanation of CRCs: http://www.ross.net/crc/download/crc_v3.txt

允世 2024-11-24 13:56:04

这个方程是关键:

CRC(a XOR b) == CRC(a) XOR CRC(b)

假设您要计算以下消息的 CRC:

"Always desire to learn something useful."

存在计算 CRC 的函数:

crc_join(crc_part1("Always desire to lea"),
         crc_part2("rn something useful."))

如果 crc_part1crc_part2 补零 (\0)他们的参数如图所示,crc_join只是异或。

crc_part1 = crc("Always desire to lea\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0")
crc_part2 = crc("\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0rn something useful.")

可以使用查找表在 crc_part1 中解释尾随零。 crc_part2 中的前导零可以被忽略。

参考文献:

  1. 基于软件的 CRC 的高速并行架构
    永州。做吧,宋禄。尹泰奎。金光义. Pyun 和 Sin-Chong。公园
  2. https://en.wikipedia.org/wiki/Cyclic_redundancy_check

This equation is key:

CRC(a XOR b) == CRC(a) XOR CRC(b)

Suppose you want to compute the CRC of the following message:

"Always desire to learn something useful."

There exist functions to compute the CRC as:

crc_join(crc_part1("Always desire to lea"),
         crc_part2("rn something useful."))

If crc_part1 and crc_part2 zeros pad (\0) their arguments as shown, crc_join is just XOR.

crc_part1 = crc("Always desire to lea\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0")
crc_part2 = crc("\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0rn something useful.")

The trailing zeros can be accounted for in crc_part1 with lookup tables. The leading zeros can be ignored in crc_part2.

References:

  1. High-speed Parallel Architecture for Software-based CRC
    Youngju. Do, Sung-Rok. Yoon, Taekyu. Kim, Kwang Eui. Pyun and Sin-Chong. Park
  2. https://en.wikipedia.org/wiki/Cyclic_redundancy_check
与往事干杯 2024-11-24 13:56:04

我知道这是一个老问题,但这是我用于“分块”CRC 计算的方法,以防对任何人有帮助:

public static class Crc32Checksum
{
    private static readonly uint[] Table = GenerateTable();

    /// <summary>
    /// Calculates a CRC32 value for the data given.
    /// </summary>
    /// <param name="data">Data contents</param>
    /// <param name="offset">Byte offset to start reading</param>
    /// <param name="count">Number of bytes to read</param>
    /// <returns>The computed CRC32 value.</returns>
    public static int Calculate(byte[] data, int offset, int count) 
        => Calculate(0, data, offset, count);

    /// <summary>
    /// Calculates a new CRC32 value given additional data for the current CRC value.
    /// </summary>
    /// <param name="currentCrc">The current CRC value to start with</param>
    /// <param name="data">Additional data contents</param>
    /// <param name="offset">Byte offset to start reading</param>
    /// <param name="count">Number of bytes to read</param>
    /// <returns>The computed CRC32 value.</returns>
    public static int Calculate(int currentCrc, byte[] data, int offset, int count)
    {
        unchecked
        {
            uint crc = ~(uint)currentCrc;

            for (int i = offset, end = offset + count; i < end; i++)
                crc = (crc >> 8) ^ Table[(crc ^ data[i]) & 0xFF];

            return (int)~crc;
        }
    }

    private static uint[] GenerateTable()
    {
        unchecked
        {
            var table = new uint[256];

            const uint poly = 0xEDB88320;

            for (uint i = 0; i < table.Length; i++)
            {
                uint crc = i;

                for (int j = 8; j > 0; j--)
                {
                    if ((crc & 1) == 1)
                        crc = (crc >> 1) ^ poly;
                    else
                        crc >>= 1;
                }

                table[i] = crc;
            }

            return table;
        }
    }
}

I know this is an old question but this is what I use for "chunking" CRC calculations in case this helps anyone:

public static class Crc32Checksum
{
    private static readonly uint[] Table = GenerateTable();

    /// <summary>
    /// Calculates a CRC32 value for the data given.
    /// </summary>
    /// <param name="data">Data contents</param>
    /// <param name="offset">Byte offset to start reading</param>
    /// <param name="count">Number of bytes to read</param>
    /// <returns>The computed CRC32 value.</returns>
    public static int Calculate(byte[] data, int offset, int count) 
        => Calculate(0, data, offset, count);

    /// <summary>
    /// Calculates a new CRC32 value given additional data for the current CRC value.
    /// </summary>
    /// <param name="currentCrc">The current CRC value to start with</param>
    /// <param name="data">Additional data contents</param>
    /// <param name="offset">Byte offset to start reading</param>
    /// <param name="count">Number of bytes to read</param>
    /// <returns>The computed CRC32 value.</returns>
    public static int Calculate(int currentCrc, byte[] data, int offset, int count)
    {
        unchecked
        {
            uint crc = ~(uint)currentCrc;

            for (int i = offset, end = offset + count; i < end; i++)
                crc = (crc >> 8) ^ Table[(crc ^ data[i]) & 0xFF];

            return (int)~crc;
        }
    }

    private static uint[] GenerateTable()
    {
        unchecked
        {
            var table = new uint[256];

            const uint poly = 0xEDB88320;

            for (uint i = 0; i < table.Length; i++)
            {
                uint crc = i;

                for (int j = 8; j > 0; j--)
                {
                    if ((crc & 1) == 1)
                        crc = (crc >> 1) ^ poly;
                    else
                        crc >>= 1;
                }

                table[i] = crc;
            }

            return table;
        }
    }
}
~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文