Run-length encoding of hex strings (including newlines)

Posted on 2024-09-13 13:44:36

I am implementing run-length encoding using the GZipStream class in a C# WinForms app.

Data is provided as a series of strings separated by newline characters, like this:

FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF

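Before compressing, I convert the string to a byte array, but doing so fails if newline characters are present.

Each newline is significant, but I am not sure how to preserve their positions in the encoding.

Here is the code I am using to convert to a byte array: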

private static byte[] HexStringToByteArray(string _hex)
{
    _hex = _hex.Replace("\r\n", "");
    if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
    int l = _hex.Length / 2;
    byte[] b = new byte[l];
    for (int i = 0; i < l; i++)
        b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
    return b;
}

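If the newlines are not removed, Convert.ToByte throws a FormatException with the message: "Additional non-parsable characters are at the end of the string." That doesn't surprise me.

What would be the best way to make sure newline characters are included properly?

Note: I should add that the compressed version of this string must itself be a string that can be included in an XML document.

Edit:

I have tried simply converting the string to a byte array without performing any binary conversion on it, but am still having trouble with the compression. Here are the relevant methods: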

    private static byte[] StringToByteArray(string _s)
    {
        Encoding enc = Encoding.ASCII;
        return enc.GetBytes(_s);
    }

    public static byte[] Compress(byte[] buffer)
    {
        MemoryStream ms = new MemoryStream();
        GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
        zip.Write(buffer, 0, buffer.Length);
        zip.Close();
        ms.Position = 0;

        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);

        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return gzBuffer;
    }



零時差 2024-09-20 13:44:36

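First: are you sure that compressing the text directly doesn't give you much the same result as compressing the "converted to binary" form?

Assuming you want to go ahead with converting to binary, I can suggest two options:

At the start of each line, write a number saying how many bytes are in that line. When you decompress, you read and convert that many bytes, then write a newline. If you know each line will always be shorter than 256 bytes, the count fits in a single byte. Otherwise you need a larger fixed size, or some variable-size encoding (e.g. "while the top bit is set, this is still part of the number"); the latter gets complicated quickly.
Alternatively, "escape" a newline by representing it as (say) 0xFF, 0x00. You then also need to escape a genuine 0xFF as (say) 0xFF 0xFF. When you read the data back, on reading 0xFF you read the next byte to determine whether it represents a newline or a genuine 0xFF.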
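
The first option might be sketched roughly as follows. This is only an illustration under stated assumptions: the class name LinePrefixedHex and its methods are hypothetical, every line is assumed to hold fewer than 256 bytes, and CRLF line endings are normalised to LF (so decoding always yields "\n" and upper-case hex digits).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical helper: stores each line as [byte count][bytes...].
public static class LinePrefixedHex
{
    public static byte[] Encode(string text)
    {
        var output = new MemoryStream();
        // Normalise CRLF to LF first so both line-ending styles are accepted.
        foreach (string line in text.Replace("\r\n", "\n").Split('\n'))
        {
            if (line.Length % 2 != 0)
                throw new FormatException("Each line must contain an even number of hex digits.");
            int count = line.Length / 2;
            // Assumption: a one-byte prefix is enough (at most 255 bytes per line).
            if (count > 255)
                throw new FormatException("A line is too long for a one-byte length prefix.");
            output.WriteByte((byte)count);
            for (int i = 0; i < count; i++)
                output.WriteByte(Convert.ToByte(line.Substring(i * 2, 2), 16));
        }
        return output.ToArray();
    }

    public static string Decode(byte[] data)
    {
        var lines = new List<string>();
        int pos = 0;
        while (pos < data.Length)
        {
            int count = data[pos++];            // length prefix for this line
            if (pos + count > data.Length)
                throw new FormatException("Truncated record.");
            var sb = new StringBuilder(count * 2);
            for (int i = 0; i < count; i++)
                sb.Append(data[pos++].ToString("X2"));
            lines.Add(sb.ToString());
        }
        // The newline positions are implied by the record boundaries.
        return string.Join("\n", lines);
    }
}
```

An empty line becomes a record with a count of zero, so blank lines and a trailing newline survive the round trip. The escaping option avoids the per-line length limit, at the cost of a slightly more involved decoder.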

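Edit: I believe your original approach is fundamentally flawed. Whatever comes out of GZipStream is not text, and should not be treated as text using an Encoding. You can, however, easily turn it into ASCII text by calling Convert.ToBase64String. By the way, another trick you've missed is calling ToArray on the MemoryStream, which gives you the contents as a byte[] with no extra mess.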
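
A minimal sketch of that suggestion, assuming .NET 4 or later (for Stream.CopyTo). The class name XmlSafeGzip and its method names are made up for illustration. Unlike the Compress method in the question, it does not prepend the 4-byte uncompressed length, because CopyTo reads until the end of the stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: gzip a byte array and return it as Base64 text.
public static class XmlSafeGzip
{
    public static string CompressToBase64(byte[] buffer)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms usable after the GZipStream is disposed;
            // disposing the GZipStream flushes the final compressed block.
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
            }
            // ToArray copies the written bytes regardless of ms.Position.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    public static byte[] DecompressFromBase64(string base64)
    {
        using (var input = new MemoryStream(Convert.FromBase64String(base64)))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

Base64 output uses only letters, digits, '+', '/' and '=', so it can be placed in XML element content or an attribute without further escaping. Note that for very small inputs the gzip header and the Base64 expansion can make the result longer than the original text.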