Run-length encoding of hex strings (including newlines)

Posted on 2024-09-13 13:44:36

I am implementing run length encoding using the GZipStream class in a C# winforms app.

Data is provided as a series of strings separated by newline characters, like this:

FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF

Before compressing, I convert the string to a byte array, but doing so fails if newline characters are present.

Each newline is significant, but I am not sure how to preserve their position in the encoding.

Here is the code I am using to convert to a byte array:

private static byte[] HexStringToByteArray(string _hex)
{
    _hex = _hex.Replace("\r\n", "");
    if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
    int l = _hex.Length / 2;
    byte[] b = new byte[l];
    for (int i = 0; i < l; i++)
        b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
    return b;
}

Convert.ToByte throws a FormatException if the newlines are not removed, with the message "Additional non-parsable characters are at the end of the string", which doesn't surprise me.

What would be the best way to make sure newline characters can be included properly?

Note: I should add that the compressed version of this string must itself be a string that can be included in an XML document.

Edit:

I have tried to simply convert the string to a byte array without performing any binary conversion on it, but am still having trouble with the compression. Here are the relevant methods:

    private static byte[] StringToByteArray(string _s)
    {
        Encoding enc = Encoding.ASCII;
        return enc.GetBytes(_s);
    }

    public static byte[] Compress(byte[] buffer)
    {
        MemoryStream ms = new MemoryStream();
        GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
        zip.Write(buffer, 0, buffer.Length);
        zip.Close();
        ms.Position = 0;

        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);

        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return gzBuffer;
    }

Comments (2)

零時差 2024-09-20 13:44:36

Firstly: are you certain that just compressing the text doesn't give much the same result as compressing the "converted to binary" form?

Assuming you want to go ahead with converting to binary, I can suggest two options:

  • At the start of each line, write a number stating how many bytes are in the line. Then when you decompress, you read and convert that many bytes, then write a newline. If you know that each line is always going to be less than 256 bytes long, you can just represent this as a single byte. Otherwise you might want a larger fixed size, or some variable size encoding (e.g. "while the top bit is set, this is still part of the number") - the latter gets hairy pretty quickly.
  • Alternatively, "escape" a newline by representing it as (say) 0xFF, 0x00. You'd then also need to escape a genuine 0xFF as (say) 0xFF 0xFF. When you read the data, if you read an 0xFF you'd then read the next byte to determine whether it represented a newline or a genuine 0xFF.
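The first option above (a length prefix per line) could be sketched roughly as follows. This is only an illustration, not code from the question: the class name LengthPrefixedHex and its Encode/Decode methods are hypothetical, and it assumes every line holds an even number of hex digits and at most 255 bytes, so a single length byte is enough.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper: one length byte per line, followed by that line's bytes.
public static class LengthPrefixedHex
{
    public static byte[] Encode(string text)
    {
        var output = new List<byte>();
        // Treat both "\r\n" and "\n" as line separators.
        string[] lines = text.Replace("\r\n", "\n").Split('\n');
        foreach (string line in lines)
        {
            if (line.Length % 2 != 0)
                throw new FormatException("Each line must contain an even number of hex digits.");
            int count = line.Length / 2;
            if (count > 255)
                throw new FormatException("A line longer than 255 bytes does not fit a one-byte prefix.");
            output.Add((byte)count);
            for (int i = 0; i < count; i++)
                output.Add(Convert.ToByte(line.Substring(i * 2, 2), 16));
        }
        return output.ToArray();
    }

    public static string Decode(byte[] data)
    {
        var lines = new List<string>();
        int pos = 0;
        while (pos < data.Length)
        {
            int count = data[pos++];   // length prefix for this line
            if (pos + count > data.Length)
                throw new FormatException("Input ends in the middle of a line.");
            var sb = new StringBuilder(count * 2);
            for (int i = 0; i < count; i++)
                sb.Append(data[pos + i].ToString("X2"));
            pos += count;
            lines.Add(sb.ToString());
        }
        // Lines are rejoined with "\r\n"; the original separator style is not recorded.
        return string.Join("\r\n", lines);
    }
}
```

Note that this sketch normalises hex digits to upper case and newlines to "\r\n" on the way back, so it round-trips exactly only for input that already uses those conventions.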

EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of GZipStream is not text, and shouldn't be treated as if it were text using Encoding. However, you can turn it into ASCII text very easily, by calling Convert.ToBase64String. By the way, another trick you've missed is to call ToArray on the MemoryStream, which will give you the contents as a byte[] with no extra messing around.
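A minimal sketch of that suggestion, assuming the goal is a string that can be embedded in XML: compress the bytes, take them from the MemoryStream with ToArray, and convert them with Convert.ToBase64String. The class name GZipBase64 and its two method names are made up for this example.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper: GZip-compress bytes and carry the result as Base64 text.
public static class GZipBase64
{
    public static string CompressToBase64(byte[] buffer)
    {
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms usable after the GZipStream is disposed;
            // disposing the GZipStream flushes the remaining compressed data.
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
            }
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    public static byte[] DecompressFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return output.ToArray();
        }
    }
}
```

If the text is compressed directly rather than converted to binary first, the newlines survive without special handling, for example GZipBase64.CompressToBase64(Encoding.ASCII.GetBytes(text)), with Encoding.ASCII.GetString applied to the decompressed bytes on the way back. Base64 output uses only letters, digits, '+', '/' and '=', so it needs no escaping as XML text.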

小清晰的声音 2024-09-20 13:44:36

If the data you posted is representative of all the data, then you have a newline every 4 bytes, so if you need the newlines when converting back, just insert one after every 4 bytes of data.
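Under that assumption (every line is the same fixed width), the conversion back could look something like this sketch. FixedWidthHex and ToHexLines are illustrative names only, and the "\r\n" separator is assumed from the question's code.

```csharp
using System;
using System.Text;

// Hypothetical helper: rebuild the hex text, starting a new line every bytesPerLine bytes.
public static class FixedWidthHex
{
    public static string ToHexLines(byte[] data, int bytesPerLine)
    {
        if (bytesPerLine <= 0)
            throw new ArgumentOutOfRangeException("bytesPerLine");
        var sb = new StringBuilder();
        for (int i = 0; i < data.Length; i++)
        {
            // Insert a line break before each group except the first.
            if (i > 0 && i % bytesPerLine == 0)
                sb.Append("\r\n");
            sb.Append(data[i].ToString("X2"));
        }
        return sb.ToString();
    }
}
```

With this approach the question's HexStringToByteArray can keep stripping the newlines, since their positions are implied by the fixed line width.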
