Run-length encoding of hex strings (including newlines)
I am implementing run-length encoding using the GZipStream class in a C# WinForms app.
Data is provided as a series of strings separated by newline characters, like this:
FFFFFFFF
FFFFFEFF
FDFFFFFF
00FFFFFF
Before compressing, I convert the string to a byte array, but doing so fails if newline characters are present.
Each newline is significant, but I am not sure how to preserve their position in the encoding.
Here is the code I am using to convert to a byte array:
    private static byte[] HexStringToByteArray(string _hex)
    {
        _hex = _hex.Replace("\r\n", "");
        if (_hex.Length % 2 != 0) throw new FormatException("Hex string length must be divisible by 2.");
        int l = _hex.Length / 2;
        byte[] b = new byte[l];
        for (int i = 0; i < l; i++)
            b[i] = Convert.ToByte(_hex.Substring(i * 2, 2), 16);
        return b;
    }
Convert.ToByte throws a FormatException if the newlines are not removed, with the message: "Additional non-parsable characters are at the end of the string." That doesn't surprise me.
What would be the best way to make sure newline characters can be included properly?
Note: I should add that the compressed version of this string must itself be a string that can be included in an XML document.
Edit:
I have tried to simply convert the string to a byte array without performing any binary conversion on it, but am still having trouble with the compression. Here are the relevant methods:
    private static byte[] StringToByteArray(string _s)
    {
        Encoding enc = Encoding.ASCII;
        return enc.GetBytes(_s);
    }

    public static byte[] Compress(byte[] buffer)
    {
        MemoryStream ms = new MemoryStream();
        GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true);
        zip.Write(buffer, 0, buffer.Length);
        zip.Close();
        ms.Position = 0;
        byte[] compressed = new byte[ms.Length];
        ms.Read(compressed, 0, compressed.Length);
        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        return gzBuffer;
    }
Comments (2)
Firstly: are you certain that just compressing the text doesn't give much the same result as compressing the "converted to binary" form?
Assuming you want to go ahead with converting to binary, I can suggest two options:
EDIT: I believe your original approach was fundamentally flawed. Whatever you get out of GZipStream is not text, and shouldn't be treated as if it were text using Encoding. However, you can turn it into ASCII text very easily, by calling Convert.ToBase64String. By the way, another trick you've missed is to call ToArray on the MemoryStream, which will give you the contents as a byte[] with no extra messing around.
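To illustrate the suggestion above, here is a minimal sketch that compresses the text as-is (newlines included), reads the result with ToArray, and turns it into Base64 so it can sit inside an XML document. The class name GzipText and both method names are made up for this example, and the choice of UTF-8 is an assumption; the question's data is plain ASCII, so Encoding.ASCII would behave the same here.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

// Hypothetical helper for illustration; not part of the original question or answer.
public static class GzipText
{
    // Compresses the text (newlines included) and returns it as a Base64 string.
    public static string CompressToBase64(string text)
    {
        byte[] raw = Encoding.UTF8.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            // leaveOpen: true keeps ms open after the GZipStream is disposed.
            // Disposing the GZipStream flushes the remaining compressed data into ms.
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(raw, 0, raw.Length);
            }
            // ToArray returns everything written so far, regardless of Position.
            return Convert.ToBase64String(ms.ToArray());
        }
    }

    // Reverses CompressToBase64 and returns the original text.
    public static string DecompressFromBase64(string base64)
    {
        byte[] compressed = Convert.FromBase64String(base64);
        using (var input = new MemoryStream(compressed))
        using (var zip = new GZipStream(input, CompressionMode.Decompress))
        using (var output = new MemoryStream())
        {
            zip.CopyTo(output);
            return Encoding.UTF8.GetString(output.ToArray());
        }
    }
}
```

Calling GzipText.DecompressFromBase64(GzipText.CompressToBase64(s)) should give back s unchanged, line breaks and all. Base64 output uses only letters, digits, '+', '/' and '=', none of which need escaping in XML text.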
If the data you posted is representative of all the data, then you have a newline after every 4 bytes, so if you need the newlines when converting back, just insert one after every 4 bytes of data.
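A rough sketch of that idea, assuming every line really is exactly 4 bytes (8 hex digits), that the separator is "\r\n" as in the question's Replace call, and that there is no trailing newline. HexLines and ToHexLines are hypothetical names used only for this example.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; assumes fixed-width lines of bytesPerLine bytes.
public static class HexLines
{
    // Formats bytes as uppercase hex and starts a new line after every bytesPerLine bytes.
    public static string ToHexLines(byte[] data, int bytesPerLine = 4)
    {
        var sb = new StringBuilder();
        for (int i = 0; i < data.Length; i++)
        {
            // Insert the separator between groups, never before the first byte.
            if (i > 0 && i % bytesPerLine == 0) sb.Append("\r\n");
            sb.Append(data[i].ToString("X2"));
        }
        return sb.ToString();
    }
}
```

This only works if the line length never varies. If some lines can be shorter or longer, the newline positions have to be stored in the data itself rather than inferred.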