String compression result as a string

Posted 2024-10-04 14:25:18

I found the following code on the internet for string compression. When I compress a simple string, the return value looks very different from the input.

For example, Compress("abc") returns "AwAAAB+LCAAAAAAABADtvQdgHEmWJSYvbcp7f0r1StfgdKEIgGATJNiQQBDswYjN5pLsHWlHIymrKoHKZVZlXWYWQMztnbz33nvvvffee++997o7nU4n99//P1xmZAFs9s5K2smeIYCqyB8/fnwfPyKyyfT/AcJBJDUDAAAA"

Can I get a simple string as the result?

Thanks

using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static string Compress(string text)
{
    byte[] buffer = Encoding.UTF8.GetBytes(text);

    using (MemoryStream ms = new MemoryStream())
    {
        // leaveOpen: true keeps ms usable after the GZipStream is disposed;
        // disposing the GZipStream flushes the last compressed bytes into ms.
        using (GZipStream zip = new GZipStream(ms, CompressionMode.Compress, true))
        {
            zip.Write(buffer, 0, buffer.Length);
        }

        byte[] compressed = ms.ToArray();

        // Layout: 4-byte original length, followed by the gzip data.
        byte[] gzBuffer = new byte[compressed.Length + 4];
        Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
        Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);

        // Base64 turns the binary result into printable text.
        return Convert.ToBase64String(gzBuffer);
    }
}

Comments (3)

单调的奢华 2024-10-11 14:25:18

The code you are using is intended for compressing really large strings. It compresses the source string with the GZip compression algorithm and then makes the binary result readable (or at least usable / "passable") by encoding it as Base64.

Base64 expands its input to about 1.33 times the original size: each Base64 character carries only 6 bits, so every 3 bytes (24 bits) become 4 characters. For the whole thing to make sense, the compression step therefore has to shrink the string to roughly 70-75% of its source length or less.
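
The size arithmetic can be checked with a short sketch. The class name `Base64SizeDemo` and the helper `Base64Length` are made up for this illustration; only `Convert.ToBase64String` comes from the framework.

```csharp
using System;

public static class Base64SizeDemo
{
    // Hypothetical helper: number of Base64 characters for byteCount bytes.
    // Every group of 3 bytes (the last group padded) becomes 4 characters.
    public static int Base64Length(int byteCount)
    {
        return 4 * ((byteCount + 2) / 3);
    }

    public static void Main()
    {
        foreach (int n in new[] { 3, 100, 3000 })
        {
            // Encode n zero bytes and compare with the formula.
            string encoded = Convert.ToBase64String(new byte[n]);
            Console.WriteLine($"{n} bytes -> {encoded.Length} chars (formula: {Base64Length(n)})");
        }
    }
}
```

For 3000 bytes this gives 4000 characters, which is the 4/3 growth described above; the 4-byte length prefix added by the question's code counts against the savings as well.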

The result is expected and usual when using that encoding.

To answer your question, please define what you mean by "simple string result".
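
If "simple string result" just means getting the original text back, one possible reading is that the two steps need to be reversed. The sketch below assumes that reading. `Compress` mirrors the layout of the code in the question (4-byte length prefix, then gzip data, then Base64); `Decompress` and the class name `GZipText` are hypothetical and do not appear in the original post.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipText
{
    // Same layout as the question: 4-byte length + gzip data, Base64-encoded.
    public static string Compress(string text)
    {
        byte[] buffer = Encoding.UTF8.GetBytes(text);
        using (var ms = new MemoryStream())
        {
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(buffer, 0, buffer.Length);
            }
            byte[] compressed = ms.ToArray();
            byte[] gzBuffer = new byte[compressed.Length + 4];
            Buffer.BlockCopy(BitConverter.GetBytes(buffer.Length), 0, gzBuffer, 0, 4);
            Buffer.BlockCopy(compressed, 0, gzBuffer, 4, compressed.Length);
            return Convert.ToBase64String(gzBuffer);
        }
    }

    // Hypothetical counterpart: undo Base64, read the length prefix, gunzip.
    public static string Decompress(string compressedText)
    {
        byte[] gzBuffer = Convert.FromBase64String(compressedText);
        int originalLength = BitConverter.ToInt32(gzBuffer, 0);
        byte[] buffer = new byte[originalLength];
        using (var ms = new MemoryStream(gzBuffer, 4, gzBuffer.Length - 4))
        using (var zip = new GZipStream(ms, CompressionMode.Decompress))
        {
            // Read may return fewer bytes than requested, so loop until done.
            int total = 0;
            while (total < buffer.Length)
            {
                int read = zip.Read(buffer, total, buffer.Length - total);
                if (read == 0) break;
                total += read;
            }
        }
        return Encoding.UTF8.GetString(buffer);
    }

    public static void Main()
    {
        string packed = Compress("abc");
        Console.WriteLine(Decompress(packed)); // prints "abc"
    }
}
```

The length prefix is written and read with `BitConverter`, so both sides are assumed to run on machines with the same byte order.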

人事已非 2024-10-11 14:25:18

Sure, because the result is in Base64 (see the last line of your code).
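
To see what that Base64 text actually contains, it can be decoded back to bytes. The sketch below uses the exact string quoted in the question; the class name `InspectResult` is made up for this illustration, and reading the prefix with `BitConverter.ToInt32` assumes a little-endian machine.

```csharp
using System;

public static class InspectResult
{
    // The value that Compress("abc") returned in the question.
    public const string Returned =
        "AwAAAB+LCAAAAAAABADtvQdgHEmWJSYvbcp7f0r1StfgdKEIgGATJNiQQBDswYjN5pLsHWlH" +
        "IymrKoHKZVZlXWYWQMztnbz33nvvvffee++997o7nU4n99//P1xmZAFs9s5K2smeIYCqyB8/" +
        "fnwfPyKyyfT/AcJBJDUDAAAA";

    public static void Main()
    {
        byte[] bytes = Convert.FromBase64String(Returned);

        // First 4 bytes: the original length written by the question's code.
        int originalLength = BitConverter.ToInt32(bytes, 0);
        Console.WriteLine($"length prefix: {originalLength}");                // 3, the length of "abc"

        // Next 2 bytes: the gzip magic number that starts every gzip stream.
        Console.WriteLine($"gzip magic:    {bytes[4]:X2} {bytes[5]:X2}");    // 1F 8B

        Console.WriteLine($"total bytes:   {bytes.Length}");
    }
}
```

So the long text is not garbage: it is the length prefix plus a gzip stream, written out in printable characters.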

心在旅行 2024-10-11 14:25:18

Compression doesn't always result in a smaller output for a few reasons:

  1. The input might be completely random, in which case most compression algorithms will not compress anything but still need to store the decompression "instructions". The result of compressing such data is data + instructions, which is bigger than the input.
  2. The input has none of the features that the compression algorithm looks for. This is very similar to the previous case, except that it depends on the compression algorithm used (in your case Gzip).
  3. The input is very small. The smaller the input, the smaller the chance of finding compressible segments in it, so there is a good chance you effectively have pseudo-random input (not random, but so small that it looks random), and we are back at the first case.

Base64 is a big point here, yes, but just don't forget these small facts about compression in general.
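
These cases can be observed with a rough experiment. The class name `GZipOverheadDemo` and the helper `GZipSize` are made up for this sketch, and the exact sizes printed depend on the .NET version, so none are quoted here.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class GZipOverheadDemo
{
    // Hypothetical helper: size in bytes of the gzip stream produced for data.
    public static int GZipSize(byte[] data)
    {
        using (var ms = new MemoryStream())
        {
            using (var zip = new GZipStream(ms, CompressionMode.Compress, true))
            {
                zip.Write(data, 0, data.Length);
            }
            return (int)ms.Length;
        }
    }

    public static void Main()
    {
        // Very small input: the gzip header and trailer alone outweigh it.
        byte[] tiny = Encoding.UTF8.GetBytes("abc");

        // Highly repetitive input: exactly what the algorithm looks for.
        byte[] repetitive = Encoding.UTF8.GetBytes(new string('a', 10000));

        // Pseudo-random input: nothing to exploit, so no real savings.
        byte[] random = new byte[10000];
        new Random(42).NextBytes(random);

        Console.WriteLine($"tiny:       {tiny.Length} -> {GZipSize(tiny)} bytes");
        Console.WriteLine($"repetitive: {repetitive.Length} -> {GZipSize(repetitive)} bytes");
        Console.WriteLine($"random:     {random.Length} -> {GZipSize(random)} bytes");
    }
}
```

The tiny input always grows, because a gzip stream carries at least an 18-byte header and trailer before any data is stored.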
