Why do we use the flush parameter in the Encoder.GetBytes method?

Posted on 2024-09-25 21:46:03 | 367 words | 8 views | 0 comments

This link explains the Encoder.GetBytes method, and it also explains a bool parameter called flush. The explanation of flush is:

true if this encoder can flush its
state at the end of the conversion;
otherwise, false. To ensure correct
termination of a sequence of blocks of
encoded bytes, the last call to
GetBytes can specify a value of true
for flush.

But I didn't understand what flush does; maybe I am drunk or something :). Can you explain it in more detail, please?

Comments (3)

三寸金莲 2024-10-02 21:46:03

Suppose you receive data over a socket connection. You will receive a long text as several byte[] blocks.

It is possible that one Unicode character occupies two or more bytes in a UTF-8 stream and that it is split across two byte blocks. Decoding the two byte blocks separately (and concatenating the resulting strings) would produce a wrong result.

So you can only specify flush=true on the last block. And of course, if you only have 1 block then that is also the last.
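A minimal sketch of that situation, assuming C# and Encoding.UTF8 (the class and method names are made up for illustration): it decodes a list of byte[] blocks with one shared Decoder and passes flush=true only for the last block.

```csharp
using System;
using System.Text;

public static class DecoderChunkDemo
{
    // Decodes UTF-8 blocks with a single Decoder so that a character
    // split across two blocks is reassembled instead of being corrupted.
    public static string DecodeBlocks(byte[][] blocks)
    {
        Decoder decoder = Encoding.UTF8.GetDecoder();
        var text = new StringBuilder();

        for (int i = 0; i < blocks.Length; i++)
        {
            byte[] block = blocks[i];
            bool isLast = i == blocks.Length - 1;

            // GetMaxCharCount leaves room for bytes carried over from the previous block.
            char[] buffer = new char[Encoding.UTF8.GetMaxCharCount(block.Length)];

            // flush is true only on the last block; earlier calls keep
            // any incomplete trailing bytes inside the decoder.
            int written = decoder.GetChars(block, 0, block.Length, buffer, 0, isLast);
            text.Append(buffer, 0, written);
        }

        return text.ToString();
    }
}
```

For example, "é" is the two bytes 0xC3 0xA9 in UTF-8. Calling DecodeBlocks(new[] { new byte[] { 0x68, 0xC3 }, new byte[] { 0xA9, 0x21 } }) should give "hé!", while calling Encoding.UTF8.GetString on each block separately cannot reassemble the split character.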

Tip: Use a TextReader and let it handle this problem for you.

Edit

The mirror problem (the one that was actually asked about: GetBytes) is slightly harder to explain.

Using flush=true has the same effect on the encoder as calling Encoder.Reset() after GetBytes(...): it clears the encoder's 'state', including any trailing characters kept from the end of the previous data block, such as an unmatched high surrogate.

The basic idea is the same: when converting from string to blocks of bytes, or vice versa, the blocks are not independent.
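The encoding direction can be sketched the same way. This is only an illustration, assuming C# and Encoding.UTF8, with hypothetical class and method names: one Encoder is shared across all char[] chunks, and flush=true is passed only on the final call.

```csharp
using System;
using System.IO;
using System.Text;

public static class EncoderChunkDemo
{
    // Encodes char chunks with a single Encoder so that a surrogate pair
    // split across two chunks still becomes one 4-byte UTF-8 sequence.
    public static byte[] EncodeChunks(char[][] chunks)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        var output = new MemoryStream();

        for (int i = 0; i < chunks.Length; i++)
        {
            char[] chunk = chunks[i];
            bool isLast = i == chunks.Length - 1;

            // GetMaxByteCount allows for a high surrogate held over from the previous call.
            byte[] buffer = new byte[Encoding.UTF8.GetMaxByteCount(chunk.Length)];

            // With flush false, a trailing high surrogate is kept in the
            // encoder's state and completed by the next chunk.
            int written = encoder.GetBytes(chunk, 0, chunk.Length, buffer, 0, isLast);
            output.Write(buffer, 0, written);
        }

        return output.ToArray();
    }
}
```

For example, U+1F600 is the surrogate pair \uD83D \uDE00. Passing the chunks { 'a', '\uD83D' } and { '\uDE00', 'b' } should yield the same bytes as Encoding.UTF8.GetBytes("a\uD83D\uDE00b"), because the high surrogate waits inside the encoder for its partner.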

夜雨飘雪 2024-10-02 21:46:03

Flushing will reset the internal state of the encoder instance used to encode the text into bytes. Why does it need internal state, you ask? Well, to quote MSDN:

The flush parameter is useful for flushing a high-surrogate at the end of a stream
that does not have a low-surrogate. For example, the Encoder created by
UTF8Encoding.GetEncoder uses this parameter to determine whether to write out a
dangling high-surrogate at the end of a character block.

So if you're making multiple GetBytes() calls, you would want to flush the internal state at the end to terminate any character sequences that need terminating, but only at the end, since terminating sequences might otherwise be introduced in the middle of words.
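To make the quoted behaviour concrete, here is a small sketch, assuming C# and Encoding.UTF8 (the helper name is invented for this example). It feeds a lone high surrogate to a fresh encoder and reports how many bytes are written with and without flush.

```csharp
using System;
using System.Text;

public static class DanglingSurrogateDemo
{
    // Returns the number of bytes a fresh UTF-8 encoder writes
    // when its only input is an unmatched high surrogate.
    public static int BytesWrittenForLoneHighSurrogate(bool flush)
    {
        Encoder encoder = Encoding.UTF8.GetEncoder();
        char[] chars = { '\uD83D' }; // high surrogate with no low surrogate after it
        byte[] buffer = new byte[Encoding.UTF8.GetMaxByteCount(chars.Length)];

        // flush == false: the surrogate is held back in the encoder's state.
        // flush == true: the encoder must deal with it now, through its fallback.
        return encoder.GetBytes(chars, 0, chars.Length, buffer, 0, flush);
    }
}
```

With flush set to false the call should write nothing, since the encoder is still waiting for a low surrogate. With flush set to true something is written immediately, which is why flushing in the middle of a stream can break up a pair that the next block would have completed.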

Note that this may be a purely theoretical problem these days. And, you'd be better off using higher-level wrappers anyway. If you do, being drunk will not be a problem.

爱的故事 2024-10-02 21:46:03

Internally, the Encoder would be implemented with a buffer. This buffer may need to be flushed (cleared) in order to end the conversion correctly or to prepare the Encoder for the next conversion.

Here is one explanation of buffer flushing.

The exact usage of the flush parameter is described here:

true to clear the internal state of the encoder after the conversion; otherwise, false.
