Buffer is empty, but IdTCPClient.IOHandler.InputBufferIsEmpty is False

Posted 2024-12-25 16:59:56

I have a problem with the code below, which uses TIdTCPClient to read the buffer from a Telnet server:

procedure TForm2.ReadTimerTimer(Sender: TObject);
var
   S: String; 
begin
   if IdTCPClient.IOHandler.InputBufferIsEmpty then
   begin
     IdTCPClient.IOHandler.CheckForDataOnSource(10);
     if IdTCPClient.IOHandler.InputBufferIsEmpty then Exit;
   end;
   s := idTCPClient.IOHandler.InputBufferAsString(TEncoding.UTF8);
   CheckText(S);
end;

This procedure runs every 1000 milliseconds, and when the buffer has a value, CheckText is called.

The code works, but sometimes it passes an empty buffer to CheckText.

What is the problem?

Thanks.

陌路黄昏 2025-01-01 16:59:56

Your code is attempting to read arbitrary blocks of data from the InputBuffer and expects them to be complete and valid strings. It is doing this without ANY consideration for what kind of data you are receiving. That is a recipe for disaster on multiple levels.

You are connected to a Telnet server, but you are using TIdTCPClient directly instead of using TIdTelnet, so you MUST manually decode any Telnet sequences that are received BEFORE you can then process any remaining string data. Look at the source code for TIdTelnet. There is a lot of decoding logic that takes place before the OnDataAvailable event is fired. All Telnet sequence data is handled internally, then the OnDataAvailable event provides whatever non-Telnet data is left over after decoding.

Once you have Telnet decoding taken care of, another problem you have to watch out for is that TEncoding.UTF8 only handles properly encoded COMPLETE UTF-8 sequences. If it encounters a badly encoded sequence, or more importantly encounters an incomplete sequence, THE ENTIRE DECODE FAILS and it returns a blank string. This has already been reported as a bug (see QC #79042).

CheckForDataOnSource() stores whatever raw bytes are in the socket at that moment into the InputBuffer. InputBufferAsString() extracts whatever raw bytes are in the InputBuffer at that moment and attempts to decode them using the specified encoding. It is very possible and likely that the raw bytes that are in the InputBuffer when you call InputBufferAsString() do not always contain COMPLETE UTF-8 sequences. Chances are that sometimes the last sequence in the InputBuffer is still waiting for bytes to arrive in the socket and they will not be read until the next call to CheckForDataOnSource(). That would explain why your CheckText() function is receiving blank strings when using TEncoding.UTF8.

You should use IndyUTF8Encoding() instead (Indy implements its own UTF-8 encoder/decoder to avoid the decoding bug in TEncoding.UTF8). At the very least, you will not get blank strings anymore. However, you can still lose data when a UTF-8 sequence spans multiple CheckForDataOnSource() calls (incomplete UTF-8 sequences will be converted to ? characters). For that reason alone, you should not be using InputBufferAsString() in this situation (even if TEncoding.UTF8 did work properly). To handle this properly, you should either:

1) scan through the InputBuffer manually, calculating how many bytes constitute COMPLETE UTF-8 sequences only, and then pass that count to InputBuffer.Extract() or TIdIOHandler.ReadString(). Any leftover bytes will remain in the InputBuffer for the next time. For that to work, you will have to get rid of the first InputBufferIsEmpty() call and just call CheckForDataOnSource() unconditionally so that you are always checking for more bytes even if you already have some.

2) use TIdIOHandler.ReadChar() instead and get rid of the calls to InputBufferIsEmpty() and CheckForDataOnSource() altogether. The downside is that you will lose data if a UTF-8 sequence decodes into a UTF-16 surrogate pair. ReadChar() can decode surrogates, but it cannot return the second character in the pair (I have started working on new ReadChar() overloads for a future release of Indy that return String instead of Char so full surrogate pairs can be returned).
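
The byte counting described in option 1 could look roughly like the sketch below. CompleteUtf8Length is a hypothetical helper, not part of Indy, and it works on a plain TBytes copy of the buffered data rather than on the InputBuffer itself. It only checks whether the last sequence is cut short; it does not validate the rest of the data, so malformed bytes are left for the decoder to deal with.

```pascal
program Utf8Boundary;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not part of Indy): returns how many leading bytes of
// Buf can be decoded without cutting the final UTF-8 sequence in half.
function CompleteUtf8Length(const Buf: TBytes): Integer;
var
  Len, I, Steps, Need: Integer;
  B: Byte;
begin
  Len := Length(Buf);
  Result := Len;
  if Len = 0 then Exit;

  // Walk back over at most three trailing continuation bytes (10xxxxxx)
  // to find the lead byte of the last sequence.
  I := Len - 1;
  Steps := 0;
  while (I > 0) and (Steps < 3) and ((Buf[I] and $C0) = $80) do
  begin
    Dec(I);
    Inc(Steps);
  end;

  B := Buf[I];
  // Still on a continuation byte: the data is malformed, not merely
  // incomplete, so hand everything to the decoder.
  if (B and $C0) = $80 then Exit;

  // The lead byte tells how long the sequence is supposed to be.
  if B < $80 then Need := 1
  else if (B and $E0) = $C0 then Need := 2
  else if (B and $F0) = $E0 then Need := 3
  else if (B and $F8) = $F0 then Need := 4
  else Need := 1; // invalid lead byte, left for the decoder

  // If fewer bytes have arrived than the lead byte announces, stop before it.
  if Len - I < Need then Result := I;
end;

var
  Data: TBytes;
begin
  // 'A' followed by the first two bytes of the Euro sign (E2 82 AC).
  SetLength(Data, 3);
  Data[0] := $41; Data[1] := $E2; Data[2] := $82;
  WriteLn(CompleteUtf8Length(Data)); // prints 1

  // The missing third byte arrives on the next read.
  SetLength(Data, 4);
  Data[3] := $AC;
  WriteLn(CompleteUtf8Length(Data)); // prints 4
end.
```

In the timer handler, the same check would be applied to whatever bytes are currently in the InputBuffer after calling CheckForDataOnSource() unconditionally. The resulting count is then passed to TIdIOHandler.ReadString() together with IndyUTF8Encoding(), so that a trailing partial sequence stays buffered until the next tick. How the bytes are inspected without removing them depends on the Indy version in use.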
