Buffer is empty but IdTCPClient.IOHandler.InputBufferIsEmpty is False
I have a problem with the code below, which uses TIdTCPClient to read the buffer from a Telnet server:
    procedure TForm2.ReadTimerTimer(Sender: TObject);
    var
      S: String;
    begin
      if IdTCPClient.IOHandler.InputBufferIsEmpty then
      begin
        IdTCPClient.IOHandler.CheckForDataOnSource(10);
        if IdTCPClient.IOHandler.InputBufferIsEmpty then Exit;
      end;
      S := IdTCPClient.IOHandler.InputBufferAsString(TEncoding.UTF8);
      CheckText(S);
    end;
This procedure runs every 1000 milliseconds, and when the buffer has a value, CheckText is called.
The code works, but sometimes it passes an empty string to CheckText.
What's the problem?
Thanks
Comments (3)
Your code is attempting to read arbitrary blocks of data from the InputBuffer and expects them to be complete and valid strings. It is doing this without ANY consideration for what kind of data you are receiving. That is a recipe for disaster on multiple levels.

You are connected to a Telnet server, but you are using TIdTCPClient directly instead of using TIdTelnet, so you MUST manually decode any Telnet sequences that are received BEFORE you can then process any remaining string data. Look at the source code for TIdTelnet. There is a lot of decoding logic that takes place before the OnDataAvailable event is fired. All Telnet sequence data is handled internally, then the OnDataAvailable event provides whatever non-Telnet data is left over after decoding.

Once you have Telnet decoding taken care of, another problem you have to watch out for is that TEncoding.UTF8 only handles properly encoded COMPLETE UTF-8 sequences. If it encounters a badly encoded sequence, or more importantly an incomplete sequence, THE ENTIRE DECODE FAILS and it returns a blank string. This has already been reported as a bug (see QC #79042).

CheckForDataOnSource() stores whatever raw bytes are in the socket at that moment into the InputBuffer. InputBufferAsString() extracts whatever raw bytes are in the InputBuffer at that moment and attempts to decode them using the specified encoding. It is very likely that the raw bytes in the InputBuffer when you call InputBufferAsString() do not always contain COMPLETE UTF-8 sequences. Chances are that sometimes the last sequence in the InputBuffer is still waiting for bytes to arrive in the socket, and they will not be read until the next call to CheckForDataOnSource(). That would explain why your CheckText() function is receiving blank strings when using TEncoding.UTF8.

You should use IndyUTF8Encoding() instead (Indy implements its own UTF-8 encoder/decoder to avoid the decoding bug in TEncoding.UTF8). At the very least, you will not get blank strings anymore. However, you can still lose data when a UTF-8 sequence spans multiple CheckForDataOnSource() calls (incomplete UTF-8 sequences will be converted to ? characters). For that reason alone, you should not be using InputBufferAsString() in this situation (even if TEncoding.UTF8 did work properly).

To handle this properly, you should either:

1) scan through the InputBuffer manually, calculating how many bytes constitute COMPLETE UTF-8 sequences only, and then pass that count to InputBuffer.Extract() or TIdIOHandler.ReadString(). Any leftover bytes will remain in the InputBuffer for the next time. For that to work, you will have to get rid of the first InputBufferIsEmpty() call and just call CheckForDataOnSource() unconditionally so that you are always checking for more bytes even if you already have some.

2) use TIdIOHandler.ReadChar() instead and get rid of the calls to InputBufferIsEmpty() and CheckForDataOnSource() altogether. The downside is that you will lose data if a UTF-8 sequence decodes into a UTF-16 surrogate pair. ReadChar() can decode surrogates, but it cannot return the second character in the pair (I have started working on new ReadChar() overloads for a future release of Indy that return String instead of Char so full surrogate pairs can be returned).
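Option 1 in the answer above comes down to counting how many leading bytes of the buffer form only complete UTF-8 sequences. The following is a minimal sketch of that counting step, not Indy code: the function name CompleteUtf8Length is hypothetical, and the buffer is modelled as a plain TBytes rather than Indy's own buffer type. It only inspects the last sequence in the buffer and leaves any validation of earlier bytes to the decoder.

```pascal
program Utf8PrefixDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper (not part of Indy): returns how many leading bytes
// of Buf can be decoded without cutting a UTF-8 sequence in half.
function CompleteUtf8Length(const Buf: TBytes): Integer;
var
  Len, I, Need: Integer;
begin
  Len := Length(Buf);
  Result := Len;
  if Len = 0 then
    Exit;

  // Step back over at most three trailing continuation bytes (10xxxxxx)
  // to find the byte that starts the last sequence.
  I := Len - 1;
  while (I > 0) and (Len - I < 4) and ((Buf[I] and $C0) = $80) do
    Dec(I);

  // Work out how long the sequence starting at I is supposed to be.
  if (Buf[I] and $80) = 0 then
    Need := 1
  else if (Buf[I] and $E0) = $C0 then
    Need := 2
  else if (Buf[I] and $F0) = $E0 then
    Need := 3
  else if (Buf[I] and $F8) = $F0 then
    Need := 4
  else
    Need := 1; // stray continuation or invalid lead byte: let the decoder deal with it

  // If the last sequence is still missing bytes, hold it back.
  if I + Need > Len then
    Result := I;
end;

var
  Data: TBytes;
begin
  // 'A' followed by the first two bytes of the three-byte sequence E2 82 AC
  SetLength(Data, 3);
  Data[0] := $41;
  Data[1] := $E2;
  Data[2] := $82;
  WriteLn(CompleteUtf8Length(Data)); // prints 1
end.
```

The returned count is what the answer suggests passing to InputBuffer.Extract() or TIdIOHandler.ReadString(); the held-back bytes stay in the buffer until the rest of the sequence arrives.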
While your code is correct, the problem is most likely that the InputBuffer contains null characters (#0), which would end the string.
Try Remy's solution, and check what you get in the RawByteString.
Edit
I didn't notice that the OP was reading from a Telnet server.
The OP should use TIdTelnet instead of TIdTCPClient.
Edit2
I just read an older post by the OP which explains why he is not using TIdTelnet.
/Daddy
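Since the OP is staying with TIdTCPClient, the Telnet command bytes have to be filtered out by hand before any text decoding, as the first answer points out. The sketch below shows roughly what such a filter could look like. It is not taken from TIdTelnet: the function name StripTelnetCommands and the TN_ constants are made up for illustration, the buffer is a plain TBytes, and the code assumes each buffer holds whole commands (a command cut off at the end of the buffer is simply dropped, which a real implementation would need to carry over to the next read). It also does not answer any option negotiation.

```pascal
program TelnetStripDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

const
  TN_IAC  = 255; // "interpret as command" marker
  TN_SB   = 250; // start of subnegotiation
  TN_SE   = 240; // end of subnegotiation
  TN_WILL = 251; // WILL, WONT, DO, DONT are 251..254 and take one option byte

// Hypothetical helper: returns Buf with Telnet command sequences removed.
function StripTelnetCommands(const Buf: TBytes): TBytes;
var
  I, N, Len: Integer;
begin
  Len := Length(Buf);
  SetLength(Result, Len);
  N := 0;
  I := 0;
  while I < Len do
  begin
    if Buf[I] <> TN_IAC then
    begin
      Result[N] := Buf[I]; // ordinary data byte
      Inc(N);
      Inc(I);
    end
    else if I + 1 >= Len then
      Break // lone IAC at the end of the buffer: dropped in this sketch
    else if Buf[I + 1] = TN_IAC then
    begin
      Result[N] := TN_IAC; // IAC IAC is an escaped data byte 255
      Inc(N);
      Inc(I, 2);
    end
    else if Buf[I + 1] = TN_SB then
    begin
      // Skip everything up to and including the closing IAC SE.
      Inc(I, 2);
      while (I + 1 < Len) and
            not ((Buf[I] = TN_IAC) and (Buf[I + 1] = TN_SE)) do
        if (Buf[I] = TN_IAC) and (Buf[I + 1] = TN_IAC) then
          Inc(I, 2) // escaped 255 inside the subnegotiation
        else
          Inc(I);
      Inc(I, 2);
    end
    else if Buf[I + 1] >= TN_WILL then
      Inc(I, 3) // IAC WILL/WONT/DO/DONT <option>
    else
      Inc(I, 2); // any other two-byte command
  end;
  SetLength(Result, N);
end;

var
  Raw, Clean: TBytes;
  I: Integer;
begin
  // IAC DO ECHO (255 253 1) followed by the text "Hi"
  SetLength(Raw, 5);
  Raw[0] := 255; Raw[1] := 253; Raw[2] := 1;
  Raw[3] := Ord('H'); Raw[4] := Ord('i');
  Clean := StripTelnetCommands(Raw);
  for I := 0 to High(Clean) do
    Write(Chr(Clean[I]));
  WriteLn; // prints Hi
end.
```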
Telnet servers send a null character (#0) after each carriage return. This is most likely what you are seeing.
A null character encoded as UTF-8 is still a single byte with the value 0. Check whether that is what you are receiving.
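One way to run that check is to look at the raw bytes before any string decoding happens. The sketch below is an illustration only: BytesToHex and CountNulAfterCR are hypothetical helper names, and the received data is modelled as a plain TBytes that you would fill from whatever raw-byte read you use.

```pascal
program NulCheckDemo;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

uses
  SysUtils;

// Hypothetical helper: formats the bytes as space-separated hex pairs
// so that a 00 byte is easy to spot in a log or debugger.
function BytesToHex(const Buf: TBytes): string;
var
  I: Integer;
begin
  Result := '';
  for I := 0 to High(Buf) do
  begin
    if I > 0 then
      Result := Result + ' ';
    Result := Result + IntToHex(Integer(Buf[I]), 2);
  end;
end;

// Hypothetical helper: counts null bytes that directly follow a carriage return.
function CountNulAfterCR(const Buf: TBytes): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to High(Buf) do
    if (Buf[I] = 0) and (Buf[I - 1] = 13) then
      Inc(Result);
end;

var
  Data: TBytes;
begin
  // "OK" followed by a carriage return and a null byte
  SetLength(Data, 4);
  Data[0] := Ord('O');
  Data[1] := Ord('K');
  Data[2] := 13;
  Data[3] := 0;
  WriteLn(BytesToHex(Data));      // prints 4F 4B 0D 00
  WriteLn(CountNulAfterCR(Data)); // prints 1
end.
```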