Replace string that contains #0?
I use this function to read a file into a string:
function LoadFile(const FileName: TFileName): string;
begin
  with TFileStream.Create(FileName,
      fmOpenRead or fmShareDenyWrite) do begin
    try
      SetLength(Result, Size);
      Read(Pointer(Result)^, Size);
    except
      Result := '';
      Free;
      raise;
    end;
    Free;
  end;
end;
Here's the text of the file:
version
Here's the return value of LoadFile:
'ÿþv'#0'e'#0'r'#0's'#0'i'#0'o'#0'n'#0
I want to create a new file that contains "verabc". The problem is that I can't get the replacement of "sion" with "abc" to work. I am using D2007. If I remove all the #0 characters, the result turns into Chinese characters.
Comments (2)
What you think is the text of the file isn't really the text of the file. What you've read into your string variable is accurate. You have a Unicode text file encoded as little-endian UTF-16. The first two bytes are the byte-order mark, and each pair of bytes after that is another character of the string.
If you're reading a Unicode file, you should use a Unicode data type, such as WideString. You'll want to divide the file size by two when setting the length of the string, and you'll want to discard the first two bytes.
If you don't know what kind of file you're reading, then you need to read the first two or three bytes first. If the first two bytes are $ff $fe, as above, then you might have a little-endian UTF-16 file; read the rest of the file into a WideString, or a UnicodeString if you have that type. If they're $fe $ff, then it might be big-endian; read the remainder of the file into a WideString and then swap the order of each pair of bytes. If the first two bytes are $ef $bb, then check the third byte. If it's $bf, then they are probably the UTF-8 byte-order mark. Discard all three and read the rest of the file into an AnsiString or an array of bytes, and then use a function like UTF8Decode to convert it into a WideString.
Once you have your data in a WideString, the debugger will show that it contains version, and you should have no trouble using a Unicode-enabled version of StringReplace to do your replacement.
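The byte-order-mark checks described in the first answer could be sketched roughly as follows. This is only an illustration for D2007, not a definitive implementation: the names TBomKind, DetectBom, LoadTextFile and SaveUtf16LEFile, and the file names input.txt and output.txt, are made up for this example, and it assumes the WideStrUtils unit shipped with D2007 provides WideStringReplace as the Unicode-enabled counterpart of StringReplace.

```pascal
program ReplaceInUnicodeFile;

{$APPTYPE CONSOLE}

uses
  SysUtils, Classes, WideStrUtils;

type
  // Hypothetical type, used only in this sketch.
  TBomKind = (bkNone, bkUtf8, bkUtf16LE, bkUtf16BE);

// Looks at the first two or three bytes and reports which BOM, if any, is there.
function DetectBom(Stream: TStream; out BomSize: Integer): TBomKind;
var
  Bom: array[0..2] of Byte;
  Count: Integer;
begin
  Stream.Position := 0;
  Count := Stream.Read(Bom, SizeOf(Bom));
  Result := bkNone;
  BomSize := 0;
  if (Count >= 2) and (Bom[0] = $FF) and (Bom[1] = $FE) then
  begin
    Result := bkUtf16LE;
    BomSize := 2;
  end
  else if (Count >= 2) and (Bom[0] = $FE) and (Bom[1] = $FF) then
  begin
    Result := bkUtf16BE;
    BomSize := 2;
  end
  else if (Count = 3) and (Bom[0] = $EF) and (Bom[1] = $BB) and (Bom[2] = $BF) then
  begin
    Result := bkUtf8;
    BomSize := 3;
  end;
end;

// Reads the file after its BOM and returns the text as a WideString.
function LoadTextFile(const FileName: TFileName): WideString;
var
  Stream: TFileStream;
  Kind: TBomKind;
  BomSize, ByteCount, I: Integer;
  Bytes: AnsiString;
begin
  Result := '';
  Stream := TFileStream.Create(FileName, fmOpenRead or fmShareDenyWrite);
  try
    Kind := DetectBom(Stream, BomSize);
    Stream.Position := BomSize;              // discard the BOM
    ByteCount := Stream.Size - BomSize;
    if ByteCount <= 0 then
      Exit;
    if Kind in [bkUtf16LE, bkUtf16BE] then
    begin
      SetLength(Result, ByteCount div 2);    // two bytes per character
      Stream.ReadBuffer(Pointer(Result)^, Length(Result) * 2);
      if Kind = bkUtf16BE then               // swap each pair of bytes
        for I := 1 to Length(Result) do
          Result[I] := WideChar(Swap(Word(Result[I])));
    end
    else
    begin
      SetLength(Bytes, ByteCount);
      Stream.ReadBuffer(Pointer(Bytes)^, ByteCount);
      if Kind = bkUtf8 then
        Result := UTF8Decode(Bytes)
      else
        Result := Bytes;                     // no BOM: treated as ANSI here
    end;
  finally
    Stream.Free;
  end;
end;

// Writes the text back as little-endian UTF-16 with a BOM.
procedure SaveUtf16LEFile(const FileName: TFileName; const Content: WideString);
const
  Bom: Word = $FEFF;                         // stored as bytes $FF $FE on x86
var
  Stream: TFileStream;
begin
  Stream := TFileStream.Create(FileName, fmCreate);
  try
    Stream.WriteBuffer(Bom, SizeOf(Bom));
    Stream.WriteBuffer(Pointer(Content)^, Length(Content) * SizeOf(WideChar));
  finally
    Stream.Free;
  end;
end;

var
  Content: WideString;
begin
  Content := LoadTextFile('input.txt');
  Content := WideStringReplace(Content, 'sion', 'abc', [rfReplaceAll]);
  SaveUtf16LEFile('output.txt', Content);
end.
```

Treating a file without a BOM as ANSI is a simplification chosen for this sketch; the answer itself only says what each BOM probably indicates.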
It seems that you loaded a Unicode-encoded text file. The 0 bytes indicate Latin characters: in UTF-16, each Latin character is stored with a zero high byte.
If you don't want to deal with Unicode text, choose ANSI encoding in your editor when you save the file.
If you need Unicode encoding, use WideCharToString to convert it to an ANSI string, or just remove the 0s yourself, though the latter isn't the best solution. Also remove the two leading characters, ÿþ. The editor puts those bytes there to mark the file as Unicode.
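The WideCharToString route from the second answer might look something like the sketch below. It assumes the raw bytes are exactly what LoadFile returned in the question (a $FF $FE byte-order mark followed by little-endian UTF-16), and the helper name Utf16BytesToAnsi is invented for this example. Note that converting to ANSI loses any character that the active ANSI code page cannot represent.

```pascal
program Utf16ToAnsiSketch;

{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical helper: Raw holds the file bytes as LoadFile returns them,
// i.e. the $FF $FE byte-order mark followed by UTF-16LE code units.
function Utf16BytesToAnsi(const Raw: AnsiString): AnsiString;
var
  Wide: WideString;
  CharCount: Integer;
begin
  Result := '';
  if (Length(Raw) < 2) or (Raw[1] <> #$FF) or (Raw[2] <> #$FE) then
    raise Exception.Create('Expected a little-endian UTF-16 byte-order mark');
  // Skip the two BOM bytes; every character takes two bytes.
  CharCount := (Length(Raw) - 2) div SizeOf(WideChar);
  if CharCount = 0 then
    Exit;
  SetLength(Wide, CharCount);
  Move(Raw[3], Wide[1], CharCount * SizeOf(WideChar));
  // Converts to the ANSI code page; stops at the first null character.
  Result := WideCharToString(PWideChar(Wide));
end;

var
  Raw, Ansi: AnsiString;
begin
  // The same bytes the question shows as the return value of LoadFile.
  Raw := #$FF#$FE'v'#0'e'#0'r'#0's'#0'i'#0'o'#0'n'#0;
  Ansi := Utf16BytesToAnsi(Raw);
  Ansi := StringReplace(Ansi, 'sion', 'abc', [rfReplaceAll]);
  Writeln(Ansi);
end.
```

A file written from the resulting AnsiString is an ANSI file, not a Unicode one, so this only fits if the new file does not need to stay UTF-16.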