Does StringOf make a copy of the data passed to it?

Posted 2024-12-22 17:48:27

I am reading in a file and trying to detect whether it is binary by checking the first n bytes for a NUL byte. If that check does not flag it as binary, I manipulate the content as a string. I tried looping over a string and checking the first n indices for a NUL, but that gave false positives that checking a TBytes does not.

I use TFile.ReadAllBytes, which returns a TBytes, and perform the NUL check on that. If no NUL is found, I use StringOf on the TBytes to get a string. Does StringOf have to copy the data to build the string (these are large files, so I want to avoid that)? If so, what is a better way to do what I am trying to do?
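
For reference, the byte-level check described above could look roughly like the sketch below. The function name LooksBinary and the probe length of 8000 bytes are made up for illustration and are not from the original post; only TFile.ReadAllBytes and TBytes come from the question.

```pascal
program NulCheckSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils, System.Math;

// Hypothetical helper: True if a NUL byte occurs within the first
// ProbeLen bytes of Data (or within all of Data if it is shorter).
function LooksBinary(const Data: TBytes; ProbeLen: Integer): Boolean;
var
  I, Limit: Integer;
begin
  Limit := Min(ProbeLen, Length(Data));
  for I := 0 to Limit - 1 do
    if Data[I] = 0 then
      Exit(True);
  Result := False;
end;

var
  Data: TBytes;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: NulCheckSketch <file>');
    Exit;
  end;
  Data := TFile.ReadAllBytes(ParamStr(1));
  // 8000 is an arbitrary probe length chosen for this sketch.
  if LooksBinary(Data, 8000) then
    Writeln('binary')
  else
    Writeln('no NUL in the probed range');
end.
```

Working on the raw bytes keeps the check independent of any text decoding, which is presumably why it behaves differently from the same loop over a converted string.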

Comments (3)

七禾 2024-12-29 17:48:27

Does StringOf make a copy of the data passed to it?

Yes, according to the docs: 'Converts a byte array into a Unicode string using the default system locale.'

If you just want to access the TBytes as a string, why not cast it to a PChar (if the data is UTF-16) or to a PAnsiChar (if it is ANSI)?

Example code:

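// uses System.SysUtils, System.IOUtils
var
  MyBuffer: TBytes;
  BufferLength: Integer;
  BufferAsString: PChar;
  BufferAsAnsiString: PAnsiChar;
begin
  MyBuffer := TFile.ReadAllBytes(Filename);
  // Length gives the byte count; SizeOf would only give the size of the array reference.
  BufferLength := Length(MyBuffer);
  // @MyBuffer[0] assumes the file is not empty.
  // PChar is PWideChar in Unicode Delphi, so this cast only makes sense for UTF-16 data.
  BufferAsString := PChar(@MyBuffer[0]);
  BufferAsAnsiString := PAnsiChar(@MyBuffer[0]);
  // If there's no #0 at the end, make sure not to read past the end of the buffer!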

EDIT
I'm a bit puzzled as to why you're not just using TFile.OpenRead to get a TFileStream.
Let's assume you've got gigabytes of data and you're in a hurry.
The file stream lets you read just a small chunk of the data, which speeds things up.

This example code reads the whole file, but can easily be modified to only get a small part:

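// uses System.SysUtils, System.Classes, System.IOUtils
var
  MyData: TFileStream;
  MyString: AnsiString;  {1 char = 1 byte; for UTF-16 data use string and FileSize div SizeOf(Char)}
  FileSize: Integer;
begin
  MyData := TFile.OpenRead(Filename);
  try
    FileSize := MyData.Size;  // Size is an Int64; files of 2 GB or more do not fit in an Integer
    SetLength(MyString, FileSize);  // Preallocate the string; it already carries its own #0 terminator
    if FileSize > 0 then
      MyData.Read(MyString[1], FileSize);  // Strings are 1-based, so the data starts at index 1
  finally
    MyData.Free;
  end;
  // Do stuff with your newly read string.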

Note that the last example still reads all the data from disk first (which may or may not be what you want).
However, you can also read the data in chunks.
All of this is simpler with AnsiStrings because 1 char = 1 byte there :-).
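
As a rough sketch of the chunked approach, the NUL check from the question can be done on just the first block of the file, so a file that turns out to be binary is never read in full. FileLooksBinary and the probe length are hypothetical names and values chosen for illustration; TFile.OpenRead and TStream.Read are the RTL calls already used above.

```pascal
program ProbeSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.IOUtils;

// Hypothetical helper: reads at most ProbeLen bytes from the start of
// the file and reports whether any of them is a NUL byte.
function FileLooksBinary(const FileName: string; ProbeLen: Integer): Boolean;
var
  Stream: TFileStream;
  Buf: TBytes;
  BytesRead, I: Integer;
begin
  Result := False;
  if ProbeLen <= 0 then
    Exit;
  SetLength(Buf, ProbeLen);
  Stream := TFile.OpenRead(FileName);
  try
    // Read returns the number of bytes actually read, which is smaller
    // than ProbeLen when the file is shorter than the probe.
    BytesRead := Stream.Read(Buf[0], ProbeLen);
    for I := 0 to BytesRead - 1 do
      if Buf[I] = 0 then
        Exit(True);
  finally
    Stream.Free;
  end;
end;

begin
  if ParamCount < 1 then
  begin
    Writeln('usage: ProbeSketch <file>');
    Exit;
  end;
  // 8000 is an arbitrary probe length chosen for this sketch.
  if FileLooksBinary(ParamStr(1), 8000) then
    Writeln('binary')
  else
    Writeln('no NUL in the probed range');
end.
```

Only files that pass the probe then need to be loaded completely, for example with TFile.ReadAllBytes or the stream code above.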

无边思念无边月 2024-12-29 17:48:27

If you think that StringOf is just an in-place typecast, you are wrong.
StringOf treats its argument as an array of characters in the default system ANSI code page and converts it to UTF-16. So you will certainly find a lot of zero bytes in the resulting string (the upper bytes of the WideChars).
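
A small sketch that makes the conversion visible, assuming a Unicode Delphi version (2009 or later) where Char is two bytes wide. The two sample bytes are arbitrary ASCII values picked for illustration.

```pascal
program StringOfCopySketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils;

var
  Bytes: TBytes;
  S: string;
begin
  Bytes := TBytes.Create(72, 105);  // the ASCII bytes for 'Hi'
  S := StringOf(Bytes);             // decodes into a newly allocated string

  Writeln('source bytes: ', Length(Bytes));
  // Each UTF-16 code unit takes SizeOf(Char) bytes, so plain ASCII
  // input occupies twice as much memory after the conversion.
  Writeln('string bytes: ', Length(S) * SizeOf(Char));
  // The string lives at a different address than the byte array.
  Writeln('separate buffers: ', Pointer(S) <> Pointer(Bytes));
end.
```

For a large file this means the byte array and the decoded string exist side by side until the array is released.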

冰雪之触 2024-12-29 17:48:27
  1. Use TFile.ReadAllBytes
  2. Do your checking for NUL bytes (be aware that UTF-16 will contain lots of NULs)
  3. If it is text, use SetLength to grow the TBytes by 1 or 2 bytes (depending on the encoding)
  4. Append 1 or 2 NUL at the end (depending on the encoding again)
  5. Cast @Bytes[0] to PAnsiChar/PWideChar (depending on the encoding)

You could determine the encoding by looking at the BOM, although that of course depends on how your input files are encoded.

However, SetLength may make a copy of the data.
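
The steps above can be sketched as follows. AppendTerminator is a hypothetical helper written for this illustration; it relies on TEncoding.GetBufferEncoding for the BOM check and, as a simplifying assumption, treats everything that is not UTF-16 LE as single-byte text.

```pascal
program TerminateSketch;
{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.IOUtils;

// Hypothetical helper: appends a NUL terminator sized for the detected
// encoding and returns the byte offset of the text after any BOM.
function AppendTerminator(var Bytes: TBytes; out IsUtf16: Boolean): Integer;
var
  Enc: TEncoding;
  OldLen, TermLen, I: Integer;
begin
  Enc := nil;  // nil asks GetBufferEncoding to detect the encoding from the BOM
  Result := TEncoding.GetBufferEncoding(Bytes, Enc);
  IsUtf16 := Enc = TEncoding.Unicode;  // UTF-16 little endian
  if IsUtf16 then
    TermLen := 2
  else
    TermLen := 1;
  OldLen := Length(Bytes);
  SetLength(Bytes, OldLen + TermLen);  // may reallocate and copy the data
  for I := OldLen to OldLen + TermLen - 1 do
    Bytes[I] := 0;
end;

var
  Bytes: TBytes;
  IsUtf16: Boolean;
  Start: Integer;
  WideText: PWideChar;
  AnsiText: PAnsiChar;
begin
  if ParamCount < 1 then
  begin
    Writeln('usage: TerminateSketch <file>');
    Exit;
  end;
  Bytes := TFile.ReadAllBytes(ParamStr(1));
  Start := AppendTerminator(Bytes, IsUtf16);
  // The pointers alias the byte array, so Bytes must stay alive and
  // unchanged for as long as they are in use.
  if IsUtf16 then
    WideText := PWideChar(@Bytes[Start])
  else
    AnsiText := PAnsiChar(@Bytes[Start]);
  Writeln('text starts at byte offset ', Start);
end.
```

The casts themselves do not copy anything; the only potential copy is the one SetLength makes when it has to move the array to grow it.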
