Count of bytes vs. count of chars

Posted on 2024-10-14 10:15:24

Some APIs require a count of characters:

// Why did they choose cch in these functions?
HRESULT StringCchCopyW(
  __out  LPWSTR pszDest,
  __in   size_t cchDest,
  __in   LPCWSTR pszSrc
);

errno_t wcscpy_s(
   wchar_t *strDestination,
   size_t numberOfElements,
   const wchar_t *strSource 
);

DWORD WINAPI GetCurrentDirectoryW(
  __in   DWORD nBufferLength, // Count of Chars
  __out  LPWSTR lpBuffer
);  

And some APIs require a count of bytes:

// Which do you prefer, cch or cb functions?
// Are the cch functions useful in most cases?
HRESULT StringCbCopyW(
  __out  LPWSTR pszDest,
  __in   size_t cbDest,
  __in   LPCWSTR pszSrc
);

BOOL WINAPI ReadFile(
  __in         HANDLE hFile,
  __out        LPVOID lpBuffer,
  __in         DWORD nNumberOfBytesToRead,
  __out_opt    LPDWORD lpNumberOfBytesRead,
  __inout_opt  LPOVERLAPPED lpOverlapped
);

// Why did they choose cb in these structures?
// Because some APIs use cb, I always have to check MSDN.
typedef struct _LSA_UNICODE_STRING {
  USHORT Length; // Count of bytes.
  USHORT MaximumLength; // Count of bytes.
  PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;

typedef struct _FILE_RENAME_INFO {
  BOOL   ReplaceIfExists;
  HANDLE RootDirectory;
  DWORD  FileNameLength; // Count of bytes.
  WCHAR  FileName[1];
} FILE_RENAME_INFO, *PFILE_RENAME_INFO;

When you design a function or a data structure, how do you decide between cb and cch, and why?
What should I know about this in order to design a better API for callers?
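
Whichever unit an API picks, the two counts are related by the element size. A minimal sketch of that relationship, assuming 16-bit characters as on Windows: uint16_t stands in for WCHAR so the code is portable, and the struct counted_string16 and all function names are hypothetical, only modeled on the byte-counted layout of UNICODE_STRING.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical byte-counted string, modeled on UNICODE_STRING. */
typedef struct {
    uint16_t length;          /* bytes in use, excluding any terminator */
    uint16_t maximum_length;  /* bytes available in buffer */
    uint16_t *buffer;         /* 16-bit characters (stand-in for WCHAR) */
} counted_string16;

/* Convert a count of 16-bit characters (cch) to a count of bytes (cb). */
size_t cch_to_cb(size_t cch) {
    return cch * sizeof(uint16_t);
}

/* Convert cb to cch; cb must describe a whole number of characters. */
size_t cb_to_cch(size_t cb) {
    assert(cb % sizeof(uint16_t) == 0);
    return cb / sizeof(uint16_t);
}

/* Length of a zero-terminated 16-bit string, in characters. */
size_t str16_cch(const uint16_t *s) {
    size_t n = 0;
    while (s[n] != 0) {
        n++;
    }
    return n;
}

/* Describe buf (capacity_cch elements, holding a zero-terminated string)
 * with byte counts. Returns 0 on success, or -1 if the capacity in bytes
 * does not fit in the 16-bit fields. */
int counted_string16_init(counted_string16 *out, uint16_t *buf,
                          size_t capacity_cch) {
    size_t cb_used = cch_to_cb(str16_cch(buf));
    size_t cb_max = cch_to_cb(capacity_cch);
    if (cb_max > UINT16_MAX) {
        return -1;
    }
    out->length = (uint16_t)cb_used;
    out->maximum_length = (uint16_t)cb_max;
    out->buffer = buf;
    return 0;
}
```

With a buffer of 8 elements holding "abc", the sketch reports a length of 6 bytes and a maximum length of 16 bytes. Every caller of a cb interface has to do the multiplication by the element size, which is the step people tend to forget.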

Comments (2)

时常饿 2024-10-21 10:15:24

If the data returned is a string, you should return the count of chars, since the number of bytes is often useless. But if it's generic binary data (and not specifically a string), then obviously the number of chars doesn't make any sense, so use the number of bytes.

As to why:

I believe the reason LSA_UNICODE_STRING holds the number of bytes is that it's meant to be compatible with UNICODE_STRING, which in turn is used by NtCreateFile. But NtCreateFile accepts a FILE_OPEN_BY_FILE_ID flag, which makes it treat the UNICODE_STRING as pointing to a LONGLONG value rather than a string, so the number of bytes made more sense there, although I'd say it was a poor design overall:

FILE_OPEN_BY_FILE_ID: The file name that is specified by the ObjectAttributes parameter includes the 8-byte file reference number for the file.
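
The rule of thumb above (characters for strings, bytes for generic data) could look like this in an API of your own. This is only a sketch: copy_text16 and copy_blob are hypothetical names, and uint16_t is assumed as a portable stand-in for a 16-bit WCHAR.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical string API: the capacity is given in characters (cch).
 * Copies src including its terminator. Returns 0 on success, or -1 if
 * dest is too small (dest is left untouched in that case). */
int copy_text16(uint16_t *dest, size_t cch_dest, const uint16_t *src) {
    size_t n = 0;
    while (src[n] != 0) {
        n++;  /* characters in src, excluding the terminator */
    }
    if (n + 1 > cch_dest) {
        return -1;
    }
    memcpy(dest, src, (n + 1) * sizeof *src);
    return 0;
}

/* Hypothetical binary API: both sizes are given in bytes (cb), because
 * the data has no character structure. Returns 0 on success, or -1 if
 * dest is too small. */
int copy_blob(void *dest, size_t cb_dest, const void *src, size_t cb_src) {
    if (cb_src > cb_dest) {
        return -1;
    }
    memcpy(dest, src, cb_src);
    return 0;
}
```

A caller of copy_text16 can pass sizeof dst / sizeof dst[0] directly and can never describe half a character, whereas a byte count for 16-bit text may be odd and has to be validated. For a raw value such as an 8-byte file ID, sizeof is already the natural unit.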

风透绣罗衣 2024-10-21 10:15:24

If you are working with ASCII functions, there is no difference: the count of bytes is the count of characters, because a single ASCII character is exactly one byte in size. Note, however, that the functions in your first group are not ASCII functions: the W suffix and the wchar_t parameters mark them as wide-character functions (UTF-16 on Windows), where each element is two bytes.

The second group are Unicode functions/structs. In this case, characters are not guaranteed to be a single byte in size: in UTF-16 each code unit is two bytes wide (and characters outside the Basic Multilingual Plane take two code units), in UTF-32 every character is four bytes, and in UTF-8 a character is anywhere from one to four bytes wide.

Particularly with UTF-8 data, when you create a buffer you usually set aside a certain number of bytes, and depending on the character sizes that buffer can hold quite a range of character counts. I'm not overly familiar with most of the functions/structs you've presented, but it wouldn't surprise me if that has something to do with it.

To answer your question: if you're working with ASCII, you can use either approach, since it makes no difference. If you're working with a variable-length encoding such as UTF-8, however, which one you use depends on whether you are interested only in the characters involved or also need to take their encoding into account.
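
To make the UTF-8 point concrete, here is a minimal sketch that reports both counts for the same zero-terminated string. The function names are hypothetical, and the code point count assumes the input is already valid UTF-8 (no validation is done).

```c
#include <stddef.h>

/* Count of bytes (cb) before the terminating NUL. */
size_t utf8_cb(const char *s) {
    size_t n = 0;
    while (s[n] != '\0') {
        n++;
    }
    return n;
}

/* Count of code points, assuming s is valid UTF-8: every byte that is
 * not a continuation byte (bit pattern 10xxxxxx) starts a new one. */
size_t utf8_code_points(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80) {
            n++;
        }
    }
    return n;
}
```

For the string "héllo" encoded as UTF-8, utf8_cb returns 6 while utf8_code_points returns 5; for pure ASCII input the two results are equal. A buffer size therefore has to be expressed in bytes, even when the caller thinks in characters.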
