Count of bytes vs. count of chars
Some APIs require a count of characters:
// Why did they choose cch in these functions?
HRESULT StringCchCopyW(
    __out LPWSTR  pszDest,
    __in  size_t  cchDest,
    __in  LPCWSTR pszSrc
);
errno_t wcscpy_s(
    wchar_t       *strDestination,
    size_t         numberOfElements,
    const wchar_t *strSource
);
DWORD WINAPI GetCurrentDirectoryW(
    __in  DWORD  nBufferLength, // Count of Chars
    __out LPWSTR lpBuffer
);
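For what it's worth, here is a minimal sketch of how these cch parameters are typically filled in: pass the element count of the destination array (ARRAYSIZE is the element-count macro from the Windows headers), not its size in bytes.
#include <windows.h>
#include <strsafe.h>

int wmain(void)
{
    WCHAR dest[MAX_PATH];

    // cch = count of characters: pass the element count of the buffer,
    // not sizeof(dest) (which would be twice as large for WCHARs).
    HRESULT hr = StringCchCopyW(dest, ARRAYSIZE(dest), L"hello");

    // GetCurrentDirectoryW likewise expects a character count.
    DWORD cch = GetCurrentDirectoryW(ARRAYSIZE(dest), dest);

    return (SUCCEEDED(hr) && cch != 0) ? 0 : 1;
}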
And some APIs require a count of bytes:
// Which do you prefer: the cch functions or the cb functions?
// Are the cch functions more useful in practice?
HRESULT StringCbCopyW(
    __out LPWSTR  pszDest,
    __in  size_t  cbDest,
    __in  LPCWSTR pszSrc
);
BOOL WINAPI ReadFile(
    __in        HANDLE       hFile,
    __out       LPVOID       lpBuffer,
    __in        DWORD        nNumberOfBytesToRead,
    __out_opt   LPDWORD      lpNumberOfBytesRead,
    __inout_opt LPOVERLAPPED lpOverlapped
);
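And a matching sketch for the cb flavor: here sizeof is the natural value to pass, and ReadFile shows why raw I/O is counted in bytes. The file name test.bin below is only a placeholder.
#include <windows.h>
#include <strsafe.h>

int wmain(void)
{
    WCHAR dest[MAX_PATH];

    // cb = count of bytes: sizeof(dest) == ARRAYSIZE(dest) * sizeof(WCHAR).
    HRESULT hr = StringCbCopyW(dest, sizeof(dest), L"hello");

    // ReadFile moves raw bytes, so a byte count is the only sensible unit.
    BYTE raw[4096];
    DWORD cbRead = 0;
    HANDLE h = CreateFileW(L"test.bin", GENERIC_READ, FILE_SHARE_READ,
                           NULL, OPEN_EXISTING, 0, NULL);
    if (h != INVALID_HANDLE_VALUE) {
        ReadFile(h, raw, sizeof(raw), &cbRead, NULL);
        CloseHandle(h);
    }
    return SUCCEEDED(hr) ? 0 : 1;
}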
// Why did they choose cb in these structures?
// Because some APIs use cb, I always have to check MSDN.
typedef struct _LSA_UNICODE_STRING {
    USHORT Length;        // Count of bytes.
    USHORT MaximumLength; // Count of bytes.
    PWSTR  Buffer;
} UNICODE_STRING, *PUNICODE_STRING;
typedef struct _FILE_RENAME_INFO {
    BOOL   ReplaceIfExists;
    HANDLE RootDirectory;
    DWORD  FileNameLength; // Count of bytes.
    WCHAR  FileName[1];
} FILE_RENAME_INFO, *PFILE_RENAME_INFO;
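To see the byte-count convention in action, here is a hedged sketch of populating FILE_RENAME_INFO for SetFileInformationByHandle; RenameByHandle is a hypothetical helper name, and the point is the unit conversion (wcslen * sizeof(WCHAR)), not the error handling.
#include <windows.h>
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

// Hypothetical helper: rename an already-open file.
static BOOL RenameByHandle(HANDLE hFile, const WCHAR *newName)
{
    size_t cchName = wcslen(newName);
    size_t cbName  = cchName * sizeof(WCHAR);  // chars -> bytes

    // sizeof(FILE_RENAME_INFO) already includes FileName[1], which
    // leaves room for the terminating NUL.
    size_t cbInfo = sizeof(FILE_RENAME_INFO) + cbName;
    FILE_RENAME_INFO *info = (FILE_RENAME_INFO *)calloc(1, cbInfo);
    if (!info) return FALSE;

    info->ReplaceIfExists = TRUE;
    info->RootDirectory   = NULL;
    info->FileNameLength  = (DWORD)cbName;     // count of BYTES, per MSDN
    memcpy(info->FileName, newName, cbName + sizeof(WCHAR)); // incl. NUL

    BOOL ok = SetFileInformationByHandle(hFile, FileRenameInfo,
                                         info, (DWORD)cbInfo);
    free(info);
    return ok;
}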
When you design a function or a data structure, how do you decide between cb and cch, and why?
What should I know in order to design a better API for callers?
Answers (2)
If the data returned is a string, you should return the count of characters, since the number of bytes is often useless to the caller. But if it's generic binary data (and not specifically a string), then obviously a count of characters doesn't make any sense, so use the number of bytes.
As to why: I believe the reason LSA_UNICODE_STRING holds a number of bytes is that it's meant to be compatible with UNICODE_STRING, which in turn is used by NtCreateFile. But NtCreateFile accepts a FILE_OPEN_BY_FILE_ID option that actually treats the UNICODE_STRING as pointing to a LONGLONG value, and not a string... so a count of bytes made more sense there, although I'd say it was overall a poor design.
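To make that convention concrete, here is a minimal sketch of filling in an LSA_UNICODE_STRING by hand; InitLsaString is a hypothetical helper name (the native RtlInitUnicodeString does essentially the same thing).
#include <windows.h>
#include <wchar.h>
#include <ntsecapi.h>   // LSA_UNICODE_STRING

// Hypothetical helper: both length fields are stored in BYTES.
static void InitLsaString(LSA_UNICODE_STRING *s, PWSTR src)
{
    size_t cch = wcslen(src);
    s->Buffer        = src;
    s->Length        = (USHORT)(cch * sizeof(WCHAR));       // excludes NUL
    s->MaximumLength = (USHORT)((cch + 1) * sizeof(WCHAR)); // includes NUL
}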
Note that for plain ASCII data the two are the same - the count of bytes equals the count of characters, because a single ASCII character is exactly one byte in size.
The functions and structs you list, however, are Unicode. There a character is no longer guaranteed to be a single byte: in UTF16 each code unit is two bytes wide (and one character can span two code units), in UTF32 they are four bytes, and in UTF8 a character is typically anywhere from one to four bytes wide.
Particularly in the case of UTF8 data, when you create a buffer you usually set aside a certain number of bytes, which, depending on character sizes, can correspond to a wide range of character counts. I'm not overly familiar with most of the functions/structs you've presented, but it wouldn't surprise me if that has something to do with it.
To answer your question: if you're working with ASCII you can use either approach - it makes no difference. If you're working with a variable-length encoding such as UTF8, however, the choice depends on whether you're interested only in the characters involved, or whether you also need to take their encoding into account.
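A small sketch of that last point, assuming a UTF-16 source string: the wide string's length is naturally a count of WCHARs, while WideCharToMultiByte reports the required UTF-8 size in bytes, and the two diverge as soon as non-ASCII characters appear.
#include <windows.h>
#include <stdio.h>

int wmain(void)
{
    // One CJK character, U+4E2D ("中"), plus the terminating NUL.
    const WCHAR wide[] = L"\x4e2d";

    // Required UTF-8 buffer size in BYTES (includes the NUL).
    int cbUtf8 = WideCharToMultiByte(CP_UTF8, 0, wide, -1,
                                     NULL, 0, NULL, NULL);

    // ARRAYSIZE(wide) is a count of WCHARs; sizeof(wide) a count of bytes.
    wprintf(L"UTF-16: %u chars, %u bytes; UTF-8: %d bytes\n",
            (unsigned)ARRAYSIZE(wide), (unsigned)sizeof(wide), cbUtf8);
    return 0;
}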