There are C++ classes for ICU, but they require converting to UTF-16. You can use the C types and macros directly to get some UTF-8 support:
    #include <cstddef>
    #include <iostream>
    #include <memory>
    #include <stdexcept>
    #include <string>
    #include <unicode/utypes.h>
    #include <unicode/ubrk.h>
    #include <unicode/utext.h>

    //
    // C++ helpers so we can use RAII
    //
    // Note that ICU internally provides some C++ wrappers (such as BreakIterator), however these only seem to work
    // for UTF-16 strings, and require transforming UTF-8 to UTF-16 before use.
    // If you already have UTF-16 strings or can take the performance hit, you should probably use those instead of
    // the C functions. See: http://icu-project.org/apiref/icu4c/
    //
    struct UTextDeleter { void operator()(UText* ptr) { utext_close(ptr); } };
    struct UBreakIteratorDeleter { void operator()(UBreakIterator* ptr) { ubrk_close(ptr); } };
    using PUText = std::unique_ptr<UText, UTextDeleter>;
    using PUBreakIterator = std::unique_ptr<UBreakIterator, UBreakIteratorDeleter>;

    // Turns an ICU failure status into a C++ exception.
    void checkStatus(const UErrorCode status)
    {
        if(U_FAILURE(status))
        {
            throw std::runtime_error(u_errorName(status));
        }
    }

    // Counts grapheme clusters by stepping a character break iterator over the text.
    std::size_t countGraphemes(UText* text)
    {
        // source for most of this: http://userguide.icu-project.org/strings/utext
        UErrorCode status = U_ZERO_ERROR;
        PUBreakIterator it(ubrk_open(UBRK_CHARACTER, "en_us", nullptr, 0, &status));
        checkStatus(status);
        ubrk_setUText(it.get(), text, &status);
        checkStatus(status);
        std::size_t charCount = 0;
        while(ubrk_next(it.get()) != UBRK_DONE)
        {
            ++charCount;
        }
        return charCount;
    }

    // Counts code points by reading them one at a time until the sentinel.
    std::size_t countCodepoints(UText* text)
    {
        std::size_t codepointCount = 0;
        while(UTEXT_NEXT32(text) != U_SENTINEL)
        {
            ++codepointCount;
        }
        // reset the index so we can use the structure again
        UTEXT_SETNATIVEINDEX(text, 0);
        return codepointCount;
    }

    void printStringInfo(const std::string& utf8)
    {
        UErrorCode status = U_ZERO_ERROR;
        PUText text(utext_openUTF8(nullptr, utf8.data(), utf8.length(), &status));
        checkStatus(status);
        std::cout << "UTF-8 string (might look wrong if your console locale is different): " << utf8 << std::endl;
        std::cout << "Length (UTF-8 bytes): " << utf8.length() << std::endl;
        std::cout << "Length (UTF-8 codepoints): " << countCodepoints(text.get()) << std::endl;
        std::cout << "Length (graphemes): " << countGraphemes(text.get()) << std::endl;
        std::cout << std::endl;
    }

    int main()
    {
        // The u8 prefix yields plain char arrays before C++20. From C++20 on it yields
        // char8_t, so drop the prefix there and make sure the literals are still UTF-8.
        printStringInfo(u8"Hello, world!");
        printStringInfo(u8"หวัดดีชาวโลก");
        printStringInfo(u8"\xF0\x9F\x90\xBF");
        printStringInfo(u8"Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚");
        return 0;
    }
This prints:
UTF-8 string (might look wrong if your console locale is different): Hello, world!
Length (UTF-8 bytes): 13
Length (UTF-8 codepoints): 13
Length (graphemes): 13
UTF-8 string (might look wrong if your console locale is different): หวัดดีชาวโลก
Length (UTF-8 bytes): 36
Length (UTF-8 codepoints): 12
Length (graphemes): 10
UTF-8 string (might look wrong if your console locale is different):
For Unicode
Several answers here have addressed that .length() gives the wrong results with multibyte characters, but there are 11 answers and none of them have provided a solution.
The case of Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚
First of all, it's important to know what you mean by "length". For a motivating example, consider the string "Z͉̳̺ͥͬ̾a̴͕̲̒̒͌̋ͪl̨͎̰̘͉̟ͤ̀̈̚͜g͕͔̤͖̟̒͝ͅo̵̡̡̼͚̐ͯ̅ͪ̆ͣ̚" (note that some languages, notably Thai, actually use combining diacritical marks, so this isn't just useful for 15-year-old memes, but obviously that's the most important use case). Assume it is encoded in UTF-8. There are 3 ways we can talk about the length of this string:
95 bytes
50 codepoints
LATIN CAPITAL LETTER Z
COMBINING LEFT ANGLE BELOW
COMBINING DOUBLE LOW LINE
COMBINING INVERTED BRIDGE BELOW
COMBINING LATIN SMALL LETTER I
COMBINING LATIN SMALL LETTER R
COMBINING VERTICAL TILDE
LATIN SMALL LETTER A
COMBINING TILDE OVERLAY
COMBINING RIGHT ARROWHEAD BELOW
COMBINING LOW LINE
COMBINING TURNED COMMA ABOVE
COMBINING TURNED COMMA ABOVE
COMBINING ALMOST EQUAL TO ABOVE
COMBINING DOUBLE ACUTE ACCENT
COMBINING LATIN SMALL LETTER H
LATIN SMALL LETTER L
COMBINING OGONEK
COMBINING UPWARDS ARROW BELOW
COMBINING TILDE BELOW
COMBINING LEFT TACK BELOW
COMBINING LEFT ANGLE BELOW
COMBINING PLUS SIGN BELOW
COMBINING LATIN SMALL LETTER E
COMBINING GRAVE ACCENT
COMBINING DIAERESIS
COMBINING LEFT ANGLE ABOVE
COMBINING DOUBLE BREVE BELOW
LATIN SMALL LETTER G
COMBINING RIGHT ARROWHEAD BELOW
COMBINING LEFT ARROWHEAD BELOW
COMBINING DIAERESIS BELOW
COMBINING RIGHT ARROWHEAD AND UP ARROWHEAD BELOW
COMBINING PLUS SIGN BELOW
COMBINING TURNED COMMA ABOVE
COMBINING DOUBLE BREVE
COMBINING GREEK YPOGEGRAMMENI
LATIN SMALL LETTER O
COMBINING SHORT STROKE OVERLAY
COMBINING PALATALIZED HOOK BELOW
COMBINING PALATALIZED HOOK BELOW
COMBINING SEAGULL BELOW
COMBINING DOUBLE RING BELOW
COMBINING CANDRABINDU
COMBINING LATIN SMALL LETTER X
COMBINING OVERLINE
COMBINING LATIN SMALL LETTER H
COMBINING BREVE
COMBINING LATIN SMALL LETTER A
COMBINING LEFT ANGLE ABOVE
5 graphemes
Z with some s**t
a with some s**t
l with some s**t
g with some s**t
o with some s**t
When dealing with C++ strings (std::string), you're looking for length() or size(). Both give you the same value. When dealing with C-style strings, however, you would use strlen().
    #include <cstring>
    #include <iostream>
    #include <string>

    int main()
    {
        std::string str = "Hello!";
        const char *otherstr = "Hello!"; // C-style string
        std::cout << str.size() << std::endl;
        std::cout << str.length() << std::endl;
        std::cout << std::strlen(otherstr) << std::endl; // C way for string length
        std::cout << std::strlen(str.c_str()) << std::endl; // get a C string from the C++ string, then call strlen
        return 0;
    }
Output:
6
6
6
6
It depends on what string type you're talking about. There are many types of strings:
1. const char* - a C-style multibyte string
2. const wchar_t* - a C-style wide string
3. std::string - a "standard" multibyte string
4. std::wstring - a "standard" wide string
For 3 and 4, you can use the .size() or .length() methods.
For 1, you can use strlen(), but you must ensure that the string pointer is not NULL.
For 2, you can use wcslen(), but you must likewise ensure that the string pointer is not NULL.
There are other string types in non-standard C++ libraries, such as MFC's CString, ATL's CComBSTR, and ACE's ACE_CString, with methods such as .GetLength(). I can't remember the specifics of them all off the top of my head.
The STLSoft libraries have abstracted this all out with what they call string access shims, which can be used to get the string length (and other aspects) from any type. So for all of the above (including the non-standard library ones), you can use the same function, stlsoft::c_str_len(). This article describes how it all works, as it's not all entirely obvious or easy.
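As a rough illustration of the four standard cases listed above, the sketch below wraps each one in an overload. The name `safeLength` is hypothetical, and treating a null pointer as length 0 is an assumption made for the example, not something the standard library does for you:

```cpp
#include <cstddef>
#include <cstring>
#include <cwchar>
#include <string>

// 1. C-style multibyte string: check for null before calling strlen().
std::size_t safeLength(const char* s) {
    return s ? std::strlen(s) : 0;
}

// 2. C-style wide string: same check, but wcslen() counts wchar_t units.
std::size_t safeLength(const wchar_t* s) {
    return s ? std::wcslen(s) : 0;
}

// 3. and 4. The standard string classes carry their own length.
std::size_t safeLength(const std::string& s)  { return s.size(); }
std::size_t safeLength(const std::wstring& s) { return s.length(); }
```

A bare `nullptr` argument is ambiguous between the two pointer overloads, so a null pointer has to be passed with an explicit type, for example `static_cast<const char*>(nullptr)`.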
In C++, std::string's length() and size() methods give you the number of bytes, and not necessarily the number of characters!
The same goes for the C-style sizeof operator, which also counts bytes (and, for a char array, includes the terminating NUL).
For 7-bit ASCII characters the two values are the same, but for characters outside 7-bit ASCII in a multibyte encoding such as UTF-8 they are not.
See the following example to give you real results (64-bit Linux).
There is no simple C/C++ function that really counts the number of characters.
By the way, all of this is implementation-dependent and may differ in other environments (compiler, Win16/32, Linux, embedded, ...).
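To make the bytes-versus-characters gap concrete, here is a minimal sketch that counts Unicode code points, assuming the string holds valid UTF-8. The function name `countUtf8CodePoints` is made up for this example, and code points are still not the same thing as user-perceived characters (graphemes):

```cpp
#include <cstddef>
#include <string>

// Counts Unicode code points in a string assumed to hold valid UTF-8.
// Continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does NOT match that pattern starts a new code point.
std::size_t countUtf8CodePoints(const std::string& s) {
    std::size_t count = 0;
    for (unsigned char byte : s) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}
```

For the euro sign, written as the UTF-8 bytes `"\xE2\x82\xAC"`, length() reports 3 while `countUtf8CodePoints` reports 1.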
The simplest way to get the length of a string without having to write std:: everywhere is as follows.
String with or without spaces:
    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string str;
        getline(cin, str); // reads the whole line, spaces included
        cout << "Length of given string is " << str.length() << endl;
        return 0;
    }
String without spaces:
    #include <iostream>
    #include <string>
    using namespace std;

    int main() {
        string str;
        cin >> str; // stops reading at the first whitespace
        cout << "Length of given string is " << str.length() << endl;
        return 0;
    }
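The difference between the two input styles is easier to see when the same input is fed to both. The sketch below is one way to check it: the helper names `lineLength` and `wordLength` are invented for this example, and they take any `std::istream` so they can be tried on a string stream instead of the keyboard:

```cpp
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Reads a whole line (spaces included) and returns its length.
std::size_t lineLength(std::istream& in) {
    std::string str;
    std::getline(in, str);
    return str.length();
}

// Reads a single whitespace-delimited word and returns its length.
std::size_t wordLength(std::istream& in) {
    std::string str;
    in >> str;
    return str.length();
}
```

Given the input `Hello world`, `lineLength` returns 11 while `wordLength` returns 5, because `>>` stops at the space. In a real program you would pass `std::cin`.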
If you're using a std::string, call length().
If you're using a C string, call strlen().
Or, if you happen to like using Pascal-style strings (or f***** strings, as Joel Spolsky likes to call them when they have a trailing NULL), just dereference the first character.
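A minimal sketch of those three cases follows. The helper names (`cppLength`, `cLength`, `pascalLength`) and the one-byte length prefix layout are assumptions chosen for illustration, not part of any library:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// std::string knows its own length; length() and size() are interchangeable.
std::size_t cppLength(const std::string& s) {
    return s.length();
}

// C strings are NUL-terminated, so strlen() scans for the terminator.
std::size_t cLength(const char* s) {
    assert(s != nullptr);  // strlen on a null pointer is undefined behaviour
    return std::strlen(s);
}

// Pascal-style strings keep the length in the first byte, so reading it
// is a single dereference (hypothetical layout: [len][chars...]).
std::size_t pascalLength(const unsigned char* s) {
    assert(s != nullptr);
    return *s;
}
```

For example, with `const unsigned char p[] = {5, 'H', 'e', 'l', 'l', 'o', 0};`, `pascalLength(p)` returns 5, the same value `cLength("Hello")` finds by scanning.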
For the Unicode example above, the three lengths are 95 bytes, 50 codepoints, and 5 graphemes. ICU's C++ classes require converting to UTF-16, but its C types and macros can be used directly to get some UTF-8 support.
Boost.Locale wraps ICU, and might provide a nicer interface. However, it still requires conversion to/from UTF-16.
If you're using old C-style strings instead of the newer STL-style strings, there's the strlen function in the C run-time library.
If you're using std::string, there are two common methods for that: size() and length().
If you're using a C-style string (char * or const char *), you can use strlen().
.length and .size are synonymous; I just think that "length" is a slightly clearer word.
For an actual string object, use its length() or size() member function.
It might be the easiest way to input a string and find its length.