How to convert an instance of std::string to lowercase

Posted on 2024-07-09 18:54:45 | 162 words | 7 views | 0 comments

I want to convert a std::string to lowercase. I am aware of the function tolower(). However, in the past I have had issues with this function and it is hardly ideal anyway as using it with a std::string would require iterating over each character.

Is there an alternative which works 100% of the time?

Comments (30)

£冰雨忧蓝° 2024-07-16 18:54:45

Adapted from Not So Frequently Asked Questions:

#include <algorithm>
#include <cctype>
#include <string>

std::string data = "Abc";
std::transform(data.begin(), data.end(), data.begin(),
    [](unsigned char c){ return std::tolower(c); });

You're really not going to get away without iterating through each character. There's no way to know whether the character is lowercase or uppercase otherwise.

If you really hate tolower(), here's a specialized ASCII-only alternative that I don't recommend you use:

char asciitolower(char in) {
    if (in <= 'Z' && in >= 'A')
        return in - ('Z' - 'z');
    return in;
}

std::transform(data.begin(), data.end(), data.begin(), asciitolower);

Be aware that tolower() can only do a per-single-byte-character substitution, which is ill-fitting for many scripts, especially if using a multi-byte-encoding like UTF-8.
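
To make the UTF-8 caveat concrete, here is a minimal sketch of an ASCII-only conversion that is at least safe on UTF-8 input. The helper name ascii_to_lower is hypothetical. It relies on the fact that every byte of a multi-byte UTF-8 sequence is 0x80 or above, so leaving those bytes alone can never corrupt a character. It does not lowercase non-ASCII letters.

```cpp
#include <string>

// Lowercase only the ASCII letters 'A'..'Z'; every other byte is copied
// unchanged, so multi-byte UTF-8 sequences are never split or altered.
// (ascii_to_lower is a hypothetical helper name, not a standard function.)
std::string ascii_to_lower(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') {
            c = static_cast<char>(c - 'A' + 'a');
        }
    }
    return s;
}
```

Calling ascii_to_lower on a string that contains "Ä" encoded as UTF-8 leaves those two bytes as they are and only converts the ASCII letters around them.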

审判长 2024-07-16 18:54:45

Boost provides a string algorithm for this:

#include <boost/algorithm/string.hpp>

std::string str = "HELLO, WORLD!";
boost::algorithm::to_lower(str); // modifies str

Or, for non-in-place:

#include <boost/algorithm/string.hpp>

const std::string str = "HELLO, WORLD!";
const std::string lower_str = boost::algorithm::to_lower_copy(str);
就是爱搞怪 2024-07-16 18:54:45

tl;dr

Use the ICU library. If you don't, your conversion routine will break silently on cases you are probably not even aware of existing.


First you have to answer a question: What is the encoding of your std::string? Is it ISO-8859-1? Or perhaps ISO-8859-8? Or Windows Codepage 1252? Does whatever you're using to convert upper-to-lowercase know that? (Or does it fail miserably for characters over 0x7f?)

If you are using UTF-8 (the only sane choice among the 8-bit encodings) with std::string as container, you are already deceiving yourself if you believe you are still in control of things. You are storing a multibyte character sequence in a container that is not aware of the multibyte concept, and neither are most of the operations you can perform on it! Even something as simple as .substr() could result in invalid (sub-) strings because you split in the middle of a multibyte sequence.
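
As an illustration of the .substr() hazard, the following sketch truncates a string without cutting a multi-byte sequence in half. The name utf8_safe_prefix is hypothetical, and the code assumes the input is already valid UTF-8. It only inspects the bit pattern of continuation bytes (10xxxxxx) and knows nothing about grapheme clusters or combining marks.

```cpp
#include <cstddef>
#include <string>

// Return at most max_bytes bytes of s without ending inside a UTF-8 sequence.
// Assumes s holds valid UTF-8; utf8_safe_prefix is a hypothetical helper name.
std::string utf8_safe_prefix(const std::string& s, std::size_t max_bytes) {
    if (max_bytes >= s.size()) {
        return s;
    }
    std::size_t end = max_bytes;
    // Continuation bytes look like 10xxxxxx; step back until s[end] starts a character.
    while (end > 0 && (static_cast<unsigned char>(s[end]) & 0xC0) == 0x80) {
        --end;
    }
    return s.substr(0, end);
}
```

A plain s.substr(0, 2) on "aß" (three bytes in UTF-8) would keep the first byte of ß and drop the second; the sketch returns just "a" instead.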

As soon as you try something like std::toupper( 'ß' ), or std::tolower( 'Σ' ) in any encoding, you are in trouble. Because 1), the standard only ever operates on one character at a time, so it simply cannot turn ß into SS as would be correct. And 2), the standard only ever operates on one character at a time, so it cannot decide whether Σ is in the middle of a word (where σ would be correct), or at the end (ς). Another example would be std::tolower( 'I' ), which should yield different results depending on the locale -- virtually everywhere you would expect i, but in Turkey ı (LATIN SMALL LETTER DOTLESS I) is the correct answer (which, again, is more than one byte in UTF-8 encoding).

So, any case conversion that works on a character at a time, or worse, a byte at a time, is broken by design. This includes all the std:: variants in existence at this time.
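
The ß example can be sketched in a few lines to show why a character-to-character function cannot be enough: the result has more code points than the input. This is an illustration only. The name upper_with_sharp_s is hypothetical, and it handles just ASCII letters plus U+00DF, whereas a real implementation would use the full Unicode case-mapping data (as ICU does).

```cpp
#include <string>

// Uppercase a code-point string, handling ASCII a-z and the German sharp s.
// The output can be longer than the input, which a one-in, one-out function
// such as std::toupper cannot express.
// (upper_with_sharp_s is a hypothetical name used for illustration.)
std::u32string upper_with_sharp_s(const std::u32string& in) {
    std::u32string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp == U'\u00DF') {                       // sharp s expands to "SS"
            out += U"SS";
        } else if (cp >= U'a' && cp <= U'z') {
            out += static_cast<char32_t>(cp - U'a' + U'A');
        } else {
            out += cp;
        }
    }
    return out;
}
```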

Then there is the point that the standard library, for what it is capable of doing, depends on which locales are supported on the machine your software is running on... and what do you do if your target locale is among those not supported on your client's machine?

So what you are really looking for is a string class that is capable of dealing with all this correctly, and that is not any of the std::basic_string<> variants.

(C++11 note: std::u16string and std::u32string are better, but still not perfect. C++20 brought std::u8string, but all these do is specify the encoding. In many other respects they still remain ignorant of Unicode mechanics, like normalization, collation, ...)

While Boost looks nice, API wise, Boost.Locale is basically a wrapper around ICU. If Boost is compiled with ICU support... if it isn't, Boost.Locale is limited to the locale support compiled for the standard library.

And believe me, getting Boost to compile with ICU can be a real pain sometimes. (There are no pre-compiled binaries for Windows that include ICU, so you'd have to supply them together with your application, and that opens a whole new can of worms...)

So personally I would recommend getting full Unicode support straight from the horse's mouth and using the ICU library directly:

#include <unicode/unistr.h>
#include <unicode/ustream.h>
#include <unicode/locid.h>

#include <iostream>

int main()
{
    /*                          "Odysseus" */
    char const * someString = u8"ΟΔΥΣΣΕΥΣ";
    icu::UnicodeString someUString( someString, "UTF-8" );
    // Setting the locale explicitly here for completeness.
    // Usually you would use the user-specified system locale,
    // which *does* make a difference (see ı vs. i above).
    std::cout << someUString.toLower( "el_GR" ) << "\n";
    std::cout << someUString.toUpper( "el_GR" ) << "\n";
    return 0;
}

Compile (with G++ in this example):

g++ -Wall example.cpp -licuuc -licuio

This gives:

ὀδυσσεύς

Note the Σ<->σ conversion in the middle of the word, and the Σ<->ς conversion at the end of the word. No <algorithm>-based solution can give you that.
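
To show what "context-dependent" means for sigma without pulling in ICU, here is a deliberately simplified sketch that works on code points. The names greek_to_lower and is_basic_greek_letter are hypothetical. The rule used (a capital sigma preceded by a letter and not followed by one becomes ς) is only an approximation of Unicode's Final_Sigma condition, and only the basic Greek block is handled. It is not a substitute for ICU.

```cpp
#include <cstddef>
#include <string>

// True for letters in the basic Greek range U+0391..U+03C9 (a narrow test).
static bool is_basic_greek_letter(char32_t cp) {
    return cp >= 0x0391 && cp <= 0x03C9 && cp != 0x03A2;
}

// Lowercase basic Greek capitals, choosing between the two small sigmas by context.
// A simplified stand-in for the Final_Sigma rule; hypothetical helper.
std::u32string greek_to_lower(const std::u32string& in) {
    std::u32string out = in;
    for (std::size_t i = 0; i < in.size(); ++i) {
        const char32_t cp = in[i];
        if (cp == 0x03A3) {  // GREEK CAPITAL LETTER SIGMA
            const bool letter_before = i > 0 && is_basic_greek_letter(in[i - 1]);
            const bool letter_after =
                i + 1 < in.size() && is_basic_greek_letter(in[i + 1]);
            out[i] = (letter_before && !letter_after)
                         ? static_cast<char32_t>(0x03C2)   // final sigma
                         : static_cast<char32_t>(0x03C3);  // medial sigma
        } else if (cp >= 0x0391 && cp <= 0x03A9 && cp != 0x03A2) {
            // Basic capitals and their small forms are 0x20 apart.
            out[i] = static_cast<char32_t>(cp + 0x20);
        }
    }
    return out;
}
```

The point of the sketch is that the loop has to look at the neighbours of each character, which a per-character callback passed to std::transform cannot do.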

与往事干杯 2024-07-16 18:54:45

Using the range-based for loop of C++11, simpler code would be:

#include <iostream>       // std::cout
#include <string>         // std::string
#include <locale>         // std::locale, std::tolower

int main ()
{
  std::locale loc;
  std::string str="Test String.\n";

 for(auto elem : str)
    std::cout << std::tolower(elem,loc);
}
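
The program above only prints the lowercased characters and leaves str unchanged. If the converted string itself is needed, the same two-argument std::tolower from <locale> can be applied through a reference, as in this sketch. The helper name to_lower_copy_loc and the default of the classic "C" locale are assumptions made for the example.

```cpp
#include <locale>
#include <string>

// Return a lowercased copy using the two-argument std::tolower from <locale>.
// to_lower_copy_loc is a hypothetical helper; loc defaults to the classic "C" locale.
std::string to_lower_copy_loc(std::string s,
                              const std::locale& loc = std::locale::classic()) {
    for (char& c : s) {
        c = std::tolower(c, loc);  // template overload: takes and returns char
    }
    return s;
}
```
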
假扮的天使 2024-07-16 18:54:45

Another approach uses a range-based for loop with a reference variable:

string test = "Hello World";
for(auto& c : test)
{
   c = tolower(c);
}

cout<<test<<endl;
梅倚清风 2024-07-16 18:54:45

If the string contains UTF-8 characters outside of the ASCII range, then boost::algorithm::to_lower will not convert those. Better use boost::locale::to_lower when UTF-8 is involved. See http://www.boost.org/doc/libs/1_51_0/libs/locale/doc/html/conversions.html

有深☉意 2024-07-16 18:54:45

This is a follow-up to Stefan Mai's response: if you'd like to place the result of the conversion in another string, you need to pre-allocate its storage space prior to calling std::transform. Since STL stores transformed characters at the destination iterator (incrementing it at each iteration of the loop), the destination string will not be automatically resized, and you risk memory stomping.

#include <string>
#include <algorithm>
#include <cctype>     // ::tolower
#include <iostream>

int main (int argc, char* argv[])
{
  std::string sourceString = "Abc";
  std::string destinationString;

  // Allocate the destination space
  destinationString.resize(sourceString.size());

  // Convert the source string to lower case
  // storing the result in destination string
  std::transform(sourceString.begin(),
                 sourceString.end(),
                 destinationString.begin(),
                 ::tolower);

  // Output the result of the conversion
  std::cout << sourceString
            << " -> "
            << destinationString
            << std::endl;
}
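
An alternative to resizing the destination up front is to let std::back_inserter grow it, which removes the risk of writing past the end. A minimal sketch, with lower_copy as a hypothetical helper name:

```cpp
#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

// Build the lowercased copy by appending, so the destination grows as needed.
// (lower_copy is a hypothetical helper name.)
std::string lower_copy(const std::string& source) {
    std::string destination;
    destination.reserve(source.size());  // optional: avoids repeated reallocation
    std::transform(source.begin(), source.end(),
                   std::back_inserter(destination),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return destination;
}
```

The reserve call is only an optimisation. Unlike resize, it can be left out without affecting correctness.
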
岁月如刀 2024-07-16 18:54:45

The simplest way to convert a string to lowercase without bothering about the std namespace is as follows:

1:string with/without spaces

#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    getline(cin,str);
//------------function to convert string into lowercase---------------
    transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}

2:string without spaces

#include <algorithm>
#include <iostream>
#include <string>
using namespace std;
int main(){
    string str;
    cin>>str;
//------------function to convert string into lowercase---------------
    transform(str.begin(), str.end(), str.begin(), ::tolower);
//--------------------------------------------------------------------
    cout<<str;
    return 0;
}
蓝眼睛不忧郁 2024-07-16 18:54:45

I wrote this simple helper function:

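#include <cctype>  // tolower
#include <string>

using namespace std;

string to_lower(string s) {
    for(char &c : s)
        // cast first: tolower is undefined for negative char values
        c = static_cast<char>(tolower(static_cast<unsigned char>(c)));
    return s;
}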

Usage:

string s = "TEST";
cout << to_lower("HELLO WORLD"); // output: "hello world"
cout << to_lower(s); // won't change the original variable.
人海汹涌 2024-07-16 18:54:45

My own template functions that perform upper/lower-case conversion.

#include <string>
#include <algorithm>
#include <cctype>     // std::tolower, std::toupper

//
//  Lowercases string
//
template <typename T>
std::basic_string<T> lowercase(const std::basic_string<T>& s)
{
    std::basic_string<T> s2 = s;
    std::transform(s2.begin(), s2.end(), s2.begin(),
        [](const T v){ return static_cast<T>(std::tolower(v)); });
    return s2;
}

//
// Uppercases string
//
template <typename T>
std::basic_string<T> uppercase(const std::basic_string<T>& s)
{
    std::basic_string<T> s2 = s;
    std::transform(s2.begin(), s2.end(), s2.begin(),
        [](const T v){ return static_cast<T>(std::toupper(v)); });
    return s2;
}
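
One caveat with the templates above: std::tolower from <cctype> is defined for values representable as unsigned char, so instantiating them with wchar_t relies on a narrow-character function. A sketch of a variant that asks the locale's std::ctype<CharT> facet instead is shown below. The name lowercase_facet is hypothetical, and the standard library is only required to provide this facet for char and wchar_t.

```cpp
#include <locale>
#include <string>

// Lowercase via the std::ctype<CharT> facet of the given locale.
// Works for char and wchar_t; lowercase_facet is a hypothetical name.
template <typename CharT>
std::basic_string<CharT> lowercase_facet(std::basic_string<CharT> s,
                                         const std::locale& loc = std::locale()) {
    if (!s.empty()) {
        const auto& facet = std::use_facet<std::ctype<CharT>>(loc);
        facet.tolower(&s[0], &s[0] + s.size());  // converts the range in place
    }
    return s;
}
```
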
苦笑流年记忆 2024-07-16 18:54:45

std::ctype::tolower() from the standard C++ Localization library will correctly do this for you. Here is an example extracted from the tolower reference page

#include <locale>
#include <iostream>

int main () {
  std::locale::global(std::locale("en_US.utf8"));
  std::wcout.imbue(std::locale());
  std::wcout << "In US English UTF-8 locale:\n";
  auto& f = std::use_facet<std::ctype<wchar_t>>(std::locale());
  std::wstring str = L"HELLo, wORLD!";
  std::wcout << "Lowercase form of the string '" << str << "' is ";
  f.tolower(&str[0], &str[0] + str.size());
  std::wcout << "'" << str << "'\n";
}
ˇ宁静的妩媚 2024-07-16 18:54:45

An alternative to Boost is POCO (pocoproject.org).

POCO provides two variants:

  1. The first variant makes a copy without altering the original string.
  2. The second variant changes the original string in place.
    "In Place" versions always have "InPlace" in the name.

Both versions are demonstrated below:

#include "Poco/String.h"
using namespace Poco;

std::string hello("Stack Overflow!");

// Copies "STACK OVERFLOW!" into 'newString' without altering 'hello.'
std::string newString(toUpper(hello));

// Changes newString in-place to read "stack overflow!"
toLowerInPlace(newString);
最美的太阳 2024-07-16 18:54:45

Since none of the answers mentioned the upcoming Ranges library, which is available in the standard library since C++20, and currently separately available on GitHub as range-v3, I would like to add a way to perform this conversion using it.

To modify the string in-place:

str |= action::transform([](unsigned char c){ return std::tolower(c); });

To generate a new string:

auto new_string = original_string
    | view::transform([](unsigned char c){ return std::tolower(c); });

(Don't forget to #include <cctype> and the required Ranges headers.)

Note: the use of unsigned char as the argument to the lambda is inspired by cppreference, which states:

Like all other functions from <cctype>, the behavior of std::tolower is undefined if the argument's value is neither representable as unsigned char nor equal to EOF. To use these functions safely with plain chars (or signed chars), the argument should first be converted to unsigned char:

char my_tolower(char ch)
{
    return static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
}

Similarly, they should not be directly used with standard algorithms when the iterator's value type is char or signed char. Instead, convert the value to unsigned char first:

std::string str_tolower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), 
                // static_cast<int(*)(int)>(std::tolower)         // wrong
                // [](int c){ return std::tolower(c); }           // wrong
                // [](char c){ return std::tolower(c); }          // wrong
                   [](unsigned char c){ return std::tolower(c); } // correct
                  );
    return s;
}
墨小沫ゞ 2024-07-16 18:54:45

On Microsoft platforms you can use the strlwr family of functions: http://msdn.microsoft.com/en-us/library/hkxwh33z.aspx

// crt_strlwr.c
// compile with: /W3
// This program uses _strlwr and _strupr to create
// uppercase and lowercase copies of a mixed-case string.
#include <string.h>
#include <stdio.h>

int main( void )
{
   char string[100] = "The String to End All Strings!";
   char * copy1 = _strdup( string ); // make two copies
   char * copy2 = _strdup( string );

   _strlwr( copy1 ); // C4996
   _strupr( copy2 ); // C4996

   printf( "Mixed: %s\n", string );
   printf( "Lower: %s\n", copy1 );
   printf( "Upper: %s\n", copy2 );

   free( copy1 );
   free( copy2 );
}
深者入戏 2024-07-16 18:54:45

There is a way to convert upper case to lower WITHOUT doing if tests, and it's pretty straight-forward. The isupper() function/macro's use of clocale.h should take care of problems relating to your location, but if not, you can always tweak the UtoL[] to your heart's content.

Given that C's characters are really just 8-bit ints (ignoring the wide character sets for the moment) you can create a 256 byte array holding an alternative set of characters, and in the conversion function use the chars in your string as subscripts into the conversion array.

Instead of a 1-for-1 mapping though, give the upper-case array members the BYTE int values for the lower-case characters. You may find islower() and isupper() useful here.

The code looks like this...

#include <cctype>   // isupper
#include <cstring>  // strcpy
static char UtoL[256];
// ----------------------------------------------------------------------------
void InitUtoLMap()  {
    for (int i = 0; i < (int)sizeof(UtoL); i++)  {
        if (isupper(i)) {
            UtoL[i] = (char)(i + 32);   // assumes ASCII, where 'a' - 'A' == 32
        }   else    {
            UtoL[i] = (char)i;
        }
    }
}
// ----------------------------------------------------------------------------
char *LowerStr(char *szMyStr) {
    char *p = szMyStr;
    // do conversion in-place so as not to require a destination buffer
    while (*p) {        // szMyStr must be null-terminated
        // index as unsigned char: a plain char may be negative
        *p = UtoL[(unsigned char)*p];
        p++;
    }
    return szMyStr;
}
// ----------------------------------------------------------------------------
int main() {
    char *Lowered, Upper[128];
    InitUtoLMap();
    strcpy(Upper, "Every GOOD boy does FINE!");

    Lowered = LowerStr(Upper);  // Lowered points at Upper, which is now lowercase
    return 0;
}

This approach will, at the same time, allow you to remap any other characters you wish to change.

This approach has one huge advantage when running on modern processors: the per-character conversion contains no if test, so there is no data-dependent branch to predict. This saves the CPU's branch prediction logic for other loops, and tends to prevent pipeline stalls.

Some here may recognize this approach as the same one used to convert EBCDIC to ASCII.
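
For use with std::string, the same table-lookup idea might be sketched as follows. The names make_lower_table and lower_with_table are hypothetical, and this version remaps only ASCII 'A' to 'Z' rather than consulting isupper(), so its behaviour does not depend on the current locale.

```cpp
#include <array>
#include <string>

// Build the 256-entry table once; only ASCII 'A'..'Z' are remapped here.
// (make_lower_table and lower_with_table are hypothetical names.)
static std::array<unsigned char, 256> make_lower_table() {
    std::array<unsigned char, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[i] = static_cast<unsigned char>(i);
    }
    for (int i = 'A'; i <= 'Z'; ++i) {
        table[i] = static_cast<unsigned char>(i - 'A' + 'a');
    }
    return table;
}

std::string lower_with_table(std::string s) {
    static const std::array<unsigned char, 256> table = make_lower_table();
    for (char& c : s) {
        // Index with unsigned char so bytes >= 0x80 never become negative indices.
        c = static_cast<char>(table[static_cast<unsigned char>(c)]);
    }
    return s;
}
```

The function-local static means the table is built on first use, so no separate initialisation call is needed.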

旧夏天 2024-07-16 18:54:45

Is there an alternative which works 100% of the time?

No

There are several questions you need to ask yourself before choosing a lowercasing method.

  1. How is the string encoded? Plain ASCII? UTF-8? Some form of extended ASCII legacy encoding?
  2. What do you mean by lower case anyway? Case mapping rules vary between languages! Do you want something that is localised to the user's locale? Do you want something that behaves consistently on all systems your software runs on? Do you just want to lowercase ASCII characters and pass through everything else?
  3. What libraries are available?

Once you have answers to those questions you can start looking for a solution that fits your needs. There is no one-size-fits-all approach that works for everyone everywhere!

云醉月微眠 2024-07-16 18:54:45

C++ doesn't have tolower or toupper methods implemented for std::string, but they are available for char. One can easily read each char of the string, convert it to the required case, and put it back into the string.
Sample code without using any third-party library:

#include <cctype>
#include <iostream>
#include <string>

int main(){
    std::string str = std::string("How ARe You");
    for(char &ch : str){
        // cast first: std::tolower is undefined for negative char values
        ch = static_cast<char>(std::tolower(static_cast<unsigned char>(ch)));
    }
    std::cout<<str<<std::endl;
    return 0;
}

For character-based operations on a string, see: For every character in string

夜吻♂芭芘 2024-07-16 18:54:45

Here's a macro technique if you want something simple:

#define STRTOLOWER(x) std::transform (x.begin(), x.end(), x.begin(), ::tolower)
#define STRTOUPPER(x) std::transform (x.begin(), x.end(), x.begin(), ::toupper)
#define STRTOUCFIRST(x) std::transform (x.begin(), x.begin()+1, x.begin(),  ::toupper); std::transform (x.begin()+1, x.end(),   x.begin()+1,::tolower)

However, note that @AndreasSpindler's comment on this answer is still an important consideration if you're working on something that isn't just ASCII characters.
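
One practical caveat with the macros: STRTOUCFIRST expands to two statements, so in an unbraced if only the first is guarded, and x.begin()+1 is invalid for an empty string. A sketch of equivalent inline functions that avoid both issues is shown below. The function names are hypothetical, and these versions return a new string instead of modifying the argument in place.

```cpp
#include <algorithm>
#include <cctype>
#include <string>

// Function equivalents of the macros; they return the converted copy.
// (str_to_lower, str_to_upper and str_to_ucfirst are hypothetical names.)
inline std::string str_to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

inline std::string str_to_upper(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });
    return s;
}

// Uppercase the first character and lowercase the rest; safe for empty strings.
inline std::string str_to_ucfirst(std::string s) {
    s = str_to_lower(std::move(s));
    if (!s.empty()) {
        s[0] = static_cast<char>(std::toupper(static_cast<unsigned char>(s[0])));
    }
    return s;
}
```

The same ASCII-only limitation applies here as to the macros.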

平安喜乐 2024-07-16 18:54:45
// tolower example (C++)
#include <iostream>       // std::cout
#include <string>         // std::string
#include <locale>         // std::locale, std::tolower

int main ()
{
  std::locale loc;
  std::string str="Test String.\n";
  for (std::string::size_type i=0; i<str.length(); ++i)
    std::cout << std::tolower(str[i],loc);
  return 0;
}

For more information: http://www.cplusplus.com/reference/locale/tolower/

如何视而不见 2024-07-16 18:54:45

Try this function :)

string toLowerCase(string str) {
    int str_len = str.length();
    string final_str = "";

    for(int i=0; i<str_len; i++) {
        char character = str[i];

        // ASCII 'A'..'Z' is 65..90; the lowercase letters sit 32 positions higher
        if(character>=65 && character<=90) {
            final_str += (char)(character+32);
        } else {
            final_str += character;
        }
    }

    return final_str;
}
马蹄踏│碎落叶 2024-07-16 18:54:45

Have a look at the excellent C++17 cpp-unicodelib (GitHub). It's single-file and header-only.

#include <exception>
#include <iostream>
#include <codecvt>
#include <locale>
#include <string>

// cpp-unicodelib, downloaded from GitHub
#include "unicodelib.h"
#include "unicodelib_encodings.h"

using namespace std;
using namespace unicode;

int main()
{
    // converter that allows displaying a Unicode32 string
    wstring_convert<codecvt_utf8<char32_t>, char32_t> converter;

    std::u32string  in = U"Je suis là!";
    cout << converter.to_bytes(in) << endl;

    std::u32string  lc = to_lowercase(in);
    cout << converter.to_bytes(lc) << endl;
}

Output

Je suis là!
je suis là!
时光沙漏 2024-07-16 18:54:45

An explanation of how this solution works:


string test = "Hello World";
for(auto& c : test)
{
   c = tolower(c);
}

Explanation:

for (auto& c : test) is a range-based for loop of the form
for (range_declaration : range_expression) loop_statement:

  1. range_declaration: auto& c
    Here the auto specifier is used for automatic type deduction, so the type is deduced from the variable's initializer. The & makes c a reference, so assigning to c modifies the character stored in the string.

  2. range_expression: test
    The range in this case is the sequence of characters of the string test.

Inside the loop body, each character of test is available by reference through the identifier c.
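
To make the role of the reference concrete, here is a minimal sketch that contrasts auto& with plain auto. The function names lower_in_place and lower_copy_only are made up for this illustration and do not come from the answer.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: lowercases s in place.
// auto& binds c to each character of s, so the assignment changes s itself.
void lower_in_place(std::string& s) {
    for (auto& c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
}

// Hypothetical counter-example: with plain auto, c is a copy of each
// character, so the assignment is lost and s is left unchanged.
void lower_copy_only(std::string& s) {
    for (auto c : s) {
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
        (void)c; // silence "set but unused" warnings
    }
}
```

Calling lower_in_place on "Hello World" yields "hello world", while lower_copy_only leaves the string as it was.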

明月夜 2024-07-16 18:54:45

Use fplus::to_lower_case() from the fplus library.

Search for to_lower_case in the fplus API Search.

Example:

fplus::to_lower_case(std::string("ABC")) == std::string("abc");
陈甜 2024-07-16 18:54:45

Google's Abseil (absl) library provides absl::AsciiStrToLower and absl::AsciiStrToUpper. As the names indicate, they only convert ASCII characters.

罪#恶を代价 2024-07-16 18:54:45

Since you are using std::string, you are using C++. With C++11 or later, this doesn't need anything fancy. If words is a vector<string>, then:

    for (auto& str : words) {
        for (auto& ch : str)
            ch = static_cast<char>(tolower(static_cast<unsigned char>(ch)));
    }

This has no strange exceptions. You might want to use wchar_t instead, but otherwise this does it all in place.
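
For the wchar_t variant mentioned above, a rough sketch could look like the following. The helper name lower_all is an assumption made for this example; std::towlower comes from <cwctype>, and its behaviour for non-ASCII characters depends on the current C locale.

```cpp
#include <cwctype>
#include <string>
#include <vector>

// Hypothetical helper: lowercases every wide string in words, in place.
// std::towlower follows the current C locale, so results for characters
// outside A-Z depend on which locale is active.
void lower_all(std::vector<std::wstring>& words) {
    for (auto& str : words) {
        for (auto& ch : str) {
            ch = static_cast<wchar_t>(
                std::towlower(static_cast<std::wint_t>(ch)));
        }
    }
}
```

This keeps the same nested range-based loops and only swaps the character type and the conversion function.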

谁许谁一生繁华 2024-07-16 18:54:45

For a different perspective, a very common use case is locale-neutral case folding of Unicode strings. Here you can get good performance by recognizing that the set of foldable characters is finite and relatively small (< 2000 Unicode code points). That makes it a good fit for a generated perfect hash (guaranteed zero collisions), which can be used to convert every input character to its lowercase equivalent.

With UTF-8, you do have to be mindful of multi-byte characters and iterate accordingly. However, UTF-8 has fairly simple encoding rules that make this operation efficient.

For more details, including links to the relevant parts of the Unicode standard and a perfect hash generator, see my answer here, to the question How to achieve unicode-agnostic case insensitive comparison in C++.
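
As a rough illustration of the idea (not the code from the linked answer), the sketch below decodes UTF-8 one code point at a time, looks each code point up in a fold table, and re-encodes the result. Several things here are assumptions: the names kFold, fold_code_point, append_utf8 and fold_utf8 are invented, the four table entries are a tiny hand-picked sample, and std::unordered_map merely stands in for a generated perfect hash. It only handles one-to-one mappings and does not fully validate its input.

```cpp
#include <cstddef>
#include <string>
#include <unordered_map>

// Hypothetical sample table; a real one would be generated from the
// Unicode case folding data and served by a perfect hash.
static const std::unordered_map<char32_t, char32_t> kFold = {
    {U'\xC0', U'\xE0'},   // U+00C0 -> U+00E0
    {U'\xC9', U'\xE9'},   // U+00C9 -> U+00E9
    {U'\x3A9', U'\x3C9'}, // U+03A9 -> U+03C9
    {U'\x414', U'\x434'}, // U+0414 -> U+0434
};

// Folds one code point: ASCII A-Z directly, everything else via the table.
char32_t fold_code_point(char32_t cp) {
    if (cp >= U'A' && cp <= U'Z') return static_cast<char32_t>(cp + 32);
    auto it = kFold.find(cp);
    return it == kFold.end() ? cp : it->second;
}

// Appends cp to out, encoded as UTF-8.
void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Folds a UTF-8 string. Bytes that cannot start a sequence, and sequences
// cut off by the end of the string, are copied through unchanged.
// Continuation bytes are not validated in this sketch.
std::string fold_utf8(const std::string& in) {
    std::string out;
    out.reserve(in.size());
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        // Sequence length is determined by the high bits of the lead byte.
        const std::size_t len = lead < 0x80          ? 1
                                : (lead >> 5) == 0x06 ? 2
                                : (lead >> 4) == 0x0E ? 3
                                : (lead >> 3) == 0x1E ? 4
                                                      : 0;
        if (len == 0 || i + len > in.size()) {
            out += in[i++];
            continue;
        }
        // Payload bits of the lead byte, then 6 bits per continuation byte.
        char32_t cp = (len == 1) ? lead : (lead & (0xFF >> (len + 1)));
        for (std::size_t k = 1; k < len; ++k) {
            cp = (cp << 6) | (static_cast<unsigned char>(in[i + k]) & 0x3F);
        }
        append_utf8(out, fold_code_point(cp));
        i += len;
    }
    return out;
}
```

Because the lookup is per code point, swapping the unordered_map for a perfect hash would only change fold_code_point; the UTF-8 iteration stays the same.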

南渊 2024-07-16 18:54:45

Code Snippet

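#include <cctype>
#include <iostream>
#include <string>
using namespace std;


int main()
{
    ios::sync_with_stdio(false);

    string str = "String Convert\n";

    for (size_t i = 0; i < str.size(); i++)
    {
        // Cast to unsigned char: tolower is undefined for negative values
        str[i] = static_cast<char>(tolower(static_cast<unsigned char>(str[i])));
    }
    cout << str << endl;

    return 0;
}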
帅气称霸 2024-07-16 18:54:45

Here are two optional libraries for lowercasing ASCII strings. Both are production-grade and micro-optimized, so they are expected to be faster than the existing answers here (TODO: add benchmark results).

Facebook's Folly:

void toLowerAscii(char* str, size_t length)

Google's Abseil:

void AsciiStrToLower(std::string* s);
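
If neither library is available, the same two call shapes can be approximated with the standard library alone. The sketch below is not how Folly or Abseil implement their functions; it is a plain table-driven version, and the names ascii_lower_table and ascii_lower_inplace are invented for this example.

```cpp
#include <array>
#include <cstddef>
#include <string>

// Builds (once) a 256-entry table mapping every byte to its ASCII-lowercase
// form. Only 'A'..'Z' change; all other bytes, including those >= 0x80,
// map to themselves.
inline const std::array<unsigned char, 256>& ascii_lower_table() {
    static const std::array<unsigned char, 256> table = [] {
        std::array<unsigned char, 256> t{};
        for (std::size_t i = 0; i < t.size(); ++i) {
            t[i] = static_cast<unsigned char>(i);
        }
        for (unsigned char c = 'A'; c <= 'Z'; ++c) {
            t[c] = static_cast<unsigned char>(c + ('a' - 'A'));
        }
        return t;
    }();
    return table;
}

// Hypothetical helper with the (char*, length) shape shown above.
void ascii_lower_inplace(char* str, std::size_t length) {
    const auto& table = ascii_lower_table();
    for (std::size_t i = 0; i < length; ++i) {
        str[i] = static_cast<char>(table[static_cast<unsigned char>(str[i])]);
    }
}

// Hypothetical helper with the std::string* shape shown above.
void ascii_lower_inplace(std::string* s) {
    if (!s->empty()) ascii_lower_inplace(&(*s)[0], s->size());
}
```

Since bytes outside A-Z are untouched, UTF-8 multi-byte sequences pass through unchanged, which is the usual contract for ASCII-only lowercasing.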
○愚か者の日 2024-07-16 18:54:45

I wrote a templated version that works with any string:

#include <cctype>      // std::toupper & std::tolower
#include <type_traits> // std::decay


// Primary template: use T itself (a char array decays to char* below).
template <class T = void> struct farg_t { using type = T; };
// Specialization: for a class template instance such as std::string,
// use a pointer to its character type.
template <template <typename...> class T1, class T2>
struct farg_t<T1<T2>> { using type = T2*; };
//---------------

// Both functions walk the buffer until the first null character.
template <class T,
          class T2 = typename std::decay<typename farg_t<T>::type>::type>
void ToUpper(T& str) {
    T2 t = &str[0];
    for (; *t; ++t) *t = std::toupper(*t);
}


template <class T,
          class T2 = typename std::decay<typename farg_t<T>::type>::type>
void Tolower(T& str) {
    T2 t = &str[0];
    for (; *t; ++t) *t = std::tolower(*t);
}

Tested with gcc compiler:

#include <iostream>
#include <string>
#include "above_code.h" // header containing the templates above

int main()
{

    std::string str1 = "hEllo ";
    char str2 [] = "wOrld";

    ToUpper(str1);
    ToUpper(str2);
    std::cout << str1 << str2 << '\n'; 
    Tolower(str1);
    Tolower(str2);
    std::cout << str1 << str2 << '\n'; 
    return 0;
}

output:

>HELLO WORLD
>
>hello world
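
For comparison, here is a shorter sketch restricted to std::basic_string that uses the locale-aware std::tolower overload from <locale>. The name to_lower_in_place is hypothetical; unlike the templates above it does not accept char arrays, and it only works for character types that have a std::ctype facet (char and wchar_t in the standard library).

```cpp
#include <locale>
#include <string>

// Hypothetical alternative: lowercases any std::basic_string in place.
// Iterating over the string (rather than scanning for a null terminator)
// also handles strings that contain embedded '\0' characters.
template <class CharT, class Traits, class Alloc>
void to_lower_in_place(std::basic_string<CharT, Traits, Alloc>& str,
                       const std::locale& loc = std::locale::classic()) {
    for (auto& ch : str) {
        ch = std::tolower(ch, loc); // locale-aware overload from <locale>
    }
}
```

The locale parameter defaults to the classic "C" locale; passing another locale changes which characters are treated as uppercase.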
半世蒼涼 2024-07-16 18:54:45

Here is another simple version that converts between uppercase and lowercase. I compiled this source code with Visual Studio 2017 Community.

#include <iostream>
#include <string>
using namespace std;

int main()
{
    std::string _input = "lowercasetouppercase";
#if 0
    // My idea is to use the ascii value to convert
    char upperA = 'A';
    char lowerA = 'a';

    cout << (int)upperA << endl; // ASCII value of 'A' -> 65
    cout << (int)lowerA << endl; // ASCII value of 'a' -> 97
    // 97-65 = 32; // Difference of ASCII value of upper and lower a
#endif // 0

    cout << "Input String = " << _input.c_str() << endl;
    for (size_t i = 0; i < _input.length(); ++i)
    {
        if (_input[i] >= 'a' && _input[i] <= 'z')
            _input[i] -= 32; // To convert lower to upper
#if 0
        if (_input[i] >= 'A' && _input[i] <= 'Z')
            _input[i] += 32; // To convert upper to lower
#endif // 0
    }
    cout << "Output String = " << _input.c_str() << endl;

    return 0;
}

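Note: the offset of 32 is only valid for ASCII letters. Any other character (digits, spaces, punctuation, non-ASCII bytes) must be excluded with a range check before adding or subtracting 32.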
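
A compact way to package that range check is sketched below. The function names ascii_to_lower and ascii_to_upper are assumptions made for this example; both take the string by value and return the converted copy.

```cpp
#include <string>

// Hypothetical helpers: ASCII-only case conversion using the offset of 32,
// guarded by a range check so that other characters are left untouched.
std::string ascii_to_lower(std::string s) {
    for (char& c : s) {
        if (c >= 'A' && c <= 'Z') c = static_cast<char>(c + 32);
    }
    return s;
}

std::string ascii_to_upper(std::string s) {
    for (char& c : s) {
        if (c >= 'a' && c <= 'z') c = static_cast<char>(c - 32);
    }
    return s;
}
```

With the check in place, input such as "Hello, World 42!" keeps its punctuation, spaces and digits intact in both directions.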
