
Case-insensitive std::string.find()

Posted 2024-09-07 07:52:29 · 270 words · 13 views · 0 comments

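I'm using std::string's find() method to test whether one string is a substring of another. Now I need a case-insensitive version of the same thing. For string comparison I can always fall back on stricmp(), but there doesn't seem to be a stristr().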

I have found various answers, most of which suggest using Boost, which is not an option in my case. Additionally, I need to support std::wstring/wchar_t. Any ideas?


維他命╮ 2024-09-14 07:52:29

You could use std::search with a custom predicate.

#include <algorithm>
#include <iostream>
#include <locale>
#include <string>

// templated version of my_equal so it could work with both char and wchar_t
template<typename charT>
struct my_equal {
    explicit my_equal( const std::locale& loc ) : loc_(loc) {}
    bool operator()(charT ch1, charT ch2) const {
        return std::toupper(ch1, loc_) == std::toupper(ch2, loc_);
    }
private:
    std::locale loc_; // held by value so the predicate can never dangle
};

// find substring (case insensitive); returns -1 if not found
template<typename T>
int ci_find_substr( const T& str1, const T& str2, const std::locale& loc = std::locale() )
{
    typename T::const_iterator it = std::search( str1.begin(), str1.end(),
        str2.begin(), str2.end(), my_equal<typename T::value_type>(loc) );
    if ( it != str1.end() ) return static_cast<int>(it - str1.begin());
    else return -1; // not found
}

int main()
{
    // string test
    std::string str1 = "FIRST HELLO";
    std::string str2 = "hello";
    int f1 = ci_find_substr( str1, str2 ); // 6

    // wstring test: folding Cyrillic case needs a locale that knows about it
    // (e.g. a Russian or UTF-8 locale passed as the third argument);
    // the default "C" locale may not
    std::wstring wstr1 = L"ОПЯТЬ ПРИВЕТ";
    std::wstring wstr2 = L"привет";
    int f2 = ci_find_substr( wstr1, wstr2 );

    std::cout << f1 << ' ' << f2 << '\n';
    return 0;
}


傻比既视感 2024-09-14 07:52:29

The new C++11 style:

#include <algorithm>
#include <string>
#include <cctype>

/// Try to find in the Haystack the Needle - ignore case
bool findStringIC(const std::string & strHaystack, const std::string & strNeedle)
{
  auto it = std::search(
    strHaystack.begin(), strHaystack.end(),
    strNeedle.begin(),   strNeedle.end(),
    [](unsigned char ch1, unsigned char ch2) { return std::toupper(ch1) == std::toupper(ch2); }
  );
  return (it != strHaystack.end() );
}

An explanation of std::search can be found on cplusplus.com.
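
The question asks for something that behaves like find() and also supports std::wstring. Here is a minimal sketch along the same lines. findPosIC is a hypothetical name, not part of the answer above. It returns the match offset or npos, mirroring std::basic_string::find. The wide overload folds case with std::towupper from <cwctype>, which depends on the current C locale.

```cpp
#include <algorithm>
#include <cctype>
#include <cwctype>
#include <string>

// Case-insensitive find for narrow strings: offset of the first match,
// or std::string::npos, like std::string::find.
std::string::size_type findPosIC(const std::string& haystack, const std::string& needle)
{
    auto it = std::search(
        haystack.begin(), haystack.end(),
        needle.begin(), needle.end(),
        [](unsigned char a, unsigned char b) { return std::toupper(a) == std::toupper(b); });
    // std::search returns haystack.begin() for an empty needle, like find()
    if (it == haystack.end() && !needle.empty())
        return std::string::npos;
    return static_cast<std::string::size_type>(it - haystack.begin());
}

// Same idea for wide strings, folding each wchar_t with std::towupper.
std::wstring::size_type findPosIC(const std::wstring& haystack, const std::wstring& needle)
{
    auto it = std::search(
        haystack.begin(), haystack.end(),
        needle.begin(), needle.end(),
        [](wchar_t a, wchar_t b) { return std::towupper(a) == std::towupper(b); });
    if (it == haystack.end() && !needle.empty())
        return std::wstring::npos;
    return static_cast<std::wstring::size_type>(it - haystack.begin());
}
```

Returning a position rather than a bool keeps the call site identical to the existing str.find(sub) != std::string::npos checks.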


淡淡绿茶香 2024-09-14 07:52:29

Why not use Boost.StringAlgo?

#include <boost/algorithm/string/find.hpp>
#include <string>

bool Foo()
{
   //case insensitive find

   std::string str("Hello");

   boost::iterator_range<std::string::const_iterator> rng;

   rng = boost::ifind_first(str, std::string("EL"));

   return !rng.empty(); // an empty range means "not found"
}


待＂谢繁草 2024-09-14 07:52:29

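Why not convert both strings to lower case (for example with tolower) before calling find()?

Notes:

This is inefficient for long strings.
Watch out for internationalization issues.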


梦言归人 2024-09-14 07:52:29

Since you're doing substring searches (std::string) and not element (character) searches, there's unfortunately no existing solution I'm aware of that's immediately accessible in the standard library to do this.

Nevertheless, it's easy enough to do: simply convert both strings to upper case (or both to lower case - I chose upper in this example).

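#include <algorithm>
#include <cctype>
#include <iterator>
#include <string>

std::string upper_string(const std::string& str)
{
    std::string upper;
    // cast to unsigned char: std::toupper is undefined for negative char values
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [](unsigned char ch) { return static_cast<char>(std::toupper(ch)); });
    return upper;
}

std::string::size_type find_str_ci(const std::string& str, const std::string& substr)
{
    return upper_string(str).find(upper_string(substr));
}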

This is not a fast solution (bordering into pessimization territory) but it's the only one I know of off-hand. It's also not that hard to implement your own case-insensitive substring finder if you are worried about efficiency.

Additionally, I need to support std::wstring/wchar_t. Any ideas?

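The locale-aware tolower/toupper overloads (from <locale>) work on wide strings as well, so the solution above carries over to std::wstring. Change std::string to std::wstring and fold each wchar_t with the locale overload of std::toupper, or with std::towupper from <cwctype>. The narrow std::toupper from <cctype> is only valid for unsigned char values.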
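
Here is a minimal sketch of the std::wstring counterpart of upper_string/find_str_ci. upper_wstring and find_wstr_ci are illustrative names. std::towupper folds case according to the current C locale (set with std::setlocale), one wchar_t at a time, so multi-character mappings such as ß to SS are not handled.

```cpp
#include <algorithm>
#include <cwctype>
#include <iterator>
#include <string>

// Wide-character counterpart of upper_string: each wchar_t is folded with
// std::towupper, which consults the current C locale (LC_CTYPE).
std::wstring upper_wstring(const std::wstring& str)
{
    std::wstring upper;
    upper.reserve(str.size());
    std::transform(str.begin(), str.end(), std::back_inserter(upper),
                   [](wchar_t ch) { return static_cast<wchar_t>(std::towupper(ch)); });
    return upper;
}

// Offset of the first case-insensitive match, or std::wstring::npos.
std::wstring::size_type find_wstr_ci(const std::wstring& str, const std::wstring& substr)
{
    return upper_wstring(str).find(upper_wstring(substr));
}
```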

[Edit] An alternative, as pointed out, is to adapt your own case-insensitive string type from basic_string by specifying your own character traits. This works if you can accept that all searches, comparisons, etc. on that string type are case-insensitive.
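
Here is a minimal sketch of the character-traits approach. ci_char_traits and ci_string are illustrative names. Case folding is done per byte with std::toupper, so only single-byte, ASCII-style case mappings are handled. Only the comparison and search members are overridden; the rest is inherited from std::char_traits<char>.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Traits that make eq/lt/compare/find ignore case; basic_string's find(),
// compare() and comparison operators are built on these members.
struct ci_char_traits : std::char_traits<char> {
    static char up(char c) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(c)));
    }
    static bool eq(char a, char b) { return up(a) == up(b); }
    static bool lt(char a, char b) { return up(a) < up(b); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;
    }
    static const char* find(const char* s, std::size_t n, char c) {
        for (std::size_t i = 0; i < n; ++i)
            if (eq(s[i], c)) return s + i;
        return nullptr;
    }
};

// A string type whose find(), compare(), == and < ignore case.
using ci_string = std::basic_string<char, ci_char_traits>;
```

Because ci_string is a distinct type, converting from a std::string goes through c_str() or an iterator pair, e.g. ci_string(s.begin(), s.end()).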


感情旳空白 2024-09-14 07:52:29

If you want a "real" comparison according to Unicode and locale rules, use ICU's Collator class.


澜川若宁 2024-09-14 07:52:29

It also makes sense to provide a Boost version. Note that this modifies the original strings:

#include <boost/algorithm/string.hpp>
#include <string>

std::string str1 = "hello world!!!";
std::string str2 = "HELLO";
boost::algorithm::to_lower(str1);
boost::algorithm::to_lower(str2);

if (str1.find(str2) != std::string::npos)
{
    // str1 contains str2
}

Or use the excellent Boost.Xpressive library:

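#include <boost/xpressive/xpressive.hpp>
#include <iostream>
#include <string>
using namespace boost::xpressive;
....
std::string long_string( "very LonG string" );
std::string word("long");
smatch what;
sregex re = sregex::compile(word, boost::xpressive::icase);
// regex_search looks for the pattern anywhere in long_string;
// regex_match would require the whole string to match and fail here
if( regex_search( long_string, what, re ) )
{
    std::cout << word << " found!" << std::endl;
}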

In this example, make sure your search word doesn't contain any regex special characters.
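
Since Boost isn't an option for the asker, the same idea works with the standard <regex> header (C++11), which also has an icase flag. Here is a minimal sketch that escapes the special characters mentioned above so the word is matched literally. escape_regex and contains_ci are hypothetical helper names.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: backslash-escape every ECMAScript regex
// metacharacter so the search word is treated as literal text.
std::string escape_regex(const std::string& word)
{
    static const std::string special = R"(\^$.|?*+()[]{})";
    std::string out;
    for (char ch : word) {
        if (special.find(ch) != std::string::npos) out += '\\';
        out += ch;
    }
    return out;
}

// Case-insensitive "contains" using std::regex_search with the icase flag.
bool contains_ci(const std::string& text, const std::string& word)
{
    const std::regex re(escape_regex(word),
                        std::regex::ECMAScript | std::regex::icase);
    return std::regex_search(text, re);
}
```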


回心转意 2024-09-14 07:52:29

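#include <cctype>
#include <iostream>
#include <string>
using namespace std;

// NOTE: istring() reinterprets a basic_string<charT> as a
// basic_string<ichar<charT> >, which is undefined behavior, and it relies
// on std::char_traits<ichar<charT> >, which not every standard library provides.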

template <typename charT>
struct ichar {
    operator charT() const { return toupper(x); }
    charT x;
};
template <typename charT>
static basic_string<ichar<charT> > *istring(basic_string<charT> &s) { return (basic_string<ichar<charT> > *)&s; }
template <typename charT>
static ichar<charT> *istring(const charT *s) { return (ichar<charT> *)s; }

int main()
{
    string s = "The STRING";
    wstring ws = L"The WSTRING";
    cout << istring(s)->find(istring("str")) << " " << istring(ws)->find(istring(L"wstr"))  << endl;
}

A little bit dirty, but short & fast.


盛装女皇 2024-09-14 07:52:29

I love the answers from Kiril V. Lyadvinsky and CC, but my problem was a little more specific than just case-insensitivity. I needed a lazy, Unicode-supporting command-line argument parser that could eliminate false positives/negatives when searching for alphanumeric keywords in a base string that may contain special characters around them. For example, Wolfjäger shouldn't match jäger, but <jäger> should.

It's basically just Kiril/CC's answer with extra handling for alphanumeric exact-length matches.

#include <algorithm>
#include <cwctype>
#include <string>

/* Undefined behavior when a non-alpha-num substring parameter is used. */
bool find_alphanum_string_CI(const std::wstring& baseString, const std::wstring& subString)
{
    /* Fail fast if the base string was smaller than what we're looking for */
    if (subString.length() > baseString.length()) 
        return false;

    auto it = std::search(
        baseString.begin(), baseString.end(), subString.begin(), subString.end(),
        [](wchar_t ch1, wchar_t ch2)
        {
            /* compare as wchar_t: a char parameter would truncate non-ASCII characters */
            return std::towupper(ch1) == std::towupper(ch2);
        }
    );

    if(it == baseString.end())
        return false;

    size_t match_start_offset = it - baseString.begin();

    std::wstring match_start = baseString.substr(match_start_offset, std::wstring::npos);

    /* Typical special characters and whitespace to split the substring up. */
    size_t match_end_pos = match_start.find_first_of(L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`");

    /* Pass fast if the remainder of the base string where
       the match started is the same length as the substring. */
    if (match_end_pos == std::wstring::npos && match_start.length() == subString.length()) 
        return true;

    std::wstring extracted_match = match_start.substr(0, match_end_pos);

    return (extracted_match.length() == subString.length());
}
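
As written, the function above only inspects the character that follows the match, so find_alphanum_string_CI(L"Wolfjäger", L"jäger") still returns true. Here is a minimal sketch that checks both boundaries and keeps searching past rejected matches, reusing the same delimiter list. find_word_ci and is_delim are hypothetical names, and std::towupper folds case according to the current C locale.

```cpp
#include <algorithm>
#include <cwctype>
#include <string>

// Delimiters copied from the answer above; anything else counts as part of a word.
static const std::wstring kDelims = L" ,<.>;:/?\'\"[{]}=+-_)(*&^%$#@!~`";

static bool is_delim(wchar_t ch) { return kDelims.find(ch) != std::wstring::npos; }

// Case-insensitive whole-word search: a match only counts if it is bounded on
// both sides by a delimiter or by the start/end of baseString.
bool find_word_ci(const std::wstring& baseString, const std::wstring& subString)
{
    if (subString.empty()) return false;
    auto eq = [](wchar_t a, wchar_t b) { return std::towupper(a) == std::towupper(b); };
    auto it = baseString.begin();
    while ((it = std::search(it, baseString.end(),
                             subString.begin(), subString.end(), eq)) != baseString.end()) {
        auto end = it + static_cast<std::wstring::difference_type>(subString.size());
        bool startOk = (it == baseString.begin()) || is_delim(*(it - 1));
        bool endOk = (end == baseString.end()) || is_delim(*end);
        if (startOk && endOk) return true;
        ++it; // rejected: resume the search one character later
    }
    return false;
}
```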


メ斷腸人バ 2024-09-14 07:52:29

The Most Efficient Way

Simple and Fast.

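Performance is guaranteed to be linear, with an initialization cost of 2 * NEEDLE_LEN comparisons (per glibc). Note that strcasestr is a non-standard GNU/BSD extension. It is not available with MSVC, and it only handles narrow, NUL-terminated strings.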

#include <string.h>   // strcasestr: GNU/BSD extension, not ISO C/C++
#include <string>
#include <iostream>

int main(void) {

    std::string s1{"abc de fGH"};
    std::string s2{"DE"};

    auto pos = strcasestr(s1.c_str(), s2.c_str());

    if(pos != nullptr)
        std::cout << pos - s1.c_str() << std::endl;

    return 0;
}


老街孤人 2024-09-14 07:52:29

wxWidgets has a very rich string API, wxString, and this can be done with it using the case-conversion approach:

#include <wx/string.h>

// Returns 1 if str occurs in SpecProgramName (ignoring case), otherwise 0.
int Contains(const wxString& SpecProgramName, const wxString& str)
{
  wxString SpecProgramName_ = SpecProgramName.Upper();
  wxString str_ = str.Upper();
  int found = SpecProgramName_.Find(str_); // search the upper-cased copy
  if (wxNOT_FOUND == found)
  {
    return 0;
  }
  return 1;
}

