Case-insensitive std::set of strings

Posted 2024-10-04 07:17:33

How do you do a case-insensitive insertion or search of a string in std::set?

For example:

std::set<std::string> s;
s.insert("Hello");
s.insert("HELLO"); // should be rejected: the string already exists (ignoring case)

山色无中 2024-10-11 07:17:33

You need to define a custom comparator:

#include <set>
#include <string>
#include <strings.h>  // strcasecmp (POSIX)

struct InsensitiveCompare { 
    bool operator() (const std::string& a, const std::string& b) const {
        return strcasecmp(a.c_str(), b.c_str()) < 0;
    }
};

std::set<std::string, InsensitiveCompare> s;

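strcasecmp is a POSIX function; if it is not available (for example with MSVC on Windows), try _stricmp (older name stricmp). Note that strcoll compares according to the current locale's collation rules and is not, in general, case-insensitive.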
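
If you want one comparator that builds on both POSIX systems and MSVC, a minimal sketch is to pick the function with the preprocessor. This assumes strcasecmp comes from <strings.h> on POSIX and _stricmp from <string.h> on Windows; the names CaseInsensitiveLess and CaseInsensitiveSet are hypothetical. Both functions work on NUL-terminated strings, so an embedded '\0' ends the comparison early.

```cpp
#include <cassert>
#include <set>
#include <string>
#ifdef _WIN32
#include <string.h>   // _stricmp (Microsoft CRT)
#else
#include <strings.h>  // strcasecmp (POSIX)
#endif

// Hypothetical name: orders strings ignoring case (locale-dependent beyond ASCII).
struct CaseInsensitiveLess {
    bool operator()(const std::string& a, const std::string& b) const {
#ifdef _WIN32
        return _stricmp(a.c_str(), b.c_str()) < 0;
#else
        return strcasecmp(a.c_str(), b.c_str()) < 0;
#endif
    }
};

// Hypothetical alias for a set that treats "Hello" and "HELLO" as the same key.
using CaseInsensitiveSet = std::set<std::string, CaseInsensitiveLess>;
```

With this set, inserting "HELLO" after "Hello" returns a pair whose second member is false, and find("hello") returns an iterator to the stored "Hello", which covers both the insertion and the search asked about in the question.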

前事休说 2024-10-11 07:17:33

std::set lets you supply your own comparator (as do most std containers). You can then perform whatever kind of comparison you like. A full example is available here
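
As an illustration of that flexibility, here is a sketch (assuming C++17; the names ILess and ISet are hypothetical) of a comparator that also declares is_transparent, so std::set::find and count can look up a std::string_view or a string literal without first building a std::string. It folds case with std::tolower on unsigned char, so it only handles single-byte characters in the current C locale.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <set>
#include <string>
#include <string_view>

// Hypothetical comparator: case-insensitive, and transparent so that
// lookup functions accept anything convertible to std::string_view.
struct ILess {
    using is_transparent = void;  // enables heterogeneous lookup (C++14 and later)

    bool operator()(std::string_view a, std::string_view b) const {
        return std::lexicographical_compare(
            a.begin(), a.end(), b.begin(), b.end(),
            [](char x, char y) {
                // Cast first: std::tolower on a negative char is undefined behavior.
                return std::tolower(static_cast<unsigned char>(x)) <
                       std::tolower(static_cast<unsigned char>(y));
            });
    }
};

using ISet = std::set<std::string, ILess>;
```

In C++17, is_transparent only affects lookup functions such as find, count, lower_bound and equal_range; insert still constructs a std::string.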

花开柳相依 2024-10-11 07:17:33

This is a generic solution that also works with string types other than std::string (tested with std::wstring, std::string_view and char const*). Basically anything that defines a range of characters should work.

The key point is boost::as_literal, which lets the comparator treat null-terminated character arrays, character pointers and ranges uniformly.

Generic code ("iset.h"):

#pragma once
#include <set>
#include <algorithm>
#include <boost/algorithm/string.hpp>
#include <boost/range/as_literal.hpp>

// Case-insensitive generic string comparator.
struct range_iless
{
    template< typename InputRange1, typename InputRange2 >
    bool operator()( InputRange1 const& r1, InputRange2 const& r2 ) const 
    {
        // include the standard begin() and end() as well as any custom overloads found via ADL
        using std::begin; using std::end;  

        // Treat null-terminated character arrays, character pointers and ranges uniformly.
        // This just creates cheap iterator ranges (it doesn't copy container arguments)!
        auto ir1 = boost::as_literal( r1 );
        auto ir2 = boost::as_literal( r2 );

        // Compare case-insensitively.
        return std::lexicographical_compare( 
            begin( ir1 ), end( ir1 ), 
            begin( ir2 ), end( ir2 ), 
            boost::is_iless{} );
    }
};

// Case-insensitive set for any Key that consists of a range of characters.
template< class Key, class Allocator = std::allocator<Key> >
using iset = std::set< Key, range_iless, Allocator >;

Usage example ("main.cpp"):

#include "iset.h"  // above header file
#include <iostream>
#include <string>
#include <string_view>

// Output range to stream.
template< typename InputRange, typename Stream, typename CharT >
void write_to( Stream& s, InputRange const& r, CharT const* sep )
{
    for( auto const& elem : r )
        s << elem << sep;
    s << std::endl;
}

int main()
{
    iset< std::string  >     s1{  "Hello",  "HELLO",  "world" };
    iset< std::wstring >     s2{ L"Hello", L"HELLO", L"world" };
    iset< char const*  >     s3{  "Hello",  "HELLO",  "world" };
    iset< std::string_view > s4{  "Hello",  "HELLO",  "world" };

    write_to( std::cout,  s1,  " " );    
    write_to( std::wcout, s2, L" " );    
    write_to( std::cout,  s3,  " " );    
    write_to( std::cout,  s4,  " " );    
}

Live Demo at Coliru
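
If Boost is not available, a similar generic comparator can be sketched with the standard library alone, assuming C++17. Instead of boost::as_literal it converts each argument to a std::basic_string_view, and instead of boost::is_iless it folds case with std::tolower(c, loc) from <locale> using the classic "C" locale. The helper as_view and the names std_range_iless and std_iset are hypothetical, and unlike the Boost version this only accepts character pointers, std::basic_string and std::basic_string_view.

```cpp
#include <algorithm>
#include <cassert>
#include <locale>
#include <set>
#include <string>
#include <string_view>

// Hypothetical helpers: view a character pointer, string or string_view
// as a std::basic_string_view without copying the characters.
template <class CharT>
std::basic_string_view<CharT> as_view(const CharT* s) { return s; }

template <class CharT, class Traits, class Alloc>
std::basic_string_view<CharT, Traits>
as_view(const std::basic_string<CharT, Traits, Alloc>& s) { return s; }

template <class CharT, class Traits>
std::basic_string_view<CharT, Traits>
as_view(std::basic_string_view<CharT, Traits> s) { return s; }

// Case-insensitive comparator for any of the key types handled by as_view.
struct std_range_iless {
    template <class A, class B>
    bool operator()(const A& a, const B& b) const {
        auto va = as_view(a);
        auto vb = as_view(b);
        const std::locale& loc = std::locale::classic();
        using CharT = typename decltype(va)::value_type;
        // Compare character by character after folding to lower case.
        return std::lexicographical_compare(
            va.begin(), va.end(), vb.begin(), vb.end(),
            [&loc](CharT x, CharT y) {
                return std::tolower(x, loc) < std::tolower(y, loc);
            });
    }
};

template <class Key, class Allocator = std::allocator<Key>>
using std_iset = std::set<Key, std_range_iless, Allocator>;
```

Usage mirrors the Boost version: std_iset<std::string>{"Hello", "HELLO", "world"} keeps two elements, "Hello" and "world".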

指尖凝香 2024-10-11 07:17:33

From what I have read, this is more portable than stricmp(), because stricmp() is not actually part of the standard library; it is only provided by most compiler vendors. So below is my solution, which simply rolls its own comparison.

#include <string>
#include <cctype>
#include <iostream>
#include <set>

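struct caseInsensitiveLess
{
  // std::set calls the comparator through a const object, so operator() must be const.
  bool operator()(const std::string& x, const std::string& y) const
  {
    const std::string::size_type xs = x.size();
    const std::string::size_type ys = y.size();
    const std::string::size_type bound = (xs < ys) ? xs : ys;

    for (std::string::size_type i = 0; i < bound; ++i)
    {
      // Cast to unsigned char first: passing a negative char to tolower is undefined behavior.
      const int cx = std::tolower(static_cast<unsigned char>(x[i]));
      const int cy = std::tolower(static_cast<unsigned char>(y[i]));

      if (cx < cy)
        return true;

      if (cy < cx)
        return false;
    }

    // Equal over the common length: the shorter string sorts first.
    // Returning false here would make "abc" and "abcd" equivalent.
    return xs < ys;
  }
};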

int main()
{
  std::set<std::string, caseInsensitiveLess> ss1;
  std::set<std::string> ss2;

  ss1.insert("This is the first string");
  ss1.insert("THIS IS THE FIRST STRING");
  ss1.insert("THIS IS THE SECOND STRING");
  ss1.insert("This IS THE SECOND STRING");
  ss1.insert("This IS THE Third");

  ss2.insert("this is the first string");
  ss2.insert("this is the first string");
  ss2.insert("this is the second string");
  ss2.insert("this is the second string");
  ss2.insert("this is the third");

  for ( auto& i: ss1 )
   std::cout << i << std::endl;

  std::cout << std::endl;

  for ( auto& i: ss2 )
   std::cout << i << std::endl;

}

Output, with the case-insensitive set and the regular set showing the same ordering:

This is the first string
THIS IS THE SECOND STRING
This IS THE Third

this is the first string
this is the second string
this is the third
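
Another standard-only way to roll your own, swapped in here rather than taken from this answer, is the well-known technique of baking case-insensitivity into the string type itself through a custom character-traits class. A minimal sketch, assuming ASCII text; ci_char_traits and ci_string are hypothetical names. Because std::less<ci_string> goes through the traits' compare, a plain std::set<ci_string> needs no comparator at all.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical traits type: like std::char_traits<char>, but compares
// characters after folding them to upper case.
struct ci_char_traits : public std::char_traits<char> {
    static char to_upper(char ch) {
        return static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    }
    static bool eq(char c1, char c2) { return to_upper(c1) == to_upper(c2); }
    static bool lt(char c1, char c2) { return to_upper(c1) < to_upper(c2); }
    static int compare(const char* s1, const char* s2, std::size_t n) {
        for (std::size_t i = 0; i < n; ++i) {
            if (lt(s1[i], s2[i])) return -1;
            if (lt(s2[i], s1[i])) return 1;
        }
        return 0;  // basic_string then breaks ties by length
    }
    static const char* find(const char* s, std::size_t n, char a) {
        const char ua = to_upper(a);
        for (std::size_t i = 0; i < n; ++i)
            if (to_upper(s[i]) == ua) return s + i;
        return nullptr;
    }
};

using ci_string = std::basic_string<char, ci_char_traits>;
```

The trade-off is that ci_string is a different type from std::string, so converting in and out (for example ci_string(s.data(), s.size())) and writing to std::cout need extra code.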

About the author: 海夕
