C++像人类一样进行字符串排序？

发布于 2024-08-31 05:58:56 字数 1960 浏览 6 评论 0原文

我想按照人类对字母数字字符串进行排序的方式对其进行排序。即，“A2”位于“A10”之前，“a”当然位于“Z”之前！有没有什么办法可以不写一个迷你解析器呢？理想情况下，它还会将“A1B1”放在“A1B10”之前。我看到问题 “自然（人类字母数字）排序Microsoft SQL 2005” 有一个可能的答案，但它使用各种库函数，“使用 IComparer 对人类字符串进行排序”。

以下是当前失败的测试用例：

#include <set>
#include <iterator>
#include <iostream>
#include <vector>
#include <cassert>

template <typename T>
struct LexicographicSort {
  inline bool operator() (const T& lhs, const T& rhs) const{
    std::ostringstream s1,s2;
    s1 << toLower(lhs); s2 << toLower(rhs);
    bool less = s1.str() < s2.str();
    //Answer: bool less = doj::alphanum_less<std::string>()(s1.str(), s2.str());
    std::cout<<s1.str()<<" "<<s2.str()<<" "<<less<<"\n";
    return less;
  }

  inline std::string toLower(const std::string& str) const {
    std::string newString("");
    for (std::string::const_iterator charIt = str.begin();
         charIt!=str.end();++charIt) {
          newString.push_back(std::tolower(*charIt));
        }
        return newString;
      }
};


int main(void) {
  const std::string reference[5] = {"ab","B","c1","c2","c10"};
  std::vector<std::string> referenceStrings(&(reference[0]), &(reference[5]));

  //Insert in reverse order so we know they get sorted
  std::set<std::string,LexicographicSort<std::string> > strings(referenceStrings.rbegin(), referenceStrings.rend());

  std::cout<<"Items:\n";
  std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
  std::vector<std::string> sortedStrings(strings.begin(), strings.end());
  assert(sortedStrings == referenceStrings);
}

原文

I would like to sort alphanumeric strings the way a human being would sort them. I.e., "A2" comes before "A10", and "a" certainly comes before "Z"! Is there any way to do with without writing a mini-parser? Ideally it would also put "A1B1" before "A1B10". I see the question "Natural (human alpha-numeric) sort in Microsoft SQL 2005" with a possible answer, but it uses various library functions, as does "Sorting Strings for Humans with IComparer".

Below is a test case that currently fails:

#include <set>
#include <iterator>
#include <iostream>
#include <vector>
#include <cassert>

template <typename T>
struct LexicographicSort {
  inline bool operator() (const T& lhs, const T& rhs) const{
    std::ostringstream s1,s2;
    s1 << toLower(lhs); s2 << toLower(rhs);
    bool less = s1.str() < s2.str();
    //Answer: bool less = doj::alphanum_less<std::string>()(s1.str(), s2.str());
    std::cout<<s1.str()<<" "<<s2.str()<<" "<<less<<"\n";
    return less;
  }

  inline std::string toLower(const std::string& str) const {
    std::string newString("");
    for (std::string::const_iterator charIt = str.begin();
         charIt!=str.end();++charIt) {
          newString.push_back(std::tolower(*charIt));
        }
        return newString;
      }
};


int main(void) {
  const std::string reference[5] = {"ab","B","c1","c2","c10"};
  std::vector<std::string> referenceStrings(&(reference[0]), &(reference[5]));

  //Insert in reverse order so we know they get sorted
  std::set<std::string,LexicographicSort<std::string> > strings(referenceStrings.rbegin(), referenceStrings.rend());

  std::cout<<"Items:\n";
  std::copy(strings.begin(), strings.end(), std::ostream_iterator<std::string>(std::cout, "\n"));
  std::vector<std::string> sortedStrings(strings.begin(), strings.end());
  assert(sortedStrings == referenceStrings);
}

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

杀手六號 2024-09-07 05:58:56

有什么方法可以不写迷你解析器吗？

让别人来做吗？

我正在使用这个实现：http://www.davekoelle.com/alphanum.html，我也对其进行了修改以支持 wchar_t 。

回复收藏 0 原文

明媚如初 2024-09-07 05:58:56

这实际上取决于“解析器”的含义。如果您想避免编写解析器，我认为您应该利用库函数。

将字符串视为统一为字母、数字或“其他”的子序列序列。
使用 isalnum 获取每个字符串的下一个字母数字序列，并回溯检查 + 或 -（如果它是数字）。就地使用 strtold 查找数字子序列的结尾。
如果一个是数字，一个是字母，则带有数字子序列的字符串位于第一位。
如果一个字符串已用完字符，则该字符串排在第一位。
使用 strcoll 比较当前语言环境中的字母子序列。
使用 strtold 比较当前语言环境中的数字子序列。
重复此操作，直至完成一根或两根弦。
与 strcmp 打破联系。

该算法在比较超出 long double 精度的数字字符串时存在一些弱点。

回复收藏 0 原文

南薇 2024-09-07 05:58:56

有没有什么方法可以在不编写迷你解析器的情况下做到这一点？我认为答案是否定的。但编写解析器并不那么困难。不久前我不得不这样做来对我们公司的股票编号进行排序。基本上只需扫描数字并将其转换为数组即可。检查每个字符的“类型”：字母、数字，也许您还有其他需要特殊处理的字符。就像我必须特殊对待连字符一样，因为我们希望 ABC 在 AB-A 之前排序。然后开始剥字符。只要它们与第一个字符的类型相同，它们就会进入同一个存储桶。一旦类型发生变化，您就开始将它们放入不同的存储桶中。然后你还需要一个比较函数来逐个桶进行比较。当两个桶都是 alpha 时，您只需进行正常的 alpha 比较。当两者都是数字时，将两者都转换为整数并进行整数比较，或者将较短的部分填充到较长的长度或等效的长度。当它们是不同类型时，您需要一个规则来进行比较，例如 AA 是在 A-1 之前还是之后？

这不是一项微不足道的工作，您必须为可能出现的所有奇怪情况制定规则，但我认为您可以在几个小时内将其组合在一起。

回复收藏 0 原文

蓝咒 2024-09-07 05:58:56

如果不进行任何解析，就无法比较人类书写的数字（首先去除前导零的高值）和作为同一字符串一部分的普通字符。

不过，解析不需要非常复杂。一个简单的哈希表，用于处理区分大小写和剥离特殊字符（'A'='a'=1，'B'='b'='2，...或'A'=1，'a' =2,'B'=3,..., '-'=0(strip))，将字符串重新映射到哈希值数组，然后截断数字情况（如果遇到数字并且最后一个字符是数字，将最后一个数字乘以 10，然后加上当前值）。

从那里开始，按正常方式排序。

回复收藏 0 原文

~没有更多了~