ICU and string comparison

Posted on 2024-11-07 16:53:51

Can anybody explain why the following comparisons do not come out equal?

#include <unicode/ucol.h>
#include <unicode/ustring.h>

void CompareTest()
{
  UErrorCode status = U_ZERO_ERROR;
  UChar ruleset[500]; *ruleset = 0;
  int32_t rlen = 0;

  UCollator *coll = ucol_open("de_DE", &status);
  if(U_FAILURE(status)) return;

  // L"..." literals assume a build where UChar is a 16-bit wchar_t.
  static const UChar rules[] = L"&\\u0000 = '' = '-'";

  // Copy the locale's tailoring rules (if any, and if they fit), then append the custom rule.
  const UChar *defRules = ucol_getRules(coll, &rlen);
  if(rlen > 0 && rlen + u_strlen(rules) < 500)
  {
    u_strcpy(ruleset, defRules);
  }
  u_strcat(ruleset, rules);

  status = U_ZERO_ERROR;
  UCollator *collRule = ucol_openRules(ruleset, u_strlen(ruleset), UCOL_OFF, UCOL_DEFAULT_STRENGTH, NULL, &status);
  if(U_FAILURE(status)) { ucol_close(coll); return; }

  ucol_setAttribute(collRule, UCOL_NORMALIZATION_MODE, UCOL_ON, &status);
  ucol_setAttribute(collRule, UCOL_STRENGTH, UCOL_QUATERNARY, &status);

  // Expected UCOL_EQUAL for every pair below, but that is not what comes back.
  UCollationResult uResult = ucol_strcoll(collRule, L"post-war", -1, L"post war", -1);
  uResult = ucol_strcoll(collRule, L"post-war", -1, L"postwar", -1);
  uResult = ucol_strcoll(collRule, L"ÄÖÜ", -1, L"äöü", -1);
  uResult = ucol_strcoll(collRule, L"ß", -1, L"ss", -1);

  // Release both collators.
  ucol_close(collRule);
  ucol_close(coll);
}

Comments (1)

咽泪装欢 2024-11-14 16:53:51

You don't need to do any rule customization.

  UCollator * collRule = coll;
  ucol_setAttribute(collRule, UCOL_NORMALIZATION_MODE, UCOL_ON, &status); // no effect for these samples.
  ucol_setAttribute(collRule, UCOL_STRENGTH, UCOL_PRIMARY, &status);
  ucol_setAttribute(collRule, UCOL_ALTERNATE_HANDLING, UCOL_SHIFTED, &status);

Result:

post-war -> [45 43 4B 4D 53 27 49 00]
post war -> [45 43 4B 4D 53 27 49 00]
(post-war === post war) -> 0
post-war -> [45 43 4B 4D 53 27 49 00]
postwar -> [45 43 4B 4D 53 27 49 00]
(post-war === postwar) -> 0
ÄÖÜ -> [27 43 4F 00]
äöü -> [27 43 4F 00]
(ÄÖÜ === äöü) -> 0
ß -> [4B 4B 00]
ss -> [4B 4B 00]
(ß === ss) -> 0
