Sorting a list of Hungarian strings in Hungarian alphabetical order

Posted on 2024-12-05 22:11:45

I am currently working with some Hungarian data.
I have to sort a list of Hungarian strings.

According to this Collation Sequence page

Hungarian alphabetic order is: A=Á, B, C, CS, D, DZ, DZS, E=É, F, G,
GY, H, I=Í, J, K, L, LY, M, N, NY, O=Ó, Ö=Ő, P, Q, R, S, SZ, T, TY,
U=Ú, Ü=Ű, V, W, X, Y, Z, ZS

So accented and unaccented vowels are treated as the same letter (A=Á, ...), and sorting with a Collator can give a result like this:

Abdffg
Ádsdfgsd
Aegfghhrf

Up to here, no problem :)

But now I have a requirement to sort according to the Hungarian alphabet:

A Á B C Cs D Dz Dzs E É F G Gy H I Í J K L Ly M N Ny O Ó Ö Ő P (Q) R S
Sz T Ty U Ú Ü Ű V (W) (X) (Y) Z Zs

Here A is considered different from Á.

Playing with the Collator strength doesn't change the order in the output: A and Á are still mixed up.
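
The behaviour described above can be reproduced with a minimal sketch. It assumes the JDK's Hungarian collation data gives A and Á the same primary weight, as the quoted collation page states; the class name StrengthDemo and its helper method are made up for illustration.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical demo class, not part of the original question.
class StrengthDemo {

    // Sorts a copy of the input with a Hungarian Collator at the given strength.
    static List<String> sortWithStrength(List<String> input, int strength) {
        Collator collator = Collator.getInstance(Locale.forLanguageTag("hu"));
        collator.setStrength(strength);
        List<String> copy = new ArrayList<>(input);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Aegfghhrf", "Ádsdfgsd", "Abdffg");
        int[] strengths = {Collator.PRIMARY, Collator.SECONDARY, Collator.TERTIARY};
        // Print the sorted list once per strength to compare the orders.
        for (int strength : strengths) {
            System.out.println(sortWithStrength(words, strength));
        }
    }
}
```

The accent only acts as a tie-breaker between strings whose base letters are identical, so raising the strength cannot move a word starting with Á behind all words starting with A.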

Are there any libraries or tricks for sorting a list of strings according to Hungarian alphabetical order?

So far what I am doing is:

  • Sort with a Collator so that C/Cs, D, Dz, Dzs... are sorted correctly
  • Sort again by comparing the first character of each word based on a map

This looks like too much hassle for the task, doesn't it?

List<String> words = Arrays.asList(
        "Árfolyam", "Az",
        "Állásajánlatok", "Adminisztráció",
        "Zsfgsdgsdfg", "Qdfasfas"
);

// Rank of each vowel in the desired order (A before Á, and so on)
final Map<String, Integer> map = new HashMap<String, Integer>();
map.put("A", 0);
map.put("Á", 1);
map.put("E", 2);
map.put("É", 3);

map.put("O", 4);
map.put("Ó", 5);
map.put("Ö", 6);
map.put("Ő", 7);

map.put("U", 8);
map.put("Ú", 9);
map.put("Ü", 10);
map.put("Ű", 11);

final Collator c = Collator.getInstance(new Locale("hu"));
c.setStrength(Collator.TERTIARY);

// First pass: the Collator takes care of the digraphs (Cs, Dz, Dzs, ...)
Collections.sort(words, c);

// Second pass: reorder by the first character of each word using the map
Collections.sort(words, new Comparator<String>() {
    public int compare(String s1, String s2) {

        int f = c.compare(s1, s2);
        if (f == 0) return 0;

        Integer a = map.get(Character.toString(s1.charAt(0)));
        Integer b = map.get(Character.toString(s2.charAt(0)));

        // Only reorder when both first characters are in the map;
        // Integer.compare avoids comparing boxed Integers with ==
        if (a != null && b != null) {
            return Integer.compare(a, b);
        }

        return 0;
    }
});

Thanks for your input

Comments (4)

眼泪都笑了 2024-12-12 22:11:45

I found a good solution: you can use a RuleBasedCollator.

Source: http://download.oracle.com/javase/tutorial/i18n/text/rule.html

And here is the Hungarian rule:

 < a,A < á,Á < b,B < c,C < cs,Cs,CS < d,D < dz,Dz,DZ < dzs,Dzs,DZS 
 < e,E < é,É < f,F < g,G < gy,Gy,GY < h,H < i,I < í,Í < j,J
 < k,K < l,L < ly,Ly,LY < m,M < n,N < ny,Ny,NY < o,O < ó,Ó 
 < ö,Ö < ő,Ő < p,P < q,Q < r,R < s,S < sz,Sz,SZ < t,T 
 < ty,Ty,TY < u,U < ú,Ú < ü,Ü < ű,Ű < v,V < w,W < x,X < y,Y < z,Z < zs,Zs,ZS
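
As a minimal sketch of how the rule above could be used, the following passes it to java.text.RuleBasedCollator and sorts the word list from the question. The class name HungarianSort and the method sortHungarian are hypothetical helpers introduced here, and the source file is assumed to be compiled as UTF-8.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class; the rule string is the one quoted above.
class HungarianSort {

    static final String HUNGARIAN_RULES =
          "< a,A < á,Á < b,B < c,C < cs,Cs,CS < d,D < dz,Dz,DZ < dzs,Dzs,DZS "
        + "< e,E < é,É < f,F < g,G < gy,Gy,GY < h,H < i,I < í,Í < j,J "
        + "< k,K < l,L < ly,Ly,LY < m,M < n,N < ny,Ny,NY < o,O < ó,Ó "
        + "< ö,Ö < ő,Ő < p,P < q,Q < r,R < s,S < sz,Sz,SZ < t,T "
        + "< ty,Ty,TY < u,U < ú,Ú < ü,Ü < ű,Ű < v,V < w,W < x,X < y,Y "
        + "< z,Z < zs,Zs,ZS";

    // Returns a sorted copy; the input list is left untouched.
    static List<String> sortHungarian(List<String> words) {
        RuleBasedCollator collator;
        try {
            collator = new RuleBasedCollator(HUNGARIAN_RULES);
        } catch (ParseException e) {
            // Only reached if the rule string itself is malformed.
            throw new IllegalStateException("Invalid collation rules", e);
        }
        List<String> copy = new ArrayList<>(words);
        copy.sort(collator);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList(
                "Árfolyam", "Az", "Állásajánlatok", "Adminisztráció",
                "Zsfgsdgsdfg", "Qdfasfas");
        System.out.println(sortHungarian(words));
    }
}
```

Because each "<" in the rule introduces a primary difference, Á gets its own position after A instead of being treated as an accented variant of it, so the words starting with A should come out before those starting with Á.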

只是我以为 2024-12-12 22:11:45

With streams you can sort as follows:

public List<String> sortBy(List<String> sortable) {

  // Collator for the Hungarian locale; a Collator is itself a Comparator
  Collator coll = Collator.getInstance(new Locale("hu", "HU"));

  return sortable.stream()
                 .sorted(coll)
                 .collect(Collectors.toList());
}

待天淡蓝洁白时 2024-12-12 22:11:45

Will any of the solutions order the strings (names) 'Czár' and 'Csóka' as Czár, Csóka? That would be the correct order, since the CS in Csóka is considered one letter and comes after C.
However, reliably recognizing two-character consonants is impossible even with a list of all Hungarian words: two words can look exactly the same character by character, yet in one the two consonants are separate letters, while in the other the same two characters at the same position represent a single letter.
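
Both points can be checked with a small sketch. It uses an abbreviated rule string that covers only the letters in the test words, and the word pair község / közút is an example chosen here: in község (köz + ség) the z and s are two separate letters. The class name DigraphDemo and its compare helper are hypothetical, and the source file is assumed to be UTF-8.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;

// Hypothetical demo class; the rules are a shortened version of a Hungarian
// rule string and list only the letters needed for the words below.
class DigraphDemo {

    static final String RULES =
          "< a,A < á,Á < c,C < cs,Cs,CS < é,É < g,G < k,K < o,O < ó,Ó "
        + "< ö,Ö < r,R < s,S < sz,Sz,SZ < t,T < u,U < ú,Ú < z,Z < zs,Zs,ZS";

    // Negative if a sorts before b, positive if after, zero if equal.
    static int compare(String a, String b) {
        try {
            return new RuleBasedCollator(RULES).compare(a, b);
        } catch (ParseException e) {
            throw new IllegalStateException("Invalid collation rules", e);
        }
    }

    public static void main(String[] args) {
        // "Cs" is listed as one unit after "c", so Czár sorts before Csóka.
        System.out.println(compare("Czár", "Csóka") < 0);
        // The "zs" in község is matched as the single letter Zs,
        // although the word is köz + ség (z followed by s).
        System.out.println(compare("község", "közút") > 0);
    }
}
```

The collator matches the longest listed sequence, so it reads the zs in község as the letter Zs and places the word after közút, whereas a letter-by-letter reading (z, then s) would place it before. A rule-based collator therefore handles the Czár / Csóka case but cannot resolve this kind of ambiguity.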

甜味超标? 2024-12-12 22:11:45

Change the order of your map.

Put the numeric representation as the key and the letter as the value. This will allow you to use a TreeMap which will be sorted by key.

You can then just do map.get(1) and it will return the first letter of the alphabet.
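
A minimal sketch of this suggestion, assuming the ranks start at 1 and using only the vowels from the question's map; the class name AlphabetByRank and the build method are hypothetical, and the source file is assumed to be UTF-8.

```java
import java.util.TreeMap;

// Hypothetical sketch of the suggestion above: rank -> letter.
class AlphabetByRank {

    // Only the vowels from the question's map are listed; ranks start at 1.
    static TreeMap<Integer, String> build() {
        String[] letters = {"A", "Á", "E", "É", "O", "Ó", "Ö", "Ő", "U", "Ú", "Ü", "Ű"};
        TreeMap<Integer, String> byRank = new TreeMap<>();
        for (int i = 0; i < letters.length; i++) {
            byRank.put(i + 1, letters[i]);
        }
        return byRank;
    }

    public static void main(String[] args) {
        TreeMap<Integer, String> byRank = build();
        System.out.println(byRank.get(1));    // letter stored under rank 1
        System.out.println(byRank.values());  // letters in ascending rank order
    }
}
```

This lookup goes from rank to letter; comparing two words still needs the opposite direction (letter to rank), as in the question's original map.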
