Converting Cyrillic to Latin - Latin intruders/exceptions
I am using a simple dictionary to replace Cyrillic letters with Latin ones. Most of the time it works fine, but I run into problems when the input already contains some Latin letters, most often in company names.
A few examples:
PROCRED is being converted as RROSRED
ОВЕХ as OVEH
CITY as SITU
What can I do about this?
This is the dictionary I am using:
public string ConvertCyrillicToLatin(string text)
{
    // Serbian Cyrillic letters mapped to their Latin transliteration.
    Dictionary<string, string> words = new Dictionary<string, string>();
    words.Add("А", "A");
    words.Add("Б", "B");
    words.Add("В", "V");
    words.Add("Г", "G");
    words.Add("Д", "D");
    words.Add("Ђ", "Đ");
    words.Add("Е", "E");
    words.Add("Ж", "Ž");
    words.Add("З", "Z");
    words.Add("И", "I");
    words.Add("Ј", "J");
    words.Add("К", "K");
    words.Add("Л", "L");
    words.Add("Љ", "Lj");
    words.Add("М", "M");
    words.Add("Н", "N");
    words.Add("Њ", "Nj");
    words.Add("О", "O");
    words.Add("П", "P");
    words.Add("Р", "R");
    words.Add("С", "S");
    words.Add("Т", "T");
    words.Add("Ћ", "Ć");
    words.Add("У", "U");
    words.Add("Ф", "F");
    words.Add("Х", "H");
    words.Add("Ц", "C");
    words.Add("Ч", "Č");
    words.Add("Џ", "Dž");
    words.Add("Ш", "Š");
    words.Add("а", "a");
    words.Add("б", "b");
    words.Add("в", "v");
    words.Add("г", "g");
    words.Add("д", "d");
    words.Add("ђ", "đ");
    words.Add("е", "e");
    words.Add("ж", "ž");
    words.Add("з", "z");
    words.Add("и", "i");
    words.Add("ј", "j");
    words.Add("к", "k");
    words.Add("л", "l");
    words.Add("љ", "lj");
    words.Add("м", "m");
    words.Add("н", "n");
    words.Add("њ", "nj");
    words.Add("о", "o");
    words.Add("п", "p");
    words.Add("р", "r");
    words.Add("с", "s");
    words.Add("т", "t");
    words.Add("ћ", "ć");
    words.Add("у", "u");
    words.Add("ф", "f");
    words.Add("х", "h");
    words.Add("ц", "c");
    words.Add("ч", "č");
    words.Add("џ", "dž");
    words.Add("ш", "š");

    // Replace every occurrence of each Cyrillic letter across the whole text.
    var source = text;
    foreach (KeyValuePair<string, string> pair in words)
    {
        source = source.Replace(pair.Key, pair.Value);
    }
    return source;
}
UPDATE 1
As requested in the comment, here is my exemption list:
"СIТУ":"CITY",
"OBEX":"OBEX"
Right now it has just these two entries, for testing, but a real, functional exemption list is impossible with so many possibilities.
I expect that when the application comes across a Latin letter, it simply ignores it and leaves it as it is. It already works like that for Latin letters which do not exist in Cyrillic, or which exist but have the same meaning, like the letters AEODGTEJKLMN... I am having issues with letters which look the same in both the Latin and Cyrillic alphabets but have a different meaning, letters like С(S), Х(H), У(Y), P(R)...
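The look-alike letters described above are distinct Unicode characters, which is what makes them detectable at all. The following is a minimal sketch of that idea; the class name ScriptCheck and the helpers IsCyrillic and IsBasicLatin are hypothetical names introduced for illustration, and the range check assumes the basic Cyrillic block (U+0400 to U+04FF) is enough for Serbian text.

```csharp
using System;

public static class ScriptCheck
{
    // Assumption: the basic Cyrillic block U+0400..U+04FF covers the Serbian alphabet.
    public static bool IsCyrillic(char c) => c >= '\u0400' && c <= '\u04FF';

    // Only unaccented ASCII letters; accented Latin letters are not handled here.
    public static bool IsBasicLatin(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    public static void Main()
    {
        char latinC = 'C';          // Latin capital C
        char cyrillicEs = '\u0421'; // Cyrillic capital Es, looks the same as Latin C

        Console.WriteLine(latinC == cyrillicEs);       // False
        Console.WriteLine($"U+{(int)latinC:X4}");      // U+0043
        Console.WriteLine($"U+{(int)cyrillicEs:X4}");  // U+0421
        Console.WriteLine(IsCyrillic(cyrillicEs));     // True
        Console.WriteLine(IsBasicLatin(latinC));       // True
    }
}
```

A check like this can tell which script each character really belongs to, even when two characters are visually identical.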
UPDATE 2
Here are a few examples of input, as asked for in the comment. The slash sign of course does not exist in the input; I just added it so that you can distinguish the Latin part.
...ПОВЕРИОЦ /LЕNS OBEX DОО/, У СКЛАДУ СА ОДРЕДБОМ...
...ИЗЈАВА ПРИВРЕДНОГ ДРУШТВА /GRАDЈЕVINSКО РRЕDUZЕСЕ IМРЕХ LОZNIСА/ СА АДРЕСОМ...
...ЗА УГОВОР О ОТВАРАЊУ КРЕДИТНЕ ЛИНИЈЕ СА КОМПАНИЈОМ /"DOWN CITУ"/ И РАСПОН МЕСЕЧНЕ КАМАТНЕ СТОПЕ...
...КОРИСТ ПОВЕРИОЦА /ATР BANK TOUR/, СА СЕДИШТЕМ...
Comments (1)
In the code below, two dictionaries are used to convert text with Cyrillic characters to Latin. If a word contains Latin characters, the first dictionary, LatinType, is used. Otherwise the second, CyrillicType, is used. With this code, the text from "UPDATE 2" of the question will be converted to the following:
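The answer's code is not included above, so the following is only a minimal sketch of the approach it describes, not the answerer's implementation. The dictionary names LatinType and CyrillicType come from the answer; everything else (the class MixedScriptConverter, the methods ToLatin and ConvertWord, and the exact contents of LatinType) is assumed for illustration. LatinType is assumed to map Cyrillic capitals to the Latin letters they resemble, and words are assumed to be separated by single spaces.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class MixedScriptConverter
{
    // Assumed contents: Cyrillic capitals mapped to the Latin letter they LOOK like.
    static readonly Dictionary<char, string> LatinType = new Dictionary<char, string>
    {
        ['\u0410'] = "A", ['\u0412'] = "B", ['\u0415'] = "E", ['\u0408'] = "J",
        ['\u041A'] = "K", ['\u041C'] = "M", ['\u041D'] = "H", ['\u041E'] = "O",
        ['\u0420'] = "P", ['\u0421'] = "C", ['\u0422'] = "T", ['\u0423'] = "Y",
        ['\u0425'] = "X",
    };

    // Same transliteration table as in the question; lowercase entries are derived.
    static readonly Dictionary<char, string> CyrillicType = BuildCyrillicType();

    static Dictionary<char, string> BuildCyrillicType()
    {
        var upper = new Dictionary<char, string>
        {
            ['А'] = "A", ['Б'] = "B", ['В'] = "V", ['Г'] = "G", ['Д'] = "D", ['Ђ'] = "Đ",
            ['Е'] = "E", ['Ж'] = "Ž", ['З'] = "Z", ['И'] = "I", ['Ј'] = "J", ['К'] = "K",
            ['Л'] = "L", ['Љ'] = "Lj", ['М'] = "M", ['Н'] = "N", ['Њ'] = "Nj", ['О'] = "O",
            ['П'] = "P", ['Р'] = "R", ['С'] = "S", ['Т'] = "T", ['Ћ'] = "Ć", ['У'] = "U",
            ['Ф'] = "F", ['Х'] = "H", ['Ц'] = "C", ['Ч'] = "Č", ['Џ'] = "Dž", ['Ш'] = "Š",
        };
        var all = new Dictionary<char, string>(upper);
        foreach (var pair in upper)
            all[char.ToLowerInvariant(pair.Key)] = pair.Value.ToLowerInvariant();
        return all;
    }

    static bool IsLatinLetter(char c) =>
        (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');

    // A word with at least one Latin letter is treated as Latin text with
    // Cyrillic look-alikes mixed in; any other word is transliterated.
    static string ConvertWord(string word)
    {
        var table = word.Any(IsLatinLetter) ? LatinType : CyrillicType;
        var sb = new StringBuilder();
        foreach (char c in word)
            sb.Append(table.TryGetValue(c, out var mapped) ? mapped : c.ToString());
        return sb.ToString();
    }

    public static string ToLatin(string text) =>
        string.Join(" ", text.Split(' ').Select(ConvertWord));

    public static void Main()
    {
        // Cyrillic Es + Latin I + Cyrillic Te + Cyrillic U, as in the exemption list.
        Console.WriteLine(ToLatin("\u0421I\u0422\u0423"));                        // CITY
        // A purely Cyrillic phrase is transliterated as before.
        Console.WriteLine(ToLatin("\u0423 \u0421\u041A\u041B\u0410\u0414\u0423")); // U SKLADU
    }
}
```

One limitation of this rule: a company name typed entirely with Cyrillic look-alike letters contains no Latin letter, so it is still transliterated. Lowercase look-alikes and punctuation attached to a word are also not handled in this sketch.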