How do I replace special characters with their equivalents (for example, "á" with "a") in C#?

Posted on 2024-08-24 07:00:38

I need to get Portuguese text content out of an Excel file and create an XML file that will be used by an application that doesn't support characters such as "ç", "á", "é", and others. I can't just remove those characters; I need to replace them with their equivalents ("c", "a", "e", for example).

I assume there's a better way to do this than checking each character individually and replacing it with its counterpart. Any suggestions on how to do it?

Comments (4)

不乱于心 2024-08-31 07:00:38

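You could try something like this:

// Requires: using System.Linq; using System.Text; using System.Globalization;
var decomposed = "áéö".Normalize(NormalizationForm.FormD);
// Drop the combining marks that FormD split off from the base letters.
var filtered = decomposed.Where(c => char.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark);
var newString = new string(filtered.ToArray());

This decomposes each accented character into a base letter plus combining marks, filters out the marks, and builds a new string from what remains. Combining diacritics belong to the NonSpacingMark Unicode category.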
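
The same steps can be wrapped in a reusable helper. This is only a sketch: the class name `DiacriticsRemover` and the method name `RemoveDiacritics` are made up for illustration and are not part of the answer above. It uses a `StringBuilder` instead of LINQ and recomposes the result with `FormC` at the end.

```csharp
using System;
using System.Globalization;
using System.Text;

public static class DiacriticsRemover
{
    // Hypothetical helper name; wraps the normalize-and-filter steps.
    public static string RemoveDiacritics(string input)
    {
        if (string.IsNullOrEmpty(input))
        {
            return input;
        }

        // FormD splits "á" into "a" followed by U+0301 (combining acute accent).
        string decomposed = input.Normalize(NormalizationForm.FormD);
        var builder = new StringBuilder(decomposed.Length);

        foreach (char c in decomposed)
        {
            // Keep everything except combining (non-spacing) marks.
            if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                builder.Append(c);
            }
        }

        // Recompose so any remaining sequences return to their usual form.
        return builder.ToString().Normalize(NormalizationForm.FormC);
    }
}
```

Calling `DiacriticsRemover.RemoveDiacritics("Olá, coração")` should give "Ola, coracao". Letters that have no canonical decomposition, such as "ø" or "ß", pass through unchanged, so they would still need explicit handling.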

晚风撩人 2024-08-31 07:00:38
// Requires: using System.Collections.Generic;
string text = ""; // placeholder: assign the text to replace characters in

Dictionary<char, char> replacements = new Dictionary<char, char>();

// add your characters to the replacements dictionary,
// key: char to replace
// value: replacement char

replacements.Add('ç', 'c');
// ... (add the remaining mappings the same way)

System.Text.StringBuilder replaced = new System.Text.StringBuilder();
foreach (char character in text)
{
    // Use the mapped character when there is one; otherwise keep the original.
    replaced.Append(replacements.TryGetValue(character, out char mapped) ? mapped : character);
}

// replaced.ToString() is now your converted text
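
A lookup table like this can also be combined with Unicode normalization, so the dictionary only has to list characters that normalization does not handle. The sketch below is one possible arrangement, not something from the answer above: the names `AsciiFolder`, `Fold`, and `Overrides` are hypothetical, and the two entries (the ordinal indicators "ª" and "º", which have no canonical decomposition) are only examples of what might go in the table.

```csharp
using System.Collections.Generic;
using System.Globalization;
using System.Text;

public static class AsciiFolder
{
    // Illustrative overrides only; extend to whatever the target application needs.
    private static readonly Dictionary<char, string> Overrides = new Dictionary<char, string>
    {
        { 'ª', "a" },
        { 'º', "o" },
    };

    public static string Fold(string input)
    {
        var result = new StringBuilder(input.Length);

        // Canonical decomposition separates accents from their base letters.
        foreach (char c in input.Normalize(NormalizationForm.FormD))
        {
            if (Overrides.TryGetValue(c, out string mapped))
            {
                // Explicit mapping for characters that do not decompose.
                result.Append(mapped);
            }
            else if (CharUnicodeInfo.GetUnicodeCategory(c) != UnicodeCategory.NonSpacingMark)
            {
                // Ordinary character: keep it. Combining marks are skipped.
                result.Append(c);
            }
        }

        return result.ToString();
    }
}
```

With these two entries, `AsciiFolder.Fold("1º andar, ação")` should return "1o andar, acao".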
残疾 2024-08-31 07:00:38

For future reference, this is exactly what I ended up with:

// temp and final are string variables declared elsewhere; stringToConvert is the input.
temp = stringToConvert.Normalize(NormalizationForm.FormD);
IEnumerable<char> filtered = temp;
filtered = filtered.Where(c => char.GetUnicodeCategory(c) != System.Globalization.UnicodeCategory.NonSpacingMark);
final = new string(filtered.ToArray());
2024-08-31 07:00:38

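Performance is better with this solution:

string test = "áéíóúç";

// Note: this pattern also strips digits and punctuation; inside [...] the '|' is a literal pipe, so pipes are kept.
string result = Regex.Replace(test.Normalize(NormalizationForm.FormD), "[^A-Za-z| ]", string.Empty);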
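
If digits and punctuation need to survive, a regex variant can target only the combining marks. The sketch below swaps the character-class pattern for `\p{Mn}`, which in .NET regular expressions matches the Unicode "Mark, nonspacing" category. The names `RegexDiacritics` and `Strip` are hypothetical, and no performance comparison is implied.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class RegexDiacritics
{
    // \p{Mn} matches any character in the Unicode "Mark, nonspacing" category.
    private static readonly Regex CombiningMarks = new Regex(@"\p{Mn}", RegexOptions.Compiled);

    public static string Strip(string input)
    {
        // Decompose first so that accents become separate combining characters,
        // then delete only those combining characters.
        return CombiningMarks.Replace(input.Normalize(NormalizationForm.FormD), string.Empty);
    }
}
```

For example, `RegexDiacritics.Strip("São Paulo, 2024")` should return "Sao Paulo, 2024", keeping the comma and the digits.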