将 Unicode 字符转换为 ASCII (.NET) 中最接近(最相似)的字符

发布于 2024-08-28 18:30:07 字数 339 浏览 7 评论 0原文

如何将不同的 Unicode 字符转换为其最接近的 ASCII 等效字符?喜欢 ä ->答:我用谷歌搜索但没有找到任何合适的解决方案。 Encoding.ASCII.GetBytes("Ä")[0] 这个技巧不起作用。 (结果是)。

我发现有一个类 Encoder 具有 Fallback 属性,该属性恰好适用于 char 无法转换的情况,但实现( EncoderReplacementFallback) 很愚蠢并转换为 ?

有什么想法吗?

How do I to convert different Unicode characters to their closest ASCII equivalents? Like Ä -> A. I googled but didn't find any suitable solution. The trick Encoding.ASCII.GetBytes("Ä")[0] didn't work. (Result was ?).

I found that there is a class Encoder that has a Fallback property that is exactly for cases when char can't be converted, but implementations (EncoderReplacementFallback) are stupid and convert to ?.

Any ideas?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(2

难理解 2024-09-04 18:30:07

如果只是删除变音符号,则前往这个答案

static string RemoveDiacritics(string stIn) {
  string stFormD = stIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  for(int ich = 0; ich < stFormD.Length; ich++) {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
    if(uc != UnicodeCategory.NonSpacingMark) {
      sb.Append(stFormD[ich]);
    }
  }

  return(sb.ToString().Normalize(NormalizationForm.FormC));
}

If it is just removing of the diacritical marks, then head to this answer:

static string RemoveDiacritics(string stIn) {
  string stFormD = stIn.Normalize(NormalizationForm.FormD);
  StringBuilder sb = new StringBuilder();

  for(int ich = 0; ich < stFormD.Length; ich++) {
    UnicodeCategory uc = CharUnicodeInfo.GetUnicodeCategory(stFormD[ich]);
    if(uc != UnicodeCategory.NonSpacingMark) {
      sb.Append(stFormD[ich]);
    }
  }

  return(sb.ToString().Normalize(NormalizationForm.FormC));
}
鹿童谣 2024-09-04 18:30:07

MS Dynamics 存在一个问题,它不允许 x20 到 x7f 之外的任何字符,并且该范围内的某些字符也是无效的。我的答案是创建一个以无效字符为键的数组,返回有效字符的最佳猜测。
它不太漂亮,但它有效。

Function PlainAscii(InText)
Dim i, c, a
Const cUTF7 = "^[\x20-\x7e]+$"
Const IgnoreCase = False
    PlainAscii = ""
    If InText = "" Then Exit Function
    If RegExTest(InText, cUTF7, IgnoreCase) Then
        PlainAscii = InText
    Else
        For i = 1 To Len(InText)
            c = Mid(InText, i, 1)
            a = Asc(c)
            If a = 10 Or a = 13 Or a = 9 Then
                ' Do Nothing - Allow LF, CR & TAB
            ElseIf a < 32 Then
                c = " "
            ElseIf a > 126 Then
                c = CvtToAscii(a)
            End If
            PlainAscii = PlainAscii & c
        Next
    End If
End Function

Function CvtToAscii(inChar)
' Maps The Characters With The 8th Bit Set To 7 Bit Characters
Dim arrChars
    arrChars = Array(" ", " ", "$", " ", ",", "f", """", " ", "t", "t", "^", "%", "S", "<", "O", " ", "Z", " ", " ", "'", "'", """", """", ".", "-", "-", "~", "T", "S", ">", "o", " ", "Z", "Y", " ", "!", "$", "$", "o", "$", "|", "S", " ", "c", " ", " ", " ", "_", "R", "_", ".", " ", " ", " ", " ", "u", "P", ".", ",", "i", " ", " ", " ", " ", " ", " ", "A", "A", "A", "A", "A", "A", "A", "C", "E", "E", "E", "E", "I", "I", "I", "I", "D", "N", "O", "O", "O", "O", "O", "X", "O", "U", "U", "U", "U", "Y", "b", "B", "a", "a", "a", "a", "a", "a", "a", "c", "e", "e", "e", "e", "i", "i", "i", "i", "o", "n", "o", "o", "o", "o", "o", "/", "O", "u", "u", "u", "u", "y", "p", "y")
    CvtToAscii = arrChars(inChar - 127)
End Function

Function RegExTest(ByVal strStringToSearch, strExpression, IgnoreCase)
Dim objRegEx
    On Error Resume Next
    Err.Clear
    strStringToSearch = Replace(Replace(strStringToSearch, vbCr, ""), vbLf, "")
    RegExTest = False
    Set objRegEx = New RegExp
    With objRegEx
        .Pattern = strExpression    '//the reg expression that should be searched for
        If Err.Number = 0 Then
            .IgnoreCase = CBool(IgnoreCase)    '//not case sensitive
            .Global = True              '//match all instances of pattern
            RegExTest = .Test(strStringToSearch)
        End If
    End With
    Set objRegEx = Nothing
    On Error Goto 0
End Function

你的答案必然会有所不同。

MS Dynamics has a problem where it won't allow for any character outside of x20 to x7f and some characters within that range are also invalid. My answer was to create an array keyed to the invalid characters returning the best guess of the valid characters.
It ain't pretty, but it works.

Function PlainAscii(InText)
Dim i, c, a
Const cUTF7 = "^[\x20-\x7e]+$"
Const IgnoreCase = False
    PlainAscii = ""
    If InText = "" Then Exit Function
    If RegExTest(InText, cUTF7, IgnoreCase) Then
        PlainAscii = InText
    Else
        For i = 1 To Len(InText)
            c = Mid(InText, i, 1)
            a = Asc(c)
            If a = 10 Or a = 13 Or a = 9 Then
                ' Do Nothing - Allow LF, CR & TAB
            ElseIf a < 32 Then
                c = " "
            ElseIf a > 126 Then
                c = CvtToAscii(a)
            End If
            PlainAscii = PlainAscii & c
        Next
    End If
End Function

Function CvtToAscii(inChar)
' Maps The Characters With The 8th Bit Set To 7 Bit Characters
Dim arrChars
    arrChars = Array(" ", " ", "$", " ", ",", "f", """", " ", "t", "t", "^", "%", "S", "<", "O", " ", "Z", " ", " ", "'", "'", """", """", ".", "-", "-", "~", "T", "S", ">", "o", " ", "Z", "Y", " ", "!", "$", "$", "o", "$", "|", "S", " ", "c", " ", " ", " ", "_", "R", "_", ".", " ", " ", " ", " ", "u", "P", ".", ",", "i", " ", " ", " ", " ", " ", " ", "A", "A", "A", "A", "A", "A", "A", "C", "E", "E", "E", "E", "I", "I", "I", "I", "D", "N", "O", "O", "O", "O", "O", "X", "O", "U", "U", "U", "U", "Y", "b", "B", "a", "a", "a", "a", "a", "a", "a", "c", "e", "e", "e", "e", "i", "i", "i", "i", "o", "n", "o", "o", "o", "o", "o", "/", "O", "u", "u", "u", "u", "y", "p", "y")
    CvtToAscii = arrChars(inChar - 127)
End Function

Function RegExTest(ByVal strStringToSearch, strExpression, IgnoreCase)
Dim objRegEx
    On Error Resume Next
    Err.Clear
    strStringToSearch = Replace(Replace(strStringToSearch, vbCr, ""), vbLf, "")
    RegExTest = False
    Set objRegEx = New RegExp
    With objRegEx
        .Pattern = strExpression    '//the reg expression that should be searched for
        If Err.Number = 0 Then
            .IgnoreCase = CBool(IgnoreCase)    '//not case sensitive
            .Global = True              '//match all instances of pattern
            RegExTest = .Test(strStringToSearch)
        End If
    End With
    Set objRegEx = Nothing
    On Error Goto 0
End Function

Your answer is necessarily going to be different.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文