Convert Unicode escape sequences to a string

Posted on 2024-10-06 16:05:42

Hi, I have this problem. From the server I get a JSON string containing Unicode escape sequences, and I need to convert these sequences to a Unicode string. I found some solutions, but none of them works for all JSON responses.

For example, this is a string I get from the server:

string encodedText="{\"DATA\":{\"idUser\":18167521,\"nick\":\"KecMessanger2\",\"photo\":\"1\",\"sex\":1,\"photoAlbums\":0,\"videoAlbums\":0,\"sefNick\":\"kecmessanger2\",\"profilPercent\":0,\"emphasis\":false,\"age\":25,\"isBlocked\":false,\"PHOTO\":{\"normal\":\"http://213.215.107.125/fotky/1816/75/n_18167521.jpg?v=1\",\"medium\":\"http://213.215.107.125/fotky/1816/75/m_18167521.jpg?v=1\",\"24x24\":\"http://213.215.107.125/fotky/1816/75/s_18167521.jpg?v=1\"},\"PLUS\":{\"active\":false,\"activeTo\":\"0000-00-00\"},\"LOCATION\":{\"idRegion\":\"1\",\"regionName\":\"Banskobystricku00fd kraj\",\"idCity\":\"109\",\"cityName\":\"Rimavsku00e1 Sobota\"},\"STATUS\":{\"isLoged\":true,\"isChating\":false,\"idChat\":0,\"roomName\":\"\",\"lastLogin\":1291898043},\"PROJECT_STATUS\":{\"photoAlbums\":0,\"photoAlbumsFavs\":0,\"videoAlbums\":0,\"videoAlbumsFavs\":0,\"videoAlbumsExts\":0,\"blogPosts\":0,\"emailNew\":0,\"postaNew\":0,\"clubInvitations\":0,\"dashboardItems\":26},\"STATUS_MESSAGE\":{\"statusMessage\":\"Nepru00edtomnu00fd.\",\"addTime\":\"1291887539\"},\"isFriend\":false,\"isIamFriend\":false}}"; 
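Side note, as a sketch: if the response that leaves the server still contains real escapes (\u00fd with its backslash), a JSON parser decodes them by itself and no regex is needed. This assumes .NET Core 3.0 or later for System.Text.Json (the question does not say which framework or JSON library is used); the class and method names are hypothetical.

```csharp
using System.Text.Json;

public static class JsonEscapeDemo
{
    // Hypothetical helper: reads DATA.LOCATION.regionName from a raw JSON response.
    // JsonDocument decodes \uXXXX escapes inside string values, but only if the
    // backslashes are still present in the text passed to this method.
    public static string ReadRegionName(string rawJson)
    {
        using JsonDocument doc = JsonDocument.Parse(rawJson);
        return doc.RootElement
            .GetProperty("DATA")
            .GetProperty("LOCATION")
            .GetProperty("regionName")
            .GetString() ?? string.Empty;
    }
}
```

In a C# literal the escape has to be written as \\u00fd so that the parser, not the compiler, sees it. With the value as it appears in encodedText (Banskobystricku00fd kraj, no backslash) the parser simply returns that text unchanged.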

statusMessage in the JSON string contains Nepru00edtomnu00fd; as a .NET Unicode string it is Neprítomný.

regionName in the JSON string contains Banskobystricku00fd; as a .NET Unicode string it is Banskobystrický.

Other examples:

  1. Nepru00edtomnu00fd -> Neprítomný
  2. Banskobystricku00fd -> Banskobystrický
  3. Trenu010du00edn -> Trenčín
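Each uXXXX in these examples is the hexadecimal UTF-16 code unit of one accented letter (00ed is í, 00fd is ý, 010d is č). As an illustration only, the sketch below goes the other way and escapes non-ASCII characters the way many JSON serializers do; the server's actual encoder is unknown, and EscapeCodes/EscapeNonAscii are hypothetical names.

```csharp
using System.Text;

public static class EscapeCodes
{
    // Hypothetical helper: writes every non-ASCII UTF-16 code unit as \u plus four
    // lowercase hex digits and leaves ASCII characters untouched.
    public static string EscapeNonAscii(string text)
    {
        var sb = new StringBuilder(text.Length);
        foreach (char c in text)
        {
            if (c < 0x80)
                sb.Append(c);
            else
                sb.Append("\\u").Append(((int)c).ToString("x4"));
        }
        return sb.ToString();
    }
}
```

EscapeNonAscii("Trenčín") gives Tren\u010d\u00edn; deleting the backslashes yields exactly Trenu010du00edn from the list, which suggests that only the leading backslash is missing from the data.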

I need to convert these Unicode escape sequences into .NET strings containing Slovak text.

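For the conversion I used this function:

// Requires: using System.Globalization; using System.Text.RegularExpressions;
private static string UnicodeStringToNET(string input)
{
    // Matches a backslash, then u or U, then exactly four hex digits (e.g. \u00fd),
    // and replaces the whole escape with the UTF-16 character it names.
    var regex = new Regex(@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase);
    return regex.Replace(input, match => ((char)int.Parse(match.Groups[1].Value,
        NumberStyles.HexNumber)).ToString());
}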

Where could the problem be?
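One way to narrow this down, as a sketch: run the question's own pattern on the text exactly as it appears in encodedText and on the same text with backslashes added. The method body mirrors UnicodeStringToNET; the class name EscapeDiagnostics is hypothetical.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class EscapeDiagnostics
{
    // Same pattern as in the question: a backslash, then u or U, then four hex digits.
    private static readonly Regex EscapedPattern =
        new Regex(@"\\[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase);

    // Same conversion as the question's UnicodeStringToNET.
    public static string UnicodeStringToNET(string input) =>
        EscapedPattern.Replace(input, match =>
            ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
}
```

UnicodeStringToNET("Nepru00edtomnu00fd") returns its input unchanged, while UnicodeStringToNET("Nepr\\u00edtomn\\u00fd") (C# literal, so the string contains real backslashes) returns Neprítomný. So the regex works; the text reaching it simply has no backslashes. Whether the server never sent them or something stripped them on the way is not clear from the question.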

Comments (2)

橘亓 2024-10-13 16:05:42

Here's a method (based on previous answers) that I wrote to do the job. It handles both \uhhhh and \Uhhhhhhhh, and it will preserve escaped unicode escapes (so if your string needs to contain a literal \uffff, you can do that). The temporary placeholder character \uf00b is in a private use area, so it shouldn't typically occur in Unicode strings.

    // Requires: using System.Globalization; using System.Text.RegularExpressions;
    public static string ParseUnicodeEscapes(string escapedString)
    {
        const string literalBackslashPlaceholder = "\uf00b";
        const string unicodeEscapeRegexString = @"(?:\\u([0-9a-fA-F]{4}))|(?:\\U([0-9a-fA-F]{8}))";
        // Replace escaped backslashes with something else so we don't
        // accidentally expand escaped unicode escapes.
        string workingString = escapedString.Replace("\\\\", literalBackslashPlaceholder);

        // Replace unicode escapes with actual unicode characters.
        // \uhhhh is a single UTF-16 code unit (possibly half of a surrogate pair).
        // \Uhhhhhhhh is a full code point: char.ConvertFromUtf32 emits a surrogate pair
        // above U+FFFF (a plain (char) cast would truncate it) and throws for invalid values.
        workingString = new Regex(unicodeEscapeRegexString).Replace(workingString,
            match => match.Groups[1].Success
                ? ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString()
                : char.ConvertFromUtf32(int.Parse(match.Groups[2].Value, NumberStyles.HexNumber)));

        // Replace the escaped backslash placeholders with non-escaped literal backslashes.
        workingString = workingString.Replace(literalBackslashPlaceholder, "\\");
        return workingString;
    }

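For the plain \uhhhh case the framework already has Regex.Unescape (System.Text.RegularExpressions), which also turns an escaped backslash into a single one, much like the method above. This is a swapped-in alternative, not part of the answer, and the wrapper name BuiltInUnescape is hypothetical. Like the method above, it still needs the backslashes, so on its own it does not fix the question's input.

```csharp
using System.Text.RegularExpressions;

public static class BuiltInUnescape
{
    // Regex.Unescape converts \uXXXX (exactly four hex digits) and turns \\ into \,
    // so an escaped escape such as \\u0041 survives as the literal text \u0041.
    // Caveats: it also interprets other regex escapes (\n, \t, \x41, ...), throws
    // ArgumentException on escapes it does not recognize, and has no \Uhhhhhhhh form.
    public static string Decode(string escaped) => Regex.Unescape(escaped);
}
```

Text without any backslash, such as Trenu010du00edn, comes back unchanged.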
丶视觉 2024-10-13 16:05:42

Your escape sequences do not start with a \ as in "\u00fd", so your regex should be only:

"[uU]([0-9A-F]{4})"

...
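A sketch of this suggestion, assuming the question's IgnoreCase option is kept. Because no backslash is required, the broad pattern also fires on ordinary text where a u is followed by four hex-looking letters (for example "uface"). The narrower default below is an added assumption, not part of the answer: it only accepts U+0080..U+017F (Latin-1 Supplement and Latin Extended-A), which covers the Slovak letters. BareEscapeDecoder, Decode and latinOnly are hypothetical names.

```csharp
using System.Globalization;
using System.Text.RegularExpressions;

public static class BareEscapeDecoder
{
    // The answer's pattern: u or U followed by four hex digits, no backslash needed.
    private static readonly Regex AnyBare =
        new Regex("[uU]([0-9A-F]{4})", RegexOptions.IgnoreCase);

    // Narrower heuristic (assumption): only code units U+0080..U+017F are ever escaped.
    private static readonly Regex LatinBare =
        new Regex("u(00[89a-f][0-9a-f]|01[0-7][0-9a-f])", RegexOptions.IgnoreCase);

    public static string Decode(string input, bool latinOnly = true)
    {
        Regex pattern = latinOnly ? LatinBare : AnyBare;
        // Replace "uXXXX" with the UTF-16 character whose code is XXXX.
        return pattern.Replace(input, match =>
            ((char)int.Parse(match.Groups[1].Value, NumberStyles.HexNumber)).ToString());
    }
}
```

Decode("Rimavsku00e1 Sobota") gives Rimavská Sobota, while Decode("uface", latinOnly: false) turns the whole word into the single character U+FACE. Both variants guess at information that was lost with the backslashes; if the raw server response does contain them, decoding it before they are stripped avoids the guesswork.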
