Restoring Unicode strings at runtime
I'm building an application that receives strings over TCP at runtime, and those strings contain escaped Unicode. An example string would be "\u7cfb\u8eca\u4e21\uff1a\u6771\u5317 ...". I have the following, but it does not work: the \u escape is only resolved at compile time, and the compiler rejects it with "incomplete universal character name \u" because it expects 4 hexadecimal characters after \u in the string literal.
QString restoreUnicode(QString strText)
{
    QRegExp rx("\\\\u([0-9a-z]){4}");
    return strText.replace(rx, QString::fromUtf8("\u\\1"));
}
I'm looking for a solution that works at runtime. I can foresee breaking these strings up, converting the hexadecimal digits after each "\u" delimiter into an integer, and passing that to the QChar constructor. I'd prefer a better way if one exists, though, as I'm concerned about the time complexity of that approach and I'm not an expert.
Does anyone have any solutions or tips?
Comments (3)
You should decode the string yourself. Find each Unicode escape (rx.indexIn(strText)), parse its hex digits (int result; std::istringstream iss(s); if (!(iss >> std::hex >> result).fail()) ...), and replace the original \\uXXXX text with (wchar_t)result.
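As a rough illustration of decoding the escapes by hand, here is a minimal sketch in standard C++ rather than Qt. It is not the code from any of the answers in this thread: it scans the string once instead of using a regular expression, the helper names hexValue and restoreUnicode are made up for this example, and it assumes every escape is exactly a backslash, 'u', and four hex digits naming one UTF-16 code unit.

```cpp
#include <cstddef>
#include <string>

// Returns the value of one hex digit, or -1 if c is not a hex digit.
static int hexValue(char16_t c) {
    if (c >= u'0' && c <= u'9') return c - u'0';
    if (c >= u'a' && c <= u'f') return c - u'a' + 10;
    if (c >= u'A' && c <= u'F') return c - u'A' + 10;
    return -1;
}

// Hypothetical helper: replaces each literal "\uXXXX" in the input with the
// UTF-16 code unit it names. Malformed or truncated escapes are copied
// through unchanged.
std::u16string restoreUnicode(const std::u16string& in) {
    std::u16string out;
    out.reserve(in.size());  // the result is never longer than the input
    std::size_t i = 0;
    while (i < in.size()) {
        // An escape needs 6 characters: '\', 'u', and four hex digits.
        if (in[i] == u'\\' && i + 6 <= in.size() && in[i + 1] == u'u') {
            int value = 0;
            bool ok = true;
            for (std::size_t k = i + 2; k < i + 6; ++k) {
                const int digit = hexValue(in[k]);
                if (digit < 0) { ok = false; break; }
                value = value * 16 + digit;
            }
            if (ok) {
                out.push_back(static_cast<char16_t>(value));
                i += 6;
                continue;
            }
        }
        out.push_back(in[i]);  // ordinary character, copy as-is
        ++i;
    }
    return out;
}
```

Each input character is visited a constant number of times, so the cost grows linearly with the length of the string. In a Qt program the conversion to and from QString could probably go through QString::toStdU16String and QString::fromStdU16String; treat that as an assumption to check against your Qt version.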
For closure, and for anyone who comes across this thread in the future, here is my initial solution, before optimising the scope of these variables. I'm not a fan of it, but it works given the unpredictable mix of Unicode and ASCII in the stream, which I have no control over (I only write the client). Although Unicode rarely appears, it is better to handle it than to show an ugly \u1234 and so on.