Special characters in Android SMS
I've observed this issue for years now, not knowing where it came from. I am concerned that this bug is still observable in the new versions of Android, in 2011, and I hope you can finally help me to fully understand it, if not solve it.
Let's consider the following (real) situation. Mister A is using a custom SMS/MMS app from Sony on his Xperia Arc (official 2.3.3). Mister B is using the stock Android SMS/MMS app on his Milestone (Cyanogen 6.12, unofficial 2.2). Both of them use Android in French (if that matters).
When A sends B an SMS containing special characters like "ç" or "ê", B receives a message in which these characters are replaced by a space. Characters like "é" work fine, though.
When B sends the same SMS to A, everything works fine.
When A sends this SMS to himself, everything works fine.
Conclusion: this is not the mobile provider's fault, since it works in one direction and not the other.
So I guessed at first that something was wrong with A's custom app. I replaced it with the APK from B's phone, and everything remained the same. I decompiled the app but could not find where the encoding of the SMS string is done. I concluded that the bug does not come from the app, but from the way Android encodes the strings...
I ran another test:
I wrote an SMS with only standard characters, something like 250 characters in 1.5 SMS. Then I appended a "ç" to it.
On A's phone: the counter says it consumed 10 characters.
On B's phone: the counter says the text now takes 3 SMS: the string size doubled!
Conclusion:
On A's phone, the default charset includes "ç".
On B's phone, when "ç" appears, the charset changes and each character then needs twice the original space.
(Or am I missing something?)
Questions:
Why don't different versions of Android use the same default charset?
On Android, do these default charsets depend on the ROM, for example?
Can we configure or change these charsets somewhere (in the menus, or directly on a rooted phone)?
Is there another easy way to fix this?
Any help, explanation or experience is welcome :)
Comments (2)
You are suffering from an encoding problem. From the description, it looks like A is sending data in one charset without including information about which charset it is. The root cause is that, to pass extended (non-ASCII) characters between two systems, both have to agree on the encoding to use. If you are restricted to 8-bit values, the systems have to agree on the same code page. SMS has a special GSM 7-bit alphabet (plus an 8-bit data mode), or UTF-16 can be used, which takes 2 bytes per character. What you see when you enter 250 characters followed by a single extended character shows what is happening in the application. An SMS message is restricted to 140 octets, which is 160 characters when packed in the GSM 7-bit alphabet, so your 250 characters fit into 2 messages. Once you added the "ç", however, the app switched to UTF-16, so suddenly every character takes 2 octets and only 70 characters fit into a message. The same text now needs about 3.5 messages' worth of space.
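The counting behaviour described above can be sketched in a few lines. This is only an illustration, not Android's actual implementation: the helper names `septets` and `segments` are made up for this example, and national language shift tables are ignored. The alphabet tables follow the GSM 7-bit default alphabet and its extension table, and the per-segment limits of 153 and 67 characters are the usual values for concatenated messages once a 6-octet concatenation header is reserved in each part, which is why the count below comes out as whole segments rather than the rough figure of 3.5.

```python
# GSM 7-bit default alphabet (basic table, escape character omitted).
GSM7_BASIC = (
    "@£$¥èéùìòÇ\nØø\rÅåΔ_ΦΓΛΩΠΨΣΘΞÆæßÉ"
    " !\"#¤%&'()*+,-./0123456789:;<=>?"
    "¡ABCDEFGHIJKLMNOPQRSTUVWXYZÄÖÑÜ§"
    "¿abcdefghijklmnopqrstuvwxyzäöñüà"
)
# Extension table: each of these costs two septets (escape + character).
GSM7_EXTENDED = "^{}\\[~]|€\f"


def septets(text):
    """Return the septet count in the GSM 7-bit alphabet, or None if
    some character cannot be represented in it."""
    total = 0
    for ch in text:
        if ch in GSM7_BASIC:
            total += 1
        elif ch in GSM7_EXTENDED:
            total += 2
        else:
            return None
    return total


def segments(text):
    """Return (encoding name, number of SMS segments) for the text."""
    n = septets(text)
    if n is not None:
        name, single, multi = "GSM-7", 160, 153
    else:
        # Fall back to 16-bit code units, 2 octets each.
        n = len(text.encode("utf-16-be")) // 2
        name, single, multi = "UCS-2", 70, 67
    if n <= single:
        return name, 1
    return name, -(-n // multi)  # ceiling division


if __name__ == "__main__":
    base = "a" * 250
    for sample in (base, base + "é", base + "ç"):
        print(repr(sample[-1]), segments(sample))
```

Under these assumptions, "é" is in the 7-bit alphabet while "ç" and "ê" are not (only the capital "Ç" is), which matches the characters that do and do not survive in the question. 250 plain characters need 2 segments, and appending "ç" raises that to 4.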
On Android, the decoding of SMS messages is part of the framework telephony code in SmsCbMessage.java. It works out the language code and encoding of the message body. If this is incorrect (the message was encoded with an English code page but uses French extended characters), you can get odd characters appearing.
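A small illustration of how a wrong guess at the encoding produces odd characters. The two codecs used here (Latin-1 and CP437) are stand-ins chosen only because Python ships them; they are an assumption for the demo and are not the encodings Android or the GSM network uses.

```python
# Sender encodes French text with one code page...
text = "ça, c'est prêt"
raw = text.encode("latin-1")

# ...and the receiver decodes the same bytes with the right and the wrong one.
right = raw.decode("latin-1")
wrong = raw.decode("cp437")

if __name__ == "__main__":
    print("sent:     ", text)
    print("right:    ", right)
    print("wrong:    ", wrong)
```

The bytes on the wire are identical in both cases. Only the receiver's assumption about the code page differs, and that alone is enough to corrupt the accented characters while plain ASCII letters come through intact.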
You are right that the mobile network is not at fault. I suspect phone A's messaging application, although it is possible that Android is failing to correctly identify the encoding of a valid SMS. I wonder how it works between A and an iPhone or another manufacturer's device.
I ran into the same problem when I had to show a few special characters in an SMS Unicode app. My approach was to take the string to be sent, loop over it character by character, look up each character's numeric (ASCII) code, and join those integer values with a delimiter. The resulting string can be sent as an SMS. On the receiving side, split it on the same delimiter, convert each code back to its (language-specific) character, and append the converted characters to rebuild the string. The result is identical to the text that was sent.
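A minimal sketch of that scheme, assuming a comma as the delimiter and Unicode code points as the "codes". The function names `encode_for_sms` and `decode_from_sms` are hypothetical and only serve this example.

```python
DELIMITER = ","  # assumed delimiter; any character that is not a digit works


def encode_for_sms(text, delimiter=DELIMITER):
    """Turn each character into its numeric code, joined by the delimiter."""
    return delimiter.join(str(ord(ch)) for ch in text)


def decode_from_sms(payload, delimiter=DELIMITER):
    """Rebuild the original text from a delimiter-separated list of codes."""
    if not payload:
        return ""
    return "".join(chr(int(code)) for code in payload.split(delimiter))


if __name__ == "__main__":
    payload = encode_for_sms("ça va, prêt")
    print(payload)
    print(decode_from_sms(payload))
```

The payload contains only digits and the delimiter, so it survives any SMS alphabet. The trade-off is that it is several times longer than the original text, and both sender and receiver need an app that knows the scheme.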
Regards