I'm using a legacy binary message format that requires a character sequence in ASCII-6 (6-bit ASCII) encoding. I couldn't find a definition of ASCII-6, but the spec defines the character mappings starting with A=0x01, B=0x02, etc.
Is there an existing character set in Java for ASCII-6? If not, can you create or define your own character set somehow? And if that isn't possible, is there a better solution than building a map from characters to their ASCII-6 encoded values?
Comments (2)
I'm not sure whether any 6-bit encoding exists where A is 0x01, B is 0x02, etc., but characters in most six-bit encodings can be converted to and from 7-bit ASCII characters with integer arithmetic. For example, characters in the DEC SIXBIT encoding can be converted to ASCII by adding 32 (decimal), and back again by subtracting 32, because SIXBIT carries only printable characters from the ASCII character set.

Implementing support for such a transformation between bytes and characters will require you to write a Charset and register it using a CharsetProvider. The tricky part is mapping sequences of 6 bits to Unicode characters, because the byte is the most fundamental unit that charset encoders and decoders operate on. If each 6-bit encoded character occupies its own 8-bit byte, the arithmetic above is easy. Otherwise, when characters are packed and straddle byte boundaries, you will need to track whether the encoder/decoder is in an invalid state.

You can define your own character encoding by writing a class that extends CharsetProvider and making it available to your application. For instance, JCharset does this for some lesser-used encodings. As far as I can see, even JCharset doesn't support old ASCII variants, but you can see how it's done by studying that implementation. It's not particularly hard, just somewhat tedious.
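A rough sketch of what that could look like, under some loud assumptions: the charset name X-ASCII6-DEMO is invented, only A..Z are mapped (A=0x01 through Z=0x1A, extrapolated from the question), each character occupies one whole byte rather than being packed, and 0x00 is an arbitrary replacement byte. This is not how JCharset does it, just the bare `java.nio.charset` plumbing.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.nio.charset.spi.CharsetProvider;
import java.util.Collections;
import java.util.Iterator;

public final class Ascii6Charset extends Charset {
    static final String NAME = "X-ASCII6-DEMO"; // hypothetical name

    public Ascii6Charset() {
        super(NAME, null);
    }

    @Override
    public boolean contains(Charset cs) {
        return cs instanceof Ascii6Charset;
    }

    @Override
    public CharsetDecoder newDecoder() {
        return new CharsetDecoder(this, 1.0f, 1.0f) {
            @Override
            protected CoderResult decodeLoop(ByteBuffer in, ByteBuffer unused) {
                throw new UnsupportedOperationException();
            }

            @Override
            protected CoderResult decodeLoop(ByteBuffer in, CharBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    int code = in.get(in.position()) & 0xFF; // peek, do not consume yet
                    if (code < 1 || code > 26) return CoderResult.malformedForLength(1);
                    out.put((char) ('A' + code - 1));
                    in.position(in.position() + 1);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    @Override
    public CharsetEncoder newEncoder() {
        // ASSUMPTION: 0x00 stands in for characters that cannot be encoded.
        return new CharsetEncoder(this, 1.0f, 1.0f, new byte[] {0x00}) {
            @Override
            public boolean isLegalReplacement(byte[] repl) {
                return true; // demo shortcut: 0x00 does not decode, so skip the check
            }

            @Override
            protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
                while (in.hasRemaining()) {
                    if (!out.hasRemaining()) return CoderResult.OVERFLOW;
                    char c = in.get(in.position()); // peek, do not consume yet
                    if (c < 'A' || c > 'Z') return CoderResult.unmappableForLength(1);
                    out.put((byte) (c - 'A' + 1));
                    in.position(in.position() + 1);
                }
                return CoderResult.UNDERFLOW;
            }
        };
    }

    // Lets Charset.forName(NAME) find the charset once the provider is registered.
    public static final class Provider extends CharsetProvider {
        private final Charset charset = new Ascii6Charset();

        @Override
        public Iterator<Charset> charsets() {
            return Collections.singletonList(charset).iterator();
        }

        @Override
        public Charset charsetForName(String name) {
            return NAME.equalsIgnoreCase(name) ? charset : null;
        }
    }
}
```

The instance can be used directly, e.g. `"ABC".getBytes(new Ascii6Charset())`. For lookup by name, the provider's class name would be listed in a `META-INF/services/java.nio.charset.spi.CharsetProvider` file on the classpath. A packed 6-bit variant would need encoder and decoder state for characters that cross byte boundaries, which this sketch leaves out.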