How to map bytes 128 to 255 to their equivalent UTF-16LE surrogate pairs

Posted 2024-12-21 22:04:24

I'm trying to achieve this:

I have a PDF as a byte[] in a Java web service, and I must send it as a base64 string to a .NET client, which reconstructs the file like this:

Encoding.Convert(Encoding.Unicode, Encoding.Default, Convert.FromBase64String(inputJava))

I cannot change the client code. Right now the Java web service calls another .NET web service, which does the following before the byte[] is turned into a base64 string:

System.Text.Encoding.Convert(System.Text.Encoding.GetEncoding(1252), System.Text.Encoding.Unicode, b);

Besides the base64 encoding, which I can do in various ways (e.g. with org.apache.commons.codec.binary.Base64), I have to turn the original byte[] into a UTF-16LE byte[].

I tried this:

byte[] output = new byte[b.length * 2];
for(int i=0; i < b.length; i++) 
{
  int val = b[i];
  if(val < 0) val += 256;

  output[2*i + 0] = (byte) (val);   
  output[2*i + 1] = 0; 
}

This works fine for values below 128 (e.g. 1 => 0100, 2 => 0200, ..., 127 => 7F00), but for values from 128 to 255 I don't know how to get the equivalent 2-byte values. I know that for byte 156 (9C) the corresponding value is 8301 (0x5301) and for byte 224 (E0) the corresponding value is 12501 (0x7D01), but I haven't managed to find an algorithm that produces all the other values.

Is there a mapping table between byte values and the corresponding UTF-16LE surrogate pairs, or an algorithm to map the values from 128 to 255?

Thanks in advance!

鱼窥荷 2024-12-28 22:04:24

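You don't need surrogate pairs; they are a construct for handling characters outside the Basic Multilingual Plane (BMP), and all windows-1252 characters are in the BMP.

The official windows-1252 (alias cp1252) to Unicode mapping table is
http://unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP1252.TXT
It is a plain-text file in an easy-to-process format, so if you can't find an existing tool for the conversion, writing a mapping based on that file should be fairly straightforward.

The file is referenced indirectly from the official IANA registry:
http://www.iana.org/assignments/character-sets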
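
As a rough illustration of how that mapping file could be turned into a lookup table, here is a minimal Java sketch. The class name `Cp1252Table` and the methods `parse` and `toUtf16Le` are hypothetical. The sketch assumes the file's whitespace-separated `0xBYTE 0xCODEPOINT #NAME` layout, and it assumes that bytes the file lists as undefined should map to themselves, which is a choice rather than something the file prescribes.

```java
class Cp1252Table {
    /** Builds a byte -> code point table from text in the CP1252.TXT layout. */
    static int[] parse(String mappingText) {
        int[] table = new int[256];
        for (int i = 0; i < 256; i++) {
            table[i] = i; // assumption: anything not listed maps to itself
        }
        for (String line : mappingText.split("\n")) {
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // drop the trailing comment
            }
            String[] cols = line.trim().split("\\s+");
            if (cols.length < 2) {
                continue; // blank line, comment-only line, or undefined byte
            }
            table[Integer.decode(cols[0])] = Integer.decode(cols[1]);
        }
        return table;
    }

    /** Encodes each input byte as one little-endian 16-bit code unit. */
    static byte[] toUtf16Le(byte[] input, int[] table) {
        byte[] out = new byte[input.length * 2];
        for (int i = 0; i < input.length; i++) {
            int code = table[input[i] & 0xFF];
            out[2 * i] = (byte) (code & 0xFF);   // low byte first
            out[2 * i + 1] = (byte) (code >> 8); // then high byte
        }
        return out;
    }
}
```

In practice the text passed to `parse` would be the downloaded contents of CP1252.TXT; every entry fits in a single 16-bit code unit, which is why no surrogate handling appears anywhere.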

非要怀念 2024-12-28 22:04:24

byte[] encoded = new String(b, "windows-1252").getBytes("UTF-16LE");
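
One caveat worth checking with the one-liner above: windows-1252 leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, and as far as I know the JDK decodes them to U+FFFD, so they may not survive the round trip through the client. Binary data such as a PDF can contain those bytes. The sketch below builds a 256-entry table from the JDK's own windows-1252 charset and, as an assumption matching the identity fallback used by the table-based answer in this thread, passes undefined bytes through unchanged. The class name `Cp1252ToUtf16Le` and the method `convert` are hypothetical.

```java
import java.nio.charset.Charset;

class Cp1252ToUtf16Le {
    private static final char[] TABLE = new char[256];

    static {
        Charset cp1252 = Charset.forName("windows-1252");
        for (int b = 0; b < 256; b++) {
            char c = new String(new byte[] {(byte) b}, cp1252).charAt(0);
            // Assumption: a byte the JDK cannot decode (U+FFFD) keeps its own value.
            TABLE[b] = (c == '\uFFFD') ? (char) b : c;
        }
    }

    /** Converts windows-1252 bytes to UTF-16LE, two output bytes per input byte. */
    static byte[] convert(byte[] input) {
        byte[] out = new byte[input.length * 2];
        for (int i = 0; i < input.length; i++) {
            char c = TABLE[input[i] & 0xFF];
            out[2 * i] = (byte) c;            // low byte first
            out[2 * i + 1] = (byte) (c >> 8); // then high byte
        }
        return out;
    }
}
```

For example, the input bytes 41 9C E0 81 come out as 41 00 53 01 E0 00 81 00. Whether the identity fallback matches what the .NET side produces should be verified against real output from that service.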


夏夜暖风 2024-12-28 22:04:24

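I finally found a solution. It looks like only bytes from 128 to 159 need a 2-byte value that differs from the byte itself (strictly speaking these are ordinary BMP code points, not surrogate pairs). I use this piece of code to emulate the .NET Unicode encoding: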

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Base64;
import java.util.HashMap;
import java.util.Map;

public class Encoder {
    // windows-1252 bytes in 128-159 whose Unicode code point differs from the byte value.
    // Bytes missing from the map (129, 141, 143, 144, 157 and everything outside 128-159)
    // map to themselves.
    static final Map<Integer, Integer> mapTiny = new HashMap<Integer, Integer>();

    static {
        mapTiny.put(128, 8364);
        mapTiny.put(130, 8218);
        mapTiny.put(131, 402);
        mapTiny.put(132, 8222);
        mapTiny.put(133, 8230);
        mapTiny.put(134, 8224);
        mapTiny.put(135, 8225);
        mapTiny.put(136, 710);
        mapTiny.put(137, 8240);
        mapTiny.put(138, 352);
        mapTiny.put(139, 8249);
        mapTiny.put(140, 338);
        mapTiny.put(142, 381);
        mapTiny.put(145, 8216);
        mapTiny.put(146, 8217);
        mapTiny.put(147, 8220);
        mapTiny.put(148, 8221);
        mapTiny.put(149, 8226);
        mapTiny.put(150, 8211);
        mapTiny.put(151, 8212);
        mapTiny.put(152, 732);
        mapTiny.put(153, 8482);
        mapTiny.put(154, 353);
        mapTiny.put(155, 8250);
        mapTiny.put(156, 339);
        mapTiny.put(158, 382);
        mapTiny.put(159, 376);
    }

    public static String encode(byte[] b) throws IOException {
        ByteArrayOutputStream convFileByteArray = new ByteArrayOutputStream();
        for (byte value : b) {
            int unsigned = value & 0xff; // Java bytes are signed
            int code = mapTiny.getOrDefault(unsigned, unsigned);
            // UTF-16LE: low byte first, then high byte
            convFileByteArray.write(new byte[] { (byte) (code & 0xff), (byte) ((code >> 8) & 0xff) });
        }
        // The original post called Base64.encodeToString(bytes, false) from an unnamed
        // library; java.util.Base64 (no line breaks) is substituted here.
        return Base64.getEncoder().encodeToString(convFileByteArray.toByteArray());
    }
}
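
Because the client converts the UTF-16LE bytes back to a single-byte encoding, a table like the one above can only round-trip if no two bytes share a code point. A small sanity check is sketched here with hypothetical names (`RoundTripCheck`, `forward`, `isInvertible`). It uses the same values as the table above for bytes 128 to 159, with 0 standing for "leave the byte unchanged".

```java
import java.util.HashMap;
import java.util.Map;

class RoundTripCheck {
    // Code points for bytes 0x80..0x9F; 0 marks a byte that maps to itself.
    static final int[] HIGH = {
        8364, 0, 8218, 402, 8222, 8230, 8224, 8225, 710, 8240, 352, 8249, 338, 0, 381, 0,
        0, 8216, 8217, 8220, 8221, 8226, 8211, 8212, 732, 8482, 353, 8250, 339, 0, 382, 376
    };

    /** Maps one unsigned byte value (0..255) to its 16-bit code unit. */
    static int forward(int b) {
        boolean remapped = b >= 0x80 && b <= 0x9F && HIGH[b - 0x80] != 0;
        return remapped ? HIGH[b - 0x80] : b;
    }

    /** Returns true if all 256 byte values map to distinct code units. */
    static boolean isInvertible() {
        Map<Integer, Integer> reverse = new HashMap<>();
        for (int b = 0; b < 256; b++) {
            if (reverse.put(forward(b), b) != null) {
                return false; // two bytes collided on the same code unit
            }
        }
        return true;
    }
}
```

Whether the client really inverts this mapping also depends on Encoding.Default being code page 1252 on the client machine, which the question does not state.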
