What codepage/charset should be used to interpret data coming from an MVS system into a Java environment?

Posted on 2024-07-19 08:41:21

I've run into an interesting problem (as is often the case when interacting with legacy systems). I'm working on an application (which currently runs on an x86 Linux or Windows system) that can receive requests from a variety of systems, one of them being an MVS system.

I am attempting to determine which codepage/charset I should be using to interpret request data coming from the MVS system.

In the past, I've used 'cp500' (IBM-500) to interpret byte data coming from z/OS systems. However, I fear that since MVS is a bit of a legacy system, and since IBM seems to have changed its mind repeatedly about which encoding to use (there must be tens of EBCDIC encodings), cp500 may not be the correct encoding.

The best resource I've found on character sets in Java is: http://mindprod.com/jgloss/encoding . However, from this site and the IBM Infocenters, I have not been able to get a clear answer.

EDIT: Added from my response to Pax below:

There was a glaring hole in my question regarding the origin of the request data. In this case, the data arrives through a WebSphere MQ interface. WebSphere MQ does have facilities for translating to the proper encoding, but only when the data is read using MQMessage.readString(), which has since been deprecated. I would prefer to use this; however, I am using a proprietary interface framework in which I can't change how the message is read off the MQQueue. The framework reads bytes directly off the queue, so I am left to handle the translation myself.

Final Answer: I wanted to follow up on this. It turns out the correct character set was indeed cp500 (IBM-500). However, I'm under the impression that results may vary. Some tips for anyone else with the same issue:

Use Charset.availableCharsets(). This gives you a map of the character sets supported by your runtime. I iterated through these sets and printed out my byte data decoded with each one. While it didn't give me the answer I wanted (mainly because I wasn't able to read the data as it was coming in), I imagine it could be helpful for others.
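For anyone who wants to try the same thing, here is a rough sketch of that loop. The class name CharsetProbe, the helper decodeWithAll, and the sample bytes are made up for illustration; the sample is what "HELLO" would look like in the common EBCDIC Latin-1 code pages. Only Charset.availableCharsets() and the String(byte[], Charset) constructor come from the JDK.

```java
import java.nio.charset.Charset;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper class, not part of any framework mentioned above.
final class CharsetProbe {

    // Decode the same bytes with every charset the running JVM offers.
    static Map<String, String> decodeWithAll(byte[] data) {
        Map<String, String> results = new LinkedHashMap<>();
        for (Map.Entry<String, Charset> entry : Charset.availableCharsets().entrySet()) {
            try {
                results.put(entry.getKey(), new String(data, entry.getValue()));
            } catch (RuntimeException ex) {
                // Defensive: record the failure instead of aborting the whole scan.
                results.put(entry.getKey(), "<decode failed: " + ex + ">");
            }
        }
        return results;
    }

    public static void main(String[] args) {
        // Assumed sample: the bytes for "HELLO" in EBCDIC (A-I = 0xC1-0xC9, J-R = 0xD1-0xD9).
        byte[] sample = { (byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6 };
        decodeWithAll(sample).forEach((name, text) -> System.out.println(name + " -> " + text));
    }
}
```

This only helps if you know (or can guess) what some of the text should say, so that you can spot the charset whose output is readable.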

Refer to: http://mindprod.com/jgloss/encoding for a list of supported char sets.

Lastly, though I have not confirmed this, make sure you are using the right JRE. I suspect that the IBM runtimes support more EBCDIC character sets than OpenJDK or Sun's runtimes.
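Rather than guessing, you can ask the runtime directly. The sketch below uses Charset.isSupported() to check a few EBCDIC charset names. The class name EbcdicSupportCheck and the particular list of names are my own choices for illustration. Whether a given name is available depends on the JRE and on which modules it ships with, so treat the output as specific to your runtime.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for checking charset availability in the current runtime.
final class EbcdicSupportCheck {

    // Return the canonical names of those candidates that this JVM supports.
    static List<String> supported(String... names) {
        List<String> found = new ArrayList<>();
        for (String name : names) {
            if (Charset.isSupported(name)) {
                found.add(Charset.forName(name).name());
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // Candidate names are assumptions; adjust the list to the code pages you care about.
        System.out.println(supported("IBM037", "IBM500", "IBM1047", "IBM01140"));
    }
}
```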

Comments (1)

我爱人 2024-07-26 08:41:21

"MVS is a bit of a legacy system"? Ha! It's still the OS of choice for applications where reliability is the number one concern. Now on to your question :-)

It depends entirely on what is generating the data. For example, if you're just downloading files from the host, the FTP negotiation may handle it. But since you mention Java, it's probably connecting via JDBC to DB2/z, and the JDBC drivers will handle it quite well (much better if you're using IBM's own JRE rather than the Sun version).

EBCDIC itself on the host has quite a few different encodings, so you need to at least let us know where the data is coming from. Recent versions of DB2 have no issue with storing Unicode in the database, which would alleviate all your concerns.

First task, find out where the data is coming from and get the encoding from your SysProg (if it's not automatically handled).

Update:

Andrew, based on your added text where you state you can't use the provided translations, you're going to have to use the manual method. You need to identify the source of the data and get the CCSID from that. Then do the translation to and from Unicode (or whatever code page you're using if not Unicode) manually.
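On the Java side, the manual route could look roughly like the sketch below once you know the CCSID. The class CcsidDecoder and its lookup table are hypothetical and cover only a few CCSIDs; the Java charset names in it (IBM037, IBM500, IBM1047) are the ones I'd expect a full JDK to provide, but check your own runtime.

```java
import java.nio.charset.Charset;
import java.util.Map;

// Hypothetical helper: maps a CCSID reported by the data source to a Java charset.
final class CcsidDecoder {

    // Illustrative table only; extend it with whatever CCSIDs your SysProg reports.
    private static final Map<Integer, String> CCSID_TO_JAVA = Map.of(
            37, "IBM037",
            500, "IBM500",
            1047, "IBM1047",
            1208, "UTF-8");

    // Decode raw bytes into a Java String (UTF-16 internally) using the given CCSID.
    static String decode(byte[] data, int ccsid) {
        String name = CCSID_TO_JAVA.get(ccsid);
        if (name == null || !Charset.isSupported(name)) {
            throw new IllegalArgumentException("No usable charset for CCSID " + ccsid);
        }
        return new String(data, Charset.forName(name));
    }

    public static void main(String[] args) {
        // Assumed sample: "HELLO" as EBCDIC bytes.
        byte[] sample = { (byte) 0xC8, (byte) 0xC5, (byte) 0xD3, (byte) 0xD3, (byte) 0xD6 };
        System.out.println(decode(sample, 500));
    }
}
```

Failing loudly on an unknown CCSID is deliberate: silently falling back to a default code page is how mis-decoded punctuation slips through.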

CCSID 500 is the default code page for EBCDIC International (no Euro) but these machines are used all over the planet. z/OS conversion services is how you usually do the conversion on the mainframe.

Although this is an iSeries page, it lists a huge number of CCSIDs and their glyphs, applicable to the mainframe as well.

You probably just need to figure out whether you're using CCSID 500 or 37 (or one of the foreign language versions) and work out the mapping to Unicode CCSID 1208 (UTF-8). Your SysProg will be able to tell you which one. If you're working for a US company, it's probably 500 or 37, but IBM expends a great deal of effort supporting multiple code pages. I'll be glad when all their mainframe software stores and uses Unicode by default; it'll make things much easier.
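If you can't get an answer from the SysProg right away, one way to narrow it down is to list the byte values where the two code pages disagree and then look for those bytes in your data. As far as I know, 500 and 37 agree on letters and digits and differ only in a handful of punctuation characters, so plain alphanumeric text will decode the same under both. The sketch below is an illustration under that assumption; the class EbcdicToUtf8 and its method names are made up, and it assumes the JVM supports the IBM500 and IBM037 charsets.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for comparing two single-byte code pages and re-encoding to UTF-8.
final class EbcdicToUtf8 {

    // Decode with the named source charset, then re-encode as UTF-8 (CCSID 1208).
    static byte[] toUtf8(byte[] source, String charsetName) {
        String text = new String(source, Charset.forName(charsetName));
        return text.getBytes(StandardCharsets.UTF_8);
    }

    // List every byte value (0-255) that decodes differently under the two charsets.
    static List<Integer> differingBytes(String first, String second) {
        Charset a = Charset.forName(first);
        Charset b = Charset.forName(second);
        List<Integer> diffs = new ArrayList<>();
        for (int i = 0; i < 256; i++) {
            byte[] one = { (byte) i };
            if (!new String(one, a).equals(new String(one, b))) {
                diffs.add(i);
            }
        }
        return diffs;
    }

    public static void main(String[] args) {
        Charset cp500 = Charset.forName("IBM500");
        Charset cp037 = Charset.forName("IBM037");
        for (int value : differingBytes("IBM500", "IBM037")) {
            byte[] one = { (byte) value };
            System.out.println(String.format("0x%02X", value)
                    + "  IBM500=" + new String(one, cp500)
                    + "  IBM037=" + new String(one, cp037));
        }
    }
}
```

If none of the listed byte values ever show up in your messages, the choice between the two code pages makes no practical difference for that data.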
