Japanese COBOL code in Shift-JIS on an IBM mainframe: how is it represented when transferred to a PC?

Posted 2024-08-02 16:24:30 · 617 characters · 5 views · 0 comments

We have a Japanese client that has source code in COBOL on a mainframe. He claims the code on the mainframe is represented in Shift-JIS2 (and we think we understand that pretty well). When that code is transferred to a PC, what is the most common encoding used?
We've sent him a program to process that COBOL code and it seems to choke. The customer won't give us the code directly, so experiments are hard. His experiments seem to indicate UTF-8; I assume the Japanese characters encodable in Shift-JIS2 are converted to their Unicode equivalents. Does anybody have any experience here?

EDIT: I think we solved our mystery. The client is (duh!) using CP-932 ("ShiftJIS") on the PC, but his COBOL program has Japanese characters in the identifiers, and that's why our tool is choking.
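For anyone who can't see the customer's files either, here is a minimal Python sketch of the kind of check that could be sent to the customer to confirm this diagnosis. It assumes the PC-side file really is CP-932; the function name `find_non_ascii_lines` and the sample record are made up for illustration, not taken from the real code.

```python
def find_non_ascii_lines(raw: bytes, encoding: str = "cp932"):
    """Decode raw source bytes and report lines that contain non-ASCII characters.

    Returns a list of (line_number, non_ascii_characters) tuples.
    """
    # A wrong encoding guess usually surfaces here as UnicodeDecodeError.
    text = raw.decode(encoding)
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        non_ascii = "".join(ch for ch in line if ord(ch) > 0x7F)
        if non_ascii:
            hits.append((lineno, non_ascii))
    return hits


# Hypothetical two-line sample: one ASCII-only record, one with a Japanese data name.
sample = "       01 CUST-REC.\n          05 顧客名 PIC X(10).\n".encode("cp932")
print(find_non_ascii_lines(sample))
```

The report only lists line numbers and the offending characters, so the customer can run it without handing over the source itself.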

EDIT: Followup: A bit more of a surprise. Japanese text in SHIFT-JIS often encodes what we think of as ASCII text as so-called "FULLWIDTH" characters, which take the same screen space as an East Asian ideograph; conventional ASCII characters act as half-width. So there's a FULLWIDTH "A", "B", ... "Z" as well as a FULLWIDTH "-". Apparently, to process Japanese COBOL, our COBOL parser has to accept not only Western ASCII but also the FULLWIDTH equivalents, especially the FULLWIDTH letters and, surprisingly, a FULLWIDTH hyphen used to separate "letters" in a COBOL identifier.
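One possible way to cope with the FULLWIDTH forms, sketched in Python: once the source is decoded to Unicode, the FULLWIDTH block U+FF01..U+FF5E sits at a fixed offset (0xFEE0) from ASCII U+0021..U+007E, so a translation table can fold one onto the other. The name `fold_fullwidth` is hypothetical, and whether folding is appropriate at all (for example inside string literals) is an assumption that depends on the parser.

```python
# FULLWIDTH forms U+FF01..U+FF5E map onto ASCII U+0021..U+007E at offset 0xFEE0.
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# IDEOGRAPHIC SPACE (U+3000) is the wide counterpart of the ASCII space.
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def fold_fullwidth(text: str) -> str:
    """Replace FULLWIDTH ASCII variants with their ordinary ASCII equivalents."""
    return text.translate(FULLWIDTH_TO_ASCII)


# FULLWIDTH "ABC", FULLWIDTH HYPHEN-MINUS, FULLWIDTH "X"
wide_identifier = "\uff21\uff22\uff23\uff0d\uff38"
print(fold_fullwidth(wide_identifier))
```

An explicit table is used here rather than `unicodedata.normalize("NFKC", ...)` because NFKC also rewrites other compatibility characters (half-width katakana, for instance), which may not be wanted in source text.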

EDIT: IBM Enterprise COBOL allows DBCS characters in identifiers. Yikes!

Comments (1)

梦幻之岛 2024-08-09 16:24:30

There are three encodings that are all still very much in use in Japan: EUC-JP, ISO-2022-JP, and Shift-JIS.

ISO-2022-JP is usually used for e-mail, while you'll see EUC-JP on Unix machines. I personally haven't dealt with anything other than Shift-JIS, though. (Nor with mainframes.)
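To make the difference between the three concrete, here is a small Python sketch that encodes the same string with each of them, using Python's standard codec names (`shift_jis`, `euc_jp`, `iso2022_jp`). The helper name `encode_all` is just for illustration.

```python
CANDIDATES = ("shift_jis", "euc_jp", "iso2022_jp")


def encode_all(text: str) -> dict:
    """Encode the same text with each candidate codec, keyed by codec name."""
    return {name: text.encode(name) for name in CANDIDATES}


encoded = encode_all("日本語")
for name, data in encoded.items():
    # Print the raw bytes in hex so the three byte sequences can be compared.
    print(name, data.hex())
```

The byte sequences differ for the same text, and ISO-2022-JP stays within 7-bit bytes by switching character sets with escape sequences, which is part of why it suits e-mail.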
