Japanese COBOL code in Shift-JIS on an IBM mainframe: how is it represented after transfer to a PC?
We have a Japanese client that has COBOL source code on a mainframe. He claims the code on the mainframe is represented in Shift-JIS2 (and we think we understand that pretty well). When that code is transferred to a PC, what is the most common encoding used?
We've sent him a program to process that COBOL code and it seems to choke. The customer won't give us the code directly, so experiments are hard. His experiments seem to indicate UTF-8; I assume the Japanese characters encodable in Shift-JIS2 are correspondingly converted to Unicode equivalents. Anybody have any experience here?
EDIT: I think we solved our mystery. The client is (duh!) using CP-932 ("ShiftJIS") on the PC, but his COBOL program has Japanese characters in the identifiers, and that's why our tool is choking.
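A minimal Python sketch of that failure mode, assuming the tool reads its input as strict UTF-8: the CP-932 bytes for Japanese characters are generally not valid UTF-8, so decoding fails as soon as a kanji appears. The helper `decode_cobol_source` and the sample line with the identifier `日付` are hypothetical; the fallback simply tries UTF-8 first and then Python's built-in `cp932` codec.

```python
# Hypothetical helper: decode COBOL source bytes, trying strict UTF-8 first
# and falling back to CP-932 (Microsoft's Shift-JIS variant) on failure.
def decode_cobol_source(raw: bytes) -> tuple[str, str]:
    """Return (text, codec_name). Pure ASCII is valid UTF-8, so it reports 'utf-8'."""
    try:
        return raw.decode("utf-8"), "utf-8"
    except UnicodeDecodeError:
        # Assumption: anything that is not valid UTF-8 came from a CP-932 PC.
        # Bytes in some other encoding may still raise UnicodeDecodeError here.
        return raw.decode("cp932"), "cp932"


if __name__ == "__main__":
    # Made-up COBOL line with a Japanese identifier (日付, "date").
    line = "       01  日付  PIC 9(8).\n"
    raw = line.encode("cp932")

    # What a UTF-8-only tool effectively does: this raises at the first kanji byte.
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        print("UTF-8 decode failed:", exc.reason)

    text, codec = decode_cobol_source(raw)
    print(codec, text == line)
```

When the encoding is already known, `open(path, encoding="cp932")` performs the same decoding directly on a file.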
EDIT: Follow-up: a bit more of a surprise. Shift-JIS text often writes what we think of as ASCII text using so-called "FULLWIDTH" characters, which take the same screen space as an East Asian ideograph; conventional ASCII characters act as half-width. So there's a FULLWIDTH "A", "B", ... "Z", as well as a FULLWIDTH "-". Apparently, to process Japanese COBOL, our COBOL parser has to accept not only Western ASCII but also the FULLWIDTH equivalents, especially the FULLWIDTH letters and, surprisingly, a FULLWIDTH HYPHEN used to separate "letters" in a COBOL identifier.
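One way a parser could accept the FULLWIDTH forms is to fold them to ASCII before tokenizing. This Python sketch relies on the fact that U+FF01..U+FF5E (FULLWIDTH EXCLAMATION MARK through FULLWIDTH TILDE) mirror ASCII U+0021..U+007E at a fixed offset of 0xFEE0, and it also maps U+3000 IDEOGRAPHIC SPACE to a plain space. The name `fold_fullwidth` is hypothetical, and whether to fold at all (rather than keep the original characters for output) is a design assumption.

```python
# Translation table: each FULLWIDTH ASCII variant -> its ASCII counterpart.
# U+FF01..U+FF5E sit exactly 0xFEE0 above U+0021..U+007E.
_FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
_FULLWIDTH_TO_ASCII[0x3000] = 0x20  # IDEOGRAPHIC SPACE -> ASCII space


def fold_fullwidth(text: str) -> str:
    """Replace FULLWIDTH letters, digits, and punctuation (including the
    FULLWIDTH hyphen-minus U+FF0D) with ASCII; leave kanji and kana alone."""
    return text.translate(_FULLWIDTH_TO_ASCII)


if __name__ == "__main__":
    print(fold_fullwidth("ＣＵＳＴ－ＮＯ"))  # CUST-NO
    print(fold_fullwidth("日付－ＹＭＤ"))    # 日付-YMD
```

`unicodedata.normalize("NFKC", ...)` would also fold these characters, but it rewrites other compatibility characters too (for example, half-width katakana become full-width), which may be more than a parser wants. Also, if the bytes were decoded with a JIS-style `shift_jis` codec instead of `cp932`, the fullwidth minus may arrive as U+2212 MINUS SIGN rather than U+FF0D, so it is worth checking what your decoder actually produces.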
EDIT: IBM Enterprise COBOL allows DBCS characters in identifiers. Yikes!
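With DBCS characters allowed in identifiers, a tokenizer also has to treat kanji, kana, and FULLWIDTH letters as word characters. The Python sketch below splits a source line into candidate COBOL words using a deliberately simplified rule: ASCII letters, digits, and hyphen, the FULLWIDTH hyphen-minus, plus any character whose Unicode category is a letter or number. That rule is an assumption for illustration, not IBM Enterprise COBOL's actual rule set for DBCS user-defined words (the Enterprise COBOL Language Reference documents those); `is_word_char` and `cobol_words` are hypothetical names.

```python
import unicodedata
from itertools import groupby


def is_word_char(ch: str) -> bool:
    """Simplified (assumed) test for characters that may appear in a COBOL word."""
    if ch.isascii():
        return ch.isalnum() or ch == "-"
    if ch == "\uff0d":  # FULLWIDTH HYPHEN-MINUS
        return True
    # Kanji, kana, and FULLWIDTH letters/digits fall in Unicode categories L* or N*.
    return unicodedata.category(ch)[0] in ("L", "N")


def cobol_words(line: str) -> list[str]:
    """Return maximal runs of word characters; everything else acts as a separator."""
    return ["".join(run) for is_word, run in groupby(line, key=is_word_char) if is_word]


if __name__ == "__main__":
    print(cobol_words("MOVE 顧客－番号 TO WS-OUT."))
```

This only illustrates character classification; it does not understand PICTURE strings, literals, or the rules about where hyphens may appear, so `PIC 9(8)` comes out as the separate pieces `PIC`, `9`, and `8`.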
Answer:
There are three encodings that are all still in wide use in Japan: EUC-JP, ISO-2022-JP, and Shift-JIS.
ISO-2022-JP is usually used for email, while you'll see EUC-JP on Unix machines. I personally haven't dealt with anything other than Shift-JIS, though (nor with mainframes).
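To tell these three apart (plus UTF-8, which the question also raised), a rough heuristic in Python might look like the sketch below. The checks are an assumption, not a standard detection algorithm: ISO-2022-JP is recognized by being 7-bit with escape sequences such as `ESC $ B`, and everything else is strict trial decoding in a fixed order. Short or mostly-ASCII inputs can be valid in more than one encoding, so the guess can be wrong; `guess_japanese_encoding` is a hypothetical name.

```python
from typing import Optional

# Escape sequences that switch ISO-2022-JP into JIS X 0208 (ESC $ B, ESC $ @)
# or JIS X 0201 Roman (ESC ( J).
_ISO2022_ESCAPES = (b"\x1b$B", b"\x1b$@", b"\x1b(J")


def guess_japanese_encoding(data: bytes) -> Optional[str]:
    """Return a Python codec name for `data`, or None if no candidate decodes it."""
    # ISO-2022-JP stays within 7 bits and signals character sets via escapes.
    if all(b < 0x80 for b in data) and any(esc in data for esc in _ISO2022_ESCAPES):
        return "iso2022_jp"
    # Strict trial decoding. UTF-8 goes first because its multi-byte structure is
    # strict, so EUC-JP or Shift-JIS text rarely decodes as UTF-8 by accident.
    for codec in ("utf-8", "euc_jp", "cp932"):
        try:
            data.decode(codec)
        except UnicodeDecodeError:
            continue
        return codec
    return None


if __name__ == "__main__":
    sample = "日本語"
    for codec in ("iso2022_jp", "utf-8", "euc_jp", "cp932"):
        print(codec, "->", guess_japanese_encoding(sample.encode(codec)))
```

Trying EUC-JP before CP-932 is a judgment call: EUC-JP bytes often contain sequences CP-932 cannot decode and vice versa, but text made mostly of half-width katakana can be ambiguous, so a real tool should prefer an explicitly declared encoding over guessing.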