C#: Convert COMP-3 packed decimal to a human-readable value
I have a series of ASCII flat files coming in from a mainframe to be processed by a C# application. A new feed has been introduced with a Packed Decimal (COMP-3) field, which needs to be converted to a numerical value.
The files are being transferred via FTP, using ASCII transfer mode. I am concerned that the binary field may contain bytes that will be interpreted as very low ASCII codes or control characters instead of a value, or worse, that they may be lost in the FTP process.
What's more, the fields are being read as strings. I may have the flexibility to work around this part (i.e. a stream of some sort), but the business will give me pushback.
The requirement read "Convert from HEX to ASCII", but clearly that didn't yield the correct values. Any help would be appreciated; it need not be language-specific as long as you can explain the logic of the conversion process.
Comments (8)
I have been watching the posts on numerous boards concerning converting Comp-3 BCD data from "legacy" mainframe files to something useable in C#. First, I would like to say that I am less than enamoured by the responses that some of these posts have received - especially those that have said essentially "why are you bothering us with these non-C#/C++ related posts" and also "If you need an answer about some sort of COBOL convention, why don't you go visit a COBOL oriented site". This, to me, is complete BS as there is going to be a need for probably many years to come, (unfortunately), for software developers to understand how to deal with some of these legacy issues that exist in THE REAL WORLD. So, even if I get slammed on this post for the following code, I am going to share with you a REAL WORLD experience that I had to deal with regarding COMP-3/EBCDIC conversion (and yes, I am he who talks of "floppy disks, paper-tape, Disc Packs etc... - I have been a software engineer since 1979").
First, understand that any file you read from a legacy mainframe system such as an IBM is going to present its data in EBCDIC format. To convert any of that data to a C#/C++ string you can work with, you have to use the proper code page translation to get the data into ASCII format. A good example of how to handle this would be:
StreamReader readFile = new StreamReader(path, Encoding.GetEncoding(37)); // 37 = IBM EBCDIC (US-Canada) code page; text is translated as it is read.
This will ensure that anything you read from this stream is converted to ASCII and can be used in string format. This includes "Zoned Decimal" (Pic 9) and "Text" (Pic X) fields as declared by COBOL. However, it does not convert COMP-3 fields to the correct "binary" equivalent when they are read into a char[] or byte[] array. No code page (UTF-8, UTF-16, Default or whatever) will translate those fields properly; the only way to get at them intact is to read the raw bytes, so you are going to want to open the file like this:
FileStream fileStream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.Read);
Of course, the "FileShare.Read" option is "optional".
When you have isolated the field that you want to convert to a decimal value (and then subsequently to an ASCII string if need be), you can use code along the following lines. It has basically been stolen from the Microsoft "UnpackDecimal" posting that you can get at:
http://www.microsoft.com/downloads/details.aspx?familyid=0e4bba52-cc52-4d89-8590-cda297ff7fbd&displaylang=en
I have isolated (I think) the most important parts of this logic and consolidated them into a method that you can do with what you want. For my purposes, I chose to have it return a Decimal value, which I could then do with as I wanted. Basically, the method is called "unpack" and you pass it a byte[] array (no longer than 12 bytes) and the scale as an int, which is the number of decimal places you want to have returned in the Decimal value. I hope this works for you as well as it did for me.
If you have any questions, post them on here, because I suspect that I am going to get "flamed" like everyone else who has chosen to post questions that are pertinent to today's issues...
Thanks,
John - The Elder.
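The code itself is not reproduced above. A minimal sketch of an unpack method along the lines described (byte[] in, scale in, Decimal out) might look like this; the class name, the validation, and the sign handling are my own assumptions and not the Microsoft sample:

```csharp
using System;

public static class PackedDecimal
{
    // Decodes an IBM packed-decimal (COMP-3) field: two digits per byte,
    // with the low nibble of the last byte holding the sign.
    // 'scale' is the number of implied decimal places.
    public static decimal Unpack(byte[] field, int scale)
    {
        if (field == null || field.Length == 0)
            throw new ArgumentException("Field is empty.", nameof(field));

        decimal value = 0m;
        for (int i = 0; i < field.Length; i++)
        {
            int high = field[i] >> 4;    // upper nibble: always a digit
            int low = field[i] & 0x0F;   // lower nibble: a digit, or the sign in the last byte

            if (high > 9) throw new FormatException("Invalid digit nibble.");
            value = value * 10 + high;

            if (i < field.Length - 1)
            {
                if (low > 9) throw new FormatException("Invalid digit nibble.");
                value = value * 10 + low;
            }
            else if (low == 0x0D || low == 0x0B)
            {
                value = -value;          // D (the usual code) and B mark a negative value
            }
            else if (low < 0x0A)
            {
                throw new FormatException("Invalid sign nibble.");
            }
            // Any other sign nibble (A, C, E, F) is treated as positive.
        }

        // Shift the implied decimal point into place.
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

For example, PackedDecimal.Unpack(new byte[] { 0x12, 0x34, 0x0F }, 2) returns 123.4, and the same call with { 0x12, 0x30, 0x0D } returns -123. A 12-byte field holds 23 digits, which fits comfortably in a Decimal.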
First of all you must eliminate the end of line (EOL) translation problems that will be caused by ASCII transfer mode. You are absolutely right to be concerned about data corruption when the BCD values happen to correspond to EOL characters. The worst aspect of this problem is that it will occur rarely and unexpectedly.
The best solution is to change the transfer mode to BIN. This is appropriate since the data you are transferring is binary. If it is not possible to use the correct FTP transfer mode, you can undo the ASCII mode damage in code. All you have to do is convert \r\n pairs back to \n. If I were you I would make sure this is well tested.
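As a rough sketch of that fallback, the helper below collapses each CR LF pair back to a single LF on the raw bytes. It assumes the transfer did nothing except expand LF to CR LF; a CR LF that was genuinely present in the packed data would be damaged by it, and it cannot undo any character-set translation done during the transfer. The class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;

public static class AsciiModeRepair
{
    // Collapses every CR LF pair (0x0D 0x0A) back to a single LF (0x0A).
    public static byte[] CollapseCrLf(byte[] data)
    {
        var result = new List<byte>(data.Length);
        for (int i = 0; i < data.Length; i++)
        {
            bool isCrOfPair = data[i] == 0x0D
                              && i + 1 < data.Length
                              && data[i + 1] == 0x0A;
            if (!isCrOfPair) result.Add(data[i]);   // drop only the CR of a CR LF pair
        }
        return result.ToArray();
    }
}
```

A lone CR (one not followed by LF) is left untouched, which matters because 0x0D is also a perfectly valid packed byte.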
Once you've dealt with the EOL problem, the COMP-3 conversion is pretty straightforward. I was able to find this article in the MS knowledge base with sample code in BASIC. See below for a VB.NET port of this code.
Since you're dealing with COMP-3 values, the file format you're reading almost surely has fixed record sizes with fixed field lengths. If I were you, I would get my hands on a file format specification before going any further with this. You should be using a BinaryReader to work with this data. If someone is pushing back on this point, I would walk away. Let them find someone else to indulge their folly.
Here's a VB.NET port of the BASIC sample code. I haven't tested this because I don't have access to a COMP-3 file. If this doesn't work, I would refer back to the original MS sample code for guidance, or to references in the other answers to this question.
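The VB.NET listing is not reproduced here. For the record-reading side of the advice above, a minimal C# sketch with BinaryReader could look like the following. The record length and field offsets are hypothetical parameters that have to come from the file format specification, and the class and method names are mine.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class FixedRecordReader
{
    // Reads a stream as consecutive fixed-length records.
    // A short trailing fragment (fewer than recordLength bytes) is ignored.
    public static List<byte[]> ReadRecords(Stream input, int recordLength)
    {
        if (recordLength <= 0)
            throw new ArgumentOutOfRangeException(nameof(recordLength));

        var records = new List<byte[]>();
        using (var reader = new BinaryReader(input))
        {
            while (true)
            {
                // ReadBytes returns fewer bytes than requested only at end of stream.
                byte[] record = reader.ReadBytes(recordLength);
                if (record.Length < recordLength) break;
                records.Add(record);
            }
        }
        return records;
    }

    // Copies one field out of a record by offset and length.
    public static byte[] Slice(byte[] record, int offset, int length)
    {
        var field = new byte[length];
        Array.Copy(record, offset, field, 0, length);
        return field;
    }
}
```

Each sliced COMP-3 field stays a byte[] and goes to the packed-decimal conversion untouched; only the text fields should ever be run through a character encoding.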
I apologize if I am way off base here, but perhaps this code sample I'll paste here could help you. This came from VBRocks...
If the original data was in EBCDIC your COMP-3 field has been garbled. The FTP process has done an EBCDIC to ASCII translation of the byte values in the COMP-3 field which isn't what you want. To correct this you can:
1) Use BINARY mode for the transfer so you get the raw EBCDIC data. Then you convert the COMP-3 field to a number and translate any other EBCDIC text on the record to ASCII. A packed field stores each digit in a half byte (nibble), with the low nibble of the last byte holding the sign (C is the usual positive code, F marks an unsigned or positive value, and D is the usual negative code). Storing 123.4 in a PIC S999V99 USAGE COMP-3 field (five digits plus the sign nibble, so three bytes) gives X'12340C' (X'12340F' if the value was stored unsigned), and -123 in the same field is X'12300D'.
2) Have the sender convert the field into a USAGE IS DISPLAY SIGN IS LEADING (or TRAILING) SEPARATE numeric field. This stores the number as a string of EBCDIC numeric digits with the sign as a separate plus (+) or minus (-) character. All digits and the sign translate correctly to their ASCII equivalents in the FTP transfer.
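To make the nibble layout from option 1 concrete, here is a small sketch that goes the other way and packs a value into COMP-3 bytes. A field with three integer and two fractional digits holds 123.4 as the digits 12340, so the result is 12 34 followed by the sign nibble. The names are hypothetical, and the sketch writes C for non-negative values and D for negative ones; F is also commonly seen on non-negative data.

```csharp
using System;

public static class PackedDecimalWriter
{
    // Packs 'value' into a COMP-3 field with 'digits' total digits,
    // 'scale' of which are implied decimal places.
    public static byte[] Pack(decimal value, int digits, int scale)
    {
        bool negative = value < 0;
        decimal scaled = Math.Abs(value);
        for (int i = 0; i < scale; i++) scaled *= 10m;
        scaled = Math.Truncate(scaled);          // drop digits beyond the scale

        // digits + one sign nibble, rounded up to whole bytes
        int nibbleCount = digits + 1;
        if (nibbleCount % 2 != 0) nibbleCount++; // leading zero nibble as padding
        var nibbles = new int[nibbleCount];
        nibbles[nibbleCount - 1] = negative ? 0x0D : 0x0C;

        int firstDigit = nibbleCount - 1 - digits;
        for (int i = nibbleCount - 2; i >= firstDigit; i--)
        {
            nibbles[i] = (int)(scaled % 10m);
            scaled = Math.Truncate(scaled / 10m);
        }
        if (scaled != 0m)
            throw new OverflowException("Value does not fit in the field.");

        var bytes = new byte[nibbleCount / 2];
        for (int i = 0; i < bytes.Length; i++)
            bytes[i] = (byte)((nibbles[2 * i] << 4) | nibbles[2 * i + 1]);
        return bytes;
    }
}
```

BitConverter.ToString(PackedDecimalWriter.Pack(123.4m, 5, 2)) gives "12-34-0C", and packing -123m into the same field gives "12-30-0D".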
Some useful links for EBCDIC translation:
Translation table, useful for checking some of the values in the packed decimal fields:
http://www.simotime.com/asc2ebc1.htm
List of code pages on MSDN:
http://msdn.microsoft.com/en-us/library/dd317756(VS.85).aspx
And a piece of code to convert the byte array fields in C#:
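That listing is not reproduced here. As a minimal sketch of one such conversion, the helper below decodes a text (PIC X) or zoned-decimal field from code page 37 into a .NET string. The choice of code page 37 and the names are assumptions; packed fields must not go through this. On .NET Core and .NET 5+ the code-page provider has to be registered first; on .NET Framework that type comes from the System.Text.Encoding.CodePages package, and the registration line can be dropped if the package is not referenced.

```csharp
using System;
using System.Text;

public static class EbcdicText
{
    // Decodes a text or zoned-decimal field from code page 37
    // (IBM EBCDIC US-Canada) into a string.
    public static string Decode(byte[] field)
    {
        // Makes code page 37 available on .NET Core / .NET 5+.
        // Registering the same provider more than once is harmless.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(37).GetString(field);
    }
}
```

For instance, the EBCDIC bytes C8 C5 D3 D3 D6 decode to "HELLO", and F1 F2 F3 decode to "123".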
The packed fields are the same in EBCDIC or ASCII. Do not run the EBCDIC to ASCII conversion on them. In .Net dump them into a byte[].
You use bitwise masks and shifts to pack/unpack.
-- But bitwise ops only apply to integer types in .Net so you need to jump through some hoops!
A good COBOL or C artist can point you in the right direction.
Find one of the old guys and pay your dues (about three beers should do it).
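In case no old guy is at hand, here is a small sketch of the masks and shifts involved. My reading of the "hoops" is that C# promotes byte operands to int for shifts and masks, so the result has to be cast back to byte; the helper names are made up.

```csharp
using System;

public static class Nibbles
{
    // Upper four bits of a byte (the first digit of a packed pair).
    public static byte High(byte b) => (byte)(b >> 4);

    // Lower four bits of a byte (the second digit, or the sign in the last byte).
    public static byte Low(byte b) => (byte)(b & 0x0F);

    // Recombines two nibbles into one byte.
    public static byte Combine(byte high, byte low) =>
        (byte)((high << 4) | (low & 0x0F));
}
```

With these, Nibbles.High(0x4F) is 4, Nibbles.Low(0x4F) is 0x0F, and Nibbles.Combine(1, 2) is 0x12.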
The “ASCII transfer type” transfers files as regular text files, so files become corrupt when we transfer packed decimal or binary data in ASCII transfer type. The “Binary transfer type” transfers the data in binary mode, handling the files as binary data instead of text data. So we have to use the Binary transfer type here.
Reference : https://www.codeproject.com/Tips/673240/EBCDIC-to-ASCII-Converter
Once your file is ready, here is the code to convert packed decimal to human readable decimal.
Files must be transferred as binary. Here's a much shorter way to do it:
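The short version itself is not reproduced here. One compact approach, offered as a guess at the idea and not as the original code, is to go through the hex string: every hex character is one nibble, so the last character is the sign and the rest are the digits. It assumes a non-empty field.

```csharp
using System;
using System.Globalization;

public static class PackedDecimalShort
{
    public static decimal Unpack(byte[] field, int scale)
    {
        // { 0x12, 0x34, 0x0F } -> "12-34-0F" -> "12340F"
        string hex = BitConverter.ToString(field).Replace("-", "");
        string digits = hex.Substring(0, hex.Length - 1);   // everything but the sign nibble
        char sign = hex[hex.Length - 1];

        // NumberStyles.None accepts digits only, so a stray A-F nibble throws.
        decimal value = decimal.Parse(digits, NumberStyles.None, CultureInfo.InvariantCulture);
        for (int i = 0; i < scale; i++) value /= 10m;
        return (sign == 'D' || sign == 'B') ? -value : value;
    }
}
```

It trades a little speed for brevity, which is rarely a problem at flat-file volumes.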