
Reading a Cobol generated file

Posted 2024-10-15 17:21:18 · 759 characters · 9 views · 0 comments


I'm currently writing a C# application that is going to sit between two existing apps. All I know about the second application is that it processes files generated by the first one. The first application is written in Cobol.

Steps:
1) The Cobol application writes some files and copies them to a directory.
2) The second application picks these files up and processes them.

My C# app would sit between 1) and 2). It would have to pick up the file generated by 1), read it, modify it and save it, so that application 2) wouldn't know I have even been there.

I have a few problems.

First of all, if I open a file generated by 1) in Notepad, most of it is unreadable, although some parts are readable.
If I read the file, modify it and save it, I must save the file with the same notation used by the Cobol application, so that app 2) doesn't know I've been there.

I've tried reading the file this way, but it's still unreadable:

Code:

        string ss = @"filename";

        using (FileStream fs = new FileStream(ss, FileMode.Open))
        {
            StreamReader sr = new StreamReader(fs);
            string gg = sr.ReadToEnd();
        }

Also, if I find a way of making it readable (using some sort of encoding technique), I'm afraid that when I save the file again, I may change its original format.

Any thoughts? Suggestions?


○闲身 2024-10-22 17:21:19


To read the COBOL-generated file, you'll need to know a few things.

First, you'll need the record layout (copybook) for the file. A COBOL record layout will look something like this:

01  PATIENT-TREATMENTS.
    05  PATIENT-NAME                PIC X(30).
    05  PATIENT-SS-NUMBER           PIC 9(9).
    05  NUMBER-OF-TREATMENTS        PIC 99 COMP-3.
    05  TREATMENT-HISTORY OCCURS 0 TO 50 TIMES
           DEPENDING ON NUMBER-OF-TREATMENTS
           INDEXED BY TREATMENT-POINTER.
        10  TREATMENT-DATE.
            15  TREATMENT-DAY        PIC 99.
            15  TREATMENT-MONTH      PIC 99.
            15  TREATMENT-YEAR       PIC 9(4).
        10  TREATING-PHYSICIAN       PIC X(30).
        10  TREATMENT-CODE           PIC 99.

You'll also need a copy of IBM's Principles of Operation (S/360, S/370, z/OS; it doesn't really matter for our purposes). The latest is available from IBM at

http://www-01.ibm.com/support/docview.wss?uid=isg2b9de5f05a9d57819852571c500428f9a (but you'll need an IBM account).
An older edition is available, gratis, at http://www.hack.org/mc/texts/principles-of-operation.pdf

Chapters 8 (Decimal Instructions) and 9 (Floating Point Overview and Support Instructions) are the interesting bits for our purposes.

Without that, you're pretty much lost.

Then, you need to understand COBOL data types. For instance:

PIC defines an alphanumeric formatted field (PIC 9(4), for example, is 4 decimal digits, which might be filled with space characters if missing). PIC 999V99 is 5 decimal digits, with an implied decimal point. And so on and so forth.
BINARY is [usually] a signed fixed point binary integer. Usual sizes are halfword (2 octets) and fullword (4 octets).
COMP-1 is single precision floating point.
COMP-2 is double precision floating point.

If the data source is an IBM mainframe, COMP-1 and COMP-2 likely won't be IEEE floating point: they will be in IBM's base-16, excess-64 floating point format. You'll need something like the S/370 Principles of Operation to help you understand it.
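As a rough illustration of that base-16, excess-64 format, here is a minimal C# sketch for the 4-octet (COMP-1) case. It assumes the usual layout of 1 sign bit, a 7-bit exponent and a 24-bit fraction stored big-endian; the class and method names are made up for this example, so check the details against the Principles of Operation before relying on it.

```csharp
using System;

// Hypothetical helper, not part of any library.
static class IbmHexFloat
{
    // Decodes a 4-octet IBM hexadecimal floating point value (big-endian).
    // value = sign * (fraction / 2^24) * 16^(exponent - 64)
    public static double DecodeSingle(byte[] b, int offset = 0)
    {
        int sign = (b[offset] & 0x80) != 0 ? -1 : 1;
        int exponent = (b[offset] & 0x7F) - 64;   // power of 16, stored excess-64
        int fraction = (b[offset + 1] << 16) | (b[offset + 2] << 8) | b[offset + 3];
        if (fraction == 0) return 0.0;
        return sign * (fraction / 16777216.0) * Math.Pow(16, exponent);
    }
}
```

For instance, the octets 42 64 00 00 should decode to 100.0 (exponent 2, fraction 0x64/0x100). Writing a modified value back would need the reverse conversion, which is not shown here.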

COMP-3 is 'packed decimal', of varying lengths. Packed decimal is a compact way of representing a decimal number. The declaration will look something like this: PIC S9999V99 COMP-3. This says that it is signed and consists of 6 decimal digits with an implied decimal point. Packed decimal represents each decimal digit as a nibble of an octet (hex values 0-9). The high-order digit is the upper nibble of the leftmost octet. The low nibble of the rightmost octet is a hex value A-F representing the sign. So the above PIC clause will require ceil( (6+1)/2 ) or 4 octets. The value -345.67, as represented by the above PIC clause, will look like 0x0034567D. The actual sign value may vary (the default is C/positive, D/negative, but A, C, E and F are treated as positive, while only B and D are treated as negative). Again, see the S/370 Principles of Operation for details on the representation.
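To make the nibble layout concrete, here is a minimal C# sketch that decodes and re-encodes a packed decimal field. The names `PackedDecimal`, `Decode` and `Encode` are invented for this example, and the `scale` argument (digits after the implied decimal point) is something only the copybook can tell you. Note that `Encode` always writes C or D as the sign, which is an assumption: if the original field used another sign nibble (F is common for unsigned fields), you would have to preserve it for app 2) not to notice.

```csharp
using System;

// Hypothetical helper, not part of any library.
static class PackedDecimal
{
    // Decodes a COMP-3 field: two digits per octet, sign in the last low nibble.
    public static decimal Decode(byte[] field, int scale)
    {
        long digits = 0;
        for (int i = 0; i < field.Length; i++)
        {
            digits = digits * 10 + Digit(field[i] >> 4);
            if (i < field.Length - 1)
                digits = digits * 10 + Digit(field[i] & 0x0F);
        }
        int sign = field[field.Length - 1] & 0x0F;
        if (sign < 0x0A) throw new FormatException("Invalid sign nibble.");
        if (sign == 0x0B || sign == 0x0D) digits = -digits;   // B and D are negative

        decimal value = digits;
        for (int i = 0; i < scale; i++) value /= 10m;         // apply the implied point
        return value;
    }

    // Encodes a value into 'length' octets using the default signs C (+) and D (-).
    public static byte[] Encode(decimal value, int scale, int length)
    {
        decimal scaled = Math.Abs(value);
        for (int i = 0; i < scale; i++) scaled *= 10m;
        long digits = (long)decimal.Truncate(scaled);

        byte[] field = new byte[length];
        int sign = value < 0 ? 0x0D : 0x0C;
        field[length - 1] = (byte)(((int)(digits % 10) << 4) | sign);
        digits /= 10;
        for (int i = length - 2; i >= 0; i--)
        {
            int lo = (int)(digits % 10); digits /= 10;
            int hi = (int)(digits % 10); digits /= 10;
            field[i] = (byte)((hi << 4) | lo);
        }
        if (digits != 0) throw new OverflowException("Value does not fit in the field.");
        return field;
    }

    static int Digit(int nibble)
    {
        if (nibble > 9) throw new FormatException("Invalid digit nibble.");
        return nibble;
    }
}
```

With the example above, `PackedDecimal.Decode(new byte[] { 0x00, 0x34, 0x56, 0x7D }, 2)` gives -345.67, and `PackedDecimal.Encode(-345.67m, 2, 4)` gives back the same four octets.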

Related to COMP-3 is zoned decimal. This might be declared as 'PIC S9999V99' (signed, 6 decimal digits, with an implied decimal point). Decimal digits, in EBCDIC, are the hex values 0xF0 - 0xF9. 'Unpack' (a mainframe machine instruction) takes a packed decimal field and turns it into a character field. The process is:

Start with the rightmost octet. Swap its two nibbles, so the sign nibble is on top, and place the result into the rightmost octet of the destination field.
Working from right to left (source and target both), strip off each remaining nibble of the packed decimal field, and place it into the low nibble of the next available octet in the destination. Fill the high nibble with a hex F.
The operation ends when either the source or destination field is exhausted.
If the destination field is not exhausted, it is left-padded with zeroes by filling the remaining octets with decimal '0' (0xF0).

So our example value, -345.67, if stored with the default sign value (hex D), would get unpacked into an 8-octet field as 0xF0F0F0F3F4F5F6D7 ('0003456P', in EBCDIC).
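The unpack steps above can be sketched in C# roughly as follows, together with a decoder for the resulting zoned field. `Zoned`, `Unpack` and `Decode` are names made up for this example, and the sign handling assumes the EBCDIC convention described here (sign in the high nibble of the last octet); files produced on ASCII platforms may use a different convention, so treat this as a starting point only.

```csharp
using System;

// Hypothetical helper, not part of any library.
static class Zoned
{
    // Mimics the unpack steps: packed decimal in, zoned decimal out.
    public static byte[] Unpack(byte[] packed, int destLength)
    {
        byte[] dest = new byte[destLength];
        for (int i = 0; i < dest.Length; i++) dest[i] = 0xF0;   // left pad with EBCDIC '0'
        if (destLength == 0 || packed.Length == 0) return dest;

        int d = destLength - 1;
        byte last = packed[packed.Length - 1];
        dest[d--] = (byte)(((last & 0x0F) << 4) | (last >> 4));  // swap nibbles: sign on top

        // Right to left: each remaining nibble gets its own octet with zone F.
        for (int s = packed.Length - 2; s >= 0 && d >= 0; s--)
        {
            dest[d--] = (byte)(0xF0 | (packed[s] & 0x0F));
            if (d >= 0) dest[d--] = (byte)(0xF0 | (packed[s] >> 4));
        }
        return dest;
    }

    // Decodes an EBCDIC zoned decimal field; 'scale' comes from the copybook.
    public static decimal Decode(byte[] field, int scale)
    {
        long digits = 0;
        foreach (byte b in field)
        {
            int digit = b & 0x0F;
            if (digit > 9) throw new FormatException("Invalid digit nibble.");
            digits = digits * 10 + digit;
        }
        int sign = field[field.Length - 1] >> 4;
        if (sign == 0x0B || sign == 0x0D) digits = -digits;

        decimal value = digits;
        for (int i = 0; i < scale; i++) value /= 10m;
        return value;
    }
}
```

Unpacking 0x0034567D into 8 octets reproduces the F0 F0 F0 F3 F4 F5 F6 D7 sequence from the example, and decoding that with a scale of 2 gives -345.67.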

[There you go. There's a quiz later]

If the COBOL app lives on an IBM mainframe, has the file been converted from its native EBCDIC to ASCII? If not, you'll have to do the mapping yourself. (Hint: it's not necessarily as straightforward as it might seem, since this might be a selective process: only character fields get converted, while COMP-1, COMP-2, COMP-3 and BINARY get excluded since they are sequences of binary octets.) Worse, there are multiple flavors of EBCDIC representations, due to the varying national implementations and varying print chains in use on different printers.
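One way to keep the file byte-for-byte intact apart from the fields you change is to work on the raw octets and convert only the character fields. The sketch below assumes code page 37 (US EBCDIC), which may well be the wrong flavor for your file, and the `RecordText` helper and its offsets are hypothetical. On .NET Core and later, the EBCDIC code pages have to be registered through `CodePagesEncodingProvider` first; on the classic .NET Framework, `Encoding.GetEncoding(37)` should work without that call.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of any library.
static class RecordText
{
    // Assumption: the file uses code page 37 (US EBCDIC).
    static readonly Encoding Ebcdic = CreateEbcdic();

    static Encoding CreateEbcdic()
    {
        // Makes the legacy code pages available on .NET Core / .NET 5+.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(37);
    }

    // Converts one character field (PIC X) of a record to a .NET string.
    public static string ReadText(byte[] record, int offset, int length)
    {
        return Ebcdic.GetString(record, offset, length);
    }

    // Overwrites one character field in place, space-padded or truncated to
    // the field length; every other octet of the record is left untouched.
    public static void WriteText(byte[] record, int offset, int length, string text)
    {
        string fitted = text.PadRight(length).Substring(0, length);
        byte[] encoded = Ebcdic.GetBytes(fitted);
        Buffer.BlockCopy(encoded, 0, record, offset, length);
    }
}
```

The whole file can be loaded with `File.ReadAllBytes` and saved with `File.WriteAllBytes`, so no text encoding is ever applied to the binary fields. Reading the file through a `StreamReader`, as in the question, decodes everything as text and is likely to corrupt the packed and binary fields when the result is written back.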

Oh...one last thing. The mainframe hardware tends to like different things aligned on halfword, word or doubleword boundaries, so the record layout may not map directly to the octets in the file as there may be padding octets inserted between fields to maintain the needed word alignment.

Good Luck.


千紇 2024-10-22 17:21:19

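I see from the comments attached to your question that you are dealing with a "classic" COBOL batch file structure: a header record, detail records and a trailer record.

This is probably bad news if you are responsible for creating the trailer record! A typical "trailer" record is used to identify the end of the file and to provide control information, such as the number of records that precede it and various checksums and/or grand totals for the "detail" records. In other words, you may need to read and summarize the entire file in order to create the trailer. On top of that, most of the data in the file is probably packed decimal, zoned decimal or other COBOLish numeric data types, so you could be in for a rough time.

You may want to ask why you are adding the trailer record to these files. Typically the "trailer" is produced by the same program or application that created the "detail" records. The trailer is supposed to act as verification that the sending application/program wrote all the data it was supposed to. The receiving application uses the summary totals, counts and so on to verify that the trailer agrees with the detail records that precede it. This is supposed to serve as a further check that the sending application did not mangle the data and that the data was not corrupted en route (no, this is not a joke, but maybe it should be). When a "man in the middle" creates the trailer, it defeats the whole purpose of the exercise (no matter how flawed it may have been to begin with).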


左耳近心 2024-10-22 17:21:19

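It would be useful to know which Cobol dialect you are dealing with, because there is no single Cobol format. Some Cobol compilers (Micro Focus) put a "file description" at the front of the file (for Micro Focus VB / indexed files).
Have a look at the RecordEditor (http://record-editor.sourceforge.net/). It has a File Wizard which could be very useful to you.
- In the File Wizard, set the file as a fixed-width file (the most common kind in Cobol). The program lets you try out different record lengths. When you get the correct record length, the text fields should line up.
- Later in the wizard there is a field search, which can look for Binary, Comp-3 and text fields.
- There are some notes on using the RecordEditor wizard with an unknown file here:
  http://record-editor.sourceforge.net/Unkown.htm
Unless the file comes from a mainframe / AS400, it is unlikely to use EBCDIC (cp037, coded page 37, is US EBCDIC); any text will most likely be in Ascii.
The file will probably contain packed decimal (Comp-3) and binary integer data. Most Cobols use big-endian (for Comp integers), even on Intel (little-endian hardware).
One thing to remember with Cobol: a PIC s9(6)V99 comp field is stored as a binary integer, where x'0001' represents 0.01. So unless you have the Cobol definition, you cannot tell whether a binary 1 is 1, 0.1, 0.01 and so on.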
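To illustrate the last two points, here is a minimal C# sketch that reads a big-endian two's complement Comp field and applies the implied decimal places. `CompBinary` and `Decode` are names invented for this example, the `scale` has to come from the Cobol definition, and the big-endian assumption needs checking against your particular compiler.

```csharp
using System;

// Hypothetical helper, not part of any library.
static class CompBinary
{
    // Decodes a signed big-endian binary field of 2, 4 or 8 octets.
    // 'scale' is the number of digits after the V in the PIC clause.
    public static decimal Decode(byte[] field, int scale)
    {
        // Sign-extend the most significant octet, then shift in the rest.
        long raw = unchecked((sbyte)field[0]);
        for (int i = 1; i < field.Length; i++)
            raw = (raw << 8) | field[i];

        decimal value = raw;
        for (int i = 0; i < scale; i++) value /= 10m;   // apply the implied point
        return value;
    }
}
```

The same four octets 00 00 00 01 decode to 1 with a scale of 0 and to 0.01 with a scale of 2, which is exactly why the Cobol definition is needed.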