How to eliminate junk values in a field?

Posted on 2024-11-26 21:19:35 | 1171 words | 2 views | 0 comments

I have some CSV records that are variable in length, for example:

0005464560,45667759,ZAMTR,!To ACC 12345678,DR,79.85

0006786565,34567899,ZAMTR,!To ACC 26575443,DR,1000

I need to separate each of these fields, and I need the last field, which should be a money amount.

However, when I read the file and UNSTRING the record into fields, I find that the last field contains junk at its end. The amount (money) field should be 8 characters: 5 digits, 1 dot, and 2 digits. The values in the input could be anything, such as 13.5, 1000 and 354.23.

    "FILE SECTION"

        FD INPUT_FILE.
            01 INPUT_REC                                   PIC X(66).

    "WORKING STORAGE SECTION"

            01 WS_INPUT_REC                                 PIC X(66).

            01 WS_AMOUNT_NUM                                PIC 9(5).9(2).
            01 WS_AMOUNT_TXT                                PIC X(8).

"MAIN SECTION"

                        UNSTRING INPUT_REC DELIMITED BY ","
                        INTO WS_ID_1, WS_ID_2, WS_CODE, WS_DESCRIPTION, WS_FLAG, WS_AMOUNT_TXT

                        MOVE WS_AMOUNT_TXT(1:8) TO WS_AMOUNT_NUM(1:8)

                        DISPLAY WS_AMOUNT_NUM

On the display the values look normal (345.23, 1000, and so on). However, after I write the field to a file, this is what they become:

79.85^M^@^@
137.35^M^@

I inspected the field WS_AMOUNT_NUM, which is filled from WS_AMOUNT_TXT, and found that ^@ is a LOW-VALUE. However, I cannot work out what ^M is; it is neither a space nor a HIGH-VALUE.

Comments (2)

清晨说晚安 2024-12-03 21:19:36

I am guessing, but it looks like you may be reading variable-length records from a file into a
fixed-length COBOL record. The junk at the end of the COBOL record is giving you some grief.
It is hard to say how consistent that junk will be from one read to the next (data beyond the
bounds of the actual input record length is technically undefined). That junk ends up
in WS_AMOUNT_TXT after the UNSTRING.

There are a number of ways to solve this problem. The suggestion I am giving you here may not
be optimal, but it is simple and should get the job done.

The last INTO field, WS_AMOUNT_TXT, in your UNSTRING statement is the one that receives all of the trailing
junk. That junk needs to be stripped off. Knowing that the only valid characters in the last field are
digits and the decimal character, you could clean it up as follows:

PERFORM VARYING WS_I FROM LENGTH OF WS_AMOUNT_TXT BY -1
          UNTIL WS_I = ZERO
    IF WS_AMOUNT_TXT(WS_I:1) IS NUMERIC OR
       WS_AMOUNT_TXT(WS_I:1) = '.'
       MOVE 1 TO WS_I
    ELSE
       MOVE SPACE TO WS_AMOUNT_TXT(WS_I:1)
    END-IF
END-PERFORM

The basic idea in the above code is to scan from the end of the last UNSTRING output field
toward the beginning, replacing anything that is not a valid digit or decimal point with a space.
Once a valid digit or decimal point is found, the loop exits on the assumption that the rest will
be valid. Setting WS_I to 1 at that point lets the BY -1 step bring it to zero, which satisfies
the UNTIL condition.

After cleanup, use the intrinsic function NUMVAL, as outlined in my answer to your
previous question, to convert WS_AMOUNT_TXT into a numeric data type.
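As a minimal sketch of that conversion step, assuming the field has already been cleaned so it holds only digits, a decimal point and spaces: WS_AMOUNT_VAL is a hypothetical intermediate item that does not appear in the question, and the underscore names follow the question's style, which other compilers may want written with hyphens.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. AMTDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Cleaned text as it might look after UNSTRING and cleanup.
       01  WS_AMOUNT_TXT     PIC X(8)      VALUE "79.85".
      * Hypothetical numeric work item (not in the question).
       01  WS_AMOUNT_VAL     PIC 9(5)V99.
       01  WS_AMOUNT_NUM     PIC 9(5).9(2).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * NUMVAL accepts leading and trailing spaces. ROUNDED guards
      * against compilers where NUMVAL returns floating point.
           COMPUTE WS_AMOUNT_VAL ROUNDED =
               FUNCTION NUMVAL(WS_AMOUNT_TXT)
      * Moving into the numeric-edited item inserts the point.
           MOVE WS_AMOUNT_VAL TO WS_AMOUNT_NUM
           DISPLAY WS_AMOUNT_NUM
           STOP RUN.
```

Going through a true numeric item, rather than copying bytes with reference modification as in the question, is what aligns the value on the decimal point.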

One final piece of advice: MOVE SPACES TO INPUT_REC before each READ to clear any data left over
from a previous read. This will protect you when reading a very "short"
record after a "long" one; otherwise you may trip over data left behind by the previous read.

Hope this helps.

EDIT: I just noticed this answer to your question about reading variable-length files. Using a variable-length input record is a better approach. Given the
actual input record length, you can do something like:

UNSTRING INPUT_REC(1:REC_LEN) INTO...

Where REC_LEN is the variable specified after OCCURS DEPENDING ON for the INPUT_REC file FD. All the junk you are encountering occurs after the end of the record as defined by REC_LEN. Using reference modification as illustrated above trims it off before UNSTRING does its work to separate out the individual data fields.

EDIT 2:
Cannot use reference modification with UNSTRING. Darn... It is possible with some other COBOL dialects but not with OpenVMS COBOL. Try the following:

MOVE INPUT_REC(1:REC_LEN) TO WS_BUFFER
UNSTRING WS_BUFFER INTO...

Where WS_BUFFER is a working storage PIC X variable long enough to hold the longest input record. When you MOVE a short alphanumeric field to a longer one, the destination field (i.e. WS_BUFFER) is left-justified, with spaces padding the remaining positions. Since leading and trailing spaces are acceptable to the NUMVAL function, you have exactly what you need.

I have a reason for pushing you in this direction. Any junk that ends up at the trailing end of a record buffer when reading a short record is undefined. There is a possibility that some of that junk just might end up being a digit or a decimal point. Should this occur, the cleanup routine I originally suggested would fail.
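The copy-then-UNSTRING approach can be sketched as a small standalone program. This is only an illustration under stated assumptions: INPUT_REC and REC_LEN are declared in working storage here as stand-ins for the FD record and its length, the "#" characters simulate junk after the real record, and the receiving field sizes are guesses rather than values from the question.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. BUFDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
      * Stand-ins for the FD record and its length; in the real
      * program the READ would fill these in.
       01  INPUT_REC         PIC X(66)     VALUE ALL "#".
       01  REC_LEN           PIC 9(4) COMP VALUE 51.
       01  WS_BUFFER         PIC X(66).
      * Field sizes below are guesses, not from the question.
       01  WS_ID_1           PIC X(10).
       01  WS_ID_2           PIC X(8).
       01  WS_CODE           PIC X(5).
       01  WS_DESCRIPTION    PIC X(20).
       01  WS_FLAG           PIC X(2).
       01  WS_AMOUNT_TXT     PIC X(8).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * The first 51 bytes are the record; the "#" bytes after
      * them play the role of junk left in the buffer.
           MOVE "0005464560,45667759,ZAMTR,!To ACC 12345678,DR,79.85"
               TO INPUT_REC(1:51)
      * Copy only the real data; the MOVE pads with spaces.
           MOVE INPUT_REC(1:REC_LEN) TO WS_BUFFER
           UNSTRING WS_BUFFER DELIMITED BY ","
               INTO WS_ID_1, WS_ID_2, WS_CODE, WS_DESCRIPTION,
                    WS_FLAG, WS_AMOUNT_TXT
           END-UNSTRING
           DISPLAY "[" WS_AMOUNT_TXT "]"
           STOP RUN.
```

Because the junk never reaches WS_BUFFER, the last receiving field holds only the amount followed by spaces, regardless of what the junk bytes happen to be.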

EDIT 3:
There is no ^@ in the resulting WS_AMOUNT_TXT, but there is still a ^M

Looks like the file system is treating <CR> (that ^M thing) at the end of each record as data.

If the file you are reading came from a Windows platform and you are now
reading it on a UNIX platform that would explain the problem. Under Windows records
are terminated with <CR><LF> while on UNIX they are terminated with <LF> only. The
UNIX file system treats <CR> as if it were part of the record.

If this is the case, you can be pretty sure that there will be a single <CR> at the
end of every record read. There are a number of ways to deal with this:

Method 1: As you already noted, pre-edit the file using Notepad++ or some other
tool to remove the <CR> characters before processing through your COBOL program.
Personally I don't think this is the best way of going about it. I prefer to use a COBOL
only solution since it involves fewer processing steps.

Method 2: Trim the last character from each input record before processing it. The last
character should always be <CR>. Try the following if you
are reading records as variable length and have the actual input record length available.

SUBTRACT 1 FROM REC_LEN
MOVE INPUT_REC(1:REC_LEN) TO WS_BUFFER
UNSTRING WS_BUFFER INTO...

Method 3: Treat <CR> as a delimiter when UNSTRINGing as follows:

UNSTRING INPUT_REC DELIMITED BY "," OR x"0D"
    INTO WS_ID_1, WS_ID_2, WS_CODE, WS_DESCRIPTION, WS_FLAG, WS_AMOUNT_TXT

Method 4: Condition the last receiving field from UNSTRING by replacing trailing
non-digit/non-decimal-point characters with spaces. I outlined this solution a little earlier in this
answer. You could also explore the INSPECT statement using the REPLACING option (Format 2). This should be able to do pretty much the same thing: just replace all x"00" by SPACE and all x"0D" by SPACE.
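A minimal sketch of the INSPECT variant, assuming the compiler accepts X"..." hexadecimal literals as used in Method 3. The three MOVE statements with reference modification only simulate a field that picked up a carriage return and two NUL bytes; they are not part of the fix.

```cobol
       IDENTIFICATION DIVISION.
       PROGRAM-ID. INSPDEMO.
       DATA DIVISION.
       WORKING-STORAGE SECTION.
       01  WS_AMOUNT_TXT     PIC X(8).
       PROCEDURE DIVISION.
       MAIN-PARA.
      * Simulate a field holding 79.85, <CR> and two NUL bytes.
           MOVE "79.85" TO WS_AMOUNT_TXT
           MOVE X"0D"   TO WS_AMOUNT_TXT(6:1)
           MOVE X"00"   TO WS_AMOUNT_TXT(7:1)
           MOVE X"00"   TO WS_AMOUNT_TXT(8:1)
      * Replace every NUL and <CR> in the field with a space.
           INSPECT WS_AMOUNT_TXT
               REPLACING ALL X"00" BY SPACE
                         ALL X"0D" BY SPACE
           DISPLAY "[" WS_AMOUNT_TXT "]"
           STOP RUN.
```

After the INSPECT the field holds 79.85 followed by three spaces, which NUMVAL can then convert. Unlike the backward scan, this only handles the two byte values named, so any other junk byte would survive.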

Where there is a will, there is a way. Any of the above solutions should work for you. Choose the one you are most comfortable with.

長街聽風 2024-12-03 21:19:36

^M is a carriage return.

Would Google Refine be useful for rectifying this data?
