How to convert packed decimal format (S370Fpd5) in R?

Posted 2025-01-14 20:02:46

Can the packed decimal format S370Fpd5 be converted with R or Python? Below are examples showing the actual output after ASCII conversion, the expected output, and the raw bytes in hex.

ACTUAL OUTPUT	EXPECTED OUTPUT	HEX
....@	647	00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 c0 40
.\177...	703048	00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 7f 03 9c f0
.....	859902	00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 97 df b0 8c

娜些时光，永不杰束 2025-01-21 20:02:46

That's not packed decimal data.
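One way to check this claim, sketched in Python: in a packed decimal field every nibble except the last must be a decimal digit, and the last nibble must be a sign (C, D, or F are the preferred values; A, B, and E are also accepted). The helper name `looks_like_packed_decimal` is made up for this illustration, and the samples below are only the trailing four bytes of each row from the question, since the leading zero bytes are valid digits and do not change the result.

```python
SIGN_NIBBLES = set("abcdef")  # C/D/F preferred; A/B/E are also accepted as signs


def looks_like_packed_decimal(hex_bytes: str) -> bool:
    """True if every nibble but the last is 0-9 and the last is a sign nibble."""
    nibbles = bytes.fromhex(hex_bytes).hex()  # normalizes spacing and case
    if not nibbles:
        return False
    return nibbles[:-1].isdigit() and nibbles[-1] in SIGN_NIBBLES


# Trailing bytes of the three rows in the question (expected 647, 703048, 859902).
samples = ["00 00 c0 40", "7f 03 9c f0", "97 df b0 8c"]
results = [looks_like_packed_decimal(s) for s in samples]
print(results)
```

None of the three rows passes the check: each contains nibbles such as `c`, `f`, `d`, or `b` in digit positions, and two of them end in `0`, which is not a sign.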

It is common for mainframe data to include both text and binary data in a single record, for example a name, a currency amount, and a quantity:

Hopper    Grace     ar% .

...which would be...

x'C8969797859940404040C799818385404040404081996C004B'

...in hex. This is code page 37, commonly referred to as EBCDIC.

Without knowing that the family name is confined to the first 10 bytes, the given name is confined to the subsequent 10 bytes, the currency amount is in packed decimal (a form of binary coded decimal) in the next 3 bytes, and the quantity is in the next two bytes, you cannot accurately transfer the data, because code page conversion will destroy the currency amount. Converting to code page 1250, commonly in use on Microsoft Windows, you would end up with...

x'486F707065722020202047726163652020202020617225002E'

...where the text data is translated but the packed data is destroyed. The packed data no longer has a valid sign in the last nibble (the lower half of the last byte), and the currency amount itself has changed. The quantity has changed too, from decimal 75 to decimal 11,776, because of both the code page conversion and the big-endian number being read as a little-endian number.
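The corruption can be reproduced with a short Python sketch. It assumes Python's built-in `cp037` and `cp1250` codecs stand in for whatever transfer tool performed the conversion; the byte offsets come from the record layout described above.

```python
record = bytes.fromhex("C8969797859940404040C799818385404040404081996C004B")

# Whole-record conversion: treat all 25 bytes as code page 37 text,
# then re-encode them as code page 1250.
converted = record.decode("cp037").encode("cp1250")
print(converted.hex().upper())

# The packed field's last nibble was C (a valid sign); after conversion it is 5.
sign_before = record[22] & 0x0F
sign_after = converted[22] & 0x0F
print(f"sign nibble: {sign_before:X} -> {sign_after:X}")

# The quantity is a big-endian binary number on the mainframe side.
qty_before = int.from_bytes(record[23:25], "big")
# Read on a little-endian machine after conversion, it becomes something else.
qty_after = int.from_bytes(converted[23:25], "little")
print(qty_before, qty_after)
```

The two text fields come through as readable ASCII, while the three packed bytes and the two binary bytes are silently rewritten as if they were characters.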

In my experience, the best way to avoid these difficulties is to preprocess the file on the mainframe, converting all binary and packed decimal fields to text with embedded explicit signs and decimal points. Then the file can safely go through code page (EBCDIC to ASCII in this case) conversion.

Such preprocessing can easily be done with the mainframe SORT utility, which typically excels at data transformations.

This is from a longer piece I wrote about reading mainframe data on non-mainframe platforms.

There's probably a library to convert the data field-by-field from the source code page to the target code page. For better or worse, requests for recommendations for off-site resources are considered off-topic. You cannot convert an entire file whose records contain packed decimal and/or binary data from one code page to another without at least risking and probably causing data corruption.
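As a minimal sketch of what field-by-field conversion means, here is the example record decoded in Python using only the standard library. The function names and dictionary keys are invented for this illustration, the quantity is assumed to be unsigned, and the packed amount is returned as an unscaled integer because the position of the implied decimal point comes from the record layout, not from the data.

```python
def unpack_packed_decimal(field: bytes) -> int:
    """Decode a packed decimal field: digit nibbles followed by one sign nibble."""
    nibbles = field.hex()
    if not nibbles or not nibbles[:-1].isdigit() or nibbles[-1] not in "abcdef":
        raise ValueError(f"not valid packed decimal: {nibbles!r}")
    value = int(nibbles[:-1])
    # B and D mark negative values; A, C, E, and F mark positive ones.
    return -value if nibbles[-1] in "bd" else value


def parse_record(record: bytes) -> dict:
    """Convert each field by its own rule instead of translating the whole record."""
    return {
        "family_name": record[0:10].decode("cp037").rstrip(),   # EBCDIC text
        "given_name": record[10:20].decode("cp037").rstrip(),   # EBCDIC text
        "amount_unscaled": unpack_packed_decimal(record[20:23]),  # packed decimal
        "quantity": int.from_bytes(record[23:25], "big"),       # big-endian binary
    }


record = bytes.fromhex("C8969797859940404040C799818385404040404081996C004B")
parsed = parse_record(record)
print(parsed)
```

Only the text fields go through the code page codec; the packed and binary fields are read directly from the untranslated bytes, which requires transferring the file in binary mode.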
