Convert EBCDIC Char to Hex Value (AFP EBCDIC data)

Posted 07-16 20:07 · 1038 words · 13 views · 0 comments

I'm working with some EBCDIC data that I need to parse to find some hex values. The problem is that I appear to be reading the file with the wrong encoding. I can see that my record begins with "!" (which is x5A in EBCDIC), but when I convert it to hex it comes back as x21, which is the ASCII value for "!".

I was hoping that there was a built-in method in the framework, but I'm afraid that I'm going to have to create a custom class to correctly map the EBCDIC character set.

Using fileInStream As New FileStream(inputFile, FileMode.Open, FileAccess.Read)
   Using bufferedInStream As New BufferedStream(fileInStream)
      Using reader As New StreamReader(bufferedInStream, Encoding.GetEncoding(37))
         While Not reader.EndOfStream
            Do While reader.Peek() >= 0
               Dim charArray(52) As Char
               reader.Read(charArray, 0, charArray.Length)

               For Each letter As Char In charArray
                  Dim value As Integer = Convert.ToInt16(letter)

                  Dim hexOut As String = [String].Format("{0:x}", value)
                  Debug.WriteLine(hexOut)
               Next
            Loop
         End While
      End Using
   End Using
End Using

Thanks!

Comments (3)

倾听心声的旋律 2024-07-23 20:07:14

You can do it like this:

  1. Open the AFP file. Read the first 9 bytes.
  2. Byte 0 should be 0xD3 or 0x5A. Bytes 1 and 2 hold the length of the SFI, which counts 8 of the 9 bytes you just read. It is big-endian, so length = byte1 * 256 + byte2.
  3. Bytes 3, 4, and 5 are the Structured Field Identifier. If you're looking for printable text, look for PTX (Presentation Text Data), 0xD3 0xEE 0x9B. If you didn't find it, skip ahead length-8 bytes and read the next 9 bytes.
  4. If you did find a PTX, read length-8 bytes. Parsing through the control sequences to get to the text is a little tricky. The first will start with 0x2B 0xD3, followed by a byte for the length and a byte for the kind of control sequence. If that type byte is an odd number, the next control sequence will omit the 0x2B 0xD3 header and start with the length byte instead. This is called "chaining" and was apparently introduced to drive programmers trying to parse this stuff insane.
  5. From the length byte, skip ahead length-1 bytes and press on, or just look for the next 0x2B 0xD3. The last control sequence will not be chained, and everything following it to the end of the PTX will be EBCDIC. Decode that with Jon Skeet's library (thanks, Jon), then look for the next PTX element.

Sorry I was long-winded. It is doable, but not simple.
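
The steps above could be sketched roughly as follows. This is only an illustration of the description in this answer, not a full AFP or PTOCA parser. The module and function names (`AfpTextSketch`, `ReadPtxPayloads`, `ExtractText`) are made up for the example, and it assumes the framework's built-in code page 37 (the same `Encoding.GetEncoding(37)` the question uses) is an acceptable stand-in for the EBCDIC library mentioned in step 5. On .NET Core and later, code page 37 is only available after registering `CodePagesEncodingProvider.Instance`.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text

Module AfpTextSketch
    ' Steps 1-3: walk the structured fields and collect the data of each PTX field.
    ' Note: the BinaryReader closes the stream when it is disposed.
    Public Function ReadPtxPayloads(input As Stream) As List(Of Byte())
        Dim payloads As New List(Of Byte())
        Using reader As New BinaryReader(input)
            While True
                Dim header As Byte() = reader.ReadBytes(9)
                If header.Length = 0 Then Exit While ' clean end of file
                If header.Length < 9 Then Throw New EndOfStreamException("Truncated field header.")

                ' Bytes 1-2: big-endian length that counts 8 of the 9 header bytes.
                Dim length As Integer = header(1) * 256 + header(2)
                If length < 8 Then Throw New InvalidDataException("Field length below 8.")

                Dim data As Byte() = reader.ReadBytes(length - 8)
                If data.Length < length - 8 Then Throw New EndOfStreamException("Truncated field data.")

                ' Bytes 3-5: structured field identifier; PTX is D3 EE 9B.
                If header(3) = &HD3 AndAlso header(4) = &HEE AndAlso header(5) = &H9B Then
                    payloads.Add(data)
                End If
            End While
        End Using
        Return payloads
    End Function

    ' True when an unchained control sequence prefix (0x2B 0xD3) starts at pos.
    Private Function IsPrefixAt(ptx As Byte(), pos As Integer) As Boolean
        Return pos + 1 < ptx.Length AndAlso ptx(pos) = &H2B AndAlso ptx(pos + 1) = &HD3
    End Function

    ' Steps 4-5: skip the control sequences and decode whatever lies between them as EBCDIC.
    Public Function ExtractText(ptx As Byte()) As String
        Dim ebcdic As Encoding = Encoding.GetEncoding(37) ' assumption: code page 37 fits the data
        Dim text As New StringBuilder()
        Dim pos As Integer = 0
        Dim chained As Boolean = False

        While pos < ptx.Length
            If Not chained Then
                If IsPrefixAt(ptx, pos) Then
                    pos += 2 ' step over 0x2B 0xD3 to the length byte
                Else
                    ' Not a control sequence: treat bytes up to the next prefix as text.
                    Dim start As Integer = pos
                    While pos < ptx.Length AndAlso Not IsPrefixAt(ptx, pos)
                        pos += 1
                    End While
                    text.Append(ebcdic.GetString(ptx, start, pos - start))
                    Continue While
                End If
            End If

            If pos + 1 >= ptx.Length Then Exit While ' incomplete control sequence
            Dim length As Integer = ptx(pos)         ' counts the length byte itself
            If length < 2 Then Exit While            ' malformed; avoid looping forever
            chained = (ptx(pos + 1) And 1) = 1       ' odd type byte: next one is chained
            pos += length
        End While
        Return text.ToString()
    End Function
End Module
```

Calling `ReadPtxPayloads(File.OpenRead(path))` and passing each returned array to `ExtractText` would give the text of each PTX field. Real documents may carry text inside specific control sequences as well, so treat this as a starting point rather than a complete extractor.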

人生戏 2024-07-23 20:07:14

Yes, when you read the text data in as strings, it's storing it internally as Unicode. If you care about the binary values (i.e. the raw bytes) then don't decode it in the first place.

If you really need to do anything with a custom EBCDIC encoding, you can use my open source EBCDIC implementation - but I think you really just need to make up your mind as to whether you're treating this as binary data or text.
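
A minimal sketch of the "don't decode it" route, assuming the goal is simply to see the raw hex values: read the file as bytes and format each byte, with no `StreamReader` or `Encoding` involved. The module name `RawHexDump` and the helper `ToHex` are hypothetical, and `args(0)` stands in for the input path.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module RawHexDump
    ' Formats each byte as two lowercase hex digits without decoding it as text.
    Public Function ToHex(bytes As Byte()) As String
        Dim sb As New StringBuilder(bytes.Length * 2)
        For Each b As Byte In bytes
            sb.Append(b.ToString("x2"))
        Next
        Return sb.ToString()
    End Function

    Sub Main(args As String())
        ' args(0): hypothetical path to the EBCDIC/AFP input file.
        Dim bytes As Byte() = File.ReadAllBytes(args(0))
        Console.WriteLine(ToHex(bytes))
    End Sub
End Module
```

Read this way, a record that starts with an EBCDIC "!" shows up as 5a, because no character conversion has taken place. For very large files, reading fixed-size chunks from a `FileStream` would avoid loading everything into memory at once.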

从﹋此江山别 2024-07-23 20:07:14

Be careful reading AFP data that way. It is big-endian in both byte and bit order. You will need to account for that if you are treating it as binary data, such as parsing through the Structured Fields in a document.
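
As a small illustration of what accounting for this might look like, the sketch below assembles a 16-bit big-endian value by hand instead of using `BitConverter.ToUInt16`, which follows the machine's byte order and would return the bytes swapped on little-endian hardware. The second helper assumes that "big-endian bit order" means bit 0 is the most significant bit of a byte; that reading is an assumption here, and both function names are hypothetical.

```vbnet
Imports System

Module BigEndianSketch
    ' Reads an unsigned 16-bit big-endian value starting at offset.
    Public Function ReadUInt16BE(buffer As Byte(), offset As Integer) As Integer
        Return (CInt(buffer(offset)) << 8) Or buffer(offset + 1)
    End Function

    ' Tests a flag bit numbered 0-7, where bit 0 is assumed to be the
    ' most significant bit of the byte.
    Public Function IsBitSet(value As Byte, bitNumber As Integer) As Boolean
        Return (value And (&H80 >> bitNumber)) <> 0
    End Function
End Module
```

For example, `ReadUInt16BE(New Byte() {&H01, &H10}, 0)` combines the two bytes as 0x0110, which is 272.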
