0x00 in a binary file in VB.NET

Posted on 2024-08-03 06:43:21

UPDATED BELOW

I am reading a Binary file using BinaryReader in VB.NET.
The structure of each row in the file is:

    "Category" = 1 byte
    "Code" = 1 byte
    "Text" = 60 Bytes

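    ' fs is an already-open FileStream on the data file.
    ' Requires Imports System.IO and Imports System.Text.
    Dim encASCII As Encoding = Encoding.ASCII
    Dim Category As Byte
    Dim Code As Byte
    Dim byText() As Byte
    Dim chText() As Char
    Dim br As New BinaryReader(fs)

    Category = br.ReadByte()
    Code = br.ReadByte()
    byText = br.ReadBytes(60)
    chText = encASCII.GetChars(byText)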

The problem is that the "Text" field has some funky characters used for padding.
Mostly seems to be 0x00 null characters.

  1. Is there any way to get rid of these 0x00 characters by some Encoding?

  2. Otherwise, how can I do a replace on the chText array to get rid of the 0x00 characters?
    I am trying to serialize the resulting DataTable to XML, and it fails on these non-compliant characters.
    I am able to loop through the array, but I cannot figure out how to do the replace.

UPDATE:

This is where I am now, with a lot of help from the guys/gals below.
The first solution works, but it is not as flexible as I had hoped. The second one fails for one use case, but it is much more generic.

Ad 1) I can solve the issue by passing the string to this subroutine

    Public Function StripBad(ByVal InString As String) As String
        Dim sb As New System.Text.StringBuilder
        For Each ch As Char In InString
            ' Replace every character at or below 0x25 ("%") with a space.
            If AscW(ch) <= &H25 Then
                sb.Append(" "c)
            Else
                sb.Append(ch)
            End If
        Next

        Return sb.ToString()
    End Function

Ad 2) This routine does take out several offending characters, but it fails for 0x00.
It was adapted from MSDN, http://msdn.microsoft.com/en-us/library/kdcak6ye.aspx.

    Public Function StripBadwithConvert(ByVal InString As String) As String
        Dim unicodeString As String
        unicodeString = InString
        ' Create two different encodings.
        Dim ascii As Encoding = Encoding.ASCII
        Dim [unicode] As Encoding = Encoding.UTF8

        ' Convert the string into a byte[].
        Dim unicodeBytes As Byte() = [unicode].GetBytes(unicodeString)

        Dim asciiBytes As Byte() = Encoding.Convert([unicode], ascii, unicodeBytes)

        Dim asciiChars(ascii.GetCharCount(asciiBytes, 0, asciiBytes.Length) - 1) As Char
        ascii.GetChars(asciiBytes, 0, asciiBytes.Length, asciiChars, 0)
        Dim asciiString As New String(asciiChars)

        Return asciiString
    End Function
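
A likely reason the conversion leaves 0x00 untouched is that NUL (U+0000) is itself a valid ASCII character, so there is nothing for Encoding.Convert to replace. XML 1.0, on the other hand, does not allow it. Below is a minimal sketch of a more generic filter, assuming .NET 4.0 or later, where XmlConvert.IsXmlChar is available. StripNonXmlChars and the sample bytes are hypothetical, not part of the original code:

```vb
Imports System
Imports System.Text
Imports System.Xml

Module XmlSafeText
    ' Hypothetical helper: drops every character that XML 1.0 does not allow.
    ' Surrogate pairs are not handled; that is fine for ASCII input.
    Public Function StripNonXmlChars(ByVal value As String) As String
        Dim sb As New StringBuilder(value.Length)
        For Each ch As Char In value
            If XmlConvert.IsXmlChar(ch) Then
                sb.Append(ch)
            End If
        Next
        Return sb.ToString()
    End Function

    Sub Main()
        ' "Hi" followed by two NUL padding bytes (made-up sample data).
        Dim raw As Byte() = {72, 105, 0, 0}
        Dim decoded As String = Encoding.ASCII.GetString(raw)
        Console.WriteLine(decoded.Length)                     ' prints 4: the NULs survive decoding
        Console.WriteLine(StripNonXmlChars(decoded).Length)   ' prints 2
    End Sub
End Module
```

Unlike a fixed threshold such as 0x25, this keeps tab, line feed and carriage return, which XML permits.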

Comments (3)

是你 2024-08-10 06:43:21

First of all, you should find out what the format of the text is, so that you are not just blindly removing something without knowing what you hit.

Depending on the format, you use different methods to remove the characters.
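
One way to check the format is to look at the raw bytes before decoding them. This is a minimal sketch, not taken from the original post: ToHexDump and the sample bytes are hypothetical, and BitConverter.ToString is the only framework call it relies on.

```vb
Imports System

Module PaddingInspector
    ' Hypothetical helper: renders bytes as hyphen-separated hex pairs.
    Public Function ToHexDump(ByVal bytes As Byte()) As String
        Return BitConverter.ToString(bytes)
    End Function

    Sub Main()
        ' "Hi" followed by two NUL padding bytes (made-up sample data).
        Dim sample As Byte() = {72, 105, 0, 0}
        Console.WriteLine(ToHexDump(sample))   ' prints 48-69-00-00
    End Sub
End Module
```

Running this on a few real records shows whether the padding is only 0x00 or also contains other values.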

To remove only the zero characters:

    ' Compact the array in place, skipping zero bytes.
    Dim len As Integer = 0
    For pos As Integer = 0 To byText.Length - 1
        If byText(pos) <> 0 Then
            byText(len) = byText(pos)
            len += 1
        End If
    Next
    Dim strText As String = Encoding.ASCII.GetString(byText, 0, len)

To remove everything from the first zero character to the end of the array:

    ' Count the bytes that precede the first zero byte.
    Dim len As Integer = 0
    While len < byText.Length AndAlso byText(len) <> 0
        len += 1
    End While
    Dim strText As String = Encoding.ASCII.GetString(byText, 0, len)

Edit:
If you want to keep only the bytes that happen to be ASCII characters (and drop everything else):

    ' Keep only bytes in the range 32..127.
    Dim len As Integer = 0
    For pos As Integer = 0 To byText.Length - 1
        If byText(pos) >= 32 AndAlso byText(pos) <= 127 Then
            byText(len) = byText(pos)
            len += 1
        End If
    Next
    Dim strText As String = Encoding.ASCII.GetString(byText, 0, len)

滥情哥ㄟ 2024-08-10 06:43:21

If the null characters are used as right padding (i.e. they terminate the text), which would be the normal case, this is fairly easy:

    Dim strText As String = encASCII.GetString(byText)
    Dim strlen As Integer = strText.IndexOf(Chr(0))
    If strlen <> -1 Then
        ' Keep everything before the first null character.
        strText = strText.Substring(0, strlen)
    End If

If not, you can still do a normal Replace on the string. It would be slightly "cleaner" if you did the pruning in the byte array, before converting it to a string. The principle remains the same, though.

    Dim strlen As Integer = Array.IndexOf(byText, CByte(0))
    If strlen = -1 Then
        ' No null byte found: use the whole array.
        strlen = byText.Length
    End If
    Dim strText As String = encASCII.GetString(byText, 0, strlen)
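
The plain Replace mentioned above, for nulls that may appear anywhere in the field, could look roughly like this. It is only a sketch: RemoveNulls, TrimNullPadding and the sample bytes are hypothetical and not from the original post.

```vb
Imports System
Imports System.Text

Module NullStripping
    ' Removes every NUL character, wherever it occurs in the string.
    Public Function RemoveNulls(ByVal value As String) As String
        Return value.Replace(vbNullChar, String.Empty)
    End Function

    ' Removes only the NUL characters at the end (right padding).
    Public Function TrimNullPadding(ByVal value As String) As String
        Return value.TrimEnd(ControlChars.NullChar)
    End Function

    Sub Main()
        ' "AB", a NUL in the middle, "C", then two NUL padding bytes (made-up data).
        Dim byText As Byte() = {65, 66, 0, 67, 0, 0}
        Dim decoded As String = Encoding.ASCII.GetString(byText)

        Console.WriteLine(RemoveNulls(decoded))              ' prints ABC
        Console.WriteLine(TrimNullPadding(decoded).Length)   ' prints 4: the inner NUL is kept
    End Sub
End Module
```

TrimEnd is the gentler choice when an embedded 0x00 would indicate corrupt data that you want to notice rather than hide.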

三生一梦 2024-08-10 06:43:21

You can use a struct to load the data (shown here in C#):

    using System.Runtime.InteropServices;

    // Sequential layout with Pack = 1 keeps the fields at byte offsets 0, 1 and 2.
    // A string field cannot sit at an unaligned FieldOffset in an explicit layout.
    [StructLayout(LayoutKind.Sequential, Pack = 1, CharSet = CharSet.Ansi)]
    internal struct TextFileRecord
    {
        public byte Category;
        public byte Code;
        // ByValTStr describes an inline, fixed-length character array;
        // LPTStr would describe a pointer to a string instead.
        [MarshalAs(UnmanagedType.ByValTStr, SizeConst = 60)]
        public string Text;
    }

You have to adjust the CharSet (or the UnmanagedType argument) to fit your string encoding. Note that ByValTStr treats the field as null-terminated, so a value that fills all 60 bytes may lose its last character.
