Splitting hex strings in a pandas DataFrame into 4-byte chunks

Posted on 2025-01-11 06:49:38

I have a dataframe like this:

idx  cola  colb  hexstring
0      2     a   2000001443b660280c25800380241c0000102000120000000000000000000003010
1      3     b   80b7d0082b7d0082b7d00821d640000102000
2      5     a   ffffffff34140038030000014
...

I want to split the hexstring column into 4-byte chunks (8 hex characters each), padding the incomplete chunk with zeros, like this:

idx  cola  colb  hexstring
0      2     a   00003010
0      2     a   00000000
0      2     a   00000000
0      2     a   02000120
0      2     a   41c00001
0      2     a   58003802
0      2     a   660280c2
0      2     a   0001443b
0      2     a   00000200
1      3     b   00102000
1      3     b   821d6400
1      3     b   0082b7d0
1      3     b   082b7d00
...

Is there a way to do this?

穿越时光隧道 2025-01-18 06:49:39

I am not very familiar with bytes and such, but this solution should get you there. As I understand it, you want to split each string into pieces of 8 characters, counting from the end, and if the leading piece has fewer than 8 characters, pad it with zeros in front.

My solution first pads each string with leading zeros so that its length is a multiple of eight.

import pandas as pd
import math

df = pd.DataFrame({"idx": [0, 1, 2],  # column name without the stray trailing spaces
                  "cola" : [2,3,5],
                  "colb" : ["a","b","a"],
                  "hexstring": ["2000001443b660280c25800380241c0000102000120000000000000000000003010", "80b7d0082b7d0082b7d00821d640000102000", "ffffffff34140038030000014"]})

df["hexstring"]
0    2000001443b660280c25800380241c0000102000120000...
1                80b7d0082b7d0082b7d00821d640000102000
2                            ffffffff34140038030000014
Name: hexstring, dtype: object

df["hexstring"] = df["hexstring"].apply(lambda x: x.zfill((math.ceil(len(x)/8))*8))
0    000002000001443b660280c25800380241c00001020001...
1             00080b7d0082b7d0082b7d00821d640000102000
2                     0000000ffffffff34140038030000014
Name: hexstring, dtype: object

zfill pads the string with leading zeros up to the given total length. For that length, I divide the length of the string by 8, round up to the next integer, and multiply by 8. Now every string has a length that is a multiple of 8.
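The padding arithmetic can be checked on a single string outside pandas. This is only a minimal sketch: the helper name pad_to_multiple is made up for illustration, and it uses integer arithmetic (-len(s) % width) instead of math.ceil, which should give the same target length.

```python
def pad_to_multiple(s, width=8):
    """Left-pad s with zeros so that len(s) becomes a multiple of width.

    Hypothetical helper for illustration; not part of pandas.
    """
    # Number of characters missing to reach the next multiple of width
    # (0 when the string is already aligned).
    missing = -len(s) % width
    # zfill takes the desired TOTAL length, not the number of zeros to add.
    return s.zfill(len(s) + missing)


padded = pad_to_multiple("ffffffff34140038030000014")
print(padded)       # 0000000ffffffff34140038030000014
print(len(padded))  # 32
```

The third sample string has 25 characters, so 7 zeros are added to reach 32.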

df["hexstring"] = df["hexstring"].apply(lambda x: [x[i:i+8] for i in range(0, len(x), 8)])

df["hexstring"]
0    [00000200, 0001443b, 660280c2, 58003802, 41c00...
1    [00080b7d, 0082b7d0, 082b7d00, 821d6400, 00102...
2             [0000000f, fffffff3, 41400380, 30000014]
Name: hexstring, dtype: object

df = df.explode('hexstring')
   idx    cola colb hexstring
0      0     2    a  00000200
0      0     2    a  0001443b
0      0     2    a  660280c2
0      0     2    a  58003802
0      0     2    a  41c00001
0      0     2    a  02000120
0      0     2    a  00000000
0      0     2    a  00000000
0      0     2    a  00003010
1      1     3    b  00080b7d
1      1     3    b  0082b7d0
1      1     3    b  082b7d00
1      1     3    b  821d6400
1      1     3    b  00102000
2      2     5    a  0000000f
2      2     5    a  fffffff3
2      2     5    a  41400380
2      2     5    a  30000014

The list comprehension above splits each padded string into groups of eight characters and keeps them in a list. After that, explode puts each group in its own row.
The results come out in the reverse of the order you asked for, because explode walks each list from the start. If you want your exact target, reverse the lists before exploding them. Use the line below in place of the earlier split step, on the padded strings (not on the already exploded column):

df["hexstring"] = df["hexstring"].apply(lambda x: [x[i:i+8] for i in range(0, len(x), 8)][::-1])  # [::-1] reverses the list
0    [00003010, 00000000, 00000000, 02000120, 41c00...
1    [00102000, 821d6400, 082b7d00, 0082b7d0, 00080...
2             [30000014, 41400380, fffffff3, 0000000f]

df = df.explode('hexstring')
   idx    cola colb hexstring
0      0     2    a  00003010
0      0     2    a  00000000
0      0     2    a  00000000
0      0     2    a  02000120
0      0     2    a  41c00001
0      0     2    a  58003802
0      0     2    a  660280c2
0      0     2    a  0001443b
0      0     2    a  00000200
1      1     3    b  00102000
1      1     3    b  821d6400
1      1     3    b  082b7d00
1      1     3    b  0082b7d0
1      1     3    b  00080b7d
2      2     5    a  30000014
2      2     5    a  41400380
2      2     5    a  fffffff3
2      2     5    a  0000000f
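
For convenience, the steps above (pad, split, reverse, explode) can be combined into one function. This is a rough sketch rather than a definitive implementation: the function name explode_hex_words and its parameters are my own, and I reset the index at the end on the assumption that you do not need the repeated row labels that explode leaves behind.

```python
import pandas as pd


def explode_hex_words(df, col="hexstring", width=8):
    """Return a copy of df with col split into width-character words,
    last word first, one word per row. Hypothetical helper."""

    def to_words(s):
        # Pad on the left so the length is a multiple of width.
        padded = s.zfill(len(s) + (-len(s) % width))
        # Cut into fixed-size pieces, then reverse so the tail comes first.
        words = [padded[i:i + width] for i in range(0, len(padded), width)]
        return words[::-1]

    out = df.copy()
    out[col] = out[col].map(to_words)
    # explode repeats the original index label for every word;
    # reset_index gives a clean 0..n-1 index instead.
    return out.explode(col).reset_index(drop=True)


df = pd.DataFrame({
    "cola": [2, 3, 5],
    "colb": ["a", "b", "a"],
    "hexstring": [
        "2000001443b660280c25800380241c0000102000120000000000000000000003010",
        "80b7d0082b7d0082b7d00821d640000102000",
        "ffffffff34140038030000014",
    ],
})

result = explode_hex_words(df)
print(result)
```

For the third row this gives 30000014, 41400380, fffffff3, 0000000f, matching the reversed output shown above. If you want to keep the original row labels, drop the reset_index call.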
