Base64-encoded data mixed with "random" hex
I get an input string containing some base64-encoded data. Unfortunately, random hexadecimal data (all lowercase) is mixed in with it. It's fairly straightforward to sort out by hand because the hex data all seems to come in segments of 32 characters. For example, I can format a sample string like this:
6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 VGhpcyBpcyBzb 6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 21lIGJhc2UtNjQ bb4af7e61760735ba17c29e8f542a668 75da91e90863f1ddb7e149297fc59afc f5de951fb65d06d2927aab7b9b54830e 2d935616a54c381c2f38db3731d5a378 gZW5jb2RlZCB 6dd11d15c419ac219901f14bdd999f38 0ad94e978ad624d15189f5230e5435a9 2dc19fe95e583e7d593dd52ae7e68a6e 465ffa6074a371a8958dad3ad271181a 23310939b981b4e56f2ecee26f82ec60 fe04bef49be47603d1278cc80673b226 kYXRhIGhvb3JheSE=
Basically, I need to pull the base64 parts out and decode them (in PHP). The catch is that I receive everything as one long string, and it's not always obvious where to put the line breaks. For example, the first base64 chunk ends in 'b', which is easily mistaken for part of the hex data. I'm at something of a loss for how to do this... Any ideas?
Thanks!
-mala
Comments (3)
I think this is an unanswerable problem: it is entirely possible to have 32 characters of base64-encoded data that cannot be distinguished from 32 characters of random hex. Without more information about the stream, there is no way to decide which bucket such data belongs in.
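To make that ambiguity concrete, here is a small sketch in PHP (the language the question asks about). It assumes the standard base64 alphabet and checks whether one of the hex segments from the example is itself accepted by `base64_decode()` in strict mode. The helper name `looksLikeBase64` is made up for illustration.

```php
<?php
// Hypothetical helper: true if $s is accepted as base64 by PHP's strict decoder.
function looksLikeBase64(string $s): bool
{
    // In strict mode base64_decode() returns false on characters outside the alphabet.
    return $s !== '' && strlen($s) % 4 === 0 && base64_decode($s, true) !== false;
}

$hexSegment = '6dd11d15c419ac219901f14bdd999f38'; // first hex segment from the example

var_dump(looksLikeBase64($hexSegment));             // the hex segment also passes as base64
var_dump(strlen(base64_decode($hexSegment, true))); // 32 base64 characters decode to 24 bytes
```

Every lowercase hex digit is also a base64 character, so a syntax check alone cannot tell the two apart.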
You could do it like:
Of course, there's the issue of the trailing a-z/0-9, but it's a starting point.
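The snippet this answer refers to is not included in this copy. As an assumption about what such a starting point could look like, the sketch below simply deletes every run of 32 lowercase hex characters with `preg_replace()`. The function name `stripHexNaive` and the shortened sample strings are invented for illustration; the second call shows the trailing-character problem.

```php
<?php
// Hypothetical starting point: delete every run of exactly 32 lowercase hex characters.
function stripHexNaive(string $input): string
{
    return preg_replace('/[0-9a-f]{32}/', '', $input);
}

$hex = '6dd11d15c419ac219901f14bdd999f38'; // one hex segment from the example

// The base64 chunk ends in "z", which cannot be hex, so the segment is removed cleanly.
echo stripHexNaive('VGhpcyBpcyBz' . $hex . 'b21l'), "\n";

// Same payload, but the chunk now ends in "b": the match starts one character early,
// so the leftover is no longer the original "VGhpcyBpcyBzb21l".
echo stripHexNaive('VGhpcyBpcyBzb' . $hex . '21l'), "\n";
```

In the second case the regex swallows the trailing 'b' of the base64 and leaves the last hex digit behind, which silently corrupts the payload.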
You could add some code that counts from the end of your suspected base64 to the beginning of the next [g-zA-Z] character and checks whether that count is divisible by 32. If it is, then you have probably found all of your original base64. If not, you won't have a clue whether the 'b' is the end of your base64 or the beginning of your hex, or whether the '6' is the end of your hex or the beginning of your NEXT base64 chunk.
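A minimal sketch of that divisibility check, under the assumption that hex segments are always exactly 32 lowercase characters. It reports each maximal run of hex-looking characters and flags the runs whose length is not a multiple of 32. The function name `findHexRuns` and the array keys are hypothetical.

```php
<?php
// Hypothetical helper: report every maximal run of lowercase hex characters
// that is long enough to hold at least one 32-character hex segment.
function findHexRuns(string $input): array
{
    $runs = [];
    // {32,} is greedy, so each match is a whole run bounded by a non-hex character or the string edge.
    preg_match_all('/[0-9a-f]{32,}/', $input, $matches, PREG_OFFSET_CAPTURE);
    foreach ($matches[0] as [$text, $offset]) {
        $extra = strlen($text) % 32;
        $runs[] = [
            'offset'    => $offset,
            'length'    => strlen($text),
            'extra'     => $extra,       // characters that must belong to neighbouring base64
            'ambiguous' => $extra !== 0, // counting alone cannot place the boundary
        ];
    }
    return $runs;
}

$hex = '6dd11d15c419ac219901f14bdd999f38';
foreach (findHexRuns($hex . 'VGhpcyBpcyBzb' . $hex . '21lIGJhc2U') as $run) {
    printf("offset %d, length %d, ambiguous: %s\n",
        $run['offset'], $run['length'], $run['ambiguous'] ? 'yes' : 'no');
}
```

A run with `extra` equal to 0 is only probably clean, since a base64 chunk could itself contain a long stretch of hex-looking characters.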
In short, this is stupid and it makes me sad.
There is the possibility that base64 decoding up to each decision point (are the next 32 characters base64 or hex?) might carry the clue.
There's also a very slim chance that interpreting one of those hex strings as base64 always yields easily detected garbage, whatever is being decoded.
Otherwise you're out of luck.
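The decoding clue suggested above can be sketched roughly as follows. This is only an illustration under loud assumptions: the payload is printable ASCII text, hex segments are exactly 32 lowercase characters, and each hex-looking run contains as many whole segments as fit. The names `isPlausibleText`, `extractPayload` and `tryParts` are hypothetical. For every ambiguous run it tries each way of handing the leftover characters to the base64 on either side, and rejects a split as soon as the decoded text turns into garbage.

```php
<?php
// Assumption: the payload is printable ASCII (plus tab, CR, LF); anything else counts as garbage.
function isPlausibleText(string $bytes): bool
{
    return preg_match('/^[\x20-\x7E\t\r\n]+$/', $bytes) === 1;
}

// Hypothetical entry point: returns the decoded payload, or null if no split decodes cleanly.
function extractPayload(string $input): ?string
{
    // Odd indexes hold runs of 32+ hex characters, even indexes hold everything in between.
    $parts = preg_split('/([0-9a-f]{32,})/', $input, -1, PREG_SPLIT_DELIM_CAPTURE);
    return tryParts($parts, 0, '');
}

function tryParts(array $parts, int $i, string $b64): ?string
{
    if ($i >= count($parts)) {
        $decoded = base64_decode($b64, true);
        return ($decoded !== false && isPlausibleText($decoded)) ? $decoded : null;
    }
    if ($i % 2 === 0) { // text between hex runs is kept as base64 unchanged
        return tryParts($parts, $i + 1, $b64 . $parts[$i]);
    }
    $run    = $parts[$i];
    $extra  = strlen($run) % 32;      // leftover characters that must be base64
    $hexLen = strlen($run) - $extra;  // whole 32-character segments to drop
    for ($lead = 0; $lead <= $extra; $lead++) {
        $candidate = $b64 . substr($run, 0, $lead) . substr($run, $lead + $hexLen);
        // Decision point: decode the complete 4-character groups so far and prune on garbage.
        $whole = substr($candidate, 0, intdiv(strlen($candidate), 4) * 4);
        if ($whole !== '' && !isPlausibleText((string) base64_decode($whole, true))) {
            continue;
        }
        $result = tryParts($parts, $i + 1, $candidate);
        if ($result !== null) {
            return $result;
        }
    }
    return null;
}

$hex = '6dd11d15c419ac219901f14bdd999f38';
var_dump(extractPayload($hex . 'VGhpcyBpcyBzb' . $hex . '21lIGJhc2UtNjQ='));
```

If the payload is binary rather than text, or a base64 chunk happens to contain 32 or more hex-looking characters in a row, these assumptions break down and the problem is ambiguous again. The search can also grow quickly when many runs are ambiguous.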