Decoding Base64 until no Base64 is left

Posted on 2024-09-29 03:12:15, 810 characters, 9 views, 0 comments

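My problem is, I think, quite simple. I need to keep decoding Base64 until no Base64 is left. I use a regex to check whether a string is Base64, but I have no idea how to keep decoding until the result is no longer Base64.

In the short code below I can decode until no Base64 is left, but only because the expected text is hard-coded: the loop keeps decoding until the result equals "Hello World".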

# Import libraries
from base64 import b64decode
import re

# Expected plain text and the multiply-encoded Base64 string
strText = "Hello World"
strEncode = "VmxSQ2ExWXlUWGxUYTJoUVVqSlNXRlJYY0hOT1ZteHlXa1pLVVZWWE9EbERaejA5Q2c9PQo=".encode("utf-8")

# Check that the whole input has the shape of Base64 (optional "=" padding at the end)
objRgx = re.search(r'^(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?$', strEncode.decode("utf-8"))

# Decode the first layer
strDecode = b64decode(objRgx.group(0).encode("utf-8"))
print(strDecode.decode("utf-8"))

# Keep decoding until the known plain text appears
while strDecode != strText.encode("utf-8"):
    strDecode = b64decode(strDecode)
    print(strDecode.decode("utf-8"))

Does anyone know how I can keep decoding the Base64 until the real text appears (no more Base64)?

P.S. Sorry for my bad English.



三五鸿雁 2024-10-06 03:12:15

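You can't, not in the general case. The problem is simply that normal, everyday words can also be valid Base64, so there is no reliable way to tell the two apart.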
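As a small illustration of that point, the sketch below asks Python's strict decoder whether a few ordinary words are acceptable Base64. The helper name `decodes_as_base64` and the word list are made up for this example.

```python
import base64
import binascii


def decodes_as_base64(text: str) -> bool:
    """Return True if a strict Base64 decoder accepts `text`."""
    try:
        # validate=True rejects characters outside the Base64 alphabet
        base64.b64decode(text, validate=True)
        return True
    except (binascii.Error, ValueError):
        return False


for word in ["Good", "test", "password", "Hello World"]:
    print(f"{word!r}: {decodes_as_base64(word)}")
```

Any word that uses only Base64 alphabet characters and has a length that is a multiple of 4 passes the check. "Hello World" is rejected only because it contains a space.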

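Base64 has no terminator other than its length. It can end with = or ==, but it does not have to: the = characters are only padding, and when no padding is needed there is no =. So the Base64 may end and ordinary text may begin without you being able to detect the boundary.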
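A minimal sketch of the missing boundary, with an invented payload and an invented trailing word: three bytes encode to exactly four characters with no padding, so text that follows directly is swallowed by the decoder.

```python
import base64

payload = b"abc"  # 3 bytes, so the encoder emits no "=" padding
encoded = base64.b64encode(payload).decode("ascii")
message = encoded + "done"  # Base64 immediately followed by plain text

print(encoded)  # YWJj

# The whole message is valid Base64, so nothing marks where the payload ends
decoded = base64.b64decode(message, validate=True)
print(decoded[:3])  # the real payload
print(decoded[3:])  # garbage produced from the word "done"
```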

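Edit for "So there is really no way to do what I want?":

No, not deterministically and not reliably. Even with a heuristic there will be cases where it fails: you consume too many characters, which leaves garbage at the end of your binary block and loses characters from the text that follows.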
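For the nested case in the question, one possible heuristic is sketched below. It assumes that every layer, including the final text, is valid UTF-8, and it stops as soon as the data no longer has the shape of Base64. The function name `decode_layers` and the `max_layers` limit are illustrative choices, not an established API.

```python
import base64
import binascii
import re
from typing import Tuple

# Shape check: groups of 4 alphabet characters with optional "=" padding
BASE64_RE = re.compile(
    r"(?:[A-Za-z0-9+/]{4})*(?:[A-Za-z0-9+/]{2}==|[A-Za-z0-9+/]{3}=)?"
)


def decode_layers(data: bytes, max_layers: int = 50) -> Tuple[bytes, int]:
    """Peel Base64 layers while the data still looks like Base64 text."""
    layers = 0
    while layers < max_layers:
        candidate = data.strip()  # tolerate trailing newlines
        if not candidate:
            break
        try:
            text = candidate.decode("ascii")
        except UnicodeDecodeError:
            break
        if not BASE64_RE.fullmatch(text):
            break
        try:
            decoded = base64.b64decode(text, validate=True)
            decoded.decode("utf-8")  # assumption: the payload is text
        except (binascii.Error, UnicodeDecodeError):
            break
        data = decoded
        layers += 1
    return data, layers


wrapped = b"Hello World"
for _ in range(5):
    wrapped = base64.b64encode(wrapped)

print(decode_layers(wrapped))  # (b'Hello World', 5)
print(decode_layers(b"dish"))  # (b'v+!', 1): a plain word decoded by mistake
```

The second call shows the failure mode described above: "dish" is an ordinary word, but it is also valid Base64 that decodes to valid UTF-8, so the heuristic removes one layer too many.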

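All of this applies to an arbitrary Base64 block. If you know what the binary data is, there may be hope.

Most binary formats "know" when they are "done". I don't know of a valid binary format that says "read until you reach EOF". They are typically laced with internal descriptors ("this is how much data the next chunk has") or with terminators that say "I'm done".

In these cases you can treat the Base64 as a stream. Base64 is basically pretty simple: it takes 3 bytes and converts them into 4 characters.

So a Base64 stream reader simply needs to read 4 characters at a time and return the 3 bytes they represent.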
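A rough sketch of such a reader, written as a generator over a text stream. The name `iter_base64_groups` is hypothetical, and the sketch assumes the stream contains no line breaks inside the Base64.

```python
import base64
import io
from typing import Iterator, TextIO


def iter_base64_groups(stream: TextIO) -> Iterator[bytes]:
    """Yield the decoded bytes of one 4-character group at a time."""
    while True:
        group = stream.read(4)
        if len(group) < 4:
            return  # a partial group cannot be decoded
        # Raises binascii.Error if the group is not valid Base64
        yield base64.b64decode(group, validate=True)
        if group.endswith("="):
            return  # padding marks the last group


stream = io.StringIO(base64.b64encode(b"abcdef").decode("ascii") + " trailing text")
reader = iter_base64_groups(stream)

first = next(reader)   # b"abc"
second = next(reader)  # b"def"
rest = stream.read()   # whatever follows the groups consumed so far
print(first, second, repr(rest))
```

Because the generator reads only when the consumer asks for more bytes, the stream position stays right after the last group that was actually needed.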

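If you have, say, a PNG reader, it can start reading the decoded stream. When it is "done" it "closes" the stream, and your original text starts right where the Base64 ends.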
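A real PNG parser is too long to show here, so the sketch below uses a hypothetical toy format instead: a 4-byte big-endian length followed by that many payload bytes. It also assumes the encoder emitted padding, or that the record length is a multiple of 3, so the Base64 ends on a 4-character boundary. The helper names are invented for the example.

```python
import base64
import io
import struct
from typing import TextIO


def read_decoded(stream: TextIO, count: int, buffer: bytearray) -> bytes:
    """Return `count` decoded bytes, pulling 4-character groups as needed."""
    while len(buffer) < count:
        group = stream.read(4)
        if len(group) < 4:
            raise EOFError("Base64 stream ended before the record was complete")
        buffer += base64.b64decode(group, validate=True)
    data = bytes(buffer[:count])
    del buffer[:count]
    return data


def read_record(stream: TextIO) -> bytes:
    """Read one length-prefixed record (toy format, not a real standard)."""
    buffer = bytearray()
    (length,) = struct.unpack(">I", read_decoded(stream, 4, buffer))
    return read_decoded(stream, length, buffer)


record = struct.pack(">I", 5) + b"hello"
message = base64.b64encode(record).decode("ascii") + " and then plain text"

stream = io.StringIO(message)
payload = read_record(stream)  # the format itself says when it is done
rest = stream.read()           # the text after the Base64
print(payload, repr(rest))
```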

This also works if you know the size of the original attachment. If someone sent 10,000 bytes, you use your Base64 stream decoder and simply read 10,000 bytes from it.
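When the size is known, the number of Base64 characters to consume can be computed directly: 4 characters per full 3 bytes, plus 2 or 3 characters for a final 1 or 2 bytes. The sketch below uses that arithmetic. `split_known_size` is a hypothetical helper, and it assumes the message contains no line breaks inside the Base64.

```python
import base64
from typing import Tuple


def split_known_size(message: str, size: int) -> Tuple[bytes, str]:
    """Decode exactly `size` bytes from the front of `message`.

    Returns the decoded bytes and the untouched remainder of the message.
    """
    full_groups, leftover = divmod(size, 3)
    # 1 leftover byte needs 2 characters, 2 leftover bytes need 3
    n_chars = full_groups * 4 + (0, 2, 3)[leftover]
    chunk, rest = message[:n_chars], message[n_chars:]
    n_pad = (4 - n_chars % 4) % 4
    # Skip the padding if the encoder emitted it
    if n_pad and rest.startswith("=" * n_pad):
        rest = rest[n_pad:]
    # Re-add padding so the standard decoder accepts the chunk
    data = base64.b64decode(chunk + "=" * n_pad, validate=True)
    if len(data) != size:
        raise ValueError("message is shorter than the announced size")
    return data, rest


print(split_known_size("YWJjZA==Hello", 4))  # padded:   (b'abcd', 'Hello')
print(split_known_size("YWJjZAHello", 4))    # unpadded: (b'abcd', 'Hello')
```

Both calls recover the same 4 bytes and the same trailing text, whether or not the padding is present.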

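More often than not, your Base64 will end with = or ==. The problem cases are the ones where it does not. The stream decoder works either way.

If you know neither the original size of the attachment nor the format of the encoded binary, you are pretty much out of luck.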

