Python unknown pattern finding
Okay, basically what I want is to compress a file by reusing code and then, at runtime, put the missing code back. What I've come up with is really ugly and slow, but at least it works. The problem is that the file has no specific structure, for example 'aGVsbG8=\n'; as you can see, it's base64-encoded. My function is really slow because the length of the file is 1700+ and it checks for patterns one character at a time. Please help me with new, better code, or at least help me optimize what I've got :). Anything that helps is welcome! BTW, I have already tried compression libraries, but they didn't compress as well as my ugly function.
def c_long(inp, cap=False, b=5):
    import re,string
    if cap is False: cap = len(inp)
    es = re.escape; le=len; ref = re.findall; ran = range; fi = string.find
    c = b;inpc = inp;pattern = inpc[:b]; l=[]
    rep = string.replace; ins = list.insert
    while True:
        if c == le(inpc) and le(inpc) > b+1: c = b; inpc = inpc[1:]; pattern = inpc[:b]
        elif le(inpc) <= b: break
        if c == cap: c = b; inpc = inpc[1:]; pattern = inpc[:b]
        p = ref(es(pattern),inp)
        pattern += inpc[c]
        if le(p) > 1 and le(pattern) >= b+1:
            if l == []: l = [[pattern,le(p)+le(pattern)]]
            elif le(ref(es(inpc[:c+2]),inp))+le(inpc[:c+2]) < le(p)+le(pattern):
                x = [pattern,le(p)+le(inpc[:c+1])]
                for i in ran(le(l)):
                    if x[1] >= l[i][1] and x[0][:-1] not in l[i][0]: ins(l,i,x); break
                    elif x[1] >= l[i][1] and x[0][:-1] in l[i][0]: l[i] = x; break
                inpc = inpc[:fi(inpc,x[0])] + inpc[le(x[0]):]
                pattern = inpc[:b]
                c = b-1
        c += 1
    d = {}; c = 0
    s = ran(le(l))
    for x in l: inp = rep(inp,x[0],'{%d}' % s[c]); d[str(s[c])] = x[0]; c += 1
    return [inp,d]

def decompress(inp,l): return apply(inp.format, [l[str(x)] for x in sorted([int(x) for x in l.keys()])])
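The one-line decompress() above is dense and relies on apply(), which exists only in Python 2. As a sketch of what it does, here is an equivalent version for Python 3. The parameter names compressed and substitutions and the helper decompress_from_list are illustrative, not part of the original code, and the approach assumes the compressed text contains no literal braces (true for base64 data).

```python
def decompress(compressed, substitutions):
    """Fill the {0}, {1}, ... placeholders in `compressed`.

    `substitutions` maps the string keys "0", "1", ... to the replaced
    patterns, which is the shape of the dict returned by c_long().
    """
    # Order the patterns by numeric key so that {0} receives the value
    # stored under "0", {1} the value stored under "1", and so on.
    ordered = [substitutions[key] for key in sorted(substitutions, key=int)]
    return compressed.format(*ordered)


def decompress_from_list(compressed, patterns):
    """Hypothetical variant: the patterns are kept in a plain list."""
    return compressed.format(*patterns)


print(decompress("{0}-xyz-{0}-{1}", {"0": "abcdef", "1": "ghijkl"}))
# abcdef-xyz-abcdef-ghijkl
```

Sorting with key=int matters once there are more than ten patterns, because a plain string sort would place "10" before "2".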
Comments (2)
The easiest way to compress base64-encoded data is to first convert it to binary data. This alone saves 25 percent of the storage space.
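A minimal sketch of that conversion, written for Python 3 (the code in this thread is Python 2). The function names b64_to_binary and binary_to_b64 are made up for illustration. It uses the sample string from the question.

```python
import base64


def b64_to_binary(b64_text):
    """Decode base64 text into the raw bytes it represents."""
    return base64.b64decode(b64_text)


def binary_to_b64(raw):
    """Re-encode raw bytes as base64 text (without line breaks)."""
    return base64.b64encode(raw).decode("ascii")


sample = "aGVsbG8="                 # example string from the question
raw = b64_to_binary(sample)
print(raw)                          # b'hello'
print(len(sample), len(raw))        # 8 5
assert binary_to_b64(raw) == sample
```

Base64 stores every 3 bytes as 4 characters, which is where the 25 percent saving comes from. Padding, as in this short sample, makes the saving slightly larger.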
In most cases, you can compress the data even further with a general-purpose compression algorithm, for example t.encode("bz2") or t.encode("zlib") in Python 2, where t is presumably the decoded binary string.
A few remarks on your code: many things make it hard to read, including inconsistent spacing, overly long lines, meaningless variable names, and unidiomatic constructs. For example, your decompress() function could be rewritten in an equivalent but more readable form, which makes it much more obvious what it does. You could go one step further: why is the substitutions argument (l in your code) a dictionary with the string keys "0", "1", etc.? Not only is it strange to use strings instead of integers, you don't need the keys at all! A simple list will do, and decompress() becomes simpler still.
You might think all this is secondary, but if you make the rest of your code equally readable, you will find the bugs in it yourself. (There are bugs: it crashes for "abcdefgabcdefg" and many other strings.)

Typically one would run the program through a compression algorithm optimized for text and then pass the decompressed result to exec.
It may be that .pyc/.pyo files are already compressed. One could check by creating one from a file containing x="""aaaaaaaa""", then increasing the length to x="""aaaaaaaaaaaaaaaaaaaaaaa...aaaa""" and seeing whether the size changes appreciably.
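One possible reading of the compress-then-exec suggestion, as a sketch in Python 3. The choice of zlib, the base64 wrapping (added here so the compressed bytes can be pasted into a .py file as ordinary text), and the names source, packed and namespace are all assumptions for illustration, not something the thread specifies.

```python
import base64
import zlib

# A small stand-in program with a long, repetitive string literal.
source = 'x = """' + "a" * 200 + '"""\nresult = len(x)\n'

# Compress the program text, then wrap it in base64 so it is plain ASCII.
packed = base64.b64encode(zlib.compress(source.encode("utf-8"), 9))

# At runtime: undo both steps and execute the recovered source.
namespace = {}
exec(zlib.decompress(base64.b64decode(packed)).decode("utf-8"), namespace)

print(len(source), len(packed))     # the packed form is much shorter here
print(namespace["result"])          # 200
```

The gain depends heavily on how repetitive the input is. Data that is already base64-encoded binary tends to compress better after it has been decoded to raw bytes first.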