How can I prevent decompilation or inspection of Python code?
Let us assume that there is a big commercial project (a.k.a. Project) which uses Python under the hood to manage plugins for configuring new control surfaces that can be attached to and used by Project.
There was a small information leak: part of Project's Python API became public, and people were able to write Python scripts that the underlying Python implementation called as part of Project's plugin loading mechanism.
Furthermore, by using the inspect module and reading raw __dict__ attributes, people were able to work out a major part of Project's underlying Python implementation.
Is there a way to keep the Python code secret?
A quick look at Python's documentation revealed a way to suppress imports of the inspect module:
import sys
sys.modules['inspect'] = None
Does it solve the problem completely?
Comments (5)
No, this does not solve the problem. Someone could just rename the inspect module to something else and import it.
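To make that concrete, here is a minimal sketch of how a plugin could get around the sys.modules block. The helper load_under_alias and the alias not_inspect are made up for illustration; the sketch assumes a standard CPython install where inspect can be found on sys.path.

```python
import importlib.machinery
import importlib.util
import sys


def load_under_alias(real_name, alias):
    """Load a module's file under another name, ignoring sys.modules[real_name]."""
    # PathFinder searches sys.path directly and does not consult sys.modules,
    # so the None sentinel placed there has no effect on it.
    spec = importlib.machinery.PathFinder.find_spec(real_name)
    if spec is None or spec.origin is None:
        raise ImportError(f"cannot locate {real_name!r} on sys.path")
    alias_spec = importlib.util.spec_from_file_location(alias, spec.origin)
    module = importlib.util.module_from_spec(alias_spec)
    sys.modules[alias] = module  # register under the alias before executing
    alias_spec.loader.exec_module(module)
    return module


saved = sys.modules.get("inspect")  # remember the real entry so it can be restored
sys.modules["inspect"] = None       # the block from the question
try:
    try:
        import inspect
        blocked = False
    except ImportError:
        blocked = True              # the plain import really is refused

    # Hypothetical alias: the same code, just under a different name.
    snoop = load_under_alias("inspect", "not_inspect")
    recovered = snoop.signature(load_under_alias)
finally:
    if saved is None:
        sys.modules.pop("inspect", None)
    else:
        sys.modules["inspect"] = saved

print(blocked, recovered)
```

Simply deleting the sys.modules entry and importing again would work as well; the alias variant only shows that blocking a name does not block the code behind it.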
What you're trying to do is not possible. The Python interpreter must be able to take your bytecode and execute it. Someone will always be able to decompile the bytecode. They will always be able to produce an AST and view the flow of the code with variable and class names.
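As a rough illustration of how much survives compilation, the sketch below compiles a made-up function (license_check and its contents are invented for this example), serializes the code object with marshal much as a .pyc file would store it, and then reads names and constants straight back out.

```python
import dis
import marshal
import types

# Invented "secret" source; only the compiled form would be shipped.
source = '''
def license_check(user_key):
    secret_salt = "s3cr3t"
    return hash(user_key + secret_salt)
'''

module_code = compile(source, "<hidden>", "exec")
blob = marshal.dumps(module_code)   # roughly what a .pyc holds after its header
restored = marshal.loads(blob)      # what anyone holding the file can do

# The function body is stored as a nested code object among the constants.
func_code = next(c for c in restored.co_consts if isinstance(c, types.CodeType))

print(func_code.co_name)        # function name
print(func_code.co_varnames)    # argument and local variable names
print(func_code.co_consts)      # literal constants, including the string
listing = dis.Bytecode(func_code).dis()  # human-readable instruction listing
print(listing)
```

Nothing here needs the inspect module; function names, variable names and string literals are all part of the bytecode the interpreter has to be given.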
Note that this process can also be done with compiled language code; the difference there is that you will get assembly. Some tools can infer C structure from the assembly, but I don't have enough experience with that to comment on the details.
What specific piece of information are you trying to hide? Could you keep the algorithm server side and make your software into a client that touches your web service? Keeping the code on a machine you control is the only way to really keep control over the code. You can't hand someone a locked box, the keys to the box, and prevent them from opening the box when they have to open it in order to run it. This is the same reason DRM does not work.
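A minimal sketch of that client/server split, using only the standard library and a loopback server so it runs in one process. The proprietary_score function, the /score path and the JSON shape are all assumptions made up for the example.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def proprietary_score(values):
    """Stand-in for the algorithm to protect; it only ever runs on the server."""
    return sum(v * v for v in values) % 97


class ScoreHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        body = json.dumps({"score": proprietary_score(payload["values"])}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def remote_score(base_url, values):
    """Client side: sends inputs, receives a result, never sees the algorithm."""
    request = urllib.request.Request(
        base_url + "/score",
        data=json.dumps({"values": values}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # An empty ProxyHandler keeps loopback traffic away from any configured proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())["score"]


server = HTTPServer(("127.0.0.1", 0), ScoreHandler)  # port 0: pick a free port
threading.Thread(target=server.serve_forever, daemon=True).start()
try:
    result = remote_score(f"http://127.0.0.1:{server.server_port}", [1, 2, 3])
finally:
    server.shutdown()
    server.server_close()

print(result)
```

In a real deployment the handler would live on a machine you control; the shipped client would contain only something like remote_score, so there is nothing of value in it to decompile.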
All that being said, it's still possible to make it hard to reverse engineer, but it will never be impossible when the client has the executable.
There is no way to keep your application code an absolute secret.
Frankly, if a group of dedicated and determined hackers (in the good sense, not in the pejorative sense) can crack the PlayStation's code signing security model, then your app doesn't stand a chance. Once you put your app into the hands of someone outside your company, it can be reverse-engineered.
Now, if you want to put some effort into making it harder, you can compile your own embedded python executable, strip out unnecessary modules, obfuscate the compiled python bytecode and wrap it up in some malware rootkit that refuses to start your app if a debugger is running.
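For the debugger check, one simple approach (an assumption about how it might be done, not a robust defence) is to look for an installed trace function, which is how pdb-style debuggers hook into the interpreter. The function names below are made up for illustration.

```python
import sys


def tracing_active():
    """Return True if a trace function is installed in the current thread."""
    return sys.gettrace() is not None


def refuse_if_traced():
    """Hypothetical startup guard: bail out when a tracer is detected."""
    if tracing_active():
        raise SystemExit("debugger detected, refusing to start")


# Demo: install a do-nothing tracer, as a debugger would, and run the check.
previous = sys.gettrace()
sys.settrace(lambda frame, event, arg: None)
try:
    detected = tracing_active()
finally:
    sys.settrace(previous)  # put back whatever was installed before

print(detected)
```

A check like this only raises the bar slightly: it can be patched out of the bytecode, and tools that do not rely on sys.settrace will not trip it.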
But you should really think about your business model. If you see the people who are passionate about your product as a threat, if you see those who are willing to put time and effort into customizing your product to personalize their experience as a danger, perhaps you need to re-think your approach to security. Assuming you're not in the DRM business, or have a similar model that involves squeezing money from reluctant consumers, consider developing an approach that involves sharing information with your users, and allowing them to collaboratively improve your product.
No, there is not.
Python is particularly easy to reverse engineer, but other languages, even compiled ones, are easy enough to reverse.
You cannot fully prevent reverse engineering of software - if it comes down to it, one can always analyze the assembler instructions your program consists of.
You can, however, significantly complicate the process, for example by messing with Python internals. Before jumping to how to do it, though, I'd suggest you evaluate whether to do it at all. It's usually harder to "steal" your code than to write it oneself (after all, one needs to fully understand the code to be able to extend it). A pure, unobfuscated Python plugin interface, on the other hand, can be vital in creating a whole ecosystem around your program, far outweighing the possible downsides of having someone peek into your maybe not perfectly designed internals.
The answer is a hard no. But I will elaborate on countermeasures.
This is my setup.py for all projects. I code in Python. I don't know CPython. For some of my projects, I've played around with MPI4Py. That requires overriding the CFLAGS, which is a different problem, solved in a different module.
To use this, I rename my .py files to .pyx. They are compiled to .cpp, then to .so. Only the .so is necessary for the final product, so we can exclude the intermediate results from transpilation.
That being said... you're still gonna wanna make sure that you only run on a Stackless Python implementation, apparently.
Update: Cython output and/or dynamically generated code can in theory still be reverse engineered, but doing so is very non-trivial.