Load a pickled list only once - Django\Python
I have a pickle file that contains a list of compiled regexes and other data.
It takes about 1-1.5 seconds to load.
What would be a good way to use this list in my views while having pickle read the file just once?
Edit:
Would importing it in settings.py be considered OK?
Any ideas?
How you can do it
Create a module called cache.py, then:
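A minimal sketch of what such a module might contain, not a definitive implementation. The path `my_data.pickle` and the helper `get_cached_data()` are assumptions made for illustration; only `get_my_data()` and the `getattr(...) or ...` line come from this answer.

```python
# cache.py (sketch; PICKLE_PATH and get_cached_data are hypothetical names)
import pickle

PICKLE_PATH = "my_data.pickle"  # assumed location, adjust to your project


def get_my_data(path=PICKLE_PATH):
    """Slow step: unpickle the list of compiled regexes and other data."""
    with open(path, "rb") as f:
        return pickle.load(f)


def get_cached_data(holder, path=PICKLE_PATH):
    """Return holder.data, reading the pickle only if it is missing or empty.

    `holder` is any object that accepts attributes. In a Django project it
    would typically be the imported cache module itself.
    """
    holder.data = getattr(holder, "data", "") or get_my_data(path)
    return holder.data
```

In a view this could be used as `import cache` followed by `data = cache.get_cached_data(cache)`, so the file is read on the first request handled by each process and reused afterwards.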
This loads the data only once per server process (how many processes there are depends on your setup, your web server, and whether you use WSGI or CGI). In the dev web server (`./manage.py runserver`), the cache is invalidated every time you modify a file, because the autoreloader restarts the process.
How it works
Modules in Python are imported only once per Python process. If you `import` a module several times, the later imports only return a reference to the already imported module. So if you have an Apache running mod_wsgi with 4 worker processes, `get_my_data()` will be called only 4 times, as there are only 4 Python processes running. Remember that a worker can die, be reloaded, be killed, etc., but this should keep calls to `get_my_data()` to a minimum.
Gotcha: if one process modifies the cached data, the others won't know about it. If your data is meant to be static, that's OK. If you need to keep it up to date, it won't work. That is true for this method and for any method relying on a singleton, unless you can ensure you have only one Python process running (which you can, but that is not the purpose of this answer).
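The import-once behaviour can be checked with a small experiment. This is only an illustration: the module name `demo_cache` and its contents are made up, and the module is written to a temporary directory so the example is self-contained.

```python
import importlib
import os
import sys
import tempfile

# Write a tiny throwaway module to disk. The name "demo_cache" is made up.
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "demo_cache.py"), "w") as f:
    f.write("data = ['loaded once']\n")

sys.path.insert(0, module_dir)

first = importlib.import_module("demo_cache")   # executes the module body
second = importlib.import_module("demo_cache")  # served from sys.modules

same_object = first is second                    # both names share one module
in_registry = sys.modules["demo_cache"] is first
```

Both checks come out true within one process. A second Python process would import its own separate copy, which is why each worker loads the data once.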
About the syntax:
`getattr(cache, 'data', '')` returns the attribute named 'data' of the object `cache`. If it doesn't exist, it returns the last parameter, here an empty string.
In Python, `or` is lazy and stops evaluating its operands as soon as it can return. In our case, if 'data' is an attribute of `cache` and holds the loaded list, it is true in a boolean context, so `or` considers its job done (it needs only one true operand) and returns that value without running `get_my_data()`. However, if 'data' is not an attribute of `cache`, `or` evaluates the empty string, treats it as false, then runs `get_my_data()` and returns its result.
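The short-circuit behaviour described above can be seen in a few lines. Here `cache` is a plain namespace object standing in for the imported module, and `get_my_data()` is a stub that records its calls instead of reading a pickle; both are assumptions for the sake of a runnable example.

```python
import types

calls = []


def get_my_data():
    calls.append("loaded")          # record every real load
    return ["regex-1", "regex-2"]   # stand-in for the unpickled list


cache = types.SimpleNamespace()     # stand-in for the imported cache module

# First access: there is no 'data' attribute, so getattr gives '' (falsy)
# and the right-hand side of `or` runs.
cache.data = getattr(cache, "data", "") or get_my_data()

# Second access: 'data' is a non-empty list (truthy), so `or` returns it
# and never calls get_my_data().
cache.data = getattr(cache, "data", "") or get_my_data()

load_calls = len(calls)
```

One caveat of this idiom: if the loaded value is itself falsy, for example an empty list, `or` will call `get_my_data()` again on every access.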
Why you probably don't want to do it anyway
The `re` module caches compiled regexes anyway, so you probably don't need to compile them yourself anymore. The other data can probably be expressed as primitives. Store all of it as strings and other primitives in a cache backend such as memcached or redis; it's going to be much cleaner. Plus, if one Python process updates the cache, the others will be aware of it. They won't with the above snippet.
Last word about settings.py
You should not put it in the settings.py file:
I'd write a Python module: a singleton class with an init method that reads the pickled data into a Python object, plus whatever 'get' methods you need to get the info out.
Then in your settings.py you just call the initialisation method. Anything that needs info from it just imports the module and uses the get methods.
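A rough sketch of such a singleton, under stated assumptions: the class name `PickledData`, the `instance()` accessor, and the `get_items()` method are hypothetical names chosen for illustration, not something this answer specifies.

```python
import pickle


class PickledData:
    """Holds the unpickled list; one shared instance per Python process."""

    _instance = None

    def __init__(self):
        self._items = None

    @classmethod
    def instance(cls):
        """Return the shared instance, creating it on first use."""
        if cls._instance is None:
            cls._instance = cls()
        return cls._instance

    def init(self, path):
        """Read the pickle once; later calls do nothing."""
        if self._items is None:
            with open(path, "rb") as f:
                self._items = pickle.load(f)

    def get_items(self):
        """Example 'get' method: return the cached list."""
        if self._items is None:
            raise RuntimeError("call init(path) before reading the data")
        return self._items
```

The initialisation call would then be `PickledData.instance().init(path)`, and views would read the data through `PickledData.instance().get_items()`. As with the module-level approach, each worker process holds its own copy.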
You could load it and then use the Django caching framework to store it; that way it would only be loaded once.
http://docs.djangoproject.com/en/dev/topics/cache/