How does Django handle multiple memcached servers?
In the django documentation it says this:
...
One excellent feature of Memcached is its ability to share cache over
multiple servers. This means you can run Memcached daemons on multiple
machines, and the program will treat the group of machines as a single
cache, without the need to duplicate cache values on each machine. To
take advantage of this feature, include all server addresses in
LOCATION, either separated by semicolons or as a list....
Django's cache framework - Memcached
How exactly does this work? I've read some answers on this site that suggest this is accomplished by sharding across the servers based on hashes of the keys.
Multiple memcached servers question
How does the MemCacheStore really work with multiple servers?
That's fine, but I need a much more specific and detailed answer than that. Using django with pylibmc or python-memcached how is this sharding actually performed? Does the order of IP addresses in the configuration setting matter? What if two different web servers running the same django app have two different settings files with the IP addresses of the memcached servers in a different order? Will that result in each machine using a different sharding strategy that causes duplicate keys and other inefficiencies?
What if a particular machine shows up in the list twice? For example, what if I were to do something like this where 127.0.0.1 is actually the same machine as 172.19.26.240?
    CACHES = {
        'default': {
            'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
            'LOCATION': [
                '127.0.0.1:11211',
                '172.19.26.240:11211',
                '172.19.26.242:11211',
            ]
        }
    }
What if one of the memcached servers has more capacity than the others? If machine 1 has a 64MB memcached and machine 2 has a 128MB one, will the sharding algorithm take that into account and give machine 2 a greater proportion of the keys?
I've also read that if a memcached server is lost, then those keys are lost. That is obvious when sharding is involved. What's more important is what will happen if a memcached server goes down and I leave its IP address in the settings file? Will django/memcached simply fail to get any keys that would have been sharded to that failed server, or will it realize that server has failed and come up with a new sharding strategy? If there is a new sharding strategy, does it intelligently take the keys that were originally intended for the failed server and divide them among the remaining servers, or does it come up with a brand new strategy as if the first server didn't exist and result in keys being duplicated?
I tried reading the source code of python-memcached, and couldn't figure this out at all. I plan to try reading the code of libmemcached and pylibmc, but I figured asking here would be easier if someone already knew.
It's the actual memcached client that does the sharding. Django only passes the configuration from settings.CACHES to the client.
The order of the servers doesn't matter*, but (at least for python-memcached) you can specify a 'weight' for each of the servers.
I think that a quick look at memcache.py (from python-memcached), and especially memcache.Client._get_server, should answer the rest of your questions. I would expect that the other memcached clients are implemented in a similar way.
* Clarification by @Apreche: the order of servers does matter in one case. If you have multiple web servers, and you want them all to put the same keys on the same memcached servers, you need to configure them with the same server list, in the same order, with the same weights.
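To make the points about order and weights concrete, here is a simplified sketch of modulo sharding over a weighted bucket list. It is not the library's code: the CRC32 hash, the function names and the 10.0.0.x addresses are all assumptions chosen for illustration, and a real client may hash and distribute keys differently.

```python
import zlib


def server_hash(key):
    # CRC32 stands in for whatever hash the real client uses (assumption).
    return zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF


def build_buckets(servers):
    # Each entry is "host:port" or ("host:port", weight). A weight of N
    # repeats the server N times, so it receives roughly N shares of the keys.
    buckets = []
    for entry in servers:
        host, weight = entry if isinstance(entry, tuple) else (entry, 1)
        buckets.extend([host] * weight)
    return buckets


def pick_server(key, buckets):
    # The key's hash, modulo the bucket count, selects the server.
    return buckets[server_hash(key) % len(buckets)]


# Hypothetical addresses; the second server gets twice the weight.
servers_a = ["10.0.0.1:11211", ("10.0.0.2:11211", 2)]
servers_b = list(reversed(servers_a))  # same servers, different order

buckets_a = build_buckets(servers_a)
buckets_b = build_buckets(servers_b)

keys = ["key%d" % i for i in range(100)]
moved = [k for k in keys if pick_server(k, buckets_a) != pick_server(k, buckets_b)]
print(len(moved), "of", len(keys), "keys map to a different server after reordering")
```

Under this scheme a host listed twice (for example under two IP addresses) simply behaves like a host with a weight of 2, and two web servers with differently ordered lists would disagree about where many keys live.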
I tested part of this and found some interesting stuff with django 1.1 and python-memcached 1.44.
On Django using 2 memcache servers:
    cache.set('a', 1, 1000)
    cache.get('a')  # returned 1
I looked up which memcache server 'a' was sharded to using 2 other django setups each pointing at one of the memcache servers. I simulated a connectivity outage by putting up a firewall between the original django instance and the memcache server that 'a' was stored in.
    cache.get('a')  # paused for a few seconds and then returned None
    cache.set('a', 2, 1000)
    cache.get('a')  # returned 2 right away
The memcache client library does update its sharding strategy if a server goes down.
Then I removed the firewall.
    cache.get('a')  # returned 2 for a bit until it detected the server back up, then returned 1!
You can read stale data when a memcache server drops and comes back! Memcache doesn't do anything clever to try to prevent this.
This can really mess things up if you're using a caching strategy that puts things in memcache for a long time and depends on cache invalidation to handle updates. An old value can be written to the "normal" cache server for that key. If you then lose connectivity and an invalidation is made during that window, you'll read stale data that you shouldn't be able to once the server becomes accessible again.
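The sequence above can be reproduced with a toy in-memory model. This is only a sketch under stated assumptions: FakeServer and FailoverClient are hypothetical classes, the hash is CRC32, and failover simply probes the next server in the list rather than rehashing the way a real client might.

```python
import zlib


class FakeServer:
    """In-memory stand-in for one memcached instance (hypothetical)."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class FailoverClient:
    """Toy client: modulo sharding, falling over to the next live server."""

    def __init__(self, servers):
        self.servers = servers

    def _get_server(self, key):
        start = zlib.crc32(key.encode("utf-8")) & 0xFFFFFFFF
        # Probe the home server first, then each following one in turn.
        for attempt in range(len(self.servers)):
            server = self.servers[(start + attempt) % len(self.servers)]
            if server.up:
                return server
        return None

    def set(self, key, value):
        server = self._get_server(key)
        if server is not None:
            server.data[key] = value

    def get(self, key):
        server = self._get_server(key)
        return None if server is None else server.data.get(key)


client = FailoverClient([FakeServer("s1"), FakeServer("s2")])

client.set("a", 1)
home = client._get_server("a")     # server that normally owns 'a'

home.up = False                    # simulate the firewall
during_outage = client.get("a")    # None: the fallback server has no 'a'
client.set("a", 2)                 # written to the fallback server
after_rewrite = client.get("a")    # 2

home.up = True                     # firewall removed
after_recovery = client.get("a")   # 1: the stale value on the home server
print(during_outage, after_rewrite, after_recovery)
```

The stale read happens because nothing ever removed the old value from the home server: the write made during the outage went elsewhere.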
One more note: I've been reading about some object/query caching libraries and I think johnny-cache should be immune to this problem. It doesn't explicitly invalidate entries; instead, it changes the key at which a query is cached when a table changes. So it would never accidentally read old values.
Edit: I think my note about johnny-cache working ok is crap. http://jmoiron.net/blog/is-johnny-cache-for-you/ says "there are extra cache reads on every request to load the current generations". If the generations are stored in the cache itself, the above scenario can cause a stale generation to be read.
I thought I'd add this answer two years after the question was asked, since it ranks very highly in search and because we did find a situation where Django was talking to only one of the memcached servers.
With a site running on Django 1.4.3 and python-memcached 1.51 talking to four memcached instances, we found that the database was being queried far more often than expected. Digging further, we found that cache.get() was returning None for keys that were known to be present in at least one of the memcached instances. When memcached was started with the -vv option, it showed that the question was asked of only one server!
After a lot of hair had been pulled, we switched the backend to django.core.cache.backends.memcached.PyLibMCCache (pylibmc) and the problem went away.
If using two distinct memcached caches is ideal, Django's default implementation allows for this behavior.
First you'll want to update your settings.py to define both caches.
Inside your Django code, the default method for accessing memcache hasn't changed, and you can now use the other cache interface as well.
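A minimal sketch of those two steps, assuming a second cache alias named 'secondary' and placeholder server addresses (both are hypothetical, not taken from the original answer):

```python
# settings.py (sketch): two independent memcached-backed caches.
# The alias 'secondary' and both addresses are placeholders.
CACHES = {
    'default': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '127.0.0.1:11211',
    },
    'secondary': {
        'BACKEND': 'django.core.cache.backends.memcached.MemcachedCache',
        'LOCATION': '172.19.26.242:11211',
    },
}

# Elsewhere in application code (needs a configured Django project):
#   from django.core.cache import cache, caches
#   cache.set('a', 1)                 # goes to 'default', as before
#   caches['secondary'].set('a', 1)   # goes to the second cache
```

The `caches` mapping exists in Django 1.7 and later; the older releases mentioned on this page exposed the same thing as `django.core.cache.get_cache('secondary')`. Each alias is a separate cache, so a key set through one is not visible through the other.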
The Django documentation has a great write up covering this topic: https://docs.djangoproject.com/en/dev/topics/cache/