RDBMS as a cache: design advice needed
I have a 'black box' application that takes a map of values as its parameter, performs a heavy, long-running calculation (up to 5 s), and generates a single Result that can be persisted in a database.
All I know about that application is that:
- The Result is unique with respect to the provided map of values
- The argument is a String->String map with a known maximum length for both key and value
- The argument map is of variable length (from 2-3 up to 1000 entries or so)
- The list of possible key values has around 1000 entries
Sample arguments are:
Map: {'k1'->'a', 'k2'->'b'}
Map: {'k1'->'a', 'k2'->'b', ... 'k100'->'zzz'}
Map: {'k1'->'x', 'k8'->'y'}
Map: {'k6'->'z'}
Each of the above will produce a unique Result object.
Now imagine another service, which is built on top of that slow library, and which needs to go online and handle dozens of calculation requests per second.
This is impossible without caching of already calculated results.
My estimate of the total cache size is somewhere around 100-500 million records, which leads me towards using an RDBMS as cache storage.
As the result is uniquely identified by the provided map, I could sort the argument map by key and concatenate it into a string like 'k1:a:k2:b....'. That would definitely work as the cache key, but:
- The cache key will be huge, above the key size limits of many RDBMSs, and will require indexed CLOBs
- I will make no use of the fact that key values are limited in their possible values.
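To make the key construction concrete, here is a minimal Python sketch of the sort-and-concatenate scheme described above. The function names `canonical_key` and `digest_key` are hypothetical, and the sketch assumes ':' never appears in a key or value. The second function is not part of the scheme above: it shows one possible way to get a fixed-length key by hashing the canonical string with SHA-256, offered only as an assumption about how the size limit might be sidestepped.

```python
import hashlib
from typing import Mapping


def canonical_key(params: Mapping[str, str]) -> str:
    """Sort the map by key and join it as 'k1:a:k2:b...'.

    Assumption: ':' never occurs in a key or value, otherwise two
    different maps could produce the same string.
    """
    return ":".join(f"{k}:{v}" for k, v in sorted(params.items()))


def digest_key(params: Mapping[str, str]) -> str:
    """Hypothetical fixed-length alternative: SHA-256 of the canonical string.

    The result is always 64 hex characters, regardless of map size.
    """
    return hashlib.sha256(canonical_key(params).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    sample = {"k2": "b", "k1": "a"}
    print(canonical_key(sample))  # insertion order does not matter
    print(digest_key(sample))
```

With a digest as the indexed column, the full canonical string could still be stored alongside the Result to detect the (extremely unlikely) case of a hash collision.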
What'd be your advice? Performance is my main concern here.
Comments (2)
Actually, this sounds more like a problem best solved by a key-value store or document database, not an RDBMS.
Another possibility worth looking into is a caching server like memcached.
My advice to you is to calculate how long 500M * 5sec is, expressed in days. That is the time it will take to compute all the results that you will be storing in your cache, and that is the time it will take before you start to see actual benefit from having built that cache.
(Yeah, I know, you can build up your cache "gradually". But if there are that many possible entries, then the probability of a hit is just proportional to the cache size itself, i.e. almost none at all in the startup phase. And it will take a looooooooooooong time before you get up to a reasonable level of hit probability, IMHO.)
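The arithmetic can be sketched in a few lines of Python. This assumes the worst case of 5 seconds for every result and, by default, a single worker computing results one after another. The `days_to_fill` helper and its `workers` parameter are hypothetical, added only to show how parallelism would scale the figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def days_to_fill(records: int, seconds_per_result: float = 5.0, workers: int = 1) -> float:
    """Days needed to compute `records` results at `seconds_per_result` each.

    `workers` is a hypothetical knob: it assumes the work splits evenly
    across that many independent instances of the black box.
    """
    return records * seconds_per_result / workers / SECONDS_PER_DAY


# The two ends of the 100-500 million estimate from the question.
for records in (100_000_000, 500_000_000):
    days = days_to_fill(records)
    print(f"{records:,} records: {days:,.0f} days (~{days / 365:.0f} years)")
```

For 500M records this comes to roughly 28,935 days, or about 79 years of sequential computation. Even at the low end of 100M it is about 5,787 days.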