When to use a key-value store for web development?
When would someone use a key-value (Redis, memcache, etc) store for web development? An actual use case would be most helpful.
My confusion is that a simple database seems so much more functional because, to my understanding, it can do everything a key-value store can do PLUS it also allows you to do filtering/querying. Meaning, to my understanding, you can NOT do a filter like:
SELECT * FROM homes WHERE price > 100000
with a key-value store.
Example
Let's pretend that StackOverflow uses a key-value store (memcache, redis, etc).
How would a key-value store help with Stack Overflow's hosting needs?
Comments (6)
I can't answer the question of when to use a key-value (herein kv) data store, but I can show you some examples and answer your Stack Overflow example.
With database access, most of what you need is a kv store. For example, a user logs in with the username "joe". So you look up "user:joe" in your database and retrieve his password (a hash, of course). Or maybe you have his password under "user:pass:joe", it really doesn't matter. If it was Stack Overflow and you were rendering the page http://stackoverflow.com/questions/6935566/when-to-use-a-key-value-store-for-web-development, you would look up "question:6935566" and use that. It is simple to see how kv stores can solve most of your problems.

I would like to say that a kv store offers a subset of the functionality provided by a traditional RDBMS. This matters because the design of the traditional RDBMS creates many scaling issues, and it generally loses features as you scale. kv stores don't come with these features, so they don't limit you. However, these features can often be built anyway, designed from the core to be scalable (because it becomes immediately obvious if they are not).
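The key lookups described above can be sketched in a few lines. This is a minimal illustration, not any particular store's API: a plain Python dict stands in for the kv store, and the stored fields and the `render_question` helper are hypothetical.

```python
# A plain dict stands in for the kv store (assumption); a real client
# would expose comparable get/set operations over the network.
store = {}

# Hypothetical records, keyed the way the answer describes.
store["user:joe"] = {"password_hash": "<hash goes here>"}
store["question:6935566"] = {
    "title": "When to use a key-value store for web development?",
}


def render_question(question_id):
    """Fetch one question by its exact key and return its title."""
    question = store.get(f"question:{question_id}")
    if question is None:
        return "404"
    return question["title"]


print(render_question(6935566))
```

Every read here is a single lookup by exact key, which is the access pattern kv stores are built for.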
However, that doesn't mean there is nothing you can't do. For example, you mention querying. This is a pitfall of many kv stores, as they are generally agnostic of the value (this is not always true; Redis, for example, is an exception) and have no way of finding what you are looking for. Worse, they are not designed to do that quickly; they are just really quick at looking things up by key.
One solution to this problem is to sort your keys lexicographically and allow range queries. This is essentially "give me everything between question:1 and question:5". Now that example is fairly useless, but there are many uses of range queries.
You said you want all houses more than $100 000. If you wanted to be able to do this you would create an index of houses by price. Say you had the following houses.

In SQL you would store each field in a column rather than having it all in one (in this case JSON) document, and you could run:
SELECT * FROM houses WHERE price > 100000
This seems all fine and dandy but, if there isn't an index built, this requires looking at every house in your table and checking its price, which could be slow if you have a couple of million houses. So with a kv store you need an index as well. The main difference is that the SQL database would silently do the slow thing, whereas the kv store simply can't.

If you don't have range queries you would need to stick your index in a single document, which makes safely updating it a pain and means that you would have to download the whole index for every query, again limiting scalability.
But if you have range queries (often called keyscans) you can create an index like this:
And then you could request the keys between "house:index:price:100000" and "house:index:price::" (the ':' character is the character after '9') and you would get [3,1,0], which is all the houses more expensive than $100 000 (they are also helpfully in order). Another nice thing about this is that they will likely be on one "partition" of your cluster, so this query will take about the same time as a single get (plus the tiny extra transfer overhead), or two gets if your range happens to cross a server boundary (but these can be done in parallel!).

So that shows how to do queries in a kv store. You can query anything that can be ordered as a string (just about anything) and look it up very quickly. If you don't have range queries you will need to store your whole index under one key, which sucks, but if you have range queries it is very nice, and very fast. Here is a more complex example.
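Before the more complex case, the single-property price index and range scan described above can be sketched roughly as follows. Everything here is an assumption for illustration: `SortedKV` is a toy in-memory stand-in for a store with ordered keys, the house ids and prices are made up (chosen so the result matches the [3,1,0] quoted above), and prices are zero-padded to nine digits so that string order agrees with numeric order, which the unpadded keys in the text only guarantee when all prices have the same number of digits.

```python
import bisect


class SortedKV:
    """Toy in-memory kv store that keeps keys sorted so ranges can be scanned."""

    def __init__(self):
        self._keys = []      # sorted list of keys
        self._values = {}    # key -> value

    def put(self, key, value):
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def get(self, key):
        return self._values.get(key)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in key order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, end)
        return [(k, self._values[k]) for k in self._keys[lo:hi]]


def price_key(price):
    # Zero-pad so lexicographic order matches numeric order
    # (assumption: prices are below 10**9).
    return f"house:index:price:{price:09d}"


store = SortedKV()
houses = {0: 300000, 1: 200000, 2: 90000, 3: 150000}  # hypothetical id -> price
for house_id, price in houses.items():
    store.put(f"house:{house_id}", {"price": price})  # the document itself
    store.put(price_key(price), house_id)             # the index entry


def houses_at_or_above(min_price):
    # ':' sorts directly after '9', so this end key lies past every price key.
    end = "house:index:price::"
    return [house_id for _, house_id in store.scan(price_key(min_price), end)]


print(houses_at_or_above(100000))  # [3, 1, 0]
```

Note that this simple key layout allows only one house per price, since two houses at the same price would share an index key; putting the id in the key avoids that.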
I want unsold houses in Toronto that are less than $100 000. I simply have to design my index. (I added in a couple of houses to make it more meaningful.) At first thought you might just build another index for every property, but you will quickly realize that this means you have to select every unsold house and download it from the database. (This is what I meant when I said scaling problems are immediately obvious.) The solution is to use a multi-index. Once built, you can select exactly the values you want.
Now, unlike the last example, I put the id in the key. This allows two houses to have the same properties. I could have merged them in the value, but then adding and removing index entries becomes more difficult. I also chose to separate my data with a "~". This is because it sorts lexicographically after all of the letters, ensuring that the full name will be sorted and I don't have to pad every city to the same length. In a production system I would probably use the byte 255 or 0.

Now the range "house:index:sold:city:price:f~Toronto~100000" to "house:index:sold:city:price:f~Toronto~~" will select all houses that match the query. The important thing to note is that the query scales linearly with the number of results. This does mean that you have to build an index for every set of properties that you want to query on (although the index in our example also works for sold, and sold-city queries). This may seem like a lot of work, but in the end you realize that it is just that you are doing it, not your database. I'm sure we will begin to see libraries for this kind of thing coming out soon :D

After stretching the topic a bit, I have shown:
I think that you will find that kv stores are enough for many applications and can often provide better performance and availability than a traditional RDBMS. That being said, every app is different, and therefore it is impossible to answer the original question.
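The multi-index from the Toronto example can be sketched along these lines. The house data, the `index_key` layout (sold flag, city, zero-padded price, then id, separated by "~") and the helper names are all assumptions for illustration; a sorted Python list stands in for a store that supports key scans. One caveat: as written, the range quoted above starts at 100000 and so covers prices from $100 000 upward; for the stated "less than $100 000" query, this sketch scans from the start of the unsold Toronto block up to 100000 instead.

```python
import bisect

PREFIX = "house:index:sold:city:price:"

# Hypothetical houses; ids 1 and 3 deliberately share every indexed property.
houses = {
    0: {"city": "Toronto", "price": 300000, "sold": False},
    1: {"city": "Toronto", "price": 90000, "sold": False},
    2: {"city": "Ottawa", "price": 80000, "sold": False},
    3: {"city": "Toronto", "price": 90000, "sold": False},
    4: {"city": "Toronto", "price": 70000, "sold": True},
}


def index_key(house_id, house):
    sold = "t" if house["sold"] else "f"
    # The id goes last so two houses with identical properties get distinct keys.
    # Prices are zero-padded (assumption) so string order matches numeric order.
    return f"{PREFIX}{sold}~{house['city']}~{house['price']:09d}~{house_id}"


# A sorted list of keys stands in for the store's ordered key space.
index = sorted(index_key(i, h) for i, h in houses.items())


def scan(start, end):
    """Return index keys with start <= key < end, in key order."""
    lo = bisect.bisect_left(index, start)
    hi = bisect.bisect_left(index, end)
    return index[lo:hi]


def ids(keys):
    return [int(key.rsplit("~", 1)[1]) for key in keys]


def unsold_in_city_under(city, max_price):
    start = f"{PREFIX}f~{city}~"
    end = f"{PREFIX}f~{city}~{max_price:09d}"  # exclusive upper bound
    return ids(scan(start, end))


def unsold_in_city(city):
    # '~' sorts after every digit, so "~~" closes the whole city block.
    return ids(scan(f"{PREFIX}f~{city}~", f"{PREFIX}f~{city}~~"))


print(unsold_in_city_under("Toronto", 100000))  # [1, 3]
print(unsold_in_city("Toronto"))                # [1, 3, 0]
```

The second helper shows the point made above that the same index also serves sold-city queries: any query that fixes a leading part of the key becomes one contiguous scan.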
Do not confuse a NoSQL type database with something like memcached (which is not intended to store data permanently).
A typical use for memcached is to store query results so that they can be accessed by a cluster of web servers, i.e. a shared cache. For example, on this page there is a list of related posts, and there is likely a bit of work for the database to do to produce that list. If you do that every time someone loads the page, you will create a lot of work for the database. Instead, once the results have been retrieved for the first time, they can be stored on a memcached server with the page ID as the key. Any of the web servers in the cluster can then fetch that information very quickly without having to constantly hit the database. After a while, the cache entry will be purged by memcached so that the results for old articles don't use up space. [Disclaimer: I've no idea if StackOverflow does this in reality].
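The caching pattern described above might look roughly like this. It is a sketch under stated assumptions: `TTLCache` is a small in-process stand-in for a memcached client (a real one would live on a separate server shared by the whole cluster), and `query_related_posts` is a hypothetical placeholder for the expensive database query.

```python
import time


class TTLCache:
    """In-process stand-in for a memcached client: get/set with expiry."""

    def __init__(self, clock=time.monotonic):
        self._data = {}       # key -> (value, expiry time)
        self._clock = clock

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]   # entry is too old, purge it
            return None
        return value

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, self._clock() + ttl_seconds)


db_calls = []  # records each trip to the "database", for demonstration only


def query_related_posts(page_id):
    """Hypothetical expensive database query."""
    db_calls.append(page_id)
    return [f"related-to-{page_id}-a", f"related-to-{page_id}-b"]


cache = TTLCache()


def related_posts(page_id, ttl_seconds=300):
    key = f"related:{page_id}"
    posts = cache.get(key)
    if posts is None:                      # cache miss: do the work once
        posts = query_related_posts(page_id)
        cache.set(key, posts, ttl_seconds)
    return posts


related_posts(6935566)
related_posts(6935566)
print(len(db_calls))  # 1: the second call was served from the cache
```

The expiry time is the knob here: a longer TTL saves more database work but serves staler lists.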
A "NoSQL" database on the other hand is for storing information permanently. If your data schema is quite simple and so are your queries, then it may be faster than a standard SQL database. A lot of web applications don't need hugely complex databases, and so NoSQL databases can be a good fit.
There are two general viable use-cases for noSQL:
The fact that most noSQL solutions are effectively schema-less; require far less ceremony to operate; are light-weight (in terms of API); and provide significant performance gains in contrast to the more canonical relational persistence systems informs their suitability for the above 2 use-cases (in the general sense).
Being cynical -- or perhaps practical in the business sense -- one can propose a 3rd general use-case for noSQL systems (still informed by the above set of characteristics/features):
It is easier to grok, and any inexperienced (but un-brain-dead) aspiring geek can pick it up in a snap. That is a very powerful feature. (Try that with Oracle ..)
So, the use-cases of noSQL systems -- which in general can be characterized as relaxed persistent systems -- are all optimally informed by practical considerations.
There is absolutely no question -- outside of hugely massively scalable systems -- that RDBMS systems are formally perfect systems designed to ensure data integrity.
Key-value stores are usually really fast so it's good to have them as a cache for data that is heavily accessed and rarely updated to reduce load on your DBs.
As you said, you are usually limited with queries (though MongoDB handles them pretty well), but key-value stores are mostly meant for accessing precise data: user X's profile, session X's info, etc.
A "traditional" DB will probably be more than enough for the average website, but if you experience high loads key-value stores can really help your load times.
EDIT: And by "high loads", I mean really high loads. Key-value stores are rarely necessary.
See this comparison of key-value stores.
Just adding to bstrawson's answer:
"mem-cache-d" is a caching mechanism while Redis is a persistent store, but both store data as key-value pairs.
Searching a key-value store (something like Redis or Membase) by value is like scanning every value in a relational database: too slow. If you want to do some querying, you may need to move to a document-oriented NoSQL database such as MongoDB or CouchDB, which support some querying.
In the near future you will be able to use Couchbase Server 2.0, which is meant to address all your burning issues with NoSQL data querying through the newly introduced UnQL, along with caching (directly derived from the memcached source code).
Stack Overflow does indeed use Redis, and extensively. A detailed answer to your question, with Stack Overflow as the example, is in a couple of nice blog posts by Marc Gravell. Marc is the author of the superb BookSleeve fully-asynchronous .NET Redis binding library.