When to use a key-value store for web development?

Posted on 2024-11-27 23:10:30

When would someone use a key-value (Redis, memcache, etc) store for web development? An actual use case would be most helpful.

My confusion is that a simple database seems so much more functional because, to my understanding, it can do everything a key-value store can do PLUS it also allows you to do filtering/querying. Meaning, to my understanding, you can NOT filter like this:

select * from homes where price > 100000

with a key-value store.

Example

Let's pretend that StackOverflow uses a key-value store (memcache, redis, etc).

How would a key-value store help with Stack Overflow's hosting needs?

红墙和绿瓦 2024-12-04 23:10:30

I can't answer the question of when to use a key-value (herein kv) data store, but I can show you some examples and answer your Stack Overflow example.

With database access, most of what you need is a kv store. For example, a user logs in with the username "joe". So you look up "user:joe" in your database and retrieve his password (hashed, of course). Or maybe you have his password under "user:pass:joe"; it really doesn't matter. If it were Stack Overflow and you were rendering the page http://stackoverflow.com/questions/6935566/when-to-use-a-key-value-store-for-web-development, you would look up "question:6935566" and use that. It is simple to see how kv stores can solve most of your problems.
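As a rough sketch of the lookups described above, the following uses a plain Python dict as a stand-in for a kv store. The `store` contents and the `get` and `question_key` helpers are illustrative assumptions, not the API of any particular store.

```python
# Hypothetical in-memory stand-in for a kv store; a real store would sit behind a network client.
store = {
    "user:joe": {"password_hash": "<hash>"},
    "question:6935566": {"title": "When to use a key-value store for web development?"},
}

def get(key):
    """Return the value stored under key, or None if the key is absent."""
    return store.get(key)

def question_key(url):
    """Map a '/questions/<id>/<slug>' URL to the key 'question:<id>'."""
    parts = url.split("/")
    return "question:" + parts[parts.index("questions") + 1]

url = "http://stackoverflow.com/questions/6935566/when-to-use-a-key-value-store-for-web-development"
user = get("user:joe")             # one lookup for the login check
question = get(question_key(url))  # one lookup to render the page
```

Each page render or login check is a single fetch by key, which is the access pattern kv stores are built for.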

I would say that a kv store offers a subset of the functionality provided by a traditional RDBMS. The design of a traditional RDBMS brings many scaling issues, and you generally lose features as you scale. kv stores don't come with those features in the first place, so they don't limit you. However, the features can often be rebuilt on top of the store anyway, designed from the core to be scalable (because it becomes immediately obvious if they are not).

That doesn't mean there is nothing you can't do, though. For example, you mention querying. This is a pitfall of many kv stores, as they are generally agnostic of the value (not always true: Redis, among others, is an exception) and have no way of finding what you are looking for. Worse, they are not designed to do that quickly; they are just really quick at looking things up by key.

One solution to this problem is to sort your keys lexicographically and allow range queries. This is essentially "give me everything between question:1 and question:5". Now that example is fairly useless, but there are many uses of range queries.

You said you want all houses that cost more than $100 000. To be able to do this, you would create an index of houses by price. Say you had the following houses.

house:0 -> {"color":"blue","sold":false,"city":"Stackoverville","price":500000}
house:1 -> {"color":"red","sold":true,"city":"Toronto","price":150000}
house:2 -> {"color":"beige","sold":false,"city":"Toronto","price":40000}
house:3 -> {"color":"blue","sold":false,"city":"The Blogosphere","price":110000}

In SQL you would store each field in a column rather than having it all in one (in this case JSON) document, and you could SELECT * FROM houses WHERE price > 100000. This seems all fine and dandy, but if there isn't an index built, it requires looking at every house in your table and checking its price, which could be slow if you have a couple of million houses. So with a kv store you need an index as well. The main difference is that the SQL database would silently do the slow thing, whereas the kv store can't do it at all.

If you don't have range queries you would need to stick your index in a single document, which makes safely updating it a pain and means that you would have to download the whole index for every query, again, limiting scalability.

house:index:price -> [{"price":500000,"id":"0"},{"price":150000,"id":"1"},{"price":110000,"id":"3"},{"price":40000,"id":"2"}]
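To make the cost of the single-document index concrete, here is a minimal sketch that assumes the index is stored as one JSON string in a dict standing in for the kv store. The helper names `houses_over` and `add_house` are hypothetical.

```python
import json

INDEX_KEY = "house:index:price"

# Hypothetical in-memory stand-in for the kv store: the whole index lives under one key.
store = {
    INDEX_KEY: json.dumps([
        {"price": 500000, "id": "0"},
        {"price": 150000, "id": "1"},
        {"price": 110000, "id": "3"},
        {"price": 40000, "id": "2"},
    ])
}

def houses_over(min_price):
    """Fetch and decode the entire index, then filter on the client."""
    index = json.loads(store[INDEX_KEY])
    return [entry["id"] for entry in index if entry["price"] > min_price]

def add_house(house_id, price):
    """Read-modify-write of the whole document.

    Two concurrent writers can overwrite each other here unless the store
    offers locking or compare-and-swap, which this sketch does not model.
    """
    index = json.loads(store[INDEX_KEY])
    index.append({"price": price, "id": house_id})
    index.sort(key=lambda entry: entry["price"], reverse=True)
    store[INDEX_KEY] = json.dumps(index)

result = houses_over(100000)
```

Every query transfers the full index no matter how few houses match, and every update rewrites it.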

But if you have range queries (often called keyscans) you can create an index like this:

house:index:price:040000 -> 2
house:index:price:110000 -> 3
house:index:price:150000 -> 1
house:index:price:500000 -> 0

Then you could request the keys between house:index:price:100000 and house:index:price:: (the ':' character is the character after '9') and you would get [3,1,0], which is all the houses more expensive than $100 000 (they are also helpfully in order). Another nice thing is that the keys will likely be on one "partition" of your cluster, so this query will take about the same time as a single get (plus the tiny extra transfer overhead), or two gets if your range happens to cross a server boundary (but these can be done in parallel!).
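The key scan above can be sketched in Python with a sorted list and `bisect` standing in for an ordered kv store. The `key_scan` and `price_key` helpers are hypothetical names for illustration; a real store would expose its own range-scan call.

```python
import bisect

# Sorted (key, value) pairs standing in for an ordered kv store.
index = sorted([
    ("house:index:price:040000", "2"),
    ("house:index:price:110000", "3"),
    ("house:index:price:150000", "1"),
    ("house:index:price:500000", "0"),
])
keys = [key for key, _ in index]

def key_scan(start, end):
    """Return the values whose keys satisfy start <= key < end, in key order."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    return [value for _, value in index[lo:hi]]

def price_key(price):
    """Zero-pad the price so that string order matches numeric order."""
    return f"house:index:price:{price:06d}"

# ':' sorts just after '9', so this end key lies past every six-digit price.
expensive = key_scan(price_key(100000), "house:index:price::")
print(expensive)  # ['3', '1', '0']
```

Note that the start key is inclusive, so a house priced at exactly 100000 would also be returned; a strict "greater than" would start the scan at `price_key(100001)`.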

So that shows how to do queries in a kv store. You can query anything that can be ordered as a string (just about anything) and look it up very quickly. If you don't have range queries you will need to store your whole index under one key which sucks, but if you have range queries it is very nice, and very fast. Here is a more complex example.

I want unsold houses in Toronto that cost more than $100 000. I simply have to design my index. (I added in a couple of houses to make it more meaningful.) At first thought you might just build another index for every property, but you will quickly realize that this means you have to select every unsold house and download it from the database. (This is what I meant when I said scaling problems are immediately obvious.) The solution is to use a multi-index. Once built, you can select exactly the values you want.

house:index:sold:city:price:f~Fooville~000010:5        -> ""
house:index:sold:city:price:f~Toronto~040000:2         -> ""
house:index:sold:city:price:f~Toronto~140000:4         -> ""
house:index:sold:city:price:t~Stackoverville~500000:0  -> ""
house:index:sold:city:price:t~The Blogosphere~110000:3 -> ""
house:index:sold:city:price:t~Toronto~150000:1         -> ""

Now, unlike the last example, I put the id in the key. This allows two houses to have the same properties. I could have merged them in the value, but then adding and removing index entries becomes more difficult. I also chose to separate my data with a ~. This is because it sorts lexicographically after all of the letters, ensuring that the full name will be sorted and I don't have to pad every city to the same length. In a production system I would probably use the byte 255 or 0.

Now the range house:index:sold:city:price:f~Toronto~100000 - house:index:sold:city:price:f~Toronto~~ will select all houses that match the query. The important thing to note is that the query scales linearly with the number of results. This does mean that you have to build an index for every set of properties that you want to query on (although the index in our example also works for sold and sold-city queries). This may seem like a lot of work, but in the end you realize that it is just that you are doing it, not your database. I'm sure we will begin to see libraries for this kind of thing coming out soon :D
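A minimal sketch of the multi-index, again with a sorted list standing in for an ordered kv store. It builds composite keys for a few of the Toronto and Fooville entries from the listing and runs the Toronto range scan described above. The `index_key` and `scan_ids` helpers are assumptions made for illustration.

```python
import bisect

PREFIX = "house:index:sold:city:price:"

def index_key(sold, city, price, house_id):
    """Composite key: sold flag, city, zero-padded price, then the house id."""
    flag = "t" if sold else "f"
    return f"{PREFIX}{flag}~{city}~{price:06d}:{house_id}"

# A few of the index entries from the listing: (sold, city, price, id).
entries = [
    (False, "Fooville", 10, 5),
    (False, "Toronto", 40000, 2),
    (False, "Toronto", 140000, 4),
    (True, "Toronto", 150000, 1),
]
keys = sorted(index_key(*entry) for entry in entries)

def scan_ids(start, end):
    """Return the ids of all keys with start <= key < end, in key order."""
    lo = bisect.bisect_left(keys, start)
    hi = bisect.bisect_left(keys, end)
    # The id is whatever follows the last ':' in the key.
    return [key.rsplit(":", 1)[1] for key in keys[lo:hi]]

# '~' sorts after digits and letters, so 'f~Toronto~~' lies past every Toronto price.
matches = scan_ids(f"{PREFIX}f~Toronto~100000", f"{PREFIX}f~Toronto~~")
print(matches)  # ['4']
```

The same index answers prefix queries: scanning from `PREFIX + "f~"` to `PREFIX + "f~~"` returns every unsold house, which is why one multi-index also covers the sold and sold-city cases.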

After stretching the topic a bit, I have shown:

Some uses of a kv store.
How to do queries in a kv store.

I think you will find that kv stores are enough for many applications and can often provide better performance and availability than a traditional RDBMS. That being said, every app is different, and therefore it is impossible to answer the original question.


怕倦 2024-12-04 23:10:30

Don't confuse NoSQL-type databases with something like memcached, which is not intended to store data permanently.

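A typical use of memcached is to store the results of some query so that a cluster of web servers can access them, i.e. as a shared cache. For example, this page has a list of related posts, and the database probably has to do some work to generate that list. If it did so every time someone loaded the page, you would create a lot of work for the database. Instead, the result of the first retrieval can be stored on a memcached server, keyed by the page ID. Any web server in the cluster can then fetch that information very quickly without constantly hitting the database. After a while memcached purges the cache entry, so that results for old posts don't take up space. [Disclaimer: I have no idea whether StackOverflow actually does this.]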

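"NoSQL" databases, on the other hand, are for storing information permanently. If your data schema is fairly simple and your queries are also fairly simple, a NoSQL database may be faster than a standard SQL database. Many web applications don't need a very sophisticated database, so NoSQL databases can be a good fit.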
