How to structure data in Riak?

Posted 2024-10-18 11:15:32

I'm trying to figure out how to model data in Riak. Let's say you are building something like a CMS with two features, news and products. You need to be able to store this information for multiple clients X and Y. How would you typically structure this?

  1. One bucket per client, with two keys: news and products. Store multiple objects under each key and then use map/reduce to order them.

  2. Store both the news and the products in the same bucket, but with a new autogenerated key for each news item and product item. That is, one bucket for X and one for Y.

  3. One bucket per client/feature combination, that is, the buckets would be X-news, X-products, Y-news and Y-products. Then use map/reduce on the whole bucket to return the results in order.

Which would be the best way to handle this problem?

阳光下慵懒的猫 2024-10-25 11:15:32

I'd create 2 buckets: news and products.
Then I'd prefix keys in each bucket with client names.
I'd probably also include dates in news keys to make date-range queries easy.

news/acme_2011-02-23_01
news/acme_2011-02-23_02
news/bigcorp_2011-02-21_01

And optionally prefix product keys with category names:

products/acme_blacksmithing_anvil
products/bigcorp_databases_oracle
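The key scheme above can be sketched as two small Python helpers. The function names `news_key` and `product_key` are hypothetical, not part of any Riak client. Two details make the scheme work: ISO-8601 dates (YYYY-MM-DD) sort lexicographically in chronological order, so a plain string range selects a date range, and the `_` separator must never appear inside the client or category fields, or splitting the key later would pick the wrong token.

```python
from datetime import date

SEP = "_"  # field separator used in the example keys


def _check(*parts):
    # A separator inside a field would shift every later token when the key is split.
    for part in parts:
        if SEP in part:
            raise ValueError(f"{part!r} must not contain {SEP!r}")


def news_key(client, published, seq):
    """Build client_YYYY-MM-DD_NN, e.g. acme_2011-02-23_01."""
    _check(client)
    return f"{client}{SEP}{published.isoformat()}{SEP}{seq:02d}"


def product_key(client, category, name):
    """Build client_category_name, e.g. bigcorp_databases_oracle."""
    _check(client, category)
    return f"{client}{SEP}{category}{SEP}{name}"
```

For example, `news_key("acme", date(2011, 2, 23), 1)` yields `acme_2011-02-23_01`, matching the first news key listed above.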

Then in your map/reduce you could use key filtering:

// BigCorp News items
{
  "inputs":{
     "bucket":"news",
     "key_filters":[["starts_with", "bigcorp_"]]
  }
  // ... rest of mapreduce job
}

// Acme Blacksmithing items
{
  "inputs":{
     "bucket":"products",
     "key_filters":[["starts_with", "acme_blacksmithing_"]]
  }
  // ... rest of mapreduce job
}

// News for all clients from Feb 12th to 19th
{
  "inputs":{
     "bucket":"news",
     "key_filters":[["tokenize", "_", 2],
                    ["between", "2011-02-12", "2011-02-19"]]
  }
  // ... rest of mapreduce job
}
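To check what those filters would select without a running cluster, here is a small local emulation in Python. It is not Riak's implementation, only the semantics as I understand them: `tokenize` takes a 1-based token index (consistent with the answer using `2` to reach the date in `acme_2011-02-23_01`), and `between` is treated as inclusive on both ends. The helper names are hypothetical.

```python
def starts_with(prefix):
    """Local stand-in for ["starts_with", prefix]."""
    return lambda key: key.startswith(prefix)


def tokenize_between(sep, n, low, high):
    """Local stand-in for [["tokenize", sep, n], ["between", low, high]].

    n is 1-based, and the low/high bounds are compared as strings, inclusively.
    """
    def matches(key):
        tokens = key.split(sep)
        return len(tokens) >= n and low <= tokens[n - 1] <= high
    return matches


def run_filter(keys, predicate):
    """Keep the keys the predicate accepts, preserving their order."""
    return [key for key in keys if predicate(key)]
```

One design point shows up quickly when trying it: a bare `"bigcorp"` prefix would also match a hypothetical client named `bigcorporate`, so including the separator in the prefix (`"bigcorp_"`) keeps clients apart.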

前事休说 2024-10-25 11:15:32

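An even more efficient approach than key filtering (as Kev Burns recommended) is to model this scenario with Secondary Indexes or Riak Search. Key filters still have to walk every key in the bucket, whereas a 2i or Search query only touches the matching index entries.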

Take a look at my answers to "Which clustered NoSQL DB for a Message Storing purpose?" and "Links in Riak: what can they do/not do, compared to graph databases?" for discussions of similar cases.

You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.

1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and on how you plan to query them. If you're always going to query for both news and products for a company at once, you might as well store them in a single bucket. But I recommend using 2 separate buckets, which makes them easier to keep track of, especially if you'll have separate tabs or pages such as "Company X - Products" and "Company X - News". If you need to combine them into a single feed, make 2 queries (one for news and one for products) and merge the results in client code (by date or whatever field you sort on).

2) If a news/product item can have one and only one company that it belongs to, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.

3) If there's a many-to-many relationship, that is, a news or product item can belong to several companies (perhaps the news item is about a joint venture between 2 separate companies), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket, and for each company mentioned in a news story, insert a Mention object with its own unique key and a secondary index on company_key; its value would contain a type ('news' or 'product') and an item_key (the news key or product key).
Extracting relationships into separate Riak objects like this lets you do a lot of interesting things: tag them arbitrarily using Riak Search, query them for subscription event notifications, and so on.
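The three points above can be sketched with Python's standard library by building (not sending) requests against Riak's HTTP interface. Assumptions to note: a node at `http://localhost:8098` (Riak's default HTTP port), JSON values, and the hypothetical helper names below. The parts I'm relying on from Riak itself are the `X-Riak-Index-<name>_bin` header for attaching a string 2i entry on write and the `/buckets/<bucket>/index/<index>/<value>` endpoint for querying it. The feed merge for point 1 assumes each query's results were already sorted newest-first by a `date` field stored as an ISO string.

```python
import heapq
import json
import urllib.parse
import urllib.request
import uuid

RIAK_URL = "http://localhost:8098"  # assumption: local node on Riak's default HTTP port


def store_request(bucket, key, value, company_key):
    """Point 2: build a PUT storing `value` as JSON with a company_key_bin 2i entry."""
    url = f"{RIAK_URL}/buckets/{bucket}/keys/{urllib.parse.quote(key, safe='')}"
    return urllib.request.Request(
        url,
        data=json.dumps(value).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # The _bin suffix marks a binary (string) index; integer indexes use _int.
            "X-Riak-Index-company_key_bin": company_key,
        },
        method="PUT",
    )


def company_index_request(bucket, company_key):
    """Point 2: build a GET for all keys in `bucket` indexed under company_key."""
    value = urllib.parse.quote(company_key, safe="")
    return urllib.request.Request(
        f"{RIAK_URL}/buckets/{bucket}/index/company_key_bin/{value}", method="GET"
    )


def mention_requests(item_type, item_key, company_keys):
    """Point 3: one Mention object per company, each under its own random key."""
    if item_type not in ("news", "product"):
        raise ValueError(f"unexpected item type: {item_type!r}")
    return [
        store_request("mentions", uuid.uuid4().hex,
                      {"type": item_type, "item_key": item_key}, company)
        for company in company_keys
    ]


def combined_feed(news, products):
    """Point 1: merge two newest-first result lists into one newest-first feed."""
    return list(heapq.merge(news, products, key=lambda item: item["date"], reverse=True))
```

Each request would be sent with `urllib.request.urlopen(req)`. As far as I know, the index query answers with JSON of the form `{"keys": [...]}`, after which each object is fetched by key. Querying `company_index_request("mentions", "acme")` then gives the Mention keys for one company, from which the `item_key` values lead back to the news and product objects.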
