How do I structure data in Riak?

Posted 2024-10-18 11:15:32

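I'm trying to figure out how to model data in Riak. Say you're building something like a CMS with two features: news and products. You need to be able to store this information for multiple clients, X and Y. How would you typically structure it?

One bucket per client, with two keys, news and products. Store multiple objects under each key, then use map/reduce to sort them.
Store news and products in the same bucket, but give every news item and product item its own auto-generated key. That is, one bucket for X and one bucket for Y.
One bucket per client/feature combination, i.e. the buckets would be X-news, X-products, Y-news and Y-products. Then use map/reduce over the whole bucket to return the results in order.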

What's the best way to approach this?


阳光下慵懒的猫 2024-10-25 11:15:32

I'd create 2 buckets: news and products.
Then I'd prefix keys in each bucket with client names.
I'd probably also include dates in news keys for easy date ranging.

news/acme_2011-02-23_01
news/acme_2011-02-23_02
news/bigcorp_2011-02-21_01

And, optionally, I'd prefix product names with category names:

products/acme_blacksmithing_anvil
products/bigcorp_databases_oracle
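A minimal Python sketch of this key scheme, assuming client and category names never contain the `_` delimiter (otherwise the token positions used for filtering would shift). The helper names `news_key` and `product_key` are hypothetical and not part of any Riak client.

```python
from datetime import date

DELIMITER = "_"

def _check_part(part: str) -> str:
    # A part containing the delimiter would shift every later token position.
    if DELIMITER in part:
        raise ValueError(f"{part!r} must not contain {DELIMITER!r}")
    return part

def news_key(client: str, published: date, seq: int) -> str:
    """Build a news key like 'acme_2011-02-23_01' (hypothetical helper).

    Client first, so a prefix filter can select one client; ISO date second,
    so comparing that token as a string matches date order.
    """
    return DELIMITER.join([_check_part(client), published.isoformat(), f"{seq:02d}"])

def product_key(client: str, category: str, name: str) -> str:
    """Build a product key like 'acme_blacksmithing_anvil' (hypothetical helper)."""
    return DELIMITER.join([_check_part(client), _check_part(category), name])

print(news_key("acme", date(2011, 2, 23), 2))         # acme_2011-02-23_02
print(product_key("bigcorp", "databases", "oracle"))  # bigcorp_databases_oracle
```

Zero-padding the sequence number keeps keys for the same day sorting correctly as strings, as long as fewer than 100 items are published per client per day.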

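Then, in your MapReduce jobs, you could use key filtering. In each job below, a single map phase that returns the stored JSON values stands in for the rest of the job:

// BigCorp news items
{
  "inputs":{
     "bucket":"news",
     "key_filters":[["starts_with", "bigcorp_"]]
  },
  "query":[{"map":{"language":"javascript","name":"Riak.mapValuesJson"}}]
}

// Acme blacksmithing products
{
  "inputs":{
     "bucket":"products",
     "key_filters":[["starts_with", "acme_blacksmithing_"]]
  },
  "query":[{"map":{"language":"javascript","name":"Riak.mapValuesJson"}}]
}

// News for all clients from Feb 12th to 19th (inclusive)
{
  "inputs":{
     "bucket":"news",
     "key_filters":[["tokenize", "_", 2],
                    ["between", "2011-02-12", "2011-02-19"]]
  },
  "query":[{"map":{"language":"javascript","name":"Riak.mapValuesJson"}}]
}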
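To see which keys such filter chains select, here is a small local emulation in Python. It models only the three filters used in these jobs (`tokenize` with a 1-based token position, `starts_with`, and `between`, inclusive unless a third `false` argument is given) and runs against an in-memory key list rather than a Riak cluster. The function names and the extra sample key `bigcorp_2011-02-14_01` are hypothetical, and treating a missing token as "no match" is an assumption of this sketch.

```python
from typing import Iterable, List, Sequence

def passes(key: str, filters: Sequence[list]) -> bool:
    """Run one key through a chain of Riak-style key filters (local emulation).

    Transform filters (tokenize) replace the value being tested; predicate
    filters (starts_with, between) reject the key as soon as one fails.
    """
    value = key
    for op, *args in filters:
        if op == "tokenize":
            sep, index = args
            tokens = value.split(sep)
            if not 1 <= index <= len(tokens):  # token positions are 1-based
                return False                   # assumption: missing token = no match
            value = tokens[index - 1]
        elif op == "starts_with":
            if not value.startswith(args[0]):
                return False
        elif op == "between":
            low, high, *rest = args
            inclusive = rest[0] if rest else True  # inclusive by default
            inside = (low <= value <= high) if inclusive else (low < value < high)
            if not inside:
                return False
        else:
            raise ValueError(f"filter not modeled in this sketch: {op}")
    return True

def apply_key_filters(keys: Iterable[str], filters: Sequence[list]) -> List[str]:
    return [key for key in keys if passes(key, filters)]

keys = ["acme_2011-02-23_01", "acme_2011-02-23_02",
        "bigcorp_2011-02-21_01", "bigcorp_2011-02-14_01"]  # last key: made-up sample

print(apply_key_filters(keys, [["starts_with", "bigcorp_"]]))
# ['bigcorp_2011-02-21_01', 'bigcorp_2011-02-14_01']
print(apply_key_filters(keys, [["tokenize", "_", 2],
                               ["between", "2011-02-12", "2011-02-19"]]))
# ['bigcorp_2011-02-14_01']
```

The date range works only because ISO dates compare correctly as plain strings; a format such as `23-02-2011` would not.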

前事休说 2024-10-25 11:15:32

An even more efficient approach than key filtering (as Kev Burns recommends) is to model this scenario with Secondary Indexes or Riak Search.

Take a look at my answers to "Which clustered NoSQL DB for a Message Storing purpose?" and "Links in Riak: what can they do/not do, compared to graph databases?" for a discussion of similar cases.

You have several decisions to make, depending on your use case. In all cases, you would start out with a company bucket, so that each company has a unique key.

1) Whether to store the items of interest in 2 separate buckets (news and products) or in one (something like items_of_interest) depends on your preference and ease of querying. If you're always going to fetch a company's news and products together in a single query, you might as well store them in a single bucket. But I recommend using 2 separate buckets, which are easier to keep track of, especially if you'll have separate tabs or pages such as "Company X - Products" and "Company X - News". If you need to combine them into a single feed, make 2 queries (one for news and one for products) and merge the results in the client code (by date or whatever).
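Merging the two query results on the client can be as simple as one sort. A minimal sketch, assuming each returned item carries a publication date; the `Item` class, its fields and the sample dates are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List

@dataclass
class Item:
    kind: str        # "news" or "product"
    key: str
    published: date  # assumed field used to order the feed

def combined_feed(news: List[Item], products: List[Item]) -> List[Item]:
    """Merge the results of the two separate queries into one feed, newest first.

    Python's sort is stable even with reverse=True, so items published on the
    same date keep the order in which their query returned them.
    """
    return sorted(news + products, key=lambda item: item.published, reverse=True)

news = [Item("news", "acme_2011-02-23_01", date(2011, 2, 23)),
        Item("news", "acme_2011-02-23_02", date(2011, 2, 23))]
products = [Item("product", "acme_blacksmithing_anvil", date(2011, 2, 24))]  # sample date

for item in combined_feed(news, products):
    print(item.published, item.kind, item.key)
```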

2) If a news/product item can have one and only one company that it belongs to, create a secondary index on company_key for each item. That way, you can easily fetch all news or products for a company via a secondary index (2i) query for that company.
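The one-company-per-item case can be pictured with a toy in-memory stand-in for a bucket with secondary indexes. This is not the Riak client API: the `IndexedBucket` class, its method names and the sample keys are hypothetical. Only the index name `company_key_bin` follows a real Riak convention (string-valued indexes end in `_bin`, integer-valued ones in `_int`).

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

class IndexedBucket:
    """Toy stand-in for a Riak bucket with exact-match secondary indexes."""

    def __init__(self) -> None:
        self._objects: Dict[str, Tuple[dict, Dict[str, str]]] = {}
        self._index: Dict[Tuple[str, str], Set[str]] = defaultdict(set)

    def store(self, key: str, value: dict, indexes: Dict[str, str]) -> None:
        self.delete(key)  # drop index entries left by a previous version
        self._objects[key] = (value, dict(indexes))
        for name, index_value in indexes.items():
            self._index[(name, index_value)].add(key)

    def delete(self, key: str) -> None:
        old = self._objects.pop(key, None)
        if old is not None:
            for name, index_value in old[1].items():
                self._index[(name, index_value)].discard(key)

    def get_index(self, name: str, value: str) -> List[str]:
        """Exact-match 2i-style query: keys whose index `name` equals `value`."""
        return sorted(self._index.get((name, value), set()))

news = IndexedBucket()
news.store("n1", {"title": "Sample news item 1"}, {"company_key_bin": "acme"})
news.store("n2", {"title": "Sample news item 2"}, {"company_key_bin": "bigcorp"})
news.store("n3", {"title": "Sample news item 3"}, {"company_key_bin": "acme"})

print(news.get_index("company_key_bin", "acme"))  # ['n1', 'n3']
```

Unlike a key filter, which has to examine every key in the bucket, an index lookup goes straight to the matching keys; that is the efficiency difference the answer refers to.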

3) If there's a many-to-many relationship, that is, if a news/product item can belong to several companies (perhaps the news item is about a joint venture between 2 separate companies), then I recommend modeling the relationship as a separate Riak object. For example, you could create a mentions bucket, and for each company mentioned in a news story, insert a Mention object with its own unique key and a secondary index on company_key; its value would contain a type ('news' or 'product') and an item_key (the news key or product key).
Extracting relationships into separate Riak objects like this lets you do a lot of interesting things: tag them arbitrarily using Riak Search, query them for subscription event notifications, etc.
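A minimal sketch of the mentions model in the same spirit: plain Python objects stand in for Riak objects, and the loop in `items_for_company` stands in for a 2i query on `company_key`. The field names, the use of `uuid4` for mention keys and the sample keys (such as `jv-story-1`) are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, Iterable, List

@dataclass(frozen=True)
class Mention:
    key: str          # the mention's own unique key (uuid4 here, an assumption)
    company_key: str  # in Riak this would also be indexed, e.g. as company_key_bin
    item_type: str    # "news" or "product"
    item_key: str     # key of the news item or product

def mentions_for(item_type: str, item_key: str, company_keys: Iterable[str]) -> List[Mention]:
    """Create one Mention per company the item belongs to."""
    return [Mention(uuid.uuid4().hex, company, item_type, item_key)
            for company in company_keys]

def items_for_company(mentions: Iterable[Mention], company_key: str) -> Dict[str, List[str]]:
    """Group a company's item keys by type; the filter stands in for a 2i query."""
    grouped: Dict[str, List[str]] = {"news": [], "product": []}
    for mention in mentions:
        if mention.company_key == company_key:
            grouped[mention.item_type].append(mention.item_key)
    return grouped

# A joint-venture story mentions two companies; the product belongs to one.
mentions = (mentions_for("news", "jv-story-1", ["acme", "bigcorp"])
            + mentions_for("product", "acme_blacksmithing_anvil", ["acme"]))

print(items_for_company(mentions, "bigcorp"))  # {'news': ['jv-story-1'], 'product': []}
```

Because the news item itself is stored only once, the relationship objects can be added, removed or tagged without rewriting the item they point to.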