How should I implement this schema in MongoDB?

Posted on 2024-10-12 03:32:52

I'm trying to write a tracking script and I'm having trouble figuring out how the database should work.

In MySQL I'd create tables that look similar to:

User:
   username_name: string

Campaign:
   title: string
   description: string
   link: string

UserCampaign:
   user_id: integer
   camp_id: integer

Click:
   os: text
   referer: text
   camp_id: integer
   user_id: integer

I need to be able to:

  • See the information from each click like IP, Referer, OS, etc
  • See how often clicks are coming from X IP, X Referer, X OS
  • Associate each click with a User and a Campaign

If I do something along the lines of

User {
     Campaigns: [
         {
           Clicks: []
         }
     ]
}

I run into two problems:

  • It creates a new campaign object for each user, which is a problem: if I need to update a campaign, I'd have to update the copy stored in every user
  • I expect the Clicks array to contain a LARGE amount of data, and I feel like having it as part of the User object will make it very slow to query

Comments (3)

掀纱窥君容 2024-10-19 03:32:52

OK, I think you need to break this out into the basic "varieties".

You have two "entity"-style objects:

  • User
  • Campaign

You have one "mapping"-style object:

  • UserCampaign

You have one "transactional"-style object:

  • Click

Step 1: entity

Let's start with the easy ones: User & Campaign. These are truly two separate objects; neither one really depends on the other for its existence. There's also no implicit hierarchy between the two: Users do not belong to Campaigns, nor do Campaigns belong to Users.

When you have two top-level objects like this, they generally earn their own collection. So you'll want a Users collection and a Campaigns collection.

Step 2: mapping

UserCampaign is currently used to represent an N-to-M mapping. Now, in general, when you have an N-to-1 mapping, you can put the N inside of the 1. However, with the N-to-M mapping, you generally have to "pick a side".

In theory, you could do one of the following:

  1. Put a list of Campaign IDs inside of each User
  2. Put a list of User IDs inside of each Campaign

Personally, I would do #1. You probably have way more users than campaigns, so the list of campaign IDs on each user will be much shorter than a list of user IDs on each campaign, and you want to put the array where it will be shorter.
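
A minimal sketch of option #1, assuming hypothetical field names (`campaign_ids`, `name`) and sample values that are not in the original post. Plain arrays stand in for the two collections so the snippet runs on its own in Node.js; the comments show what the equivalent mongo shell queries would probably look like.

```javascript
// In-memory stand-ins for the Users and Campaigns collections.
// All field names and values here are illustrative assumptions.
const campaigns = [
  { _id: 1, title: "Spring sale", link: "http://example.com/a" },
  { _id: 2, title: "Newsletter", link: "http://example.com/b" },
];
const users = [
  { _id: 10, name: "alice", campaign_ids: [1, 2] },
  { _id: 11, name: "bob", campaign_ids: [2] },
];

// Campaigns for one user: read the ID list, then fetch those campaigns.
// mongo shell: db.campaigns.find({ _id: { $in: user.campaign_ids } })
function campaignsForUser(user) {
  return campaigns.filter((c) => user.campaign_ids.includes(c._id));
}

// Users in one campaign: match on the array field.
// mongo shell: db.users.find({ campaign_ids: campaignId })
function usersInCampaign(campaignId) {
  return users.filter((u) => u.campaign_ids.includes(campaignId));
}

const aliceTitles = campaignsForUser(users[0]).map((c) => c.title);
const newsletterUsers = usersInCampaign(2).map((u) => u.name);
console.log(aliceTitles, newsletterUsers);
```

Because each campaign lives in exactly one document, editing a campaign's title or link touches a single document rather than a copy inside every user.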

Step 3: transactional

Clicks are really a completely different beast. In object terms you could think the following: Clicks "belong to" a User, Clicks "belong to" a Campaign. So, in theory, you could just store clicks as part of either of these objects. It's easy to think that Clicks belong under Users or Campaigns.

But if you really dig deeper, the above simplification is really flawed. In your system, Clicks are really a central object. In fact, you might even be able to say that Users & Campaigns are really just "associated with" the click.

Take a look at the questions / queries that you're asking. All of those questions actually center around clicks. Users & Campaigns are not the central object in your data, Clicks are.

Additionally, Clicks are going to be the most plentiful data in your system. You're going to have way more clicks than anything else.

This is the biggest hitch when designing a schema for data like this. Sometimes you need to push off "parent" objects when they're not the most important thing. Imagine building a simple e-commerce system. It's clear that orders would "belong to" users, but orders are so central to the system that they're going to be a "top-level" object.

Wrapping it up

You'll probably want three collections:

  1. User -> has list of campaign._id
  2. Campaign
  3. Clicks -> contains user._id, campaign._id

This should satisfy all of your query needs:

See the information from each click like IP, Referer, OS, etc

db.clicks.find()

See how often clicks are coming from X IP, X Referer, X OS

db.clicks.group() or run a Map-Reduce.
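
A hedged sketch of the counting query. Note that `db.collection.group()` was removed in MongoDB 4.2 and map-reduce has been deprecated since 5.0, so on current versions an aggregation pipeline with a `$group` stage is the usual route. The sample click documents and the `ip` field are assumptions; the `countBy` helper is a hypothetical in-memory equivalent so the snippet runs without a database.

```javascript
// Illustrative click documents; the values are made up.
const clicks = [
  { os: "XP", referer: "google.com", ip: "10.0.0.1", user_id: 10, camp_id: 1 },
  { os: "XP", referer: "example.org", ip: "10.0.0.2", user_id: 11, camp_id: 1 },
  { os: "Linux", referer: "google.com", ip: "10.0.0.1", user_id: 10, camp_id: 2 },
];

// The aggregation pipeline, written as plain data.
// mongo shell: db.clicks.aggregate(countByOsPipeline)
const countByOsPipeline = [
  { $group: { _id: "$os", count: { $sum: 1 } } },
];

// In-memory equivalent of the $group stage above, for any single field.
function countBy(docs, field) {
  const counts = {};
  for (const doc of docs) {
    counts[doc[field]] = (counts[doc[field]] || 0) + 1;
  }
  return counts;
}

const byOs = countBy(clicks, "os");
const byReferer = countBy(clicks, "referer");
console.log(byOs, byReferer);
```

Swapping `"$os"` for `"$referer"` or `"$ip"` in the pipeline gives the other two breakdowns.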

Associate each click with a User and a Campaign

db.clicks.find({user_id : blah})

It's also possible to push click IDs into both users and campaigns (if that makes sense).
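
As a rough illustration of the association, the sketch below stores `user_id` and `camp_id` on each click and resolves them with follow-up lookups. The sample data and the `resolveClick` helper are hypothetical; arrays stand in for the collections so the code runs in Node.js without a server.

```javascript
// In-memory stand-ins for the three collections (values are made up).
const users = [{ _id: 10, name: "alice" }];
const campaigns = [{ _id: 1, title: "Spring sale" }];
const clicks = [
  { _id: 100, os: "XP", referer: "google.com", user_id: 10, camp_id: 1 },
  { _id: 101, os: "Linux", referer: "example.org", user_id: 10, camp_id: 1 },
  { _id: 102, os: "XP", referer: "google.com", user_id: 11, camp_id: 1 },
];

// All clicks for one user.
// mongo shell: db.clicks.find({ user_id: 10 })
const aliceClicks = clicks.filter((c) => c.user_id === 10);

// Resolve a click's references with two follow-up lookups.
// mongo shell: db.users.findOne({ _id: click.user_id })
//              db.campaigns.findOne({ _id: click.camp_id })
function resolveClick(click) {
  return {
    ...click,
    user: users.find((u) => u._id === click.user_id) || null,
    campaign: campaigns.find((c) => c._id === click.camp_id) || null,
  };
}

const resolved = resolveClick(clicks[0]);
console.log(aliceClicks.length, resolved.user.name, resolved.campaign.title);
```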

Please note that if you have lots and lots of clicks, you'll really have to analyze the queries you run most. You can't index on every field, so you'll often want to run Map-Reduces to "roll-up" the data for these queries.
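
One way to picture such a roll-up, as a sketch under stated assumptions: the `day` field, the summary document shape, and the `click_summaries` collection name are all hypothetical. The function builds one summary document per campaign and day from raw clicks, in memory, so it runs without a database.

```javascript
// Illustrative raw clicks; `day` is an assumed, precomputed date string.
const clicks = [
  { camp_id: 1, os: "XP", day: "2024-10-01" },
  { camp_id: 1, os: "Linux", day: "2024-10-01" },
  { camp_id: 1, os: "XP", day: "2024-10-02" },
  { camp_id: 2, os: "XP", day: "2024-10-01" },
];

// Build one summary document per (camp_id, day) with per-OS counters.
// Against a live database, a similar summary could be kept up to date on
// every click with an upsert (hypothetical collection name):
//   db.click_summaries.updateOne(
//     { camp_id: campId, day: day },
//     { $inc: { total: 1, ["by_os." + os]: 1 } },
//     { upsert: true })
function rollUp(docs) {
  const summaries = new Map();
  for (const { camp_id, day, os } of docs) {
    const key = `${camp_id}|${day}`;
    if (!summaries.has(key)) {
      summaries.set(key, { camp_id, day, total: 0, by_os: {} });
    }
    const summary = summaries.get(key);
    summary.total += 1;
    summary.by_os[os] = (summary.by_os[os] || 0) + 1;
  }
  return [...summaries.values()];
}

const summaries = rollUp(clicks);
console.log(summaries);
```

Reports then read the small summary documents instead of scanning every click, at the cost of choosing the roll-up dimensions in advance.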

笙痞 2024-10-19 03:32:52

The main problem I see here is that you are trying to apply relational database concepts to a document-oriented database. The main difference between the two is that in NoSQL databases you don't worry about schema or structure, but rather about collections and documents.

It is very important to understand that many NoSQL implementations have no concept of a join as in SQL. This means that if you spread your data across collections, you have to do a lot of work to glue it back together later. Also, spreading your data across collections brings no gain of the kind normalization brings in a SQL database. You need to think about what data is part of your document and which collection it belongs to, and not worry about the implementation underneath the NoSQL database. So for your problem the answer could be the following, which will support everything you asked for:

db.trackclicks ==> collection
trackclick = {
    OS: "XP",
    User: "John Doe",
    Campaign: { title: "test", desc: "test", link: "url" },
    Referrer: "google.com"
}

与他有关 2024-10-19 03:32:52

  1. It is not a problem for MongoDB to update a large number of documents if something in a campaign changes.

  2. Whether to use a nested collection really depends on how much data the collection will hold.
    In your case, if you know that the 'Clicks' collection will contain a 'LARGE amount of data', you need to create a separate collection. For 'Clicks' you will certainly need paging, filtering and so on, and that way the User collection stays 'light'.

So I suggest the following:

User {
     Campaigns: []
}

Clicks {
 user_id,
 camp_id
}
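
To make the paging point concrete, here is a small sketch assuming sequential `_id` values and a hypothetical `pageOfClicks` helper. It pages through one user's clicks, newest first, over an in-memory array; the comment shows the mongo shell query it is meant to mirror.

```javascript
// 25 illustrative clicks for one user; _id values are assumed sequential.
const clicks = Array.from({ length: 25 }, (_, i) => ({
  _id: i + 1,
  user_id: 10,
  camp_id: (i % 2) + 1,
}));

// One page of a user's clicks, newest first.
// mongo shell: db.clicks.find({ user_id: userId })
//                .sort({ _id: -1 })
//                .skip(page * pageSize)
//                .limit(pageSize)
function pageOfClicks(docs, userId, page, pageSize) {
  return docs
    .filter((c) => c.user_id === userId) // filter() copies, so sort() is safe
    .sort((a, b) => b._id - a._id)
    .slice(page * pageSize, (page + 1) * pageSize);
}

const firstPage = pageOfClicks(clicks, 10, 0, 10);
const lastPage = pageOfClicks(clicks, 10, 2, 10);
console.log(firstPage.length, lastPage.length);
```

With Clicks in their own collection, this kind of query never has to load a User document carrying thousands of embedded clicks.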