How should I implement this schema in MongoDB?
I'm trying to write a tracking script and I'm having trouble with figuring out how the database should work.
In MySQL I'd create tables that look similar to:
User:
username_name: string
Campaign:
title: string
description: string
link: string
UserCampaign:
user_id: integer
camp_id: integer
Click:
os: text
referer: text
camp_id: integer
user_id: integer
I need to be able to:
- See the information from each click like IP, Referer, OS, etc
- See how often clicks are coming from X IP, X Referer, X OS
- Associate each click with a User and a Campaign
If I do something along the lines of
User {
Campaigns: [
{
Clicks: []
}
]
}
I run into two problems:
- It creates a new campaign object for each user, which is a problem because if I need to update my campaign I'd need to update the object for each user
- I expect the Clicks array to contain a LARGE amount of data, and I feel like having it as part of the User object will make it very slow to query
OK, I think you need to break this out into the basic "varieties".
You have two "entity"-style objects:
- User
- Campaign
You have one "mapping"-style object:
- UserCampaign
You have one "transactional"-style object:
- Click
Step 1: entity
Let's start with the easy ones: User & Campaign. These are truly two separate objects; neither one really depends on the other for its existence. There's also no implicit hierarchy between the two: Users do not belong to Campaigns, nor do Campaigns belong to Users.
When you have two top-level objects like this, they generally earn their own collection. So you'll want a Users collection and a Campaigns collection.
Step 2: mapping
UserCampaign is currently used to represent an N-to-M mapping. In general, when you have an N-to-1 mapping, you can put the N inside of the 1. With an N-to-M mapping, however, you generally have to "pick a side". In theory, you could do one of the following:
1. Put a list of Campaign IDs inside each User
2. Put a list of User IDs inside each Campaign
Personally, I would do #1. You probably have way more users than campaigns, and you probably want to put the array where it will be shorter.
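A minimal in-memory sketch of Steps 1 and 2, assuming plain JavaScript objects stand in for documents in the Users and Campaigns collections. The field names (`username`, `campaign_ids`) and the sample data are hypothetical. Each user carries the short array of campaign IDs (option #1), so a campaign lives in exactly one document and renaming it touches nothing else. In the mongo shell, the two lookups correspond to the queries quoted in the comments; matching an array field against a single value finds every document whose array contains that value.

```javascript
// Hypothetical sample data: each Campaign is its own top-level document...
const campaigns = [
  { _id: "c1", title: "Spring sale", description: "Seasonal promo", link: "https://example.com/spring" },
  { _id: "c2", title: "Newsletter", description: "Monthly mail", link: "https://example.com/news" },
];

// ...and each User keeps the (short) list of campaign IDs it takes part in (option #1).
const users = [
  { _id: "u1", username: "alice", campaign_ids: ["c1", "c2"] },
  { _id: "u2", username: "bob", campaign_ids: ["c1"] },
];

// Campaigns for one user: read the ID list, then fetch those campaign documents.
// Mirrors db.campaigns.find({ _id: { $in: user.campaign_ids } }).
function campaignsForUser(userId) {
  const user = users.find((u) => u._id === userId);
  if (!user) return [];
  return campaigns.filter((c) => user.campaign_ids.includes(c._id));
}

// Users in one campaign: match any user whose array contains the ID.
// Mirrors db.users.find({ campaign_ids: campaignId }).
function usersInCampaign(campaignId) {
  return users.filter((u) => u.campaign_ids.includes(campaignId));
}

// Updating a campaign touches exactly one document, however many users joined it.
function renameCampaign(campaignId, title) {
  const campaign = campaigns.find((c) => c._id === campaignId);
  if (campaign) campaign.title = title;
}

console.log(usersInCampaign("c1").map((u) => u.username)); // [ 'alice', 'bob' ]
console.log(campaignsForUser("u2").map((c) => c.title)); // [ 'Spring sale' ]
```

Putting the array on the User side keeps each array bounded by how many campaigns one person joins, rather than by how many users a popular campaign attracts.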
Step 3: transactional
Clicks are really a completely different beast. In object terms, you could think the following:
- Clicks "belong to" a User
- Clicks "belong to" a Campaign
So, in theory, you could just store clicks as part of either of these objects. It's easy to think that Clicks belong under Users or Campaigns.
But if you really dig deeper, the above simplification is flawed. In your system, Clicks are really a central object. In fact, you might even say that Users & Campaigns are just "associated with" the click.
Take a look at the questions / queries that you're asking. All of those questions actually center around clicks. Users & Campaigns are not the central object in your data; Clicks are.
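A minimal sketch of Step 3, assuming clicks live in their own collection and reference users and campaigns by `_id` (the `user_id` and `campaign_id` field names and the sample documents are hypothetical). Because a click holds only IDs, associating it with its User and Campaign means resolving those IDs; here that happens in application code. On MongoDB 3.2 and later, the `$lookup` aggregation stage can perform the same resolution on the server.

```javascript
// Hypothetical click documents: each one is a top-level record that carries
// the request details plus references to its User and Campaign.
const users = [{ _id: "u1", username: "alice" }];
const campaigns = [{ _id: "c1", title: "Spring sale" }];
const clicks = [
  { _id: 1, ip: "203.0.113.5", referer: "google.com", os: "Windows", user_id: "u1", campaign_id: "c1" },
  { _id: 2, ip: "198.51.100.7", referer: "bing.com", os: "macOS", user_id: "u1", campaign_id: "c1" },
];

// Index the small collections by _id so resolving a reference is a map lookup.
const usersById = new Map(users.map((u) => [u._id, u]));
const campaignsById = new Map(campaigns.map((c) => [c._id, c]));

// Application-side "join": attach the referenced User and Campaign to one click.
// Dangling references resolve to null instead of throwing.
function withOwners(click) {
  return {
    ...click,
    user: usersById.get(click.user_id) ?? null,
    campaign: campaignsById.get(click.campaign_id) ?? null,
  };
}

// All clicks for a user, mirroring db.clicks.find({ user_id: userId }).
function clicksForUser(userId) {
  return clicks.filter((c) => c.user_id === userId).map(withOwners);
}

const first = clicksForUser("u1")[0];
console.log(first.ip, first.user.username, first.campaign.title);
// 203.0.113.5 alice Spring sale
```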
Additionally, Clicks are going to be the most plentiful data in your system. You're going to have way more clicks than anything else.
This is the biggest hitch when designing a schema for data like this. Sometimes you need to push off "parent" objects when they're not the most important thing. Imagine building a simple e-commerce system. It's clear that orders would "belong to" users, but orders are so central to the system that they're going to be a "top-level" object.
Wrapping it up
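You'll probably want three collections:
- Users (each user holds a list of Campaign IDs)
- Campaigns
- Clicks (each click holds its User ID and Campaign ID)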
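This should satisfy all of your query needs:
- See how often clicks are coming from X IP, X Referer, X OS: use db.clicks.group() or run a Map-Reduce.
- Associate each click with a User and a Campaign: db.clicks.find({user_id : blah})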
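A hedged sketch of the counting query. `db.collection.group()` was deprecated in MongoDB 3.4 and removed in 4.2, so on current servers the aggregation pipeline's `$group` stage does this job. The block shows such a pipeline as plain data (what you would pass to `db.clicks.aggregate(...)`; the `campaign_id` filter and the sample clicks are hypothetical) next to a small in-memory function that computes the same per-field counts, so it runs without a database.

```javascript
// Pipeline for db.clicks.aggregate(...): keep one campaign's clicks,
// count them per operating system, most frequent first.
const clicksPerOsPipeline = [
  { $match: { campaign_id: "c1" } },
  { $group: { _id: "$os", count: { $sum: 1 } } },
  { $sort: { count: -1 } },
];

// Hypothetical sample clicks.
const clicks = [
  { ip: "203.0.113.5", referer: "google.com", os: "Windows", campaign_id: "c1" },
  { ip: "203.0.113.5", referer: "google.com", os: "Windows", campaign_id: "c1" },
  { ip: "198.51.100.7", referer: "bing.com", os: "macOS", campaign_id: "c1" },
  { ip: "192.0.2.1", referer: "google.com", os: "Linux", campaign_id: "c2" },
];

// In-memory equivalent of the pipeline, generalized to any field
// ("ip", "referer", "os", ...). Output shape matches $group: { _id, count }.
function countBy(docs, field, filter = () => true) {
  const counts = new Map();
  for (const doc of docs) {
    if (!filter(doc)) continue;
    const key = doc[field];
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return [...counts]
    .map(([key, count]) => ({ _id: key, count }))
    .sort((a, b) => b.count - a.count);
}

console.log(countBy(clicks, "os", (c) => c.campaign_id === "c1"));
console.log(countBy(clicks, "referer"));
```

With many clicks, an index that supports the `$match` fields keeps the first stage from scanning the whole collection.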
It's also possible to push click IDs into both users and campaigns (if that makes sense).
Please note that if you have lots and lots of clicks, you'll really have to analyze the queries you run most. You can't index on every field, so you'll often want to run Map-Reduces to "roll up" the data for these queries.
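One way to "roll up" the data, sketched under assumptions. Map-reduce is deprecated as of MongoDB 5.0, so this swaps in an aggregation pipeline that groups clicks per campaign and UTC day and writes the totals into a separate summary collection with `$merge` (available from MongoDB 4.2). The `ts` field, the `click_daily_totals` collection name, and the sample data are hypothetical; the in-memory function computes the same totals so the block runs standalone. Reports can then read the small summary collection instead of scanning every click.

```javascript
// Hypothetical raw clicks with timestamps.
const clicks = [
  { campaign_id: "c1", os: "Windows", ts: new Date("2024-05-01T09:00:00Z") },
  { campaign_id: "c1", os: "macOS", ts: new Date("2024-05-01T17:30:00Z") },
  { campaign_id: "c1", os: "Windows", ts: new Date("2024-05-02T08:15:00Z") },
];

// Server-side version for db.clicks.aggregate(...): group by campaign and UTC day,
// then upsert the totals into a summary collection.
const dailyRollupPipeline = [
  {
    $group: {
      _id: {
        campaign_id: "$campaign_id",
        day: { $dateToString: { format: "%Y-%m-%d", date: "$ts" } },
      },
      clicks: { $sum: 1 },
    },
  },
  { $merge: { into: "click_daily_totals", whenMatched: "replace" } },
];

// In-memory equivalent: one summary document per (campaign, UTC day).
function dailyRollup(docs) {
  const totals = new Map();
  for (const doc of docs) {
    const day = doc.ts.toISOString().slice(0, 10); // "YYYY-MM-DD" in UTC
    const key = `${doc.campaign_id}|${day}`;
    const entry = totals.get(key) ?? { _id: { campaign_id: doc.campaign_id, day }, clicks: 0 };
    entry.clicks += 1;
    totals.set(key, entry);
  }
  return [...totals.values()];
}

console.log(dailyRollup(clicks).map((d) => `${d._id.day}: ${d.clicks}`).join(", "));
// 2024-05-01: 2, 2024-05-02: 1
```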
The main problem I see here is that you are trying to apply relational database concepts to a document-oriented database. The main difference between the two is that in NoSQL databases you don't worry about schema or structure, but rather about collections and documents.
It is very important to understand that many NoSQL implementations have no concept of joins the way SQL does. This means that if you spread your data across collections, you have to do a lot of work to glue it back together later. Nor is there any other gain from spreading your data across collections the way you would when normalizing a SQL database. You need to think about what data is part of your document and which collection it belongs to, and not worry about the implementation underneath the NoSQL database. So for your problem, the answer could be the following, and it will support everything you asked for:
// stored in the db.trackclicks collection
var trackclick = {
  OS: "XP",
  User: "John Doe",
  Campaign: { title: "test", desc: "test", link: "url" },
  Referrer: "google.com"
};
db.trackclicks.insertOne(trackclick);
It is not a problem for MongoDB to update a large number of documents if something in a campaign changes.
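A sketch of what such an update looks like when each click embeds a copy of its campaign, as in the single trackclicks collection suggested earlier. The `Campaign._id` field is a hypothetical addition so that the copies can be matched reliably; the statement in the comment uses the real `updateMany` method with `$set` on a dotted path, and the in-memory function mirrors it so the block runs without a server. One statement rewrites every copy, but the work still grows with the number of clicks that embed the campaign.

```javascript
// Hypothetical denormalized clicks: each one embeds a copy of its campaign.
const trackclicks = [
  { OS: "XP", User: "John Doe", Campaign: { _id: "c1", title: "test", link: "https://example.com/a" } },
  { OS: "Win7", User: "Jane Roe", Campaign: { _id: "c1", title: "test", link: "https://example.com/a" } },
  { OS: "XP", User: "John Doe", Campaign: { _id: "c2", title: "other", link: "https://example.com/b" } },
];

// In-memory equivalent of
//   db.trackclicks.updateMany({ "Campaign._id": id }, { $set: { "Campaign.link": link } })
// Returns how many documents actually changed, like modifiedCount.
function updateCampaignLink(docs, campaignId, link) {
  let modified = 0;
  for (const doc of docs) {
    if (doc.Campaign._id === campaignId && doc.Campaign.link !== link) {
      doc.Campaign.link = link;
      modified += 1;
    }
  }
  return modified;
}

console.log(updateCampaignLink(trackclicks, "c1", "https://example.com/new")); // 2
```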
Whether to nest a collection or not really depends on how much data it holds.
In your case, if you know that the 'Clicks' collection will contain a 'LARGE amount of data', you need to create a separate collection, because you will certainly need paging, filtering, etc. for the 'Clicks', and then the users collection stays 'light'.
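With clicks in their own collection, paging through one user's clicks becomes a plain query on that collection. A minimal sketch of range-based ("seek") paging, assuming an index on `{ user_id: 1, _id: 1 }` in the real database; the sample data is hypothetical and the in-memory function mirrors the `find().sort().limit()` chain quoted in its comment. Resuming after the last `_id` already seen avoids the growing cost of large `skip()` values, since the server does not have to walk past all earlier pages.

```javascript
// Hypothetical clicks with increasing numeric _id values
// (ObjectIds also increase roughly with insertion time).
const clicks = Array.from({ length: 7 }, (_, i) => ({
  _id: i + 1,
  user_id: i % 2 === 0 ? "u1" : "u2",
  os: "Windows",
}));

// One page of a user's clicks, resuming after the last _id already seen. Mirrors
//   db.clicks.find({ user_id: userId, _id: { $gt: afterId } }).sort({ _id: 1 }).limit(pageSize)
function clickPage(userId, afterId, pageSize) {
  return clicks
    .filter((c) => c.user_id === userId && c._id > afterId)
    .sort((a, b) => a._id - b._id)
    .slice(0, pageSize);
}

// Walk through all of u1's clicks, three at a time.
let afterId = 0;
for (;;) {
  const page = clickPage("u1", afterId, 3);
  if (page.length === 0) break;
  console.log(page.map((c) => c._id));
  afterId = page[page.length - 1]._id;
}
```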
So I suggest the following: