Approaches for storing large amounts of metrics/analytics data in MongoDB

Posted on 2024-11-02 05:23:53

We are planning to use MongoDB to store large amounts of analytics data such as views and clicks. I'm unsure of the best way to structure the documents within MongoDB to aid querying and reduce database size.

We need to record actions against a pagename, a client and the type of action. Ideally we need stats that go down to the year/month/day/hour level; we don't need or care about views per second or minute. While this document structure looks OK, I'm aware that 100 visitors would generate 100 new documents.

{ 
  "_id" : ObjectId( "4dabdef81a34961506040000" ),
  "pagename" : "Hello",
  "action" : "view",
  "client" : "client-name",
  "time" : Date( "Mon Apr 18 07:49:28 2011" )
}

Is there a best-practice way of doing this, either using $inc or capped collections?

Comments (2)

最舍不得你 2024-11-09 05:23:53

Updated answer

Hacked together in the mongo shell:

use pagestats;

// a little helper function: builds the hourly bucket key for a page
var pagePerHour = function(pagename) {
    var d = new Date();
    return {
        page : pagename,
        year: d.getUTCFullYear(),
        month: d.getUTCMonth(),  // 0-11
        day : d.getUTCDate(),
        hour: d.getUTCHours()
    };
};

// a pageview happened
db.pagestats.update(
    pagePerHour('Hello'),
    { $inc : { views : 1 }},
    { upsert : true } ); // create the hourly document if it does not exist yet

// somebody tweeted our page twice!
db.pagestats.update(
    pagePerHour('Hello'),
    { $inc : { tweets : 2 }},
    { upsert : true } ); // same hourly document, different counter

db.pagestats.find();
// { "_id" : ObjectId("4dafe88a02662f38b4a20193"),
//   "year" : 2011, "day" : 21, "hour" : 8, "month" : 3,
//   "page" : "Hello",
//   "tweets" : 2, "views" : 1 }

// 24 hour summary 'Hello' on 2011-4-21
for (var i = 0; i < 24; i++) {
    // careful: days (1-31), month (0-11) and hours (0-23)
    var stats = db.pagestats.findOne({ page: 'Hello', year: 2011, month: 3, day : 21, hour : i });
    if (stats) {
        // views may be missing if only other counters were incremented in this hour
        print(i + ': ' + (stats.views || 0) + ' views');
    } else {
        print(i + ': no hits');
    }
}

Depending on which aspects you want to track, you might consider adding more collections (e.g. a collection for user-centric tracking). Hope that helps.
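
The question also asks for stats per client and per action type. As a minimal sketch, assuming the same hourly-bucket approach, the hypothetical helpers below (`bucketKey` and `actionIncrement` are illustrative names, not part of the code above) add `client` to the upsert key, take the timestamp as a parameter, and use the action name as the counter field:

```javascript
// Hypothetical helper: builds the upsert key for one page/client/hour bucket.
// Field names mirror pagePerHour; the `client` field is an added assumption.
function bucketKey(pagename, client, date) {
    return {
        page: pagename,
        client: client,
        year: date.getUTCFullYear(),
        month: date.getUTCMonth(),   // 0-11
        day: date.getUTCDate(),      // 1-31
        hour: date.getUTCHours()     // 0-23
    };
}

// Hypothetical helper: builds the $inc update for an action such as 'view' or 'click'.
function actionIncrement(action, count) {
    var inc = {};
    inc[action] = count === undefined ? 1 : count;
    return { $inc: inc };
}

var key = bucketKey('Hello', 'client-name', new Date(Date.UTC(2011, 3, 18, 7, 49, 28)));
var update = actionIncrement('view');
// In the shell, this pair would be passed to an upsert, e.g.:
// db.pagestats.update(key, update, { upsert: true });
```

With this layout, 100 views of one page by one client within the same hour end up in a single document rather than 100.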

See also

Blogpost about Analytics Data

暮凉 2024-11-09 05:23:53

I wouldn't worry too much about space. Mongo can scale pretty much indefinitely in that regard, and adding more space is reasonably cheap.

One thing to be aware of: if you keep updating a document in a way that makes it grow, Mongo will eventually need to find a new place for it on disk. If you have a lot of documents being updated and increasing in size, Mongo will need to copy these documents around a lot, which can slow things down significantly. Of course, this all depends on how much traffic you're expecting.
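
One possible way to limit this growth, offered here as an assumption rather than something this answer proposes, is to create each document with all of its counters already set to zero, so that later `$inc` updates change existing numeric fields instead of adding new ones. The sketch below uses a hypothetical one-document-per-page-per-day layout with a `views` subdocument keyed by hour. Whether it helps in practice depends on the storage engine and MongoDB version.

```javascript
// Hypothetical layout: one document per page per day, with 24 hourly counters
// created up front so that increments do not add new fields.
function preallocatedDay(pagename, date) {
    var hours = {};
    for (var h = 0; h < 24; h++) {
        hours[h] = 0;
    }
    return {
        page: pagename,
        year: date.getUTCFullYear(),
        month: date.getUTCMonth(),   // 0-11
        day: date.getUTCDate(),      // 1-31
        views: hours
    };
}

// Builds an update that bumps the counter for the hour of `date`,
// using dot notation to reach into the subdocument.
function hourIncrement(date) {
    var inc = {};
    inc['views.' + date.getUTCHours()] = 1;
    return { $inc: inc };
}

var when = new Date(Date.UTC(2011, 3, 21, 8, 0, 0));
var dayDoc = preallocatedDay('Hello', when);
var hourUpdate = hourIncrement(when);   // { $inc: { 'views.8': 1 } }
```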

Based on my experience, go with a simple document format where you don't need to update the documents. It might complicate your querying later on, but you can use map/reduce to get whatever information you want regardless of your document structure (map/reduce is very flexible; given enough experience, you can do anything with it).
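
To illustrate the idea, here is a minimal in-memory sketch in plain JavaScript of how a map step and a reduce step could roll raw event documents up into hourly counts. It is not MongoDB's mapReduce command; the names `mapEvent`, `reduceCounts` and `hourlyCounts` and the sample events are hypothetical.

```javascript
// Map step: turn one raw event into a [key, value] pair.
// The key identifies a page/action/hour bucket (month is written 1-12 here).
function mapEvent(event) {
    var t = event.time;
    var key = [event.pagename, event.action,
               t.getUTCFullYear(), t.getUTCMonth() + 1,
               t.getUTCDate(), t.getUTCHours()].join('|');
    return [key, 1];
}

// Reduce step: combine all values emitted for one key.
function reduceCounts(values) {
    return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Tiny driver that groups mapped pairs by key, then reduces each group.
function hourlyCounts(events) {
    var groups = {};
    events.map(mapEvent).forEach(function (pair) {
        (groups[pair[0]] = groups[pair[0]] || []).push(pair[1]);
    });
    var result = {};
    Object.keys(groups).forEach(function (key) {
        result[key] = reduceCounts(groups[key]);
    });
    return result;
}

var sampleEvents = [
    { pagename: 'Hello', action: 'view', time: new Date(Date.UTC(2011, 3, 18, 7, 49, 28)) },
    { pagename: 'Hello', action: 'view', time: new Date(Date.UTC(2011, 3, 18, 7, 55, 0)) },
    { pagename: 'Hello', action: 'click', time: new Date(Date.UTC(2011, 3, 18, 8, 1, 0)) }
];
var counts = hourlyCounts(sampleEvents);
```

The raw documents are only ever inserted, never updated; the hourly totals are derived afterwards.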
