Social networking and usage logging
What sort of data should be logged on a social networking type of site from day 1 so that in the future, useful statistical analysis may be performed? Also, what other tips and tricks have you learned with site logging? Depending on the scale of the site, is it frequently worth it to log to a flat file, and have a job periodically load that data into a db for site-performance reasons?
I am thinking of server-side logging here - not just generic Google Analytics / Piwik type logging. To give a jumpstart to the answer, here are a few no-brainers I've thought of (a sketch of capturing them follows the list):
- IP address
- user identification info, if logged in (user ID)
- HTTP_REFERER
- is AJAX call (bool)
- session ID (should sessions also be permanently logged separately?)
- Nth view since the session began
- some information to indicate what page the user is on (controller being used? URL path?)
- timestamp
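To make that concrete, here is a minimal sketch in PHP of capturing those fields to a flat file as newline-delimited JSON, ready for a periodic job to bulk-load into a DB. The log_request() helper, the log path, and how $userId and $isAjax are determined are all illustrative assumptions, not part of the original question.

```php
<?php
// Minimal sketch: append one JSON record per request to a flat file.
// Assumes session_start() has already been called.
// log_request(), the path, and the $userId/$isAjax inputs are hypothetical.
function log_request(?int $userId, bool $isAjax): void
{
    $_SESSION['view_n'] = ($_SESSION['view_n'] ?? 0) + 1; // Nth view this session

    $record = [
        'ts'         => date('c'),                        // timestamp
        'ip'         => $_SERVER['REMOTE_ADDR'] ?? null,  // IP address
        'user_id'    => $userId,                          // null if anonymous
        'referer'    => $_SERVER['HTTP_REFERER'] ?? null,
        'is_ajax'    => $isAjax,
        'session_id' => session_id() ?: null,
        'view_n'     => $_SESSION['view_n'],
        'path'       => $_SERVER['REQUEST_URI'] ?? null,  // what page the user is on
    ];

    // LOCK_EX keeps concurrent requests from interleaving lines.
    file_put_contents('/var/log/app/requests.log',
                      json_encode($record) . "\n",
                      FILE_APPEND | LOCK_EX);
}
```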
Comments (4)
Well, for starters, "generic Google Analytics / Piwik type logging" is actually usually more powerful than server-side log processing - you can set/get various cookies, you can extract lots of information from the client that is available only to JavaScript, and so on. Even getting a simple visitor_id cookie is much easier in JavaScript than server-side - you'd have to set up some web server module to push session cookies, its timeout would differ from the WAA-standard 30 minutes, etc.
Generally, when designing the variables/fields to log, you want to think about which reports/aggregations you will want to build from them.
Contrary to the popular opinion "log everything, sort it out later", logging is not a passive but an active process. You'll most likely end up wanting to push some cookies to users that mark things such as who the visitor is and where their session begins and ends.
All of this requires interaction between the server (and/or a JavaScript collection snippet) and the visitor's browser, not just passive logging.
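To illustrate that last point, here is a rough server-side counterpart in PHP of what the JavaScript trackers hand you for free: a long-lived visitor_id cookie plus a session cookie that expires after 30 minutes of inactivity (the WAA convention mentioned above). The cookie names and the two-year visitor lifetime are illustrative choices, not from the answer.

```php
<?php
// Sketch: persistent visitor cookie + WAA-style 30-minute session cookie.
// Must run before any output is sent, since setcookie() writes headers.
if (empty($_COOKIE['visitor_id'])) {
    $visitorId = bin2hex(random_bytes(16));          // new visitor
    setcookie('visitor_id', $visitorId, time() + 2 * 365 * 86400, '/');
} else {
    $visitorId = $_COOKIE['visitor_id'];             // returning visitor
}

// Sliding window: refreshing the expiry on every hit means the session
// ends 30 minutes after the *last* request, per the WAA convention.
$sessionId = $_COOKIE['wa_session'] ?? bin2hex(random_bytes(8));
setcookie('wa_session', $sessionId, time() + 30 * 60, '/');
```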
Log each and every request (query string, etc.), and log all the HTTP variables with each request:
HTTP_ACCEPT, HTTP_ACCEPT_CHARSET, HTTP_ACCEPT_ENCODING, HTTP_ACCEPT_LANGUAGE,
HTTP_CONNECTION, HTTP_HOST, HTTP_REFERER, HTTP_USER_AGENT.
Since you are interested in this from day 1, don't worry about information that can be derived from the raw logs - you can do whatever processing you want later.
If resources become a constraint (they should not be in the beginning), you can optimize, e.g. by storing a hash of HTTP_USER_AGENT instead of the full string.
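A minimal sketch of that advice in PHP: dump the query string and every HTTP_* variable for the request, and apply the suggested optimization of hashing the user agent (keep a hash-to-string lookup table on the side so nothing is lost). The log path is an illustrative assumption.

```php
<?php
// Sketch: capture the query string and every HTTP_* variable PHP exposes.
$vars = ['QUERY_STRING' => $_SERVER['QUERY_STRING'] ?? ''];
foreach ($_SERVER as $key => $value) {
    if (strpos($key, 'HTTP_') === 0) {
        $vars[$key] = $value;
    }
}

// Suggested optimization: store a hash of the bulky user agent instead of
// the full string (resolve hashes against a side table offline).
if (isset($vars['HTTP_USER_AGENT'])) {
    $vars['HTTP_USER_AGENT'] = md5($vars['HTTP_USER_AGENT']);
}

file_put_contents('/var/log/app/http_vars.log',
                  json_encode($vars) . "\n",
                  FILE_APPEND | LOCK_EX);
```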
PHP coders of high-traffic sites should look into Scribe. Originally developed by Facebook and now open source, Scribe is a great way to log events in your app for analysis later on. For more information on Scribe and other tips, check out this article on logging for analysis purposes.
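For flavor, a sketch of what sending an event to a local Scribe daemon looks like from PHP, assuming the old-style Thrift PHP runtime and the bindings generated from scribe.thrift are on the include path; the host/port, category name, and payload are illustrative assumptions.

```php
<?php
// Sketch: log one event to a local Scribe daemon over Thrift.
// Assumes the (pre-namespace) Thrift PHP runtime and generated scribe bindings.
require_once 'thrift/Thrift.php';
require_once 'thrift/transport/TSocket.php';
require_once 'thrift/transport/TFramedTransport.php';
require_once 'thrift/protocol/TBinaryProtocol.php';
require_once 'scribe/scribe.php'; // generated from scribe.thrift

$socket    = new TSocket('localhost', 1463);      // Scribe's default port
$transport = new TFramedTransport($socket);
$protocol  = new TBinaryProtocol($transport, false, false);
$client    = new scribeClient($protocol, $protocol);

$entry = new LogEntry([
    'category' => 'page_views',                   // illustrative category
    'message'  => json_encode(['user_id' => 42, 'path' => '/home']),
]);

$transport->open();
$client->Log([$entry]);   // scribe.thrift: ResultCode Log(1: list<LogEntry>)
$transport->close();
```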
As you probably already know: log too much rather than too little.
If you log the request line and the headers of all requests, you will have plenty of information to dig into at a later point. E.g. that will give you most of the things you list above (or they can be derived from it).
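A short PHP sketch of exactly that: reconstruct the request line and grab all request headers in one go. getallheaders() is the standard PHP function for this under Apache (and FPM on newer PHP versions); the log path is illustrative.

```php
<?php
// Sketch: log the request line plus every request header as one JSON line.
$requestLine = sprintf('%s %s %s',
    $_SERVER['REQUEST_METHOD']  ?? '-',
    $_SERVER['REQUEST_URI']     ?? '-',
    $_SERVER['SERVER_PROTOCOL'] ?? '-');

$record = [
    'ts'      => date('c'),
    'ip'      => $_SERVER['REMOTE_ADDR'] ?? null,
    'request' => $requestLine,                    // e.g. "GET /home HTTP/1.1"
    'headers' => function_exists('getallheaders') ? getallheaders() : [],
];

file_put_contents('/var/log/app/access.log',
                  json_encode($record) . "\n",
                  FILE_APPEND | LOCK_EX);
```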