Logging/tracking in PHP: Scribe, Chukwa, log4php?

Posted 2024-09-11 06:02:34

This is probably a pretty high-level question that requires a lot of explaining, but I'm in need of a lot of explaining.

Basically I'm developing a PHP application that requires a lot of logging and tracking. Tracking clicks, interactions, performance, etc. etc. Anything under the sun. Facebook's Scribe and Yahoo's Chukwa are both great implementations of this. I know little about log4php.

What I want is a high-level overview of how this kind of logging works, specifically in conjunction with a PHP application. You can stop at the point where the log gets processed; I already know that I want to use Hadoop/Hive for processing and storage.

I'd also like some fairly low-level looks at what happens within the application itself. For example, how does one take the behavior of a click and send that to the logger? I'd appreciate any reading that can help get me started, as well.

Answers (4)

猫九 2024-09-18 06:02:34

You can buy/get tools that do this for you, or build them in-house.

buy/get:

1 - Tag your pages with Google/Yahoo Analytics. This will track pageviews, page-flow performance, SEO ranking for keywords, etc.

2 - For tracking and logging user behavior, including clicks, interactions and performance, I have found nothing better than ClickTale - http://www.clicktale.com/default_e.aspx - which records user sessions as video and stores these "log files" on a server.

in-house:
1 - Adding hidden fields to your forms that submit to a logging database also works. You give each form a unique ID and keep track of its actions during submits.
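A minimal sketch of the hidden-field idea, written in Python for brevity (the same logic ports directly to a PHP handler reading `$_POST`). The hidden field name `_form_id`, the event layout, and the JSON-lines output standing in for the logging database are all assumptions for illustration:

```python
import json
import time

# Hypothetical convention: every tracked form carries
# <input type="hidden" name="_form_id" value="signup-v2">
TRACKING_FIELD = "_form_id"

def build_form_event(post_data, user_id, now=None):
    """Turn a submitted form (dict of field -> value) into a log event.

    Only the form ID and the *names* of non-empty fields are recorded,
    so passwords and other sensitive values never reach the log.
    """
    form_id = post_data.get(TRACKING_FIELD)
    if form_id is None:
        return None  # untracked form: nothing to log
    filled = sorted(k for k, v in post_data.items()
                    if k != TRACKING_FIELD and v not in ("", None))
    return {
        "type": "form_submit",
        "form_id": form_id,
        "user_id": user_id,
        "filled_fields": filled,
        "ts": now if now is not None else time.time(),
    }

def to_log_line(event):
    """Serialize one event as a single JSON line for a batch job to pick up later."""
    return json.dumps(event, sort_keys=True)
```

Logging field names instead of values is a choice made in this sketch, not something the answer prescribes; it keeps the log useful for "which field do people skip?" questions without storing user input.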

I'm sure there are many more, but these are the basics. None of them are PHP-specific, though.

HTH

EDIT #1:

This may be beyond the scope of your question, but tracking doesn't necessarily mean data that stays in-house. An example would be adding a "like it" or "digg it" button to articles or pages. This will "log" popularity for you; you can go to Facebook or digg.com to see your site's progress. It'll also help with SEO. Basically, it's a tracking system, and it's easy to use: there are PHP snippets out there that you can copy and paste into your code. If you have WordPress, there is a plugin; just search for "digg" or "like it" in the plugin search section.

Going back to Google Analytics, if you want to go beyond tracking clicks, set up goals/funnels. They track user behavior and answer questions such as "What were my most valuable keywords?", "Where are all my users dropping off?", "What is the bounce rate for each page?" and "What are the top 3 entry points to my site, and from which traffic medium?" These are the questions SEO/SEM managers care most about, and they are definitely worth tracking and understanding.
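To illustrate what a funnel report computes, here is a simplified sketch that derives per-step drop-off from your own click logs. It is not how Google Analytics implements funnels; the URIs and the rule that steps must be reached in order (other pages in between are ignored) are assumptions:

```python
def funnel_report(sessions, steps):
    """Count how many sessions reached each funnel step, in order.

    sessions: list of page-URI lists, one list per visit, in visit order.
    steps: ordered list of URIs forming the funnel (e.g. cart -> checkout -> done).
    Returns (step, sessions_reaching_it, drop_off_from_previous_step) tuples;
    the first step's drop-off is 0 by definition.
    """
    reached = [0] * len(steps)
    for pages in sessions:
        i = 0  # index of the next funnel step this session must hit
        for page in pages:
            if i < len(steps) and page == steps[i]:
                i += 1
        for k in range(i):  # session got through steps[0..i-1]
            reached[k] += 1
    report = []
    for k, step in enumerate(steps):
        prev = reached[k - 1] if k > 0 else reached[0]
        report.append((step, reached[k], prev - reached[k]))
    return report
```

The step with the largest drop-off value is the "where are my users dropping off?" answer.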

ClickTale starts where Google Analytics ends. GA describes user behavior at the page level, but not at the field level. ClickTale, which has heat maps, answers questions such as "I know this page has a high bounce rate, but why? Which field is a problem for my customers?", "In which area of the page do users spend most of their time?" and "How do I prove to the graphics guys that a particular section needs to be redesigned?"

EDIT #2

For high-traffic sites, you will need to scale your logging DB. This really pays off when it comes to reporting. What I suggest is a 3-tier database reporting structure: tier 1 = last 7 days, tier 2 = last 6 months, tier 3 = everything. You can modify these according to the business. The point is that data moves from one tier to the next, keeping fresh data readily available, because you want to generate reports ASAP. A single huge DB just doesn't scale.
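One way to read the tier scheme, sketched in Python: every event is written to all tiers and then aged out of the smaller, faster ones, so "moving" data down a tier is just a delete from the tier above. The tier names, the 183-day approximation of 6 months and the row layout are assumptions:

```python
from datetime import datetime, timedelta

# Retention windows from the answer; 6 months approximated as 183 days (assumption).
TIER_WINDOWS = [
    ("tier1", timedelta(days=7)),
    ("tier2", timedelta(days=183)),
    ("tier3", None),  # None = keep everything
]

def tiers_for(event_time, now):
    """Names of every tier that should currently hold an event of this age."""
    age = now - event_time
    return [name for name, window in TIER_WINDOWS if window is None or age <= window]

def expired(rows, tier, now):
    """Rows older than the tier's window: candidates to delete from that tier.

    Because the larger tiers already hold these rows, removing them here
    is all the "move" a nightly job needs to do.
    """
    window = dict(TIER_WINDOWS)[tier]
    if window is None:
        return []
    return [r for r in rows if now - r["ts"] > window]
```

Reports on recent activity then only ever touch tier 1, which stays small no matter how much history accumulates in tier 3.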

一梦等七年七年为一梦 2024-09-18 06:02:34

You can monitor user clicks by logging the path the user takes (referrer --> new URI), assuming both URIs are verbose and descriptive enough. For example, if a user clicks on one of his friends, you should log the URIs:

Referrer: /users/41251
Target: /users/66257

Store them properly for easy querying and reporting. Here, a direct click like that implies the target appeared on the referrer's page, i.e. is a friend. If you have more complicated scenarios, be sure to describe them with distinct URIs, e.g. /users/suggestion/14152 for a suggested connection.
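A minimal sketch of logging referrer --> target pairs and grouping them for reports, in Python for brevity. In PHP the referrer would usually come from `$_SERVER['HTTP_REFERER']`, which browsers may omit, so it is treated as optional here. The `{id}` collapsing rule and the function names are hypothetical:

```python
import re
from collections import Counter

def uri_pattern(uri):
    """Collapse numeric path segments so /users/41251 and /users/66257 group together."""
    return re.sub(r"/\d+(?=/|$)", "/{id}", uri)

def log_click(referrer, target, user_id, ts):
    """One navigation event; a missing referrer is recorded as a direct visit."""
    return {"user_id": user_id, "ts": ts,
            "referrer": referrer or "(direct)", "target": target}

def transition_counts(events):
    """Count (referrer pattern, target pattern) transitions for reporting."""
    return Counter((uri_pattern(e["referrer"]), uri_pattern(e["target"]))
                   for e in events)
```

With distinct URIs such as `/users/suggestion/14152`, "clicked a friend" and "clicked a suggestion" land in separate buckets without any extra event types.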

Add timestamps to that and you have a very rough estimate of how long users stayed on each page, although users tend to lose focus, switch tabs/applications and come back, etc. Google Analytics, for one, does this well.
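A sketch of that rough estimate: time on a page is the gap until the user's next logged hit. The last page of a visit has no next hit, so its time is unknown, and very long gaps are treated as the user having wandered off. The 30-minute idle cap is an assumption to tune per site:

```python
def dwell_times(hits, idle_cap=1800):
    """Estimate seconds spent on each page from one user's hits.

    hits: list of (timestamp_seconds, uri) tuples sorted by time.
    Returns (uri, seconds_or_None) per hit. None means unknown: either the
    last page of the visit, or a gap longer than idle_cap (tab switch, etc.).
    """
    result = []
    for (t, uri), nxt in zip(hits, hits[1:] + [None]):
        if nxt is None:
            result.append((uri, None))
        else:
            gap = nxt[0] - t
            result.append((uri, gap if gap <= idle_cap else None))
    return result
```

Returning `None` rather than a guess keeps unreliable values out of averages computed later.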

For a heatmap summary of where users click most on your site, I like the free (GPL) Clickheat.
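The core of a click heatmap is just binning click coordinates into a grid and counting. This sketch assumes the browser already reports page-relative `(x, y)` pixel coordinates for each click (for example from a JavaScript click handler); it is not Clickheat's implementation, and the 50-pixel cell size is arbitrary:

```python
from collections import Counter

def click_grid(clicks, cell=50):
    """Count clicks per cell x cell pixel square; keys are (column, row)."""
    return Counter((x // cell, y // cell) for x, y in clicks)

def hottest(grid, n=3):
    """The n most-clicked cells as ((column, row), count), busiest first."""
    return grid.most_common(n)
```

Real tools also normalise coordinates across screen widths and layouts before binning, which this sketch skips.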

泪意 2024-09-18 06:02:34

Check out Splunk

放血 2024-09-18 06:02:34

On the frontend where you're doing the logging from, here is some sample PHP code that you might find useful:

http://www.alphadevx.com/a/85-Logging-Messages-to-Scribe-from-PHP

In terms of architecture, you have a lot of flexibility with Scribe. I would recommend running a local Scribe instance on each application node and having your application log locally to localhost. These local Scribe instances can in turn be configured to forward to a central Scribe server when it is not too busy; otherwise they keep queuing messages locally. You then consume your logs on the central server, where they are aggregated by category.
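A toy model of that store-and-forward pattern, to show why the application never blocks on the central server. This is not Scribe's API: `send` is a hypothetical callable standing in for the network hop, returning `True` on success. In Scribe itself this behaviour comes from its buffer store (a network primary store with a local file secondary store):

```python
from collections import deque

class LocalForwarder:
    """Accept log messages immediately; ship them to a central server when it accepts them."""

    def __init__(self, send):
        self.send = send      # hypothetical: send(category, message) -> bool
        self.queue = deque()  # messages not yet accepted by the central server

    def log(self, category, message):
        # The application only talks to this local agent, so logging is cheap
        # and never waits on the central server.
        self.queue.append((category, message))

    def flush(self):
        """Forward queued messages in order; stop at the first failure. Returns count sent."""
        sent = 0
        while self.queue:
            category, message = self.queue[0]
            if not self.send(category, message):
                break  # central server busy or down: keep the rest queued
            self.queue.popleft()
            sent += 1
        return sent
```

Keeping the category with each message is what lets the central server aggregate logs by category.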

I'm a big fan of Scribe. I think it's well designed insofar as it has a very small memory and processor footprint and is quite easy to configure (although murder to install due to the dependencies!). It just lacks documentation.
