为什么 Google Analytics 等 Web Analytics 使用维度和指标而不是 SQL 语句?
正当我再次熟悉SQL语句时,当我从Google Analytics中提取数据时,我发现它们没有使用SQL,而是使用了Dimensions和Metrics以及它们的组合。
为什么会这样呢?我认为它没有 SQL 接口(或普通的 Web 服务器日志下载)?如果是这样,SQL 语句如何转换为维度、指标(以及段和过滤器)?
看来指标往往是“聚合”,例如计数()或平均值(),而维度往往是记录的值本身(例如浏览器== IE或国家==澳大利亚) ,与 group by
值相同。过滤器就像条件语句,那么分段呢?
似乎如果我们指定维度,那么它会自动执行group by
并显示该字段。它通常会进行 count() 或 sum() 操作。如果我们想要 average(*)
呢?如果我们希望它显示但不希望它执行group by
怎么办?
用于实验的示例网站位于 http://code.google.com/apis /analytics/docs/gdata/gdataExplorer.html
Just when I get quite familiar with SQL statements once again, when pulling data from Google Analytics, I found that they don't use SQL, but rather, use Dimensions and Metrics and the combinations of them.
Why is a reason for that? I think it doesn't have a SQL interface (or a plain web server log download)? If so, how do SQL statements translate to Dimension, Metrics (and Segment and Filters)?
It seems that Metrics tend to be the "aggregates" such as count() or average(), and Dimension tends to be the logged values themselves (such as Browser == IE or Country == Australia), which is the same as the group by
values. Filters is like conditionals, and what about Segment?
It seems that if we specify Dimensions, then it automatically does a group by
and display that field as well. It does count() or sum() usually. What if we want average(*)
instead? And what if we want it to show but don't want it to do a group by
?
example website to experiment is at http://code.google.com/apis/analytics/docs/gdata/gdataExplorer.html
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(5)
使用术语“维度”和“指标”表明 Google 正在使用 OLAP 数据库而不是关系数据库...SQL 用于关系数据库:OLAP 使用 MDX 或专有查询语言(如果是 Oracle)。
来自 http://en.wikipedia.org/wiki/OLAP
use of the terms "Dimensions" and "Metrics" suggests that Google are using an OLAP database rather than a relational database.... SQL is used for relational databases: OLAP uses MDX or proprietary query languages (if Oracle).
From http://en.wikipedia.org/wiki/OLAP
它可能是使用他们自己的技术(例如 Big Table 和 Map-Reduce)在内部开发的。映射和聚合是 Map-Reduce 类型算法的优势,因此数据看起来像这样跨不同维度进行聚合是有道理的。
如果您想了解更多关于它们的信息,我建议您阅读这些维基百科文章:
It was probably developed internally using their own technologies like Big Table and Map-Reduce. Mapping and aggregating are the strengths of Map-Reduce type algorithms so it makes sense that the data is going to appear to be aggregated across various dimensions like that.
If you want to know more about them I'd suggest these Wikipedia articles:
我的猜测是,如果您问这样的问题,您可能已经远远超出了查看一些开箱即用的报告(例如简单的页面浏览量)的范围。如果这就是您所做的一切,那么您就大大错过了网络分析的意义和力量。一般来说,网络分析(不仅仅是 GA)是关于随着时间的推移观察数据趋势。数据本身是通过遵循某些预先定义的和用户定义的规则和行为来获取的。
报告的大部分数据无法轻松地从直接数据库查询中获取,因为这些数据基于“随时间变化的 xyz”等摘要和聚合数据。例如,维度和指标的“范围”概念,其中变量和/或值将报告有关单个页面视图/事件的数据,或者在访问(会话)过程中甚至在用户定义的时间内的数据(例如“使其持续一个月”或“使其持续到某个事件发生”,例如弹出特定变量或变量类型)。
由于大多数报告涉及更高级别的数据检索概念,因此数据库被抽象出来,并放置一个“框架”(报告界面)来帮助您构建显示趋势数据的报告。即使您是数据库专家,除了最基本的数据(例如页面浏览量)之外,尝试手动提取几乎所有数据都会花费太多时间和精力。而且这样的基本数据不太具有可操作性。
以活动跟踪为例。这一切都从一个 var=value 开始。当用户单击链接并转到 url 中包含 var=value 的页面时,跟踪代码会获取该值并开始归因,不仅涉及该页面的数据(url、时间、浏览器类型、列表等等)等等)以及从自定义编码收集的所有其他数据。然后,您可以对其应用其他设置,例如附加每次点击费用或某些加权衡量标准,将成功归因于目标或事件等...基于其他规则(首次点击与最终点击归因等)。 ..)。正在发挥作用的东西和考虑的东西的清单一直在不断地增加。继续尝试自己制作这些数据库查询字符串。现在清洗、漂洗并重复,因为这只是一个活动代码。我的客户拥有数千个活动代码,并且每天都会添加更多代码。哦,最重要的是,根据您希望实际报告显示数据的方式调整或创建全新的查询。按 xyz 交叉引用和分解。根据该数据查看渠道和场景。这只是针对竞选活动,只是众多事情中的一件事。
因此,长话短说,将报告界面视为数据库框架,通过可以调整的预定义查询,使人们的报告工作变得更加容易,特别是因为大多数人都不是数据库专家。
My guess is if you are asking a question like this, you probably having gone much past looking at some of the out-of-the-box reports such as simple page views. If that's all you are doing then you're vastly missing the point and power of Web Analytics. Web analytics in general (not just GA) is about looking at trends in data, over time. And the data itself is acquired by following certain rules and behaviors, both pre-defined and user-defined.
Much of the data for reports cannot be easily grabbed from a direct database query, because the data is based on abstracts such as "xyz over time" and aggregated data. For instance, the concept of "scope" for dimensions and metrics, where a variable and/or value will report data about single page view / events, or over the course of a visit (session) or even over a user defined amount of time (like "make this last a month" or "make this last until some event occurs," like a specific variable or variable type being popped).
Because most of reporting involves higher level concepts of data retrieval, the database is abstracted away, and a "framework" is put in place (the report interface) to help you build reports showing the trended data. Even if you are a database expert, it would take way too much time and effort to try and extract the data manually for virtually everything except the most basic data like page views. And basic data like that is not very actionable.
Look at campaign tracking as an example. It all starts with a single var=value. When a user clicks on a link and goes to a page with that var=value in the url, the tracking code grabs that value and starts attributing not only the data about the page (the url, time, type of browser, list goes on and on) but also all the other data collected from custom coding. Then there are other settings you can apply to it, like attaching a cost-per-click or some weighted measure, attributing success towards a goal or event, etc...based on other rules (first vs. last click attribution, etc...). The list of stuff coming into play and what is considered goes on and on and on. Go ahead and try to make those database query strings yourself. Now wash, rinse and repeat because that was just one campaign code. I've had clients with thousands of campaign codes, with many more being added every day. Oh, and also on top of that, tweaking or making altogether new queries based on how you want the actual report to show the data. Cross-referencing and breaking down by xyz. Looking at funnels and scenarios based on that data. And that's just for campaigns, one thing out of many things.
So to make a long story short, think of a report interface as a framework for databases, with predefined queries you can tweak, to make people's reporting efforts significantly easier, especially since most people aren't database experts.
我想答案在于这样一个事实:在 API 可用之前,分析数据的唯一方法是通过 Google Analytics 界面。他们在那里广泛使用“尺寸”和“公制”。因为非技术人员经常使用它,所以他们永远不会引入复杂的 SQL 结构;只是更容易有下拉菜单。
我不完全确定 Google Analytics 数据的存储方式是否适合 SQL(即表中的列和行)。我读到他们开发了自己的内部存储数据的方式。
I would imagine the answer lies in the fact that before the API was available the only way you could analyze data was through the Google Analytics interface. And it is there they extensively use "dimension" and "metric". Because non-technical people frequented it, they would have never introduced complex SQL constructs; just easier having dropdowns.
I am not totally sure the way Google Analytics data is stored is SQL friendly (i.e. columns and rows from tables). I have read they have developed their own internal way of storing this data.
我们问自己类似的问题。看起来许多 Web 分析 API 更像是事后的想法,通常是直接映射到相应产品的 UI 功能。通过 Infunl(免责声明:我是联合创始人),我们正在使用语法类似于 SQL 的灵活查询语言构建 Web 分析 API但对其背后的聚合和映射减少执行框架进行了高度优化。此外,它还提供了许多专为网络分析设计的内置功能,例如转化漏斗步骤、群组分析、对比测试支持以及灵活的内容分组和细分。
We asked ourselves similar question. It looks like many web analytics APIs are more of an afterthought and often are direct mappings to UI features of corresponding product. With Infunl (Disclaimer: I am a co-founder) we are building web analytics API with flexible query language that syntactically similar to SQL yet highly optimized for aggregation and map-reduce execution framework behind it. Furthermore it offers many built-in functionality specifically designed for web analytics, like conversion funnel steps, cohort analysis, support for split testing and flexible content grouping and segmentation.