Google Analytics API - 指标的选择会影响返回的维度值吗?

发布于 2024-10-17 17:03:51 字数 876 浏览 4 评论 0原文

早上好。我在 Google Analytics API 中看到过这种行为,作为一名 SQL 人员,我觉得这很奇怪。我想获取 adContent 的所有值的列表,因此我查询 ga:adContent 并(因为我还必须选择一个指标,没有明确的原因)ga:organicSearches。它位于同一组(营销活动)中,因此也许它在服务器上的表现会更好。

我得到一行:adContent 是“(未设置)”,organicSearches 是 516,674。嗯,我猜 adContent 没有被使用。但营销部门发誓确实如此,并制作了一些令人信服的屏幕截图。

后来,我随意将指标更改为 ga:transactions。在我醒来的宇宙中,除了该列中返回的实际值之外,这绝对不会对任何事情产生影响。相反,我得到了无数行,其中包含 ga:adContent 的合理值。 ga:transactions 的值有时为零,因此 GA 不会过滤“指标 > 0”。

我的查询中没有过滤器。我没有更改这两个变体之间的日期范围。谁能告诉我发生了什么事吗?我希望上面的查询能够转换成这样的结果,它应该返回完全相同的行数:

SELECT adContent, SUM(organicSearches)
FROM Campaign
WHERE Date BETWEEN X AND Y
GROUP BY adContent

SELECT adContent, SUM(transactions)
FROM Campaign INNER JOIN ECommerce ON <something>
WHERE Date BETWEEN X AND Y
GROUP BY adContent

我意识到 GA 可能没有在后端使用普通的 RDMS,但在任何数据库中 1 + 1 仍然等于 2 !

Good morning. I've seen this behavior in the Google Analytics API, which as a SQL guy I find bizarre. I'd like to get a list of all values for adContent, so I query ga:adContent and (because I must also select a metric, for no well-defined reason) ga:organicSearches. It's in the same group (Campaign), so maybe it'll perform better back on the server.

I get one row: adContent is "(not set)", organicSearches is 516,674. Huh, I guess adContent isn't being used. But the marketing department swears that it is, and produce some convincing screen shots.

Later on, I arbitrarily change the metric to ga:transactions. In the universe I woke up in, this should have absolutely no impact on anything, except the actual value returned in that column. Instead, I get zillions of rows, with plausible values for ga:adContent. The value for ga:transactions is sometimes zero, so it's not the case that GA was filtering for "metric > 0".

There are no filters in my query. I did not change the date range between these two variants. Can anyone tell me what's going on? I expect the above queries to translate to something like this, which should return exactly the same number of rows:

SELECT adContent, SUM(organicSearches)
FROM Campaign
WHERE Date BETWEEN X AND Y
GROUP BY adContent

SELECT adContent, SUM(transactions)
FROM Campaign INNER JOIN ECommerce ON <something>
WHERE Date BETWEEN X AND Y
GROUP BY adContent

I realize that GA probably isn't using an ordinary RDMS on the back end, but surely 1 + 1 still equals 2 in any database!

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

橘虞初梦 2024-10-24 17:03:51

根据定义,ga:organicSearches 几乎永远不会与 ga:adContent 匹配(排除极端情况)。 ga:adContent 用于广告内容,其中 ga:organicSearches 用于会话内的自然搜索结果访问(例如,如果您在同一会话中多次使用 Google)尝试在网站上查找特定内容)。除了尝试测量特定现象之外,请勿将其用于任何其他用途。

尽量不要在这里使用 SQL 思维框架; Google Analytics(分析)不在后端使用 SQL,因此您对传统关系的概念不适用。 IIRC,他们使用了一些东西,其中有一个 BigTable 变体,它是一个 NoSQL - 类型数据库。

来自 20 年关于 BigTable 的 Google 论文06

我们简要描述其中两个表
由谷歌分析使用。原始的
点击表(̃200 TB)维持一行
对于每个最终用户会话。行
name 是一个包含以下内容的元组
网站名称和时间
会话已创建。这个架构
确保访问的会话
同一网站是连续的,并且
它们按时间顺序排序。这
表压缩至其大小的 14%
原始尺寸。汇总表( ̃20
TB)包含各种预定义的
每个网站的摘要。这张表
从原始点击表生成
通过定期调度的MapReduce
工作机会。每个 MapReduce 作业都会提取
来自原始点击的最近会话数据
桌子。整个系统的吞吐量
受到GFS吞吐量的限制。
该表压缩至其大小的 29%
原始尺寸。

如果您想要所有维度列表的指标的最小公分母,请使用 ga:pageviews

By definition ga:organicSearches will almost never have any matches for ga:adContent (edge cases aside). ga:adContent is for the content of an advertisement, where ga:organicSearches is for organic search result visits within a session (like if you use Google multiple times within the same session to try to find something specific on a site). Don't use it for anything besides trying to measure that particular phenomenon.

Try not to use an SQL mindframe here; Google Analytics doesn't use SQL on the backend, so the notions you have of traditional relationships aren't applicable. IIRC, they use a few things, amongst them a BigTable variant, which is a NoSQL-type database.

From a Google Paper on BigTable from 2006:

We briefly describe two of the tables
used by Google Analytics. The raw
click table ( ̃200 TB) maintains a row
for each end-user session. The row
name is a tuple containing the
website’s name and the time at which
the session was created. This schema
ensures that sessions that visit the
same web site are contiguous, and that
they are sorted chronologically. This
table compresses to 14% of its
original size. The summary table ( ̃20
TB) contains various predefined
summaries for each website. This table
is generated from the raw click table
by periodically scheduled MapReduce
jobs. Each MapReduce job extracts
recent session data from the raw click
table. The overall system’s throughput
is limited by the throughput of GFS.
This table compresses to 29% of its
original size.

If you want the lowest common denominator for a metric for a list of all dimensions, use ga:pageviews.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文