Discrepancy in pageviews between BigQuery and Google Analytics 4 - why?
I have a large table of Google Analytics 4 (GA4) events in BigQuery for a bunch of websites I look after. The table has the following schema:
field name | type |
---|---|
event_date | date |
event_timestamp | integer |
event_name | string |
event_key | string |
event_string_value | string |
event_int_value | integer |
event_float_value | float |
event_double_value | float |
user_pseudo_id | string |
user_first_touch_timestamp | integer |
device_category | string |
device_model_name | string |
device_host_name | string |
device_web_hostname | string |
geo_country | string |
geo_city | string |
traffic_source_name | string |
I query the table to get the total number of pageviews for a specific site using the following query:
    WITH date_range AS (
      SELECT
        '20220601' AS start_date,
        '20220630' AS end_date
    )
    SELECT
      COUNT(DISTINCT CASE WHEN event_name = 'page_view' THEN CONCAT(user_pseudo_id, CAST(event_timestamp AS STRING)) END) AS pageviews
    FROM
      `project_name.datset_name.table_name`,
      date_range
    WHERE
      event_date BETWEEN PARSE_DATE('%Y%m%d', date_range.start_date) AND PARSE_DATE('%Y%m%d', date_range.end_date)
      AND device_web_hostname IN ("www.website_name.com")
What is a mystery to me is that when I do this for some sites, the page_view figure is out by several hundred pageviews compared with the GA4 interface. The BigQuery figure is higher. What is interesting is that:
- If I try other events, such as sessions, then there are no issues
- As stated, it is only for some sites and not all
I know enough to know:
- These numbers are never going to agree, but they shouldn't be out by several hundred either
- GA4 has the unprocessed data, so the way I am querying the data is different to how it is being processed in the GA4 interface
I have tried:
- Looking at the GA4 documentation to see how pageviews are used/processed; I can't see anything that enlightens me
- Debugging each site to make sure tags are firing correctly; they are
I've hit a bit of a wall with this and I'd be grateful if anyone has any insight to point me in another possible direction. Thanks in advance!
Comments (1)
The issue lies in the following part of the code:
    concat(user_pseudo_id, cast(event_timestamp as string))
You are counting distinct values of the concatenation of user_pseudo_id and event_timestamp, which is not unique. You also need to include session_id in the key to get a unique hit.
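To make the suggestion above concrete, here is a minimal Python sketch on made-up rows, not the asker's data or query. It assumes the flattened table stores the session id as a row with event_key = 'ga_session_id' and the id in event_int_value; that layout is an assumption, since the schema in the question does not show a session_id column. The sample rows and the count_pageviews helper are hypothetical.

```python
from collections import namedtuple

# One row per event parameter, mirroring the flattened schema in the question.
Row = namedtuple(
    "Row",
    "user_pseudo_id event_timestamp event_name event_key event_int_value",
)

# Hypothetical sample data: user u1 has two page_view hits that share a
# timestamp but carry different session ids (assumed to be possible).
rows = [
    Row("u1", 1000, "page_view", "ga_session_id", 11),
    Row("u1", 1000, "page_view", "page_title", None),
    Row("u1", 1000, "page_view", "ga_session_id", 12),
    Row("u2", 2000, "page_view", "ga_session_id", 21),
    Row("u2", 2000, "session_start", "ga_session_id", 21),
]


def count_pageviews(rows, with_session):
    """Count distinct page_view keys, optionally including the session id."""
    keys = set()
    for r in rows:
        if r.event_name != "page_view":
            continue
        if with_session:
            # Only the ga_session_id parameter row carries the session id.
            if r.event_key != "ga_session_id":
                continue
            keys.add((r.user_pseudo_id, r.event_timestamp, r.event_int_value))
        else:
            # Same key as the original query: user plus timestamp.
            keys.add((r.user_pseudo_id, r.event_timestamp))
    return len(keys)


print(count_pageviews(rows, with_session=False))
print(count_pageviews(rows, with_session=True))
```

On these toy rows the two keys give 2 and 3 respectively. Whether the extra key moves the BigQuery total closer to the GA4 figure would need checking on the real table.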