Computing the GoogleShare of N terms
I need guidance on how to compute the GoogleShare of several terms.
For example, take the following base terms:
- "Tom Cruise" = 12,000,000 pages
- "John Travolta" = 4,900,000 pages
Now if we add a second term:
- "Tom Cruise" + "Scientology" = 784,000 pages
- "John Travolta" + "Scientology" = 331,000 pages
So the GoogleShare for Tom Cruise and Scientology is (784000 * 100 / 12000000) = 6.53%, while the GoogleShare for John Travolta and Scientology is (331000 * 100 / 4900000) = 6.76%.
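The two-term arithmetic above can be sketched as a small helper. This is only an illustration: the function name `google_share` and its parameter names are made up here, and the hit counts are the ones quoted in the question.

```python
def google_share(narrowed_hits: int, base_hits: int) -> float:
    """Percentage of the base query's results that also match the narrowing term(s)."""
    if base_hits <= 0:
        raise ValueError("base_hits must be positive")
    return 100.0 * narrowed_hits / base_hits


# Hit counts quoted in the question.
cruise = google_share(784_000, 12_000_000)    # "Tom Cruise" narrowed by "Scientology"
travolta = google_share(331_000, 4_900_000)   # "John Travolta" narrowed by "Scientology"

print(f"{cruise:.2f}% {travolta:.2f}%")  # prints "6.53% 6.76%"
```

Note that the denominator is always the count for the base query alone, which is what makes the measure asymmetric.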
Now if we add a third term to our query:
- "Tom Cruise" + "Scientology" + "StackOverflow" = 100 pages
- "John Travolta" + "Scientology" + "StackOverflow" = 181 pages
How should I compute the GoogleShare percentage now?
// Tom Cruise
100 * 100 / 784000 = 0.01% // StackOverflow / Scientology
// or...
100 * 100 / 12000000 = 0.00083% // StackOverflow / Tom Cruise
// John Travolta
181 * 100 / 331000 = 0.05% // StackOverflow / Scientology
// or...
181 * 100 / 4900000 = 0.00369% // StackOverflow / John Travolta
John Travolta seems to be 5 times more Scientologist than Tom Cruise inside the SO community.
What is the correct way to compute the GoogleShare of N terms?
Comments (3)
It depends. First, let's lay a little groundwork on what GoogleShare is.
Consider your searches "Tom Cruise" + "Scientology" and "John Travolta" + "Scientology".
What you're computing when you compute the GoogleShare here is the percentage of search results for "Scientology" that also contain "Tom Cruise" versus the percentage of search results for "Scientology" that also contain "John Travolta". So the way to compute this is to divide the number of results for "Tom Cruise" + "Scientology" by the number of results for "Scientology", and compare that to the number of results for "John Travolta" + "Scientology" divided by the number of results for "Scientology".
Therefore, the "Tom Cruise" GoogleShare of "Scientology" is 17.44%. The "John Travolta" GoogleShare of "Scientology" is 7.18%. We say that "Tom Cruise" is more connected to "Scientology" than "John Travolta" is connected to "Scientology". Thus I note that your initial calculations of the GoogleShare of "Tom Cruise" versus the GoogleShare of "John Travolta" in "Scientology" were incorrect. The key is figuring out what your base search is (here it is "Scientology") and which terms you want to measure the share of in that space (here, "Tom Cruise" versus "John Travolta").
Now consider the searches "Tom Cruise" + "Scientology" + "StackOverflow" and "John Travolta" + "Scientology" + "StackOverflow".
There are two ways to view this. Are you trying to measure the share of "Tom Cruise" and "John Travolta" in the space of ("Scientology" + keyword), or are you trying to measure the share of "Tom Cruise" + keyword in the space of "Scientology"? These are different.
If you want the share of "Tom Cruise" and "John Travolta" in the space of ("Scientology" + "StackOverflow"), you'd divide the number of results for each three-term search by the number of results for "Scientology" + "StackOverflow".
If you want the share of "Tom Cruise" + "StackOverflow" and "John Travolta" + "StackOverflow" in the space of "Scientology", you'd divide the number of results for each three-term search by the number of results for "Scientology".
You see, it all depends on what your base search is and which terms you are trying to find the share of within that base. In the first version our base search is "Scientology" + "StackOverflow" and we are seeing what share "Tom Cruise" and "John Travolta" have of this space. In the second version our base search is "Scientology" and we are seeing what share "Tom Cruise" + "StackOverflow" and "John Travolta" + "StackOverflow" have in this space.
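A minimal sketch of the two views described in this answer, for illustration only. The three-term counts come from the question; the counts for "Scientology" alone and for "Scientology" + "StackOverflow" do not appear anywhere in the thread, so `HITS_SCIENTOLOGY` and `HITS_SCIENTOLOGY_SO` below are hypothetical placeholders, and the helper names are invented for this example.

```python
def share(matching_hits: int, base_hits: int) -> float:
    """Percentage of the base search's results that also match the extra terms."""
    if base_hits <= 0:
        raise ValueError("base search must have at least one result")
    return 100.0 * matching_hits / base_hits


# From the question: results matching all three terms.
HITS_ALL_THREE = {"Tom Cruise": 100, "John Travolta": 181}

# HYPOTHETICAL base counts, not taken from the thread; substitute real ones.
HITS_SCIENTOLOGY = 4_500_000   # "Scientology"
HITS_SCIENTOLOGY_SO = 2_000    # "Scientology" + "StackOverflow"


def both_views(name: str) -> tuple[float, float]:
    """Return (share with base Scientology+StackOverflow, share with base Scientology)."""
    hits = HITS_ALL_THREE[name]
    in_sci_so_space = share(hits, HITS_SCIENTOLOGY_SO)  # first version
    in_sci_space = share(hits, HITS_SCIENTOLOGY)        # second version
    return in_sci_so_space, in_sci_space


for name in HITS_ALL_THREE:
    v1, v2 = both_views(name)
    print(f"{name}: {v1:.2f}% of Scientology+StackOverflow, {v2:.5f}% of Scientology")
```

The numerator is the same in both views; only the base search in the denominator changes, which is why the two percentages answer different questions.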
I don't see the difference between N terms and, say, 2 terms. Whenever you have more than 1 term, you are implicitly taking a GoogleShare with respect to some initial search term. For any N >= 2, there are multiple GoogleShares, one with respect to each subset of the narrow query.
You state that the "GoogleShare for Tom Cruise and Scientology" is 6.53%, but this is somewhat misleading since the term "and" tends to imply some kind of symmetry, where you could switch "Tom Cruise" and "Scientology" without changing the meaning. This is in fact not the case, since your initial term was "Tom Cruise" alone.
Perhaps a better description of the score you calculated is to say "Tom Cruise has a 'Scientology' GoogleShare of 6.53%." This removes all ambiguity, since now we know that "Tom Cruise" comes along with the term "Scientology" 6.53% of the time instead of the reverse (i.e. 6.53% of Scientology results mention Tom Cruise).
When you think of it this way, the corresponding generalization to N terms falls right out. Just put whatever initial terms you like in front of "has/have" and whatever additional narrowing terms you like after. With the numbers you gave, you could say that "John Travolta's Scientology references have a Stack Overflow GoogleShare of 0.05%" or that "John Travolta has a Scientology + Stack Overflow GoogleShare of 0.00369%". Pick whichever way is more informative in context.
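The "one GoogleShare per subset" idea can be sketched as follows. This is an illustrative sketch, not an established algorithm: the function `all_shares` and the `counts` layout are assumptions made for this example, and only the sub-queries whose hit counts appear in the question are included.

```python
from itertools import combinations

# Hit counts quoted in the question, keyed by the set of terms in the query.
counts = {
    frozenset({"John Travolta"}): 4_900_000,
    frozenset({"John Travolta", "Scientology"}): 331_000,
    frozenset({"John Travolta", "Scientology", "StackOverflow"}): 181,
}


def all_shares(terms, counts):
    """For every proper, non-empty subset of `terms` with a known count,
    return the percentage of that base query's results matching the full query."""
    full = frozenset(terms)
    full_hits = counts[full]
    shares = {}
    for size in range(1, len(full)):
        for subset in combinations(sorted(full), size):
            base = frozenset(subset)
            if base in counts:  # skip bases we have no count for
                shares[base] = 100.0 * full_hits / counts[base]
    return shares


shares = all_shares(["John Travolta", "Scientology", "StackOverflow"], counts)
for base, pct in shares.items():
    print(f"base {sorted(base)}: {pct:.5f}%")
```

With these counts, the base "John Travolta" gives roughly 0.00369% and the base "John Travolta" + "Scientology" gives roughly 0.05%, matching the two phrasings in the answer.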
It depends on what you're after. The first figure measures how often Stack Overflow is mentioned as a proportion of all results showing both Tom Cruise and Scientology; the second measures how often Stack Overflow and Scientology are both mentioned as a proportion of all results showing Tom Cruise.