视图与物质化与物质的视图

发布于 2025-02-09 18:45:08 字数 830 浏览 2 评论 0 原文

设想: 5小时后,Kusto表中的数据将更新。 任务:从.NET API调用查询 在查询中,创建一个子查询,并使用该子查询在更大的表

let table1=materialize(
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City);
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and other things

let table1=view(){
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City};
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and CustomerPurchase

客户购买和客户数据上进行每5小时后进行更新(添加新行)。更优化的是:创建视图或使用方法。 我仔细阅读了文档,但两者之间的不同之处。

Also since I am implementing an API, is it possible to use

Scenario:
Data in kusto table is updated after 5 hours.
Task: Call query from .net API
In query , create a subquery and use that subquery to perform join on a bigger table

let table1=materialize(
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City);
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and other things

or

let table1=view(){
Customer|where CustomerId=="cust-reg-aabb-cc232"|distinct CustomerId,City};
CustomerPurchase
|where CustomerId=="cust-reg-aabb-cc232"
//perform join with table1 and CustomerPurchase

CustomerPurchase and Customer data is being updated after every 5 hours(new rows being added). What is more optimized : create a view or use the materialize method.
I went through the documentation but could not understand the different between both.

Also since I am implementing an API, is it possible to use materialized view instead of table1?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(1

心房的律动 2025-02-16 18:45:08

文档很清楚:

允许在查询执行期间缓存子查询结果
其他子查询可以引用部分结果的一种方式。

视图是基于Kusto查询的结果集的虚拟表
语言查询。就像一个真实的表一样,一个视图包含行,
列。与真实的表不同,视图没有自己的数据
存储。

视图是通过用户定义的功能定义的,
要求:

  • 该函数的结果必须是表格(例如,它不能是标量值)。
  • 该功能必须不需要参数。

关键字

默认情况下,支持通配符语法的操作员指定表
即使视图的名称与
通配符。这种类型的操作员的一个例子是联合运营商。
在这种情况下,使用视图关键字也包含视图。

实体视图在源表上暴露了聚合查询,或

顾名思义,汇总结果已实现,含义 - 存储。
结果正在不断更新,而数据不断摄入。


在您的情况下,似乎没有理由使用 belitialize(),也没有 view 关键字。

诸如以下视图(在客户ID上使用过滤器)诸如使用时可能会提高性能,而不是 table1

.create materialized-view Customer_mv on table Customer
{
    Customer
    | summarize by CustomerId, City
}

更新

以下是几个示例,这些示例证明了 apertialize()

1。x337

let t = print x = rand(1000);
union t, t, t
<
998
242
a href =

“ https://dataexplorer.azure.com/clusters/clusters/help/databases/samples/samples? query=H4sIAAAAAAAAA8tJLVEoUbBVKCjKzCtRqACyihLzUjQMDQwMNK15uUrzMvPzFEp0wAgACmPsbiwAAAA=" rel="nofollow noreferrer">Fiddle

vs.

let t = materialize(print x = rand(1000));
union t, t, t
x
722
722
722

小提琴

2。i

let t1 = range i from 1 to 100000000 step 1 | summarize count() by i =  i%2;
let t2 = t1;
t1
| join kind=inner t2 on i
count_ count_1 i1
1 50000000 1 50000000
0 50000000 0 50000000 0 50000000

执行时间:4.4375515

Fiddle

vs.

let t1 = materialize(range i from 1 to 100000000 step 1 | summarize count() by i =  i%2);
let t2 = t1;
t1
| join kind=inner t2 on i
i count_ i1 count_1
1 50000000 1 50000000
0 50000000 0 50000000

Execution Time: 2.5313002


Here is an example that demonstrates the benefits of a view

.create-or-alter function StormEvents_top_5_deaths_v ()
{
    cluster("help").database("Samples").StormEvents
    |   project  Duration       = EndTime - StartTime
                ,Distance       = round(geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat))
                ,TotalDeath     = DeathsDirect + DeathsIndirect
                ,TotalInjuries  = InjuriesIndirect + InjuriesIndirect
                ,TotalDamage    = DamageCrops + DamageProperty            
    |   where    TotalDeath > 0
    |   top 5 by TotalDeath
}

StormEvents_top_5_deaths_v
Duration Distance TotalDeath TotalInjuries TotalDamage
20.07:00:00 0 14 0 0
00:18:00 20609 13 0 46000000
01:02:00 26080 11 0 250000000
2.20:00:00 0 10 0 0
18.09:00: 00 0 10 0 0

Fiddle

The documentation is quite clear:

materialize

Allows caching a subquery result during the time of query execution in
a way that other subqueries can reference the partial result.

views

Views are virtual tables based on the result-set of a Kusto Query
Language query. Just like a real table, a view contains rows and
columns. Unlike a real table, a view doesn't hold its own data
storage.

Views are defined through user-defined functions with the following
requirements:

  • The result of the function must be tabular (for example, it cannot be a scalar value).
  • The function must take no arguments.

The view keyword

By default, operators that support a wildcard syntax to specify table
names will not reference views, even if the view's name matches the
wildcard. An example of this type of operator is the union operator.
In this case, use the view keyword to have the view included as well.

Materialized views

Materialized views expose an aggregation query over a source table, or
over another materialized view.

As the name suggests, the aggregation results are materialized, meaning - stored.
The results are being updated constantly, while the data keeps being ingested.


In your case, there seems to be no reason to use materialize(), nor the view keyword.

A materialized view such as the following (with filter on CustomerId) might improve performance when used instead of table1.

.create materialized-view Customer_mv on table Customer
{
    Customer
    | summarize by CustomerId, City
}

Update

Here are couple of examples that demonstrate the benefits of materialize()

1.

let t = print x = rand(1000);
union t, t, t
x
337
998
242

Fiddle

vs.

let t = materialize(print x = rand(1000));
union t, t, t
x
722
722
722

Fiddle

2.

let t1 = range i from 1 to 100000000 step 1 | summarize count() by i =  i%2;
let t2 = t1;
t1
| join kind=inner t2 on i
i count_ i1 count_1
1 50000000 1 50000000
0 50000000 0 50000000

Execution Time: 4.4375515

Fiddle

vs.

let t1 = materialize(range i from 1 to 100000000 step 1 | summarize count() by i =  i%2);
let t2 = t1;
t1
| join kind=inner t2 on i
i count_ i1 count_1
1 50000000 1 50000000
0 50000000 0 50000000

Execution Time: 2.5313002


Here is an example that demonstrates the benefits of a view

.create-or-alter function StormEvents_top_5_deaths_v ()
{
    cluster("help").database("Samples").StormEvents
    |   project  Duration       = EndTime - StartTime
                ,Distance       = round(geo_distance_2points(BeginLon, BeginLat, EndLon, EndLat))
                ,TotalDeath     = DeathsDirect + DeathsIndirect
                ,TotalInjuries  = InjuriesIndirect + InjuriesIndirect
                ,TotalDamage    = DamageCrops + DamageProperty            
    |   where    TotalDeath > 0
    |   top 5 by TotalDeath
}

StormEvents_top_5_deaths_v
Duration Distance TotalDeath TotalInjuries TotalDamage
20.07:00:00 0 14 0 0
00:18:00 20609 13 0 46000000
01:02:00 26080 11 0 250000000
2.20:00:00 0 10 0 0
18.09:00:00 0 10 0 0

Fiddle

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文