How to sort values in an array, aggregate them, and efficiently count similar records in SQL
I need to count and aggregate rows by property type from a table.
These are some examples from the "Title" column:
Row | Title |
---|---|
1 | WATLINGTON STATION - 9 HOUSES AND INDUSTRIAL UNITS |
2 | STATION ROAD YATE - 1 BUNGALOW & 2 HOUSES |
3 | THE OLD VICARAGE CARR LANE - HOUSE & STABLE BLOCK |
So, to clean these rows, I used the REPLACE() and REGEXP_*() functions:
SELECT SPLIT(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(REGEXP_REPLACE(REGEXP_SUBSTR(REPLACE("Title",'-2',''), '[^-]*$'), '[0-9]+')),'/', '&'), ' & ','&'),' AND ','&'),' & ','&'),'& ','&'),'.',''),'&') AS "Category",
COUNT("Category") AS "Frequency"
FROM DATA.PROPERTIES
GROUP BY "Category"
ORDER BY "Frequency" DESC;
And I got rows like:
Category | Frequency |
---|---|
"HOUSES" | 12432 |
"FLATS" | 76736 |
"HOUSE" | 23865 |
"APARTMENTS" | 18814 |
"HOUSES", "FLATS" | 3245 |
"FLATS", "HOUSES" | 1342 |
"FLATS", "RETAIL UNIT" | 362 |
"FLATS", "HOUSE" | 461 |
"FLATS", "RETAIL" | 890 |
"FLATS", "RETAIL UNITS" | 287 |
These are some of the results. As you can see, [ "HOUSES", "FLATS" ] and [ "FLATS", "HOUSES" ] should be merged into one row because they contain the same values; the result should be [ "HOUSES", "FLATS" ] with a frequency of 4587.
So I assume that I should somehow re-order every array in ASC or DESC order and then aggregate.
I have a similar question about the three rows [ "FLATS", "RETAIL UNIT" ], [ "FLATS", "RETAIL" ], and [ "FLATS", "RETAIL UNITS" ].
Do you have any ideas on how to resolve this?
Maybe there is a way using ARRAY_AGG or something else. I have no ideas right now, but I would like to solve this.
Comments (2)
Example showing reorder and aggregate -
The idea is to first flatten the array and then reorder its elements while aggregating them back. Once the elements are in a consistent order, a simple GROUP BY will put all arrays that were previously like ['a','b'] and ['b','a'] into one group.
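A minimal sketch of the flatten, reorder, and group steps described above, assuming Snowflake (the dialect the question's SPLIT and ARRAY_AGG calls suggest). The sample rows and the names sample_data, flattened, and sorted_arrays are made up for illustration and do not come from the original answer.

```sql
-- Sketch only: assumes Snowflake; sample rows and CTE names are hypothetical.
WITH sample_data (category) AS (
    SELECT SPLIT(column1, '&')
    FROM VALUES ('HOUSES&FLATS'), ('FLATS&HOUSES'), ('FLATS')
),
flattened AS (
    -- One output row per array element; f.seq identifies the source row.
    SELECT f.seq AS row_id, f.value::STRING AS item
    FROM sample_data d, LATERAL FLATTEN(input => d.category) f
),
sorted_arrays AS (
    -- Rebuild each array with its elements in ascending order.
    SELECT row_id,
           ARRAY_AGG(item) WITHIN GROUP (ORDER BY item) AS sorted_category
    FROM flattened
    GROUP BY row_id
)
-- Arrays that differed only in element order now share one group.
SELECT sorted_category, COUNT(*) AS frequency
FROM sorted_arrays
GROUP BY sorted_category
ORDER BY frequency DESC;
```

With these sample rows, the two two-element arrays should fall into the same group. If grouping directly on an ARRAY value is not accepted in your environment, grouping on ARRAY_TO_STRING(sorted_category, ',') is one possible workaround.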
I really like Panka's answer, but the ORDER BY should not be in the CTE step; it should be inside the ARRAY_AGG, because a) there is dedicated syntax for this, and b) the order of rows coming out of a CTE is not guaranteed, and relying on row order is magic that will burn you.
Also, CTE2 and CTE can be merged into one step. If you alias the FLATTEN, you can see what is happening:
gives:
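A rough sketch of what the merged, single-step version might look like, again assuming Snowflake. This is not the answerer's original code: the names counted, sorted_arrays, c, and f are hypothetical, and the counts are the ones quoted in the question.

```sql
-- Sketch only: assumes Snowflake; names are hypothetical,
-- counts are taken from the question's result table.
WITH counted (category, frequency) AS (
    SELECT SPLIT(column1, '&'), column2
    FROM VALUES ('HOUSES&FLATS', 3245),
                ('FLATS&HOUSES', 1342),
                ('FLATS', 76736)
),
sorted_arrays AS (
    -- Flatten and re-aggregate in one step. The ORDER BY sits inside
    -- ARRAY_AGG, so the result does not depend on incoming row order.
    SELECT
        ARRAY_AGG(f.value::STRING)
            WITHIN GROUP (ORDER BY f.value::STRING) AS sorted_category,
        c.frequency
    FROM counted c, LATERAL FLATTEN(input => c.category) f
    GROUP BY f.seq, c.frequency
)
-- Add up the counts of arrays that are now identical.
SELECT sorted_category, SUM(frequency) AS total_frequency
FROM sorted_arrays
GROUP BY sorted_category
ORDER BY total_frequency DESC;
```

Under these assumptions, the two orderings of HOUSES and FLATS should merge into one row whose total is 3245 + 1342 = 4587, the figure the question asks for. Aliasing the FLATTEN as f is what makes f.seq (the source row) and f.value (the element) available by name.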