SQL Server函数在列中显示词频

发布于 2024-07-21 04:28:22 字数 430 浏览 5 评论 0原文

我有一个表格，列出了一项调查中的自由文本输入，允许参与者输入他们的答案（关于他们想要在婚礼中使用的颜色）

我想编写一个 sql 函数来收集此列中的所有信息， order 计算每个单词的频率，并按此计数对结果集进行排序。

Response
--------
Red and White
green
White and blue
Blue
Dark blue

我希望上面的表格按如下顺序排列，

Response  Frequency
--------  ---------
Blue      3
White     2
And       2
Red       1
Green     1

我可以在函数运行后删除所有像“and”这样的垃圾词。有谁知道有什么好的功能可以产生这种行为吗？

原文

I have a table that lists a freet text input from a survey where enterents were allowed to enter their responses (regarding colours they would like to have in their wedding)

I would like to write a sql function that gathers all the information from this column, and orders counts the frequency of each word, ordering the result set by this count.

Response
--------
Red and White
green
White and blue
Blue
Dark blue

I would like the above table to be ordered as follows

Response  Frequency
--------  ---------
Blue      3
White     2
And       2
Red       1
Green     1

I can strip all the rubbish words like "and" after the function has run. Does anyone know any good functions that produce this behaviour?

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

冷心人i 2024-07-28 04:28:22

好吧，这真是一种享受。首先是一个分隔值的函数...

Alter Function dbo.SeparateValues    

(    
 @data VARCHAR(MAX),    
 @delimiter VARCHAR(10)     
)     
RETURNS     
@tbldata TABLE(col VARCHAR(MAX))    
As    
--Declare @data VARCHAR(MAX) ,@delimiter VARCHAR(10)     
--Declare @tbldata TABLE(col VARCHAR(10))    
--Set @data = 'hello,how,are,you?,234234'    
--Set @delimiter = ','    
--DECLARE @tbl TABLE(col VARCHAR(10))    
Begin    
DECLARE @pos INT    
DECLARE @prevpos INT    
SET @pos = 1     
SET @prevpos = 0    

WHILE @pos > 0     
BEGIN    
SET @pos = CHARINDEX(@delimiter, @data, @prevpos+1)    
if @pos > 0     
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, @pos-@prevpos-1))))    
else    
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, len(@data)-@prevpos))))    
SET @prevpos = @pos     
End    

RETURN       
END

然后我将它应用到我的表...

Select Count(*), sep.Col FROM (
        Select * FROM (
            Select value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(response, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' ')))) FROM Responses
        ) easyValues
        Where value <> '' 
    ) actualValues 
    Cross Apply dbo.SeparateValues(value, ' ') sep
    Group By sep.Col
    Order By Count(*) Desc

好吧，所以我用我的嵌套表进行了OTT，但我已经删除了所有废话字符，分隔了值并保持运行最常用的单词总数。

Okay this works a treat. Firstly a function to separate the values...

Alter Function dbo.SeparateValues    

(    
 @data VARCHAR(MAX),    
 @delimiter VARCHAR(10)     
)     
RETURNS     
@tbldata TABLE(col VARCHAR(MAX))    
As    
--Declare @data VARCHAR(MAX) ,@delimiter VARCHAR(10)     
--Declare @tbldata TABLE(col VARCHAR(10))    
--Set @data = 'hello,how,are,you?,234234'    
--Set @delimiter = ','    
--DECLARE @tbl TABLE(col VARCHAR(10))    
Begin    
DECLARE @pos INT    
DECLARE @prevpos INT    
SET @pos = 1     
SET @prevpos = 0    

WHILE @pos > 0     
BEGIN    
SET @pos = CHARINDEX(@delimiter, @data, @prevpos+1)    
if @pos > 0     
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, @pos-@prevpos-1))))    
else    
INSERT INTO @tbldata(col) VALUES(LTRIM(RTRIM(SUBSTRING(@data, @prevpos+1, len(@data)-@prevpos))))    
SET @prevpos = @pos     
End    

RETURN       
END

then I just apply it to my table...

Select Count(*), sep.Col FROM (
        Select * FROM (
            Select value = Upper(RTrim(LTrim(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(Replace(response, ',', ' '), '.', ' '), '!', ' '), '+', ' '), ':', ' '), '-', ' '), ';', ' '), '(', ' '), ')', ' '), '/', ' '), '&', ''), '?', ' '), '  ', ' '), '  ', ' ')))) FROM Responses
        ) easyValues
        Where value <> '' 
    ) actualValues 
    Cross Apply dbo.SeparateValues(value, ' ') sep
    Group By sep.Col
    Order By Count(*) Desc

Okay, so I went OTT with my nested tables, but I've stripped out all the crap characters, separated the values and kept a running total of the most frequently used words.

回复收藏 0 原文

傾旎 2024-07-28 04:28:22

DECLARE @phrases TABLE (id int, phrase varchar(max))
INSERT @phrases values
(1,'Red and White'  ),
(2,'green'          ),
(3,'White and blue' ),
(4,'Blue'           ),
(5,'Dark blue'      );

SELECT word, COUNT(*) c
FROM @phrases
CROSS APPLY (SELECT CAST('<a>'+REPLACE(phrase,' ','</a><a>')+'</a>' AS xml) xml1 ) t1
CROSS APPLY (SELECT n.value('.','varchar(max)') AS word FROM xml1.nodes('a') x(n) ) t2
GROUP BY word

词频
 ----------- -----------
 和 2
 蓝色3
 暗1
 绿色1
 红1
 白2

DECLARE @phrases TABLE (id int, phrase varchar(max))
INSERT @phrases values
(1,'Red and White'  ),
(2,'green'          ),
(3,'White and blue' ),
(4,'Blue'           ),
(5,'Dark blue'      );

SELECT word, COUNT(*) c
FROM @phrases
CROSS APPLY (SELECT CAST('<a>'+REPLACE(phrase,' ','</a><a>')+'</a>' AS xml) xml1 ) t1
CROSS APPLY (SELECT n.value('.','varchar(max)') AS word FROM xml1.nodes('a') x(n) ) t2
GROUP BY word

word         freq
----------- -----------
and         2
blue        3
Dark        1
green       1
Red         1
White       2

回复收藏 0 原文

绳情 2024-07-28 04:28:22

你的主要问题是你缺少 SQL Server 中的 split 函数。

这里有一个示例，看起来相当不错..

http://www.sqlteam .com/forums/topic.asp?TOPIC_ID=50648

按照以下方式编写一个存储过程......

CREATE TABLE #Temp (Response nvarchar(50), Frequency int) 

DECLARE @response nvarchar(100)
DECLARE db_cursor CURSOR FOR 
SELECT response FROM YourTable

OPEN db_cursor  
FETCH NEXT FROM db_cursor INTO @response 

WHILE @@FETCH_STATUS = 0  
BEGIN  
       /* Pseudo Code */ 
       --Split @Response 
       --Iterate through each word in returned list
       --IF(EXISTS in #TEMP)
       --    UPDATE THAT ROW & INCREMENT THE FREQUENCY
       --ELSE
       --    NEW WORD, INSERT TO #Temp WITH A FREQUENCY OF 1

       FETCH NEXT FROM db_cursor INTO @response 
END   

SELECT * FROM #Temp

使用它，您可以
需要运行一次，并且您的表或响应不是非常大，那么这应该可以工作

You're main problem is that you're missing a split function in SQL Server.

Theres a sample one here that looks pretty good..

http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

Using that, you write a stored proc along the lines of...

CREATE TABLE #Temp (Response nvarchar(50), Frequency int) 

DECLARE @response nvarchar(100)
DECLARE db_cursor CURSOR FOR 
SELECT response FROM YourTable

OPEN db_cursor  
FETCH NEXT FROM db_cursor INTO @response 

WHILE @@FETCH_STATUS = 0  
BEGIN  
       /* Pseudo Code */ 
       --Split @Response 
       --Iterate through each word in returned list
       --IF(EXISTS in #TEMP)
       --    UPDATE THAT ROW & INCREMENT THE FREQUENCY
       --ELSE
       --    NEW WORD, INSERT TO #Temp WITH A FREQUENCY OF 1

       FETCH NEXT FROM db_cursor INTO @response 
END   

SELECT * FROM #Temp

Theres probably a less fugly way to do this without cursors, but if it's just something you
need to run once, and you're table or Responses isn't phenomenally large, then this should work

回复收藏 0 原文

~没有更多了~