SQL Server函数在列中显示词频
我有一个表格,列出了一项调查中的自由文本输入,允许参与者输入他们的答案(关于他们想要在婚礼中使用的颜色)
我想编写一个 sql 函数来收集此列中的所有信息, order 计算每个单词的频率,并按此计数对结果集进行排序。
Response
--------
Red and White
green
White and blue
Blue
Dark blue
我希望上面的表格按如下顺序排列,
Response Frequency
-------- ---------
Blue 3
White 2
And 2
Red 1
Green 1
我可以在函数运行后删除所有像“and”这样的垃圾词。 有谁知道有什么好的功能可以产生这种行为吗?
I have a table that lists a freet text input from a survey where enterents were allowed to enter their responses (regarding colours they would like to have in their wedding)
I would like to write a sql function that gathers all the information from this column, and orders counts the frequency of each word, ordering the result set by this count.
Response
--------
Red and White
green
White and blue
Blue
Dark blue
I would like the above table to be ordered as follows
Response Frequency
-------- ---------
Blue 3
White 2
And 2
Red 1
Green 1
I can strip all the rubbish words like "and" after the function has run. Does anyone know any good functions that produce this behaviour?
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(3)
好吧,这真是一种享受。 首先是一个分隔值的函数...
然后我将它应用到我的表...
好吧,所以我用我的嵌套表进行了OTT,但我已经删除了所有废话字符,分隔了值并保持运行最常用的单词总数。
Okay this works a treat. Firstly a function to separate the values...
then I just apply it to my table...
Okay, so I went OTT with my nested tables, but I've stripped out all the crap characters, separated the values and kept a running total of the most frequently used words.
你的主要问题是你缺少 SQL Server 中的 split 函数。
这里有一个示例,看起来相当不错..
http://www.sqlteam .com/forums/topic.asp?TOPIC_ID=50648
按照以下方式编写一个存储过程......
使用它,您可以
需要运行一次,并且您的表或响应不是非常大,那么这应该可以工作
You're main problem is that you're missing a split function in SQL Server.
Theres a sample one here that looks pretty good..
http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648
Using that, you write a stored proc along the lines of...
Theres probably a less fugly way to do this without cursors, but if it's just something you
need to run once, and you're table or Responses isn't phenomenally large, then this should work