What is the best way to calculate word frequency in VB.NET?
There are some good examples on how to calculate word frequencies in C#, but none of them are comprehensive and I really need one in VB.NET.
My current approach is limited to one word per frequency count. What is the best way to change this so that I can get a completely accurate word frequency listing?
    wordFreq = New Hashtable()
    Dim words As String() = Regex.Split(inputText, "(\W)")
    For i As Integer = 0 To words.Length - 1
        If words(i) <> "" Then
            Dim realWord As Boolean = True
            For j As Integer = 0 To words(i).Length - 1
                If Char.IsLetter(words(i).Chars(j)) = False Then
                    realWord = False
                End If
            Next j
            If realWord = True Then
                If wordFreq.Contains(words(i).ToLower()) Then
                    wordFreq(words(i).ToLower()) += 1
                Else
                    wordFreq.Add(words(i).ToLower(), 1)
                End If
            End If
        End If
    Next
    Me.wordCount = New SortedList
    For Each de As DictionaryEntry In wordFreq
        If wordCount.ContainsKey(de.Value) = False Then
            wordCount.Add(de.Value, de.Key)
        End If
    Next
I'd prefer an actual code snippet, but a generic "oh yeah... use this and run that" would work as well.
Comments (4)
This might be what you're looking for:
I have just tested it and it does work.
EDIT! I have added code to make sure that it counts only letters and not symbols.
FYI: I found an article on how to use LINQ while targeting .NET 2.0. It feels a bit dirty, but it might help someone: http://weblogs.asp.net/fmarguerie/archive/2007/09/05/linq-support-on-net-2-0.aspx
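For anyone who can use LINQ directly, here is a minimal sketch of a letters-only count. It is not the snippet this answer originally posted; it assumes .NET 3.5 or later, and the module name `WordFrequencyLinq` and function name `CountWords` are made up for illustration. The result is a list of word/count pairs, so several words can share the same count, which is what the `SortedList` keyed by count in the question could not hold.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module WordFrequencyLinq

    ' Hypothetical helper: returns (word, count) pairs, most frequent first.
    ' Words with the same count are all kept and ordered alphabetically.
    Function CountWords(inputText As String) As List(Of KeyValuePair(Of String, Integer))
        ' \p{L}+ matches runs of Unicode letters, so digits and symbols never form a word.
        Dim words As IEnumerable(Of String) =
            Regex.Matches(inputText, "\p{L}+").
                Cast(Of Match)().
                Select(Function(m) m.Value.ToLowerInvariant())

        Return words.
            GroupBy(Function(w) w).
            Select(Function(g) New KeyValuePair(Of String, Integer)(g.Key, g.Count())).
            OrderByDescending(Function(p) p.Value).
            ThenBy(Function(p) p.Key, StringComparer.Ordinal).
            ToList()
    End Function

    Sub Main()
        For Each pair As KeyValuePair(Of String, Integer) In CountWords("The cat and the dog and the bird.")
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value)
        Next
        ' Prints:
        ' the: 3
        ' and: 2
        ' bird: 1
        ' cat: 1
        ' dog: 1
    End Sub

End Module
```

If you need the counts keyed by word rather than as a ranked list, calling `ToDictionary` on the grouped result instead of sorting should work just as well.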
Then, for a quick demo application, create a WinForms app with one multiline TextBox called InputBox, one ListView called OutputList and one Button called CountBtn. In the list view, create two columns: "Word" and "Freq". Set the list view's view type to "Details". Add a Click event handler for CountBtn. Then use this code:
You did a terrible, terrible thing to make me write this in VB and I will never forgive you.
:p
Good luck!
EDIT
Fixed blank string bug and case bug
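A rough sketch of what such a demo form could look like, built entirely in code so it does not depend on the designer. This is not the code this answer originally posted. The control names InputBox, OutputList and CountBtn come from the description above; the class name `WordCountForm`, the `CountWords` helper and the layout values are assumptions for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions
Imports System.Windows.Forms

Public Class WordCountForm
    Inherits Form

    ' Control names follow the description above; sizes and docking are illustrative.
    Private ReadOnly InputBox As New TextBox() With {.Multiline = True, .Dock = DockStyle.Top, .Height = 120}
    Private ReadOnly CountBtn As New Button() With {.Text = "Count", .Dock = DockStyle.Top}
    Private ReadOnly OutputList As New ListView() With {.View = View.Details, .Dock = DockStyle.Fill}

    Public Sub New()
        OutputList.Columns.Add("Word", 200)
        OutputList.Columns.Add("Freq", 80)
        ' Add the fill-docked list first so the top-docked controls are laid out above it.
        Controls.Add(OutputList)
        Controls.Add(CountBtn)
        Controls.Add(InputBox)
        AddHandler CountBtn.Click, AddressOf CountBtn_Click
    End Sub

    ' Hypothetical helper: the counting step, kept separate from the UI.
    Public Shared Function CountWords(text As String) As Dictionary(Of String, Integer)
        Dim counts As New Dictionary(Of String, Integer)()
        For Each token As String In Regex.Split(text, "\W+")
            ' Regex.Split can yield empty strings at the start or end of the input.
            If token.Length = 0 Then Continue For
            ' Lower-case the key so "The" and "the" are counted together.
            Dim word As String = token.ToLowerInvariant()
            Dim current As Integer
            counts.TryGetValue(word, current)
            counts(word) = current + 1
        Next
        Return counts
    End Function

    Private Sub CountBtn_Click(sender As Object, e As EventArgs)
        Dim pairs As New List(Of KeyValuePair(Of String, Integer))(CountWords(InputBox.Text))
        ' Highest count first; words with equal counts stay as separate rows.
        pairs.Sort(Function(a, b) b.Value.CompareTo(a.Value))

        OutputList.BeginUpdate()
        OutputList.Items.Clear()
        For Each pair As KeyValuePair(Of String, Integer) In pairs
            OutputList.Items.Add(New ListViewItem(New String() {pair.Key, pair.Value.ToString()}))
        Next
        OutputList.EndUpdate()
    End Sub

    <STAThread>
    Public Shared Sub Main()
        Application.EnableVisualStyles()
        Application.Run(New WordCountForm())
    End Sub

End Class
```

Keying the dictionary by word and sorting the pairs only for display means two words with the same frequency never collide, unlike a collection keyed by the count.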
This might be helpful:
Word frequency algorithm for natural language processing
Pretty close, but \w+ is a good regex to match with (matches word characters only).
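To make the difference concrete, here is a small sketch comparing the `Regex.Split(inputText, "(\W)")` call from the question with `Regex.Matches` on `\w+`. The module and function names are made up for illustration. One caveat worth knowing: in .NET, `\w` also matches digits and the underscore, so a letters-only count would still need a check like `Char.IsLetter` or a pattern such as `\p{L}+`.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module TokenizeSketch

    ' Tokens from the pattern used in the question. Because of the capture group,
    ' Regex.Split also returns the separators themselves, plus empty strings
    ' wherever two separators are adjacent.
    Function SplitTokens(inputText As String) As String()
        Return Regex.Split(inputText, "(\W)")
    End Function

    ' Tokens from matching \w+: only runs of word characters, nothing to filter out.
    Function MatchTokens(inputText As String) As List(Of String)
        Dim tokens As New List(Of String)()
        For Each m As Match In Regex.Matches(inputText, "\w+")
            tokens.Add(m.Value)
        Next
        Return tokens
    End Function

    Sub Main()
        Dim sample As String = "It's 2 a.m., really"
        Console.WriteLine(String.Join("|", SplitTokens(sample)))
        Console.WriteLine(String.Join("|", MatchTokens(sample).ToArray()))
        ' The second line prints: It|s|2|a|m|really
    End Sub

End Module
```

Note that the apostrophe splits "It's" into two tokens either way, so contractions need their own handling if they matter for your text.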