Java: How many words appear in both of two data sources?
I'm trying to figure out whether there is an easy way to count the number of words that appear in both small paragraph (#1) and small paragraph (#2).
Generally, I'm determining how much overlap there is between these paragraphs on a word-by-word basis. So if (#1) contains the word "happy" and (#2) contains the word "happy", that would count as +1.
I know that I could use String.contains() for each word in (#1), applied to (#2), but I was wondering if there is something more efficient that I could use.
Comments (1)
You can create two sets, s1 and s2, containing all the words from the first and second paragraphs respectively, and intersect them: s1.retainAll(s2). Afterwards s1 holds only the words found in both, so s1.size() is the overlap count. Sounds easy enough.
Update: works for me.
Don't forget to remove the empty word from both sets.
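For anyone who wants to try this, here is a minimal sketch of the set-intersection idea. The class and method names (WordOverlap, wordSet, countSharedWords) are made up for illustration. It also assumes that a "word" is a run of letters or digits and that matching should ignore case; neither is specified in the question, so adjust the split pattern if, say, "don't" should stay in one piece.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of any library.
class WordOverlap {

    // Builds the set of distinct lowercase words in the text.
    // Assumption: anything that is not a letter or digit separates words.
    static Set<String> wordSet(String text) {
        Set<String> words = new HashSet<>(
                Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
        // split() produces "" for empty input or a leading separator; drop it.
        words.remove("");
        return words;
    }

    // Counts the distinct words that occur in both texts.
    static int countSharedWords(String first, String second) {
        Set<String> shared = wordSet(first);
        shared.retainAll(wordSet(second)); // keep only words also present in the second text
        return shared.size();
    }

    public static void main(String[] args) {
        String p1 = "I am happy today.";
        String p2 = "Happy people are happy, today and always!";
        System.out.println(countSharedWords(p1, p2)); // prints 2 ("happy" and "today")
    }
}
```

Because the sets hold distinct words, a word repeated within one paragraph still counts once. Unlike String.contains(), this also avoids substring matches, so "happy" does not match "unhappy".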