在 Java 中使用通用 Pair 类和 Splaytree 来计数和存储单词及其频率
我正在实现一个 splaytree 来保存单词及其频率,并选择创建一个 Pair 类来保存每个单词频率(键值)对。也就是说,splaytree的每个节点都保存一对Pair类。 Pair 类如下所示:
public class SplayEntry<K, V> implements Comparable<SplayEntry<K, V>>{
public K word;
public V frequency;
public SplayEntry(K word, V frequency) {
this.word = word;
this.frequency = frequency;
}
getters, setters, hashCode, equals, compareTo etc...
Splaytree:
public class SplayTree<AnyType extends Comparable<? super AnyType>> {
public SplayTree( )
{
nullNode = new BinaryNode<AnyType>( null );
nullNode.left = nullNode.right = nullNode;
root = nullNode;
}
并且具有 BinaryNode 类。
我遇到的麻烦是如何将每个单词和频率对放入树中,并检查该对是否已经存在,如果存在则将频率增加一。我逐行读取文本文件并将每一行拆分为单词,然后执行 countWords() 方法,该方法现在一团糟:
public void countWords(String line) {
line = line.toLowerCase();
String[] words = line.split("\\P{L}+");
SplayEntry<String, Integer> entry = new SplayEntry<String, Integer>(null, null);
for (int i = 0, n = words.length; i < n; i++) {
Integer occurances = 0;
entry.setWord(words[i]);
entry.setFrequency(occurances);
if (tree.contains(entry.equals(entry)) && entry.getFrequency() == 0) {
occurances = 1;
} else {
int value = occurances.intValue();
occurances = new Integer(value + 1);
entry.setFrequency(occurances);
}
entry = new SplayEntry<String, Integer>(words[i], occurances);
tree.insert(entry);
}
}
我知道这实际上不起作用,我需要帮助来弄清楚应该如何实例化 SplayEntry类以及按什么顺序?我还想要该方法,对于单词数组中的每个单词,检查它是否存在于树内的 SplayEntry 中(包含),如果该单词是新单词,则频率将为 1,否则,频率将为是+1。最后,我将新的 SplayEntry 添加到 Splaytree 中,并将其放入适当的节点中。
现在,我只是因为在同一段代码上工作了太多的时间而不是必要的时间而让自己感到困惑,我非常感谢一些可以引导我走向正确方向的指示!
如果我没有说清楚,请告诉我。
I'm implementing a splaytree to hold words and their frequencies and chose to create a Pair class that would hold each word-frequency (key-value) pair. That is, each node of the splaytree holds a pair of the Pair class. The Pair class looks like this:
public class SplayEntry<K, V> implements Comparable<SplayEntry<K, V>>{
public K word;
public V frequency;
public SplayEntry(K word, V frequency) {
this.word = word;
this.frequency = frequency;
}
getters, setters, hashCode, equals, compareTo etc...
The Splaytree:
public class SplayTree<AnyType extends Comparable<? super AnyType>> {
public SplayTree( )
{
nullNode = new BinaryNode<AnyType>( null );
nullNode.left = nullNode.right = nullNode;
root = nullNode;
}
And has BinaryNode class.
What I'm having trouble with is how to, for every word and frequency pair put it into the tree and also check whether the pair already exists and if so up the frequency by one. I read in a text file line by line and split each line into words then do a countWords() method that right now is a mess:
public void countWords(String line) {
line = line.toLowerCase();
String[] words = line.split("\\P{L}+");
SplayEntry<String, Integer> entry = new SplayEntry<String, Integer>(null, null);
for (int i = 0, n = words.length; i < n; i++) {
Integer occurances = 0;
entry.setWord(words[i]);
entry.setFrequency(occurances);
if (tree.contains(entry.equals(entry)) && entry.getFrequency() == 0) {
occurances = 1;
} else {
int value = occurances.intValue();
occurances = new Integer(value + 1);
entry.setFrequency(occurances);
}
entry = new SplayEntry<String, Integer>(words[i], occurances);
tree.insert(entry);
}
}
I know this isn't really working and I need help in figuring out how I should instantiate the SplayEntry class and in what order? I also want the method to, for every word in the words array, check whether it exists in a SplayEntry which is inside the tree (contains) and if the word is a new word then the frequency will be 1, else, the frequency will be +1. finally I just add the new SplayEntry into the Splaytree and let that put it in an appropriate node.
Right now I've just confused myself by working on the same piece of code for way too many hours than should be necessary, I would very much appreciate some pointers that can lead me in the right direction!
Please tell me if I've not made myself clear.
如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。
![扫码二维码加入Web技术交流群](/public/img/jiaqun_03.jpg)
绑定邮箱获取回复消息
由于您还没有绑定你的真实邮箱,如果其他用户或者作者回复了您的评论,将不能在第一时间通知您!
发布评论
评论(1)
我建议使用展开树的标准实现,即没有计数器,并为频率使用单独的
HashMap
。这不会牺牲复杂性,因为展开树上的操作为 O(log n),而 HashMap 上的操作为 O(1)。为了保留封装性和不变量,您可以将它们放在一个更大的类中,以公开所需的操作。I suggest using a standard implementation of a splay tree, i.e. without the counters, and having a separate
HashMap
for frequencies. This does not sacrifice complexity, since operations on a splay tree are O(log n), while operations on aHashMap
are O(1). To preserve encapsulation and invariants, you can put both within a larger class that exposes the required operations.