Java: Get a random line from a large file
I've seen how to get a random line from a text file, but the method stated there (the accepted answer) is running horrendously slowly. It runs very slowly on my 598KB text file, and is still slow on a version of that file that keeps only one out of every 20 lines (20KB). I never get past the "a" section (it's a wordlist).
The original file has 64141 lines; the shortened one has 2138 lines. To generate these files, I took the Linux Mint 11 /usr/share/dict/american-english wordlist and used grep to remove anything with uppercase or an apostrophe (grep -v [[:upper:]] | grep -v \').
The code I'm using is
String result = null;
final Random rand = new Random();
int n = 0;
for (final Scanner sc = new Scanner(wordList); sc.hasNext();) {
    n++;
    if (rand.nextInt(n) == 0) {
        final String line = sc.nextLine();
        boolean isOK = true;
        for (final char c : line.toCharArray()) {
            if (!(constraints.isAllowed(c))) {
                isOK = false;
                break;
            }
        }
        if (isOK) {
            result = line;
        }
        System.out.println(result);
    }
}
return result;
This is slightly adapted from Itay's answer.
The object constraints is a KeyboardConstraints, which basically has the one method isAllowed(char):
public boolean isAllowed(final char key) {
    if (allAllowed) {
        return true;
    } else {
        return allowedKeys.contains(key);
    }
}
where allowedKeys and allAllowed are provided in the constructor. The constraints variable used here has "aeouhtns".toCharArray() as its allowedKeys, with allAllowed off.
Essentially, what I want the method to do is to pick a random word that satisfies the constraints (e.g. for these constraints, "outshone" would work, but not "worker", because "w" is not in "aeouhtns".toCharArray()).
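For reference, a hypothetical sketch of a class matching that description is below. Only allowedKeys, allAllowed and isAllowed(char) come from the question; storing the keys as a Set<Character>, the constructor signature and the allows(String) helper (which applies isAllowed to every character of a word) are assumptions for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch: field names and isAllowed(char) follow the question;
// the constructor signature and allows(String) are assumptions.
class KeyboardConstraints {
    private final Set<Character> allowedKeys = new HashSet<>();
    private final boolean allAllowed;

    KeyboardConstraints(final char[] keys, final boolean allAllowed) {
        // Box each char individually; Arrays.asList(char[]) would instead give a
        // List<char[]> with one element, whose contains(char) is always false.
        for (final char c : keys) {
            allowedKeys.add(c);
        }
        this.allAllowed = allAllowed;
    }

    public boolean isAllowed(final char key) {
        return allAllowed || allowedKeys.contains(key);
    }

    // Hypothetical helper: true if every character of the word passes isAllowed.
    public boolean allows(final String word) {
        for (final char c : word.toCharArray()) {
            if (!isAllowed(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        KeyboardConstraints constraints =
                new KeyboardConstraints("aeouhtns".toCharArray(), false);
        System.out.println(constraints.allows("south"));  // true
        System.out.println(constraints.allows("worker")); // false ('w' is not allowed)
    }
}
```

A Set<Character> keeps each isAllowed call a constant-time lookup, which matters little for eight keys but avoids the boxing pitfall noted in the constructor comment.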
How can I do this?
Answers (2)
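You have a bug in your implementation: you should read the line before you choose a random number. As in the original answer, call sc.nextLine() at the top of every iteration, before n++ and rand.nextInt(n), instead of only inside the if block. In your version the scanner advances only when rand.nextInt(n) == 0, which becomes less likely as n grows, so the loop keeps drawing numbers without consuming input and never gets past the "a" section.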
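You should also check the constraints before drawing a random number. If a line fails the constraints, it should be ignored: skip it without incrementing n, so that n counts only the lines that pass and each of them ends up equally likely to be chosen.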
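Putting both suggestions together, a minimal sketch of the corrected loop might look like this. It is single-pass reservoir sampling with a reservoir of size one: every line is consumed, lines that fail the filter are skipped, and the n-th matching line replaces the current choice with probability 1/n. The class RandomWord, the method randomMatchingLine and the Predicate parameter are hypothetical; the filter in main checks against "aeouhtns" directly as a stand-in for the question's constraints object.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Random;
import java.util.Scanner;
import java.util.function.Predicate;

class RandomWord {
    // Returns a uniformly random line that satisfies the filter, or null if none does.
    static String randomMatchingLine(Scanner sc, Predicate<String> filter, Random rand) {
        String result = null;
        int n = 0; // number of matching lines seen so far
        while (sc.hasNextLine()) {
            // Always consume the line first, so the scanner advances on every iteration.
            final String line = sc.nextLine();
            if (!filter.test(line)) {
                continue; // lines that fail the constraints are ignored and not counted
            }
            n++;
            // Keep the n-th matching line with probability 1/n.
            if (rand.nextInt(n) == 0) {
                result = line;
            }
        }
        return result;
    }

    public static void main(String[] args) throws FileNotFoundException {
        final String allowed = "aeouhtns";
        // Stand-in for the question's KeyboardConstraints.isAllowed check.
        Predicate<String> fitsKeys = word ->
                !word.isEmpty() && word.chars().allMatch(c -> allowed.indexOf(c) >= 0);
        try (Scanner sc = new Scanner(new File(args[0]))) {
            System.out.println(randomMatchingLine(sc, fitsKeys, new Random()));
        }
    }
}
```

With the question's own object, the filter could instead be a lambda that loops over line.toCharArray() and calls constraints.isAllowed(c), just like the inner loop in the question.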
I would read in all the lines, store them somewhere, and then select a random line from them. This takes negligible time, since a file of less than 1 MB is tiny by today's standards.
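A minimal sketch of that in-memory approach, assuming Java 8+ and a UTF-8 wordlist (Files.readAllLines(Path) decodes as UTF-8 and throws on malformed input). The class RandomWordInMemory and the method randomMatchingWord are hypothetical names; the allowed-keys check again stands in for the question's constraints object.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Random;
import java.util.stream.Collectors;

class RandomWordInMemory {
    // Filters the words once, then picks one index uniformly; null if nothing matches.
    static String randomMatchingWord(List<String> lines, String allowedKeys, Random rand) {
        List<String> candidates = lines.stream()
                .filter(w -> !w.isEmpty() && w.chars().allMatch(c -> allowedKeys.indexOf(c) >= 0))
                .collect(Collectors.toList());
        return candidates.isEmpty() ? null : candidates.get(rand.nextInt(candidates.size()));
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the wordlist, e.g. the grep-filtered copy described in the question
        List<String> lines = Files.readAllLines(Paths.get(args[0]));
        System.out.println(randomMatchingWord(lines, "aeouhtns", new Random()));
    }
}
```

If many words are needed, the filtered candidates list can be built once and reused, so each further pick is a single nextInt call.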