Getting meaningful text from a java.io.Reader

Posted on 2024-12-23 16:15:19

I have a program that I'm writing where I am using another company's library to download some reports from their website. I want to parse these reports before I write them to a file, because if they match certain criteria, I want to disregard them.

The problem is that their method, download(), returns a java.io.Reader. The only method available to me is

int read(char[] cbuf);

Printing this returned array out gives me meaningless characters. I want to be able to identify what character set I'm working with or convert it to a byte array but I can't figure out how to do it. I've tried

//retrievedFile is my Reader object
char[] cbuf = new char[2048];
int numChars = retrievedFile.read(cbuf);
//I've tried other character sets, too
new String(cbuf).getBytes("UTF-8");

and I'm afraid to downcast to a more useful reader because I can't know for sure if it will work or not. Any suggestions?

EDIT

When I say it prints out "meaningless characters", I don't mean that it looks like the example given by Jon Skeet. It's really hard to describe because I'm not at my machine right now, but I think it's an encoding issue. The characters seem to have indentations and structure similar to the look of the reports. I'll try these suggestions as soon as I get back on Tuesday (I'm only an intern, so I haven't bothered with setting up a remote account or anything).

Comments (7)

◇流星雨 2024-12-30 16:15:19

Try this:

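// Wrap the Reader; BufferedReader accepts any java.io.Reader
BufferedReader in = new BufferedReader(retrievedFile);
String line;
StringBuilder rslt = new StringBuilder();
while ((line = in.readLine()) != null) {
    // readLine() strips the line terminator, so add one back
    rslt.append(line).append('\n');
}
System.out.println(rslt.toString());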

Don't cast the Reader to a more specific class, because you don't know its real type.
Instead, pass the Reader to a BufferedReader. BufferedReader accepts any subclass of java.io.Reader as its argument, so it is safe to use.

淡淡的优雅 2024-12-30 16:15:19

Printing out the char[] itself will probably give you something like:

[C@1c8825a5

That's just the normal output of calling toString on a char array in Java. It sounds like you want to convert it into a String, which you can do with a String(char[]) constructor. Here's some sample code:

public class Test {
    public static void main(String[] args) {
        char[] chars = "hello".toCharArray();
        System.out.println((Object) chars);

        String text = new String(chars);
        System.out.println(text);
    }
}

On the other hand, java.io.Reader doesn't have a read method returning a char[] - it has methods which either return a single character at a time, or (more usefully) accept a char[] to fill with data, and return the amount of data read. This is actually what your sample code shows. You just need to use the char array and the number of characters read to create the new String. For example:

char[] buffer = new char[4096];
int charsRead = reader.read(buffer);
String text = new String(buffer, 0, charsRead);

However, note that it may not return all the data in one go. You could read it line by line using BufferedReader, or loop to fetch all of the information. Guava contains useful code in its CharStreams class. For example:

String allText = CharStreams.toString(reader);

or

List<String> lines = CharStreams.readLines(reader);
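
If adding Guava is not an option, the JDK offers a similar shortcut on Java 10 and later through Reader.transferTo(Writer). The following is a minimal sketch under that assumption; the class name ReaderText and the method readAll are illustrative names, not part of any library, and the StringReader only stands in for the Reader returned by download().

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;

public class ReaderText {
    // Drains the reader into a String using only the JDK (requires Java 10+).
    // ReaderText and readAll are hypothetical names used for this sketch.
    static String readAll(Reader reader) throws IOException {
        StringWriter out = new StringWriter();
        // Copies every remaining char to the writer; the reader is left open.
        reader.transferTo(out);
        return out.toString();
    }

    public static void main(String[] args) throws IOException {
        // StringReader stands in for the Reader returned by download().
        try (Reader reader = new StringReader("first line\nsecond line\n")) {
            System.out.print(readAll(reader));
        }
    }
}
```

Unlike a readLine() loop, this keeps the original line terminators untouched, which matters if the layout of the report is part of what you want to inspect.
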
演多会厌 2024-12-30 16:15:19

As an alternative, you can read a string from a java.io.Reader with java.util.Scanner. Using try-with-resources closes the scanner automatically, which in turn closes the reader.

Here is an example:

Reader in = ...
try (Scanner scanner = new Scanner(in).useDelimiter("\\Z")) {
    String text = scanner.next();
    ... // Do something with text
}

In this situation the call to scanner.next() reads all of the characters, because the delimiter \Z matches only at the end of the input. Note that next() throws NoSuchElementException if the reader is empty.

The following one-liner also reads the whole text but does not close the reader:

String text = new Scanner(in).useDelimiter("\\Z").next();
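
A slightly more defensive variant of the same idea is sketched below. It swaps the delimiter for \A (start of input), which is another commonly used pattern for this trick, and checks hasNext() so that an empty reader yields an empty string instead of an exception. The class name ScannerText and the method readAll are illustrative names only.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

public class ScannerText {
    // Reads everything from the reader, returning "" for empty input.
    // Closing the Scanner also closes the underlying reader.
    // ScannerText and readAll are hypothetical names used for this sketch.
    static String readAll(Reader in) {
        // "\\A" matches only at the start of the input, so the first
        // token is the entire remaining text.
        try (Scanner scanner = new Scanner(in).useDelimiter("\\A")) {
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        // StringReader stands in for the Reader returned by download().
        System.out.println(readAll(new StringReader("hello\nworld")));
    }
}
```

Keep in mind that Scanner does not propagate an IOException from the underlying reader; it stops reading and exposes the error through ioException(), so check that method if silent truncation would be a problem.
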
伊面 2024-12-30 16:15:19

What meaningless chars does it give? Probably null chars, because you don't read all of the chars from the reader, only up to 2048 of them, and you ignore the value returned by the read method (which tells you how many chars were actually read).

If you want to read the whole thing into a String, you'll have to loop until the returned value is negative, and append the chars read at each iteration (from 0 to numChars) to a StringBuilder.

StringBuilder builder = new StringBuilder();
char[] cbuf = new char[2048];
int numChars;
while ((numChars = reader.read(cbuf)) >= 0) {
    builder.append(cbuf, 0, numChars);
}
String s = builder.toString();
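
To see why ignoring the return value can produce odd-looking output, here is a small sketch that compares the two conversions. A StringReader stands in for the library's Reader, and the class and method names (BufferTail, wholeBuffer, onlyRead) are made up for the illustration.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

public class BufferTail {
    // Converts the whole buffer, including slots that read() never filled.
    // This mirrors new String(cbuf) from the question.
    static String wholeBuffer(Reader reader, int size) throws IOException {
        char[] cbuf = new char[size];
        reader.read(cbuf); // return value ignored on purpose
        return new String(cbuf);
    }

    // Converts only the chars that read() reported as filled.
    static String onlyRead(Reader reader, int size) throws IOException {
        char[] cbuf = new char[size];
        int n = reader.read(cbuf);
        return n < 0 ? "" : new String(cbuf, 0, n);
    }

    public static void main(String[] args) throws IOException {
        // The unfilled tail of the array keeps its default value, '\0'.
        System.out.println(wholeBuffer(new StringReader("abc"), 8).length());
        System.out.println(onlyRead(new StringReader("abc"), 8).length());
    }
}
```

The first string carries the unused tail of the array as NUL characters, which many consoles render as blanks or boxes; the second contains only the text that was actually read.
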
趁微风不噪 2024-12-30 16:15:19

Wrap it in something more useful, such as a BufferedReader (a StringReader would not help here, since its constructor takes a String rather than another Reader):

http://docs.oracle.com/javase/6/docs/api/

天暗了我发光 2024-12-30 16:15:19

Since the file is a text file, create a BufferedReader from your Reader and read it line by line. That should help make more sense of it.

夜未央樱花落 2024-12-30 16:15:19

Since Java 1.8, you can use the BufferedReader.lines() method, which returns a Stream<String>.

So this code returns the whole content, joined with a custom line separator "\n":

String content = new BufferedReader(reader)
    .lines()
    .collect(Collectors.joining("\n"));
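
The same stream can also serve the original goal of deciding whether a report should be disregarded before it is written out. The sketch below assumes a made-up criterion (any line containing the marker "DRAFT"); the class ReportFilter, its methods, and the marker are all hypothetical, and a StringReader stands in for the Reader returned by download().

```java
import java.io.BufferedReader;
import java.io.Reader;
import java.io.StringReader;
import java.util.List;
import java.util.stream.Collectors;

public class ReportFilter {
    // Hypothetical criterion: skip any report that contains this marker.
    static final String SKIP_MARKER = "DRAFT";

    // Collects the report into a list of lines (line terminators removed).
    static List<String> readLines(Reader reader) {
        return new BufferedReader(reader).lines().collect(Collectors.toList());
    }

    // Returns true when at least one line matches the skip criterion.
    static boolean shouldSkip(List<String> lines) {
        return lines.stream().anyMatch(line -> line.contains(SKIP_MARKER));
    }

    public static void main(String[] args) {
        // StringReader stands in for the Reader returned by download().
        List<String> lines = readLines(new StringReader("Report 1\nDRAFT copy\n"));
        System.out.println(shouldSkip(lines) ? "skipped" : "kept");
    }
}
```

Reading into a list first keeps the lines available for writing to a file afterwards, since a Reader can normally be consumed only once.
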