Getting meaningful text from a java.io.Reader
I have a program that I'm writing where I am using another company's library to download some reports from their website. I want to parse these reports before I write them to a file, because if they match certain criteria, I want to disregard them.
The problem is that their method, download(), returns a java.io.Reader. The only method available to me is
int read(char[] cbuf);
Printing this returned array out gives me meaningless characters. I want to be able to identify what character set I'm working with or convert it to a byte array but I can't figure out how to do it. I've tried
//retrievedFile is my Reader object
char[] cbuf = new char[2048];
int numChars = retrievedFile.read(cbuf);
//I've tried other character sets, too
new String(cbuf).getBytes("UTF-8");
and I'm afraid to downcast to a more useful reader because I can't know for sure if it will work or not. Any suggestions?
EDIT
When I say it prints out "meaningless characters", I don't mean that it looks like the example given by Jon Skeet. It's really hard to describe because I'm not at my machine right now, but I think it's an encoding issue. The characters seem to have indentations and structure similar to the look of the reports. I'll try these suggestions as soon as I get back on Tuesday (I'm only an intern, so I haven't bothered with setting up a remote account or anything).
Comments (7)
Don't typecast the Reader to any class, because you don't know its real type.
Instead, wrap it in a BufferedReader. BufferedReader accepts any subclass of java.io.Reader as its argument, so it is safe to use.
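The wrapping described above could look roughly like the sketch below. The class name `WrapReaderExample`, the helper `readLines`, and the `StringReader` standing in for the result of `download()` are all assumptions made for illustration; they are not part of the vendor library.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class WrapReaderExample {
    // Hypothetical helper: reads every line from any Reader without casting it.
    static List<String> readLines(Reader source) throws IOException {
        List<String> lines = new ArrayList<>();
        // BufferedReader accepts any Reader, so the concrete type does not matter.
        // try-with-resources closes the BufferedReader and the wrapped Reader.
        try (BufferedReader buffered = new BufferedReader(source)) {
            String line;
            while ((line = buffered.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a StringReader stands in for the Reader returned by download().
        Reader retrievedFile = new StringReader("first line\nsecond line");
        for (String line : readLines(retrievedFile)) {
            System.out.println(line);
        }
    }
}
```

Reading line by line also makes it easy to test each line against the filtering criteria before anything is written to a file.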
Printing out the `char[]` itself will probably give you something like `[C@` followed by a hexadecimal hash code. That's just the normal output of calling `toString` on a `char` array in Java. It sounds like you want to convert it into a `String`, which you can do with the `String(char[])` constructor.
On the other hand, `java.io.Reader` doesn't have a `read` method returning a `char[]`. It has methods which either return a single character at a time or (more usefully) accept a `char[]` to fill with data and return the number of characters read. This is actually what your sample code shows. You just need to use the char array and the number of characters read to create the new `String`, for instance with the `String(char[], int, int)` constructor.
However, note that it may not return all the data in one go. You could read it line by line using `BufferedReader`, or loop to fetch all of the information. Guava contains useful code in its `CharStreams` class, such as `CharStreams.toString` and `CharStreams.readLines`.
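To make the point about the returned count concrete, here is a minimal sketch that builds the `String` from only the characters actually read. The class name `CharArrayToStringExample` and the helper `readOnce` are made up for this illustration, and a `StringReader` stands in for the Reader returned by `download()`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class CharArrayToStringExample {
    // Hypothetical helper: performs ONE read and converts only what was read.
    // A single read is not guaranteed to return the whole stream.
    static String readOnce(Reader reader, int bufferSize) throws IOException {
        char[] cbuf = new char[bufferSize];
        int numChars = reader.read(cbuf);
        if (numChars == -1) {
            return ""; // end of stream, nothing was read
        }
        // Use offset 0 and the returned count; new String(cbuf) would also
        // include the unused '\u0000' slots at the end of the buffer.
        return new String(cbuf, 0, numChars);
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a StringReader stands in for the vendor's Reader.
        Reader retrievedFile = new StringReader("report header");
        String text = readOnce(retrievedFile, 2048);
        System.out.println(text + " (" + text.length() + " chars)");
    }
}
```

Passing the whole 2048-element array to `new String(cbuf)` after a short read is one plausible source of trailing garbage, since the untouched slots hold null characters.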
As an alternative, you can read a string from a `java.io.Reader` using `java.util.Scanner` inside a try-with-resources statement, which automatically closes the reader.
In this situation the call to `scanner.next()` will read all characters, because the delimiter is the end of file. The same thing can be written as a one-liner that also reads the whole text but does not close the reader.
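A sketch of the Scanner approach follows. The original snippet is not available here, so the delimiter is an assumption: `"\\A"` matches only at the very start of the input, which means no further delimiter is found and the single token runs to the end of the stream. The class name `ScannerReadExample` and the helper `readAll` are likewise made up for the example.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.Scanner;

class ScannerReadExample {
    // Hypothetical helper: reads everything from the Reader as one token.
    static String readAll(Reader reader) {
        // Closing the Scanner also closes the Reader, because Reader is Closeable.
        try (Scanner scanner = new Scanner(reader).useDelimiter("\\A")) {
            // hasNext() guards against an empty stream, where next() would throw.
            return scanner.hasNext() ? scanner.next() : "";
        }
    }

    public static void main(String[] args) {
        // Assumption: a StringReader stands in for the vendor's Reader.
        Reader retrievedFile = new StringReader("line one\nline two\n");
        System.out.print(readAll(retrievedFile));
    }
}
```

Note that Scanner swallows any `IOException` from the underlying reader; `scanner.ioException()` can be checked if that matters.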
Which meaningless characters does it give you? They are probably null characters, because you don't read all the characters from the reader but at most 2048 of them, and you ignore the value returned by the read method (which tells you how many characters were actually read).
If you want to read the whole thing into a String, you'll have to loop until the returned value is negative, and append the chars read at each iteration (from 0 to numChars) to a StringBuilder.
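The loop described above can be sketched as follows. The class name `ReadFullyExample` and the helper `readAll` are illustrative names, and the `StringReader` in `main` is only a stand-in for the Reader returned by `download()`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class ReadFullyExample {
    // Hypothetical helper: drains the Reader into a single String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder text = new StringBuilder();
        char[] cbuf = new char[2048];
        int numChars;
        // read() returns -1 at end of stream; otherwise the number of chars filled.
        while ((numChars = reader.read(cbuf)) != -1) {
            // Append only the filled part of the buffer, never the stale remainder.
            text.append(cbuf, 0, numChars);
        }
        return text.toString();
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a StringReader stands in for the vendor's Reader.
        try (Reader retrievedFile = new StringReader("some report text")) {
            System.out.println(readAll(retrievedFile));
        }
    }
}
```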
Wrap it in something more useful, like a BufferedReader: http://docs.oracle.com/javase/6/docs/api/
Since the file is a text file, create a `BufferedReader` from your `Reader` and read it line by line; that should help make more sense of it.
Since Java 1.8, you can use the `BufferedReader.lines()` method, which returns a `Stream<String>`. Collecting that stream gives you the whole content as one string, joined with a custom line separator such as "\n".
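One way to write this, as a sketch: the stream of lines is joined with `Collectors.joining("\n")`. The class name `LinesJoinExample` and the helper `readJoined` are assumptions for the example, and a `StringReader` again stands in for the vendor's Reader.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.stream.Collectors;

class LinesJoinExample {
    // Hypothetical helper: reads all lines and joins them with "\n".
    static String readJoined(Reader source) throws IOException {
        try (BufferedReader buffered = new BufferedReader(source)) {
            // lines() splits on \n, \r, or \r\n; joining normalizes them to "\n".
            return buffered.lines().collect(Collectors.joining("\n"));
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: a StringReader stands in for the Reader from download().
        Reader retrievedFile = new StringReader("alpha\r\nbeta\ngamma");
        System.out.println(readJoined(retrievedFile));
    }
}
```

Because the line terminators are discarded and re-inserted, a trailing newline in the source is not preserved in the result.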