Why can't foreign characters be read using inputStream?
I have a text file containing data that I need to preload into a SQLite database. I saved it in res/raw.
I read the whole file using readTxtFromRaw(), then use the StringTokenizer class to process the file line by line.
However, the String returned by readTxtFromRaw does not show the foreign characters that are in the file. I need these because some of the text is in Spanish or French. Am I missing something?
Code:
String fileCont = new String(readTxtFromRaw(R.raw.wordstext));
StringTokenizer myToken = new StringTokenizer(fileCont , "\t\n\r\f");
The readTxtFromRaw method is:
private String readTxtFromRaw(Integer rawResource) throws IOException
{
    InputStream inputStream = mCtx.getResources().openRawResource(rawResource);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    int i = inputStream.read();
    while (i != -1)
    {
        byteArrayOutputStream.write(i);
        i = inputStream.read();
    }
    inputStream.close();
    return byteArrayOutputStream.toString();
}
The file was created in Eclipse, and all characters display correctly there.
Could this have something to do with Eclipse itself? I set a breakpoint and inspected myToken in the Watch window. I tried to manually replace the weird characters with the correct ones (for example í or é), but it would not let me.
Comments (1)
Have you tried several encodings? byteArrayOutputStream.toString() decodes the bytes using the platform's default character encoding, so I guess the foreign characters are being stripped or converted in a way that keeps them from displaying in your output. Have you already tried byteArrayOutputStream.toString(String enc)? Try "UTF-8", "ISO-8859-1", or "UTF-16" as the encoding.
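The suggestion above can be sketched roughly as follows. This is a minimal sketch, not the asker's actual fix: the class name RawTextReader and the method readText are hypothetical, and the right charset depends on how the file was actually saved (UTF-8 is only an assumption here). The idea is to pass the charset explicitly when turning the bytes into a String instead of relying on the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;

// Hypothetical helper, not part of the original question's code.
final class RawTextReader {
    private RawTextReader() {}

    // Reads the whole stream and decodes it with the given charset,
    // rather than with the platform's default encoding.
    static String readText(InputStream in, Charset charset) throws IOException {
        try {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            // Copy in chunks instead of one byte at a time.
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            // The explicit charset is the important part.
            return new String(out.toByteArray(), charset);
        } finally {
            in.close();
        }
    }
}
```

In the question's setting, the call might look like readText(mCtx.getResources().openRawResource(R.raw.wordstext), Charset.forName("UTF-8")). If the accented characters are still wrong, the file was probably saved in a different encoding (for example ISO-8859-1 or windows-1252), and that charset should be passed instead.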