Why can't I read foreign characters using an InputStream?
I have a text file containing data that I need to preload into a SQLite database. I saved it in res/raw.
I read the whole file using readTxtFromRaw(), then I use the StringTokenizer class to process the file line by line.
However, the String returned by readTxtFromRaw does not show the foreign characters that are in the file. I need them because some of the text is in Spanish or French. Am I missing something?
Code:
String fileCont = new String(readTxtFromRaw(R.raw.wordstext));
StringTokenizer myToken = new StringTokenizer(fileCont , "\t\n\r\f");
The readTxtFromRaw method is:
private String readTxtFromRaw(Integer rawResource) throws IOException
{
    InputStream inputStream = mCtx.getResources().openRawResource(rawResource);
    ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
    int i = inputStream.read();
    while (i != -1)
    {
        byteArrayOutputStream.write(i);
        i = inputStream.read();
    }
    inputStream.close();
    return byteArrayOutputStream.toString();
}
The file was created in Eclipse, and all the characters display correctly there.
Could this have something to do with Eclipse itself? I set a breakpoint and inspected myToken in the Watch window. I tried to manually replace the garbled characters with the correct ones (for example í or é), but it would not let me.
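The symptom can be reproduced off-device with plain Java. This is a minimal sketch under an assumption the question does not confirm: that Eclipse saved the file as ISO-8859-1 (one byte per accented letter) while the app decodes it as UTF-8. On Android the platform default charset is always UTF-8, and that default is what the no-argument toString() uses. The class and method names here are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    // Decodes raw bytes with an explicit charset; invalid byte sequences are
    // replaced with the charset's replacement string (U+FFFD for UTF-8).
    static String decode(byte[] bytes, Charset charset) {
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String original = "Español: é, í; Français: ç";

        // Assumption for illustration: the file on disk is ISO-8859-1,
        // so 'é' is stored as the single byte 0xE9.
        byte[] fileBytes = original.getBytes(StandardCharsets.ISO_8859_1);

        // Decoding those bytes as UTF-8 fails on every accented byte,
        // which shows up as replacement characters instead of é, í, ñ, ç.
        System.out.println(decode(fileBytes, StandardCharsets.UTF_8));

        // Decoding with the charset the file was actually written in restores the text.
        System.out.println(decode(fileBytes, StandardCharsets.ISO_8859_1));
    }
}
```

The "weird characters" in the Watch window are consistent with this: once the bytes have been decoded with the wrong charset, the original letters are no longer in the String to be fixed.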
Have you tried different encodings?
byteArrayOutputStream.toString() decodes the bytes using the platform's default character encoding. If that does not match the encoding the file was saved in, my guess is that it strips the foreign characters or converts them in a way that they do not display correctly in your output. Have you already tried byteArrayOutputStream.toString(String enc) instead? Try "UTF-8", "ISO-8859-1", or "UTF-16" as the encoding, whichever matches how the file was actually saved.
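Applying the toString(String enc) suggestion, a minimal sketch of readTxtFromRaw() that decodes with an explicit encoding could look like this. It assumes the file is saved as UTF-8. The helper name readTxt and its InputStream parameter (instead of a resource ID) are hypothetical, chosen so the example runs without Android; it also reads in 4 KB chunks instead of one byte at a time.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class RawTextReader {
    // Hypothetical helper: same idea as readTxtFromRaw(), but it takes the stream
    // directly and decodes with a named encoding instead of the platform default.
    static String readTxt(InputStream inputStream, String encoding) throws IOException {
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        try (InputStream in = inputStream) { // closes the stream even if read() throws
            int n;
            while ((n = in.read(buffer)) != -1) {
                byteArrayOutputStream.write(buffer, 0, n);
            }
        }
        // toString(String) decodes the collected bytes with the named charset and
        // throws UnsupportedEncodingException (an IOException) for an unknown name.
        return byteArrayOutputStream.toString(encoding);
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the raw resource: UTF-8 bytes containing accented letters.
        byte[] fileBytes = "hola\tadiós\ncafé\tthé\n".getBytes(StandardCharsets.UTF_8);
        String text = readTxt(new ByteArrayInputStream(fileBytes), "UTF-8");
        System.out.print(text);
    }
}
```

On a device, the equivalent call would be readTxt(mCtx.getResources().openRawResource(R.raw.wordstext), "UTF-8"). If the file was saved in a different encoding (for example ISO-8859-1 or windows-1252), pass that name instead; decoding with the wrong one brings back the garbled characters.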