Trying to read a binary file as text, but Scanner stops at the first line
I'm trying to read a binary file, but my program just stops at the first line.
I think it's because of the strange characters in the file. I just want to extract some file paths from it. Is there a way to do this?
    public static void main(String[] args) throws IOException
    {
        Scanner readF = new Scanner(new File("D:\\CurrentDatabase_372.txt"));
        String line = null;
        String newLine = System.getProperty("line.separator");
        FileWriter writeF = new FileWriter("D:\\Songs.txt");
        while (readF.hasNext())
        {
            line = readF.nextLine();
            if (line.contains("D:\\") && line.contains(".mp3"))
            {
                writeF.write(line.substring(line.indexOf("D:\\"), line.indexOf(".mp3") + 4) + newLine);
            }
        }
        readF.close();
        writeF.close();
    }
The file starts like this:
pppppamepD:\Music\Korn\Untouchables\03 Blame.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables003pMetalKornUntouchables003pBlameKornUntouchables003pKornKornUntouchables003pMP3pppppCpppÀppp@ppøp·pppŸú#pdppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒppp’ÍpET?ppppppôpp¼}`Ñ#ãâK†¡H¤*(DppppppppppppppppuÞѤéú:M®$@]jkÝW0ÛœFµú½XVNp`w—wâÊp:ºŽwâÊpppp8Npdpp¡pp{)pppppppppppppppppyY:¸[ªA¥Bi `Û¯pppppppppppp2pppppppppppppppppppppppppppppppppppp¿ÞpAppppppp€ppp€;€?€CpCpC€H€N€S€`€e€y€~p~p~€’€«€Ê€â€Hollow LifepD:\Musica\Korn\Untouchables\04 Hollow Life.mp3pmp3pmp3pKornpMetalpKornpUntouchablespKornpUntouchables*;*KornpKornpKornUntouchables004pMetalKornUntouchables004pHollow LifeKornUntouchables004pKornKornUntouchables004pMP3pppppCpppÀHppppppøp¸pppǺxp‰ppppppòrSpUpppppp€ppªp8›qpppppppppppp,’ppÒpppŠºppppppppppôpp¼}`Ñ#ãâK†¡H¤*(DpppppppppppppppppãG#™R‚CA—®þ^bN °mbŽ‚^¨pG¦sp;5p5ÓÐùšwâÊp
)ŽwâÊpppp8Npdpp!cpp{pppppppppppppppppyY:¸[ªA¥Bi `ۯǺxp‰pppppp2pppppppppppppppppppppppppppppppppppp¿
I want to extract file paths like "D:\Music\Korn\Untouchables\03 Blame.mp3".
Answers (3)
You cannot use a line-oriented scanner to read binary files. You have no guarantee that the binary file even has "lines" delimited by newline characters. For example, what would your code do if there were TWO file names matching the pattern "D:\.*.mp3" with no intervening newline? A greedy pattern match would extract everything between the first "D:\" and the last ".mp3", with all the garbage in between; your indexOf-based code would extract only the first name on that "line" and silently skip the second. Extracting file names from a non-delimited stream such as this requires a different strategy.
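One concrete way a Scanner can stop early on binary input, which goes slightly beyond this answer: when the charset decoder hits a byte sequence it cannot decode, Scanner catches the resulting IOException internally and then behaves as if the input had ended. Scanner.ioException() returns that hidden exception. A minimal sketch to check this, where the class name ScannerStopCheck is hypothetical and a small temporary file stands in for the real database:

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.Scanner;

public class ScannerStopCheck {
    // Reads the file line by line, then returns the IOException (if any) that the
    // Scanner caught internally; null means the input ended normally.
    static IOException lastScannerError(File file, String charsetName) throws IOException {
        try (Scanner scanner = new Scanner(file, charsetName)) {
            int lines = 0;
            while (scanner.hasNextLine()) {
                scanner.nextLine();
                lines++;
            }
            System.out.println("Lines read with " + charsetName + ": " + lines);
            return scanner.ioException();
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the binary database: text, then 0xFF (never valid in UTF-8), then more text.
        File sample = File.createTempFile("scanner-demo", ".bin");
        sample.deleteOnExit();
        Files.write(sample.toPath(), new byte[] {'a', 'b', '\n', (byte) 0xFF, 'c', 'd', '\n'});

        System.out.println(lastScannerError(sample, "UTF-8"));      // a MalformedInputException
        System.out.println(lastScannerError(sample, "ISO-8859-1")); // null: every byte maps to a char
    }
}
```

If the real file behaves the same way, the loop in the question is not "stuck"; it simply sees a premature end of input.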
If I were writing this, I'd use a relatively simple finite-state recognizer that processes characters one at a time. When it encounters a "D", it starts saving characters, checking each character to ensure that it matches the required pattern, and stops when it sees the "3" in ".mp3". If at any point it detects a character that doesn't fit, it resets and continues looking.
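The recognizer described above could look roughly like the following Java sketch. It is not the answerer's code: the class name Mp3PathRecognizer, the set of allowed path characters (printable bytes minus those Windows forbids in file names), the 260-character cap, and reading each byte as one ISO-8859-1 character are all assumptions. It matches an uppercase "D:\" as in the question's paths.

```java
import java.io.BufferedInputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class Mp3PathRecognizer {
    private enum State { SEARCH, SAW_D, SAW_COLON, IN_PATH }

    // Assumption: a Windows path is at most 260 characters (MAX_PATH).
    private static final int MAX_PATH = 260;

    // Assumption: a path character is any printable byte that Windows allows in a file name.
    private static boolean isPathChar(char c) {
        return c >= 0x20 && c != 0x7F && "<>:\"|?*".indexOf(c) < 0;
    }

    private static boolean endsWithMp3(StringBuilder sb) {
        int n = sb.length();
        return n >= 4 && sb.substring(n - 4).equals(".mp3");
    }

    // Feeds the bytes one at a time through a small state machine and returns every
    // "D:\...mp3" run. Each byte is treated as one ISO-8859-1 character.
    public static List<String> extract(InputStream in) throws IOException {
        List<String> found = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        State state = State.SEARCH;
        int b;
        while ((b = in.read()) != -1) {
            char c = (char) b;
            switch (state) {
                case SEARCH:
                    if (c == 'D') state = State.SAW_D;
                    break;
                case SAW_D:
                    if (c == ':') state = State.SAW_COLON;
                    else if (c != 'D') state = State.SEARCH;
                    break;
                case SAW_COLON:
                    if (c == '\\') {
                        current.setLength(0);
                        current.append("D:\\");
                        state = State.IN_PATH;
                    } else {
                        state = (c == 'D') ? State.SAW_D : State.SEARCH;
                    }
                    break;
                case IN_PATH:
                    if (isPathChar(c) && current.length() < MAX_PATH) {
                        current.append(c);
                        if (endsWithMp3(current)) { // the "3" of ".mp3" ends the match
                            found.add(current.toString());
                            state = State.SEARCH;
                        }
                    } else if (c == ':' && current.charAt(current.length() - 1) == 'D') {
                        state = State.SAW_COLON; // "...D:" may start a new path
                    } else {
                        state = (c == 'D') ? State.SAW_D : State.SEARCH; // doesn't fit: reset
                    }
                    break;
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // Usage: java Mp3PathRecognizer D:\CurrentDatabase_372.txt
        try (InputStream in = new BufferedInputStream(new FileInputStream(args[0]))) {
            for (String path : extract(in)) {
                System.out.println(path);
            }
        }
    }
}
```

One design point: a ":" that follows a "D" inside a candidate path is treated as the start of a new "D:" rather than a plain reset, so a broken fragment directly followed by a real path does not swallow it.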
EDIT: If the files to be processed are small (less than 50 MB or so), you could load the entire file into memory, which would make scanning simpler.
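With the whole file in memory, a regular expression can do the same job as the state machine. A minimal sketch, assuming Java 7+ (Files.readAllBytes), ISO-8859-1 decoding so that every byte becomes exactly one character, and a hypothetical class name InMemoryMp3Scan. The reluctant *? stops each match at the first ".mp3", which avoids grabbing everything from the first "D:\" to the last ".mp3". If the database stores non-ASCII file names in some other encoding, the matched strings may need re-decoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class InMemoryMp3Scan {
    // "D:\" followed by the shortest run of file-name characters that ends in ".mp3".
    // Control characters and <>:"|?* are excluded, so a match cannot run across garbage.
    private static final Pattern MP3_PATH =
            Pattern.compile("D:\\\\[^\\x00-\\x1F<>:\"|?*]*?\\.mp3");

    static List<String> findPaths(String text) {
        List<String> paths = new ArrayList<>();
        Matcher m = MP3_PATH.matcher(text);
        while (m.find()) {
            paths.add(m.group());
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        // Whole file in memory; ISO-8859-1 maps every byte to one char, so nothing is rejected.
        byte[] bytes = Files.readAllBytes(Paths.get(args[0]));
        String text = new String(bytes, StandardCharsets.ISO_8859_1);
        for (String p : findPaths(text)) {
            System.out.println(p);
        }
    }
}
```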
As was said, since it is a binary file you can't use a Scanner or other character-based readers. You could use a regular FileInputStream to read the actual raw bytes of the file. Java's String class has a constructor that will take an array of bytes and turn them into a string. You can then search that string for the file name(s). This may work if you just use the default character set.
String(byte[]):
http://download.oracle.com/javase/1.4.2/docs/api/java/lang/String.html
FileInputStream for reading bytes:
http://download.oracle.com/javase/tutorial/essential/io/bytestreams.html
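A minimal sketch of this approach, with two assumptions: the class name RawBytesSearch is hypothetical, and ISO-8859-1 is used in place of the default character set, because that charset maps every byte value to one character and so never rejects or merges bytes. The search reuses the question's indexOf idea but loops until no "D:\" is left. Unlike the recognizer approach, it does not check the characters between "D:\" and ".mp3", so stray garbage can still end up inside a result.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class RawBytesSearch {
    // Reads every raw byte from the stream; no text decoding happens here.
    static byte[] readRaw(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    // The question's indexOf logic, repeated so that several names on one "line" are all found.
    static List<String> findMp3Paths(String text) {
        List<String> paths = new ArrayList<>();
        int start = text.indexOf("D:\\");
        while (start >= 0) {
            int end = text.indexOf(".mp3", start);
            if (end < 0) {
                break;
            }
            // If another "D:\" appears before ".mp3", the nearer one is the real start.
            int realStart = text.lastIndexOf("D:\\", end);
            paths.add(text.substring(realStart, end + 4));
            start = text.indexOf("D:\\", end + 4);
        }
        return paths;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            String text = new String(readRaw(in), StandardCharsets.ISO_8859_1);
            for (String p : findMp3Paths(text)) {
                System.out.println(p);
            }
        }
    }
}
```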
Use hasNextLine() instead of hasNext() in the while loop check.
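Why the guard matters, shown in a small demo with a hypothetical class LineLoopCheck: hasNext() asks whether another whitespace-delimited token exists, while nextLine() consumes whole lines, so a trailing line containing only whitespace ends the hasNext() loop one line early. This change alone does not alter how the file's bytes are decoded.

```java
import java.util.Scanner;

public class LineLoopCheck {
    // Loop guard from the question: hasNext() looks for another token.
    static int countWithHasNext(String input) {
        int count = 0;
        try (Scanner s = new Scanner(input)) {
            while (s.hasNext()) {
                s.nextLine();
                count++;
            }
        }
        return count;
    }

    // Loop guard suggested here: hasNextLine() asks exactly what nextLine() consumes.
    static int countWithHasNextLine(String input) {
        int count = 0;
        try (Scanner s = new Scanner(input)) {
            while (s.hasNextLine()) {
                s.nextLine();
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String input = "data\n   \n"; // the second line holds only spaces
        System.out.println(countWithHasNext(input));     // 1
        System.out.println(countWithHasNextLine(input)); // 2
    }
}
```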