How to find out which line separator BufferedReader#readLine() used to split the line?

Posted 2024-11-09 03:24:07

I am reading a file via a BufferedReader:

String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
   String s = br.readLine();
   if (s == null) break;
   ...
}

I need to know whether the lines are separated by '\n' or '\r\n'.
Is there a way I can find out?

I don't want to open the FileInputStream just to scan it first.
Ideally I would like to ask the BufferedReader, since it must know.

I am happy to extend BufferedReader to hack it in, but I really don't want to open the file stream twice.

Thanks,

Note: the current line separator (returned by System.getProperty("line.separator")) cannot be used, because the file could have been written by another application on another operating system.

Comments (9)

梦年海沫深 2024-11-16 03:24:07

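To stay consistent with the BufferedReader class, you can use the following method, which handles \n, \r, \n\r and \r\n line separators:

// Needs: import java.io.*;
// Scans raw bytes, so it assumes an ASCII-compatible encoding (e.g. UTF-8, ISO-8859-1).
public static String retrieveLineSeparator(File file) throws IOException {
    // try-with-resources closes the stream even if an exception is thrown
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        int current;
        // read() returns -1 at end of stream; available() is not a reliable EOF test
        while ((current = in.read()) != -1) {
            if (current == '\n' || current == '\r') {
                StringBuilder lineSeparator = new StringBuilder().append((char) current);
                int next = in.read();
                // a different terminator character right after the first one
                // makes a two-character separator (\r\n or \n\r)
                if (next != current && (next == '\r' || next == '\n')) {
                    lineSeparator.append((char) next);
                }
                return lineSeparator.toString();
            }
        }
    }
    // no line terminator found in the file
    return null;
}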
难如初 2024-11-16 03:24:07

After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine the line-ending convention used in a specific file.

The best thing I can recommend is to use BufferedReader.read() and iterate over every character in the file. Something like this:

String filename = ...
// FileReader decodes with the platform default charset; BufferedReader needs a Reader, not a stream
BufferedReader br = new BufferedReader(new FileReader(filename));
StringBuilder line = new StringBuilder();
int c;
while ((c = br.read()) != -1) {        // read() returns -1 at end of stream
    if (c == '\n') {
        // line ended with "\n": do stuff with `line` and its terminator
        line.setLength(0);
    } else if (c == '\r') {
        br.mark(1);                    // remember this position so one char can be un-read
        if (br.read() == '\n') {
            // line ended with "\r\n": do extra stuff since you know you've got a \r\n
        } else {
            br.reset();                // lone "\r": put the following char back
            // line ended with "\r"
        }
        line.setLength(0);
    } else {
        line.append((char) c);
    }
}
// anything still in `line` is a final line without a terminator
br.close();
笨死的猪 2024-11-16 03:24:07

BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you'll have to read the characters yourself and find the line breaks yourself.

You may be interested in the internal LineBuffer class from Guava (as well as the public LineReader class it's used in). LineBuffer provides a callback method void handleLine(String line, String end) where end is the line break characters. You could probably base something to do what you want on that. An API might look something like public Line readLine() where Line is an object that contains both the line text and the line end.
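
As a rough illustration of that last idea, here is a minimal plain-Java sketch of a reader whose readLine() returns both the text and the terminator. It does not use Guava; the names TerminatedLineReader and Line are made up for this example, and it assumes only \n, \r and \r\n count as terminators (the same set BufferedReader recognizes).

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

/** Hypothetical helper: reads lines and reports the terminator that ended each one. */
final class TerminatedLineReader {

    /** A line's text plus its terminator: "\n", "\r", "\r\n", or "" at end of input. */
    static final class Line {
        final String text;
        final String end;

        Line(String text, String end) {
            this.text = text;
            this.end = end;
        }
    }

    private final PushbackReader in;

    TerminatedLineReader(Reader reader) {
        // A one-character pushback lets us peek past a '\r' for a following '\n'.
        this.in = new PushbackReader(reader, 1);
    }

    /** Returns the next line, or null once the input is exhausted. */
    Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // not part of the terminator, put it back
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        // End of input: emit a trailing unterminated line if there is one.
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }
}
```

Because it reads one character at a time, it is probably best handed an already-buffered Reader, for example a BufferedReader wrapped around an InputStreamReader.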

一花一树开 2024-11-16 03:24:07

BufferedReader does not accept a FileInputStream; its constructor takes a Reader, so the stream has to be wrapped in an InputStreamReader first.

No, you cannot find out the line terminator character that was used in the file being read by BufferedReader. That information is lost while reading the file.

Unfortunately, all the answers below are incorrect.

Edit: And yes you can always extend BufferedReader to include the additional functionality you desire.
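
One possible way to get that extra functionality without reading the file twice, sketched under some assumptions: every character a BufferedReader consumes has to come through the read methods of the Reader it wraps, so instead of subclassing BufferedReader itself, a small FilterReader placed underneath it can note the first separator as the characters go by. SeparatorSniffingReader and getSeparator() are hypothetical names, not part of the JDK, and the sketch does not account for skip().

```java
import java.io.FilterReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical wrapper: remembers the first line separator that passes through it. */
final class SeparatorSniffingReader extends FilterReader {
    private String separator;   // null until a terminator has been identified
    private boolean pendingCr;  // the last char seen was a '\r' that is still undecided
    private boolean atEof;      // the wrapped reader has reported end of stream

    SeparatorSniffingReader(Reader in) {
        super(in);
    }

    /** First separator seen so far: "\n", "\r", "\r\n", or null if not known yet. */
    String getSeparator() {
        // A '\r' that turned out to be the very last character can only be a lone "\r".
        if (separator == null && pendingCr && atEof) {
            return "\r";
        }
        return separator;
    }

    @Override
    public int read() throws IOException {
        int c = super.read();
        if (c == -1) {
            atEof = true;
        } else {
            inspect((char) c);
        }
        return c;
    }

    @Override
    public int read(char[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        if (n == -1) {
            atEof = true;
        }
        for (int i = 0; i < n; i++) {
            inspect(buf[off + i]);
        }
        return n;
    }

    // Tracks state across calls, so a "\r\n" split over two chunks is still recognized.
    private void inspect(char c) {
        if (separator != null) {
            return; // already decided, nothing more to do
        }
        if (pendingCr) {
            separator = (c == '\n') ? "\r\n" : "\r";
            pendingCr = false;
        } else if (c == '\n') {
            separator = "\n";
        } else if (c == '\r') {
            pendingCr = true;
        }
    }
}
```

To use it, wrap the decoding reader (say, an InputStreamReader over the FileInputStream), pass the wrapper to new BufferedReader(...), keep calling readLine() as before, and ask the wrapper for getSeparator() afterwards. Because BufferedReader reads ahead in chunks, the answer is usually available after the first readLine() call.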

埋情葬爱 2024-11-16 03:24:07

The answer is that you can't find out what the line ending was.

I was looking for what can end a line in that same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends a line on '\r' or '\n' and skips a leftover '\n' that immediately follows a '\r'. This is hardcoded and does not depend on any setting.

美人迟暮 2024-11-16 03:24:07

If you happen to be reading this file into a Swing text component then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:

textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );

to get the actual EOL string that was used in the file.

美人迟暮 2024-11-16 03:24:07

Maybe you could use Scanner instead.

You can pass a regular expression to Scanner#useDelimiter() to set a custom delimiter.

String regex="(\r)?\n";
String filename=....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
    String str= scan.next();
    // todo
}

You can use the code below to wrap an existing BufferedReader in a Scanner:

 new Scanner(bufferedReader);
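
Note that a delimiter-based Scanner still discards the delimiter it matched. If the terminator itself is needed, one option (a sketch, not the only way) is Scanner#findWithinHorizon() with a pattern that captures the line and its terminator in separate groups, then reading them back through Scanner#match(). The class and method names below are invented for illustration.

```java
import java.util.Scanner;
import java.util.regex.MatchResult;

/** Hypothetical helper around Scanner that keeps the line terminator. */
final class ScannerLineEndings {
    // Group 1: the line text. Group 2: its terminator. "\r\n" comes first in the
    // alternation so that a CRLF pair is never reported as a lone "\r".
    private static final String LINE = "([^\r\n]*)(\r\n|\r|\n)";

    /**
     * Consumes one terminated line and returns {text, terminator},
     * or null if no terminator remains in the input.
     */
    static String[] nextTerminatedLine(Scanner scanner) {
        // A horizon of 0 means the search is not bounded.
        if (scanner.findWithinHorizon(LINE, 0) == null) {
            return null;
        }
        MatchResult m = scanner.match();
        return new String[] { m.group(1), m.group(2) };
    }
}
```

With an unbounded horizon the Scanner may buffer a lot of input while searching, so this is probably a poor fit for very large files that contain few or no line breaks. A final line without a terminator is not returned by this method and stays in the Scanner.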
只等公子 2024-11-16 03:24:07

Not sure if this is useful, but sometimes I need to find out the line delimiter well after the file has already been read.

In this case I use this code:

/**
 * Identify which line delimiter is used in a string.
 *
 * This is useful when processing files that were created on different operating systems.
 *
 * @param str the string with the mystery line delimiter.
 * @return the line delimiter for Windows ({@code \r\n}), Unix/Linux ({@code \n})
 *         or legacy Mac ({@code \r}); if none can be identified, it falls back
 *         to Unix {@code \n}.
 */
public static String identifyLineDelimiter(String str) {
    if (str.contains("\r\n")) {        // Windows
        return "\r\n";
    } else if (str.contains("\n")) {   // Unix/Linux
        return "\n";
    } else if (str.contains("\r")) {   // Legacy Mac OS 9; newer OS X uses \n
        return "\r";
    } else {
        return "\n";                   // fall back to '\n' if nothing matches
    }
}
满身野味 2024-11-16 03:24:07

If you are using Groovy, you can simply do:

def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'