
How can I find out which line separator BufferedReader#readLine() used to split the lines?

Posted on 2024-11-09 03:24:07

I am reading a file via a BufferedReader:

String filename = ...
br = new BufferedReader( new FileInputStream(filename));
while (true) {
   String s = br.readLine();
   if (s == null) break;
   ...
}

I need to know whether the lines are separated by '\n' or '\r\n'. Is there a way I can find out?

I don't want to open the FileInputStream just to scan it up front. Ideally I would like to ask the BufferedReader, since it must know.

I am happy to override BufferedReader to hack it, but I really don't want to open the file stream twice.

Thanks,

Note: the current line separator (returned by System.getProperty("line.separator")) cannot be used, as the file could have been written by another application on another operating system.


梦年海沫深 2024-11-16 03:24:07

To stay in step with the BufferedReader class, you can use the following method, which handles the \n, \r, \n\r and \r\n line separators:

import java.io.BufferedInputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

public static String retrieveLineSeparator(File file) throws IOException {
    // try-with-resources closes the stream even if read() throws
    try (InputStream in = new BufferedInputStream(new FileInputStream(file))) {
        int current;
        while ((current = in.read()) != -1) {
            if (current == '\n' || current == '\r') {
                String lineSeparator = String.valueOf((char) current);
                int next = in.read();
                // a different EOL character right after the first one
                // makes this a two-character separator
                if (next != current && (next == '\r' || next == '\n')) {
                    lineSeparator += (char) next;
                }
                return lineSeparator;
            }
        }
    }
    return null; // the file contains no line separator
}
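If opening the file a second time is the concern, a similar scan can probably be run on the BufferedReader itself with mark() and reset(). The following is only a sketch: the class name, the method name and the limit parameter are made up for illustration, and it only looks at the first limit characters.

```java
import java.io.BufferedReader;
import java.io.IOException;

final class LineSeparatorPeek {
    /**
     * Looks at up to {@code limit} characters for the first line separator,
     * then rewinds the reader so later readLine() calls start where they would have.
     * Returns null if no separator shows up within the limit.
     */
    static String peekLineSeparator(BufferedReader reader, int limit) throws IOException {
        reader.mark(limit + 1); // +1 leaves room to look one character past a '\r'
        try {
            for (int i = 0; i < limit; i++) {
                int c = reader.read();
                if (c == -1) {
                    return null; // end of input, no separator seen
                }
                if (c == '\n') {
                    return "\n";
                }
                if (c == '\r') {
                    return reader.read() == '\n' ? "\r\n" : "\r";
                }
            }
            return null; // first line is longer than the limit
        } finally {
            reader.reset(); // rewind to the marked position
        }
    }
}
```

Call it once right after creating the reader, for example peekLineSeparator(br, 8192), and then continue with the usual readLine() loop. Like the method above, it only reports the first separator, so a file with mixed line endings is not detected.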


难如初 2024-11-16 03:24:07

After reading the Java docs (I confess to being a Pythonista), it seems that there isn't a clean way to determine the line-ending encoding used in a specific file.

The best thing I can recommend is to use BufferedReader.read() and iterate over every character in the file. Something like this:

import java.io.BufferedReader;
import java.io.FileReader;

String filename = ...
BufferedReader br = new BufferedReader(new FileReader(filename));
StringBuilder line = new StringBuilder();
int c = br.read();
while (c != -1) {
    if (c == '\n') {
        // the line ended with "\n"
        // do stuff, not sure what you want with the endl encoding
        line.setLength(0);   // start collecting the next line
        c = br.read();
    } else if (c == '\r') {
        // the line ended with "\r" or "\r\n"
        // do stuff, not sure what you want with the endl encoding
        c = br.read();       // look one character ahead
        if (c == '\n') {
            // do extra stuff since you know that you've got a \r\n
            c = br.read();
        }
        line.setLength(0);
    } else {
        line.append((char) c);
        c = br.read();
    }
}
// whatever is left in 'line' is a final line without a terminator
br.close();


笨死的猪 2024-11-16 03:24:07

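BufferedReader.readLine() does not provide any means of determining what the line break was. If you need to know, you will have to read the characters yourself and find the line breaks yourself.

You may be interested in Guava's internal LineBuffer class (and the public LineReader class that uses it). LineBuffer provides a callback method void handleLine(String line, String end), where end is the line break characters. You could probably build something that does what you want on top of that. The API might look something like public Line readLine(), where Line is an object that contains both the line text and the line ending.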
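A Line-returning API of the kind described above could look roughly like the sketch below. It does not use Guava; LineWithEndingReader and Line are hypothetical names, and the lookahead after '\r' is done with a java.io.PushbackReader.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.Reader;

/** Hypothetical reader that returns each line together with its terminator. */
final class LineWithEndingReader implements AutoCloseable {

    /** One line of text plus the terminator that ended it ("" for an unterminated last line). */
    static final class Line {
        final String text;
        final String ending;

        Line(String text, String ending) {
            this.text = text;
            this.ending = ending;
        }
    }

    private final PushbackReader in;

    LineWithEndingReader(Reader source) {
        // one character of pushback is enough for the '\r' lookahead
        this.in = new PushbackReader(source, 1);
    }

    /** Returns the next line, or null at end of input. */
    Line readLine() throws IOException {
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = in.read()) != -1) {
            if (c == '\n') {
                return new Line(text.toString(), "\n");
            }
            if (c == '\r') {
                int next = in.read();
                if (next == '\n') {
                    return new Line(text.toString(), "\r\n");
                }
                if (next != -1) {
                    in.unread(next); // not part of the terminator, give it back
                }
                return new Line(text.toString(), "\r");
            }
            text.append((char) c);
        }
        // end of input: return the trailing text, if any
        return text.length() > 0 ? new Line(text.toString(), "") : null;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }
}
```

Because PushbackReader reads one character at a time, it is probably worth passing in a BufferedReader as the source when reading from a file.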


一花一树开 2024-11-16 03:24:07

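BufferedReader does not accept FileInputStreams.

No, you cannot find out which line terminator was used in the file that the BufferedReader is reading. That information is lost while the file is being read.

Unfortunately, all of the answers below are incorrect.

Edit: And yes, you can always extend BufferedReader to include the additional functionality you want.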
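Extending BufferedReader, as mentioned in the edit, might look like the sketch below. SeparatorAwareReader and getLastSeparator() are invented names. The subclass overrides readLine() and rebuilds the line from single read() calls, since the stock readLine() discards the terminator before returning.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;

/** Hypothetical BufferedReader subclass that remembers the terminator of the last line read. */
class SeparatorAwareReader extends BufferedReader {
    private String lastSeparator; // null until readLine() has been called

    SeparatorAwareReader(Reader in) {
        super(in);
    }

    /** "\n", "\r" or "\r\n" for the last line returned, "" if it ended at end of input. */
    String getLastSeparator() {
        return lastSeparator;
    }

    @Override
    public String readLine() throws IOException {
        StringBuilder line = new StringBuilder();
        int c;
        while ((c = read()) != -1) {
            if (c == '\n') {
                lastSeparator = "\n";
                return line.toString();
            }
            if (c == '\r') {
                mark(1); // note: this replaces any mark set by the caller
                if (read() == '\n') {
                    lastSeparator = "\r\n";
                } else {
                    reset(); // the character belongs to the next line
                    lastSeparator = "\r";
                }
                return line.toString();
            }
            line.append((char) c);
        }
        lastSeparator = "";
        return line.length() > 0 ? line.toString() : null;
    }
}
```

With this, the original loop stays the same apart from the constructor, and getLastSeparator() can be queried after each readLine() call. Reading character by character is likely slower than the built-in readLine(), which is the price of keeping the terminator.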


埋情葬爱 2024-11-16 03:24:07

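The answer is that you cannot find out what the line ending was.

I was looking into what can end a line inside that same function. After looking at the BufferedReader source code, I can say that BufferedReader.readLine ends the line on '\r' or '\n' and skips a '\n' that immediately follows a '\r'. This is hard-coded and does not depend on any setting.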
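A quick way to see this behavior is to feed readLine() a string that mixes all three terminators. The helper below is only a small demonstration; ReadLineDemo and readAllLines are names chosen for this example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

final class ReadLineDemo {
    /** Splits the text with BufferedReader.readLine() and returns the resulting lines. */
    static List<String> readAllLines(String text) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br = new BufferedReader(new StringReader(text))) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }
}
```

readAllLines("a\nb\rc\r\nd") returns the four lines a, b, c and d, exactly as it would for "a\nb\nc\nd", so nothing in the returned strings reveals which terminator was in the input.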


美人迟暮 2024-11-16 03:24:07

If you happen to be reading this file into a Swing text component, then you can just use the JTextComponent.read(...) method to load the file into the Document. Then you can use:

textComponent.getDocument().getProperty( DefaultEditorKit.EndOfLineStringProperty );

to get the actual EOL string that was used in the file.
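The same property appears to be set by DefaultEditorKit.read(...) itself, so a visible text component may not be needed. The sketch below assumes a plain-text input and loads it into a throwaway PlainDocument; EditorKitEol and detectEol are illustrative names.

```java
import java.io.IOException;
import java.io.Reader;
import javax.swing.text.BadLocationException;
import javax.swing.text.DefaultEditorKit;
import javax.swing.text.Document;
import javax.swing.text.PlainDocument;

final class EditorKitEol {
    /** Loads the whole input into an empty document and returns the EOL string the kit recorded. */
    static String detectEol(Reader reader) throws IOException, BadLocationException {
        Document doc = new PlainDocument();
        // DefaultEditorKit normalizes line endings to '\n' in the document
        // and stores the original style as a document property
        new DefaultEditorKit().read(reader, doc, 0);
        return (String) doc.getProperty(DefaultEditorKit.EndOfLineStringProperty);
    }
}
```

Note that this reads the entire input into memory, so it is only a reasonable option when the text is going to be held in a Document anyway.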


美人迟暮 2024-11-16 03:24:07

Maybe you could use Scanner instead.

You can pass a regular expression to Scanner#useDelimiter() to set a custom delimiter.

import java.io.FileInputStream;
import java.util.Scanner;
import java.util.regex.Pattern;

String regex = "(\r)?\n";
String filename = ....;
Scanner scan = new Scanner(new FileInputStream(filename));
scan.useDelimiter(Pattern.compile(regex));
while (scan.hasNext()) {
    String str = scan.next();
    // todo
}

You can use the code below to wrap an existing BufferedReader in a Scanner:

new Scanner(bufferedReader);
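A delimiter on its own still does not say which separator was matched. One way to get at it with Scanner is findWithinHorizon(), which returns the text that matched a pattern. The sketch below reports only the first separator; ScannerSeparator and firstLineSeparator are illustrative names.

```java
import java.util.Scanner;

final class ScannerSeparator {
    /** Returns the first line separator found in the input, or null if there is none. */
    static String firstLineSeparator(Readable source) {
        // The Scanner is deliberately not closed, because closing it would close the source too.
        Scanner scanner = new Scanner(source);
        // "\r\n" is listed first so that it wins over a lone "\r";
        // a horizon of 0 means the search is not bounded
        return scanner.findWithinHorizon("\r\n|\r|\n", 0);
    }
}
```

The Scanner advances past the match and buffers input ahead of it, so the same source cannot simply be handed to another reader afterwards. This fits best when the rest of the parsing is also done through the Scanner.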


只等公子 2024-11-16 03:24:07

Not sure if this is useful, but sometimes I need to find out the line delimiter after I have already read the file, far down the road.

In this case I use this code:

/**
* <h1> Identify which line delimiter is used in a string </h1>
*
* This is useful when processing files that were created on different operating systems.
*
* @param str - the string with the mystery line delimiter.
* @return  the line delimiter for windows, {@code \r\n}, <br>
*           unix/linux {@code \n} or legacy mac {@code \r} <br>
*           if none can be identified, it falls back to unix {@code \n}
*/
public static String identifyLineDelimiter(String str) {
    // String.contains avoids regex matching over the whole string
    if (str.contains("\r\n")) {        // Windows
        return "\r\n";
    } else if (str.contains("\n")) {   // Unix/Linux
        return "\n";
    } else if (str.contains("\r")) {   // Legacy Mac OS 9. Newer OS X uses \n
        return "\r";
    } else {
        return "\n";  // fall back to '\n' if nothing matches
    }
}
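The method above reports "\r\n" as soon as a single "\r\n" occurs, even if most lines end in "\n". When files with mixed endings are a possibility, counting each kind may be more informative. The following is a sketch with made-up names (LineDelimiterStats, countLineDelimiters):

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class LineDelimiterStats {
    /** Counts how often each of "\r\n", "\n" and "\r" occurs as a line delimiter in the string. */
    static Map<String, Integer> countLineDelimiters(String str) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("\r\n", 0);
        counts.put("\n", 0);
        counts.put("\r", 0);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c == '\r') {
                if (i + 1 < str.length() && str.charAt(i + 1) == '\n') {
                    counts.merge("\r\n", 1, Integer::sum);
                    i++; // skip the '\n' that belongs to this "\r\n"
                } else {
                    counts.merge("\r", 1, Integer::sum);
                }
            } else if (c == '\n') {
                counts.merge("\n", 1, Integer::sum);
            }
        }
        return counts;
    }
}
```

The caller can then pick the most frequent delimiter, or treat more than one non-zero count as a sign that the file was edited on different systems.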


满身野味 2024-11-16 03:24:07

If you are using Groovy, you can simply do:

def lineSeparator = new File('path/to/file').text.contains('\r\n') ? '\r\n' : '\n'
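A rough plain-Java counterpart of that one-liner is sketched below. It assumes the file is small enough to load into memory and is encoded in UTF-8, which the Groovy version does not state; FileLineSeparator and lineSeparatorOf are illustrative names.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class FileLineSeparator {
    /** Returns "\r\n" if the file contains at least one "\r\n", otherwise "\n". */
    static String lineSeparatorOf(Path file) throws IOException {
        // Assumption: UTF-8 content; adjust the charset for other encodings
        String text = new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
        return text.contains("\r\n") ? "\r\n" : "\n";
    }
}
```

Like the Groovy version, it ignores lone '\r' separators and reads the file once in addition to any later processing.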

