Java: parsing data from a file with ByteBuffer

Posted on 2024-10-16 18:40:59

In Java, I want to parse a file containing heterogeneous data (numbers and characters), and I want it to be fast.

I've been reading about ByteBuffer and memory mapped files.

I can copy the data, but parsing it gets tricky. I'd like to do it by reading ranges of bytes, but doesn't that make the parsing dependent on the encoding?

If the format of the file is, for instance:

someString 8
some other string 88

How can I parse it into String and Integer objects?

Thanks!

Udo.

Comments (3)

何其悲哀 2024-10-23 18:40:59

Assuming your format is something like

{string possibly with spaces} {integer}\r?\n

You need to search for the newline, then work backward until you find the first space. You can decode the number yourself and turn it into an int, or turn those bytes into a String and parse that. I wouldn't use an Integer unless you had to. Once you know where the line starts and where the integer starts, you can extract the text as bytes and convert it into a String using your desired encoding.

This assumes that newline and space are each one byte in your encoding. It would be more complicated if they are multi-byte, but it can still be done.
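
To make the one-byte assumption concrete, here is a small sketch that reports how many bytes a space and a newline occupy in a few common charsets. The class and method names (`DelimiterWidth`, `encodedLength`) are made up for this illustration and are not part of the answer.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class DelimiterWidth {
    /** Number of bytes the given text occupies when encoded with the given charset. */
    static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        Charset[] charsets = {
            StandardCharsets.US_ASCII,
            StandardCharsets.ISO_8859_1,
            StandardCharsets.UTF_8,
            StandardCharsets.UTF_16LE   // LE variant chosen so no byte-order mark is added
        };
        for (Charset cs : charsets) {
            System.out.println(cs + ": space=" + encodedLength(" ", cs)
                    + " newline=" + encodedLength("\n", cs));
        }
    }
}
```

In US-ASCII, ISO-8859-1 and UTF-8 both delimiters are a single byte, so a byte-by-byte scan for them is safe. In UTF-16 each takes two bytes, so the scan would have to step through the buffer two bytes at a time.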

EDIT: The following example prints...

text: ' someString', number: 8
text: 'some other string', number: -88

Code

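// requires: import java.nio.ByteBuffer; import java.nio.charset.StandardCharsets;
ByteBuffer bb = ByteBuffer.wrap(" someString 8\r\nsome other string -88\n".getBytes(StandardCharsets.UTF_8));
while (bb.remaining() > 0) {
    int start = bb.position(), end, ptr;
    // find the end of the line (or the end of the buffer if the last line is unterminated)
    for (end = start; end < bb.limit(); end++) {
        byte b = bb.get(end);
        if (b == '\r' || b == '\n')
            break;
    }
    // read the number backwards
    long value = 0;
    long tens = 1;
    for (ptr = end - 1; ptr >= start; ptr--) {
        byte b = bb.get(ptr);
        if (b >= '0' && b <= '9') {
            value += tens * (b - '0');
            tens *= 10;
        } else if (b == '-') {
            value = -value;
            ptr--;
            break;
        } else {
            break;
        }
    }
    // assume separator is a space, so ptr now points at it;
    // ptr < start when the line holds only a number, hence the clamp
    byte[] bytes = new byte[Math.max(0, ptr - start)];
    bb.get(bytes);
    String text = new String(bytes, StandardCharsets.UTF_8);
    System.out.println("text: '" + text + "', number: " + value);

    // skip the line terminator (\n, \r or \r\n) without running past the limit
    if (end < bb.limit() && bb.get(end) == '\r'
            && end + 1 < bb.limit() && bb.get(end + 1) == '\n') end++;
    bb.position(Math.min(end + 1, bb.limit()));
}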
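
The question also mentions memory-mapped files, while the example above wraps an in-memory byte array. The following is a rough sketch of how the same backward scan could run over a mapped file, assuming UTF-8 text, a single space before the integer, and a file smaller than 2 GB (the limit of a single mapping). `MappedLineParser`, `Entry`, `parse` and `parseFile` are hypothetical names chosen for this sketch.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

final class MappedLineParser {
    /** Hypothetical holder for one parsed line. */
    static final class Entry {
        final String text;
        final long number;
        Entry(String text, long number) { this.text = text; this.number = number; }
    }

    /** Parses "{text} {integer}" lines using absolute gets, so the buffer's position is untouched. */
    static List<Entry> parse(ByteBuffer bb) {
        List<Entry> out = new ArrayList<>();
        int pos = bb.position();
        int limit = bb.limit();
        while (pos < limit) {
            int end = pos;
            while (end < limit && bb.get(end) != '\n') end++;   // find end of line
            int lineEnd = (end > pos && bb.get(end - 1) == '\r') ? end - 1 : end;

            // read the trailing digits backwards
            int ptr = lineEnd - 1;
            long value = 0, tens = 1;
            while (ptr >= pos && bb.get(ptr) >= '0' && bb.get(ptr) <= '9') {
                value += tens * (bb.get(ptr) - '0');
                tens *= 10;
                ptr--;
            }
            if (ptr >= pos && bb.get(ptr) == '-') { value = -value; ptr--; }

            // ptr now sits on the separator (assumed to be one space)
            int textLen = Math.max(0, ptr - pos);
            byte[] bytes = new byte[textLen];
            for (int i = 0; i < textLen; i++) bytes[i] = bb.get(pos + i);
            if (lineEnd > pos) {                                 // skip blank lines
                out.add(new Entry(new String(bytes, StandardCharsets.UTF_8), value));
            }
            pos = end + 1;
        }
        return out;
    }

    /** Maps the whole file read-only and parses it. */
    static List<Entry> parseFile(Path path) throws IOException {
        try (FileChannel ch = FileChannel.open(path, StandardOpenOption.READ)) {
            return parse(ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size()));
        }
    }

    public static void main(String[] args) throws IOException {
        for (Entry e : parseFile(Paths.get(args[0]))) {
            System.out.println("text: '" + e.text + "', number: " + e.number);
        }
    }
}
```

Only the text part of each line is copied out of the mapping; the digits are decoded in place. Whether this beats a plain buffered reader depends on the file size and access pattern, so it is worth measuring before committing to it.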
眼眸印温柔 2024-10-23 18:40:59

You can try it this way:

// stringBuffer is assumed to be a StringBuffer that already holds the text to parse
CharacterIterator it = new StringCharacterIterator(stringBuffer.toString());
for (char c = it.first(); c != CharacterIterator.DONE; c = it.next()) {
    if (Character.isDigit(c)) {
        // character is digit
    } else {
        // character is not-digit
    }
}

Or you can use regex if you prefer

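// stringBuffer is assumed to be a StringBuffer that already holds the text to parse
String str = stringBuffer.toString();
String numbers = str.replaceAll("\\D", "");          // keep digits only
String letters = str.replaceAll("[^\\p{L}]", "");    // keep letters only; "\\W" would leave digits and underscores in place

Then call Integer.parseInt() on the numbers string as usual.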
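
For the line format in the question, a regex can also split each line into its text and its trailing integer in one step. This is only a sketch, assuming one record per line and a single space before the integer; `RegexLineParser` and `parseLine` are names invented for the example.

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class RegexLineParser {
    // Greedy (.*) consumes everything up to the last space before the trailing integer.
    private static final Pattern LINE = Pattern.compile("(.*) (-?\\d+)");

    /** Splits one line into (text, number); throws if the line does not fit the assumed format. */
    static Map.Entry<String, Integer> parseLine(String line) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("Unexpected line: " + line);
        }
        return new AbstractMap.SimpleImmutableEntry<>(m.group(1), Integer.parseInt(m.group(2)));
    }

    public static void main(String[] args) {
        Map.Entry<String, Integer> e = parseLine("some other string 88");
        System.out.println("text: '" + e.getKey() + "', number: " + e.getValue());
    }
}
```

Because the pattern is compiled once and reused, the per-line cost is the match itself plus the two substring allocations.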

鹿港巷口少年归 2024-10-23 18:40:59

Are you looking for java.util.Scanner? Unless you have really exotic performance requirements, that should be fast enough:

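    // requires: import java.io.File; import java.util.Scanner;
    // new Scanner(File) throws FileNotFoundException, which the caller must handle
    try (Scanner s = new Scanner(new File("C:\\test.txt"))) {
        while (s.hasNext()) {
            String label = s.next();   // one whitespace-delimited token, so labels must not contain spaces
            int number = s.nextInt();

            System.out.println(number + " " + label);
        }
    }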
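
The loop above reads the label as a single token, so a line such as "some other string 88" from the question would make nextInt() fail. One possible variation, still based on Scanner, reads whole lines and splits each at its last space. `ScannerLineParser` and `parse` are hypothetical names, and the sketch assumes labels are unique, since it collects the results into a map.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Scanner;

final class ScannerLineParser {
    /** Reads "{label} {integer}" lines; the label may itself contain spaces. */
    static Map<String, Integer> parse(Scanner s) {
        Map<String, Integer> out = new LinkedHashMap<>();   // keeps file order; duplicate labels overwrite
        while (s.hasNextLine()) {
            String line = s.nextLine().trim();
            int split = line.lastIndexOf(' ');
            if (split < 0) {
                continue;                                   // blank line or no separator: skip it
            }
            String label = line.substring(0, split);
            int number = Integer.parseInt(line.substring(split + 1));
            out.put(label, number);
        }
        return out;
    }

    public static void main(String[] args) {
        // A String source keeps the demo self-contained; new Scanner(new File(...)) works the same way.
        try (Scanner s = new Scanner("someString 8\nsome other string 88\n")) {
            System.out.println(parse(s));
        }
    }
}
```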