How to filter with the best performance? (Java)

Posted on 2024-08-18 22:19:13

I work in a small office. One of our applications generates a large text file with 14000 lines.

Every time the file is generated I have to filter it by hand, which is really tedious.

I want to write a Java application that handles this as quickly as possible.

Please help. I wrote an application using Scanner (with some help, of course :) ), but it is not good enough because it is very slow.

For example, this is my file:

SET CELL:NAME=CELL:0,CELLID=3;
SET LSCID:NAME=LSC:0,NETITYPE=MDCS,T32=5,EACT=FILTER-NOFILTER-MINR-FILTER-NOFILTER,ENSUP=GV2&NCR,MINCELL=6,MSV=PFR,OVLHR=9500,OTHR=80,BVLH=TRUE,CELLID=3,BTLH=TRUE,MSLH=TRUE,EIHO=DISABLED,ENCHO=ENABLED,NARD=NAP_STLP,AMH=ENABLED(3)-ENABLED(6)-ENABLED(9)

and I want this output (filtered):

CELLID :  3
ENSUP  :  GV2&NCR
ENCHO  :  ENABLED
MSLH   :  TRUE
------------------------
Count of CELLID : 2

Which solution is the best, and which is the fastest?

This is my source code:

public static void main(String[] args) throws FileNotFoundException {
    Scanner scanner = new Scanner(new File("i:\\1\\2.txt"));
    scanner.useDelimiter(";|,");
    Pattern words = Pattern.compile("(CELLID=|ENSUP=|ENCHO=)");

    while (scanner.hasNextLine()) {
        String key = scanner.findInLine(words);

        while (key != null) {
            String value = scanner.next();
            if (key.equals("CELLID=")) {
                System.out.print("CELLID:" + value + "\n");
            } else if (key.equals("ENSUP=")) {
                System.out.print("ENSUP:" + value + "\n");
            } else if (key.equals("ENCHO=")) {
                System.out.print("ENCHO:" + value + "\n");
            }
            // continue with else ifs for other keys
            key = scanner.findInLine(words);
        }
        scanner.nextLine();
    }
}

Thank you very much indeed ...


Comments (2)

人间☆小暴躁 2024-08-25 22:19:13

Since your code has performance issues, you first need to find the bottleneck. You can profile it with the profiler that comes with your IDE.

However, your code is not computation-heavy but I/O-intensive, both in reading the file and in writing output with System.out.print, so file I/O is where I would suggest improving first.

Replace this line of code

Scanner scanner = new Scanner(new File("i:\\1\\2.txt"));

with these lines of code

File file = new File("i:\\1\\2.txt");
BufferedReader br = new BufferedReader( new FileReader(file)  );
Scanner scanner = new Scanner(br);

Let us know if this helps.
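
The same buffering idea can be applied to the output side, since every System.out.print call may otherwise go to the console separately. Below is a minimal sketch, not the answerer's code: the class name BufferedOutputSketch, the helper copyMatching and the 64 KB buffer size are all assumptions chosen for illustration, and the in-memory input stands in for the real file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.StringReader;
import java.io.Writer;

public class BufferedOutputSketch {

    // Hypothetical helper: writes every line containing "CELLID=" to out
    // and returns how many lines matched.
    static int copyMatching(BufferedReader in, Writer out) throws IOException {
        int count = 0;
        String line;
        while ((line = in.readLine()) != null) {
            if (line.contains("CELLID=")) {
                out.write(line);
                out.write('\n');
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // In-memory sample input; a FileReader would be used for the real file.
        BufferedReader in = new BufferedReader(
                new StringReader("SET CELL:NAME=CELL:0,CELLID=3;\nSET X:A=1;\n"));
        // The 64 KB buffer size is an arbitrary illustrative value.
        BufferedWriter out = new BufferedWriter(
                new OutputStreamWriter(System.out), 1 << 16);
        int matched = copyMatching(in, out);
        out.write("matched: " + matched + "\n");
        // Without flush(), buffered text may never reach the console.
        out.flush();
    }
}
```

Because copyMatching only depends on a Writer, the same method can write to the console, a file or a StringWriter without changes.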

Since the previous suggestion (wrapping the file in a BufferedReader) did not help much, I made a few more changes to improve your code. You may have to correct parsing errors, if any. I was able to display the output of parsing 392832 lines in approximately 5 seconds. The original solution takes more than 50 seconds.

The changes are as follows:

  1. Use of StringTokenizer instead of Scanner
  2. Use of BufferedReader for reading the file
  3. Use of StringBuilder to buffer output

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.util.StringTokenizer;

public class FileParse {

    // Flush the buffered output to the console once it exceeds about 1 MB.
    private static final int FLUSH_LIMIT = 1024 * 1024;
    private static StringBuilder outputBuffer = new StringBuilder(
            FLUSH_LIMIT + 1024);
    // Must not be final: it is incremented for every CELLID token found.
    private static long countCellId = 0;

    public static void main(String[] args) throws IOException {
        long start = System.currentTimeMillis();
        String fileName = "i:\\1\\2.txt";
        File file = new File(fileName);
        // try-with-resources closes the reader even if parsing fails.
        try (BufferedReader br = new BufferedReader(new FileReader(file))) {
            String line;
            while ((line = br.readLine()) != null) {
                // StringTokenizer takes a set of delimiter characters, not a
                // regex, so the '|' from the Scanner pattern is not needed.
                StringTokenizer st = new StringTokenizer(line, ";, ");
                while (st.hasMoreTokens()) {
                    String token = st.nextToken();
                    processToken(token);
                }
            }
        }
        flushOutputBuffer();
        System.out.println("----------------------------");
        System.out.println("CELLID Count: " + countCellId);
        long end = System.currentTimeMillis();
        System.out.println("Time: " + (end - start) + " ms");
    }

    // Appends "KEY:VALUE" to the output buffer for the keys of interest.
    private static void processToken(String token) {
        if (token.startsWith("CELLID=")) {
            String value = getTokenValue(token);
            outputBuffer.append("CELLID:").append(value).append("\n");
            countCellId++;
        } else if (token.startsWith("ENSUP=")) {
            String value = getTokenValue(token);
            outputBuffer.append("ENSUP:").append(value).append("\n");
        } else if (token.startsWith("ENCHO=")) {
            String value = getTokenValue(token);
            outputBuffer.append("ENCHO:").append(value).append("\n");
        }
        if (outputBuffer.length() > FLUSH_LIMIT) {
            flushOutputBuffer();
        }
    }

    // Returns everything after the first '=' in a KEY=VALUE token.
    private static String getTokenValue(String token) {
        return token.substring(token.indexOf('=') + 1);
    }

    // Prints the buffered output in one call and resets the buffer.
    private static void flushOutputBuffer() {
        System.out.print(outputBuffer);
        outputBuffer.setLength(0);
    }

}
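
The class above prints fields as CELLID:3, while the question asks for aligned labels such as "CELLID :  3". A minimal sketch of one way to produce that layout with String.format follows; the 7-character label column is inferred from the sample output in the question, and the names OutputFormat and formatField are hypothetical.

```java
public class OutputFormat {

    // Hypothetical helper: left-aligns the label in a 7-character column,
    // then appends ":  " and the value, as in the question's sample output.
    static String formatField(String key, String value) {
        return String.format("%-7s:  %s", key, value);
    }

    public static void main(String[] args) {
        System.out.println(formatField("CELLID", "3"));
        System.out.println(formatField("ENSUP", "GV2&NCR"));
        System.out.println(formatField("ENCHO", "ENABLED"));
        System.out.println(formatField("MSLH", "TRUE"));
    }
}
```

In processToken, the result of formatField could be appended to the output buffer in place of the hand-built "KEY:" strings.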

Update on ENSUP and MSLH:

It looks to me as if you have swapped ENSUP and MSLH in the if statement, as below. Hence you see the "MSLH" value for "ENSUP" and vice versa.

} else if (token.startsWith("MSLH=")) {
    String value = getTokenValue(token);
    outputBuffer.append("ENSUP:").append(value).append("\n");
} else if (token.startsWith("ENSUP=")) {
    String value = getTokenValue(token);
    outputBuffer.append("MSLH:").append(value).append("\n");
}
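
One way to rule out this kind of label mix-up is to take the label from the token itself instead of typing each key twice. This is a minimal sketch under the assumption that the input keeps the same KEY=VALUE format separated by commas, semicolons and spaces; the class name KeyFilter, the WANTED set and the filterLine helper are hypothetical and not part of the code above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.StringTokenizer;

public class KeyFilter {

    // Hypothetical set of keys to keep; adjust it to the fields you need.
    static final Set<String> WANTED =
            new HashSet<>(Arrays.asList("CELLID", "ENSUP", "ENCHO", "MSLH"));

    // Returns "KEY:VALUE" for every wanted key found in one input line.
    static List<String> filterLine(String line) {
        List<String> out = new ArrayList<>();
        StringTokenizer st = new StringTokenizer(line, ";, ");
        while (st.hasMoreTokens()) {
            String token = st.nextToken();
            int eq = token.indexOf('=');
            if (eq <= 0) {
                continue; // not a KEY=VALUE token, e.g. "SET"
            }
            String key = token.substring(0, eq);
            if (WANTED.contains(key)) {
                // The label is the key that was actually matched,
                // so label and value cannot get out of sync.
                out.add(key + ":" + token.substring(eq + 1));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String line = "SET LSCID:NAME=LSC:0,ENSUP=GV2&NCR,CELLID=3,MSLH=TRUE,ENCHO=ENABLED";
        for (String field : filterLine(line)) {
            System.out.println(field);
        }
    }
}
```

Adding another field then means adding one entry to WANTED rather than another else-if branch.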

杀お生予夺 2024-08-25 22:19:13

Simple text filtering is probably easier to write in Perl (my choice because I've been using it for years) or Python (what I recommend to new people because it's a more modern language).
