Regular expression for mail header messages

Posted 2024-11-25 13:28:09

I have a mailbox file containing more than 50 MB of messages, each separated by a line like this:

From - Thu Jul 19 07:11:55 2007

I want to build a regular expression for this in Java to extract each mail message one at a time, so I tried using a Scanner, using the following pattern as the delimiter:

public boolean ParseData(DataSource data_source) {

    boolean is_successful_transfer = false;
    String mail_header_regex = "^From\\s";
    LinkedList<String> ip_addresses = new LinkedList<String>();
    ASNRepository asn_repository = new ASNRepository();

    try {

        Pattern mail_header_pattern = Pattern.compile(mail_header_regex);

        File input_file = data_source.GetInputFile();

        //parse out each message from the mailbox
        Scanner scanner = new Scanner(input_file);

        while(scanner.hasNext(mail_header_pattern)) {

            String current_line = scanner.next(mail_header_pattern);

            Matcher mail_matcher = mail_header_pattern.matcher(current_line);

            //read each mail message and extract the proper "received from" ip address
            //to put it in our list of ip's we can add to the database to prepare
            //for querying.
            while(mail_matcher.find()) {
                String message_text = mail_matcher.group();
                String ip_address = get_ip_address(message_text);

                //empty ip address means the line contains no received from
                if(!ip_address.trim().isEmpty())
                    ip_addresses.add(ip_address);
            }

        }//next line

        //add ip addresses from mailbox to database
        is_successful_transfer = asn_repository.AddIPAddresses(ip_addresses);

    }

    //error reading file--unsuccessful transfer
    catch(FileNotFoundException ex) {
        is_successful_transfer = false;
    }

    return is_successful_transfer;

}

This seems like it should work, but whenever I run it, the program hangs, probably because it cannot find the pattern. The same regular expression works in Perl on the same file, but in Java it always hangs on the line String current_line = scanner.next(mail_header_pattern);

Is this regular expression correct or am I parsing the file incorrectly?

Comments (1)

柏拉图鍀咏恒 2024-12-02 13:28:09

I'd lean toward something much simpler: just read the file line by line, something like this:

while(scanner.hasNextLine()) {
    String line = scanner.nextLine();
    if (line.matches("^From\\s.*")) {
        // it's a new email
    } else {
        // it's still part of the email body
    }
}
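
To make the line-by-line idea concrete, here is a minimal, self-contained sketch of how the loop above could collect whole messages. The class name MboxSplitter and the method splitMessages are made up for illustration and are not part of the original code. It assumes every message starts with a line beginning with "From" followed by whitespace, as in the sample separator, and it does not handle body lines that happen to start the same way.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper: the class and method names are illustrative only.
public class MboxSplitter {

    // Separator lines look like "From - Thu Jul 19 07:11:55 2007".
    // matches() tests the whole line, so no ^ anchor is needed.
    private static final Pattern SEPARATOR = Pattern.compile("From\\s.*");

    // Reads line by line and returns one string per message,
    // with the separator line kept as the first line of each message.
    public static List<String> splitMessages(BufferedReader reader) throws IOException {
        List<String> messages = new ArrayList<>();
        StringBuilder current = null; // stays null until the first separator is seen
        String line;
        while ((line = reader.readLine()) != null) {
            if (SEPARATOR.matcher(line).matches()) {
                // A new message starts here, so flush the previous one.
                if (current != null) {
                    messages.add(current.toString());
                }
                current = new StringBuilder();
            }
            // Lines before the first separator are skipped.
            if (current != null) {
                current.append(line).append('\n');
            }
        }
        // Flush the last message, if any.
        if (current != null) {
            messages.add(current.toString());
        }
        return messages;
    }

    public static void main(String[] args) throws IOException {
        // Small in-memory sample standing in for the real mailbox file.
        String sample = "From - Thu Jul 19 07:11:55 2007\n"
                + "Received: from a.example\n"
                + "\n"
                + "first body\n"
                + "From - Thu Jul 19 08:00:00 2007\n"
                + "second body\n";
        List<String> messages = splitMessages(new BufferedReader(new StringReader(sample)));
        System.out.println(messages.size() + " messages");
    }
}
```

For the real mailbox you would wrap the input file in a BufferedReader (for example via new FileReader(input_file)) instead of the StringReader. With a 50 MB file it is probably better to handle each message as soon as it is complete, for instance by calling get_ip_address on it, rather than keeping every message in a list as this sketch does.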