Text search in similar clauses containing misspelled words

Posted on 2025-01-24 03:40:49

We need to extract some information from a free-text clause. Suppose we have a clause about a ship leaving one port and going to another port. The same meaning can be expressed in several ways, like this:

    The Ship A departed from the Port X on Monday, to reach Port Y.
    The ship A left the Port X on Monday to reach Port Y.
    The Ship A arrived to Port Y, it left Port X on Monday.
    Port Y will be visited by Ship A which left Port X on Monday.

The author might also misspell words:

   departed -> deported, dearted, depared, departeed, deparded
   reach -> reaach, rech, rreach, reac
   arrived -> arived, arivved, arrivd 

So what is the best way to extract the words "Ship A", "Port X", "Port Y" and "Monday" from those clauses?
The programming language is Java.
Should we use regular expressions, Lucene fuzzy search, Elasticsearch, etc.?
Or some combination of them?

Thank you

ゝ杯具 2025-01-31 03:40:49

This program finds the information you need in the sample strings and puts it in the right order. It needs further work to cope with misspellings. We could also expand the day regex to accept dates.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class HelloWorld {

        public static void main(String[] args) {
            System.out.println(getPorts("The Ship A departed from the Port X on Monday, to reach Port Y."));
            System.out.println(getPorts("The ship A left the Port X on Monday to reach Port Y."));
            System.out.println(getPorts("The Ship A arrived to Port Y, it left Port X on Monday."));
            System.out.println(getPorts("Port Y will be visited by Ship A which left Port X on Monday."));
        }

        public static String getPorts(String sentence) {
            String port1 = "unknown"; // port of departure
            String port2 = "unknown"; // port of arrival
            String ship = "unknown";
            String day = "unknown";
            Pattern pattern;
            // The order of the verbs decides which port is mentioned first.
            if (sentence.matches(".*(arriv|reach|visit).*(left|depart).*")) {
                // Arrival verb first: the destination port precedes the departure port.
                pattern = Pattern.compile("(?<port2>[Pp]ort\\s\\w+).*(?<port1>[Pp]ort\\s\\w+)");
            } else if (sentence.matches(".*(left|depart).*(arriv|reach|visit).*")) {
                // Departure verb first: the departure port precedes the destination port.
                pattern = Pattern.compile("(?<port1>[Pp]ort\\s\\w+).*(?<port2>[Pp]ort\\s\\w+)");
            } else {
                return "not matched";
            }
            Matcher matcher = pattern.matcher(sentence);
            while (matcher.find()) {
                port2 = matcher.group("port2");
                port1 = matcher.group("port1");
            }
            // If a sentence names several ships or days, the last match is kept.
            pattern = Pattern.compile("(?<ship>[Ss]hip\\s\\w+)");
            matcher = pattern.matcher(sentence);
            while (matcher.find()) {
                ship = matcher.group("ship");
            }
            // Word boundaries stop "Sun" or "Fri" from matching inside longer words.
            pattern = Pattern.compile("\\b(?<day>(Mon|Tues?|Wed(nes)?|Thu(rs)?|Fri|Sat(ur)?|Sun)(day)?)\\b");
            matcher = pattern.matcher(sentence);
            while (matcher.find()) {
                day = matcher.group("day");
            }
            return ship + " sailing from " + port1 + " to " + port2 + " on " + day + ".";
        }
    }

Output:

    Ship A sailing from Port X to Port Y on Monday.
    ship A sailing from Port X to Port Y on Monday.
    Ship A sailing from Port X to Port Y on Monday.
    Ship A sailing from Port X to Port Y on Monday.

Tested at https://www.tutorialspoint.com/compile_java_online.php
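One possible way to handle the misspellings from the question is to map each word to the nearest known keyword by edit distance before running the regexes. The following is only a minimal sketch, not part of the program above: the class `FuzzyKeywords`, its methods, the keyword list, and the length-based distance limit (2 edits for keywords of six or more letters, 1 otherwise) are all assumptions chosen for illustration.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: replaces misspelled verbs with their canonical form.
class FuzzyKeywords {

    // Canonical keywords taken from the question's sample sentences (assumed list).
    static final List<String> KEYWORDS =
            Arrays.asList("departed", "left", "reach", "arrived", "visited");

    // Levenshtein distance: minimum number of single-character
    // insertions, deletions and substitutions turning a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // Returns the closest keyword within the allowed distance,
    // or the word unchanged if no keyword is close enough.
    static String normalize(String word) {
        String lower = word.toLowerCase();
        String best = word;
        int bestDistance = Integer.MAX_VALUE;
        for (String keyword : KEYWORDS) {
            // Assumed rule: short keywords tolerate 1 edit, longer ones 2.
            int limit = keyword.length() >= 6 ? 2 : 1;
            int d = levenshtein(lower, keyword);
            if (d <= limit && d < bestDistance) {
                bestDistance = d;
                best = keyword;
            }
        }
        return best;
    }

    // Normalizes every alphabetic token, leaving spacing and punctuation intact.
    static String normalizeSentence(String sentence) {
        StringBuilder out = new StringBuilder();
        Matcher m = Pattern.compile("[A-Za-z]+").matcher(sentence);
        int last = 0;
        while (m.find()) {
            out.append(sentence, last, m.start());
            out.append(normalize(m.group()));
            last = m.end();
        }
        out.append(sentence, last, sentence.length());
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: The Ship A departed from the Port X on Monday, to reach Port Y.
        System.out.println(normalizeSentence(
                "The Ship A deparded from the Port X on Monday, to rech Port Y."));
    }
}
```

The normalized sentence could then be passed to `getPorts` unchanged. The limits are a trade-off: every misspelling listed in the question is within them ("arivved" needs 2 edits, the others 1), but ordinary words such as "each" or "lift" are also within one edit of a keyword, so a real system would probably need a dictionary check or context rules on top.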
