Removing stop words in Java --- need help

Posted on 2024-12-01 10:22:40

I'm using a method that removes the stop words defined in a file from the query string I pass to it. The code works fine.

Now I need one more thing: if the query string contains only stop words, they should not be removed.

For example, if the stop-word file contains "is", "was" and "and":

if the query is "I was a student", the output should be "I a student";

but if the query is "and is", the output should stay the same: "and is".

Below is the method I wrote to remove stop words:

// Needs java.io.UnsupportedEncodingException, java.net.URLEncoder,
// java.util.ArrayList and java.util.List (plus the project's own
// StopWordDataLoad and PropertyLoader classes).
public static String removeStopWords(String query) throws UnsupportedEncodingException
{
    String[] queryTerms = query.split("&");
    String queryString = "";
    StringBuilder sb = new StringBuilder();

    // Pick out the "q=" parameter: '+' becomes a space and runs of whitespace collapse.
    for (int i = 0; i < queryTerms.length; i++) {
        if (queryTerms[i].startsWith("q=") && !queryTerms[i].startsWith("q.orig")) {
            queryString = queryTerms[i].replaceAll("q=", "").trim()
                    .replace("+", " ").replaceAll("\\s+", " ").trim();
        }
    }

    if (!queryString.equalsIgnoreCase("")) {
        String[] tokens = queryString.split("\\s+");
        List<?> lStopWords = StopWordDataLoad.getlQueryStringStopword();
        boolean foundStopWord = false;   // true once at least one token is dropped
        // Keep every token that is not a stop word.
        for (String s : tokens) {
            if (!lStopWords.contains(s)) {
                if (sb.length() > 0) sb.append(" ");
                sb.append(s);
            } else {
                foundStopWord = true;
            }
        }
        queryString = sb.toString().replaceAll("\\s+", " ");
        // Return the query unchanged if everything was a stop word or nothing was removed.
        if (queryString.equalsIgnoreCase("") || !foundStopWord) return query;
    } else {
        return query;
    }

    // Rebuild the parameter list, swapping in the filtered and URL-encoded "q=" value.
    String finQue = "";
    List<String> list = new ArrayList<String>();
    for (int i = 0; i < queryTerms.length; i++) {
        if (queryTerms[i].startsWith("q=") && !queryTerms[i].startsWith("q.orig")) {
            list.add("q=" + URLEncoder.encode(queryString, PropertyLoader.getHttpEncoding()));
        } else if (!queryTerms[i].equalsIgnoreCase("")) {
            list.add(queryTerms[i]);
        }
    }
    for (String str : list) {
        finQue = finQue + "&" + str;   // note: this produces a leading '&'
    }

    return finQue.trim();
}
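Without the URL handling, the required behaviour can be sketched as below. The method drops stop words but returns the query untouched when nothing else would remain. This is a minimal sketch, not the asker's code. The class name `StopWordFilter` is hypothetical. The stop words are assumed to be stored in lower case and available as a `Set<String>`, and tokens are matched case-insensitively.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of the question's code base.
final class StopWordFilter {

    /**
     * Removes stop words from a space-separated query. If every token is a
     * stop word (so nothing would be left), the query is returned unchanged.
     */
    static String removeStopWords(String query, Set<String> stopWords) {
        String trimmed = query.trim();
        if (trimmed.isEmpty()) {
            return query;
        }
        List<String> kept = new ArrayList<>();
        for (String token : trimmed.split("\\s+")) {
            // Assumption: stop words are stored in lower case; compare case-insensitively.
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        // Only stop words: hand back the original query, as the question requires.
        return kept.isEmpty() ? query : String.join(" ", kept);
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "was", "and"));
        System.out.println(removeStopWords("I was a student", stopWords)); // I a student
        System.out.println(removeStopWords("and is", stopWords));          // and is
    }
}
```

The fallback is decided from the list of kept tokens rather than from the final string. This makes the "only stop words" case explicit instead of relying on an empty-string check later on.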


Answers (1)

摇划花蜜的午后 · 2024-12-08 10:22:40

Just change the last line to this:

String result = finQue.trim();
if (result.equals("")) {
    return query;
} else {
    return result;
}
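The same "fall back to the original query" idea can also be applied to the `q=` parameter itself, before it is re-encoded. That way, a query made only of stop words is never rewritten. This is a minimal sketch under stated assumptions:

- UTF-8 is used; the question reads the encoding from `PropertyLoader.getHttpEncoding()`.
- The stop words are a lower-case `Set<String>`.
- `QueryStopWords` is a hypothetical name.

It uses `URLDecoder` instead of replacing `+` by hand, so `%XX` escapes are decoded too.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, not part of the question's code base.
final class QueryStopWords {

    // Assumption: UTF-8 (the question reads it from PropertyLoader.getHttpEncoding()).
    private static final String ENCODING = "UTF-8";

    /** Removes stop words from the "q" parameter of a query string like "q=I+was+a+student&rows=10". */
    static String removeStopWords(String queryString, Set<String> stopWords) {
        List<String> rebuilt = new ArrayList<>();
        boolean changed = false;
        for (String param : queryString.split("&")) {
            if (param.isEmpty()) {
                continue; // drop empty segments, e.g. from a leading '&'
            }
            String out = param;
            if (param.startsWith("q=")) {
                // URLDecoder turns '+' into a space and decodes %XX escapes.
                String value = decode(param.substring(2));
                List<String> kept = new ArrayList<>();
                int total = 0;
                for (String token : value.trim().split("\\s+")) {
                    if (token.isEmpty()) {
                        continue; // an empty value splits into a single ""
                    }
                    total++;
                    if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                        kept.add(token);
                    }
                }
                // Rewrite only when some, but not all, tokens were stop words.
                if (!kept.isEmpty() && kept.size() < total) {
                    out = "q=" + encode(String.join(" ", kept));
                    changed = true;
                }
            }
            rebuilt.add(out);
        }
        return changed ? String.join("&", rebuilt) : queryString;
    }

    private static String decode(String s) {
        try {
            return URLDecoder.decode(s, ENCODING);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    private static String encode(String s) {
        try {
            return URLEncoder.encode(s, ENCODING);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always supported
        }
    }

    public static void main(String[] args) {
        Set<String> stopWords = new HashSet<>(Arrays.asList("is", "was", "and"));
        System.out.println(removeStopWords("q=I+was+a+student&rows=10", stopWords)); // q=I+a+student&rows=10
        System.out.println(removeStopWords("q=and+is&rows=10", stopWords));          // q=and+is&rows=10
    }
}
```

If no stop word is removed, or every token is a stop word, the original string is returned as is. Otherwise the parameters are re-joined with `&`. Unlike the question's `finQue`, the result has no leading `&`.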