Extracting articles' text from Wikipedia

Posted on 2024-12-19 02:33:05

I'm writing some Java code to get the raw text of some Wikipedia articles: given a JList of words, it looks each word up on Wikipedia and extracts the first sentence of the corresponding article. My GUI contains a button for which I defined the following action listener:

private void loadButtonActionPerformed(java.awt.event.ActionEvent evt) {
    final DefaultListModel conceptsListFilesModel = new DefaultListModel();
    conceptsList.setModel(conceptsListFilesModel);
    final List definitionWiki = new ArrayList();

    // Fill the list with the first column of the table
    final Thread updater = new Thread() {
        @Override
        public void run() {
            for (int i = 0; i < 20 /* dataTable.getRowCount() */; i++) {
                conceptsListFilesModel.addElement(dataTable.getValueAt(i, 0));

                try {
                    Object concept = conceptsListFilesModel.elementAt(i);
                    WikipediaParser parser = new WikipediaParser("en");
                    System.out.println(concept + "");
                    String firstParagraph = parser.fetchFirstParagraph(concept + "");
                    int point = firstParagraph.indexOf(".");
                    String firstsentence = firstParagraph.substring(0, point + 1);
                    definitionWiki.add(i, firstsentence);
                } catch (IOException ex) {
                    Logger.getLogger(Tex2TaxView.class.getName()).log(Level.SEVERE, null, ex);
                }

                // Pause for one second between requests
                try {
                    Thread.sleep(1000);
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            }
            JOptionPane.showMessageDialog(null, "Successful loading !");
        }
    };
    updater.start();
}
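
The first-sentence step in the listener above relies on indexOf("."), which yields an empty string when the paragraph has no period. As a minimal sketch of that step in isolation, the JDK's java.text.BreakIterator can find the first sentence boundary instead. The class name FirstSentence and the method name of are hypothetical and not part of the original code.

```java
import java.text.BreakIterator;
import java.util.Locale;

// Hypothetical helper, not part of the original project.
final class FirstSentence {
    private FirstSentence() {}

    /** Returns the first sentence of text, or the whole trimmed text if no boundary is found. */
    static String of(String text) {
        if (text == null) {
            return "";
        }
        String trimmed = text.trim();
        if (trimmed.isEmpty()) {
            return "";
        }
        // Sentence iterator for English text; the locale is an assumption.
        BreakIterator it = BreakIterator.getSentenceInstance(Locale.ENGLISH);
        it.setText(trimmed);
        int start = it.first();
        int end = it.next();
        if (end == BreakIterator.DONE) {
            return trimmed;
        }
        // The boundary includes trailing whitespace, so trim the result.
        return trimmed.substring(start, end).trim();
    }
}
```

In the listener this would replace the indexOf/substring pair, for example String firstsentence = FirstSentence.of(firstParagraph). Abbreviations with periods can still produce an early break, so the result is a best effort.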

The WikipediaParser class:

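import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class WikipediaParser {

    private final String baseUrl;

    public WikipediaParser(String lang) {
        this.baseUrl = String.format("http://%s.wikipedia.org/wiki/", lang);
    }

    // Downloads the article page and returns the text of its first <p> element
    public String fetchFirstParagraph(String article) throws IOException {
        String url = baseUrl + article;
        Document doc = Jsoup.connect(url).get();
        Elements paragraphs = doc.select(".mw-content-ltr p");
        Element firstParagraph = paragraphs.first();
        return firstParagraph.text();
    }

}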
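
The parser above concatenates baseUrl and the article title as-is, so a title containing spaces or punctuation may not yield a valid URL. A minimal sketch of building the URL more defensively follows. It assumes Java 10 or later for URLEncoder.encode(String, Charset), and the class name WikiUrls and method name articleUrl are hypothetical. Wikipedia article URLs use underscores in place of spaces, which is why spaces are replaced before encoding.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of the original project.
final class WikiUrls {
    private WikiUrls() {}

    /** Builds an article URL from a base such as "http://en.wikipedia.org/wiki/" and a title. */
    static String articleUrl(String baseUrl, String title) {
        // Wikipedia uses underscores instead of spaces in article URLs.
        String normalized = title.trim().replace(' ', '_');
        // Percent-encode the remaining reserved and non-ASCII characters as UTF-8.
        String encoded = URLEncoder.encode(normalized, StandardCharsets.UTF_8);
        return baseUrl + encoded;
    }
}
```

Inside fetchFirstParagraph this would be used as String url = WikiUrls.articleUrl(baseUrl, article).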

Running it produces the following exception:

nov. 30, 2011 12:42:55 AM tex2tax.Tex2TaxView$11 run
Grave: null java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:150)
    at java.net.SocketInputStream.read(SocketInputStream.java:121)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:275)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:334)
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:641)
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:589)
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1319)
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:468)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:381)
    at org.jsoup.helper.HttpConnection$Response.execute(HttpConnection.java:364)
    at org.jsoup.helper.HttpConnection.execute(HttpConnection.java:143)
    at org.jsoup.helper.HttpConnection.get(HttpConnection.java:132)
    at tex2tax.WikipediaParser.fetchFirstParagraph(WikipediaParser.java:25)
    at tex2tax.Tex2TaxView$11.run(Tex2TaxView.java:595)

I need help solving this problem.
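
The trace shows a java.net.SocketTimeoutException, which is a subclass of IOException, thrown while reading the HTTP response. One possible way to cope with such transient timeouts is to retry the fetch a few times. This is only a sketch under stated assumptions: the class name Retry, the method name withRetries, and the attempt count and delay are all hypothetical, and it is not confirmed to resolve the problem above.

```java
import java.io.IOException;
import java.util.concurrent.Callable;

// Hypothetical helper, not part of the original project.
final class Retry {
    private Retry() {}

    /**
     * Runs task up to maxAttempts times, retrying only when it throws an IOException
     * (which includes SocketTimeoutException). Other exceptions propagate immediately.
     */
    static <T> T withRetries(Callable<T> task, int maxAttempts, long delayMillis) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.call();
            } catch (IOException e) {
                last = e;
                // Wait before the next attempt, but not after the final one.
                if (attempt < maxAttempts) {
                    Thread.sleep(delayMillis);
                }
            }
        }
        // Every attempt failed with an IOException; rethrow the most recent one.
        throw last;
    }
}
```

With an effectively final parser and title, a call could look like Retry.withRetries(() -> parser.fetchFirstParagraph(title), 3, 2000L). Separately, jsoup's Connection has a timeout(int millis) setter, so Jsoup.connect(url).timeout(10000).get() would raise the read timeout; the 10-second value is only an example.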
