I have not used MINA before, so I would like to ask here: is MINA suitable for issuing a large number of HTTP requests to crawl web pages?
Here is some code.
This is the MainClient.java class:
package mina;

import java.net.InetSocketAddress;

import org.apache.mina.core.filterchain.DefaultIoFilterChainBuilder;
import org.apache.mina.core.future.ConnectFuture;
import org.apache.mina.filter.codec.ProtocolCodecFilter;
import org.apache.mina.filter.codec.textline.TextLineCodecFactory;
import org.apache.mina.transport.socket.nio.NioSocketConnector;

/**
 * Simple MINA client example.
 *
 * @author javaFound
 * @www.javaKe.com
 */
public class MainClient {
    public static void main(String[] args) throws Exception {
        // Create TCP/IP connector.
        NioSocketConnector connector = new NioSocketConnector();
        // Get the filter chain that processes the data.
        DefaultIoFilterChainBuilder chain = connector.getFilterChain();
        // Have this filter read the data line by line (\r\n).
        chain.addLast("myChin", new ProtocolCodecFilter(
                new TextLineCodecFactory()));
        // Set the message handler: a SamplMinaClientHandler object.
        connector.setHandler(new SamplMinaClientHandler());
        // Set connect timeout.
        connector.setConnectTimeout(30);
        // Connect to the server.
        ConnectFuture cf = connector.connect(new InetSocketAddress("www.126.com", 80));
        // Wait for the connection attempt to be finished.
        cf.awaitUninterruptibly();
        cf.getSession().getCloseFuture().awaitUninterruptibly();
        connector.dispose();
    }
}
This is the SamplMinaClientHandler.java class:
package mina;

import org.apache.mina.core.service.IoHandlerAdapter;
import org.apache.mina.core.session.IoSession;

/**
 * MINA client handler for received messages.
 *
 * @author javaFound
 * @www.javaKe.com
 */
public class SamplMinaClientHandler extends IoHandlerAdapter {
    // Called when the connection to the server has been opened.
    @Override
    public void sessionOpened(IoSession session) throws Exception {
        System.out.println("incomming client : " + session.getRemoteAddress());
        String host = "www.126.com";
        String path = "/";
        String request = "GET " + path + " HTTP/1.1\r\nHost:"
                + host + "\r\n\r\n" + "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)\r\n\r\n"
                + "Connection:close\r\n\r\n";
        session.write(request.getBytes());
        // session.get
    }

    // Called when the connection is closed.
    @Override
    public void sessionClosed(IoSession session) {
        System.out.println("one Clinet Disconnect !");
    }

    // Called when a message from the server arrives.
    @Override
    public void messageReceived(IoSession session, Object message)
            throws Exception {
        // The codec is configured to read line by line, so the message can be cast to String here.
        String s = (String) message;
        // Print the received line.
        System.out.println("服务器发来的收到消息: " + s);
        // Test: echo the message back to the remote peer.
        //session.write(s);
    }
}
Running MainClient returns:
incomming client : www.126.com/220.170.91.160:80
服务器发来的收到消息: HTTP/1.0 400 Bad request
服务器发来的收到消息: Cache-Control: no-cache
服务器发来的收到消息: Connection: close
服务器发来的收到消息: Content-Type: text/html
服务器发来的收到消息:
服务器发来的收到消息: <html><body><h1>400 Bad request</h1>
服务器发来的收到消息: Your browser sent an invalid request.
服务器发来的收到消息: </body></html>
one Clinet Disconnect !
How should I fix this?
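HTTP/1.0 400 Bad request means the server found the request incomplete or malformed.
You can use FireBug to see which headers a normal page request sends, then imitate them and add whatever is missing.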
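For comparison, here is a minimal sketch of what a well-formed HTTP/1.1 GET request looks like as text: each header line ends with exactly one CRLF, and a single empty line ends the header block. The class name HttpRequestText and the method buildGet are made up for illustration, and the header set is only an example, not a list of what www.126.com requires.

```java
import java.nio.charset.StandardCharsets;

public class HttpRequestText {
    /**
     * Builds a minimal HTTP/1.1 GET request.
     * Every header line ends with one CRLF; one extra CRLF ends the headers.
     */
    public static String buildGet(String host, String path) {
        return "GET " + path + " HTTP/1.1\r\n"
                + "Host: " + host + "\r\n"
                + "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)\r\n"
                + "Connection: close\r\n"
                + "\r\n";
    }

    public static void main(String[] args) {
        String request = buildGet("www.126.com", "/");
        // Print the request with the line endings made visible.
        System.out.print(request.replace("\r\n", "\\r\\n\n"));
        // The bytes that would go on the wire.
        byte[] wire = request.getBytes(StandardCharsets.US_ASCII);
        System.out.println(wire.length + " bytes");
    }
}
```

Note that a blank line ("\r\n\r\n") in the middle of the headers ends the request early, so anything after it is not read as headers.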
public void sessionOpened(IoSession session) throws Exception {
    System.out.println("incomming client : " + session.getRemoteAddress());
    String host = "www.126.com";
    String path = "/";
    String request = "GET " + path + " HTTP/1.1\r\nHost:"
            + host + "\r\n\r\n" + "User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1;)\r\n\r\n"
            + "Connection:close\r\n\r\n";
    session.write(request.getBytes());
    // session.get
}
It seems this is not the right place to write the HTTP headers. Even when they are written correctly, I still get 400.
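One possible reason a correct header block is still rejected: as far as I know, the text-line encoder created by TextLineCodecFactory converts the written object with toString() and appends a line delimiter. If that holds for this MINA version (an assumption, worth checking against the TextLineEncoder source), then session.write(request.getBytes()) sends the array's identity string instead of the request. The sketch below uses plain Java, not MINA, to show what toString() produces for a byte[]. ByteArrayToStringDemo and asTextLine are hypothetical names that only imitate the assumed encoder behavior.

```java
import java.nio.charset.StandardCharsets;

public class ByteArrayToStringDemo {
    /**
     * Hypothetical stand-in for a text-line encoder:
     * the message is turned into text via toString(), then a delimiter is appended.
     */
    public static String asTextLine(Object message) {
        return String.valueOf(message) + "\n";
    }

    public static void main(String[] args) {
        String request = "GET / HTTP/1.1\r\nHost: www.126.com\r\nConnection: close\r\n\r\n";
        byte[] bytes = request.getBytes(StandardCharsets.US_ASCII);

        // A byte[] has no useful toString(): the result starts with "[B@"
        // followed by a hash code, and contains none of the request text.
        System.out.println("as byte[]: " + asTextLine(bytes).trim());

        // A String keeps its content.
        System.out.print("as String: " + asTextLine(request));
    }
}
```

If the assumption holds, passing the request as a String, that is session.write(request), would be the first thing to try.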
MINA is not meant for this. For crawling web pages, HttpClient plus htmlparser is enough.
Quoting post #4 by "JavaGG":
Take a look at ChatClientSupport.java and SwingChatClientHandler.java under mina-2.0.0-M3\example\src\main\java\org\apache\mina\example\chat\client. These two classes are examples of using MINA as a client.
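Java's nio package provides multiplexed, non-blocking communication, and I want to use those NIO features to crawl web pages.
Almost all the examples online are server-side, but my requirement is on the client side.
Are there any good examples?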
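As a rough client-side illustration of the multiplexing described above, here is a sketch that uses only the JDK's java.nio (no MINA): one thread and one Selector drive several non-blocking connections at once. NioFetchSketch, Fetch and fetchAll are made-up names. The sketch assumes plain HTTP on the given port, relies on "Connection: close" to detect the end of a response, has no timeouts or redirect handling, and decodes bytes as ISO-8859-1 only to keep them intact. It is not a complete crawler.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class NioFetchSketch {
    /** State of one in-flight request. */
    private static final class Fetch {
        final int index;
        final ByteBuffer request;
        final ByteArrayOutputStream response = new ByteArrayOutputStream();

        Fetch(int index, String host, String path) {
            this.index = index;
            String text = "GET " + path + " HTTP/1.1\r\n"
                    + "Host: " + host + "\r\n"
                    + "Connection: close\r\n"
                    + "\r\n";
            this.request = ByteBuffer.wrap(text.getBytes(StandardCharsets.US_ASCII));
        }
    }

    /** Fetches path from every target on one thread; result i belongs to targets.get(i). */
    public static List<String> fetchAll(List<InetSocketAddress> targets, String path) throws IOException {
        String[] results = new String[targets.size()];
        ByteBuffer readBuffer = ByteBuffer.allocate(8192);
        try (Selector selector = Selector.open()) {
            for (int i = 0; i < targets.size(); i++) {
                InetSocketAddress target = targets.get(i);
                SocketChannel channel = SocketChannel.open();
                channel.configureBlocking(false);
                // connect() may complete at once (e.g. on loopback); then skip OP_CONNECT.
                boolean connected = channel.connect(target);
                channel.register(selector,
                        connected ? SelectionKey.OP_WRITE : SelectionKey.OP_CONNECT,
                        new Fetch(i, target.getHostString(), path));
            }
            int pending = targets.size();
            while (pending > 0) {
                selector.select(); // blocks until at least one channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    SocketChannel channel = (SocketChannel) key.channel();
                    Fetch fetch = (Fetch) key.attachment();
                    boolean finished = false;
                    try {
                        if (key.isConnectable()) {
                            if (channel.finishConnect()) {
                                key.interestOps(SelectionKey.OP_WRITE);
                            }
                        } else if (key.isWritable()) {
                            channel.write(fetch.request);
                            if (!fetch.request.hasRemaining()) {
                                key.interestOps(SelectionKey.OP_READ);
                            }
                        } else if (key.isReadable()) {
                            readBuffer.clear();
                            int n = channel.read(readBuffer);
                            if (n < 0) {
                                // Server closed the connection: the response is complete.
                                results[fetch.index] = new String(
                                        fetch.response.toByteArray(), StandardCharsets.ISO_8859_1);
                                finished = true;
                            } else {
                                fetch.response.write(readBuffer.array(), 0, n);
                            }
                        }
                    } catch (IOException e) {
                        results[fetch.index] = "ERROR: " + e;
                        finished = true;
                    }
                    if (finished) {
                        channel.close(); // closing the channel also cancels its key
                        pending--;
                    }
                }
            }
        }
        return Arrays.asList(results);
    }

    public static void main(String[] args) throws IOException {
        List<InetSocketAddress> targets = Arrays.asList(
                new InetSocketAddress("www.126.com", 80),
                new InetSocketAddress("www.126.com", 80));
        for (String response : fetchAll(targets, "/")) {
            System.out.println(response.split("\r\n", 2)[0]); // status line only
        }
    }
}
```

The per-connection state lives in the key attachment, which is roughly the role an IoSession plays in MINA. A real crawler would also need timeouts, a cap on concurrent connections, and proper HTTP parsing, which is why a ready-made HTTP client library is usually the easier route.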
It is not a good fit. MINA is better suited to server-side applications.