Tomcat: How do I stop Tomcat from creating sessions for all requests?

Posted on 2024-12-08 23:55:25

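HAProxy pings Tomcat by requesting a very small page, which causes Tomcat to create a new session every 2 seconds. Is there a way, programmatically or through configuration, to tell Tomcat not to create a new session for a specific page?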

酒与心事 2024-12-15 23:55:25

You don't need to implement anything, it's already there ;)!

The Tomcat container provides the Crawler Session Manager Valve (a Valve is similar to a servlet Filter, but it runs inside the Tomcat container, at a lower level).
You can find more details here http://tomcat.apache.org/tomcat-7.0-doc/config/valve.html#Crawler_Session_Manager_Valve

Just add a <Valve> element with the proper configuration to your Tomcat's server.xml. Remember to provide regular expressions for the bot user agents.

For example

<Valve className="org.apache.catalina.valves.CrawlerSessionManagerValve"
crawlerUserAgents=".*googlebot.*|.*yahoo.*" sessionInactiveInterval="600"/>

You can look at the source code of valve: http://grepcode.com/file/repo1.maven.org/maven2/org.apache.tomcat/tomcat-catalina/7.0.11/org/apache/catalina/valves/CrawlerSessionManagerValve.java
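
Before deploying a crawlerUserAgents expression, it can help to check which User-Agent strings it actually covers. The sketch below is a hypothetical helper, not part of Tomcat; it assumes the valve applies the expression to the whole User-Agent value with java.util.regex, case-sensitively, which is worth confirming against the valve source for your Tomcat version.

```java
import java.util.regex.Pattern;

// Hypothetical helper (not a Tomcat class): tests a crawlerUserAgents-style
// expression against a User-Agent string, assuming whole-string matching.
final class CrawlerPatternCheck {

    // True when the whole User-Agent value matches the expression.
    static boolean isCrawler(String regex, String userAgent) {
        if (userAgent == null) {
            return false; // no header, nothing to match
        }
        return Pattern.compile(regex).matcher(userAgent).matches();
    }

    public static void main(String[] args) {
        String fromAnswer = ".*googlebot.*|.*yahoo.*";
        // Illustrative variant that tolerates a capitalised first letter.
        String caseTolerant = ".*[Gg]ooglebot.*|.*[Yy]ahoo.*";
        String ua = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)";

        System.out.println(isCrawler(fromAnswer, ua));
        System.out.println(isCrawler(caseTolerant, ua));
    }
}
```

Under these assumptions the all-lowercase expression does not match the capitalised "Googlebot" in that User-Agent, while the case-tolerant variant does, so it is worth testing the expression against the exact header your health check or crawler sends.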

梦魇绽荼蘼 2024-12-15 23:55:25

Yes, there is. It's a bit complicated, but works well for us.

Basically, we change the Filter chain for sessions. We do this for bots (Google, Pear, Yahoo).

Create a new Filter and register it, then use this source for the Filter class:

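import java.io.IOException;
import java.util.HashMap;

import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;
import javax.servlet.http.HttpSession;

public class BotFilter implements javax.servlet.Filter {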
  private int inactive_seconds = 5*60;
  private String[] bots = new String[] { "googlebot", //google
    "msnbot", //msn
    "slurp", //yahoo
    "libcurl", //curl, sometimes used with bigbrother
    "bigbrother", //bigbrother availability check
    "whatsup", //whatsup availability check
    "surveybot", //unknown
    "wget", // nocomment
    "speedyspider", //http://www.entireweb.com/about/search_tech/speedyspider/
    "nagios-plugins", //all Nagios checks
    "pear.php.net", //some PHP client
    "mj12bot", //http://www.majestic12.co.uk/projects/dsearch/mj12bot.php
    "bingbot", //M$ Bing
    "dotbot", //We are just a few Seattle based guys trying to figure out how to make internet data as open as possible.
    "aggregator:spinn3r", //http://spinn3r.com/robot
    "baiduspider" //http://www.baidu.com/search/spider.htm
  };
  private HashMap<String, HttpSession> botsessions;

  public BotFilter() {
    this.botsessions = new HashMap<String, HttpSession>();
  }

  public void init(FilterConfig config) throws ServletException {

  }

  public void doFilter(ServletRequest request, ServletResponse response, FilterChain next) throws IOException, ServletException {
    if (request instanceof HttpServletRequest) {
      HttpServletRequest httprequest = (HttpServletRequest) request;
      try {
        String useragent = ((HttpServletRequest) request).getHeader("User-Agent");
        if (useragent == null) {
          // No User-Agent header: redirect and stop here, otherwise
          // toLowerCase() below would throw a NullPointerException
          ((HttpServletResponse) response).sendRedirect("http://www.google.com");
          return;
        }
        useragent = useragent.toLowerCase();
        for (int i = 0; i < this.bots.length; i++) {
          if (useragent.indexOf(this.bots[i]) > -1) {
            String key = httprequest.getRemoteAddr() + useragent;
            boolean SessionIsInvalid=false;
            synchronized(this.botsessions) {
              try {
                if(this.botsessions.get(key)!=null)
                  this.botsessions.get(key).getAttributeNames();
              } catch (java.lang.IllegalStateException ise) {
                SessionIsInvalid = true;
              }
              if(this.botsessions.get(key)==null||SessionIsInvalid) {
                httprequest.getSession().setMaxInactiveInterval(this.inactive_seconds);
                if(SessionIsInvalid)
                  this.botsessions.remove(key); //Remove first, if in there
                this.botsessions.put(key, httprequest.getSession()); //Then add a little spice
              } else {
                next.doFilter(new BotFucker(httprequest, this.botsessions.get(key)), response);
                return;
              }
            }
          }
        }
      } catch (Exception e) {
        //Error handling code
      }
    }
    next.doFilter(request, response);
  }

  public void destroy() {

  }
}

And this small one for the request wrapper class:

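import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;
import javax.servlet.http.HttpSession;

public class BotFucker extends HttpServletRequestWrapper {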

  HttpSession session;

  public BotFucker(HttpServletRequest request, HttpSession session) {
    super(request);
    this.session = session;
  }
  @Override
  public HttpSession getSession(boolean create) {
    return this.session;
  }
  @Override
  public HttpSession getSession() {
    return this.session;
  }
}

These two classes re-use the session that a bot had before if it connects again from the same IP with the same user agent within a given time limit. We're not 100% sure what this does to the data that the bot receives, but this code has been running for many months now and has solved our problem (multiple connects/sessions per second per IP from Google).
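
The reuse rule itself can be looked at in isolation from the servlet API. The following is a minimal sketch under stated assumptions: BotSessionCache, key and reuseOrStore are hypothetical names, the session is modelled as a generic value, and the inactivity limit is tracked with explicit timestamps, whereas the filter above relies on the container invalidating the HttpSession.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical, servlet-free sketch: one stored session per
// (remote address + user agent), reusable until it has been idle too long.
final class BotSessionCache<S> {

    private static final class Entry<S> {
        final S session;
        final long lastUsedMillis;

        Entry(S session, long lastUsedMillis) {
            this.session = session;
            this.lastUsedMillis = lastUsedMillis;
        }
    }

    private final Map<String, Entry<S>> entries = new HashMap<>();
    private final long maxInactiveMillis;

    BotSessionCache(int maxInactiveSeconds) {
        this.maxInactiveMillis = maxInactiveSeconds * 1000L;
    }

    // Same key shape as the filter: IP followed by the lower-cased user agent.
    static String key(String remoteAddr, String userAgent) {
        return remoteAddr + userAgent.toLowerCase(Locale.ROOT);
    }

    // Returns the stored session if it was used recently enough (and refreshes
    // its timestamp); otherwise stores and returns the fresh one.
    synchronized S reuseOrStore(String key, S fresh, long nowMillis) {
        Entry<S> existing = entries.get(key);
        if (existing != null && nowMillis - existing.lastUsedMillis <= maxInactiveMillis) {
            entries.put(key, new Entry<>(existing.session, nowMillis));
            return existing.session;
        }
        entries.put(key, new Entry<>(fresh, nowMillis));
        return fresh;
    }

    public static void main(String[] args) {
        BotSessionCache<String> cache = new BotSessionCache<>(5 * 60);
        String k = key("192.0.2.10", "Googlebot/2.1");
        long now = System.currentTimeMillis();
        System.out.println(cache.reuseOrStore(k, "session-A", now));
        // One second later the same bot gets the stored session back.
        System.out.println(cache.reuseOrStore(k, "session-B", now + 1000));
    }
}
```

Keying on IP plus user agent means two different bots behind the same address do not share a session, at the cost of one entry per distinct combination; a long-running deployment would also want to evict stale entries, which this sketch does not do.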

And before somebody tries to help: the problem has been submitted multiple times to Google via the Webmaster interface. The crawling interval has been lowered to the lowest possible setting, and the problem spawned a thread with three replies on the appropriate forum, without any hint as to why this problem exists.
