Express.js / Node.js: serving different data to Google (and other) bots vs. human traffic

Posted on 2024-12-06 03:53:27

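I want to determine whether an incoming request comes from a bot (e.g. Google, Bing) or from a human, and serve different data to each: for example, JSON data that client-side JavaScript uses to build the site, or pre-rendered HTML.

With Express.js, is there a simple way to do this? Thanks.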


电影里的梦 2024-12-13 03:53:27

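You can check whether the User-Agent header is "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)". If it is, you know the request is from Google and can send it different data.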

http://www.google.com/support/webmasters/bin/answer.py?answer=1061943

How to read a request header:
http://expressjs.com/4x/api.html#req.get
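A minimal sketch of the User-Agent check as Express-style middleware, using `req.get()` from the link above. Assumptions: the regex (matching the `Googlebot` token quoted above plus `bingbot`) is illustrative, not an exhaustive crawler list, and `isKnownBotUserAgent`, `detectBotByUserAgent` and `res.locals.isBot` are hypothetical names. Because the header is set by the client, this only tells you what a request *claims* to be.

```javascript
// Illustrative pattern: the "Googlebot" token quoted above and "bingbot".
// Not an exhaustive list of crawler User-Agent strings.
const BOT_UA_PATTERN = /\b(googlebot|bingbot)\b/i;

// Returns true when the User-Agent string *claims* to be a known crawler.
function isKnownBotUserAgent(userAgent) {
  return typeof userAgent === 'string' && BOT_UA_PATTERN.test(userAgent);
}

// Express-style middleware (hypothetical name). req.get() reads a request
// header case-insensitively; the result is stored on res.locals so later
// route handlers can branch on it.
function detectBotByUserAgent(req, res, next) {
  res.locals.isBot = isKnownBotUserAgent(req.get('User-Agent'));
  next();
}
```

Register it with `app.use(detectBotByUserAgent)` before your routes, then branch on `res.locals.isBot` to send either pre-rendered HTML or the JSON your client-side code consumes.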


↙厌世 2024-12-13 03:53:27

I recommend responding according to the requested MIME type (given in the "Accept" header). You can do this with Express as follows:

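app.get('/route', function (req, res) {
    // res.format() dispatches on the request's Accept header;
    // req.is() would check the request's Content-Type instead.
    res.format({
        json: function () { res.json(data); },
        html: function () { res.render('view', {}); },
        default: function () { res.status(406).send('Not Acceptable'); }
    });
});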


年少掌心 2024-12-13 03:53:27

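Checking the User-Agent request header or the MIME type, as suggested above, is not reliable, since any HTTP GET request can set the User-Agent and other headers to whatever it likes.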

The most reliable and secure approach is to check by IP.

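Therefore I developed an NPM package that does exactly that. At startup it loads all known IP ranges used by Google's bots and crawlers into memory, so the middleware check is very fast.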

const express = require('express')
const isGCrawler = require('express-is-googlecrawler')

const app = express()
app.use(isGCrawler)

app.get('/', (req, res) => {
  res.send(res.locals.isGoogleCrawler) // Boolean
})

app.listen(3000)
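The package's internals are not shown here. As a rough sketch of the same idea (CIDR ranges parsed once at startup, then an in-memory lookup per request), Node's built-in `net.BlockList` can do the matching. Assumptions: the ranges are placeholder documentation addresses (RFC 5737 / RFC 3849), not Google's, and `isCrawlerIp`, `detectCrawlerByIp` and `res.locals.isCrawler` are hypothetical names.

```javascript
const net = require('net');

// PLACEHOLDER ranges from the documentation address blocks. These are NOT
// Google's crawler ranges; load the real, current list at startup instead.
const CRAWLER_CIDRS = ['192.0.2.0/24', '198.51.100.0/24', '2001:db8::/32'];

// Parse the CIDR strings once into an in-memory BlockList.
function buildCrawlerBlockList(cidrs) {
  const list = new net.BlockList();
  for (const cidr of cidrs) {
    const [address, prefix] = cidr.split('/');
    const type = net.isIPv6(address) ? 'ipv6' : 'ipv4';
    list.addSubnet(address, Number(prefix), type);
  }
  return list;
}

const crawlerRanges = buildCrawlerBlockList(CRAWLER_CIDRS);

// Returns true if the client IP falls inside one of the loaded ranges.
function isCrawlerIp(ip) {
  if (typeof ip !== 'string') return false;
  // IPv4 clients may appear as IPv4-mapped IPv6, e.g. "::ffff:192.0.2.1".
  const mapped = ip.startsWith('::ffff:') ? ip.slice(7) : null;
  if (mapped && net.isIPv4(mapped)) return crawlerRanges.check(mapped, 'ipv4');
  if (net.isIPv4(ip)) return crawlerRanges.check(ip, 'ipv4');
  if (net.isIPv6(ip)) return crawlerRanges.check(ip, 'ipv6');
  return false;
}

// Express-style middleware (hypothetical name). req.ip is the socket's
// remote address unless the 'trust proxy' setting is enabled.
function detectCrawlerByIp(req, res, next) {
  res.locals.isCrawler = isCrawlerIp(req.ip);
  next();
}
```

Register it with `app.use(detectCrawlerByIp)`. Behind a reverse proxy, `req.ip` only reflects the real client once Express's `trust proxy` setting is configured; otherwise every request appears to come from the proxy.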

