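This article was collected from the internet; copyright belongs to the original author. Please contact us promptly about any infringement.

3 Caching Tools

Published 2024-09-08 18:09:29

Table: Comparison of varnish and squid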

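	varnish	squid
Stability	Higher	Often needs to be restarted
Access speed	Faster. Varnish uses a "Visual Page Cache" technique, and all cached data is read directly from memory	Reads from disk
Concurrent connections	Higher. Varnish releases TCP connections faster than Squid
Cache purging	Parts of the cache can be purged in bulk with regular expressions through the management port
Resource overhead	Higher CPU, I/O and memory overhead under high concurrency
Persistence	The cache is emptied on restart

Note: what they have in common is that both are open-source reverse proxy servers.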

varnish

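Varnish is a high-performance, open-source reverse proxy server and HTTP accelerator. Its developer, Poul-Henning Kamp, is one of the core developers of FreeBSD. Varnish uses a new software architecture designed to work closely with modern hardware.
Varnish is a lightweight cache and reverse proxy. Its main strengths are an advanced design philosophy and a mature design framework.

What is varnish?

It is a reverse HTTP proxy, sometimes called an HTTP accelerator or web accelerator.
Varnish stores files or fragments of files in memory so that they can be served quickly.
Varnish is essentially a key/value store that uses the URL as the key (by default the Host header is hashed together with the URL).
Varnish is designed for modern hardware, modern operating systems and modern workloads.
Varnish can serve as a CDN building block, i.e. a caching and access proxy.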
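To make the key/value idea concrete, the following is a minimal Python sketch, not Varnish code. It assumes the default key that the builtin vcl_hash builds (the URL plus the Host header, or the server IP when there is no Host header); cache_key, lookup_or_fetch and the dict-based store are hypothetical names, and Varnish computes its internal digest differently, so these hashes will not match real ones.

```python
import hashlib


def cache_key(url, host=None, server_ip="127.0.0.1"):
    """Toy version of the builtin vcl_hash: the URL plus the Host header,
    or the server IP when the request has no Host header."""
    digest = hashlib.sha256()
    digest.update(url.encode())
    digest.update(b"\0")  # separator so "/a"+"bc" and "/ab"+"c" differ (toy choice)
    digest.update((host if host else server_ip).encode())
    return digest.hexdigest()


def lookup_or_fetch(cache, url, host, fetch):
    """Return (body, hit): serve from the dict when the key exists, else fetch and store."""
    key = cache_key(url, host)
    if key in cache:
        return cache[key], True
    body = fetch(url)
    cache[key] = body
    return body, False


if __name__ == "__main__":
    store = {}
    backend = lambda url: "content of " + url  # hypothetical backend
    print(lookup_or_fetch(store, "/index.html", "example.com", backend))  # ('content of /index.html', False)
    print(lookup_or_fetch(store, "/index.html", "example.com", backend))  # ('content of /index.html', True)
```

Because the host is part of the key, the same path on two virtual hosts is cached as two separate objects.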

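Features of varnish:

1. Caches in memory, so cached data is lost on restart.
2. Uses virtual memory, which gives good I/O performance.
3. Supports precise cache times of 0 to 60 seconds.
4. VCL makes configuration management flexible.
5. Powerful management tools, such as top, stat, admin and list.
6. A cleverly designed state machine with a clear structure.
7. Uses a binary heap to manage cached objects, so expired objects can be removed proactively.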
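Item 7 can be illustrated with a small Python model built on heapq: because the object with the earliest expiry time always sits at the root of a binary heap, an expiry loop only has to inspect the top entry. This is a sketch under that assumption, not Varnish's implementation; ExpiryHeap and its methods are hypothetical.

```python
import heapq


class ExpiryHeap:
    """Toy model of heap-based expiry: the soonest-expiring object is at the top,
    so the expiry loop only ever has to look at heap[0]."""

    def __init__(self):
        self._heap = []   # (expires_at, key) pairs, ordered by expiry time
        self._store = {}  # key -> (expires_at, value)

    def insert(self, key, value, now, ttl):
        expires_at = now + ttl
        self._store[key] = (expires_at, value)
        heapq.heappush(self._heap, (expires_at, key))

    def get(self, key):
        entry = self._store.get(key)
        return None if entry is None else entry[1]

    def expire(self, now):
        """Pop every entry whose expiry time has passed; return the removed keys."""
        removed = []
        while self._heap and self._heap[0][0] <= now:
            expires_at, key = heapq.heappop(self._heap)
            current = self._store.get(key)
            # Ignore stale heap entries left behind when a key was re-inserted.
            if current is not None and current[0] == expires_at:
                del self._store[key]
                removed.append(key)
        return removed
```

Each insert and each removal costs O(log n), and checking whether anything has expired is O(1), which is why a heap suits proactive expiry.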

User guide

Installation

$ sudo yum install varnish

# Start the service
$ sudo service varnish start
Redirecting to /bin/systemctl start varnish.service

Commands

$ ps -ef|grep -v grep |grep varnish
varnish   1451     1  0 09:16 ?        00:00:00 /usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,256m
varnish   1461  1451  0 09:16 ?        00:00:00 /usr/sbin/varnishd -a :6081 -f /etc/varnish/default.vcl -s malloc,256m

$ varnishd --help
Usage: varnishd [options]

Basic options:
  -a address[:port][,proto]    # HTTP listen address and port
     [,user=<u>][,group=<g>]   # Can be specified multiple times.
     [,mode=<m>]               #   default: ":80,HTTP"
                               # Proto can be "PROXY" or "HTTP" (default)
                               # user, group and mode set permissions for
                               #   a Unix domain socket.
  -b [addr[:port]|path]        # Backend address and port
                               #   or socket file path
                               #   default: ":80"
  -f vclfile                   # VCL program
                               # Can be specified multiple times.
  -n dir                       # Working directory

-b can be used only once, and not together with -f

Documentation options:
  -?                           # Prints this usage message
  -x parameter                 # Parameter documentation
  -x vsl                       # VSL record documentation
  -x cli                       # CLI command documentation
  -x builtin                   # Builtin VCL program
  -x optstring                 # List of getopt options

Operations options:
  -F                           # Run in foreground
  -T address[:port]            # CLI address
                               # Can be specified multiple times.
  -M address:port              # Reverse CLI destination
                               # Can be specified multiple times.
  -P file                      # PID file
  -i identity                  # Identity of varnish instance
  -I clifile                   # Initialization CLI commands

Tuning options:
  -t TTL                       # Default TTL
  -p param=value               # set parameter
                               # Can be specified multiple times.
  -s [name=]kind[,options]     # Storage specification
                               # Can be specified multiple times.
                               #   -s default (=malloc)
                               #   -s malloc
                               #   -s file
  -l vsl                       # Size of shared memory log
                               #   vsl: space for VSL records [80m]

Security options:
  -r param[,param...]          # Set parameters read-only from CLI
                               # Can be specified multiple times.
  -S secret-file               # Secret file for CLI authentication
  -j jail[,options]            # Jail specification
                               #   -j unix
                               #   -j none

Advanced/Dev/Debug options:
  -d                           # debug mode
                               # Stay in foreground, CLI on stdin.
  -C                           # Output VCL code compiled to C language
  -V                           # version
  -h kind[,options]            # Hash specification
  -W waiter                    # Waiter implementation

Note: the main startup options are -a address[:port] (listen address) and -f vclfile (VCL program).

Related binaries:

$ ls /usr/bin/varnish*
varnishadm   varnishhist  varnishlog   varnishncsa  varnishstat  varnishtest  varnishtop

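varnishd: the varnish server daemon.
varnishadm: a command-line console for running management commands, e.g. loading a configuration file (vcl.load) or listing loaded configurations (vcl.list).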

$ varnishadm
200
-----------------------------
Varnish Cache CLI 1.0
-----------------------------
Linux,4.18.0-147.5.1.el8_1.aarch64,aarch64,-jnone,-smalloc,-sdefault,-hcritbit
varnish-6.0.8 revision 97e54ada6ac578af332e52b44d2038bb4fa4cd4a

Type 'help' for command list.
Type 'quit' to close CLI session.

varnish> vcl.load superset /home/ai/projects/config/superset.vcl
varnish> vcl.use superset
# default is the stock configuration; superset is the configuration added in this session
varnish> vcl.list
200
active      auto/warm          0 boot
available   auto/warm          0 default
available   auto/warm          0 superset

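varnishstat: shows varnish runtime statistics, such as cache hits, cache misses, number of requests and cache size.
varnishlog: shows the log; a single request produces two log records. Example: varnishlog -I log
varnishtop: shows a continuously updated list of the most frequent log entries.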
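As an example of using varnishstat output, the sketch below computes a hit ratio from the MAIN.cache_hit and MAIN.cache_miss counters in `varnishstat -j` JSON. It assumes each counter is an object with a "value" field; newer releases appear to nest the counters under a "counters" key, so the code accepts both layouts. hit_ratio is a hypothetical helper, and hit / (hit + miss) is only one common definition of hit rate.

```python
import json


def hit_ratio(varnishstat_json):
    """Compute cache_hit / (cache_hit + cache_miss) from `varnishstat -j` output."""
    data = json.loads(varnishstat_json)
    # Assumed layouts: counters at the top level (older releases) or under "counters".
    counters = data.get("counters", data)
    hits = counters["MAIN.cache_hit"]["value"]
    misses = counters["MAIN.cache_miss"]["value"]
    total = hits + misses
    return hits / total if total else 0.0
```

In practice the JSON could come from `subprocess.run(["varnishstat", "-j"], capture_output=True, text=True).stdout`, assuming varnishstat is allowed to read the varnishd shared memory.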

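Varnish responses

Varnish adds three response headers: "X-Varnish", "Via" and "Age".

X-Varnish: followed by one or two numbers. A single number means varnish did not find the request in the cache; the number is the ID varnish assigned to this request. Two numbers mean the request was a cache hit: the first is the ID of the current request and the second is the ID of the cached object.
Age: how long (in seconds) the response has been in the cache. The first request has an "Age" of 0, and repeated requests see a growing Age. If later requests do not increase "Age", varnish did not cache the response.
Via: shows that the request passed through a proxy, and carries the varnish version.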
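The header rules above can be turned into a small Python helper, shown here as a sketch: classify is a hypothetical name, and it only parses header values, it does not talk to Varnish. The sample values come from the favicon response in this article, where two numbers in X-Varnish indicate a hit.

```python
def classify(headers):
    """Interpret X-Varnish and Age: two IDs mean a cache hit, one ID means a miss."""
    ids = [int(part) for part in headers["X-Varnish"].split()]
    if len(ids) not in (1, 2):
        raise ValueError("unexpected X-Varnish value: %r" % headers["X-Varnish"])
    return {
        "hit": len(ids) == 2,
        "request_id": ids[0],
        "object_id": ids[1] if len(ids) == 2 else None,
        "age_seconds": int(headers.get("Age", "0")),  # time spent in the cache
    }


if __name__ == "__main__":
    print(classify({"X-Varnish": "327735 163861", "Age": "346"})["hit"])  # True
```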

Example varnish response

$ curl -I "http://$host/static/assets/images/favicon.png"
HTTP/1.1 200 OK
Content-Length: 2242
Content-Type: image/png
Last-Modified: Mon, 17 Jan 2022 10:12:45 GMT
Cache-Control: public, max-age=31536000
Expires: Thu, 19 Jan 2023 02:59:34 GMT
ETag: "1642414365.1485384-2242-1447828744"
Date: Wed, 19 Jan 2022 02:59:34 GMT
Access-Control-Allow-Origin: *
Server: Werkzeug/1.0.1 Python/3.8.11
X-Varnish: 327735 163861
Age: 346
Via: 1.1 varnish (Varnish/6.0)
X-Cache: HIT Via ecs-4ed9
Accept-Ranges: bytes
Connection: keep-alive

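Varnish configuration file

VCL (Varnish Configuration Language) is varnish's configuration language and defines its caching policy. VCL syntax is simple and resembles C and Perl. The main points:

Blocks are delimited by curly braces, statements end with a semicolon, and comments start with '#'.
VCL uses the assignment operator "=", the comparison operator "==" and the logical operators "!", "&&" and "||"; it also supports regular expressions and ACL matching with "~".
VCL has no user-defined variables; instead you set values on the backend, request or object with the set keyword, e.g. set req.backend_hint = director_employeeui; (in VCL 3.0 the variable was req.backend).
In VCL 3.0 two strings were concatenated by writing them next to each other with no operator; since VCL 4.0 the + operator is used, e.g. "HIT Via " + server.hostname.
The backslash (\) has no special meaning in VCL strings, which differs slightly from other languages; regular expressions therefore do not need doubled backslashes.

VCL can set any HTTP header with the set keyword and remove headers with the unset keyword (older VCL versions also accepted remove).
VCL has if/else statements but no loops.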

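Before a VCL policy goes into effect, the management process translates it into C code, which a C compiler (such as gcc) then compiles into a binary. Once compiled, management links it into the varnish instance, i.e. the child process. Because compilation happens outside the child process, there is no risk of loading a malformed VCL into it. Changing the configuration is therefore very cheap: varnish can keep several older configurations that are still referenced, and a new configuration can take effect immediately. Compiled old configurations are normally discarded only when varnish restarts; to clean them up by hand, use the varnishadm vcl.discard command.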
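The load/switch/discard cycle described above can be scripted. This Python sketch shells out to varnishadm, assuming it is on PATH, has access to the CLI secret (often requiring root), and that `vcl.list` prints the four-column 6.0 layout seen in the varnishadm session earlier (status, state, busy count, name); other versions may print more columns. reload_vcl and the reload_<timestamp> naming are hypothetical, and the run parameter exists only so the function can be tested without a running varnishd.

```python
import subprocess
import time


def varnishadm(*args, run=subprocess.run):
    """Run one varnishadm command and return its output (raises on non-zero exit)."""
    result = run(["varnishadm", *args], capture_output=True, text=True, check=True)
    return result.stdout


def reload_vcl(path, name=None, run=subprocess.run):
    """Load and activate a new VCL, then discard idle inactive configurations.

    Assumes the 6.0 `vcl.list` layout: status, state, busy count, name.
    """
    name = name or "reload_%d" % int(time.time())
    varnishadm("vcl.load", name, path, run=run)  # compiled outside the child process
    varnishadm("vcl.use", name, run=run)         # new requests now use this VCL
    discarded = []
    for line in varnishadm("vcl.list", run=run).splitlines():
        fields = line.split()
        # Only discard configurations that are inactive and not busy.
        if len(fields) == 4 and fields[0] == "available" and fields[2] == "0":
            varnishadm("vcl.discard", fields[3], run=run)
            discarded.append(fields[3])
    return name, discarded
```

For example, `reload_vcl("/etc/varnish/default.vcl")` loads and activates the file, then discards inactive configurations whose busy count is 0, leaving busy ones alone.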

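The following flowchart shows how Varnish processes an HTTP request:

[Figure: varnish1, Varnish HTTP request-processing flowchart]

Varnish processes an HTTP request in roughly the following steps: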

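Receive state: the entry point of request processing. Based on the VCL rules, varnish decides whether the request should be passed (pass), piped (pipe) or looked up in the local cache (lookup).
Lookup state: varnish searches the hash table for the object; if it is found, processing moves to the Hit state, otherwise to the Miss state.
Pass state: the request is sent to the backend, i.e. processing moves to the Fetch state.
Fetch state: varnish sends the request to the backend, receives the data and, unless the request was passed, stores it in the cache.
Deliver state: the data is sent to the client, completing the request.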
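The five states can be traced with a toy Python function. It is a simplified model, not Varnish's state machine: the rule that anything other than GET or HEAD is passed is loosely based on the builtin VCL, pipe is not modeled, the URL alone serves as the key, and handle is a hypothetical name.

```python
def handle(req, cache, fetch):
    """Toy walk through recv -> lookup -> hit/miss -> fetch -> deliver.

    Returns (trace, body). Pipe is not modeled.
    """
    trace = ["recv"]
    # recv: toy rule, only GET/HEAD are looked up; everything else is passed.
    if req["method"] not in ("GET", "HEAD"):
        trace += ["pass", "fetch", "deliver"]
        return trace, fetch(req)  # passed responses are not stored
    trace.append("lookup")
    key = req["url"]  # simplified hash key
    if key in cache:
        trace += ["hit", "deliver"]
        return trace, cache[key]
    trace += ["miss", "fetch"]
    body = fetch(req)
    cache[key] = body  # the fetch stores the object for later hits
    trace.append("deliver")
    return trace, body
```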

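VCL objects: req (request), bereq (backend request), beresp (backend response), resp (response), obj (cached object, read-only)

Built-in VCL subroutines (prefixed with vcl_): vcl_init vcl_recv vcl_pipe vcl_hash vcl_hit vcl_miss vcl_pass vcl_fetch vcl_deliver (vcl_fetch is from Varnish 3; since 4.0 its role is taken by vcl_backend_fetch and vcl_backend_response, as used in the example below)

Built-in VCL variables: see "VCL-Variables — Varnish version 7.0.1 documentation" (varnish-cache.org)

Example VCL configuration file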

$ cat /etc/varnish/default.vcl
#
# This is an example VCL file for Varnish.
#
# It does not do anything by default, delegating control to the
# builtin VCL. The builtin VCL is called when there is no explicit
# return statement.
#
# See the VCL chapters in the Users Guide at https://www.varnish-cache.org/docs/
# and https://www.varnish-cache.org/trac/wiki/VCLExamples for more examples.

# Marker to tell the VCL compiler that this VCL has been adapted to the
# new 4.0 format.
vcl 4.0;

# Default backend definition. Set this to point to your content server.
backend default {
    .host = "127.0.0.1";
    .port = "80";
}

# Custom backend. Several backends can also be grouped together, and backends can be health-checked with a probe.
backend java {
    .host = "127.0.0.1";
    .port = "8080";
    .probe = {
        .url = "/";
        .timeout = 1s;
        .interval = 5s;
        .window = 5;
        .threshold = 3;
    }
}

sub vcl_recv {
    # Happens before we check if we have this in cache already.
    # Called at the start of a request: decides whether to handle it, how to handle it, and which backend to use.
    # Typically you clean up the request here, removing cookies you don't need,
    # rewriting the request, etc.
    if (req.url ~ "^/java/") {
        set req.backend_hint = java;
    } else {
        set req.backend_hint = default;
    }

    # Strip cookies from static-file requests so they can be cached; by default, requests with cookies are not cached.
    if (req.method == "GET" && req.url ~ "\.(js|css|html|jpg|png|gif|swf|jpeg|ico)$") {
        unset req.http.cookie;
    }
}

sub vcl_backend_response {
    # Happens after we have read the response headers from the backend.
    #
    # Here you clean the response headers, removing silly Set-Cookie headers
    # and other mistakes your backend does.
}

sub vcl_deliver {
    # Happens when we have all the pieces we need, and are about to send the
    # response to the client.
    #
    # You can do accounting or modifying the final object here.
    # Add a header that shows whether the request was a cache hit
    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT Via " + server.hostname;
    } else {
        set resp.http.X-Cache = "MISS";
    }
    return (deliver);
}
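Because the vcl_recv above relies on regular expressions, it can help to check the patterns before deploying them. This Python sketch reproduces the two matches with the re module, assuming Python's engine behaves like Varnish's PCRE for these simple patterns; recv_decision is a hypothetical helper that only mirrors the decision logic.

```python
import re

# Same patterns as the vcl_recv example; "~" in VCL is a PCRE-style regex match.
JAVA_PATH = re.compile(r"^/java/")
STATIC_FILE = re.compile(r"\.(js|css|html|jpg|png|gif|swf|jpeg|ico)$")


def recv_decision(method, url):
    """Return (backend_name, strip_cookie) the way the example vcl_recv would."""
    backend = "java" if JAVA_PATH.search(url) else "default"
    strip_cookie = method == "GET" and STATIC_FILE.search(url) is not None
    return backend, strip_cookie


if __name__ == "__main__":
    for method, url in [("GET", "/java/app.js"), ("GET", "/static/a.png?v=2"), ("POST", "/x.css")]:
        print(method, url, recv_decision(method, url))
```

Note that /static/a.png?v=2 does not match, because the query string follows the extension and the pattern is anchored with $; if cache-busting query strings are used, the pattern would need adjusting.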

Example with multiple backends:

vcl 4.0;

import directors;    # load the directors

backend server1 {
    .host = "192.168.0.10";
}
backend server2 {
    .host = "192.168.0.11";
}

sub vcl_init {
    new bar = directors.round_robin();
    bar.add_backend(server1);
    bar.add_backend(server2);
}

sub vcl_recv {
    # send all traffic to the bar director:
    set req.backend_hint = bar.backend();
}
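The round-robin behavior can be modeled in a few lines of Python. This is a toy, assuming that backends are handed out in turn and that unhealthy ones (as a health probe would report) are skipped; the RoundRobin class and its healthy callback are hypothetical stand-ins, not the vmod_directors implementation.

```python
class RoundRobin:
    """Toy round-robin director: hands out backends in turn, skipping unhealthy ones."""

    def __init__(self):
        self._backends = []
        self._next = 0

    def add_backend(self, backend):
        self._backends.append(backend)

    def backend(self, healthy=lambda name: True):
        # Try each backend at most once per call; None means no healthy backend.
        for _ in range(len(self._backends)):
            candidate = self._backends[self._next % len(self._backends)]
            self._next += 1
            if healthy(candidate):
                return candidate
        return None


if __name__ == "__main__":
    bar = RoundRobin()
    bar.add_backend("server1")
    bar.add_backend("server2")
    print([bar.backend() for _ in range(4)])  # ['server1', 'server2', 'server1', 'server2']
```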

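Advanced topics

When several clients request the same page, varnish sends only one request to the backend server and holds the other requests until the result comes back; it then sends copies of the result to all of them. To deal with requests piling up while they wait, varnish offers two modes: Grace mode and Saint mode.

Grace mode: tells varnish to keep cached objects beyond their TTL, so the old result can be served first.
Saint mode: abandons a page from one backend server and tries to fetch it from another server, or serves old content from the cache.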
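The effect of grace on a single cached object can be sketched as a Python function of its age. This is a simplified model under stated assumptions: TTL and grace are in seconds (in VCL they are typically set with beresp.ttl and beresp.grace in vcl_backend_response), and the background refresh is only described, not implemented; grace_decision is a hypothetical name.

```python
def grace_decision(age, ttl, grace):
    """Toy model of what happens to a cached object of a given age (all in seconds)."""
    if age < ttl:
        return "hit"    # still fresh: serve from cache
    if age < ttl + grace:
        return "grace"  # stale but within grace: serve the old copy, refresh in background
    return "miss"       # beyond grace: fetch from the backend before answering
```

With ttl=120 and grace=600, an object aged 300 seconds is still served from cache instead of making the client wait for the backend.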

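Varnish's default caching policy is conservative (it can be changed through configuration):

By default it caches only GET and HEAD requests; it does not cache requests that carry cookies or authorization headers, nor responses that carry Set-Cookie or a "Vary: *" header.
Varnish also inspects the Cache-Control header of backend responses (Cache-Control sent by clients is ignored by default); its directives control caching behavior. When the max-age in Cache-Control conflicts with the default policy, varnish does not change its caching behavior based on Cache-Control alone.
For example, with Cache-Control: max-age=n (n is a number), if a response from the web server contains max-age, varnish uses that value as the cache expiry time (in seconds); otherwise it uses the configured default (the default_ttl parameter), which is 120 seconds.
The fundamental way to improve the Varnish hit rate is to plan requests and responses carefully and define your own caching policy: use VCL to configure what should be cached and set object TTLs explicitly, relying on HTTP headers as little as possible.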
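The TTL rule from the max-age example can be sketched in Python. This is a simplification, not Varnish's exact algorithm: it assumes s-maxage takes precedence over max-age, ignores Expires, and treats no-cache, no-store and private as uncacheable, loosely following the builtin VCL; ttl_seconds and parse_cache_control are hypothetical helpers, and 120 mirrors the 120-second default mentioned above.

```python
def parse_cache_control(value):
    """Split a Cache-Control header into {directive: argument or None}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def ttl_seconds(cache_control, default_ttl=120):
    """Return the TTL to use, or None when the response should not be cached."""
    d = parse_cache_control(cache_control or "")
    if {"no-cache", "no-store", "private"} & d.keys():
        return None  # treated as uncacheable in this sketch
    for name in ("s-maxage", "max-age"):  # shared-cache directive wins
        if d.get(name) is not None and d[name].isdigit():
            return int(d[name])
    return default_ttl
```

For the favicon response earlier, `ttl_seconds("public, max-age=31536000")` yields one year, while a response without Cache-Control falls back to the default.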

Internals

Varnish system architecture

Figure: Varnish system architecture

Varnish runs two main processes: the Management process and the Child process (also called the Cache process).

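The Management process is the master process. It applies new configurations, compiles VCL, monitors and initializes varnish, and provides a command-line interface. Every few seconds it probes the Child process to check that it is running normally; if the Child does not respond within the configured time, Management restarts it.
The Child process contains several kinds of threads, for example:
- Acceptor thread: accepts new connection requests and responds to them;
- Worker threads: the child process starts a worker thread for each session, so under high concurrency there may be hundreds of worker threads or more;
- Object Expiry thread: removes expired content from the cache;
- Command line thread: the management interface;
- Storage/hashing thread: cache storage;
- Log/stats thread: log management;
- Backend Communication thread: manages backend hosts.

Varnish relies on workspaces to reduce the chance of contention when threads allocate or modify memory. There are several kinds of workspaces inside varnish, the most important being the session workspace, which holds session data.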

squid

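Squid is a high-performance caching proxy server that supports the FTP, gopher, HTTPS and HTTP protocols. Unlike typical caching proxies, Squid handles all client requests with a single, non-modular, I/O-driven process.

Squid caches data objects in memory and on disk, and also caches the results of DNS lookups. Squid supports SSL and access control. Because it uses ICP (Internet Cache Protocol, a lightweight protocol), Squid can build hierarchical proxy arrays that save as much bandwidth as possible.

How Squid caching works

Cache storage layout: each Squid proxy server has several hard disks, each disk is split into several partitions, each partition can hold many directories, and the files (which Squid calls objects) are stored in those directories.

Table: Squid versions (see http://www.squid-cache.org/Versions/ for details)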

Version	First STABLE release Date	Latest Release	Latest Release Date
6	Squid RoadMap
5	20 Jan 2020	5.3	07 Dec 2021
4	02 Jul 2018	4.16	05 Jul 2021
3.5	17 Jan 2015	3.5.28	15 Jul 2018
3.4	09 Dec 2013	3.4.14	01 Aug 2015
3.3	09 Feb 2013	3.3.14	01 May 2015
3.2	14 Aug 2012	3.2.14	01 May 2015
3.1	29 Mar 2010	3.1.23	09 Jan 2013
3.0	13 Dec 2007	STABLE26	28 Aug 2011
2.7	31 May 2008	STABLE9	16 Mar 2010
2.6	01 Jul 2006	STABLE23	17 Sep 2009
2.5	25 Sep 2002	STABLE14	20 May 2006
2.4	20 Mar 2001	STABLE7	02 Jul 2002

Apache Traffic Server (ATS)

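Website: https://trafficserver.apache.org/

Apache Traffic Server (ATS or TS) is a fast, scalable and extensible caching proxy server compliant with HTTP/1.1 and HTTP/2. Formerly a commercial product, it was donated by Yahoo to the Apache Software Foundation and is currently used by several major CDNs and content owners.

ATS is written in C++. It improves network efficiency and performance by caching frequently accessed content at the edge of the network, bringing content geographically closer to end users and delivering it faster while reducing bandwidth usage.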

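ATS features

Caching: caching and reusing frequently requested web pages, images and web service calls improves response times while reducing server load and bandwidth needs.
Proxying: makes it easy to add persistent connections, filters or asynchronous content requests, and a proxy layer can be added for load balancing.
Speed: scales well on modern SMP hardware and can handle tens of thousands of requests per second.
Extensibility: the API supports custom plugins that can modify headers and content or implement new protocol handlers.
Reliability: handles terabytes of data, in both forward and reverse proxy deployments.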

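ATS deployment options

As a reverse proxy: ATS is configured as the origin server that users connect to directly (typically the origin server's hostname resolves to ATS). Reverse proxying is also called server acceleration.
As a web proxy cache: ATS receives users' requests for web content addressed to origin servers. If ATS holds the requested content, it serves it directly. If the content is not in the cache, ATS acts as a proxy: it fetches the content from the origin server on the user's behalf and keeps a local copy to serve identical requests in the future.
In a cache hierarchy: ATS can take part flexibly in multi-level caching. When an Internet request cannot be satisfied by one cache, it is routed to caches in other regions, taking advantage of content held in nearby caches. In a cache hierarchy, ATS can act as a parent or child of other ATS systems or similar caching products.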
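Multi-level caching can be illustrated with a toy Python model in which each tier answers from its own store, asks its parent on a miss, and keeps a copy of what it receives. This is a sketch of the general idea, not ATS configuration or its actual parent-selection behavior; CacheNode, the origin callable and the tier names are hypothetical.

```python
class CacheNode:
    """Toy cache tier: answers from its own store, else asks its parent (or the origin)."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent
        self.origin = origin  # callable used only by the top tier
        self.store = {}

    def get(self, url):
        """Return (body, served_by), where served_by names the tier that had the content."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            body, served_by = self.parent.get(url)
        else:
            body, served_by = self.origin(url), "origin"
        self.store[url] = body  # keep a local copy for future identical requests
        return body, served_by
```

With one parent and two children, the first request for a URL reaches the origin, a request for the same URL through the other child is answered by the parent, and repeats are answered by the child itself.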