Strange HTTP problem/mistake with Lisp

Posted 2024-07-11 16:47:34

I'm attempting to learn a little more about handling sockets and network connections in SBCL; so I wrote a simple wrapper for HTTP. Thus far, it merely makes a stream and performs a request to ultimately get the header data and page content of a website.

Until now, it has worked somewhat decently. Nothing to write home about, but it at least worked.

I have come across a strange problem, however; I keep getting "400 Bad Request" errors.

At first, I was somewhat leery about how I was processing the HTTP requests (more or less passing a request string as a function argument), then I made a function that formats a query string with all the parts I need and returns it for use later... but I still get errors.

What's even more odd is that the errors don't happen every time. If I try the script on a page like Google, I get a "200 Ok" return value... but at other times on other sites, I'll get "400 Bad Request".

I'm certain it's a problem with my code, but I'll be damned if I know exactly what is causing it.

Here is the code that I am working with:

(use-package :sb-bsd-sockets)

(defun read-buf-nonblock (buffer stream)
  (let ((eof (gensym)))
    (do ((i 0 (1+ i))
         (c (read-char stream nil eof)
            (read-char-no-hang stream nil eof)))
        ((or (>= i (length buffer)) (not c) (eq c eof)) i)
      (setf (elt buffer i) c))))

(defun http-connect (host &optional (port 80))
"Create I/O stream to given host on a specified port"
  (let ((socket (make-instance 'inet-socket
                   :type :stream
                   :protocol :tcp)))
    (socket-connect
     socket (car (host-ent-addresses (get-host-by-name host))) port)
    (let ((stream (socket-make-stream socket
                    :input t
                    :output t
                    :buffering :none)))
      stream)))

(defun http-request (stream request &optional (buffer 1024))
"Perform HTTP request on a specified stream"
  (format stream "~a~%~%" request )
  (let ((data (make-string buffer)))
    (setf data (subseq data 0
               (read-buf-nonblock data
                      stream)))
    (princ data)
    (> (length data) 0)))

(defun request (host request)
"formatted HTTP request"
  (format nil "~a HTTP/1.0 Host: ~a" request host))

(defun get-page (host &optional (request "GET /"))
"simple demo to get content of a page"
  (let ((stream (http-connect host)))
    (http-request stream (request host request))))

Comments (2)


浮光之海 2024-07-18 16:47:34

A few things. First, to your concern about the 400 errors you are getting back, a few possibilities come to mind:

  • "Host:" isn't actually a defined header field in HTTP/1.0 (it was introduced with HTTP/1.1), and depending on how strict the web server you are contacting is about standards, it may reject this as a bad request based on the protocol you claim to be speaking.
  • You need a CRLF between your Request-line and each of the header lines. As written, (request) puts the Host header on the same line as the Request-line, separated only by a space.
  • It is possible that your (request) function is returning something for the Request-URI field -- you substitute in the value of request as the contents of this part of the Request-line -- that is bogus in one way or another (badly escaped characters, etc.). Seeing what it is outputting might help out some.
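
To make the CRLF point concrete, here is a minimal sketch of a request builder. The names *crlf* and build-http-request are hypothetical (they are not in the original code), and the sketch assumes an ASCII-compatible character set so that codes 13 and 10 are CR and LF. Headers are passed in by the caller, so whether to send "Host:" with HTTP/1.0 stays your decision.

```lisp
;; Hypothetical helpers for illustration; not part of the original post.
(defparameter *crlf*
  (coerce (list (code-char 13) (code-char 10)) 'string)
  "CR LF line terminator (assumes an ASCII-compatible character set).")

(defun build-http-request (method path &key (version "HTTP/1.0") headers)
  "Return a request string: the Request-Line, then each header line,
each terminated by CRLF, followed by the empty line that ends the
header section. HEADERS is an alist of (name . value) strings."
  (with-output-to-string (out)
    ;; Request-Line: METHOD SP Request-URI SP HTTP-Version CRLF
    (format out "~a ~a ~a~a" method path version *crlf*)
    ;; One "Name: value" line per header, each ended by CRLF.
    (dolist (header headers)
      (format out "~a: ~a~a" (car header) (cdr header) *crlf*))
    ;; The blank line that marks the end of the headers.
    (write-string *crlf* out)))
```

For example, (build-http-request "GET" "/" :headers '(("Host" . "example.com"))) returns the request line and the Host header on separate CRLF-terminated lines, followed by an empty line. The result can be written to the stream with write-string instead of going through "~a~%~%".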

Some other, more general pointers to help you along your way:

  • (read-buf-nonblock) is very confusing. Where is the symbol 'c' defined? Why is 'eof' (gensym)ed and then not assigned any value? It looks very much like a byte-by-byte copy taken straight out of an imperative program and plopped into Lisp. It looks like what you have reimplemented here is (read-sequence). Look up read-sequence in the Common Lisp HyperSpec and see if it is what you need. The other half of this is to make the socket you created non-blocking. This is pretty easy, even though the SBCL documentation is almost silent on the topic. Use this:

    (socket-make-stream socket
                        :input t
                        :output t
                        :buffering :none
                        :timeout 0)

  • The last (let) form of (http-connect) isn't necessary. Just evaluate

    (socket-make-stream socket
                        :input t
                        :output t
                        :buffering :none)

without the let, and http-connect should still return the right value.

  • In (http-request)...

Replace:

 (format stream "~a~%~%" request)
 (let ((data (make-string buffer)))
   (setf data (subseq data 0
                      (read-buf-nonblock data
                                         stream)))
   (princ data)
   (> (length data) 0)))

with

(format stream "~a~%~%" request )
(let ((data (read-buf-nonblock stream)))
    (princ data)
    (> (length data) 0)))

and make (read-buf-nonblock) return the string of data, rather than having it assign within the function. So where you have buffer being assigned, create a variable buffer inside the function and then return it. What you are doing is called relying on "side effects," and it tends to produce more errors, and errors that are harder to find. Use it only when you have to, especially in a language that makes it easy not to depend on them.

  • I mostly like the way get-page is defined. It feels very much in the functional programming paradigm. However, you should change either the name of the (request) function or the name of the variable request. Having both in there is confusing.
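
As a rough sketch of the (read-sequence) suggestion combined with the "return the data instead of filling a caller's buffer" advice: the function name read-all and the chunk-size parameter are hypothetical, not from the original code. It assumes a blocking character stream that eventually reaches end of file, which is what you normally get from an HTTP/1.0 server that closes the connection after responding.

```lisp
;; Hypothetical replacement for read-buf-nonblock; a sketch, not the
;; original author's code. Assumes STREAM is a character input stream
;; that reaches EOF once the server has finished sending.
(defun read-all (stream &optional (chunk-size 1024))
  "Read STREAM until end of file and return the contents as a fresh string."
  (let ((chunk (make-string chunk-size)))
    (with-output-to-string (out)
      ;; read-sequence returns the index of the first element it did
      ;; not fill, so a short read means end of file was reached.
      (loop for end = (read-sequence chunk stream)
            do (write-string chunk out :end end)
            while (= end chunk-size)))))
```

With this, the body of (http-request) could shrink to sending the request and then binding data to (read-all stream), with no buffer argument to manage. Note that read-sequence blocks until the chunk is full or EOF arrives, so this version does not need the non-blocking setup at all.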

Yikes, hands hurt. But hopefully this helps. Done typing. :-)

青衫负雪 2024-07-18 16:47:34

Here's a possibility:

HTTP/1.0 defines the sequence CR LF as the end-of-line marker.

The ~% format directive is generating a #\Newline (LF on most platforms, though see CLHS).

Some sites may be tolerant of the missing CR, others not so much.
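
A quick way to check this at the REPL is to compare character codes. The helper char-codes and the two variables below are hypothetical names used only for this illustration; the sketch assumes an ASCII-compatible implementation such as SBCL, where code 13 is CR and code 10 is LF.

```lisp
;; Hypothetical helper for inspecting what a format string really emits.
(defun char-codes (string)
  "Return the character codes of STRING as a list."
  (map 'list #'char-code string))

;; What ~% emits: a single #\Newline character.
(defparameter *format-newline-codes*
  (char-codes (format nil "~%")))

;; What HTTP expects at the end of each line: CR followed by LF.
(defparameter *crlf-codes*
  (char-codes (coerce (list (code-char 13) (code-char 10)) 'string)))
```

If *format-newline-codes* contains a single code while *crlf-codes* contains two, the request being sent is missing the CR on every line, which matches the behaviour described above.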
