Parsing a binary in Erlang

Posted 2024-10-26 17:11:03

If I have the following binary: <<"GET http://www.google.com HTTP/1.1">>, how can I split it so that I retrieve only the host (http://www.google.com)?

I started with the following:

get_host(<<$G, Rest/binary>>) -> get_host(Rest);
get_host(<<$E, Rest/binary>>) -> get_host(Rest);
get_host(<<$T, Rest/binary>>) -> get_host(Rest);

But I don't know how to continue from here. I was thinking of reversing Rest and starting again from the end of the binary.


記柔刀 2024-11-02 17:11:03

It seems you're trying to implement a minimal parser for HTTP 1.1. Here is one solution that follows the HTTP 1.1 specification and parses the first line of an HTTP request. Without knowing your specific situation, I would in most cases recommend a generic HTTP parser over a simplified "split binary" approach or similar.

1> erlang:decode_packet(http,<<"GET http://www.google.com HTTP/1.1\n">>,[]).  
{ok,{http_request,'GET',
              {absoluteURI,http,"www.google.com",undefined,"/"},
              {1,1}},
<<>>}
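
To pull just the host out of that result, the returned tuple can be pattern matched. The following is a minimal sketch and not part of the original answer: the module name request_host and the function host/1 are hypothetical, and it assumes the http_bin packet type so that the host comes back as a binary instead of a string.

```erlang
-module(request_host).
-export([host/1]).

%% Hypothetical helper: returns the host of an absolute-URI request line.
%% The line must end with a newline, otherwise decode_packet asks for more data.
host(Line) when is_binary(Line) ->
    case erlang:decode_packet(http_bin, Line, []) of
        {ok, {http_request, _Method,
              {absoluteURI, _Scheme, Host, _Port, _Path},
              _Version}, _Rest} ->
            {ok, Host};
        {ok, _Other, _Rest} ->
            %% Parsed, but not a request with an absolute URI
            {error, no_absolute_uri};
        {more, _Length} ->
            %% The request line is not yet terminated
            {error, incomplete};
        {error, Reason} ->
            {error, Reason}
    end.
```

With the input from the shell session above, request_host:host(<<"GET http://www.google.com HTTP/1.1\n">>) should give {ok,<<"www.google.com">>}. Note that the scheme is reported separately, so the result is the host name and not the full URL from the question.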


你对谁都笑 2024-11-02 17:11:03

I would recommend erlang:decode_packet for this, but to show how it can be done, here is a pair of functions that strip the leading "GET " and then return everything up to the first space (but crash if there is no space).

get_host(<<"GET ", Rest/binary>>) ->
    get_host2(Rest, <<>>).

get_host2(<<" ", _/binary>>, Acc) ->
    Acc;
get_host2(<<C, Rest/binary>>, Acc) ->
    get_host2(Rest, <<Acc/binary, C>>).

Basically, I put each byte that is not a space into my "accumulator", and when I find the space I return my accumulator. This is a common trick that is more often seen with lists. (With lists you will want to put new items at the front of the list and reverse the list at the end, to avoid your O(N) algorithm turning into O(N²), but that is not needed for binaries, because the runtime optimizes appending to the end of a binary.)
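
For comparison, the same "everything up to the first space" logic can be written without a hand-rolled loop. This is only a sketch under assumptions: the module name request_target and the function target/1 are hypothetical, and it uses binary:split/2 from the standard library, which splits on the first occurrence of the pattern. Unlike the functions above, it returns an error tuple when there is no space.

```erlang
-module(request_target).
-export([target/1]).

%% Hypothetical helper: strips "GET " and returns the bytes before the next space.
target(<<"GET ", Rest/binary>>) ->
    %% binary:split/2 splits at the first match only
    case binary:split(Rest, <<" ">>) of
        [Target, _AfterSpace] -> {ok, Target};
        [_NoSpace]            -> {error, no_space}
    end;
target(_Other) ->
    {error, not_get}.
```

Calling request_target:target(<<"GET http://www.google.com HTTP/1.1">>) yields {ok,<<"http://www.google.com">>}.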


完美的未来在梦里 2024-11-02 17:11:03

The simple answer (but probably not what you are really asking for)

B = <<"GET http://www.google.com HTTP/1.1">>.
{_, H} = split_binary(B, 4).   % drop the 4-byte "GET " prefix
split_binary(H, 21).           % first element is the 21-byte URL
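
The hard-coded 21 only fits this exact URL. As a rough sketch of how the offset could be computed instead, the version below looks up the position of the next space with binary:match/2 and passes it to split_binary/2. The module name split_at_space and the function url/1 are hypothetical, and the code still assumes a 4-byte method prefix such as "GET ".

```erlang
-module(split_at_space).
-export([url/1]).

%% Hypothetical helper: assumes the first 4 bytes are the method plus a space.
url(Bin) when is_binary(Bin), byte_size(Bin) >= 4 ->
    {_Method, AfterMethod} = split_binary(Bin, 4),
    case binary:match(AfterMethod, <<" ">>) of
        {Pos, _Len} ->
            %% Pos is the zero-based offset of the space, so it is also the URL length
            {Url, _Rest} = split_binary(AfterMethod, Pos),
            {ok, Url};
        nomatch ->
            {error, no_space}
    end.
```

For the binary B above, split_at_space:url(B) returns {ok,<<"http://www.google.com">>}.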


Author: 橘和柠
