What is the best way to parse this in C++?
In my program, I have a list of "server addresses" in the following format:
host[:port]
The brackets here indicate that the port is optional.
host can be a hostname, an IPv4 address, or an IPv6 address (possibly in "bracket-enclosed" notation). port, if present, can be a numeric port number or a service string (like "http" or "ssh").
If port is present and host is an IPv6 address, host must be in "bracket-enclosed" notation (example: [::1]).
Here are some valid examples:
localhost
localhost:11211
127.0.0.1:http
[::1]:11211
::1
[::1]
And some invalid examples:
::1:80 // Invalid: is this the IPv6 address ::1:80 and a default port, or the IPv6 address ::1 and the port 80?
::1:http // This is not ambiguous, but for simplicity's sake, let's consider it forbidden as well.
My goal is to separate such entries into two parts (obviously host and port). I don't care if either the host or the port is invalid, as long as they don't contain a non-bracket-enclosed : (290.234.34.34.5 is OK for host; it will be rejected in a later step). I just want to separate the two parts or, if there is no port part, to know that somehow.
I tried to do something with std::stringstream, but everything I come up with seems hacky and not really elegant.
How would you do this in C++?
I don't mind answers in C, but C++ is preferred. Any Boost solution is welcome as well.
Thank you.
Comments (7)
Have you looked at boost::spirit? It might be overkill for your task, though.
Here's a simple class that uses boost::xpressive to verify the type of IP address; you can then parse the rest to get the results.
Usage:
The header file of the class, IpAddress.h
The source file for the class, IpAddress.cpp
I have only set the rules for IPv4 because I don't know the proper format for IPv6, but I'm pretty sure it's not hard to implement. Boost Xpressive is just a template-based solution and hence does not require any .lib files to be compiled into your exe, which I believe is a plus.
By the way, here is a quick breakdown of the regex syntax used:
^ = start of string
$ = end of string
[] = a set of characters, any one of which can appear
[0-9] = any single digit from 0 to 9
[0-9]+ = one or more digits from 0 to 9
'.' has a special meaning in a regex (it matches any character), but since an IP address contains literal dots, we need to specify that we want a literal '.' between digits by using '\.'. And since C++ string literals need an escape sequence for '\' itself, we have to write "\\."
? = optional component
So, in short, "^[0-9]+$" is a regex that matches an integer.
"^[0-9]+\.$" means an integer that ends with a '.'
"^[0-9]+\.[0-9]?$" matches an integer that ends with a '.', optionally followed by a single digit.
For an integer or a real number, the regex would be "^[0-9]+(\.[0-9]*)?$".
The regex for an integer with 2 to 3 digits is "^[0-9]{2,3}$".
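The patterns above can be tried out with a minimal sketch like the following. It uses std::regex from the standard library rather than boost::xpressive (a substitution made only to keep the example free of dependencies), and the helper name matches_pattern and the constant names are hypothetical. Raw string literals are used so that "\." does not have to be written as "\\.".

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if the whole of `text` matches `pattern`.
// std::regex_match only succeeds on a full match, so ^ and $ are
// redundant here but harmless.
bool matches_pattern(const std::string& text, const std::string& pattern) {
    return std::regex_match(text, std::regex(pattern));
}

// Some of the patterns discussed above, as raw string literals.
const std::string kInteger       = R"(^[0-9]+$)";
const std::string kIntegerOrReal = R"(^[0-9]+(\.[0-9]*)?$)";
const std::string kTwoOrThree    = R"(^[0-9]{2,3}$)";
```

For instance, matches_pattern("3.14", kIntegerOrReal) succeeds, while matches_pattern(".5", kIntegerOrReal) does not, because the pattern requires at least one digit before the dot.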
Now to break down the format of the IP address:
This is equivalent to "^[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]+(\:[0-9]{1,5})?$", which means:
The second regex is simpler than this. It's just an alphanumeric value followed by an optional colon and port number.
By the way, if you would like to test out a regex you can use this site.
Edit: I failed to notice that you could optionally have http instead of a port number. For that you can change the expression to:
This accepts formats like:
127.0.0.1
127.0.0.1:3282
127.0.0.1:http
217.0.0.1:ftp
18.123.2.1:smtp
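As a rough illustration of an expression that accepts the formats listed above, here is a sketch written with std::regex instead of boost::xpressive. It is not the original expression from this answer: the pattern, the struct Ipv4Endpoint and the function parse_ipv4_endpoint are all assumptions made for the example, and it requires C++17 for std::optional. Like the IPv4 rule above, it only checks digit counts, not value ranges.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical result type for this sketch.
struct Ipv4Endpoint {
    std::string host;
    std::string port;  // empty when no port was given
};

// Returns the two parts if `text` looks like "a.b.c.d" or "a.b.c.d:port",
// where port is 1-5 digits or a service name made of letters.
std::optional<Ipv4Endpoint> parse_ipv4_endpoint(const std::string& text) {
    // Group 1: dotted quad. Group 2: optional port or service name.
    static const std::regex pattern(
        R"(^([0-9]{1,3}(?:\.[0-9]{1,3}){3})(?::([0-9]{1,5}|[A-Za-z]+))?$)");
    std::smatch m;
    if (!std::regex_match(text, m, pattern)) return std::nullopt;
    // An unmatched group converts to an empty string.
    return Ipv4Endpoint{m[1].str(), m[2].str()};
}
```

A caller would check the returned optional first and then read host and port; hostnames and IPv6 addresses are deliberately out of scope for this pattern.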
I'm late to the party, but I was googling for just how to do this. Spirit and C++ have grown up a lot, so let me add a 2021 take:
Live On Compiler Explorer
Prints
BONUS
Validating/resolving the addresses. The parsing is 100% unchanged, just using Asio to resolve the results, also validating them:
Prints (limited network; Live On Wandbox and Coliru: http://coliru.stacked-crooked.com/a/497d8091b40c9f2d)
As mentioned, Boost.Spirit.Qi could handle this.
As mentioned, it's overkill (really).
I really don't think this warrants a parsing library; it might not gain in readability because of the overloaded use of ':'. Now, my solution is certainly not flawless (one could, for example, wonder about its efficiency), but I really think it's sufficient, and at least you won't lose the next maintainer, because from experience Qi expressions can be anything but clear!
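For readers who want something concrete, a library-free splitter along these lines might look like the sketch below. It is not the code originally posted with this answer; the names HostPort and split_host_port are hypothetical, and it needs C++17 for std::optional and std::string_view. One assumption is worth stating loudly: an unbracketed string with several colons (such as ::1 or ::1:80) is returned whole as the host with no port, leaving it to the later validation step to accept or reject it.

```cpp
#include <optional>
#include <string>
#include <string_view>

// Hypothetical result type for this sketch.
struct HostPort {
    std::string host;
    std::optional<std::string> port;  // std::nullopt when no port was given
};

// Splits "host[:port]". Returns std::nullopt for malformed input such as
// "[::1", "[]", "host:" or ":80".
std::optional<HostPort> split_host_port(std::string_view s) {
    if (s.empty()) return std::nullopt;

    if (s.front() == '[') {
        // Bracket-enclosed host: everything up to the first ']'.
        const auto close = s.find(']');
        if (close == std::string_view::npos || close == 1) return std::nullopt;
        HostPort result{std::string(s.substr(1, close - 1)), std::nullopt};
        const auto rest = s.substr(close + 1);
        if (rest.empty()) return result;
        // Anything after ']' must be ':' followed by a non-empty port.
        if (rest.front() != ':' || rest.size() == 1) return std::nullopt;
        result.port = std::string(rest.substr(1));
        return result;
    }

    const auto first = s.find(':');
    if (first == std::string_view::npos)   // no colon: host only
        return HostPort{std::string(s), std::nullopt};
    if (first != s.rfind(':'))             // several colons: assume bare IPv6
        return HostPort{std::string(s), std::nullopt};
    if (first == 0 || first + 1 == s.size()) return std::nullopt;
    return HostPort{std::string(s.substr(0, first)),
                    std::string(s.substr(first + 1))};
}
```

Note that the brackets are stripped from the returned host, so [::1]:11211 yields the host ::1 and the port 11211.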
If you are getting the port and host as a string (or, in C++, as an array of characters), you could get the length of the string, loop over it to the end looking for a single colon by itself, and then split the string into two parts at that location.
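The loop described above could be sketched roughly as follows. The function name split_at_single_colon is hypothetical, and the bracket tracking is an added assumption (so that colons inside [::1] are not counted); the suggestion itself only mentions looking for a single colon.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Returns {host, port}. The port is empty unless the string contains
// exactly one colon outside of square brackets.
std::pair<std::string, std::string> split_at_single_colon(const std::string& s) {
    bool in_brackets = false;
    std::size_t colon_count = 0;
    std::size_t colon_pos = 0;
    for (std::size_t i = 0; i < s.size(); ++i) {
        if (s[i] == '[') {
            in_brackets = true;
        } else if (s[i] == ']') {
            in_brackets = false;
        } else if (s[i] == ':' && !in_brackets) {
            ++colon_count;   // count only colons outside brackets
            colon_pos = i;
        }
    }
    if (colon_count != 1) return {s, std::string()};
    return {s.substr(0, colon_pos), s.substr(colon_pos + 1)};
}
```

Unlike a stricter parser, this keeps the brackets in the returned host and does not reject anything; a string with zero or several unbracketed colons simply comes back whole as the host.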
Just a suggestion; it's a bit involved and I'm sure there is a more efficient way, but I hope this helps,
Gale