Trouble with Scala's "repsep" as seen in parser combinators
Please help! I am trying to build a parser to parse SSDP messages as defined in the UPnP protocol. (see "Discovery" section)
Basically it's a header of HTTP OK, followed by name: value pairs, and finally a blank line.
After roughly 5000 combinations, this is the one that I most think "should" work:

    class SsdpParser() extends JavaTokenParsers {
      def fulldoc: Parser[SsdpMessage] = header ~ nameValuePairs <~ "\r\n" ^^ {
        case header ~ values => SsdpMessage(header, values)
      }

      def nameValuePairs:Parser[List[(String, String)]] = repsep(nameValuePair, "\r\n")

      def nameValuePair:Parser[(String, String)] = (
        //name
        ("""[-a-zA-Z0-9.]*""".r <~ ":")
        //value
        ~ """[a-zA-Z0-9:/,_; [].-\"\'\?=]*""".r
      ) ^^ {
        case name ~ value => (name, value)
      }

      def header: Parser[SsdpType] = (notifyLine | okLine | mSearch)
      def notifyLine:Parser[SsdpType] = "NOTIFY * HTTP/1.1\r\n" ^^ (x =>Notify)
      def mSearch:Parser[SsdpType] = "M-SEARCH * HTTP/1.1\r\n" ^^ (x=>Search)
      def okLine:Parser[SsdpType] = "HTTP/1.1 200 OK\r\n" ^^ (x=>OK)
    }
However, when it's run, it looks like it chokes on the very first name/value pair:

    Failure: [3.1] failure: string matching regex \r\n' expected but ' found
    Date: Mon, 28 Mar 2011 06:37:31 GMT
    ^
It is finding a line break, so why isn't this matching? And yes, I have verified that the two characters are \r\n (i.e. 0x0d 0x0a in UTF-8) in both the specification and the actual data received.
A sample data dump is here:

    HTTP/1.1 200 OK
    Cache-Control: max-age=300
    Date: Mon, 28 Mar 2011 06:37:31 GMT
    Ext:
    Location: http://192.168.1.1:1780/InternetGatewayDevice.xml
    Server: POSIX UPnP/1.0 DD-WRT Linux/V24
    ST: urn:schemas-upnp-org:service:WANCommonInterfaceConfig:1
    USN: uuid:01A0DDA7-7404-815B-63C4-539B920D5E56::urn:schemas-upnp-org:service:WANCommonInterfaceConfig:1

    0000 00 50 8d b5 2b df 00 12 17 39 a2 17 08 00 45 00 .P..+....9....E.
    0010 01 83 00 00 40 00 40 11 b5 f0 c0 a8 01 01 c0 a8 ....@.@.........
    0020 01 28 07 6c cf 07 01 6f 7d b8 48 54 54 50 2f 31 .(.l...o}.HTTP/1
    0030 2e 31 20 32 30 30 20 4f 4b 0d 0a 43 61 63 68 65 .1 200 OK..Cache
    0040 2d 43 6f 6e 74 72 6f 6c 3a 20 6d 61 78 2d 61 67 -Control: max-ag
    0050 65 3d 33 30 30 0d 0a 44 61 74 65 3a 20 53 75 6e e=300..Date: Sun
    0060 2c 20 32 37 20 4d 61 72 20 32 30 31 31 20 30 38 , 27 Mar 2011 08
    0070 3a 34 37 3a 33 34 20 47 4d 54 0d 0a 45 78 74 3a :47:34 GMT..Ext:
    0080 20 0d 0a 4c 6f 63 61 74 69 6f 6e 3a 20 68 74 74  ..Location: htt
    0090 70 3a 2f 2f 31 39 32 2e 31 36 38 2e 31 2e 31 3a p://192.168.1.1:
    00a0 31 37 38 30 2f 49 6e 74 65 72 6e 65 74 47 61 74 1780/InternetGat
    00b0 65 77 61 79 44 65 76 69 63 65 2e 78 6d 6c 0d 0a ewayDevice.xml..
    00c0 53 65 72 76 65 72 3a 20 50 4f 53 49 58 20 55 50 Server: POSIX UP
    00d0 6e 50 2f 31 2e 30 20 44 44 2d 57 52 54 20 4c 69 nP/1.0 DD-WRT Li
    00e0 6e 75 78 2f 56 32 34 0d 0a 53 54 3a 20 75 72 6e nux/V24..ST: urn
    00f0 3a 73 63 68 65 6d 61 73 2d 75 70 6e 70 2d 6f 72 :schemas-upnp-or
    0100 67 3a 73 65 72 76 69 63 65 3a 57 41 4e 43 6f 6d g:service:WANCom
    0110 6d 6f 6e 49 6e 74 65 72 66 61 63 65 43 6f 6e 66 monInterfaceConf
    0120 69 67 3a 31 0d 0a 55 53 4e 3a 20 75 75 69 64 3a ig:1..USN: uuid:
    0130 30 31 41 30 44 44 41 37 2d 37 34 30 34 2d 38 31 01A0DDA7-7404-81
    0140 35 42 2d 36 33 43 34 2d 35 33 39 42 39 32 30 44 5B-63C4-539B920D
    0150 35 45 35 36 3a 3a 75 72 6e 3a 73 63 68 65 6d 61 5E56::urn:schema
    0160 73 2d 75 70 6e 70 2d 6f 72 67 3a 73 65 72 76 69 s-upnp-org:servi
    0170 63 65 3a 57 41 4e 43 6f 6d 6d 6f 6e 49 6e 74 65 ce:WANCommonInte
    0180 72 66 61 63 65 43 6f 6e 66 69 67 3a 31 0d 0a 0d rfaceConfig:1...
    0190 0a
Comments (2)
As jrudolph said, RegexParsers (and its subclass JavaTokenParsers) skip whitespace by default. You have the option of telling it not to skip whitespace, by overriding skipWhitespace, and you also have the option of telling it what you consider to be whitespace, by overriding the protected val whiteSpace: Regex.

The problem comes from this:

    def nameValuePairs:Parser[List[(String, String)]] = repsep(nameValuePair, "\r\n")

Here, \r\n is being skipped automatically, so it is never found. Once you changed skipWhitespace, you got errors because there's an extra \r\n at the end of the file, so it expects to see another nameValuePair.

You might have better luck with this:
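What follows is only a sketch of one grammar along these lines, not necessarily the exact change the answer had in mind. It assumes whitespace skipping is switched off and treats "\r\n" as a terminator after every pair (rep with <~) instead of a separator between pairs (repsep), so the trailing blank line no longer makes the parser expect another pair. The SsdpType and SsdpMessage definitions are hypothetical stand-ins, since the question does not show them, and the value regex is simplified to "everything up to the end of the line".

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical stand-ins for the types used in the question (not shown there).
sealed trait SsdpType
case object Notify extends SsdpType
case object Search extends SsdpType
case object OK extends SsdpType
final case class SsdpMessage(kind: SsdpType, headers: List[(String, String)])

object SsdpParser extends RegexParsers {
  // Do not skip whitespace automatically, so "\r\n" stays visible to the grammar.
  override def skipWhitespace: Boolean = false

  def crlf: Parser[String] = literal("\r\n")

  // Start line, CRLF, zero or more "Name: value" lines, then the closing blank line.
  def fulldoc: Parser[SsdpMessage] =
    (header <~ crlf) ~ nameValuePairs <~ crlf ^^ {
      case kind ~ pairs => SsdpMessage(kind, pairs)
    }

  // Each pair is terminated by CRLF, so rep stops cleanly at the blank line.
  def nameValuePairs: Parser[List[(String, String)]] = rep(nameValuePair <~ crlf)

  // Simplified value pattern (an assumption): anything up to the end of the line.
  def nameValuePair: Parser[(String, String)] =
    (regex("""[-a-zA-Z0-9.]+""".r) <~ literal(":")) ~ regex("""[^\r\n]*""".r) ^^ {
      case name ~ value => (name, value.trim)
    }

  def header: Parser[SsdpType] = notifyLine | okLine | mSearch
  def notifyLine: Parser[SsdpType] = literal("NOTIFY * HTTP/1.1") ^^^ Notify
  def mSearch: Parser[SsdpType] = literal("M-SEARCH * HTTP/1.1") ^^^ Search
  def okLine: Parser[SsdpType] = literal("HTTP/1.1 200 OK") ^^^ OK
}
```

Calling SsdpParser.parseAll(SsdpParser.fulldoc, text) on a message that ends with a blank line should then succeed. Because nothing is skipped implicitly, the leading space after each colon ends up in the value, which is why the sketch trims it.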
Alternatively, you might remove \r\n altogether and let the parser skip whitespace.
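The other option mentioned in this answer, overriding whiteSpace, can be sketched in the same spirit. The example below is an illustration only: HeaderLines is a hypothetical helper that parses just the name/value lines, and it assumes that only spaces and tabs should be skipped, so that CR and LF remain significant and can still be matched explicitly.

```scala
import scala.util.parsing.combinator.RegexParsers

// Hypothetical helper, not from the post: parses only the "Name: value" lines.
object HeaderLines extends RegexParsers {
  // Skip spaces and tabs only; "\r\n" is no longer swallowed as whitespace.
  override protected val whiteSpace = """[ \t]+""".r

  def crlf: Parser[String] = literal("\r\n")

  // A name, a colon, then the rest of the line as the value.
  def line: Parser[(String, String)] =
    (regex("""[-a-zA-Z0-9.]+""".r) <~ literal(":")) ~ regex("""[^\r\n]*""".r) ^^ {
      case name ~ value => (name, value.trim)
    }

  // Every line ends with CRLF; a final CRLF closes the block.
  def lines: Parser[List[(String, String)]] = rep(line <~ crlf) <~ crlf
}
```

With this setting the space after the colon is skipped automatically, while an empty value such as the one on the Ext line still stops at its own line ending.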
After a quick glance, three suggestions: