Tokenizing quoted strings
I am trying to tokenize strings. As long as there are no quote characters, everything works fine:
string:tokens ("abc def ghi", " ").
["abc","def","ghi"]
But string:tokens/2 does not really help much with quoted strings. It behaves as expected:
string:tokens ("abc \"def xyz\" ghi", " ").
["abc","\"def","xyz\"","ghi"]
I need a function that takes the string to be tokenized, the separator, and the quote character. Something like:
tokens ("abc \"def xyz\" ghi", " ", "\"").
["abc","def xyz","ghi"]
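Now, before I start reinventing the wheel, my question is:
Is there such a function, or a similar one, in the standard library?
Edit:
OK, I wrote my own implementation, but I am still very interested in an answer to the original question. Here is my code so far: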
tokens (String) -> tokens (String, [], [] ).

%% End of input: flush the buffer and strip the surrounding quotes.
tokens ( [], Tokens, Buffer) ->
    lists:map (fun (Token) -> string:strip (Token, both, $") end, Tokens ++ [Buffer] );
tokens ( [Character | String], Tokens, Buffer) ->
    case {Character, Buffer} of
        %% Space: skipped between tokens, kept inside quotes, otherwise ends the token.
        {$\s, [] } -> tokens (String, Tokens, Buffer);
        {$\s, [$" | _] } -> tokens (String, Tokens, Buffer ++ [Character] );
        {$\s, _} -> tokens (String, Tokens ++ [Buffer], [] );
        %% Quote: opens a quoted token, closes one, or ends an unquoted token.
        {$", [] } -> tokens (String, Tokens, "\"" );
        {$", [$" | _] } -> tokens (String, Tokens ++ [Buffer ++ "\""], [] );
        {$", _} -> tokens (String, Tokens ++ [Buffer], "\"");
        %% Any other character is appended to the current token.
        _ -> tokens (String, Tokens, Buffer ++ [Character] )
    end.
Comments (4)
If regular expressions are acceptable in the general case you can use:
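The snippet that belonged to this answer is not preserved in the thread. As a minimal sketch of the idea, assuming the standard `re:split/3` and a pattern that matches a double quote next to a space, it might look like the following. The module name `quote_split` and the function `split/1` are made up for illustration, and this is not necessarily the answerer's exact code.

```erlang
-module(quote_split).
-export([split/1]).

%% Split wherever a double quote sits next to a literal space:
%% either space-quote (an opening quote) or quote-space (a closing quote).
%% {return, list} makes re:split/3 return plain strings instead of binaries.
split(String) ->
    re:split(String, " \"|\" ", [{return, list}]).
```

With the input from the question, `quote_split:split("abc \"def xyz\" ghi")` returns `["abc","def xyz","ghi"]`. Note that this only splits around quotes: unquoted words separated by plain spaces stay together, so `"abc def \"x y\" ghi"` yields `["abc def","x y","ghi"]`.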
You can also use
"\\s\"|\"\\s"
if you want to split on any whitespace instead of just spaces (the backslash is doubled because a single \s inside an Erlang string literal is itself just a space). If you happen to be parsing this from an input file, you may want to use
strip_split/2
from estring.
string:tokens ("abc \"def ghi\" foo.bla", " .\"").
This will tokenize the string on space, period and double quote. Result:
["abc", "def", "ghi", "foo", "bla"]
If you want to preserve the quoted parts, you might want to consider writing a tokenizer/lexer, because regular expressions are not well suited to this job.
You could use the re module. It comes with a split/3 function, whose second argument is a regular expression (you might have to tweak it to remove the empty lists...).
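The example for this answer is not shown in the thread. One way `re:split/3` could be put to work, sketched here under assumptions, is to split on the quote character first and then treat the pieces alternately as "outside quotes" and "inside quotes". The module name `quoted_split` and the helpers `outside/1` and `inside/1` are hypothetical names. Unquoted pieces are passed through `string:tokens/2`, which already drops empty pieces, so no separate filtering of empty lists is needed for them.

```erlang
-module(quoted_split).
-export([tokens/1]).

%% Split on the double quote. The resulting segments alternate:
%% first outside quotes, then inside quotes, then outside again, and so on.
tokens(String) ->
    Segments = re:split(String, "\"", [{return, list}]),
    outside(Segments).

%% A segment outside quotes is tokenized on spaces.
outside([]) -> [];
outside([Segment | Rest]) ->
    string:tokens(Segment, " ") ++ inside(Rest).

%% A segment inside quotes is kept as a single token.
inside([]) -> [];
inside([Segment | Rest]) ->
    [Segment | outside(Rest)].
```

For the string from the question, `quoted_split:tokens("abc \"def xyz\" ghi")` returns `["abc","def xyz","ghi"]`. The sketch assumes balanced quotes and no escaped quotes inside a quoted part; with an unterminated quote, everything after it is treated as one token.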
This is approximately how I would write it (not tested!):
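The answerer's code is not preserved in the thread. As a rough sketch of what a hand-written tokenizer with the call shape asked for in the question might look like, here is one possible version. The module name `qtokens` and the helpers `scan/5`, `quoted/5` and `flush/2` are invented for illustration; the separators are given as a list of characters and the quote as a one-character string.

```erlang
-module(qtokens).
-export([tokens/3]).

%% tokens(String, Separators, QuoteString) -> [Token]
%% Separators is a list of separator characters, e.g. " ".
%% QuoteString is a one-character string, e.g. "\"".
tokens(String, Separators, [Quote]) ->
    scan(String, Separators, Quote, [], []).

%% Outside quotes. Cur is the current token (reversed),
%% Acc holds the finished tokens (reversed).
scan([], _Seps, _Q, Cur, Acc) ->
    lists:reverse(flush(Cur, Acc));
scan([Q | Rest], Seps, Q, Cur, Acc) ->
    %% Opening quote: finish any pending token and switch mode.
    quoted(Rest, Seps, Q, [], flush(Cur, Acc));
scan([C | Rest], Seps, Q, Cur, Acc) ->
    case lists:member(C, Seps) of
        true  -> scan(Rest, Seps, Q, [], flush(Cur, Acc));
        false -> scan(Rest, Seps, Q, [C | Cur], Acc)
    end.

%% Inside quotes: everything up to the closing quote is one token.
quoted([], _Seps, _Q, Cur, Acc) ->
    %% Unterminated quote: keep what was collected (an assumption).
    lists:reverse([lists:reverse(Cur) | Acc]);
quoted([Q | Rest], Seps, Q, Cur, Acc) ->
    scan(Rest, Seps, Q, [], [lists:reverse(Cur) | Acc]);
quoted([C | Rest], Seps, Q, Cur, Acc) ->
    quoted(Rest, Seps, Q, [C | Cur], Acc).

%% Add the current token to the result unless it is empty.
flush([], Acc) -> Acc;
flush(Cur, Acc) -> [lists:reverse(Cur) | Acc].
```

Called as in the question, `qtokens:tokens("abc \"def xyz\" ghi", " ", "\"")` returns `["abc","def xyz","ghi"]`. Tokens are built in reverse and reversed once at the end, which avoids the repeated `++` appends of the version in the question. Escaped quotes inside a quoted part are not handled.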