Splitting a string with a Prolog DCG

Posted on 2024-11-09 05:42:54


I'm trying to use a DCG to split a string into two parts separated by spaces. E.g. "abc def" should give me back "abc" and "def". The program and DCG are below.

main:-
    prompt(_, ''),
    repeat,
    read_line_to_codes(current_input, Codes),
    (
        Codes = end_of_file
    ->
        true
    ;
        processData(Codes),
        fail
    ).

processData(Codes):-
    (
        phrase(data(Part1, Part2), Codes)
    ->
        format('~s, ~s\n', [ Part1, Part2 ])
    ;
        format('Didn''t recognize data.\n')
    ).

data([ P1 | Part1 ], [ P2 | Part2 ]) --> [ P1 | Part1 ], spaces(_), [ P2 | Part2 ].
spaces([ S | S1 ]) --> [ S ], { code_type(S, space) }, (spaces(S1); "").

This works correctly. But I found that having to type [ P1 | Part1 ] and [ P2 | Part2 ] was really verbose. So I tried replacing every instance of [ P1 | Part1 ] with Part1, and likewise [ P2 | Part2 ] with Part2, in the definition of data, i.e. the following.

data(Part1, Part2) --> Part1, spaces(_), Part2.

That's much easier to type, but it gave me an "Arguments are not sufficiently instantiated" error. So it looks like an unbound variable isn't automatically interpreted as a list of codes in a DCG. Is there any other way to make this less verbose? My intent is to use DCGs where I would use regular expressions in other programming languages.


谁的新欢旧爱 2024-11-16 05:42:54


Your intuition is correct. The DCG translation (at least in SWI-Prolog, though other systems should behave similarly) expands your modified version of data into the following:

?- listing(data). 

data(A, D, B, F) :-
    phrase(A, B, C),
    spaces(_, C, E),
    phrase(D, E, F).

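As you can see, the variables Part1 and Part2 in the body of your DCG rule have been translated into calls to phrase/3, i.e. they are treated as non-terminals rather than as lists. Since they are still unbound when phrase/3 is called, you get the instantiation error. You need to state explicitly that they are lists for them to be treated as such.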
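One way to say "this variable is a list of codes" without writing [ P1 | Part1 ] everywhere is a small helper non-terminal that describes an arbitrary sequence. The following is a minimal sketch, not part of the original program: seq//1 is a commonly used idiom, while the names ws//0 and split//2 are made up for this illustration. The [_|_] constraints keep the original requirement that both parts are non-empty.

```prolog
% seq(Xs) describes exactly the list Xs in the input.
seq([]) --> [].
seq([E|Es]) --> [E], seq(Es).

% ws: one or more whitespace codes (hypothetical helper, like spaces//1
% in the question but without collecting the codes).
ws --> [C], { code_type(C, space) }, ( ws ; [] ).

% split(Part1, Part2): two non-empty code lists separated by whitespace.
% split//2 is a hypothetical name used only in this sketch.
split(Part1, Part2) -->
    { Part1 = [_|_], Part2 = [_|_] },
    seq(Part1), ws, seq(Part2).

% ?- atom_codes('abc def', Cs), once(phrase(split(P1, P2), Cs)),
%    atom_codes(A1, P1), atom_codes(A2, P2).
% A1 = abc, A2 = def.
```

Because seq//1 tries shorter lists first, the first solution splits at the first run of whitespace; once/1 (or the if-then-else in processData) discards the alternatives found on backtracking.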

I can suggest a more general alternative. Consider the following set of DCG rules:

data([A|As]) --> 
    spaces(_), 
    chars([X|Xs]), 
    {atom_codes(A, [X|Xs])}, 
    spaces(_), 
    data(As).
data([]) --> [].

chars([X|Xs]) --> char(X), !, chars(Xs).
chars([]) --> [].

spaces([X|Xs]) --> space(X), !, spaces(Xs).
spaces([]) --> [].

space(X) --> [X], {code_type(X, space)}. 
char(X) --> [X], {\+ code_type(X, space)}.

Take a look at the first clause at the top. The data rule now matches zero or more spaces (as many as possible, because of the cut in spaces), then one or more non-space characters, from whose codes it constructs an atom (A), then zero or more spaces again, and finally recurses to find more atoms in the rest of the string (As). What you end up with is a list of the atoms that appeared in the input string, with all whitespace removed. You can incorporate this version into your code with the following:

processData(Codes) :-
    % tokenize the list of codes into a list of atoms, one per word
    (phrase(data(AtomList), Codes) ->
        % join the atoms into a single atom, delimited by commas
        % (atomic_list_concat/3 replaces the deprecated concat_atom/3)
        atomic_list_concat(AtomList, ', ', Atoms),
        writeln(Atoms)
    ;
        format('Didn''t recognize data.\n')
    ).

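This version splits a string into words separated by any amount of whitespace, including whitespace at the start and end of the string. Note that a line consisting only of whitespace is still rejected: the first clause then needs at least one non-space character, and the second clause matches only an empty remainder.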
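If you are on SWI-Prolog, much of this is already available in library(dcg/basics), which is closer to the "regular expression" style you are after. The sketch below is an alternative under that assumption, not the code above: words//1 is a hypothetical name, and nonblanks//1 accepts "graph" characters rather than "anything that is not a space", so the two can differ on control characters. Letting the last clause consume trailing blanks also makes whitespace-only input yield an empty list.

```prolog
:- use_module(library(dcg/basics)).   % SWI-Prolog: blanks//0, nonblanks//1

% words(Ws): Ws is the list of whitespace-separated words, as atoms.
% words//1 is a hypothetical name used only in this sketch.
words([W|Ws]) -->
    blanks,                            % skip leading whitespace (greedy)
    nonblanks(Cs),                     % take a run of visible characters
    { Cs = [_|_], atom_codes(W, Cs) }, % require a non-empty word
    words(Ws).
words([]) --> blanks.                  % only whitespace (or nothing) left

% ?- atom_codes('  abc   def ', Cs), phrase(words(Ws), Cs).
% Ws = [abc, def].
```

It can be dropped into processData in place of data(AtomList).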
