Parsing a string in D

Posted on 2024-12-20 08:27:04

I'm trying to learn D but am struggling with lack of documentation (or my understanding of it), so I came here. I already asked a different but unrelated question earlier today.

Anyway, here goes:

I would like to parse a string for different things.

String format is something like:

[<label>] <mnemonic> [parameters]

If there is no label, there is mandatory whitespace. Parameters can be comma-delimited. Parameter types are dependent on the mnemonic.

I would like to use std.conv: parse from the Phobos library to help, but I can't work out from the documentation how to parse a "word", that is, a run of characters separated by whitespace on either end. It works fine for integers and the like, as in int i = parse!int(line). But if I write string s = parse!string(line), it grabs the entire line.

I could parse this by hand, using char** (or ref string) as the data type, just like I did when I wrote this in C. But I'm learning D precisely so I don't have to.

I tried something like this to do it manually:

string get_word(ref string s)
{
        int i = 0;
        while (i < s.length && isAlphaNum(s[i]))
                i++;

        string word = s[0 .. i];
        s = s[i+1 .. $];
        return word;
}

Is this a good way to do it? Is there a cleaner way? A faster way? A safer way, perhaps? I'm not sure the i+1 index always exists.

Thanks for the help!

My faith in D is slightly dwindling already, as I've run into all sorts of problems. But the path is surely going to be worth it.

Comments (2)

沧桑㈠ 2024-12-27 08:27:04

First of all, std.conv.parse converts the leading part of a string into a value of another type (an int, a double, and so on); it is not a parser in the sense of splitting a string apart and understanding its pieces. How complex a solution you need depends on how complex the grammar of your format is.
Look at std.string.split which, by default, splits your input on whitespace and returns an array of the words.
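
To make the split suggestion concrete, here is a minimal sketch for the line format in the question. The Line struct, its field names and parseLine are made up for illustration; the sketch also assumes a label never contains whitespace and that parameters are separated by commas only.

```d
import std.array : join, split;
import std.ascii : isWhite;
import std.string : strip;

// Hypothetical holder for one parsed line (not part of Phobos).
struct Line
{
    string label;     // stays empty when the line starts with whitespace
    string mnemonic;
    string[] params;
}

Line parseLine(string line)
{
    Line result;
    // The format says: no label means the line begins with whitespace.
    immutable hasLabel = line.length > 0 && !isWhite(line[0]);

    // split() without a separator splits on runs of whitespace.
    auto words = line.split();
    size_t next = 0;
    if (hasLabel && next < words.length)
        result.label = words[next++];
    if (next < words.length)
        result.mnemonic = words[next++];
    if (next < words.length)
    {
        // Glue the rest back together, then cut it at the commas.
        foreach (param; words[next .. $].join(" ").split(","))
            result.params ~= param.strip;
    }
    return result;
}

// Compile with: dmd -unittest -main -run <file>.d
unittest
{
    auto withLabel = parseLine("loop mov r1, r2");
    assert(withLabel.label == "loop" && withLabel.mnemonic == "mov");
    assert(withLabel.params == ["r1", "r2"]);
}
```

Checking the first character before splitting matters here, because split() throws away the leading whitespace that tells a labelled line from an unlabelled one.
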
If the format is too complex, you can:

  1. use regex with captures: http://d-programming-language.org/phobos/std_regex.html#RegexMatch

  2. write your own parser which advances character by character and extracts the info you need.
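
As a rough illustration of the regex option, the sketch below uses std.regex with three capture groups. The pattern and the splitLine helper are my own guess at the grammar (labels and mnemonics made of word characters only), not something taken from the question, so adjust them to the real format.

```d
import std.regex : matchFirst, regex;
import std.string : stripRight;

// Hypothetical helper: splits one line into label, mnemonic and raw parameter text.
// Returns false when the line does not fit the assumed shape.
bool splitLine(string line, out string label, out string mnemonic, out string params)
{
    // Group 1: optional label, group 2: mnemonic, group 3: optional parameter text.
    // The whitespace between groups 1 and 2 is mandatory, as the format requires.
    auto re = regex(r"^(\w+)?\s+(\w+)(?:\s+(.*))?$");
    auto m = matchFirst(line.stripRight, re);
    if (m.empty)
        return false;
    label    = m[1];   // empty when the line starts with whitespace
    mnemonic = m[2];
    params   = m[3];   // empty when nothing follows the mnemonic
    return true;
}

// Compile with: dmd -unittest -main -run <file>.d
unittest
{
    string label, mnemonic, params;
    assert(splitLine("loop mov r1, r2", label, mnemonic, params));
    assert(label == "loop" && mnemonic == "mov" && params == "r1, r2");
}
```

The parameter text comes back as one string; since the parameter types depend on the mnemonic, it can then be cut at the commas and handed to std.conv.parse or std.conv.to piece by piece.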

白鸥掠海 2024-12-27 08:27:04

The code below was written on the fly:

import std.algorithm;
import std.stdio;

// Every separator that can end a token.
enum string[] separators = [ " ", "\t", ",", ";", "\n", "\r\n" ];

// Returns the text before the first separator in s (all of s if there is none).
// s is taken by value: the caller passes slices, and a slice expression cannot bind to ref.
string get_word( string s ){
    // One slot per separator, plus a last slot for the length of the string.
    ptrdiff_t[separators.length + 1] storePositions;
    foreach( i, separator; separators ){             // compute position of each separator
        ptrdiff_t position = countUntil( s, separator );
        if( position == -1 ) position = ptrdiff_t.max; // not found: can never be the minimum
        storePositions[i] = position;
    }
    storePositions[ $ - 1 ] = s.length;
    ptrdiff_t end = reduce!min( storePositions[] );  // nearest separator, or end of string
    string token  = s[0 .. end];
    writefln( "%s | %d", s, end );
    return token;
}

void main(){
    string s        = "a long;string\tyeah\n strange; ok";
    bool   isRunning= true;
    size_t start    = 0;
    writefln( "parse: %s", s );
    while( isRunning ){
        string result = get_word( s[ start .. $] );
        if( result == "" )
            isRunning = false;                     // an empty token (two separators in a row) stops the loop
        else{
            start  += result.length + 1;           // skip the token and the separator after it
            result = get_word( s[ start .. $] );   // the token reported below is the following one
        }
        writefln( "token: %s, position: %d", result, start );
        writeln( "----" );
    }
}

output:

parse: a long;string yeah
 strange; ok
a long;string yeah
 strange; ok | 1
long;string yeah
 strange; ok | 4
token: long, position: 2
----
long;string yeah
 strange; ok | 4
string yeah
 strange; ok | 6
token: string, position: 7
----
string yeah
 strange; ok | 6
yeah
 strange; ok | 4
token: yeah, position: 14
----
yeah
 strange; ok | 4
 strange; ok | 0
token: , position: 19
----
 strange; ok | 0
token: , position: 19
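
On the question's worry about the i+1 index: s[i+1 .. $] is out of range exactly when the word runs to the end of the string, because i then equals s.length. Below is a small sketch of a get_word variant that avoids this. The name getWord and the rule "anything that is not an ASCII letter or digit is a separator" are assumptions made for the example.

```d
import std.ascii : isAlphaNum;

// Returns the leading word of s and advances s to the start of the next word.
// Returns an empty string when s begins with a separator (for example, a line with no label).
string getWord(ref string s)
{
    size_t i = 0;
    while (i < s.length && isAlphaNum(s[i]))
        ++i;

    string word = s[0 .. i];
    s = s[i .. $];                 // always valid, since i <= s.length

    // Drop the separators that follow, one at a time, checking the length first.
    while (s.length > 0 && !isAlphaNum(s[0]))
        s = s[1 .. $];
    return word;
}

// Compile with: dmd -unittest -main -run <file>.d
unittest
{
    string s = "a long;string\tyeah\n strange; ok";
    string[] words;
    while (s.length > 0)
        words ~= getWord(s);
    assert(words == ["a", "long", "string", "yeah", "strange", "ok"]);
}
```

Slicing a string in D does not copy, so this is about as cheap as the C pointer version while keeping the bounds checks.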
