Parsing a string in D

Posted 2024-12-20 08:27:04

I'm trying to learn D but am struggling with lack of documentation (or my understanding of it), so I came here. I already asked a different but unrelated question earlier today.

Anyway, here goes:

I would like to parse a string for different things.

String format is something like:

[<label>] <mnemonic> [parameters]

If there is no label, there is mandatory whitespace. Parameters can be comma-delimited. Parameter types are dependent on the mnemonic.

I would like to use std.conv: parse from the Phobos library to aid me, but I fail to understand the documentation on how to parse a "word", as in, some characters separated by whitespace on either end. It works fine for integers and the like, as in int i = parse!int(line). But if I were to do string s = parse!string(line), it would grab the entire line.

I could parse this by hand, using char** (or ref string) as a datatype, just like I did when I wrote this in C. But I'm learning D so that I don't have to.

I tried something like this to do it manually:

string get_word(ref string s)
{
        int i = 0;
        while (i < s.length && isAlphaNum(s[i]))
                i++;

        string word = s[0 .. i];
        s = s[i+1 .. $];
        return word;
}

Is this a good way to do it? Is there a cleaner way? A faster way? A safer way, perhaps? I'm not sure the i+1 index always exists.

Thanks for the help!

My faith in D is slightly dwindling already, as I've run into all sorts of problems. But the path is surely going to be worth it.

沧桑㈠ 2024-12-27 08:27:04

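First, std.conv.parse is for converting the text at the start of a string into a value of another type (an int, for example), not for parsing in the sense of splitting a string apart and interpreting its pieces. How complex a solution you need depends on how complex the grammar of your format is.
Take a look at std.string.split: by default it splits your input on whitespace and returns an array of words.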
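As a minimal sketch of the whitespace split described above: the sample line and the hasLabel helper are made up for illustration, and the helper only encodes the question's rule that a line without a label starts with whitespace. split is imported here from std.array, which is where current Phobos defines it.

```d
import std.array : split;
import std.ascii : isWhite;
import std.stdio : writeln;

// Hypothetical helper: a line carries a label when its first
// character is not whitespace (the rule stated in the question).
bool hasLabel(string line)
{
    return line.length > 0 && !isWhite(line[0]);
}

void main()
{
    string line = "loop: mov ax, 42";   // made-up sample input

    // Called without a separator, split breaks on runs of whitespace.
    string[] words = line.split();
    writeln(words);                  // ["loop:", "mov", "ax,", "42"]
    writeln(hasLabel(line));         // true
    writeln(hasLabel("    nop"));    // false
}
```

Note that split discards leading whitespace, so the label check has to look at the raw line before splitting, and the commas stay attached to the parameters.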
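If the format is too complex for that, you can:

Use regular expressions with captures: http://d-programming-language.org/phobos/std_regex.html#RegexMatch
Write your own parser that advances character by character and extracts the information you need.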
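The regular-expression option could look roughly like the sketch below. The pattern is an assumption about the grammar in the question (labels and mnemonics made of word characters, everything after the mnemonic treated as raw parameter text), and the Instruction struct and parseLine function are hypothetical names introduced only for this example. It uses matchFirst from std.regex.

```d
import std.regex : matchFirst, regex;
import std.stdio : writefln;

// Hypothetical container for the three parts of a line.
struct Instruction
{
    string label, mnemonic, params;
}

// Returns false when the line does not fit the assumed grammar:
// optional label, mandatory whitespace, mnemonic, optional parameters.
bool parseLine(string line, out Instruction ins)
{
    auto m = matchFirst(line, regex(`^(\w+)?\s+(\w+)\s*(.*)$`));
    if (m.empty)
        return false;
    // m[1] is the label (empty when absent), m[2] the mnemonic,
    // m[3] whatever follows it.
    ins = Instruction(m[1], m[2], m[3]);
    return true;
}

void main()
{
    foreach (line; ["start mov ax, 42", "    nop"])
    {
        Instruction ins;
        if (parseLine(line, ins))
            writefln("label=[%s] mnemonic=[%s] params=[%s]",
                     ins.label, ins.mnemonic, ins.params);
    }
}
```

The parameter text is left unsplit on purpose, since the question says parameter types depend on the mnemonic; a later step could split it on commas once the mnemonic is known.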


白鸥掠海 2024-12-27 08:27:04

The code below was written on the fly.

import std.algorithm : countUntil, min, reduce;
import std.stdio : writefln;

enum string[] separators = [ " ", "\t", ",", ";", "\n", "\r\n" ];

// Returns the leading token of s: everything before the first separator.
// s is taken by value because a slice such as s[ start .. $ ] cannot bind to ref.
string get_word( string s ){
    // one slot per separator, plus a last slot for the length of s
    ptrdiff_t[separators.length + 1] storePositions;
    foreach( i, separator; separators ){             // compute position of each separator
        ptrdiff_t position = countUntil( s, separator );
        if( position == -1 ) position = ptrdiff_t.max;
        storePositions[i] = position;
    }
    storePositions[ $ - 1 ] = s.length;
    ptrdiff_t end = reduce!min( storePositions[] );  // nearest separator, or end of string
    return s[0 .. end];
}

void main(){
    string s     = "a long;string\tyeah\n strange; ok";
    size_t start = 0;
    writefln( "parse: %s", s );
    while( start < s.length ){
        string result = get_word( s[ start .. $ ] );
        // an empty token means two separators in a row; skip it instead of stopping
        if( result.length > 0 )
            writefln( "token: %s, position: %d", result, start );
        start += result.length + 1;                  // step over the token and one separator
    }
}

output:

parse: a long;string	yeah
 strange; ok
token: a, position: 0
token: long, position: 2
token: string, position: 7
token: yeah, position: 14
token: strange, position: 20
token: ok, position: 29
