Tokenizing a custom text file format with C#

Posted on 2024-11-05 09:03:00

I want to parse a text-based file format that has a slightly quirky syntax. Here are a few valid example lines:

<region>sample=piano C3.wav key=48 ampeg_release=0.7 // a comment here
<region>key = 49 sample = piano Db3.wav
<region>
group=1
key = 48
    sample = piano D3.ogg

I think it would be too complicated for me to come up with a regular expression that makes sense of that, but I am wondering if there is a good way of tokenising this type of input without writing my own parser, i.e. something that reads the above input and spits out a stream of 'tokens'. For example, the output for the start of my example format would be something like:

new Region(), new Sample("piano C3.wav"), new Key("48"), new AmpegRelease("0.7"), new Region()
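If you do end up writing the tokenizer yourself, a stream of objects like the one above maps naturally onto C# 9 records. The sketch below assumes that values stay as raw strings (as in `new Key("48")`) and that any header or opcode without a dedicated type falls back to a generic token; `TokenFactory`, `Header` and `Opcode` are hypothetical names, not part of any library.

```csharp
// Hypothetical token types mirroring the output sketched in the question.
// Values are kept as raw strings, as in new Key("48").
public abstract record Token;
public sealed record Region() : Token;
public sealed record Header(string Name) : Token;                // any other <header>
public sealed record Sample(string FileName) : Token;
public sealed record Key(string Value) : Token;
public sealed record AmpegRelease(string Value) : Token;
public sealed record Opcode(string Name, string Value) : Token;  // generic fallback

public static class TokenFactory
{
    // Turns the text between '<' and '>' into a token.
    public static Token FromHeader(string name) => name switch
    {
        "region" => new Region(),
        _ => new Header(name),
    };

    // Turns one name=value pair into a typed token; opcodes without a
    // dedicated record fall back to Opcode.
    public static Token FromOpcode(string name, string value) => name switch
    {
        "sample" => new Sample(value),
        "key" => new Key(value),
        "ampeg_release" => new AmpegRelease(value),
        _ => new Opcode(name, value),
    };
}
```

Records provide value equality, so a tokenizer's output can be compared directly against an expected list such as `new Region(), new Sample("piano C3.wav"), ...` in a unit test.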

Is there a good library / tutorial that would point me in the right direction for an elegant way to implement this?

Update: I tried this with Irony, but the quirks of the syntax I need to parse (in particular the fact that the data following sample= can have a space in it) led them to suggest that I might be better off writing my own code based on String.Split. See the discussion here.
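A minimal sketch of such a String.Split-based tokenizer, assuming that `//` never occurs inside a value, that values never contain `=`, and that opcode names contain no whitespace. `Regex.Split` with a capturing group first separates the `<header>` tags (captured delimiters are kept in the result array), then each remaining chunk is split on `=`. `SplitTokenizer`, `SfzToken` and `TokenKind` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public enum TokenKind { Header, Opcode }

// Hypothetical token shape: for a header, Name is e.g. "region" and Value is "".
public sealed record SfzToken(TokenKind Kind, string Name, string Value = "");

public static class SplitTokenizer
{
    public static IEnumerable<SfzToken> Tokenize(string text)
    {
        foreach (string rawLine in text.Split('\n'))
        {
            // Drop a trailing "//" comment (assumes "//" never occurs in a value).
            int comment = rawLine.IndexOf("//", StringComparison.Ordinal);
            string line = comment >= 0 ? rawLine.Substring(0, comment) : rawLine;

            // A capturing group makes Regex.Split keep the <header> tags.
            foreach (string chunk in Regex.Split(line, "(<[^>]*>)"))
            {
                string piece = chunk.Trim();
                if (piece.Length == 0) continue;

                if (piece[0] == '<')
                    yield return new SfzToken(TokenKind.Header, piece.Trim('<', '>'));
                else
                    foreach (SfzToken t in SplitOpcodes(piece)) yield return t;
            }
        }
    }

    // "sample=piano C3.wav key=48" splits on '=' into
    // ["sample", "piano C3.wav key", "48"]: the last word of every middle
    // part is the next opcode name and the rest of it is the current value.
    private static IEnumerable<SfzToken> SplitOpcodes(string piece)
    {
        string[] parts = piece.Split('=');
        string name = parts[0].Trim();
        for (int i = 1; i < parts.Length; i++)
        {
            string part = parts[i].Trim();
            string value = part;
            string nextName = "";
            if (i < parts.Length - 1)
            {
                int cut = part.LastIndexOfAny(new[] { ' ', '\t' });
                value = cut >= 0 ? part.Substring(0, cut).TrimEnd() : "";
                nextName = part.Substring(cut + 1);
            }
            yield return new SfzToken(TokenKind.Opcode, name, value);
            name = nextName;
        }
    }
}
```

Text that contains no `=` at all (a stray word, say) is silently ignored by this sketch; a real implementation would probably want to report it as an error.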

Answers (3)

滥情空心 2024-11-12 09:03:01

For this type of thing I'd go for the lightweight but robust Coco/R. If you show me some more sample input, I might come up with a grammar starting point.


I've used lex and yacc before, so I have some parsing experience. – Mark Heath

Well you're in luck: I've found a lex grammar for sfz in Fedora's soundfont-utils package. That package contains the sfz2pat util. You can get the (source) package here:

http://rpmfind.net//linux/RPM/fedora/14/i386/soundfont-utils-0.4-10.fc12.i686.html
(src.rpm)

According to a quick probe, the latest version of the grammar is from November 2004, and it is quite elaborate (58 KB in sfz2pat.l). Here is a sample to give you a taste:

%option noyywrap
%option nounput
%option outfile = "sfz2pat.c"

nm  ([^\n]+".wav"|[^ \t\n\r]+|\"[^\"\n]+\")
ipn [A-Ga-g][#b]?([0-9]|"-1")

%s  K

%%

"//".*  ;

<K>"<group>"    {
    int i;
    leave_region();
    leave_group();
    if (!enter_group()) {
        SFZERR
        "Can't start group\n");
        return 1;
    }
    am_in_group_scope = TRUE;
    for (i = FIRST_SFZ_PARM; i < MAX_SFZ_PARM; i++) group_parm[i] = default_parm[i];
    for (i = 0; i < MAX_FLOAT_PARM; i++) group_flt_parm[i] = default_flt_parm[i];
    group_parm[REGION_IN_GROUP] = current_group;
    BEGIN(0);
}
<K>"<region>"   {
    int i;
    if (!am_in_group) {
        SFZERR
        "Can't start region outside group.\n");
        return 1;
    }
    leave_region();
    if (!enter_region()) {
        SFZERR
        "Can't start region\n");
        return 1;
    }
    am_in_group_scope = FALSE;
    for (i = 0; i < MAX_SFZ_PARM; i++) region_parm[i] = group_parm[i];
    for (i = 0; i < MAX_FLOAT_PARM; i++) region_flt_parm[i] = group_flt_parm[i];
    BEGIN(0);
}
<K>"sample="{nm} {
    int i = 7, j;
    unsigned namelen;
    if (yytext[i] == '"') {
        i++;
        for (j = i; j < yyleng && yytext[j] != '"'; j++) ;
    }
    else j = yyleng;
    namelen = (unsigned)(j - i + 1);
    sfzname = strncpy( (char *)malloc(namelen), yytext+i, (unsigned)(j-i) );
    sfzname[j-i] = '\0';
    for (i = 0; i < (int)namelen; i++) if (sfzname[i] == '\\') sfzname[i] = '/';
    SFZDBG
    "Sample name is \"%s\"", sfzname);
    SFZNL
    if (read_sample(sfzname)) {
#ifndef LOADER
        fprintf(stderr, "\n");
#endif
        return 0;
    }
    BEGIN(0);
}
[...snip...]
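The lex rules above (a `//` comment rule, `<group>`/`<region>` headers, and `sample=` followed by a name that may contain spaces) can also be approximated with a single .NET regular expression. Where the `nm` macro lets a name run greedily up to `.wav`, the sketch below instead matches the value lazily and stops before the next `name=`, `<header>`, `//` comment or end of line. That stopping rule is an assumption about the format, not something taken from sfz2pat.l, and `RegexTokenizer` is a hypothetical name.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RegexTokenizer
{
    // One alternative per kind of token, loosely following the lex rules:
    //   //...          a comment, skipped
    //   <name>         a header such as <region> or <group>
    //   name = value   an opcode; the value is matched lazily and stops
    //                  before the next "name=", "<", "//" or end of line
    private static readonly Regex TokenPattern = new Regex(
        @"(?<comment>//.*)" +
        @"|<(?<header>\w+)>" +
        @"|(?<name>\w+)[ \t]*=[ \t]*(?<value>.*?)" +
        @"(?=[ \t]+\w+[ \t]*=|[ \t]*<|[ \t]*//|[ \t]*\r?$)",
        RegexOptions.Multiline);

    // Headers come back with IsHeader = true and an empty Value.
    public static IEnumerable<(bool IsHeader, string Name, string Value)> Tokenize(string text)
    {
        foreach (Match m in TokenPattern.Matches(text))
        {
            if (m.Groups["comment"].Success) continue;
            if (m.Groups["header"].Success)
                yield return (true, m.Groups["header"].Value, "");
            else
                yield return (false, m.Groups["name"].Value, m.Groups["value"].Value);
        }
    }
}
```

Unlike the `nm` macro, whose fallback alternative `[^ \t\n\r]+` stops at the first space unless the name ends in `.wav`, this rule also keeps a name such as `piano D3.ogg` whole. It would, however, cut short a value that itself contains something looking like ` x=`.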
给妤﹃绝世温柔 2024-11-12 09:03:01

Assuming the language is fairly regular, I'd recommend writing a quick parser using ANTLR. It's got a pretty easy learning curve for someone with parsing experience, and it outputs C# (among other things).

夏日落 2024-11-12 09:03:01

I used Gardens Point LEX and Gardens Point Parser Generator for generating parsers. They work well especially if you have some lex/yacc knowledge.

IMO, these two are the best parser generators for .NET.

One bonus point: the creators respond quickly to bug reports and suggestions, as can be seen here.
