Smart variable expansion based on a format string

Posted on 2024-07-16 13:39:32

I have a daemon that reads a configuration file in order to know where to write something. In the configuration file, a line like this exists:

output = /tmp/foo/%d/%s/output

Or, it may look like this:

output = /tmp/foo/%s/output/%d

... or simply like this:

output = /tmp/foo/%s/output

... or finally:

output = /tmp/output

I have that line as cfg->pathfmt within my program. What I am trying to do now is to come up with some clever way of using it.

A little more explanation: the path can contain up to two components to be formatted. %d will be expanded as a job ID (int), %s as a job name (string). The user may want to use one, both or none in the configuration file. I need to know what they want and in what order before I finally pass it to snprintf(). I can kind of narrow it down, but I keep wanting to reach for strtok() and that seems ugly.

I want to give users this kind of flexibility; however, I'm getting lost looking for a sensible, portable way to implement it. I'm also at a complete loss for how to begin searching for this.

I'd be very happy if:

Someone could help me narrow down the search phrase to find good examples
Someone could post a link to some OSS project implementing this
Someone could post some pseudocode

I don't want the code written for me; I'm just really stuck on what (I think) should be something very simple and need some help taking the first bite. I really feel like I'm overthinking this and overlooking the obvious.

The end result should be a boolean function like this:

bool output_sugar(const char *fmt, int jobid, const char *jobname, struct job *j);

It would then call snprintf() (sensibly) on j->outpath, returning false if the config line contains some kind of garbage (i.e. % followed by something other than s, d or %) or is null. The sanity checks are easy; I'm just having a bit of a time getting the number (and order) of arguments to format right.

Thanks in advance. Also, feel free to edit this title if you have the reputation to do so; as I said, I'm not quite sure how to ask the question in a single line. I think what I need is a parser, but it feels awkward using a full-blown lexer/parser to handle one simple string.

简单爱 2024-07-23 13:39:32

Yes, you need a parser of some sort. It need not be complex, though:

void format_filename(const char *fmt, int jobid, const char *jobname,
                     char *buffer, size_t buflen)
{
    char *end = buffer + buflen - 1;    /* last byte, reserved for '\0' */
    const char *src = fmt;
    char *dst = buffer;
    char c;
    assert(buffer != 0 && fmt != 0 && buflen != 0 && jobname != 0);
    while ((c = *src++) != '\0')
    {
        if (dst >= end)
            err_exit("buffer overflow in %s(): format = %s\n",
                     __func__, fmt);
        else if (c != '%')
            *dst++ = c;                 /* ordinary character: copy it */
        else if ((c = *src++) == '\0' || c == '%')
        {
            *dst++ = '%';               /* "%%" or a trailing '%' */
            if (c == '\0')
                break;
        }
        else if (c == 's')
        {
            /* job name: copy only if it fits before the terminator */
            size_t len = strlen(jobname);
            if (len > (size_t)(end - dst))
                err_exit("buffer overflow on jobname in %s(): format = %s\n",
                         __func__, fmt);
            else
            {
                strcpy(dst, jobname);
                dst += len;
            }
        }
        else if (c == 'd')
        {
            /* job ID: let snprintf() format it and report truncation */
            int nchars = snprintf(dst, (size_t)(end - dst), "%d", jobid);
            if (nchars < 0 || nchars >= end - dst)
                err_exit("format error on jobid in %s(); format = %s\n",
                         __func__, fmt);
            dst += nchars;
        }
        else
            err_exit("invalid format character %d in %s(): format = %s\n",
                     c, __func__, fmt);
    }
    *dst = '\0';
}

Now tested code. Note that it supports the '%%' notation to allow the user to embed a single '%' in the output. Also, it treats a single '%' at the end of the string as valid and equivalent to '%%'. It calls err_exit() on error; you can choose alternative error strategies as suits your system. I simply assume you have included <assert.h>, <stdio.h> and <string.h> and the header for the err_exit() (variadic) function.
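As one possible alternative error strategy, here is a minimal sketch of a variant that returns false instead of exiting, which is closer to the boolean output_sugar() the question asks for. The name expand_path() and its parameters are hypothetical, not part of the code above. Unlike format_filename(), this sketch treats a trailing '%' as an error, since the question calls anything other than %s, %d or %% garbage.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical variant: expands %d, %s and %% from fmt into out (size outlen).
 * Returns false on NULL input, an unknown or trailing '%', or lack of space. */
static bool expand_path(const char *fmt, int jobid, const char *jobname,
                        char *out, size_t outlen)
{
    size_t used = 0;

    if (fmt == NULL || jobname == NULL || out == NULL || outlen == 0)
        return false;

    for (const char *p = fmt; *p != '\0'; p++)
    {
        size_t room = outlen - used;    /* bytes left, including the '\0' */
        int n;

        if (*p != '%')
            n = snprintf(out + used, room, "%c", *p);
        else if (*++p == '%')
            n = snprintf(out + used, room, "%%");
        else if (*p == 'd')
            n = snprintf(out + used, room, "%d", jobid);
        else if (*p == 's')
            n = snprintf(out + used, room, "%s", jobname);
        else
            return false;               /* unknown specifier or trailing '%' */

        if (n < 0 || (size_t)n >= room)
            return false;               /* encoding error or truncated output */
        used += (size_t)n;
    }
    out[used] = '\0';
    return true;
}
```

A caller could pass j->outpath and its size as out and outlen, and treat a false return as a bad configuration line. Every write goes through snprintf() with the remaining room, so the function never writes past the buffer.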

Test code...

#include <stdio.h>
#include <stdlib.h>   /* exit() */
#include <string.h>
#include <stdarg.h>
#include <assert.h>

static void err_exit(const char *fmt, ...)
{
    va_list args;
    va_start(args, fmt);
    vfprintf(stderr, fmt, args);
    va_end(args);
    exit(1);
}

... then format_filename() as above, then ...

#define DIM(x) (sizeof(x)/sizeof(*(x)))

static const char *format[] =
{
    "/tmp/%d/name/%s",
    "/tmp/%s/number/%d",
    "/tmp/%s.%d%%",
    "/tmp/%",
};

int main(void)
{
    char buffer[64];
    size_t i;

    for (i = 0; i < DIM(format); i++)
    {
        format_filename(format[i], 1234, "job-name", buffer, sizeof(buffer));
        printf("fmt = %-20s; name = %s\n", format[i], buffer);
    }

    return(0);
}

红衣飘飘貌似仙 2024-07-23 13:39:32

Using strtok is error-prone. You can treat your variables as a mini language using (f)lex and yacc. There is a simple tutorial here

%{
#include <stdio.h>
/* jobid, dirname and stripspaces() are assumed to be defined elsewhere. */
%}

%%
"%d"                    printf("%04d", jobid);
"%s"                    printf("%s", stripspaces(dirname));
%%

I made an ODBC wrapper that would let you do stuff like dbprintf("insert into blah values %s %D %T %Y", stuff here...); but that was many years ago, and I bit the bullet and parsed the format string using strtok.

前事休说 2024-07-23 13:39:32

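If the number of options is small and you don't want or need the extra flexibility and complexity of a parser, you could simply search for each potential replacement substring using strstr().

If you have only the two options, you could create a four-branch if/else structure (only A, only B, both with A before B, both with B before A) in which you call sprintf() with the arguments in the correct order. Otherwise, make multiple sprintf() calls, each of which replaces only the first replacement marker in the format string. (This means building a list of the replacements that are needed and sorting them in order of appearance...)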
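A minimal sketch of the strstr() branch approach might look like the following. The name expand_by_order() is hypothetical, and the sketch assumes the format string has already been validated to contain at most one %d, at most one %s, and no other '%' sequences (not even %%). Without that check, handing a user-supplied format to snprintf() is unsafe.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: picks the snprintf() argument order from where the
 * first "%d" and "%s" occur. Assumes fmt was already validated to hold at
 * most one "%d", at most one "%s", and no other '%' sequences. */
static bool expand_by_order(const char *fmt, int jobid, const char *jobname,
                            char *out, size_t outlen)
{
    const char *d = strstr(fmt, "%d");
    const char *s = strstr(fmt, "%s");
    int n;

    if (d != NULL && s != NULL && d < s)
        n = snprintf(out, outlen, fmt, jobid, jobname);   /* %d then %s */
    else if (d != NULL && s != NULL)
        n = snprintf(out, outlen, fmt, jobname, jobid);   /* %s then %d */
    else if (d != NULL)
        n = snprintf(out, outlen, fmt, jobid);            /* only %d */
    else if (s != NULL)
        n = snprintf(out, outlen, fmt, jobname);          /* only %s */
    else
        n = snprintf(out, outlen, "%s", fmt);             /* no specifiers */

    /* snprintf() returns the length it wanted to write; reject truncation. */
    return n >= 0 && (size_t)n < outlen;
}
```

The branch count grows quickly with each extra specifier, which is why this only stays tolerable for two options.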
