Named-parameter string formatting in C++

Posted 2024-09-18 22:27:01


I'm wondering if there is a library like Boost Format, but which supports named parameters rather than positional ones. This is a common idiom in e.g. Python, where you have a context to format strings with that may or may not use all available arguments, e.g.

mouse_state = {}
mouse_state['button'] = 0
mouse_state['x'] = 50
mouse_state['y'] = 30

#...

"You clicked %(button)s at %(x)d,%(y)d." % mouse_state
"Targeting %(x)d, %(y)d." % mouse_state

Are there any libraries that offer the functionality of those last two lines? I would expect it to offer an API something like:

PrintFMap(string format, map<string, string> args);

While googling I found many libraries offering variations of positional parameters, but none that support named ones. Ideally the library has few dependencies so I can drop it easily into my code. C++ won't be quite as idiomatic for collecting named arguments, but probably someone out there has thought more about it than me.

Performance is important; in particular, I'd like to keep memory allocations down (always tricky in C++), since this may run on devices without virtual memory. But having even a slow one to start from will probably be faster than writing it from scratch myself.
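For illustration only, the requested PrintFMap idea can be approximated by a short hand-written routine. The sketch below is hypothetical (formatMap is not an existing library function): it substitutes Python-style %(name)s / %(name)d placeholders from a std::map<std::string, std::string>, reserves the output buffer once, and copies placeholders with unknown names through unchanged. Since every value is already a string, the conversion character is simply ignored.

```cpp
#include <map>
#include <string>

// Hypothetical helper (not an existing library): replaces each "%(name)c"
// placeholder, where c is a single conversion character such as 's' or 'd',
// with args[name]. Values are already strings, so c is ignored; placeholders
// whose name is missing from args are copied through unchanged.
std::string formatMap(const std::string& fmt,
                      const std::map<std::string, std::string>& args)
{
    std::string out;
    out.reserve(fmt.size());  // reserve once; grows only if the output is longer

    std::string::size_type pos = 0;
    while (pos < fmt.size()) {
        const std::string::size_type open = fmt.find("%(", pos);
        const std::string::size_type close =
            open == std::string::npos ? std::string::npos : fmt.find(')', open + 2);
        if (close == std::string::npos || close + 1 >= fmt.size()) {
            out.append(fmt, pos, std::string::npos);  // no more complete placeholders
            break;
        }
        out.append(fmt, pos, open - pos);  // literal text before the placeholder
        const auto it = args.find(fmt.substr(open + 2, close - open - 2));
        if (it != args.end())
            out += it->second;
        else
            out.append(fmt, open, close + 2 - open);  // unknown name: keep as written
        pos = close + 2;  // skip ')' and the conversion character
    }
    return out;
}
```

For example, formatMap("Targeting %(x)d, %(y)d.", {{"x", "50"}, {"y", "30"}}) returns "Targeting 50, 30.". The key lookup still builds a temporary std::string per placeholder, so this is a starting point rather than an allocation-free solution.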


別甾虛僞 2024-09-25 22:27:02

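Given that Python itself is written in C and that formatting is such a commonly used feature, you might be able (ignoring copyright issues) to take the relevant code from the Python interpreter and port it to use STL maps rather than Python's native dicts.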


沫尐诺 2024-09-25 22:27:02

I wrote a library for exactly this purpose; check it out on GitHub.

Contributions are welcome.


遮了一弯 2024-09-25 22:27:01


The {fmt} library supports named arguments:

print("You clicked {button} at {x},{y}.",
      arg("button", "b1"), arg("x", 50), arg("y", 30));

As syntactic sugar, you can even (ab)use user-defined literals to pass arguments:

print("You clicked {button} at {x},{y}.",
      "button"_a="b1", "x"_a=50, "y"_a=30);

For brevity, the fmt namespace is omitted in the examples above (the _a literal is declared in namespace fmt::literals).

Disclaimer: I'm the author of this library.


橙味迷妹 2024-09-25 22:27:01


I've always been critical of C++ I/O (especially formatting) because in my opinion it is a step backward with respect to C. Formats need to be dynamic, and it makes perfect sense, for example, to load them from an external resource such as a file or a parameter.

However, I had never actually tried to implement an alternative, and your question prompted me to invest some weekend hours in the idea.

Of course, the problem turned out to be more complex than I expected (the integer formatting routine alone is 200+ lines), but I think this approach (dynamic format strings) is more usable.

You can download my experiment from this link (it's just a .h file) and a test program from this link ("test" is probably not the right term; I only used it to check that the code compiled).

Here is an example:

#include "format.h"
#include <iostream>

using format::FormatString;
using format::FormatDict;

int main()
{
    std::cout << FormatString("The answer is %{x}") % FormatDict()("x", 42);
    return 0;
}

It differs from the boost.format approach because it uses named parameters, and because the format string and the format dictionary are meant to be built separately (and, for example, passed around). Also, I think formatting options should be part of the string (as with printf), not of the code.

FormatDict uses a trick for keeping the syntax reasonable:

FormatDict fd;
fd("x", 12)
  ("y", 3.141592654)
  ("z", "A string");
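The trick is presumably that operator() stores one entry and returns a reference to the dictionary itself, so the calls can be chained. A minimal sketch of that idea follows; the class name ChainDict and the choice to render every value to a std::string on insertion are assumptions for illustration, not how the author's format.h works.

```cpp
#include <map>
#include <sstream>
#include <string>

// Hypothetical chainable dictionary: operator() inserts one entry and returns
// *this, which is what allows fd("x", 12)("y", 3.14)("z", "text").
class ChainDict {
public:
    template <typename T>
    ChainDict& operator()(const std::string& key, const T& value)
    {
        std::ostringstream os;
        os << value;               // render with the type's operator<<
        entries_[key] = os.str();
        return *this;              // returning a reference enables chaining
    }

    // Returns nullptr when the key is absent.
    const std::string* find(const std::string& key) const
    {
        auto it = entries_.find(key);
        return it == entries_.end() ? nullptr : &it->second;
    }

private:
    std::map<std::string, std::string> entries_;
};
```

Chaining also works on a temporary, as in ChainDict()("x", 42), because the returned reference stays valid until the end of the full expression.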

FormatString, on the other hand, is simply parsed from a const std::string& (I decided to pre-parse format strings; a slower but probably acceptable alternative would be to pass the raw string and re-parse it each time).
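Pre-parsing here means splitting the format string once into literal text and placeholder names, so that each later rendering only walks that list. A minimal sketch for the %{name} syntax, assuming values are already strings and ignoring the per-field options that can appear between % and { (the ParsedFormat class and its render method are hypothetical):

```cpp
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical pre-parsed format: "The answer is %{x}" becomes
// [literal "The answer is ", field "x"], computed once in the constructor.
class ParsedFormat {
public:
    explicit ParsedFormat(const std::string& fmt)
    {
        std::string::size_type pos = 0;
        while (true) {
            const std::string::size_type open = fmt.find("%{", pos);
            if (open == std::string::npos) break;
            const std::string::size_type close = fmt.find('}', open + 2);
            if (close == std::string::npos)
                throw std::invalid_argument("unterminated %{ in format string");
            if (open > pos) segments_.push_back({false, fmt.substr(pos, open - pos)});
            segments_.push_back({true, fmt.substr(open + 2, close - open - 2)});
            pos = close + 1;
        }
        if (pos < fmt.size()) segments_.push_back({false, fmt.substr(pos)});
    }

    // Rendering never rescans the original string; it only walks the segments.
    std::string render(const std::map<std::string, std::string>& values) const
    {
        std::string out;
        for (const Segment& s : segments_) {
            if (!s.isField) { out += s.text; continue; }
            auto it = values.find(s.text);
            if (it == values.end())
                throw std::out_of_range("missing value for field: " + s.text);
            out += it->second;
        }
        return out;
    }

private:
    struct Segment {
        bool isField;      // true: text is a field name; false: literal text
        std::string text;
    };
    std::vector<Segment> segments_;
};
```

The parse cost and the parse errors move to construction time, which is the trade-off the paragraph above describes.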

Formatting can be extended to user-defined types by specializing a conversion function template; for example:

struct P2d
{
    int x, y;
    P2d(int x, int y)
        : x(x), y(y)
    {
    }
};

namespace format {
    template<>
    std::string toString<P2d>(const P2d& p, const std::string& parms)
    {
        return FormatString("P2d(%{x}; %{y})") % FormatDict()
            ("x", p.x)
            ("y", p.y);
    }
}

After that, a P2d instance can simply be placed in a formatting dictionary.

Also it's possible to pass parameters to a formatting function by placing them between % and {.

For now I have implemented only an integer formatting specialization, which supports:

- Fixed width with left/right/center alignment
- Custom fill character
- Generic base (2-36), lowercase or uppercase
- Digit separator (with custom character and group size)
- Overflow character
- Sign display

I've also added some shortcuts for common cases, for example

"%08x{hexdata}"

is a hex number with 8 digits, padded with '0's.

"%026/2,8:{bindata}"

is a 24-bit binary number (base 2, as requested by "/2") with a ":" digit separator every 8 bits (as requested by ",8:"); the width of 26 leaves room for the 24 digits plus two separators.
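To make some of these options concrete, here is a hedged sketch of the core of such an integer formatter: conversion to any base from 2 to 36, padding with a fill character, and a separator every n digits. The IntFormat struct and formatInt function are hypothetical names, alignment and the overflow character are left out, and treating padding positions as digits (so separators also appear inside the padding) is an assumption chosen to match the two shortcut descriptions above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical option set; field names are illustrative only.
struct IntFormat {
    int base = 10;         // 2..36
    bool upper = false;    // digits above 9 as 'A'..'Z' instead of 'a'..'z'
    int minDigits = 1;     // pad with `fill` up to this many digit positions
    char fill = '0';
    int group = 0;         // insert `separator` every `group` digits (0 = never)
    char separator = ',';
};

std::string formatInt(long long value, const IntFormat& f)
{
    if (f.base < 2 || f.base > 36)
        throw std::invalid_argument("base must be in 2..36");
    const char* digits = f.upper ? "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
                                 : "0123456789abcdefghijklmnopqrstuvwxyz";

    // Take the magnitude as unsigned so that LLONG_MIN does not overflow.
    const bool negative = value < 0;
    unsigned long long mag = negative ? 0ULL - static_cast<unsigned long long>(value)
                                      : static_cast<unsigned long long>(value);

    // Emit digits least-significant first; padding positions count as digits,
    // so separators are also placed inside the padding.
    std::string rev;
    int count = 0;
    do {
        if (f.group > 0 && count > 0 && count % f.group == 0)
            rev += f.separator;
        rev += (mag == 0 && count > 0) ? f.fill : digits[mag % f.base];
        mag /= f.base;
        ++count;
    } while (mag != 0 || count < f.minDigits);

    if (negative)
        rev += '-';  // the sign goes in front of any padding
    return std::string(rev.rbegin(), rev.rend());
}
```

With base = 16 and minDigits = 8, formatting 0xBEEF gives "0000beef" (the %08x case); with base = 2, minDigits = 24, group = 8 and separator = ':', the value 5 gives "00000000:00000000:00000101", 26 characters in total.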

Note that the code is just an idea. For example, for now I simply prevented copies, although it is probably reasonable to allow storing both format strings and dictionaries. For dictionaries, however, it's important to offer a way to avoid copying an object just because it needs to be added to a FormatDict; IMO this is possible, but it also raises non-trivial lifetime problems.
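The lifetime problem can be seen in a small sketch of a dictionary that stores references instead of copies: nothing is copied on insertion, but the dictionary must not outlive the objects it refers to. The RefDict name is hypothetical, and std::function is used here only as a convenient form of type erasure.

```cpp
#include <functional>
#include <map>
#include <sstream>
#include <string>

// Hypothetical "by reference" dictionary: it keeps the address of each value
// and renders it only when asked, so no copy of the object is made on insertion.
// The caller must keep every referenced object alive while the dictionary is used.
class RefDict {
public:
    template <typename T>
    RefDict& operator()(const std::string& key, const T& value)
    {
        const T* p = &value;  // store the address, not a copy
        render_[key] = [p] {
            std::ostringstream os;
            os << *p;          // reads the object's current value
            return os.str();
        };
        return *this;
    }

    // Temporaries would dangle as soon as the full expression ends, so reject them.
    template <typename T>
    RefDict& operator()(const std::string& key, const T&& value) = delete;

    std::string get(const std::string& key) const { return render_.at(key)(); }

private:
    std::map<std::string, std::function<std::string()>> render_;
};
```

Deleting the rvalue overload catches the most obvious mistake (d("x", 42) no longer compiles), but it cannot detect an lvalue that goes out of scope before the dictionary does.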

UPDATE

I've made a few changes to the initial approach:

- Format strings can now be copied.
- Formatting for custom types is done using class templates instead of function templates (this allows partial specialization).
- I've added a formatter for sequences (a pair of iterators). The syntax is still crude.

I've created a github project for it, with boost licensing.
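Switching from function templates to class templates matters because a function template cannot be partially specialized, whereas a class template can. A hedged sketch of what that enables; the Formatter name and its static format member are hypothetical, not the author's actual interface, and P2d is a simplified copy of the struct above:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Primary template: fall back to operator<< for any streamable type.
template <typename T>
struct Formatter {
    static std::string format(const T& value)
    {
        std::ostringstream os;
        os << value;
        return os.str();
    }
};

// Partial specialization: one definition covers std::vector<T> for every T.
// A function template could only be fully specialized or overloaded.
template <typename T>
struct Formatter<std::vector<T>> {
    static std::string format(const std::vector<T>& v)
    {
        std::string out = "[";
        for (std::size_t i = 0; i < v.size(); ++i) {
            if (i > 0) out += ", ";
            out += Formatter<T>::format(v[i]);  // delegate to the element's formatter
        }
        return out + "]";
    }
};

// Full specialization for a user-defined type.
struct P2d { int x, y; };

template <>
struct Formatter<P2d> {
    static std::string format(const P2d& p)
    {
        return "P2d(" + std::to_string(p.x) + "; " + std::to_string(p.y) + ")";
    }
};
```

Because the vector specialization delegates to Formatter<T>, a std::vector<P2d> is formatted element by element with the P2d specialization.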


嘴硬脾气大 2024-09-25 22:27:01

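The answer seems to be: no, there is no C++ library that does this, and judging by the comments I've received, C++ programmers apparently don't even see a need for one. Looks like I'll have to write my own once again.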


没有伤那来痛 2024-09-25 22:27:01


Well, I'll add my own answer as well, not because I know of (or have written) such a library, but to address the "keep memory allocations down" part.

As always I can envision some kind of speed / memory trade-off.

On the one hand, you can parse "Just In Time":

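class Formatter:
  def __init__(self, format): self._string = format

  def compute(self, context):
    # Rescan the string for every key: simple, but far from efficient
    for k, v in context.items():
      placeholder, text = "%(" + k + ")", str(v)
      start = self._string.find(placeholder)
      while start != -1:
        left, right = self._string[:start], self._string[start + len(placeholder):]
        self._string = left + text + right
        start = self._string.find(placeholder, start + len(text))
    return self._string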

This way you don't keep a "parsed" structure at hand, and hopefully most of the time you'll just insert the new data in place (unlike in Python, C++ strings are mutable).

However it's far from being efficient...
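In C++ the just-in-time approach can splice each value directly into the existing buffer with std::string::replace. A sketch under the assumption that placeholders are written %(key) and values are already strings (substituteInPlace is a hypothetical name); each replacement still shifts the tail of the string, and the buffer reallocates if it grows past its capacity.

```cpp
#include <map>
#include <string>

// Hypothetical just-in-time substitution: for every key in the context, search
// the buffer for "%(key)" and overwrite it in place. The buffer is rescanned
// once per key, which is the inefficiency mentioned above.
void substituteInPlace(std::string& buffer,
                       const std::map<std::string, std::string>& context)
{
    for (const auto& kv : context) {
        const std::string placeholder = "%(" + kv.first + ")";
        std::string::size_type pos = 0;
        while ((pos = buffer.find(placeholder, pos)) != std::string::npos) {
            buffer.replace(pos, placeholder.size(), kv.second);
            pos += kv.second.size();  // skip the inserted text so this key's value is not rescanned
        }
    }
}
```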

On the other hand, you can build a fully constructed tree representing the parsed format. You will have several classes like: Constant, String, Integer, Real, etc... and probably some subclasses / decorators as well for the formatting itself.

I think, however, that the most efficient approach would be a mix of the two:

- Explode the format string into a list of Constant and Variable segments.
- Index the variables in another structure (a hash table with open addressing would do nicely, or something akin to Loki::AssocVector).

There you are: you're done with (basically) only two dynamically allocated arrays. If you want to allow the same key to be repeated several times, simply use a std::vector<size_t> as the value of the index. Note, though, that std::vector itself allocates as soon as it holds an element (the 16-byte small-buffer optimization in VC++ 2010 applies to std::string, not std::vector), so a small-vector type would be needed to avoid allocating for short lists.

When evaluating the context itself, look up each key in the index. You then parse the formatter "just in time", check it against the current type of the value that replaces it, and process the format.
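A rough sketch of this mixed layout, assuming placeholders are written %(name), values are already strings, and the index is a sorted std::vector of (name, position) pairs standing in for something like Loki::AssocVector. All names are hypothetical and the per-field format options are left out:

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical "mix" layout: the format is exploded once into segments
// (constants and variable slots), plus a sorted index name -> slot position.
class MixedFormat {
public:
    explicit MixedFormat(const std::string& fmt)
    {
        std::string::size_type pos = 0, open;
        while ((open = fmt.find("%(", pos)) != std::string::npos) {
            const std::string::size_type close = fmt.find(')', open + 2);
            if (close == std::string::npos)
                break;  // unterminated: the rest is treated as a constant
            segments_.push_back(fmt.substr(pos, open - pos));  // Constant
            index_.emplace_back(fmt.substr(open + 2, close - open - 2), segments_.size());
            segments_.emplace_back();  // Variable slot, filled in by render()
            pos = close + 1;
        }
        segments_.push_back(fmt.substr(pos));
        std::sort(index_.begin(), index_.end());  // sorted vector: binary search by name
    }

    std::string render(const std::map<std::string, std::string>& context)
    {
        for (const Entry& e : index_)
            segments_[e.second].clear();  // names absent from the context render as ""
        for (const auto& kv : context) {
            const auto range = std::equal_range(index_.begin(), index_.end(), kv.first, ByName());
            for (auto it = range.first; it != range.second; ++it)
                segments_[it->second] = kv.second;  // a name may occur several times
        }
        std::string out;
        for (const std::string& s : segments_)
            out += s;
        return out;
    }

private:
    using Entry = std::pair<std::string, std::size_t>;  // (name, slot position)
    struct ByName {  // compares entries by name only, for equal_range
        bool operator()(const Entry& a, const std::string& b) const { return a.first < b; }
        bool operator()(const std::string& a, const Entry& b) const { return a < b.first; }
    };
    std::vector<std::string> segments_;
    std::vector<Entry> index_;
};
```

Keys in the context that the format never uses simply find an empty range, and names missing from the context act as the "null" value mentioned in the pros and cons.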

Pros and cons:
- Just In Time: you scan the string again and again
- One Parse: requires a lot of dedicated classes, possibly many allocations, but the format is validated on input. Like Boost it may be reused.
- Mix: more efficient, especially if you don't replace some values (allow some kind of "null" value), but delaying the parsing of the format delays the reporting of errors.

Personally, I would go for the One Parse scheme, trying to keep allocations down as much as I could by using boost::variant and the Strategy pattern.
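A minimal sketch of how a variant keeps values inline, using C++17's std::variant as a stand-in for boost::variant: each value lives inside the variant instead of in a separately allocated polymorphic node, and std::visit selects the formatting routine for the stored type, which plays the role of the strategy. The Value alias and toText function are hypothetical, and std::to_string is only a placeholder for a real per-type formatter.

```cpp
#include <string>
#include <type_traits>
#include <variant>

// A context value stored inline in the variant: no per-value heap node,
// although std::string may still allocate for long text.
using Value = std::variant<long long, double, std::string>;

// std::visit dispatches on the stored type; each branch is the formatting
// "strategy" for that type.
std::string toText(const Value& v)
{
    return std::visit([](const auto& x) -> std::string {
        using T = std::decay_t<decltype(x)>;
        if constexpr (std::is_same_v<T, std::string>)
            return x;
        else
            return std::to_string(x);  // placeholder for a real number formatter
    }, v);
}
```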

