Variant type for Bison tokens without memory leaks

Posted on 2025-01-26 02:55:49

What is the best non-memory-leaking variant type to use for text in Bison (for example %token <std::string>)?

I want to replace char * as the variant type for tokens with a more modern type (mainly to avoid memory leaks). I have tried three types, but char * is still by far the fastest:

char *                          2.35 (some metric)
std::string                     2.72 (some metric)
std::shared_ptr<std::string>    2.88 (some metric)

When I replaced char * with std::string, performance dropped. I attributed this to Bison's internal stack and the default action {$$ = $1} in each rule making redundant copies of the string, since copying a std::string always creates a new copy and never shares the buffer (unlike the newer std::string_view). I then tried wrapping std::string in a type that is cheap to copy, such as std::shared_ptr, but to my surprise std::shared_ptr<std::string> performed even worse! Now I am completely lost.

The files I used are below. For now I have ignored the construction time of these three types, but including it would only make char * faster compared to the other two.

After compiling them as follows, I ran the parser on a large file and averaged the system execution time:

flex -o parser.cpp parser.l
bison -v -d --output=grammar.cpp grammar.y
g++ parser.cpp grammar.cpp

sharedText.hpp

#pragma once

// choose one option of "using SharedText = ..."
// (exactly one may be active at a time; the others stay commented out)

// option (3)
#include <memory>
#include <string>
using SharedText = std::shared_ptr<std::string>;

// option (2)
// #include <string>
// using SharedText = std::string;

// option (1)
// using SharedText = char*;

yylex.hpp

# define YY_DECL        yy::grammar::symbol_type yylex()
YY_DECL;

grammar.y

%{
#include "sharedText.hpp"
%}

%require "3.8.2"
%language "c++"
%define api.parser.class {grammar}
%define api.value.type  variant
%define api.token.constructor

%code
{
#include "yylex.hpp"
}

%token <SharedText> SYMBOL NUMBER

%start start

%%
start
    : symbol_or_number
    | symbol_or_number start
    ;

symbol_or_number
    : SYMBOL
    | NUMBER
    ;

%%

namespace yy
{
void grammar::error(const std::string& description)
{
}
}

extern FILE *yyin;

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        return 1;
    }

    yy::grammar parse;
    yyin = fopen(argv[1], "r");
    parse();

    return 0;
}

parser.l

%{
#include "sharedText.hpp"
#include "grammar.hpp"
#include "yylex.hpp"
%}

%option noyywrap

%%

<INITIAL>
{
[A-Za-z]+       { return yy::grammar::token::yytokentype::SYMBOL; }
[0-9]+          { return yy::grammar::token::yytokentype::NUMBER; }

"\n"            { ++yylineno; }
[ ]+            ;
.               ;
}
%%
