How to split a string_view into multiple string_view objects without any dynamic allocation
The snippet below comes from this answer.
#include <string>
#include <vector>

// DELIMITER is defined elsewhere in the original answer; a single space is assumed here.
constexpr char DELIMITER = ' ';

void tokenize(std::string str, std::vector<std::string> &token_v){
    size_t start = str.find_first_not_of(DELIMITER), end = start;
    while (start != std::string::npos){
        // Find next occurrence of delimiter
        end = str.find(DELIMITER, start);
        // Push back the token found into vector
        token_v.push_back(str.substr(start, end - start));
        // Skip all occurrences of the delimiter to find new start
        start = str.find_first_not_of(DELIMITER, end);
    }
}
Now for a buffer like this:
std::array<char, 150> buffer;
I want to have a string_view (that points to the buffer) and pass it to the tokenizer function. The tokens should be returned as std::string_views via an out parameter (not a vector), and the function should return the number of tokens that were extracted. The interface looks like this:
size_t tokenize( const std::string_view inputStr,
                 const std::span< std::string_view > foundTokens_OUT,
                 const size_t expectedTokenCount )
{
    // implementation
}
#include <array>
#include <iomanip>
#include <iostream>
#include <span>
#include <string_view>

int main( )
{
    std::array<char, 150> buffer { " @a hgs -- " };
    const std::string_view sv { buffer.data( ), buffer.size( ) };

    const size_t expectedTokenCount { 4 };
    std::array< std::string_view, expectedTokenCount > foundTokens; // the span for storing found tokens

    const size_t num_of_found_tokens { tokenize( sv, foundTokens, expectedTokenCount ) };
    if ( num_of_found_tokens == expectedTokenCount )
    {
        // do something
        std::clog << "success\n" << num_of_found_tokens << '\n';
    }

    for ( size_t idx { }; idx < num_of_found_tokens; ++idx )
    {
        std::cout << std::quoted( foundTokens[ idx ] ) << '\n';
    }
}
I would appreciate it if someone could implement a similar tokenize function for string_view that splits on space and tab characters. I tried to write one myself, but it didn't work as expected (it didn't support tabs). Also, I want this function to stop early and return expectedTokenCount + 1 if the number of tokens found in inputStr exceeds expectedTokenCount. This is obviously more efficient.
Here is my dummy version:
size_t tokenize( const std::string_view inputStr,
                 const std::span< std::string_view > foundTokens_OUT,
                 const size_t expectedTokenCount )
{
    if ( inputStr.empty( ) )
    {
        return 0;
    }

    size_t start { inputStr.find_first_not_of( ' ' ) };
    size_t end { start };
    size_t foundTokensCount { };

    while ( start != std::string_view::npos && foundTokensCount < expectedTokenCount )
    {
        end = inputStr.find( ' ', start );
        foundTokens_OUT[ foundTokensCount++ ] = inputStr.substr( start, end - start );
        start = inputStr.find_first_not_of( ' ', end );
    }

    return foundTokensCount;
}
Note: The ranges library isn't properly supported yet (at least on GCC), so I'm trying to avoid it.
Comments (2)
If you want to support splitting on both spaces and tabs, you can use another overload of find_first_not_of, which finds the first character equal to none of the characters in the string pointed to by s. So your implementation only needs to change find_first_not_of(' ') and find(' ') to find_first_not_of(" \t") and find_first_of(" \t").
Demo
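As a concrete sketch, here is the question's dummy version with just those two calls swapped, plus the early exit that returns expectedTokenCount + 1 once a surplus token shows up (that detail comes from the question's requirements, not from this answer):

#include <span>
#include <string_view>

size_t tokenize( const std::string_view inputStr,
                 const std::span< std::string_view > foundTokens_OUT,
                 const size_t expectedTokenCount )
{
    constexpr std::string_view delimiters { " \t" }; // split on spaces and tabs

    size_t start { inputStr.find_first_not_of( delimiters ) };
    size_t foundTokensCount { };

    while ( start != std::string_view::npos )
    {
        if ( foundTokensCount == expectedTokenCount )
        {
            // More tokens than expected: stop early, as the question requests.
            return expectedTokenCount + 1;
        }

        const size_t end { inputStr.find_first_of( delimiters, start ) };
        foundTokens_OUT[ foundTokensCount++ ] = inputStr.substr( start, end - start );
        start = inputStr.find_first_not_of( delimiters, end );
    }

    return foundTokensCount;
}

Note that with the question's 150-byte buffer, the zero-filled tail after "--" also counts as a token, since '\0' is neither a space nor a tab.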
This is my implementation (which I wrote earlier); it can handle inputs that start with one or more delimiters, have repeated delimiters, or end with one or more delimiters:
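A minimal sketch consistent with that description, reusing the span-based interface from the question (the delimiters parameter and the exact signature are assumptions, not necessarily the answerer's original code):

#include <span>
#include <string_view>

// Split `input` on any character in `delimiters`, writing at most
// tokens_out.size() tokens and returning how many were stored. Leading,
// repeated and trailing delimiters are all skipped, so no empty tokens occur.
size_t tokenize( const std::string_view input, const std::string_view delimiters,
                 const std::span< std::string_view > tokens_out )
{
    size_t count { };
    size_t start { input.find_first_not_of( delimiters ) }; // skip leading delimiters

    while ( start != std::string_view::npos && count < tokens_out.size( ) )
    {
        const size_t end { input.find_first_of( delimiters, start ) };
        tokens_out[ count++ ] = input.substr( start, end - start );
        start = input.find_first_not_of( delimiters, end ); // skip repeated/trailing delimiters
    }

    return count;
}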
It uses string_views for everything, so there is no memory allocation, but be careful not to throw away the input string too early: string_views are, after all, non-owning.
Online demo: https://onlinegdb.com/tytGlOVnk