更好的计数方式?

发布于 2024-08-13 14:55:02 字数 884 浏览 2 评论 0原文

在我的一个学校程序中,我使用以下函数来计算字符串中标识符的频率,以换行符和 # 分隔:

输入:

dog
cat
mouse
#
rabbit
snake
#

函数:

//assume I have the proper includes, and am using namespace std
vector< pair<string,int> > getFreqcounts(string input) {
    vector<string> items = splitString(input,"\n");
    map<string,int> counts;

    for (int i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]] = 0;
    }
    for (int i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]]++;
    }

    return vector< pair<string,int> > (counts.begin(),counts.end());
}

我想至少

  • 删除双 for 循环,
  • 找到更好的方法得到一个向量对<字符串,整数> >

有什么想法吗?

顺便说一句,这不是家庭作业。真正的作业会使用这个功能,但这纯粹是出于我自己的好奇心和渴望拥有“更好”的代码。

In one of my programs for school, I use the following function to count the frequency of identifiers in a string, separated by newlines and #:

Input:

dog
cat
mouse
#
rabbit
snake
#

Function:

//assume I have the proper includes, and am using namespace std
vector< pair<string,int> > getFreqcounts(string input) {
    vector<string> items = splitString(input,"\n");
    map<string,int> counts;

    for (int i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]] = 0;
    }
    for (int i=0; i<items.size(); i++) {
        if (items[i] == "#") continue;
        counts[items[i]]++;
    }

    return vector< pair<string,int> > (counts.begin(),counts.end());
}

I would like to at the very least

  • remove the double for loop
  • find a better way to get a vector< pair<string,int> >

Any ideas?

BTW, this is NOT homework. The real homework will use this function, but this is purely out of my own curiosity and desire to have "better" code.

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(4

离不开的别离 2024-08-20 14:55:02

我对 std::map 的理解是,整个第一个循环可以简单地消除。第一次尝试访问不存在的节点时,map 将为您默认创建它,并将初始计数设置为零(内置类型的默认构造函数行为)。是您需要对代码进行的所有更改,并且行为应该是相同的。

更新:在您提供的代码中的旁注中,counts 将根据为 std::string 定义的 operator< 进行排序 (地图的键类型),它将按字典顺序对 map 节点进行排序。无需通过向量抽取结果并对向量进行排序 - 地图会自动为您处理此操作。

My understanding of std::map is such that the entire first loop can simply be eliminated. The first time you try to access a node that does not exist the map will default-create it for you, setting the initial count to zero (the default-ctor behavior of a builtin type.) That should be all the changes you need to make to your code, and the behavior should be the same.

Update: On a side note in the code you have provided counts will be sorted according to operator< defined for std::string (the key type for your map), which will sort the map nodes lexicographically. There's no need to pump the results through a vector and sort the vector - the map is handling this for you automatically.

甜心小果奶 2024-08-20 14:55:02

您可以通过简单地删除第一个 for 循环来摆脱它。它没有完成任何有用的事情。当/如果映射中的下标创建一个新项目时,该项目将具有所选的键,并且关联的 int 将自动初始化为零。

就我个人而言,我可能会做一些不同的事情,使用 stringstream 而不是 SplitString()。我对发布代码犹豫不决,但我想我会相信你...

typedef vector<pair<string, int> > count_vec;

count_vec GetFreqCounts(string const &input) { 
    istringstream in(input);
    string line;
    map<string, int> counts;

    while (getline(in, line))
        if (line != "#")
            ++counts[line];
    return count_vec(counts.begin(), counts.end());
}

编辑:老实说,在我写这篇文章时,我并没有过多关注效率,但我认为 Steve Jessop 对此的评论非常漂亮准确的。只要输入很小,就不会产生任何真正的差异。如果输入确实很大,那么一次仅使用一个单词的额外副本这一事实可以节省足够的内存,从而变得有意义。

史蒂夫在回复中给出的解决方案看起来也很不错。由于它还会在生成单词时对其进行处理,因此我希望它具有与上面的代码类似的特征。如果您可以比 stringstream 更快地将字符串分解为单词,那么毫无疑问会更快。考虑到妨碍 iostream 的虚拟函数的数量,出现这种情况的可能性相当大——但除非您处理大量文本,否则产生重大影响的可能性不大。当然,究竟什么才是重要的,还有待商榷。为了正确地看待它,我在手边的单词列表中运行了一些类似的代码。使用与上面非常接近的代码,它以每秒 10 兆多一点的速度处理文本。

You can get rid of the first for loop by simply deleting it. It accomplishes nothing useful. When/if the subscript into the map creates a new item, that item will have the chosen key, and your associated int will be initialized to zero automatically.

Personally, I'd probably do things a bit differently, using a stringstream instead of your SplitString(). I'm hesitant about posting code, but I guess I'll trust you...

typedef vector<pair<string, int> > count_vec;

count_vec GetFreqCounts(string const &input) { 
    istringstream in(input);
    string line;
    map<string, int> counts;

    while (getline(in, line))
        if (line != "#")
            ++counts[line];
    return count_vec(counts.begin(), counts.end());
}

Edit: I honestly didn't pay a whole lot of attention to efficiency as I was writing this, but I think Steve Jessop's comment on it is pretty accurate. As long as the input is small, it won't make any real difference. If the input is really big, the fact that this only uses an extra copy of one word at a time could save enough memory to be meaningful.

The solution Steve gave in his reply looks pretty nice too though. Since it also processes words as they're produced, I'd expect it to have characteristics similar to the code above. If you can break the string into words faster than stringstream does, it'll undoubtedly be faster. Given the number of virtual functions that get in the way with iostreams, there's a pretty good chance of that -- but unless you're dealing with a lot of text there's not much chance of it making a significant difference. Of course, exactly what qualifies as significant is open to question. To put it in perspective, I ran some similar code across a word list I had handy. Using code pretty close to what's above, it processes text at a little over 10 megabytes a second.

花落人断肠 2024-08-20 14:55:02

在对杰里回答的评论中,我提到使用函子。这就是我的意思(未经测试的代码):

struct StringCounter {
    std::map<std::string, int> counts;
    void operator()(const std::string &s) {
        ++counts[s];
    }
};

template <typename Output>
void splitString(const string &input, const string &separator, Output &out) {
    // do whatever you currently do to get each string, call it "s"...
    out(s);
    // lather, rinse, repeat
}

vector< pair<string,int> > getFreqcounts(const string &input) {
    StringCounter sc;
    splitString(input,"\n",sc);
    return vector< pair<string,int> > (sc.counts.begin(),sc.counts.end());
}

In a comment to Jerry's answer I mentioned using a functor. Here's the sort of thing I mean (untested code):

struct StringCounter {
    std::map<std::string, int> counts;
    void operator()(const std::string &s) {
        ++counts[s];
    }
};

template <typename Output>
void splitString(const string &input, const string &separator, Output &out) {
    // do whatever you currently do to get each string, call it "s"...
    out(s);
    // lather, rinse, repeat
}

vector< pair<string,int> > getFreqcounts(const string &input) {
    StringCounter sc;
    splitString(input,"\n",sc);
    return vector< pair<string,int> > (sc.counts.begin(),sc.counts.end());
}
过去的过去 2024-08-20 14:55:02

如果您想保留当前逻辑但将所有逻辑包含在一个循环中,您可以使用 find(),如下所示:

map<string,int> counts;
loop on curr ..
map<string,int>::iterator it = counts.find(curr);
if (it != counts.end())
  ++it->second;
else
  counts[curr] = 1;

但这不是我会做的。第一次不需要显式初始化映射条目,因为它无论如何都会默认为零,所以您可以使用:

map<string,int> counts;
loop on curr ..
  if (..
    ++map[curr];

我要做的另一件事是使用 std::getline() 以 '\n' 和 '#' 为分隔符。

If you want to keep the current logic but contain all of it in a single loop, you can use find(), like so:

map<string,int> counts;
loop on curr ..
map<string,int>::iterator it = counts.find(curr);
if (it != counts.end())
  ++it->second;
else
  counts[curr] = 1;

But that's not what I would do. There's no need to explicitly initialize the map entry in the first time as it will default to zero anyway, so you can use:

map<string,int> counts;
loop on curr ..
  if (..
    ++map[curr];

Another thing i would do is use std::getline() with '\n' and '#' being the delimiters.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文