How can I make my char buffer more performant?

Posted 2024-08-06 02:02:44

I have to read a lot of data into:

vector<char>

A third-party library reads this data piece by piece. Each time, it calls my callback function, whose signature looks like this:

CallbackFun ( int CBMsgFileItemID,
              unsigned long CBtag,
              void* CBuserInfo,
              int CBdataSize,
              void* CBdataBuffer,
              int CBisFirst,
              int CBisLast )
{

   ...

}

Currently I have implemented a buffer container on top of an STL container. Its methods insert() and getBuff() insert a new buffer and retrieve the stored data. But I still want better-performing code that minimizes allocations and deallocations:

template<typename T1>
class buffContainer
{
private:
        class atomBuff
        {
        private:
            atomBuff(const atomBuff& arObj);
            atomBuff operator=(const atomBuff& arObj);
            public:
            int len;
            char *buffPtr;
            atomBuff():len(0),buffPtr(NULL)
            {}
            ~atomBuff()
            {
                if(buffPtr!=NULL)
                    delete []buffPtr;
            }
        };
public :
    buffContainer():_totalLen(0){}
void insert(const char const *aptr,const  unsigned long  &alen);
unsigned long getBuff(T1 &arOutObj);

private:
    std::vector<atomBuff*> moleculeBuff;
    int _totalLen;
};
template<typename T1>
void buffContainer< T1>::insert(const char const *aPtr,const  unsigned long  &aLen)
{
    if(aPtr==NULL,aLen<=0)
        return;
    atomBuff *obj=new atomBuff();
    obj->len=aLen;
    obj->buffPtr=new char[aLen];
    memcpy(obj->buffPtr,aPtr,aLen);
    _totalLen+=aLen;
    moleculeBuff.push_back(obj);

}
template<typename T1>
unsigned long buffContainer<T1>::getBuff(T1 &arOutObj)
{
    std::cout<<"Total Lenght of Data is: "<<_totalLen<<std::endl;
    if(_totalLen==0)
        return _totalLen;
    // Note : Logic pending for case size(T1) > T2::Value_Type
    int noOfObjRqd=_totalLen/sizeof(T1::value_type);
    arOutObj.resize(noOfObjRqd);
    char *ptr=(char*)(&arOutObj[0]);
    for(std::vector<atomBuff*>::const_iterator itr=moleculeBuff.begin();itr!=moleculeBuff.end();itr++)
    {
        memcpy(ptr,(*itr)->buffPtr,(*itr)->len);
        ptr+= (*itr)->len;
    }
    std::cout<<arOutObj.size()<<std::endl;

    return _totalLen;
}

How can I make this more performant?

懒的傷心 2024-08-13 02:02:45

If my wild guess about your callback function makes sense, you don't need anything more than a vector:

std::vector<char> foo;
foo.reserve(MAGIC); // this is the important part. Reserve the right amount here.
                    // and you don't have any reallocs.
setup_callback_fun(CallbackFun, &foo);

CallbackFun ( int CBMsgFileItemID,
              unsigned long CBtag,
              void* CBuserInfo,
              int CBdataSize,
              void* CBdataBuffer,
              int CBisFirst,
              int CBisLast )
{
     std::vector<char>* pFoo = static_cast<std::vector<char>*>(CBuserInfo);

     char* data = static_cast<char*>(CBdataBuffer);
     pFoo->insert(pFoo->end(), data, data+CBdataSize);
}
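The sketch below is a self-contained, runnable version of that approach. It makes a few assumptions:

- The library hands `CBuserInfo` back unchanged on every call.
- The callback returns `void`. The real typedef isn't shown, so this is a guess.
- `fake_library_read` is a hypothetical stand-in for the third-party library, not part of any real API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Same parameter list as in the question. The void return type is an
// assumption: the library's real callback typedef is not shown.
void CallbackFun(int CBMsgFileItemID, unsigned long CBtag, void* CBuserInfo,
                 int CBdataSize, void* CBdataBuffer, int CBisFirst, int CBisLast)
{
    (void)CBMsgFileItemID; (void)CBtag; (void)CBisFirst; (void)CBisLast; // unused here
    assert(CBdataSize >= 0);
    // CBuserInfo is whatever pointer was registered with the library: our vector.
    auto* out = static_cast<std::vector<char>*>(CBuserInfo);
    const char* data = static_cast<const char*>(CBdataBuffer);
    // One range insert per chunk: at most one reallocation per call,
    // and none at all if enough capacity was reserved beforehand.
    out->insert(out->end(), data, data + CBdataSize);
}

// Hypothetical stand-in for the third-party library: it copies `src` through
// its own transfer buffer and hands it to the callback `chunk` bytes at a time.
void fake_library_read(const std::vector<char>& src, std::size_t chunk, void* userInfo)
{
    std::vector<char> scratch(chunk);
    for (std::size_t pos = 0; pos < src.size(); pos += chunk) {
        std::size_t n = std::min(chunk, src.size() - pos);
        std::copy(src.data() + pos, src.data() + pos + n, scratch.data());
        CallbackFun(0, 0, userInfo, static_cast<int>(n), scratch.data(),
                    pos == 0, pos + n == src.size());
    }
}
```

Usage: `std::vector<char> foo; foo.reserve(src.size()); fake_library_read(src, 4, &foo);` leaves `foo` equal to `src`. Because the reserved capacity is large enough, `foo` never reallocates along the way.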

绾颜 2024-08-13 02:02:45

Depending on how you plan to use the result, you might try putting the incoming data into a rope data structure instead of a vector, especially if you expect the incoming strings to be very large. Appending to a rope is very fast, but subsequent char-by-char traversal is slower by a constant factor. That tradeoff may or may not work out for you; I don't know what you need to do with the result.

EDIT: I see from your comment that this is not an option, then. I don't think you can do much better in the general case, when the size of the incoming data is completely arbitrary. Otherwise, you could try to reserve enough space in the vector up front so that, in the average case, the data fits with no reallocation or at most one.
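The reserve-up-front idea can be sketched as follows. This assumes you have some estimate of the typical total size; `expected_total` is a hypothetical parameter, and `append_all`/`AppendStats` are illustrative names. The sketch counts capacity changes so the effect is visible:

- If the data fits in the reservation, the standard guarantees that no reallocation happens.
- If it doesn't fit, the vector simply falls back to its normal growth.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Result of appending a sequence of chunks to a vector.
struct AppendStats {
    std::vector<char> data;
    int reallocations = 0; // how many times the capacity changed
};

// Appends every chunk after reserving `expected_total` bytes up front.
// `expected_total` stands for a (hypothetical) estimate of the usual input size.
AppendStats append_all(const std::vector<std::string>& chunks, std::size_t expected_total)
{
    AppendStats s;
    s.data.reserve(expected_total);
    std::size_t cap = s.data.capacity();
    for (const std::string& c : chunks) {
        s.data.insert(s.data.end(), c.begin(), c.end());
        if (s.data.capacity() != cap) { // capacity changed => a reallocation happened
            ++s.reallocations;
            cap = s.data.capacity();
        }
    }
    return s;
}
```

With `append_all({"abc", "defg", "hi"}, 16)` the nine bytes fit in the reservation, so `reallocations` stays at 0.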

One thing I noticed about your code:

if(aPtr==NULL,aLen<=0)

I think you mean

if(aPtr==NULL || aLen<=0)
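Here is why the original condition is wrong. The built-in comma operator evaluates its left operand, discards the result, and yields the right operand. So `aPtr==NULL,aLen<=0` only tests the length, and a null pointer slips through. With warnings enabled, GCC and Clang typically flag the unused left operand. Also, since `aLen` is an `unsigned long` in the question, `aLen<=0` can only mean `aLen==0`. The function names below are illustrative.

```cpp
#include <cassert>

// Mirrors the question's condition: `p == nullptr` is evaluated and then
// discarded by the comma operator, so only `len <= 0` decides the result.
bool rejects_with_comma(const char* p, unsigned long len)
{
    return (p == nullptr, len <= 0);
}

// The intended check: reject a null pointer OR an empty chunk.
// For an unsigned length, `len <= 0` is the same as `len == 0`.
bool rejects_with_or(const char* p, unsigned long len)
{
    return p == nullptr || len == 0;
}
```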

伤痕我心 2024-08-13 02:02:45

The main thing you can do is avoid doing quite so much copying of the data. Right now, when insert() is called, you're copying the data into your buffer. Then, when getbuff() is called, you're copying the data out to a buffer they've (hopefully) specified. So, to get data from outside to them, you're copying each byte twice.

This part:

arOutObj.resize(noOfObjRqd);
char *ptr=(char*)(&arOutObj[0]);

seems to assume that arOutObj is really a vector. If so, it would be much better to rewrite getbuff as an ordinary function taking a reference to a vector, instead of a template that only works for one type of parameter anyway.

From there, it becomes fairly simple to eliminate one copy of the data completely. In insert(), instead of manually allocating memory and tracking the size, put the data directly into a vector. Then, when getbuff() is called, instead of copying the data into their buffer, just give them a reference to your existing vector.

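class buffContainer {
    std::vector<char> moleculeBuff;
public:
    void insert(char const *p, unsigned long len) {
        // Appending the whole range at once lets the vector allocate at most
        // once per call while keeping its geometric growth. (Calling
        // reserve(moleculeBuff.size()+len) before every append can instead
        // force a reallocation on every call.)
        moleculeBuff.insert(moleculeBuff.end(), p, p + len);
    }

    void getbuff(std::vector<char> &output) {
        output = moleculeBuff;  // note: this assignment still copies the data
    }
};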

Note that I've changed the result of getbuff to void. Since you're giving them a vector, its size is known, and there's no point in returning the size. In fact, you might want to change the signature a bit so that it just returns the buffer:

std::vector<char> getbuff() {
    std::vector<char> temp;
    temp.swap(moleculeBuff);
    return temp;
}

Since it returns a (potentially large) vector by value, this depends heavily on your compiler implementing the named return value optimization (NRVO). However:

1) The worst case is that it does about what you were doing before anyway. Since C++11, a returned local vector is moved rather than copied even without NRVO.
2) Virtually all reasonably current compilers DO implement NRVO.

This also addresses one other detail your original code didn't (seem to). As it stands, getbuff returns some data, but if you call it again, it (apparently) doesn't keep track of what has already been returned, so it returns all of it again. It keeps allocating memory but never frees any of it.

That's what the swap is for. It creates an empty vector and swaps it with the one maintained by buffContainer. buffContainer now has an empty vector, and the filled one is handed over to whatever called getbuff().
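A compilable sketch of the swap-based `getbuff()` follows, written for C++11 or later. Beyond the class from this answer, it only assumes a range insert in `insert()`. A second call to `getbuff()` returns an empty vector, because the first call took ownership of the data.

```cpp
#include <cassert>
#include <vector>

class buffContainer {
    std::vector<char> moleculeBuff;
public:
    void insert(char const* p, unsigned long len) {
        moleculeBuff.insert(moleculeBuff.end(), p, p + len);
    }

    // Hand the accumulated data to the caller and leave this container empty,
    // so a second call does not return the same bytes again.
    std::vector<char> getbuff() {
        std::vector<char> temp;
        temp.swap(moleculeBuff);
        return temp;  // NRVO in practice; at worst a move (C++11 and later)
    }
};
```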

Another way to do things would be to take the swap a step further: basically, you have two buffers:

one owned by buffContainer
one owned by whatever calls getbuff()

In the normal course of things, we can probably expect the buffer sizes to reach some maximum fairly quickly. From there on, we'd really like to simply recycle that space: read some data into one buffer, pass it on to be processed, and while that's happening, read data into the other.

As it happens, that's pretty easy to do too. Change getbuff() to look something like this:

void getbuff(std::vector<char> &output) {
    swap(moleculeBuff, output);
    moleculeBuff.clear();
}

This should improve speed quite a bit. Instead of copying data back and forth, it just swaps one vector's pointer to the data with the other's, along with a couple of other details like the current allocation size and the used size of the vector.

The clear is normally really fast. For a vector of char (or of any type with a trivial destructor), it just sets the number of items in the vector to zero. If the items have destructors, it has to run them, of course.

From there, the next time insert() is called, the new data is simply copied into memory the vector already owns (until/unless it needs more space than the vector has allocated).
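Below is a sketch of the double-buffering variant. It shows that after a warm-up round, the two buffers are simply passed back and forth. The `capacity()` accessor is an illustrative addition for inspection only.

This only shows the memory recycling. If the caller processes `output` on another thread while `insert()` keeps running, you would still need your own synchronization around `getbuff()`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class buffContainer {
    std::vector<char> moleculeBuff;
public:
    void insert(char const* p, unsigned long len) {
        moleculeBuff.insert(moleculeBuff.end(), p, p + len);
    }

    // Give the filled buffer to the caller and take the caller's old buffer
    // (already sized by earlier rounds) as the new write buffer.
    void getbuff(std::vector<char>& output) {
        moleculeBuff.swap(output);  // O(1): exchanges pointers, size and capacity
        moleculeBuff.clear();       // drops the old contents but keeps the capacity
    }

    // Illustrative accessor, only for inspecting the recycled capacity.
    std::size_t capacity() const { return moleculeBuff.capacity(); }
};
```

After the second round, the container holds the first round's storage (cleared). The third round's data therefore lands in that same memory without a new allocation.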
