Executable runs faster on Wine than on Windows -- why?


Solution: Apparently the culprit was the use of floor(), the performance of which turns out to be OS-dependent in glibc.


This is a followup question to an earlier one: Same program faster on Linux than Windows -- why?

I have a small C++ program, that, when compiled with nuwen gcc 4.6.1, runs much faster on Wine than Windows XP (on the same computer). The question: why does this happen?

The timings are ~15.8 and 25.9 seconds, for Wine and Windows respectively. Note that I'm talking about the same executable, not only the same C++ program.

The source code is at the end of the post. The compiled executable is here (if you trust me enough).

This particular program does nothing useful; it is just a minimal example boiled down from a larger program I have. Please see this other question for some more precise benchmarking of the original program (important!!), which also rules out the most common possibilities (such as other programs hogging the CPU on Windows, process startup penalty, and differences in system calls such as memory allocation). Also note that while I used rand() here for simplicity, in the original I used my own RNG, which I know does no heap allocation.

The reason I opened a new question on the topic is that now I can post an actual simplified code example for reproducing the phenomenon.

The code:

#include <cstdlib>
#include <cmath>


int irand(int top) {
    return int(std::floor((std::rand() / (RAND_MAX + 1.0)) * top));
}

template<typename T>
class Vector {
    T *vec;
    const int sz;

public:
    Vector(int n) : sz(n) {
        vec = new T[sz];
    }

    ~Vector() {
        delete [] vec;
    }

    int size() const { return sz; }

    const T & operator [] (int i) const { return vec[i]; }
    T & operator [] (int i) { return vec[i]; }
};


int main() {
    const int tmax = 20000; // increase this to make it run longer
    const int m = 10000;
    Vector<int> vec(150);

    for (int i=0; i < vec.size(); ++i)
        vec[i] = 0;

    // main loop
    for (int t=0; t < tmax; ++t)
        for (int j=0; j < m; ++j) {
            int s = irand(100) + 1;
            vec[s] += 1;
        }

    return 0;
}

UPDATE

It seems that if I replace irand() above with something deterministic such as

int irand(int top) {
    static int c = 0;
    return (c++) % top;
}

then the timing difference disappears. I'd like to note though that in my original program I used a different RNG, not the system rand(). I'm digging into the source of that now.

UPDATE 2

Now I replaced the irand() function with an equivalent of what I had in the original program. It is a bit lengthy (the algorithm is from Numerical Recipes), but the point is to show that no system libraries are being called explicitly (except possibly through floor()). Yet the timing difference is still there!

Perhaps floor() is to blame? Or does the compiler generate calls to something else?

class ran1 {
    static const int table_len = 32;
    static const int int_max = (1u << 31) - 1;

    int idum;
    int next;
    int *shuffle_table;

    void propagate() {
        const int int_quo = 1277731;

        int k = idum/int_quo;
        idum = 16807*(idum - k*int_quo) - 2836*k;
        if (idum < 0)
            idum += int_max;
    }

public:
    ran1() {
        shuffle_table = new int[table_len];
        seedrand(54321);
    }
    ~ran1() {
        delete [] shuffle_table;
    }

    void seedrand(int seed) {
        idum = seed;
        for (int i = table_len-1; i >= 0; i--) {
            propagate();
            shuffle_table[i] = idum;
        }
        next = idum;
    }

    double frand() {
        int i = next/(1 + (int_max-1)/table_len);
        next = shuffle_table[i];
        propagate();
        shuffle_table[i] = idum;
        return next/(int_max + 1.0);
    }
} rng;


int irand(int top) {
    return int(std::floor(rng.frand() * top));
}


何以畏孤独 2024-12-21 18:50:22

From what I can tell, the C standard libraries used WILL be different in the two different scenarios. This affects the rand() call as well as floor().

From the mingw site... MinGW compilers provide access to the functionality of the Microsoft C runtime and some language-specific runtimes. Running under XP, this will use the Microsoft libraries. Seems straightforward.

However, the model under wine is much more complex. According to this diagram, the operating system's libc comes into play. This could be the difference between the two.

洋洋洒洒 2024-12-21 18:50:22

While Wine is basically Windows, you're still comparing apples to oranges. As well, not only is it apples/oranges, the underlying vehicles hauling those apples and oranges around are completely different.

In short, your question could trivially be rephrased as "this code runs faster on Mac OSX than it does on Windows" and get the same answer.

年华零落成诗 2024-12-21 18:50:21

Wikipedia says:

Wine is a compatibility layer not an emulator. It duplicates functions
of a Windows computer by providing alternative implementations of the
DLLs that Windows programs call,[citation needed] and a process to
substitute for the Windows NT kernel. This method of duplication
differs from other methods that might also be considered emulation,
where Windows programs run in a virtual machine.[2] Wine is
predominantly written using black-box testing reverse-engineering, to
avoid copyright issues.

This implies that the developers of Wine could replace an API call with anything at all, as long as the end result is the same as you would get from a native Windows call. And I suppose they weren't constrained by the need to keep it compatible with the rest of Windows.

岁月静好 2024-12-21 18:50:20

edit: It turned out that the culprit was floor() and not rand() as I suspected - see the update at the top of the OP's question.

The run time of your program is dominated by the calls to rand().

I therefore think that rand() is the culprit. I suspect that the underlying function is provided by the WINE/Windows runtime, and the two implementations have different performance characteristics.

The easiest way to test this hypothesis would be to simply call rand() in a loop, and time the same executable in both environments.

edit: I've had a look at the WINE source code, and here is its implementation of rand():

/*********************************************************************
 *              rand (MSVCRT.@)
 */
int CDECL MSVCRT_rand(void)
{
    thread_data_t *data = msvcrt_get_thread_data();

    /* this is the algorithm used by MSVC, according to
     * http://en.wikipedia.org/wiki/List_of_pseudorandom_number_generators */
    data->random_seed = data->random_seed * 214013 + 2531011;
    return (data->random_seed >> 16) & MSVCRT_RAND_MAX;
}

I don't have access to Microsoft's source code to compare, but it wouldn't surprise me if the difference in performance was in the getting of thread-local data rather than in the RNG itself.
