Difference between a baseline and a benchmark in application performance

Posted on 2024-07-10 02:08:44

What is a baseline and what is a benchmark? What is the best definition of each, and how do you baseline one set of numbers and benchmark another?

Comments (4)

书信已泛黄 2024-07-17 02:08:44

Interesting definitions from SPR (Software Productivity Research)

Baseline and benchmark are similar but distinct activities.

Figuratively, a baseline is a "line in the sand" for an organization whereby it measures important performance characteristics for future reference.

This is not necessarily a "good" state, just a reference.

A benchmark is best understood by way of the original derivation of the word itself:

Tradesmen engaged in repetitive tasks, such as sawing lumber to consistent lengths, often placed notches on their workbenches to indicate placement of boards prior to cutting. Literally, a benchmark became a standard for comparison and an indicator of past success.

Basically:

  • a baseline is about identifying a significant state: your set of numbers has reached an approved, publicly recognized status.
  • a benchmark is about assessing the relative performance of an application.

极度宠爱 2024-07-17 02:08:44

Hi Gagneet, I'm on the Windows performance team. Here is how we use these terms.

A baseline is a measurement of a known configuration that is used as a reference for subsequent measurements. For a baseline, we characterize the thing being measured: let's take cold boot time as an example. Here we have a set of machines that are well characterized. This means we know how they work, that we have good drivers for them, and that the hardware isn't broken or flawed.

On this hardware, we have several baseline measurements, such as XP-RTM, XP-SP2, Vista-RTM, Vista-SP1, Vista-SP2, etc.

For each of these baselines, we have a set of well-characterized and well-understood measurements, including all the phases of boot, the amount of CPU, disk and memory utilization, the number of DLL loads, etc.

After a baseline is established, we can take other measurements and compare them to the baseline. For example, we are currently working on Windows 7. For each daily build we run a set of boot time tests. We compare all the characteristics of each Win-7 build to the baseline measurements, which include all the previous Win-7 builds. This lets us see where the differences lie and helps us drill into the problem areas. Here are some more details.
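To make the "compare each build to the baseline" step concrete, here is a minimal sketch in Python. It is not the Windows team's tooling: the metric names, the sample values, the use of medians, and the 5% tolerance are all assumptions chosen for illustration.

```python
from statistics import median

def compare_to_baseline(baseline, candidate, tolerance=0.05):
    """Return metrics whose median grew by more than `tolerance` (a fraction).

    Both arguments map a metric name to a list of repeated measurements.
    The 5% default tolerance is an illustrative assumption.
    """
    regressions = {}
    for metric, base_samples in baseline.items():
        base = median(base_samples)
        cand = median(candidate[metric])
        change = (cand - base) / base  # relative change versus the baseline
        if change > tolerance:
            regressions[metric] = change
    return regressions

# Hypothetical measurements: a stored baseline and one new build.
baseline = {"boot_seconds": [30.0, 31.0, 29.0], "dll_loads": [400, 402, 398]}
build = {"boot_seconds": [36.0, 35.0, 37.0], "dll_loads": [401, 400, 403]}

regressions = compare_to_baseline(baseline, build)
for metric, change in regressions.items():
    print(f"{metric}: {change:+.1%} versus baseline")
```

Only metrics that moved beyond the tolerance are reported, which is one simple way to point at the areas worth drilling into.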

十二 2024-07-17 02:08:44

In scientific research, a benchmark is a kind of test and a baseline is a kind of result.

Let's look at an example of a benchmark test: we might take a collection of 5,000 sentences in English and use the lab's four-core Dell machine to translate them into Spanish using various algorithms. Because we've kept the data and the machine constant, we can meaningfully compare the time taken by the different algorithms to complete the task, as well as their relative accuracy (measured against gold-standard human translations).
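A benchmark of this kind can be sketched as a small harness that runs every algorithm on the same fixed inputs and records time and accuracy. Everything below is a toy assumption: the two "algorithms", the two-sentence data set, and exact-match scoring stand in for real translation systems and a real accuracy measure.

```python
import time

def run_benchmark(algorithms, sentences, references):
    """Run each algorithm on identical inputs; record time and accuracy."""
    results = {}
    for name, translate in algorithms.items():
        start = time.perf_counter()
        outputs = [translate(s) for s in sentences]
        elapsed = time.perf_counter() - start
        # Exact-match accuracy is a simplifying assumption for this sketch.
        correct = sum(out == ref for out, ref in zip(outputs, references))
        results[name] = {"seconds": elapsed, "accuracy": correct / len(sentences)}
    return results

# Hypothetical fixed data set and gold-standard references.
sentences = ["good morning", "thank you"]
references = ["buenos días", "gracias"]
phrase_table = {"good morning": "buenos días", "thank you": "gracias"}

# Two toy "algorithms": one returns the input unchanged, one looks phrases up.
algorithms = {
    "identity": lambda s: s,
    "lookup": lambda s: phrase_table.get(s, s),
}

results = run_benchmark(algorithms, sentences, references)
for name, stats in results.items():
    print(name, stats)
```

Because the data and the machine are held constant, differences between rows of `results` can be attributed to the algorithms themselves.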

To find a baseline for this benchmark test, we might write a very naive translation algorithm that just finds the commonest translation for each individual word, with no regard for the context. Measuring the accuracy of this algorithm against our human translations gives us an idea of the minimum score - the baseline - that the others must beat, and gives us a feel for what level of accuracy counts as "good".
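The naive baseline described above might look like the following sketch, assuming word-aligned training pairs are available. The tiny `aligned_pairs` list and the function names are hypothetical.

```python
from collections import Counter, defaultdict

def train_word_baseline(aligned_pairs):
    """Map each source word to its most frequent target word."""
    counts = defaultdict(Counter)
    for source_word, target_word in aligned_pairs:
        counts[source_word][target_word] += 1
    return {src: c.most_common(1)[0][0] for src, c in counts.items()}

def translate(sentence, table):
    """Translate word by word, ignoring context; unknown words pass through."""
    return " ".join(table.get(word, word) for word in sentence.split())

# Hypothetical word alignments: "the" is seen as "el" twice and "la" once.
aligned_pairs = [
    ("the", "el"), ("the", "la"), ("the", "el"),
    ("cat", "gato"), ("house", "casa"),
]

table = train_word_baseline(aligned_pairs)
baseline_output = translate("the house", table)
print(baseline_output)  # el casa
```

The output "el casa" is wrong (Spanish needs "la casa"), which is exactly the kind of context error a baseline like this makes and that better algorithms are expected to beat.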

At the other end of the scale from a baseline, an upper bound is a useful yardstick too. In the translation example, we might find the upper bound by measuring the accuracy of one of our human translations with respect to the others. This gives us an idea of how high a score on our "accuracy" measure can get before it hits the ceiling of human disagreement. We expect our machine translation algorithms to perform at a level between the baseline and the upper bound.
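One way to estimate such an upper bound is to hold out one human translator and score that translator against the rest. The sketch below assumes exact-match scoring and three hypothetical translators; a real study would likely use a softer similarity measure.

```python
def accuracy(outputs, reference_sets):
    """Fraction of outputs that match at least one accepted reference."""
    hits = sum(out in refs for out, refs in zip(outputs, reference_sets))
    return hits / len(outputs)

# Hypothetical human translations of the same three sentences.
humans = {
    "A": ["buenos días", "muchas gracias", "hasta luego"],
    "B": ["buenos días", "gracias", "hasta luego"],
    "C": ["buen día", "gracias", "hasta luego"],
}

# Hold out translator A and treat the others as the gold standard.
held_out = "A"
others = [texts for name, texts in humans.items() if name != held_out]
reference_sets = [set(versions) for versions in zip(*others)]

upper_bound = accuracy(humans[held_out], reference_sets)
print(f"upper bound: {upper_bound:.2f}")
```

A machine translation system scored with the same `accuracy` function against the same references would then be expected to land between the naive baseline's score and `upper_bound`.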

轮廓§ 2024-07-17 02:08:44

Correct me if I'm wrong, but I believe "baseline" refers to a known good state, while "benchmark" refers to the current state. You would do a benchmark and compare it to the baseline.
