DLR performance

Posted 2024-10-15 10:18:34

I'm intending to create a web service that performs a large number of manually specified calculations as fast as possible, and I have been exploring the use of the DLR (Dynamic Language Runtime).

Sorry if this is long but feel free to skim over and get the general gist.

I've been using the IronPython library, as it makes the calculations very easy to specify. On my work laptop, the following runs at about 400,000 calculations per second:

using System;
using IronPython.Hosting;
using Microsoft.Scripting.Hosting;

ScriptEngine py = Python.CreateEngine();
ScriptScope pys = py.CreateScope();

ScriptSource src = py.CreateScriptSourceFromString(@"
def result():
    res = [None]*1000000
    for i in range(0, 1000000):
        res[i] = b.GetValue() + 1
    return res
result()
");

CompiledCode compiled = src.Compile();
pys.SetVariable("b", new DynamicValue());

long start = DateTime.Now.Ticks;
var res = compiled.Execute(pys);
long end = DateTime.Now.Ticks;

Console.WriteLine("...Finished. Sample data:");

for (int i = 0; i < 10; i++)
{
    Console.WriteLine(res[i]);
}

// One tick is 100 ns, so dividing by 10,000 converts ticks to milliseconds.
Console.WriteLine("Took " + (end - start) / 10000 + "ms to run 1000000 times.");

Where DynamicValue is a class that returns random numbers from a pre-built array (seeded and built at run time).

When I create a DLR class to do the same thing, I get much higher performance (~10,000,000 calculations per second). The class is as follows:

using System.Dynamic;
using System.Linq.Expressions;

class DynamicCalc : IDynamicMetaObjectProvider
{
    DynamicMetaObject IDynamicMetaObjectProvider.GetMetaObject(Expression parameter)
    {
        return new DynamicCalcMetaObject(parameter, this);
    }

    private class DynamicCalcMetaObject : DynamicMetaObject
    {
        internal DynamicCalcMetaObject(Expression parameter, DynamicCalc value) : base(parameter, BindingRestrictions.Empty, value) { }

        public override DynamicMetaObject BindInvokeMember(InvokeMemberBinder binder, DynamicMetaObject[] args)
        {
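            // Whatever member name is invoked (obj.Add in the test), bind it to
            // args[0] + args[1], converted to object, the binder's return type.
            Expression Add = Expression.Convert(Expression.Add(args[0].Expression, args[1].Expression), typeof(System.Object));
            // A type restriction on the target lets the call site reuse this rule for every DynamicCalc.
            DynamicMetaObject methodInfo = new DynamicMetaObject(Expression.Block(Add), BindingRestrictions.GetTypeRestriction(Expression, LimitType));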
            return methodInfo;
        }
    }
}

and is called/tested in the same way by doing the following:

dynamic obj = new DynamicCalc();
long t1 = DateTime.Now.Ticks;
for (int i = 0; i < 10000000; i++)
{
    results[i] = obj.Add(ar1[i], ar2[i]);
}
long t2 = DateTime.Now.Ticks;

Where ar1 and ar2 are pre-built, runtime seeded arrays of random numbers.

The speed is great this way, but it's not easy to specify the calculation. I'd basically be looking at creating my own lexer & parser, whereas IronPython has everything I need already there.

I'd have thought I could get much better performance from IronPython since it is implemented on top of the DLR, and I could do with better than what I'm getting.

Is my example making best use of the IronPython engine? Is it possible to get significantly better performance out of it?

(Edit) The same as the first example, but with the loop in C#, setting the variable and executing the compiled Python code on each iteration:

ScriptSource src = py.CreateScriptSourceFromString(@"b + 1");

CompiledCode compiled = src.Compile();

double[] res = new double[1000000];

for(int i=0; i<1000000; i++)
{
    pys.SetVariable("b", args1[i]);
    res[i] = compiled.Execute(pys);
}

where pys is a ScriptScope from py, and args1 is a pre-built array of random doubles. This example executes slower than running the loop in the Python code and passing in the entire arrays.

Answers (2)

薔薇婲 2024-10-22 10:18:34

delnan's comment points you to some of the problems here, but I'll get specific about the differences. In the C# version you've cut out a significant number of the dynamic calls that the Python version makes. For starters, your loop counter is typed as int, and it sounds like ar1 and ar2 are strongly typed arrays. So in the C# version the only dynamic operations are the call to obj.Add (which is one operation in C#) and potentially the assignment to results, if results isn't typed as object (which seems unlikely). Also note that all of this code is lock-free.

In the Python version you first have the allocation of the list, which also appears to happen inside your timer, whereas in C# it doesn't look like it does. Then you have the dynamic call to range; luckily that only happens once. But it again creates a gigantic list in memory, and delnan's suggestion of xrange is an improvement here. Then you have the loop counter i, which gets boxed to an object on every iteration. Then you have the call to b.GetValue(), which is actually two dynamic invocations: first a get-member to fetch the "GetValue" method, and then an invoke on that bound method object. This again creates one new object on every iteration. Then the result of b.GetValue() may be yet another value that's boxed on every iteration. Then you add 1 to that result, which is another boxing operation on every iteration. Finally, you store this into your list, which is yet another dynamic operation; I think this final operation needs to take a lock to keep the list consistent (again, delnan's suggestion of using a list comprehension improves this).
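
A minimal Python sketch of the two loop shapes discussed above, assuming Python 3 (where `range` is already lazy; under IronPython 2.7 the lazy equivalent is `xrange`). The `DynamicValue` class is a hypothetical stand-in for the question's C# class. Hoisting the bound method into a local variable is an extra tweak, not something the answer prescribes: it turns the per-iteration get-member into a single lookup.

```python
import itertools


class DynamicValue(object):
    """Hypothetical stand-in for the question's C# DynamicValue:
    hands out numbers from a pre-built list, cycling when it runs out."""

    def __init__(self, data):
        self._values = itertools.cycle(data)

    def GetValue(self):
        return next(self._values)


def original_style(b, n):
    # Shape of the question's loop: preallocate, then on every iteration do a
    # get-member (b.GetValue), a call, an add, and an indexed list store.
    res = [None] * n
    for i in range(0, n):
        res[i] = b.GetValue() + 1
    return res


def comprehension_style(b, n):
    # Look up the bound method once, then let the comprehension build the
    # list without separate indexed stores.
    get_value = b.GetValue
    return [get_value() + 1 for _ in range(n)]
```

Both functions return the same list, for example `[2, 3, 4, 2, 3]` for `DynamicValue([1, 2, 3])` and `n=5`. Timing each one under IronPython itself (for example with `timeit`) is the way to see how much each change actually buys.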

So in summary during the loop we have:

                            C#       IronPython
Dynamic Operations           1           4
Allocations                  1           4
Locks Acquired               0           1

So basically, Python's dynamic behavior does come at a cost compared with C#. If you want the best of both worlds, you can try to balance what you do in C# against what you do in Python. For example, you could write the loop in C# and have it call a delegate that is a Python function (you can use scope.GetVariable<Func<object, object>> to get a function out of the scope as a delegate). If you really need every last bit of performance, you could also consider allocating a .NET array for the results, as it may reduce the working set and GC copying by not keeping around a bunch of boxed values.

To do the delegate you could have the user write:

def computeValue(value):
    return value + 1

Then in the C# code you'd do:

CompiledCode compiled = src.Compile();
compiled.Execute(pys);
var computer = pys.GetVariable<Func<object,object>>("computeValue");

Now you can do:

for (int i = 0; i < 10000000; i++)
{
    results[i] = computer(i);
}
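
For comparison outside .NET, here is a rough CPython analogue of the two hosting patterns: re-running compiled code with a variable set in a namespace for each element (the question's edit), versus executing the user's script once, pulling `computeValue` out as a plain callable, and calling it in the loop (the delegate approach above). This is an analogy, not IronPython hosting code: `compile`, `eval`, and `exec` stand in for `ScriptSource.Compile`, `CompiledCode.Execute`, and `ScriptScope.GetVariable`, and `array('d')` plays the role of a typed .NET `double[]`, storing raw doubles rather than references to float objects.

```python
from array import array

# The user-supplied pieces, as in the question's edit and in this answer.
EXPR = compile("b + 1", "<user-expr>", "eval")
USER_SCRIPT = "def computeValue(value):\n    return value + 1\n"


def per_element_execute(values):
    # Question's edit: set "b" in the scope, then re-run the compiled code
    # once per element.
    scope = {}
    out = array("d")
    for v in values:
        scope["b"] = v
        out.append(eval(EXPR, scope))
    return out


def per_element_call(values):
    # Delegate approach: execute the script once, fetch the function,
    # then call it directly for each element.
    scope = {}
    exec(compile(USER_SCRIPT, "<user-script>", "exec"), scope)
    compute = scope["computeValue"]
    return array("d", (compute(v) for v in values))
```

Both produce the same values. The second avoids re-entering the script's top level on every element, which is likely part of why the question's edit ran slower than the all-in-Python loop.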

若水微香 2024-10-22 10:18:34

If you're concerned about computation speed, wouldn't it be better to look at a low-level way of specifying the computation? Python and C# are high-level languages, and their runtimes can spend a lot of time on work behind the scenes.

Look at this LLVM wrapper library: http://www.llvmpy.org

  • Install it using: pip install llvmpy ply
  • or on Debian Linux: apt install python-llvmpy python-ply

You would still need to write a tiny compiler (you can use the PLY library) and bind it to LLVM JIT calls (see the LLVM Execution Engine), but this approach can be more efficient (the generated code is much closer to real CPU code) and cross-platform, compared with being locked into .NET.
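
To make the "tiny compiler" idea concrete, here is a minimal sketch that turns an arithmetic formula into LLVM IR text. It swaps in Python's standard `ast` module (Python 3.8+) for a PLY-generated parser, supports only binary `+ - * /` on doubles, named parameters, and numeric literals, and stops at the IR string: handing that text to an LLVM JIT (the Execution Engine mentioned above) is not shown. The function name `compile_to_llvm_ir` and the `@compute` symbol are hypothetical.

```python
import ast
import struct

# Python AST operator classes mapped to LLVM floating-point instructions.
_LLVM_OPS = {ast.Add: "fadd", ast.Sub: "fsub", ast.Mult: "fmul", ast.Div: "fdiv"}


def _double_literal(value):
    # LLVM accepts a double written as the hex form of its IEEE-754 bits,
    # which sidesteps the rule that decimal literals must be exactly representable.
    return "0x" + struct.pack(">d", float(value)).hex().upper()


def compile_to_llvm_ir(source, params, name="compute"):
    """Translate an arithmetic expression over `params` into the text of an
    LLVM IR function that takes and returns doubles."""
    tree = ast.parse(source, mode="eval")
    body = []
    temp_count = 0

    def emit(node):
        # Returns the LLVM operand (register or constant) holding node's value.
        nonlocal temp_count
        if isinstance(node, ast.BinOp) and type(node.op) in _LLVM_OPS:
            lhs, rhs = emit(node.left), emit(node.right)
            reg = "%t" + str(temp_count)
            temp_count += 1
            body.append("  %s = %s double %s, %s" % (reg, _LLVM_OPS[type(node.op)], lhs, rhs))
            return reg
        if (isinstance(node, ast.Constant) and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return _double_literal(node.value)
        if isinstance(node, ast.Name) and node.id in params:
            return "%" + node.id
        # Anything else (calls, attributes, unknown names) is rejected.
        raise ValueError("unsupported syntax: " + ast.dump(node))

    result = emit(tree.body)
    signature = ", ".join("double %" + p for p in params)
    lines = ["define double @%s(%s) {" % (name, signature), "entry:"]
    lines += body
    lines += ["  ret double " + result, "}"]
    return "\n".join(lines)
```

For `"b + 1"` with `params=["b"]`, the function body is a single `%t0 = fadd double %b, 0x3FF0000000000000` followed by `ret double %t0`. Rejecting everything outside the whitelist also means user formulas cannot call arbitrary code.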

LLVM provides ready-to-use optimizing compiler infrastructure, including many optimizer pass modules, and has a big user and developer community.

Also look here: http://gmarkall.github.io/tutorials/llvm-cauldron-2016

PS: If you're interested, I can help you with a compiler while contributing to my project's manual in parallel. But it won't be a quick start; this topic is new to me too.
