Fastest way to interface between live (unsaved) Excel data and C# objects
I want to know the fastest way to read and write data between an open Excel workbook and C# objects. The background is that I want to develop a C# application that is launched from Excel and uses data held in Excel.
The business logic will reside in the C# application, but the data will reside in an Excel workbook. The user will be using Excel and will click a button (or do something similar) on the workbook to start the C# application. The C# application will then read data from the workbook, process it, and write data back to the workbook.
There may be numerous blocks of data to read from and write back to the workbook, but they will normally be relatively small, say 10 rows by 20 columns. Occasionally a large list may need to be processed, on the order of 50,000 rows by 40 columns.
I know that this is relatively easy to do using, say, VSTO, but I want to know what the fastest (but still robust and elegant) solution is and get an idea of the speed. I don't mind if the solution recommends third-party products or uses C++.
The obvious solution is VSTO or interop, but I don't know how the performance compares with VBA, which I'm currently using to read in the data, or whether there are other solutions.
A post on Experts Exchange said that VSTO was dramatically slower than VBA, but that was a couple of years ago and I don't know whether performance has improved.
http://www.experts-exchange.com/Microsoft/Development/VSTO/Q_23635459.html
Thanks.
Comments (6)
I'll take this as a challenge, and will bet the fastest way to shuffle your data between Excel and C# is to use Excel-DNA - http://excel-dna.net.
(Disclaimer: I develop Excel-DNA. But it's still true...)
Because it uses the native .xll interface it skips all the COM integration overhead that you'd have with VSTO or another COM-based add-in approach. With Excel-DNA you could make a macro that is hooked up to a menu or ribbon button which reads a range, processes it, and writes it back to a range in Excel. All using the native Excel interface from C# - not a COM object in sight.
I've made a small test function that takes the current selection into an array, squares every number in the array, and writes the result into Sheet 2 starting from cell A1. You just need to add the (free) Excel-DNA runtime which you can download from http://excel-dna.net.
I read into C#, process and write back to Excel a million-cell range in under a second. Is this fast enough for you?
My function looks like this:
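The listing itself is not reproduced here, so what follows is only a minimal sketch of such a macro and not necessarily the original code. It assumes a reference to the Excel-DNA `ExcelDna.Integration` assembly and a worksheet named `Sheet2`; the class name `RangeTools` and the menu labels are illustrative.

```csharp
using ExcelDna.Integration;

public static class RangeTools
{
    // Registered by Excel-DNA as a macro and exposed as a menu entry.
    [ExcelCommand(MenuName = "Range Tools", MenuText = "Square Selection")]
    public static void SquareRange()
    {
        // xlfSelection returns a reference to the current selection through the C API.
        var selection = (ExcelReference)XlCall.Excel(XlCall.xlfSelection);
        object content = selection.GetValue();

        object[,] result;
        if (content is object[,] values)
        {
            // Multi-cell selection: square the numbers, pass everything else through.
            int rows = values.GetLength(0);
            int cols = values.GetLength(1);
            result = new object[rows, cols];
            for (int i = 0; i < rows; i++)
                for (int j = 0; j < cols; j++)
                    result[i, j] = (values[i, j] is double d) ? d * d : values[i, j];
        }
        else if (content is double number)
        {
            // Single numeric cell: GetValue returns the bare value.
            result = new object[,] { { number * number } };
        }
        else
        {
            result = new object[,] { { "Selection was not a range or a number" } };
        }

        // xlSheetId yields a reference carrying the sheet id of "Sheet2"
        // (assumed to exist; the call fails otherwise).
        var sheet2 = (ExcelReference)XlCall.Excel(XlCall.xlSheetId, "Sheet2");

        // ExcelReference(rowFirst, rowLast, columnFirst, columnLast, sheetId), zero-based.
        var target = new ExcelReference(
            0, result.GetLength(0) - 1,
            0, result.GetLength(1) - 1,
            sheet2.SheetId);
        target.SetValue(result);
    }
}
```

The whole selection crosses the Excel boundary once on the way in (`GetValue`) and once on the way out (`SetValue`); all per-cell work happens on the managed array.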
If the C# application is a stand-alone application, then you will always have cross-process marshaling involved, and that will overwhelm any optimization you could gain by switching languages from, say, C# to C++. Stick to your preferred language in this situation, which sounds like it is C#.
If you are willing to make an add-in that runs within Excel, however, then your operations will avoid cross-process calls and run about 50x faster.
If you run within Excel as an add-in, then VBA is among the fastest options, but it does still involve COM, so C++ calls using an XLL add-in would be fastest. VBA is still quite fast in terms of calls to the Excel object model. As for actual calculation speed, however, VBA runs as p-code, not as fully compiled code, and so executes about 2-3x slower than native code. This sounds very bad, but it isn't, because the vast majority of the execution time in a typical Excel add-in or application is spent in calls to the Excel object model. VBA would therefore be only about 5-15% slower than a fully compiled COM add-in (say, one written in natively compiled VB 6.0), which is not noticeable.
VB 6.0 is a compiled COM approach and runs 2-3x faster than VBA for non-Excel-related calls, but VB 6.0 is about 12 years old at this point and won't run in 64-bit mode, for example with Office 2010, which can be installed as either 32-bit or 64-bit. Usage of 64-bit Excel is tiny at the moment but will grow, so I would avoid VB 6.0 for this reason.
C#, if running in-process as an Excel add-in, would execute calls to the Excel object model as fast as VBA, and execute non-Excel calls 2-3x faster than VBA, provided it runs unshimmed. The approach recommended by Microsoft, however, is to run fully shimmed, for example by making use of the COM Shim Wizard. By being shimmed, Excel is protected from your code (if it's faulty) and your code is fully protected from other third-party add-ins that could otherwise cause problems. The downside is that a shimmed solution runs within a separate AppDomain, which requires cross-AppDomain marshaling that incurs an execution speed penalty of about 40x, which is very noticeable in many contexts.
Add-ins using Visual Studio Tools for Office (VSTO) are automatically loaded within a shim and execute within a separate AppDomain. There is no avoiding this if using VSTO. Therefore, calls to the Excel object model would also incur an approximately 40x execution speed degradation. VSTO is a gorgeous system for making very rich Excel add-ins, but execution speed is its weakness for applications such as yours.
ExcelDna is a free, open source project that allows you to use C# code, which is then converted for you to an XLL add-in that uses C++ code. That is, ExcelDna parses your C# code and creates the required C++ code for you. I've not used it myself, but I am familiar with the process and it's very impressive. ExcelDna gets very good reviews from those that use it. [Edit: Note the following correction as per Govert's comments below: "Hi Mike - I want to add a small correction to clarify the Excel-Dna implementation: all the managed-to-Excel glue works at runtime from your managed assembly using reflection - there is no extra pre-compilation step or C++ code generation. Also, even though Excel-Dna uses .NET, there need not be any COM interop involved when talking to Excel - as an .xll the native interface can be used directly from .NET (though you can also use COM if you want). This makes high-performance UDFs and macros possible." – Govert]
You also might want to look at Add-in Express. It's not free, but it would allow you to code in C#, and although it shims your solution into a separate AppDomain, I believe that its execution speed is outstanding. If I am understanding its execution speed correctly, then I'm not sure how Add-in Express is doing this, but it might be taking advantage of something called FastPath AppDomain marshaling. Don't quote me on any of this, however, as I'm not very familiar with Add-in Express. You should check it out though and do your own research. [Edit: Reading Charles Williams' answer, it looks like Add-in Express enables both COM and C API access. And Govert states that Excel DNA also enables both COM and the faster C API access. So you'd probably want to check out both and compare them to ExcelDna.]
My advice would be to research Add-in Express and ExcelDna. Both approaches would allow you to code using C#, which you seem most familiar with.
The other main issue is how you make your calls. For example, Excel is very fast when handling an entire range of data passed back-and-forth as an array. This is vastly more efficient than looping through the cells individually. For example, the following code makes use of the Excel.Range.set_Value accessor method to assign a 10 x 10 array of values to a 10 x 10 range of cells in one shot:
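A minimal sketch of that one-shot assignment, assuming a reference to the `Microsoft.Office.Interop.Excel` assembly and a worksheet supplied by the caller. The class name `BlockWriter` and the sample values are illustrative only.

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

public static class BlockWriter
{
    // Builds a 10 x 10 array of sample values (here simply 1..100).
    public static object[,] BuildValues()
    {
        var values = new object[10, 10];
        for (int row = 0; row < 10; row++)
            for (int col = 0; col < 10; col++)
                values[row, col] = row * 10 + col + 1;
        return values;
    }

    // Assigns the whole array to A1:J10 with a single call into Excel.
    public static void WriteBlock(Excel.Worksheet worksheet)
    {
        Excel.Range range = worksheet.get_Range("A1", "J10");
        range.set_Value(Type.Missing, BuildValues());
    }
}
```

The array is filled entirely in managed memory, so the only call that crosses into Excel for the data itself is `set_Value`.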
One can similarly make use of the Excel.Range.get_Value accessor method to read an array of values from a range in one step. Doing this and then looping through the values within the array is vastly faster than looping through the values within the cells of the range individually.
Further to Mike Rosenblum's comments on the use of arrays, I'd like to add that I've been using this very approach (VSTO + arrays), and when I measured it, the actual read itself took only milliseconds. Just remember to disable event handling and screen updating prior to the read/write, and remember to re-enable them after the operation is complete.
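One way to make sure the settings are always restored is a small wrapper with `try`/`finally`. This is a sketch under the assumption that the `Microsoft.Office.Interop.Excel` assembly is referenced; the helper name `QuietExcel.Run` is hypothetical.

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

public static class QuietExcel
{
    // Runs 'work' with events and screen updating switched off,
    // restoring the previous settings even if 'work' throws.
    public static void Run(Excel.Application app, Action work)
    {
        bool previousEvents = app.EnableEvents;
        bool previousScreenUpdating = app.ScreenUpdating;
        try
        {
            app.EnableEvents = false;
            app.ScreenUpdating = false;
            work();
        }
        finally
        {
            app.ScreenUpdating = previousScreenUpdating;
            app.EnableEvents = previousEvents;
        }
    }
}
```

A call might look like `QuietExcel.Run(app, () => range.Value2 = values);`, where `app`, `range` and `values` come from the surrounding add-in code.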
Using C#, you can create 1-based arrays exactly the same as Excel VBA itself does. This is pretty useful, especially because even in VSTO, when you extract the array from an Excel.Range object, the array is 1-based, so keeping the Excel-oriented arrays 1-based helps you avoid needing to always check for whether the array is one-based or zero-based.
(If the column position in the array has significance to you, having to deal with 0-based and 1-based arrays can be a real pain).
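Creating a 1-based array needs nothing from Excel; `Array.CreateInstance` accepts explicit lower bounds. A minimal sketch, with the helper name `OneBasedArrays.Create` chosen purely for illustration:

```csharp
using System;

public static class OneBasedArrays
{
    // Creates a two-dimensional object array whose indices start at 1 in both dimensions.
    public static object[,] Create(int rows, int columns)
    {
        return (object[,])Array.CreateInstance(
            typeof(object),
            new[] { rows, columns },   // lengths of each dimension
            new[] { 1, 1 });           // lower bound of each dimension
    }
}
```

With `var grid = OneBasedArrays.Create(3, 2);`, the valid indices run from `grid[1, 1]` to `grid[3, 2]`, matching the row and column numbering of the worksheet cells the array mirrors.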
Generally reading the Excel.Range into an array would look something like this:
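A hedged sketch of such a read, assuming a reference to `Microsoft.Office.Interop.Excel`; the helper name `RangeReader.ReadValues` is hypothetical. It also covers the single-cell case, where Excel returns the bare value rather than an array.

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

public static class RangeReader
{
    // Reads every cell of 'range' in one call. For a multi-cell range Excel
    // returns a 1-based object[,]; for a single cell it returns the bare value,
    // which is wrapped here so callers always receive a 1-based array.
    public static object[,] ReadValues(Excel.Range range)
    {
        object raw = range.Value2;
        if (raw is object[,] values)
            return values;

        var single = (object[,])Array.CreateInstance(
            typeof(object), new[] { 1, 1 }, new[] { 1, 1 });
        single[1, 1] = raw;
        return single;
    }
}
```

Looping from `GetLowerBound` to `GetUpperBound` in each dimension, rather than from 0 to `GetLength - 1`, keeps the caller independent of the array's base.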
My variation of Mike Rosenblum's array-write uses a 1-based array like this:
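The following is a sketch of what such a 1-based write could look like, not the original listing. It assumes a reference to `Microsoft.Office.Interop.Excel` and a top-left anchor cell passed in by the caller; the class name and the placeholder data are illustrative.

```csharp
using System;
using Excel = Microsoft.Office.Interop.Excel;

public static class OneBasedWriter
{
    // Fills a 1-based array and writes it in one call, starting at 'topLeft'.
    public static void Write(Excel.Range topLeft, int rowCount, int columnCount)
    {
        var values = (object[,])Array.CreateInstance(
            typeof(object),
            new[] { rowCount, columnCount },   // lengths
            new[] { 1, 1 });                   // lower bounds

        for (int row = 1; row <= rowCount; row++)
            for (int col = 1; col <= columnCount; col++)
                values[row, col] = row * col;  // placeholder data

        // Resize the anchor cell to the array's shape, then assign everything at once.
        Excel.Range target = topLeft.get_Resize(rowCount, columnCount);
        target.set_Value(Excel.XlRangeValueDataType.xlRangeValueDefault, values);
    }
}
```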
The fastest interface to Excel data is the C API. There are a number of products out there that link .NET to Excel using this interface.
Two products I like that do this are Excel DNA (which is free and open source) and Addin Express (which is a commercial product and has both the C API and COM interface available).
First off, your solution cannot be an Excel UDF (user-defined function). In our manuals, we give the following definition: "Excel UDFs are used to build custom functions in Excel for the end user to use them in formulas." I wouldn't mind if you suggest a better definition :)
That definition shows that a UDF cannot add a button to the UI (I know that XLLs can modify the CommandBar UI) or intercept keyboard shortcuts as well as Excel events.
That is, ExcelDNA is out of scope because it is intended for developing XLL add-ins. The same applies to the Excel-targeted functionality of Add-in Express, since it allows developing XLL add-ins and Excel Automation add-ins.
Because you need to handle Excel events, your solution could be a standalone application, but such an approach has obvious limitations. The only real way is to create a COM add-in; it allows handling Excel events and adding custom things to the Excel UI. You have three possibilities: VSTO, Add-in Express, or a Shared Add-in.
If talking about developing an Excel COM add-in, the 3 tools above provide different features: visual designers, shimming, etc. But I don't think they differ in the speed of accessing the Excel Object Model. Say, I don't know (and cannot imagine) why getting a COM object from the Default AppDomain should differ from getting the same COM object from another AppDomain. BTW, you can check if shimming influences the speed of operation by creating a shared add-in and then using the COM Shim Wizard to shim it.
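For a comparison like that, the same timing harness can be compiled into both the shimmed and the unshimmed build. Below is a minimal sketch using only `System.Diagnostics.Stopwatch`; the name `CallTimer` is hypothetical, and the call being timed is whatever Excel operation is under test.

```csharp
using System;
using System.Diagnostics;

public static class CallTimer
{
    // Invokes 'call' once as a warm-up, then 'iterations' more times,
    // and returns the mean time per timed call in milliseconds.
    public static double AverageMilliseconds(Action call, int iterations)
    {
        if (iterations <= 0)
            throw new ArgumentOutOfRangeException(nameof(iterations));

        call(); // warm-up: keeps JIT compilation and first-call costs out of the measurement

        var watch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            call();
        watch.Stop();

        return watch.Elapsed.TotalMilliseconds / iterations;
    }
}
```

Inside an add-in this might be called as `CallTimer.AverageMilliseconds(() => { object v = range.Value2; }, 1000)`, with `range` being any `Excel.Range` the add-in already holds.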
Speed II. As I wrote to you yesterday: "The best way to speed up reading and writing to a range of cells is to create a variable of the Excel.Range type referring to that range and then read/write an array from/to the Value property of the variable." But contrary to what Francesco says, I don't attribute this to VSTO; this is a feature of the Excel object model.
Speed III. The fastest Excel UDFs are written in native C++, not in any .NET language. I haven't compared the speed of an XLL add-in produced by ExcelDNA and Add-in Express; I don't think you'll find any substantial difference here.
To sum up. I am convinced you are on the wrong track: COM add-ins based on Add-in Express, VSTO or Shared Add-in should read and write Excel cells at the same speed. I will be glad (sincerely) if someone disproves this statement.
Now on your other questions. VSTO doesn't allow developing a COM add-in supporting Office 2000-2010. It requires three different codebases and at least two versions of Visual Studio to completely support Office 2003-2010; you need to have strong nerves and a portion of good luck to deploy a VSTO-based add-in for Excel 2003. With Add-in Express, you create a COM add-in for all Office versions with a single codebase; Add-in Express provides you with a setup project, which is ready to install your add-in in Excel 2000-2010 (32-bit and 64-bit); ClickOnce deployment is on board too.
VSTO beats Add-in Express in one area: it allows creating so-called document-level add-ins. Imagine a workbook or template with some .NET code behind it; I wouldn't be surprised, however, if deployment of such things is a nightmare.
On Excel events. All Excel events are listed in MSDN, for instance, see Excel 2007 events
Regards from Belarus (GMT+2),
Andrei Smolin
Add-in Express Team Leader
I've used VBA code (a macro) to gather and compact the data, and pass this data to C# in one call, and vice versa. This will probably be the most performant approach.
Using C#, you will always need some marshalling. With VSTO or COM Interop, the underlying communication layer (and its marshalling overhead) is the same.
In VBA (Visual Basic for Applications) you work directly on the objects in Excel, so access to this data will always be faster.
But... once you have the data in C#, manipulating it can be a lot faster.
If you are using VB6 or C++, you also go through a COM interface, and you will also be facing cross-process marshalling.
So you are looking for a method that minimizes cross-process calls and marshalling.