如何使这个自定义工作表初始化更快？

发布于 2024-10-20 04:08:19 字数 4051 浏览 2 评论 0原文

摘要

这个问题在某种程度上是这个问题的后续问题：
如何从索引实现列自命名？

在测试了上述链接问题的答案中提供的代码后，我终于遇到了严重的性能问题。

性能问题

性能问题发生在工作表初始化时，即当我初始化工作表的单元格时。

    ''' <summary>
    ''' Initialize an instance of the Company.Project.Sheet class.
    ''' </summary>
    ''' <param name="nativeSheet">The native worksheet from which to initialize.</param>
    Friend Sub New(ByVal nativeSheet As Microsoft.Office.Interop.Excel.Worksheet)
        _nativeSheet = nativeSheet
        Dim cells As IDictionary(Of String, ICell) = New Dictionary(Of String, ICell)()

        'These iterations hurt the performance of the API...'
        For rowIndex As Integer = 1 To _nativeSheet.Rows.Count Step 1
            For colIndex As Integer = 1 To _nativeSheet.Columns.Count Step 1
                Dim c As ICell = New Cell(_nativeSheet.Cells(rowIndex, colIndex))
                cellules.Add(c.Name, c)
            Next
        Next

        _cellules = New ReadOnlyDictionary(Of String, ICell)(cells)
    End Sub

ReadOnlyDictionary(Of TKey, TValue)：
一个自定义只读字典，它简单地包装 IDictionary(Of TKey, TValue) 以防止修改。
讨论

我正在以这种方式工作，因为基础电子表格工作表中的每个单元格从工作表初始化开始直到结束，即工作表被处置或最终确定时进行初始化。因此，我希望以同样的方式初始化工作表的单元格，但我也希望保持使用索引单元格相对于命名（“A1”）单元格的性能提升，同时保持 API 用户的易用性用其名称引用单元格，这就是我打算使用字典的方式，这样当我引用单元格“A1”时，我会将此键访问到我的字典中并相应地寻址单元格 (1, 1)。

此外，我知道有一种更快的方法，可以使用 Worksheet.UsedRange 属性从工作表中读取数据，该属性将所有使用的单元格返回到二维矩阵中。

如果无论如何，对于我可以用来初始化 Cell 类的多个实例的单元格集合来说，存在相同或大致相同的单元格，那就太好了，并且性能良好！
我还考虑过在内存中只初始化一个 100 x 100 矩阵单元，同时使用我的字典映射它们，因为人们很少会使用整个工作表的单元。因此，我仍在考虑一种必须访问尚未初始化的单元格的方法，例如 Cells(120, 120)。理想情况下，我认为程序必须初始化最大初始初始化 Cell(100, 100) 到 Cell (120, 120) 之间的所有单元格。我在这里说得够清楚了吗？请随时要求澄清！ =)

另一个选择可能是我只将单元格的名称初始化到字典中并将行和列索引保留在内存中，而不是使用其 nativeCell 初始化 Cell 实例，例如 Range。这是我的 Cell 类的代码来说明我的意思。

''' ''' 代表工作表中的一个单元格。 ''' ''' 朋友级细胞实现 ICell

私有 _nativeCell 作为 Microsoft.Office.Interop.Excel.Range
私有 _name 作为字符串

''' <摘要>
''' 初始化 Company.Project.Cell 类的新实例。
''' 
''' 要换行的 Microsoft.Office.Interop.Excel.Range。
Friend Sub New(ByVal nativeCell As Microsoft.Office.Interop.Excel.Range)
    _nativeCell=nativeCell
结束子

公共只读属性 NativeCell() 作为 Microsoft.Office.Interop.Excel.Range 实现 ICellule.NativeCell
    得到
        返回_nativeCell 
    结束获取
结束财产

Public ReadOnly 属性 Column() As Integer 实现 ICell.Column
    得到
        返回_nativeCell.Column
    结束获取
结束财产

Public ReadOnly 属性 Row() As Integer 实现 ICell.Row
    得到
        返回_nativeCell.Row
    结束获取
结束财产

Public ReadOnly Property Name() As String 实现 ICellule.Name
    得到
        如果 (String.IsNullOrEmpty(_name) OrElse _name.Trim().Length = 0) 那么 _
            _name = 获取列名称()

        返回_nom
    结束获取
结束财产

公共属性 Value() 作为对象实现 ICellule.Value
    得到
        返回_nativeCell.Value2
    结束获取
    Set(ByVal 值作为对象)
        _nativeCell.Value2 = 值
    结束组
结束财产

Public ReadOnly 属性 FormattedValue() As String 实现 ICellule.FormattedValue
    得到
        返回_nativeCell.Text
    结束获取
结束财产

公共只读属性 NumericValue() 作为 Double？实现 ICellule.NumericValue
    得到
        返回值
    结束获取
结束财产

问题

我还有哪些其他选择？
还有其他的步行方式吗？
有没有一种方法可以使实际方法在性能问题上可行？

供您参考，此问题在测试中超时，因此测试从未在可接受的时间范围内结束，这实际上需要几个世纪......

欢迎任何想法！我对其他解决方案或方法持开放态度，这些解决方案或方法将帮助我在解决此性能问题的同时实现此目标。

谢谢大家！ =)

编辑#1

感谢Maxim Gueivandov，他的解决方案解决了我在这个问题。

除此之外，此解决方案还产生了另一个问题：SystemOutOfMemoryException，这将在另一个问题中解决。

我衷心感谢马克西姆·格万多夫。

原文

Summary

This question is somehow the follow-up to this question:
How to implement column self-naming from its index?

Having tested the code provided in this above-linked question's answers, I finally encountered a serious performance issue.

Performance issue

The performance issue occurs upon a Sheet initialization, that is, when I initialize the Sheet's Cells.

    ''' <summary>
    ''' Initialize an instance of the Company.Project.Sheet class.
    ''' </summary>
    ''' <param name="nativeSheet">The native worksheet from which to initialize.</param>
    Friend Sub New(ByVal nativeSheet As Microsoft.Office.Interop.Excel.Worksheet)
        _nativeSheet = nativeSheet
        Dim cells As IDictionary(Of String, ICell) = New Dictionary(Of String, ICell)()

        'These iterations hurt the performance of the API...'
        For rowIndex As Integer = 1 To _nativeSheet.Rows.Count Step 1
            For colIndex As Integer = 1 To _nativeSheet.Columns.Count Step 1
                Dim c As ICell = New Cell(_nativeSheet.Cells(rowIndex, colIndex))
                cellules.Add(c.Name, c)
            Next
        Next

        _cellules = New ReadOnlyDictionary(Of String, ICell)(cells)
    End Sub

ReadOnlyDictionary(Of TKey, TValue) :
A custom read-only dictionary that simply wraps a IDictionary(Of TKey, TValue) to prevent modifications.
Discussion

I'm working this way since each cell in an underlying spreadsheet worksheet is initialized from the initialization of the worksheet until the end, that is, when the worksheet is disposed or finalized. Hence, the same way I wish to initialize the cells of a Sheet, but I also wish to keep the performance boost of using the indexed cells over the named ("A1") cells, while keeping the ease of use to the API user to refer to a cell with its name, that is how I intend to use the dictionary, so that when I refer to cell "A1", I access this key into my dictionary and address the cell (1, 1) accordingly.

Aside, I know of an even faster way to read from a worksheet using the Worksheet.UsedRange property that returns all of the used cells into a 2D matrix.
If there was anyhow the same or about the same for the set of cells with which I could initialize multiple instances of my Cell class with, this would be great, and performant!
I also thought of initializing like only a 100 x 100 matrix cells in memory while mapping them with my dictionary, as one will rarely use the whole sheet's cells. As such, I am still thinking of a way where I would have to access a not yet initialized cell, let's say Cells(120, 120). Ideally, I think, the program would have to initialize all the cells between the maximum initially initialized Cell(100, 100) until Cell (120, 120). Am I clear enough here? Feel free to ask for clarification! =)

Another option could be that I only initialize the cells' names into the dictionary and keeping there row and column index in memory, not initializing a Cell instance with its nativeCell, say a Range. Here's the code of my Cell class to illustrate what I mean.

'''
''' Represents a cell in a worksheet.
'''
'''
Friend Class Cell
Implements ICell

Private _nativeCell As Microsoft.Office.Interop.Excel.Range
Private _name As String

''' <summary>
''' Initializes a new instance of the Company.Project.Cell class.
''' </summary>
''' <param name="nativeCell">The Microsoft.Office.Interop.Excel.Range to wrap.</param>
Friend Sub New(ByVal nativeCell As Microsoft.Office.Interop.Excel.Range)
    _nativeCell = nativeCell
End Sub

Public ReadOnly Property NativeCell() As Microsoft.Office.Interop.Excel.Range Implements ICellule.NativeCell
    Get
        Return _nativeCell 
    End Get
End Property

Public ReadOnly Property Column() As Integer Implements ICell.Column
    Get
        Return _nativeCell.Column
    End Get
End Property

Public ReadOnly Property Row() As Integer Implements ICell.Row
    Get
        Return _nativeCell.Row
    End Get
End Property

Public ReadOnly Property Name() As String Implements ICellule.Name
    Get
        If (String.IsNullOrEmpty(_name) OrElse _name.Trim().Length = 0) Then _
            _name = GetColumnName()

        Return _nom
    End Get
End Property

Public Property Value() As Object Implements ICellule.Value
    Get
        Return _nativeCell.Value2
    End Get
    Set(ByVal value As Object)
        _nativeCell.Value2 = value
    End Set
End Property

Public ReadOnly Property FormattedValue() As String Implements ICellule.FormattedValue
    Get
        Return _nativeCell.Text
    End Get
End Property

Public ReadOnly Property NumericValue() As Double? Implements ICellule.NumericValue
    Get
        Return Value
    End Get
End Property

Questions

What are my other options?
Are there any other ways to walk through?
Is there a way I can make the actual approach viable as for performance concerns?

For your information, this issue timed out on testing, so the test never ended within an acceptable time range which actually take centuries...

Any thoughts are welcome! I'm open minded to other solutions or approach that will help me achieve this objective while addressing this performance issue.

Thanks to you all! =)

EDIT #1

Thanks to Maxim Gueivandov, his solution solves the issue I have addressed in this question.

Aside, there's another problem that arose from this solution: SystemOutOfMemoryException, and that will be addressed in another question.

My Sincerest Thanks to Maxim Gueivandov.

分享到QQ

分享到微博

如果你对这篇内容有疑问，欢迎到本站社区发帖提问参与讨论，获取更多帮助，或者扫码二维码加入 Web 技术交流群。

发布评论

需要登录才能够评论，你可以免费注册一个本站的账号。

情何以堪。 2024-10-27 04:08:19

您可以尝试在一跳中获取已使用范围内的所有单元格，从而避免在迭代的每次迭代中调用 Cells(rowIndex, colIndex)（我猜 Cells 隐藏了互操作调用，这可能会影响性能）。

Dim usedRange As Range = nativeSheet.UsedRange
Dim cells(,) As Object = DirectCast(usedRange.get_Value( _
    XlRangeValueDataType.xlRangeValueDefault), Object(,))
[... do your row/col iterations ...]

您将在以下文章中找到我基于这些假设的一些性能提示：C# Excel 互操作使用。最值得注意的是，检查基准部分：

=== C# 中的 Excel 互操作基准 ===
单元格[]：30.0 秒
get_Range()，单元格[]：15.0 秒
UsedRange、get_Value()：1.5 秒
[最快]

You could try to get all cells in the used range in one hop, thus avoiding to call Cells(rowIndex, colIndex) on each iteration of iteration (I guess that Cells hides an interop call, which may have a performance impact).

Dim usedRange As Range = nativeSheet.UsedRange
Dim cells(,) As Object = DirectCast(usedRange.get_Value( _
    XlRangeValueDataType.xlRangeValueDefault), Object(,))
[... do your row/col iterations ...]

You'll find some performance tips on which I based these assumptions in the following article: C# Excel Interop Use. Most notably, check the benchmark part: