需要更好的优化代码吗?

发布于 2024-12-02 06:14:11 字数 2960 浏览 1 评论 0 原文

需要大量优化的代码。我有一个项目,并且成功地使其与 vba 一起工作(主要是由 stackoverflow 程序员提供的帮助,谢谢) 但今天我收到了反馈。它删除了记录中另外两个唯一的条目,但我不知道为什么要删除它们。

我应用的算法

我使用了 COUNTIF 函数,我在谷歌上找到了它

    ="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes

,如果 A 列中有重复项,则抛出 False;如果 A 列是唯一的,则抛出 True。我对 Countif 的理解是 它会检查该单元格中所有上述列的值,我的意思是让我们采用 A4。因此它会检查 A2、A1、A3 是否有重复项。类似地,A10 检查 A1 到 A9 并抛出 TRue 或 False。它正在工作,但我不知道出了什么问题。代码对某些条目不起作用。有时甚至对唯一条目显示 False。

由于我有更多的数据,因此应用这些公式需要更多的时间。我试图让它更干净、更优化。人们告诉我,它不是 ac 或其他语言来优化它,但我需要代码来使我的代码更加优化,

我需要这些条件的代码,任何人都可以帮助我,因为我的计数失败了.我这样做有点无奈。

1) 我有一列,我应该检查该列中是否有重复项,如果重复则删除该行

2) 我在该列中有 35000 个旧条目,每周都会附加 2000 个新条目。我需要从总共 37000 个条目中检查这 2000 个条目(当我们追加时,我们得到 35000+2000),这些删除操作只需要对新追加的 2000 个条目执行,但它应该检查整个列的重复项

让我清楚地解释一下我新添加了 2000 个条目,因此仅检查这些条目是否与 35000 个条目及其自身(2000 个条目)中的重复项进行检查如有重复则删除,且35000条旧数据不进行重复操作。

我找到了一些代码,但他们甚至删除了 35000 个条目的重复项。我已经设置了范围,但即使它不起作用。 任何人都可以帮助我编写花费更少时间的最佳代码吗?请谢谢您

用示例代码更新我的问题,我

   A       B            F       G        H       I              Y          
  PTY   39868.5         4       2       540      3      PTY39868.5425403 
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301
  PTY   3945.678                2                2       PTY3945.67822
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301
                  let us say these are old 35000 entries

对上面的示例进行了解释。

以上是35000个条目。我必须检查 A、B、F、G、H、I 列中的重复项,如果它们相同,我必须删除该行,我不应该担心其他列 c、d 等,所以我所做的是我已使用一个未使用的 Y 列,并使用这些将这 6 列值连接到 Y 列处的 1

  = A2 & B2 & F2 & G2 & H2 &I2 with the respective columns

现在检查 Y 列是否重复并删除整行。据我所知,2003 年仅支持一栏。

请注意,即使 35000 个条目也可能有重复项,但我不应该删除它们。例如,您可以看到我的示例代码中的第 2 行和最后一行是重复的,但我不应该删除 因为它是旧数据。

   A       B            F       G        H       I              Y          
  PTY   39868.5         4       2       540      3      PTY39868.5425403     'old 
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301   'old
  PTY   3945.678                2                2       PTY3945.67822        'old
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301    'old
  PTY    3945.678       1       1       230      2      PTY3945.678112302      'new
  PTY    39868.5        4       2       540      3      PTY39868.5425403       'new 
  PTY    3945.678       1       1       230      2      PTY3945.678112302      'new

现在请注意,新条目 PTY(从最后第二个)是原始记录(首先是 PTY)的重复项,因此我必须删除它。最后一个新条目是新条目本身的重复项,所以我什至应该删除它。所以在上面的代码中,我必须只删除最后两行,它们是原始记录的重复项,也从中删除。但不应删除 GTY,它是复制品,但位于原始记录中。

我想我现在已经给出了清晰的看法。正在将它们连接成一个单元格。这是更好的接近方式吗?因为连接 40000 个条目只需要 2 秒,我认为这并不重要,但任何更多的算法都值得赞赏

我听说 counif 将 45.00 和 45.00000 视为不同的,这可能是它的问题吗?因为我的数据中有小数点。我想我应该做什么

    = I2 & H2 & G2 & F2 & A2 & B2

更好地连接?这是我之前发布的还是其他的?

Need a much Optimized code.Well I Got a Project and I have Succefully made it work with the vba (Mostly helped by the stackoverflow programmers Thanks for that)
But Today I got a Feedback. Its deleting 2 more unique entries in the record But I dont know why its deleting Them.

The Algorithm I have applied

I have Used the COUNTIF function Which I found on google

    ="countif(A$1:A2,A3)=0" A3 is the active cell, Checks A2,A1 for dupes

It Throws False if there is a duplicate in The A column and True If it is a unique.What I have understood about Countif is that
It checks all the above columns values from that cell I mean let us take A4. SO it checks A2,A1,A3 for the duplicate. Similarly A10 checks for A1 to A9 and throws either TRue or False.Well It was working But I dont know what went wrong The code is not working for some entries.Its even showing False for the Unique entries sometimes.

And its taking more time to applye these formula as I have more amount of data. Im trying to make it cleaner and more Optimizing Way.People told me its not a c or some other Language to make it optimize but Im need of code that makes my code more optimized

I need code for these condtions can anyone help me as my countif failed.Im little helpless in doing so.

1)I have a column and I should check for duplicates in that column and delete that row if it is a duplicate

2) I have 35000 Old entries in the column and I have new entries 2000 everyweek these are appended. I need to check these 2000 entries from the total 37000 ( as we appened we get 35000+2000) and these delete operation need to be performed only on the newly appended 2000 entries but it should check the duplicates for entire column

Let me explain you clearly I have 2000 entries newly added,so Only these entries are to be checked for the duplicates from the 35000 entries and also from itself (2000 entries) and delete it if it is a duplicate and no duplicating operation should be performed on the 35000 entries old data.

I have found some codes but they are deleting even the duplicates of the 35000 entries. I have set the range but even though its not working.
Can anyone help me with the best code that takes less time?please thank you

Updating my question with the sample code I have

   A       B            F       G        H       I              Y          
  PTY   39868.5         4       2       540      3      PTY39868.5425403 
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301
  PTY   3945.678                2                2       PTY3945.67822
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301
                  let us say these are old 35000 entries

Explaination to the above example.

The above are the 35000 entries. I have to check A,B,F,G,H,I columns for the dupes, if they are same I have to delete the row, I should not bother about the other columns c,d etc. so what I did is I have used one unused column Y and concatenated these 6 columns values into 1 at Y column using these

  = A2 & B2 & F2 & G2 & H2 &I2 with the respective columns

Now checking the Y column for dupes and delete the entire row. as 2003 supports only for one column as far to my knowledge.

Notice that even the 35000 entries may have duplicates in it but I should not delete them. Example you can see the 2 and last row in my example code are dupes but I should not delete
as it is the old data.

   A       B            F       G        H       I              Y          
  PTY   39868.5         4       2       540      3      PTY39868.5425403     'old 
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301   'old
  PTY   3945.678                2                2       PTY3945.67822        'old
  GTY   34446.1234      2       1       230      1      GTY34446.1234212301    'old
  PTY    3945.678       1       1       230      2      PTY3945.678112302      'new
  PTY    39868.5        4       2       540      3      PTY39868.5425403       'new 
  PTY    3945.678       1       1       230      2      PTY3945.678112302      'new

Now note that New entry PTY (from last 2nd) is a duplicate of the original record(PTY at first) So I hava to delete it.And the last new entry is a duplicate of the new entry itself so I should delete it even that . SO in the above code I have to delete only the last 2 rows which are dupes of original record and also from it . But should not delete the GTY which is the dupe but which is in orginal record.

I think I gave a clear view now. Is concatenating them into one cell . Is it better way to approach? as conactenatin for 40000 entries taking just 2 seconds i think that doesnt matter but any more algorithms to these is much aprreciated

I heard counif treats 45.00 and 45.00000 as different is that right may be that was the problem with it? since I have decimal points in my data. I think I should do

    = I2 & H2 & G2 & F2 & A2 & B2

which is better to concatenate? is this or the other i posted before?

如果你对这篇内容有疑问,欢迎到本站社区发帖提问 参与讨论,获取更多帮助,或者扫码二维码加入 Web 技术交流群。

扫码二维码加入Web技术交流群

发布评论

需要 登录 才能够评论, 你可以免费 注册 一个本站的账号。

评论(7

听你说爱我 2024-12-09 06:14:12

好的,这是高级过滤方法。不知道是否比字典方法快。不过,知道这一点会很有趣,所以在你尝试后告诉我。我还包括了删除部分,因此如果您想进行真正的比较,则必须停止该部分。另外,您可以将其设为函数而不是子函数,然后放入变量中,但是您想更改它。

Sub DeleteRepeats()

    Dim d1 As Double
    Dim r1 As Range, rKeepers As Range
    Dim wks As Worksheet


    d1 = Timer
    Set wks = ActiveSheet
    Application.EnableEvents = False
    Application.ScreenUpdating = False

    'Make sure all rows are visible
    On Error Resume Next
    wks.ShowAllData
    wks.UsedRange.Rows.Hidden = False
    wks.UsedRange.Columns.Hidden = False
    On Error GoTo 0

    'Get concerned range
    Set r1 = wks.Range("A1:A35000")
    'Filter
    r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True

    'Get range of cells not to be deleted
    Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
    On Error Resume Next
    wks.ShowAllData
    On Error GoTo 0
    rKeepers.EntireRow.Hidden = True

    'Delete all undesirables
    r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete

    'show all rows
    On Error Resume Next
    wks.UsedRange.Rows.Hidden = False
    On Error GoTo 0

    Application.EnableEvents = False
    Application.ScreenUpdating = False

    Debug.Print Timer() - d1

End Sub

好的,这是 Doc 和 Issun 对词典的使用。之前我并不相信,但在查看并测试它并与高级过滤器进行比较之后,我确信字典更适合此应用程序。我不知道为什么 Excel 在这一点上没有更快,因为它们应该使用更快的算法,这不是隐藏、取消隐藏行,因为这种情况发生得非常快。所以如果有人知道,请告诉我。在我的慢速计算机上,此过程只需要 1 秒多一点的时间:

Sub FindDupesAndDelete()

    Dim d1 As Double
    Dim dict As Object
    Dim sh As Worksheet
    Dim v1 As Variant
'    Dim s1() As String
    Dim rDelete As Range
    Dim bUnion As Boolean

    d1 = Timer()
    bUnion = False
    Set dict = CreateObject("Scripting.Dictionary")
    Set sh = ActiveSheet
    v1 = Application.Transpose(sh.Range("A1", "A" _
          & sh.Cells.SpecialCells(xlCellTypeLastCell).row))

'    ReDim s1(1 To UBound(v1))

    Dim row As Long, value As String ', newEntry As Boolean
    For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
        value = v1(row)

        If dict.Exists(value) Then
'            newEntry = False
            If bUnion Then
                Set rDelete = Union(rDelete, sh.Range("A" & row))
            Else
                Set rDelete = sh.Range("A" & row)
                bUnion = True
            End If
        Else
'            newEntry = True
            dict.Add value, 1
        End If
'        s1(row) = newEntry

    Next
    rDelete.EntireRow.Delete
'    sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
    Debug.Print Timer() - d1
End Sub

OK, here's the advancedfilter method. Don't know if it is faster than the dictionary method. It would be interesting to know though, so let me know after you try it. I also included the delete portion so you would have to stop that portion if you want to do a true comparison. Also, you can make this a function instead of a sub and put in your variables, however you want to change it.

Sub DeleteRepeats()

    Dim d1 As Double
    Dim r1 As Range, rKeepers As Range
    Dim wks As Worksheet


    d1 = Timer
    Set wks = ActiveSheet
    Application.EnableEvents = False
    Application.ScreenUpdating = False

    'Make sure all rows are visible
    On Error Resume Next
    wks.ShowAllData
    wks.UsedRange.Rows.Hidden = False
    wks.UsedRange.Columns.Hidden = False
    On Error GoTo 0

    'Get concerned range
    Set r1 = wks.Range("A1:A35000")
    'Filter
    r1.AdvancedFilter Action:=xlFilterInPlace, Unique:=True

    'Get range of cells not to be deleted
    Set rKeepers = r1.SpecialCells(xlCellTypeVisible)
    On Error Resume Next
    wks.ShowAllData
    On Error GoTo 0
    rKeepers.EntireRow.Hidden = True

    'Delete all undesirables
    r1.SpecialCells(xlCellTypeVisible).EntireRow.Delete

    'show all rows
    On Error Resume Next
    wks.UsedRange.Rows.Hidden = False
    On Error GoTo 0

    Application.EnableEvents = False
    Application.ScreenUpdating = False

    Debug.Print Timer() - d1

End Sub

OK, here's a take on Doc's and Issun's use of Dictionaries. Before I wasn't convinced but after looking at it and testing it and comparing to advanced filter, I am convinced, dictionaries are better for this application. I don't know why Excel isn't faster on this point since they should be using faster algorithms, it's not the hiding, unhiding of the rows since that happens very quickly. So if anyone knows, let me know. This procedure takes just over 1 second on my slow computer:

Sub FindDupesAndDelete()

    Dim d1 As Double
    Dim dict As Object
    Dim sh As Worksheet
    Dim v1 As Variant
'    Dim s1() As String
    Dim rDelete As Range
    Dim bUnion As Boolean

    d1 = Timer()
    bUnion = False
    Set dict = CreateObject("Scripting.Dictionary")
    Set sh = ActiveSheet
    v1 = Application.Transpose(sh.Range("A1", "A" _
          & sh.Cells.SpecialCells(xlCellTypeLastCell).row))

'    ReDim s1(1 To UBound(v1))

    Dim row As Long, value As String ', newEntry As Boolean
    For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
        value = v1(row)

        If dict.Exists(value) Then
'            newEntry = False
            If bUnion Then
                Set rDelete = Union(rDelete, sh.Range("A" & row))
            Else
                Set rDelete = sh.Range("A" & row)
                bUnion = True
            End If
        Else
'            newEntry = True
            dict.Add value, 1
        End If
'        s1(row) = newEntry

    Next
    rDelete.EntireRow.Delete
'    sh.Range("B1", "B" & UBound(v1)) = Application.Transpose(s1)
    Debug.Print Timer() - d1
End Sub
╰◇生如夏花灿烂 2024-12-09 06:14:12

好的,现在我们有更多信息,这是一个解决方案。它应该几乎立即执行。

该代码的工作原理是用连接公式填充 y 列。然后,它将所有 y 列添加到字典中,并使用字典将每一行标记为 z 列中的重复项。然后它会删除第 35000 行之后找到的所有重复项。最后它会清除 y 列和 z 列以删除冗余数据。

Sub RemoveDuplicates()
    Dim vData As Variant, vArray As Variant
    Dim lRow As Long

    '// Get used range of column A (excluding header) and offset to get column y 
    With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
        '// Adds the concatenate formula to the sheet column (y)
        .FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
        '// Adds the formula results to an array
        vData = .Resize(, 1).value
    End With

    '// Re dimension the array to the correct size 
    ReDim vArray(1 To UBound(vData, 1), 0)

    '// Create a dictionary object using late binding
    With CreateObject("Scripting.Dictionary")
        '// Loop through each row in the array
        For lRow = 1 To UBound(vData, 1)
            '// Check if value exists in the array
            If Not .exists(vData(lRow, 1)) Then
                '// Value does not exist mark as non duplicate.
                vArray(lRow, 0) = "x"
                '//  Add value to dictionary
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow
    End With

    '// Turn off screen updating to speed up code and prevent screen flicker
    Application.ScreenUpdating = False    

    With ActiveSheet
        '// Populate column z with the array
        .Range("Z2").Resize(UBound(vArray, 1)) = vArray
        '// Use error handling as speciallcells throws an error when none exist.
        On Error Resume Next
        '// Delete all blank cells in column z
        .Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
        '// Remove error handling
        On Error GoTo 0
        '// Clear columns y and z
        .Columns(25).Resize(, 2).ClearContents
    End With

   '// Turn screen updating back on.
   Application.ScreenUpdating = True
End Sub

注意:如果需要,您可以将所有引用“activesheet”更改为您的工作表代号。

注意2:它假设您有标题并且单独保留了第 1 行。

我已尽力使用您的专栏和测试数据。这是我使用的测试填充:

Sub TestFill()

    For i = 1 To 37000
        With Range("A" & i)
            .value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
            .Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
            .Offset(, 5).value = Int(4 * Rnd + 1)
            .Offset(, 6).value = Int(2 * Rnd + 1)
            .Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
            .Offset(, 8).value = Int(3 * Rnd + 1)
        End With
    Next i

End Sub

Okay so now we have some more info here is a solution. It should execute almost instantly.

The code works by filling column y with your concatenate formula. It then adds all of column y to a dictionary and using the dictionary marks each row as a duplicate in column z. It then removes all the duplicates found after row 35000. Then finally it clears both column y and column z to remove the redundant data.

Sub RemoveDuplicates()
    Dim vData As Variant, vArray As Variant
    Dim lRow As Long

    '// Get used range of column A (excluding header) and offset to get column y 
    With ActiveSheet.Range("A2", Cells(Rows.Count, "A").End(xlUp)).Offset(, 24)
        '// Adds the concatenate formula to the sheet column (y)
        .FormulaR1C1 = "=RC[-24]&RC[-23]&RC[-19]&RC[-18]&RC[-17]&RC[-16]"
        '// Adds the formula results to an array
        vData = .Resize(, 1).value
    End With

    '// Re dimension the array to the correct size 
    ReDim vArray(1 To UBound(vData, 1), 0)

    '// Create a dictionary object using late binding
    With CreateObject("Scripting.Dictionary")
        '// Loop through each row in the array
        For lRow = 1 To UBound(vData, 1)
            '// Check if value exists in the array
            If Not .exists(vData(lRow, 1)) Then
                '// Value does not exist mark as non duplicate.
                vArray(lRow, 0) = "x"
                '//  Add value to dictionary
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow
    End With

    '// Turn off screen updating to speed up code and prevent screen flicker
    Application.ScreenUpdating = False    

    With ActiveSheet
        '// Populate column z with the array
        .Range("Z2").Resize(UBound(vArray, 1)) = vArray
        '// Use error handling as speciallcells throws an error when none exist.
        On Error Resume Next
        '// Delete all blank cells in column z
        .Range("Y35001", .Cells(Rows.Count, "Y").End(xlUp)).Offset(, 1).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
        '// Remove error handling
        On Error GoTo 0
        '// Clear columns y and z
        .Columns(25).Resize(, 2).ClearContents
    End With

   '// Turn screen updating back on.
   Application.ScreenUpdating = True
End Sub

NOTE: you can change all references "activesheet" to your sheet codename if you want.

NOTE2: it assumes you have headers and has left row 1 alone.

I have used your columns and test data as best I can. Here is the test fill I used:

Sub TestFill()

    For i = 1 To 37000
        With Range("A" & i)
            .value = Choose(Int(2 * Rnd + 1), "PTY", "GTY")
            .Offset(, 1).value = Round((40000 * (Rnd + 1)), Choose(Int(4 * Rnd + 1), 1, 2, 3, 4))
            .Offset(, 5).value = Int(4 * Rnd + 1)
            .Offset(, 6).value = Int(2 * Rnd + 1)
            .Offset(, 7).value = Choose(Int(2 * Rnd + 1), "230", "540")
            .Offset(, 8).value = Int(3 * Rnd + 1)
        End With
    Next i

End Sub
Oo萌小芽oO 2024-12-09 06:14:12

假设您的条目位于 A 列中,并且您希望公式的结果位于 B 列中(但速度要快得多)。这个 VBA 宏应该可以解决这个问题:(

Option Explicit
Sub FindDupes()
    Dim dict As Object
    Dim sh As Worksheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set sh = ActiveSheet

    Dim row As Long, value As String
    For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
        value = sh.Range("A" & row).Text
        If dict.Exists(value) Then
            sh.Range("B" & row) = "False"
        Else
            sh.Range("B" & row) = "True"
            dict.Add value, 1
        End If
    Next
End Sub

使用字典在这里给出几乎线性的运行时间,对于 35.0000 行来说,这应该是几秒钟的事情,其中​​你的原始公式具有二次运行时间复杂性)。

编辑:根据您的评论:您必须首先通过阅读每个条目至少一次来填充字典,这是您无法轻易避免的事情。您可以避免的是在 B 列的行已经填充时再次填充它们:

Option Explicit
Sub FindDupes()
    Dim dict As Object
    Dim sh As Worksheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set sh = ActiveSheet

    Dim row As Long, value As String, newEntry As Boolean
    For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
        value = sh.Range("A" & row).Text

        If dict.Exists(value) Then
            newEntry = False
        Else
            newEntry = True
            dict.Add value, 1
        End If
        If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
    Next
End Sub

但我怀疑这不会比我的第一个解决方案快很多。

Lets say you have your entries in column A, and you want the result of your formula in column B (but much faster). This VBA macro should do the trick:

Option Explicit
Sub FindDupes()
    Dim dict As Object
    Dim sh As Worksheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set sh = ActiveSheet

    Dim row As Long, value As String
    For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
        value = sh.Range("A" & row).Text
        If dict.Exists(value) Then
            sh.Range("B" & row) = "False"
        Else
            sh.Range("B" & row) = "True"
            dict.Add value, 1
        End If
    Next
End Sub

(Using a dictionary gives here almost linear running time, which should be a matter of seconds for 35.0000 rows, where your original formula had quadratic running time complexity).

Edit: due to your comment: you will have to fill the dictionary first by reading each entry at least once, that is something you cannot avoid easily. What you can avoid is to fill the rows of column B again when they are already filled:

Option Explicit
Sub FindDupes()
    Dim dict As Object
    Dim sh As Worksheet
    Set dict = CreateObject("Scripting.Dictionary")
    Set sh = ActiveSheet

    Dim row As Long, value As String, newEntry As Boolean
    For row = 1 To sh.Cells.SpecialCells(xlCellTypeLastCell).row
        value = sh.Range("A" & row).Text

        If dict.Exists(value) Then
            newEntry = False
        Else
            newEntry = True
            dict.Add value, 1
        End If
        If Trim(sh.Range("B" & row)) = "" Then sh.Range("B" & row) = newEntry
    Next
End Sub

But I suspect this won't be much faster than my first solution.

梦里°也失望 2024-12-09 06:14:12

现在您已经更新了您希望删除整个行并且允许前 35000 行有重复项,这里有一个函数可以为您完成此操作。我想我想出了一个聪明的方法,而且速度也非常快:

Sub RemoveNewDupes()

Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")

On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row

'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
    oldDict.Add varray(i, 1), 1
Next

'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
    If oldDict.exists(varray(i - 35000, 1)) = True Or _
       newDict.exists(varray(i - 35000, 1)) = True Then
        Range("A" & i).EntireRow.Delete
    Else
        newDict.Add varray(i - 35000, 1), 1
    End If
Next

Application.ScreenUpdating = True

'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."

End Sub

工作原理

首先,我将 35000 个单元格存储到一个字典文件中。然后,我获取从 35001 开始的每个单元格的变体数组,并向后循环它们,看看它是否在 35k 字典中,或者我们在循环中尚未遇到该值。如果它发现它是一个骗局,它就会删除该行。

它执行行删除的最酷的(如果我可以说)方式是,当您创建变量时,例如 A35001 - A37000,它将它们存储为 (1, 1) (2, 1) (...)。因此,如果将“i”设置为数组的 Ubound + 35000 并返回到 35001,则将从 A37000 到 A35001 向后循环所有加法。然后,当您想要删除该行时,“i”完美地设置为找到该值的行号,因此您可以删除它。而且由于它是倒退的,所以不会搞砸循环!

Now that you have updated that you want the entire rows deleted and that the first 35000 rows are allowed to have dupes, here is a function that will do that for you. I think I came up with a clever method and it's blazing fast, too:

Sub RemoveNewDupes()

Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")

On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row

'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
    oldDict.Add varray(i, 1), 1
Next

'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 35000 + UBound(varray, 1) To 35001 Step -1
    If oldDict.exists(varray(i - 35000, 1)) = True Or _
       newDict.exists(varray(i - 35000, 1)) = True Then
        Range("A" & i).EntireRow.Delete
    Else
        newDict.Add varray(i - 35000, 1), 1
    End If
Next

Application.ScreenUpdating = True

'A status message at the end for finishing touch
MsgBox UBound(varray, 1) - newDict.Count & _
" duplicate row(s) found and deleted."

End Sub

How it works:

First I store the 35000 cells into a dictionary file. Then I take a variant array of every cell 35001 onward and loop through them backwards to see if it's in the 35k dictionary or not, or that we haven't come across the value yet in the loop. If it finds that it's a dupe, it deletes the row.

The cool (if I may say) way that it does the row deletion is that when you create the varray, for say A35001 - A37000, it stores them as (1, 1) (2, 1) (...). So if you set "i" to the Ubound of the array + 35000 and step back to 35001, you will loop through all the additions backwardsfrom A37000 to A35001. Then when you want to delete the row, "i" is perfectly set to the row number the value was found in, so you can delete it. And since it goes backwards, it does not screw up the loop!

狼性发作 2024-12-09 06:14:11

大更新

它认为原来的问题让我失望 - 问题中的逻辑可能有问题。以下假设您要删除重复条目的单元格,而不是整行。

  • 如果 35000 条旧记录不包含重复项,那么您所需要做的就是删除整列中的所有重复项 - 只要从第 1 行开始,就不会存在删除任何“旧”行的风险,因为没有重复项存在于他们之中。

这是一种方法:

Sub UniqueList()

Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")

lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value

On Error Resume Next
For i = 1 To UBound(vArray, 1)
    For j = 1 To UBound(vArray, 2)
        If Len(vArray(i, j)) <> 0 Then
            dictionary(vArray(i, j)) = 1
        End If
    Next
Next

Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)

Application.ScreenUpdating = True

End Sub
  • 如果由于某种奇怪的原因,35000 条旧记录确实包含重复项,而您只想允许这 35000 条记录这样做,那么您可以使用 2 个字典,但这将是一种不寻常的情况,因为您将处理旧记录与新记录不同......
Sub RemoveNewDupes()

Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")

On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row

'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
    oldDict.Add varray(i, 1), 1
Next

'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
    If oldDict.exists(varray(i, 1)) = False Then
        newDict.Add varray(i, 1), 1
    End If
Next

'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)

Application.ScreenUpdating = True
End Sub

感谢 Reafidy 的建议并让我重新审视这一点。

BIG UPDATE:

It think the original questions threw me off - there may be a problem with the logic in the question. The following assumes you want to delete the cell, not entire row, for the duplicate entries.

  • If the 35000 old records do not include duplicates, then all you need to do is remove all duplicates from the entire column - so long as you start from row 1, you run no risk of deleting any of the 'old' rows since no duplicates exist in them.

Here is one way:

Sub UniqueList()

Application.ScreenUpdating = False
Dim vArray As Variant
Dim i As Long, j As Long, lastrow As Long
Dim dictionary As Object
Set dictionary = CreateObject("scripting.dictionary")

lastrow = Range("A" & Rows.Count).End(xlUp).Row
vArray = Range("A1:A" & lastrow).Value

On Error Resume Next
For i = 1 To UBound(vArray, 1)
    For j = 1 To UBound(vArray, 2)
        If Len(vArray(i, j)) <> 0 Then
            dictionary(vArray(i, j)) = 1
        End If
    Next
Next

Columns("A:A").ClearContents
Range("A1").Resize(dictionary.Count).Value = _
Application.Transpose(dictionary.keys)

Application.ScreenUpdating = True

End Sub
  • If for some odd reason the 35000 old records DO include dupes and you only want to allow these 35000 records to do so, then you can use 2 dictionaries, but this would be an unusual case since you'd be treating the old records differently than new...
Sub RemoveNewDupes()

Application.ScreenUpdating = False
Dim lastRow As Long
Dim varray As Variant
Dim oldDict As Object, newDict As Object
Set oldDict = CreateObject("scripting.dictionary")
Set newDict = CreateObject("scripting.dictionary")

On Error Resume Next
lastRow = Range("A" & Rows.Count).End(xlUp).Row

'Add old entries to dictionary
varray = Range("A1:A35000").Value
For i = 1 To UBound(varray, 1)
    oldDict.Add varray(i, 1), 1
Next

'Check for dupes
varray = Range("A35001:A" & lastRow).Value
For i = 1 To UBound(varray, 1)
    If oldDict.exists(varray(i, 1)) = False Then
        newDict.Add varray(i, 1), 1
    End If
Next

'Delete and slap back on the unique list
Range("A35001", "A" & Rows.Count).ClearContents
Range("A35001").Resize(newDict.Count).Value = _
Application.Transpose(newDict.keys)

Application.ScreenUpdating = True
End Sub

Thanks to Reafidy for the advice and getting me to relook at this.

我爱人 2024-12-09 06:14:11

这也是对其他成员的一些评论和解决方案的回应,如果没有立即回答您的问题,我们深表歉意。

首先我认为在数据库场景中使用excel应该将原始数据和演示数据分开。这通常意味着包含原始数据的单个工作表和包含演示数据的多个其他工作表。然后在必要时删除原始数据或存档。

当速度测试时,很难在 Excel 中获得公平的竞争环境,因为有很多因素会影响结果。计算机规格、可用 RAM 等。在运行任何程序之前必须首先编译代码。在考虑重复项时,测试数据也很重要 - 有多少重复项与多少行。此子加载一些测试数据,改变行数与随机数(重复)的范围将为您的代码提供非常不同的结果。我不知道您的数据是什么样的,所以我们有点盲目工作,您的结果可能会非常不同。

'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
    '// 300000 rows
    For i = 1 To 300000
        '// This populates a random number between 1 & 10000 - adjust to suit
        Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
    Next
End Sub

如果我们谈论的是高级过滤器与数组&字典方法则行数较少的高级过滤器会更快,但一旦行数超过一定数量,则数组方法会更快。然后看看当您更改重复项的数量时会发生什么......:)
作为指导原则或一般规则,使用 Excel 内置函数会更快,我建议始终尝试使用这些内置函数进行开发,但是通常会出现例外情况,例如上面删除重复项时的情况。 :)

如果使用不当,循环时删除行可能会很慢。如果使用循环,那么在循环之外保持代码和工作簿之间的同步非常重要。这通常意味着将数据读取到数组,循环访问数据,然后将数据从数组加载回演示工作表,本质上是删除不需要的数据。

Sub RemoveDuplicatesA()

    '// Copy raw data to presentation sheet
    Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
        Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True

End Sub

这将是最快的方法:

Sub RemoveDuplicatesB()        
    Dim vData As Variant, vArray As Variant
    Dim lCnt As Long, lRow As Long

    vData = ActiveSheet.UsedRange.Columns(1).value
    ReDim vArray(0 To UBound(vData, 1), 0)
    lCnt = 0

    With CreateObject("Scripting.Dictionary")
        For lRow = 1 To UBound(vData, 1)
            If Not .Exists(vData(lRow, 1)) Then
                vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow
    End With

    '// Copy raw data to presentation sheet
    Sheet2.Range("B1").Resize(lCnt).value = vArray

End Sub

应用程序转置有 65536 行的限制,但由于您使用的是 2003,您应该可以很好地使用它,因此您可以使用以下命令简化上面的代码:

Sub RemoveDuplicatesC()
    Dim vData As Variant
    Dim lRow As Long

    vData = ActiveSheet.UsedRange.Columns(1).value

    With CreateObject("Scripting.Dictionary")
        For lRow = 1 To UBound(vData, 1)
            If Not .exists(vData(lRow, 1)) Then
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow

        '// Copy raw data to presentation sheet or replace raw data
        Sheet2.Columns(2).ClearContents
        Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
    End With

End Sub 

编辑

好的,@Issun 有提到您希望删除整行。我的建议是通过原始数据和演示文稿来改进电子表格布局,这意味着您不需要删除任何内容,因此这将是最快的方法。如果您不想这样做并想直接编辑原始数据,请尝试以下操作:

 Sub RemoveDuplicatesD()
    Dim vData As Variant, vArray As Variant
    Dim lRow As Long       

    vData = ActiveSheet.UsedRange.Columns(1).value
    ReDim vArray(1 To UBound(vData, 1), 0)     

    With CreateObject("Scripting.Dictionary")
        For lRow = 1 To UBound(vData, 1)
            If Not .exists(vData(lRow, 1)) Then
                varray(lRow, 0) = "x"
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow
    End With

    Application.ScreenUpdating = False

    '// Modify the raw data
    With ActiveSheet
        .Columns(2).Insert
        .Range("B1").Resize(lRow).value = vArray
        .Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
        .Columns(2).Delete
    End With

    Application.ScreenUpdating = True
End Sub

This is also a response to some of the comments and solutions made by other members so sorry if it does not straight away answer your question.

Firstly I believe that using excel in a database scenario that raw data and presentation data should be separated. This usually means a single worksheet with raw data and multiple other worksheets with presentation data. Then delete the raw data when necessary or archive.

When speed testing it is very difficult to get a level playing field in excel as there are many things that affect the results. Computer specs, available RAM etc.. Code must first be compiled before running any of the procedures. The test data is also important, when considering duplicates - how many duplicates vs how many rows. This sub loads some test data, altering the amount of rows vs the range of random numbers (duplicates) will give very different results for your code. I don't know what your data looks like so we are kind of working blind and your results may be very different.

'// This is still not very good test data, but should suffice for this situation.
Sub TestFill()
    '// 300000 rows
    For i = 1 To 300000
        '// This populates a random number between 1 & 10000 - adjust to suit
        Cells(i, "A").value = Int((100000 + 1) * Rnd + 1)
    Next
End Sub

If we are talking about advanced filter vs an array & dictonary method then advanced filter will be quicker with a lower amount of rows but once you get above a certain amount of rows then the array method will be quicker. Then see what happens when you change the amount of duplicates.... :)
As a guideline or as a general rule using excels built in functions will be faster and I recommend always develop attempting to use these inbuilt functions, however there are often exceptions, like above when removing duplicates. :)

Deleting rows can be slow when looping if used incorrectly. If looping is used then it is important to keep synchronisation between code and the workbook out of the loop. This usually means read data to an array, loop through the data, then load the data from the array back to the presentation worksheet essentially deleting the unwanted data.

Sub RemoveDuplicatesA()

    '// Copy raw data to presentation sheet
    Range("A1", Cells(Rows.Count, "A").End(xlUp)).AdvancedFilter _
        Action:=xlFilterCopy, CopyToRange:=Sheet2.Range("B1"), Unique:=True

End Sub

This will be the fastest method:

Sub RemoveDuplicatesB()        
    Dim vData As Variant, vArray As Variant
    Dim lCnt As Long, lRow As Long

    vData = ActiveSheet.UsedRange.Columns(1).value
    ReDim vArray(0 To UBound(vData, 1), 0)
    lCnt = 0

    With CreateObject("Scripting.Dictionary")
        For lRow = 1 To UBound(vData, 1)
            If Not .Exists(vData(lRow, 1)) Then
                vArray(lCnt, 0) = vData(lRow, 1): lCnt = lCnt + 1
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow
    End With

    '// Copy raw data to presentation sheet
    Sheet2.Range("B1").Resize(lCnt).value = vArray

End Sub

Application transpose has a limitation of 65536 rows but as you are using 2003 you should be fine using it, therefore you can simplify the above code with:

Sub RemoveDuplicatesC()
    Dim vData As Variant
    Dim lRow As Long

    vData = ActiveSheet.UsedRange.Columns(1).value

    With CreateObject("Scripting.Dictionary")
        For lRow = 1 To UBound(vData, 1)
            If Not .exists(vData(lRow, 1)) Then
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow

        '// Copy raw data to presentation sheet or replace raw data
        Sheet2.Columns(2).ClearContents
        Sheet2.Columns(2).Resize(.Count).value = Application.Transpose(.keys)
    End With

End Sub 

EDIT

Okay so @Issun has mentioned you want the entire row deleted. My suggestion was to improve your spreadsheet layout by having a raw data and presentation sheet which means you dont need to delete anything hence it would have been the fastest method. If you dont want to do that and would like to edit the raw data directly then try this:

 Sub RemoveDuplicatesD()
    Dim vData As Variant, vArray As Variant
    Dim lRow As Long       

    vData = ActiveSheet.UsedRange.Columns(1).value
    ReDim vArray(1 To UBound(vData, 1), 0)     

    With CreateObject("Scripting.Dictionary")
        For lRow = 1 To UBound(vData, 1)
            If Not .exists(vData(lRow, 1)) Then
                varray(lRow, 0) = "x"
                .Add vData(lRow, 1), Nothing
            End If
        Next lRow
    End With

    Application.ScreenUpdating = False

    '// Modify the raw data
    With ActiveSheet
        .Columns(2).Insert
        .Range("B1").Resize(lRow).value = vArray
        .Columns(2).SpecialCells(xlCellTypeBlanks).EntireRow.Delete
        .Columns(2).Delete
    End With

    Application.ScreenUpdating = True
End Sub
吻风 2024-12-09 06:14:11

在从头开始编写整个代码之前,您可以尝试以下一些操作:

优化您的 VBA
网上有一些关于优化 vba 的技巧。特别是,您可以执行以下操作:

'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False

'code goes here

'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True

请参阅此处了解更多信息

优化您的算法
特别是当您插入 COUNTIF 公式时,您可以尝试填写而不是在每行中插入公式。

在删除行部分,您应该尝试我在上一个线程中为您提供的解决方案: 删除 excel 2003 vba 列中的重复条目 以首先过滤True 值,然后删除可见单元格。这可能是最快的方法。

[编辑] 看来布朗博士的答案可能是处理这个问题的最佳方法(嘿,这是一个不是 Issun 编写的字典解决方案:))。无论如何,VBA 优化技巧仍然相关,因为这是一种语言。

Before starting again from scratch your whole code, here are a few things you can try:

Optimize your VBA
There are several tips on the web about optimizing vba. In particular, you can do:

'turn off some Excel functionality so your code runs faster
'these two are especially very efficient
Application.ScreenUpdating = False
Application.Calculation = xlCalculationManual
'use these if you really need to
Application.DisplayStatusBar = False
Application.EnableEvents = False

'code goes here

'at the end, restore the default behavior
'calculate the formulas
Application.Calculate
Application.ScreenUpdating = True
Application.Calculation = xlCalculationAutomatic
Application.DisplayStatusBar = True
Application.EnableEvents = True

See here for more information

Optimize your algorithm
Especially when your inserting your COUNTIF formula, you can try to fill in instead of inserting the formula in each row.

On the deleting row part, you should try the solution I gave you in your previous thread: Delete duplicate entries in a column in excel 2003 vba to filter first on the True values and then to delete the visible cells. It is probably the fastest way.

[EDIT] Seems like Doc Brown's answer would be probably the best way to handle this (hey, this is a dictionary solution that wasn't written by Issun :)). Anyway, the VBA optimization tips are still relevant because this is quite a slow language.

~没有更多了~
我们使用 Cookies 和其他技术来定制您的体验包括您的登录状态等。通过阅读我们的 隐私政策 了解更多相关信息。 单击 接受 或继续使用网站,即表示您同意使用 Cookies 和您的相关数据。
原文