VBA比較兩個Excel數據的異同


代碼背景:

  • 由於Excel本身無法簡單的比較兩個Excel數據的異同,所以用VBA編寫代碼的方式來實現。
  • 這里的比較條件是:數據行為單位,假設對應Sheet中沒有重復數據,對應數據行的所有列的數據都相等,即為此行數據相同。
  • 這里的兩個Sheet的數據行量級別大約為:50000 * 50000,數據列大約:50,對應Cell中的字符串大約100以內,中英文混合。
  • 如何在Excel中調出VBA的編寫工具,請參考如下鏈接: https://jingyan.baidu.com/article/63f236281f17650208ab3d97.html

 

整體來說,需求非常明確,若是不考慮效率的話,代碼邏輯比較簡單,循環比較即可。

相關代碼:

Sub CompareData()

    Dim i As Long
    Dim j As Long
    
    Dim fullSheetName As String
    fullSheetName = "Sheet1"
    Set fullSheet = Sheets(fullSheetName)
    Dim fullDataRange As Variant
    fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value
    Dim fullSheetRowMax As Long
    fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
    
    Dim partialSheetName As String
    partialSheetName = "Sheet2"
    Set partialSheet = Sheets(partialSheetName)
    Dim partialDataRange As Variant
    partialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value
    Dim partialSheetRowMax As Long
    partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count

    Dim columnMax As Integer
    columnMax = 46
    
    Dim columnMark As Integer
    columnMark = 48
    
    Dim sameRow As Boolean
    
    For i = 1 To fullSheetRowMax
        For j = 1 To partialSheetRowMax
            sameRow = True
            For columnIndex = 1 To columnMax
                If fullDataRange(i, columnIndex) <> partialDataRange(j, columnIndex) Then
                    sameRow = False
                    Exit For
                End If
            Next columnIndex
            
            If sameRow Then
                fullSheet.Cells(i, columnMark) = 1
                Exit For
            End If
        Next j
    Next i
    
    MsgBox "Successfully!"
End Sub
View Code

上述代碼實際運行大約用30分鍾完成此數量級的比較,大約1000億次的比較。

 

當然了我們需要更快的比較方式,那么就需要對大數據進行結構優化,即:將partial的sheet中的數據進行分組,比如每1000個row放到一個組里,然后用一個標志位標記這個組里1000個row是否都有相同的數據,如有都有的話,那么下次再比較的時候就可以跳過這個組,進行下一組的1000個row的循環遍歷。相同數量級,大約2分鍾比較完成。

注:實際數據是按照時間進行抽取出來的,所以partial的sheet數據 大致都在full的sheet的前半部分相同,如果數據無規律,非常混亂,那么還要對每一個row的數據進行結構優化,即:用另外一個標記為進行標記此row是否有相同的數據,判斷的時候先判斷這個標記位】

相關代碼如下:

【注:函數中的一些變量都是HardCode的,要根據具體數據進行修改】

Public Type PartialBasedModule
    IsAllSame As Boolean
    SheetDataRange As Variant
    SameCount As Long
End Type


Sub CompareData()

    Dim i As Long
    Dim j As Long
    Dim k As Long
    
    Dim fullSheetName As String
    fullSheetName = "Sheet1"
    Set fullSheet = Sheets(fullSheetName)
    Dim fullDataRange As Variant
    fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value
    Dim fullSheetRowMax As Long
    fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
    
    Dim partialSheetName As String
    partialSheetName = "Sheet2"
    Set partialSheet = Sheets(partialSheetName)
    Dim PartialDataRange As Variant
    PartialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value
    Dim partialSheetRowMax As Long
    partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
    
    Dim partialSheetPages() As PartialBasedModule
    partialSheetPages = SeparatePartialSheet(PartialDataRange, partialSheetRowMax)

    Dim columnMax As Integer
    columnMax = 46
    
    Dim columnMark As Integer
    columnMark = 48
    
    Dim sameRow As Boolean
    
    For i = 1 To fullSheetRowMax
        For j = 1 To UBound(partialSheetPages)
            If partialSheetPages(j).SameCount < 1000 Then
                For k = 1 To UBound(partialSheetPages(j).SheetDataRange)
                    sameRow = True
                    For ColumnIndex = 1 To columnMax
                        If fullDataRange(i, ColumnIndex) <> partialSheetPages(j).SheetDataRange(k, ColumnIndex) Then
                            sameRow = False
                            Exit For
                        End If
                    Next ColumnIndex
                    
                    If sameRow Then
                        fullSheet.Cells(i, columnMark) = 1
                        partialSheetPages(j).SameCount = partialSheetPages(j).SameCount + 1
                        Exit For
                    End If
                Next k
            Else
                sameRow = False
            End If
            
            If sameRow Then
                Exit For
            End If
        Next j
    Next i
    
    MsgBox "Successfully!"
End Sub

Public Function SeparatePartialSheet(ByRef PartialDataRange As Variant, ByVal rowCount As Long) As PartialBasedModule()
    Dim eachPageCount As Long
    eachPageCount = 1000
    Dim pageCount As Integer
    pageCount = Int(rowCount / eachPageCount) + 1
    Dim pageIndex As Long
    
    Dim pageArr() As PartialBasedModule
    Dim startIndex As Long
    Dim endIndex As Long
    
    For pageIndex = 1 To pageCount
        Dim seperatedDataRange(1 To 1000, 1 To 46) As Variant
        Dim seperatedIndex As Long
        seperatedIndex = 1
        Dim colIndex As Integer
        
        If pageIndex < pageCount Then
            endIndex = pageIndex * eachPageCount
        Else
            endIndex = rowCount
        End If
        
        
        For startIndex = (pageIndex - 1) * eachPageCount + 1 To endIndex
            For colIndex = 1 To 46
                seperatedDataRange(seperatedIndex, colIndex) = PartialDataRange(startIndex, colIndex)
            Next colIndex
            seperatedIndex = seperatedIndex + 1
        Next startIndex
        
        Dim pageData As PartialBasedModule
        pageData.SheetDataRange = seperatedDataRange
        pageData.SameCount = 0
        pageData.IsAllSame = False
        
        ReDim Preserve pageArr(pageIndex)
        pageArr(pageIndex) = pageData
    Next pageIndex
    
    
    SeparatePartialSheet = pageArr
End Function
View Code

 

給每個Row都加上標記的代碼如下所示:【相同界別的數據,大約1分鍾完成比較】

Public Type RowModule
    IsSame As Boolean
    RowData As Variant
End Type

Public Type PartialBasedModule
    IsAllSame As Boolean
    SheetDataRange() As RowModule
    SameCount As Long
End Type


Sub CompareData()

    Dim i As Long
    Dim j As Long
    Dim k As Long
    
    Dim fullSheetName As String
    fullSheetName = "Sheet1"
    Set fullSheet = Sheets(fullSheetName)
    Dim fullDataRange As Variant
    fullDataRange = fullSheet.Range("A1", "AT80000").CurrentRegion.Value
    Dim fullSheetRowMax As Long
    fullSheetRowMax = fullSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
    
    Dim partialSheetName As String
    partialSheetName = "Sheet2"
    Set partialSheet = Sheets(partialSheetName)
    Dim PartialDataRange As Variant
    PartialDataRange = partialSheet.Range("A1", "AT80000").CurrentRegion.Value
    Dim partialSheetRowMax As Long
    partialSheetRowMax = partialSheet.Range("A1", "AT80000").CurrentRegion.Rows.Count
    
    Dim partialSheetPages() As PartialBasedModule
    partialSheetPages = SeparatePartialSheet(PartialDataRange, partialSheetRowMax)

    Dim columnMax As Integer
    columnMax = 46
    
    Dim columnMark As Integer
    columnMark = 48
    
    Dim sameRow As Boolean
    
    For i = 1 To fullSheetRowMax
        For j = 1 To UBound(partialSheetPages)
            If partialSheetPages(j).SameCount < 1000 Then
                For k = 1 To UBound(partialSheetPages(j).SheetDataRange)
                    sameRow = True
                    
                    If partialSheetPages(j).SheetDataRange(k).IsSame Then
                        sameRow = False
                    Else
                        For ColumnIndex = 1 To columnMax
                            If fullDataRange(i, ColumnIndex) <> partialSheetPages(j).SheetDataRange(k).RowData(ColumnIndex) Then
                                sameRow = False
                                Exit For
                            End If
                        Next ColumnIndex
                        
                        If sameRow Then
                            fullSheet.Cells(i, columnMark) = 1
                            partialSheetPages(j).SheetDataRange(k).IsSame = True
                            partialSheetPages(j).SameCount = partialSheetPages(j).SameCount + 1
                            Exit For
                        End If
                    End If
                Next k
            Else
                sameRow = False
            End If
            
            If sameRow Then
                Exit For
            End If
        Next j
    Next i
    
    MsgBox "Successfully!"
End Sub

Public Function SeparatePartialSheet(ByRef PartialDataRange As Variant, ByVal rowCount As Long) As PartialBasedModule()
    Dim eachPageCount As Long
    eachPageCount = 1000
    Dim pageCount As Integer
    pageCount = Int(rowCount / eachPageCount) + 1
    Dim pageIndex As Long
    
    Dim pageArr() As PartialBasedModule
    Dim startIndex As Long
    Dim endIndex As Long
    
    For pageIndex = 1 To pageCount
        Dim seperatedDataRange(1 To 1000) As RowModule
        Dim dataRows(1 To 1000) As Variant
        Dim seperatedIndex As Long
        seperatedIndex = 1
        Dim colIndex As Integer
        
        If pageIndex < pageCount Then
            endIndex = pageIndex * eachPageCount
        Else
            endIndex = rowCount
        End If
        
        
        For startIndex = (pageIndex - 1) * eachPageCount + 1 To endIndex
            Dim dataRow(1 To 46) As Variant
            For colIndex = 1 To 46
                dataRow(colIndex) = PartialDataRange(startIndex, colIndex)
            Next colIndex
            
            Dim currentRowModule As RowModule
            currentRowModule.RowData = dataRow
            currentRowModule.IsSame = False
            
            seperatedDataRange(seperatedIndex) = currentRowModule
            seperatedIndex = seperatedIndex + 1
        Next startIndex
        
        Dim pageData As PartialBasedModule
        pageData.SheetDataRange = seperatedDataRange
        pageData.SameCount = 0
        pageData.IsAllSame = False
        
        ReDim Preserve pageArr(pageIndex)
        pageArr(pageIndex) = pageData
    Next pageIndex
    
    
    SeparatePartialSheet = pageArr
End Function
View Code

 

最終的一個簡單的數據結構如下圖所示:

 


免責聲明!

本站轉載的文章為個人學習借鑒使用,本站對版權不負任何法律責任。如果侵犯了您的隱私權益,請聯系本站郵箱yoyou2525@163.com刪除。



 
粵ICP備18138465號   © 2018-2025 CODEPRJ.COM