官方网站建设银行2010年存款利息/电商怎么做新手入门
最近有个项目的抽样也是够恶心,原始数据表包含一张全的公司list(无重复公司)。交给Ops Team去根据它抽数。
接着对方返回一个excel 包含母list 和 多个data group sheet,每个data group里面就是抽到的,在指定时间段内的数据。
在被抽到的data group 中,总共抽取30个公司的数据。要求每个data group 都要random到两个记录。剩下的部分随机挑公司。
另外还要在过万的记录里面随机抽样30个公司,是没有任何数据的。去audit missing case。
各data group之间复制粘贴,还要考虑有些公司可能在各个data group都有数据。想来想去,觉得还是来个tool吧。
下图是拿到的raw data list样式
母list一万多条。在家测试共用去了26秒。每个页面
下图那些flag就是被抽中的了
下面上代码。
Option ExplicitSub Summarize()Dim WrkSht As WorksheetDim ShtNew As WorksheetDim ShtCom As WorksheetDim Rng As RangeDim k As IntegerDim k1 As IntegerDim k2 As IntegerDim kk As IntegerDim yy As IntegerDim Row1 As IntegerDim Zebra As IntegerDim Flag As IntegerFlag = 1Dim CID As StringDim arrDataDim dataCount As IntegerdataCount = 0Dim i As IntegerReDim arrData(1 To 1)Dim d As ObjectSet d = CreateObject("scripting.dictionary")Application.ScreenUpdating = FalseSet ShtNew = Worksheets.Add(After:=Worksheets(Worksheets.Count))ShtNew.Name = "AccuracyList"'Accuracy ListFor Each WrkSht In WorksheetsIf Not WrkSht Is ShtNew And WrkSht.Name <> "CompanyList" Thenkk = WrkSht.Range("A1").CurrentRegion.Columns.Count'MsgBox kkyy = WrkSht.Range("A1").CurrentRegion.Rows.Count'MsgBox yyFor k = 1 To kkIf WrkSht.Cells(1, k).Value = "Company Id" ThenWrkSht.Range(WrkSht.Cells(2, k), WrkSht.Cells(yy, k)).Copy ShtNew.Range("A" & ShtNew.Range("A56565").End(3).Row + 1)'note = note + yyExit ForEnd IfNext kEnd IfNextShtNew.Cells(1, 1).Value = "Company Id"ShtNew.Cells(1, 1).CurrentRegion.RemoveDuplicates Columns:=1, Header:=xlYesShtNew.Cells(1, 2).Value = "RAND"yy = ShtNew.Range("A1").CurrentRegion.Rows.CountFor k = 2 To yyShtNew.Cells(k, 2).Value = RndNext kShtNew.Range("A:B").Sort Columns(2), xlAscending, Header:=xlYes'End of Accuracy List'Completeness ListSet ShtCom = Worksheets.Add(After:=Worksheets(Worksheets.Count))ShtCom.Name = "CompleteList"For Each WrkSht In WorksheetsIf WrkSht.Name = "CompanyList" ThenWrkSht.Range("A1").CurrentRegion.Copy ShtCom.Range("A1")Exit ForEnd IfNextkk = ShtCom.Range("A1").CurrentRegion.Columns.Count ' The total number of columnsFor k = 1 To kkIf ShtCom.Cells(1, k).Value = "CompanyId" Thenk1 = k ' the column named Company IdExit ForEnd IfNext kFor k = 1 To ShtNew.Range("A1").CurrentRegion.Columns.CountIf ShtNew.Cells(1, k).Value = "Company Id" Thenk2 = k ' the column named Company IdExit ForEnd IfNextkk = ShtCom.Range("A1").CurrentRegion.Rows.Count ' The total number of rowsyy = ShtCom.Range("A1").CurrentRegion.Columns.Count 'The total number of columnsCells(1, yy + 1).Value = "Sequence"For Row1 = 2 To ShtNew.Range("A1").CurrentRegion.Rows.CountCID = ShtNew.Cells(Row1, k2).ValueSet Rng = ShtCom.Range(ShtCom.Cells(1, k1), ShtCom.Cells(kk, k1)).Find(CID, lookat:=xlWhole)If Not Rng Is Nothing ThenCells(Rng.Row, yy + 1).Value = 1End IfNext Row1ShtCom.Range(Cells(1, 1), Cells(kk, yy + 1)).Sort Columns(yy + 1), xlDescending, Header:=xlYesRange(Cells(2, yy + 1), Cells(Columns(yy + 1).End(xlDown).Row, yy + 1)).EntireRow.DeleteShtCom.Columns(yy + 1).DeleteShtCom.Cells(1, yy + 1).Value = "RAND"kk = ShtCom.Range("A1").CurrentRegion.Rows.Count ' The total number of rowsFor k = 2 To kkCells(k, yy + 1).Value = RndNext kShtCom.Range(Cells(1, 1), Cells(kk, yy + 1)).Sort Columns(yy + 1), xlAscending, Header:=xlYes'CompleteList End'Accuracy SamplingFor Each WrkSht In WorksheetsIf Not WrkSht Is ShtNew And WrkSht.Name <> "CompanyList" And Not WrkSht Is ShtCom Thenkk = WrkSht.Range("A1").CurrentRegion.Columns.Count'MsgBox kkyy = WrkSht.Range("A1").CurrentRegion.Rows.Count'MsgBox yyFor k = 1 To kkIf WrkSht.Cells(1, k).Value = "Company Id" Thenk1 = kExit ForEnd IfNext kIf k = kk ThenMsgBox "No Company Id Column!"Exit SubEnd IfIf yy > 3 ThenWrkSht.Cells(1, kk + 1).Value = "RAND"For k = 2 To yyWrkSht.Cells(k, kk + 1) = RndNext kWrkSht.Range(WrkSht.Cells(1, 1), WrkSht.Cells(yy, kk + 1)).Sort WrkSht.Columns(kk + 1), xlAscending, Header:=xlYesWith WrkShtdataCount = dataCount + 1ReDim Preserve arrData(1 To dataCount)arrData(dataCount) = .Cells(2, k1).Value'MsgBox arrData(dataCount)dataCount = dataCount + 1ReDim Preserve arrData(1 To dataCount)arrData(dataCount) = .Cells(3, k1).Value'MsgBox arrData(dataCount)End WithElseIf yy = 3 ThenWith WrkShtdataCount = dataCount + 1ReDim Preserve arrData(1 To dataCount)arrData(dataCount) = .Cells(2, k1).ValuedataCount = dataCount + 1ReDim Preserve arrData(1 To dataCount)arrData(dataCount) = .Cells(3, k1).ValueEnd WithElseIf yy = 2 ThenWith WrkShtdataCount = dataCount + 1ReDim Preserve arrData(1 To dataCount)arrData(dataCount) = .Cells(2, k1).ValueEnd WithElseMsgBox "Error"Exit SubEnd IfEnd IfNextFor i = 1 To UBound(arrData)d(arrData(i)) = d(arrData(i)) + 1
Next iZebra = 30 - d.CountReDim Preserve arrData(1 To 31)If Zebra <> 0 Thenk = 1While Zebra >= 0arrData(d.Count + 1) = ShtNew.Cells(k + 1, 1)For i = 1 To UBound(arrData)d(arrData(i)) = d(arrData(i)) + 1Next ik = k + 1Zebra = 30 - d.CountWend
End IfShtNew.Range("D2").Resize(UBound(arrData), 1).Value = WorksheetFunction.Transpose(arrData)'Accuracy Sampling Complete'Pick them upFor Each WrkSht In WorksheetsIf Not WrkSht Is ShtNew And WrkSht.Name <> "CompanyList" And Not WrkSht Is ShtCom Thenkk = WrkSht.Range("A1").CurrentRegion.Columns.Count ' The total number of columnsyy = WrkSht.Range("A1").CurrentRegion.Rows.Count ' The total number of rowsWrkSht.Cells(1, kk + 1).Value = "Flag"For k = 1 To kkIf ShtCom.Cells(1, k).Value = "Company Id" Thenk1 = k ' the column named Company IdExit ForEnd IfNext kFor i = 1 To UBound(arrData)For k = 2 To yyIf WrkSht.Cells(k, k1).Value = arrData(i) ThenWrkSht.Cells(k, kk + 1).Value = FlagEnd IfNext kFlag = Flag + 1Next iEnd IfWrkSht.Range(WrkSht.Cells(1, 1), WrkSht.Cells(yy, kk + 1)).Sort WrkSht.Columns(kk + 1), xlDescending, Header:=xlYesNext'End of pick-upSet d = NothingApplication.ScreenUpdating = TrueEnd Sub
结束语:再一次的感谢 组小牛同学,当我告诉他我用union range法去实现删除区域时,他提醒我应该去尝试排序再删除。
实验表明,同样家里的土冒机器的环境下,union_range法39秒,排序删除法19秒。
另外还有就是使用union 的时候,里面不可以有设置为nothing的区块或者没有定义的区块。