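I. Background
The data file (mdf) of this database is about 83 GB. After restoring it, I felt there was a lot of room for performance tuning: after merging the data the mdf file is about 59 GB, after row compression about 39 GB, and after page compression about 34 GB. This write-up is for technical research only. It presents and analyzes the results of that research and is not for commercial use;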
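II. Optimization items
We can optimize these two databases from the following 4 angles:
(1) Partition the tables;
(2) Create suitable indexes on the tables;
(3) Use row compression to compress the row data;
(4) Redesign the table structure to optimize the space the tables use;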
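III. Attach the databases
1. First attach the 11 GroupData databases (the group-to-member relationships) to the server. The import SQL below is slightly modified from the original: the database names are normalized, which makes it easy to process the databases in order later on;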
--Attach the databases
EXEC sp_attach_db N'GroupData01', N'D:\DBBackup\QunData\GroupData1_Data.MDF'
EXEC sp_attach_db N'GroupData02', N'D:\DBBackup\QunData\GroupData2_Data.MDF'
EXEC sp_attach_db N'GroupData03', N'D:\DBBackup\QunData\GroupData3_Data.MDF'
EXEC sp_attach_db N'GroupData04', N'D:\DBBackup\QunData\GroupData4_Data.MDF'
EXEC sp_attach_db N'GroupData05', N'D:\DBBackup\QunData\GroupData5_Data.MDF'
EXEC sp_attach_db N'GroupData06', N'D:\DBBackup\QunData\GroupData6_Data.MDF'
EXEC sp_attach_db N'GroupData07', N'D:\DBBackup\QunData\GroupData7_Data.MDF'
EXEC sp_attach_db N'GroupData08', N'D:\DBBackup\QunData\GroupData8_Data.MDF'
EXEC sp_attach_db N'GroupData09', N'D:\DBBackup\QunData\GroupData9_Data.MDF'
EXEC sp_attach_db N'GroupData10', N'D:\DBBackup\QunData\GroupData10_Data.MDF'
EXEC sp_attach_db N'GroupData11', N'D:\DBBackup\QunData\GroupData11_Data.MDF'
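After attaching, it can help to confirm that all 11 databases are online before going further. The query below is a small sketch that only assumes the database names used above:

```sql
-- List the attached GroupData databases with their state and recovery model.
SELECT name, state_desc, recovery_model_desc
FROM sys.databases
WHERE name LIKE N'GroupData%'
ORDER BY name;
```

Every row should report ONLINE in state_desc; anything else suggests that the attach of that file did not complete.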
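IV. Merge the databases
2. Rename the tables in each database, changing names such as Group1 to the zero-padded form Group01. The benefit is that, when the data is merged, the source tables are read and inserted in order, which avoids page splits in the target table;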
--Normalize the table names
USE GroupData01
GO
EXEC sp_rename 'Group1', 'Group01'
EXEC sp_rename 'Group2', 'Group02'
EXEC sp_rename 'Group3', 'Group03'
EXEC sp_rename 'Group4', 'Group04'
EXEC sp_rename 'Group5', 'Group05'
EXEC sp_rename 'Group6', 'Group06'
EXEC sp_rename 'Group7', 'Group07'
EXEC sp_rename 'Group8', 'Group08'
EXEC sp_rename 'Group9', 'Group09'
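Instead of typing the rename calls by hand, the statements can be generated from sys.tables. This is a sketch, assuming that only tables with a single-digit suffix (Group1 to Group9) need padding:

```sql
USE GroupData01;
GO
-- Produce one sp_rename statement per table named "Group" + exactly one digit.
SELECT N'EXEC sp_rename ''' + name + N''', ''Group0' + RIGHT(name, 1) + N''''
       AS rename_statement
FROM sys.tables
WHERE name LIKE N'Group[0-9]'
ORDER BY name;
```

The result set can be copied into a query window and run after a quick review.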
3. Create a database named GroupData and set it to the simple recovery model;
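Step 3 has no script in the original, so the following is only a minimal sketch. It assumes default file locations and sizes, which the post does not state:

```sql
-- Create the target database with default file settings (an assumption).
CREATE DATABASE [GroupData];
GO
-- Under simple recovery the log is truncated at checkpoints,
-- so no log backups are needed while the data is being merged.
ALTER DATABASE [GroupData] SET RECOVERY SIMPLE;
GO
```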
4. In the GroupData database, create a work table named tables that records every source database and table. It drives the merge later on;
--Create the work table
CREATE TABLE [GroupData].[dbo].[tables](
    [db_name] [sysname] NULL,
    [table_name] [sysname] NULL,
    [status] [bit] DEFAULT 0
) ON [PRIMARY]

SELECT [db_name], [table_name], [status] FROM [GroupData].[dbo].[tables]

--Build the list of database names and the table names they contain
EXEC sp_MSforeachdb 'USE [?];
--Insert the table names
INSERT INTO [GroupData].[dbo].[tables]([table_name])
SELECT name FROM [?].sys.tables WHERE name LIKE ''Group%'' ORDER BY name
--Fill in the database name
UPDATE [GroupData].[dbo].[tables] SET [db_name] = ''?'' WHERE [db_name] IS NULL'
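V. Optimize the database
5. By my estimate, the Group tables of the 11 GroupData databases hold about 1.5 billion rows in total, and the largest value of the QunNum (group number) column is 100219998. This can be checked against the QunList110 table of the QunInfo11 database: SELECT MAX(QunNum) FROM [QunInfo11].[dbo].[QunList110]. From a business point of view, the typical query looks up the information of one group, so QunNum is used as the partitioning column, with 5 million group numbers per partition. That works out to 21 filegroups. Assuming members are spread fairly evenly across groups, each filegroup holds roughly 70 million group-member relationships;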
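The arithmetic behind the 21 filegroups can be checked directly. The variable names below are illustrative, and the row total is the approximate figure quoted above:

```sql
DECLARE @MaxQunNum INT = 100219998;      -- maximum QunNum quoted above
DECLARE @Step INT = 5000000;             -- group numbers per partition
DECLARE @TotalRows BIGINT = 1500000000;  -- approximate total row count

-- Integer division: 20 boundary values split the range into 21 partitions.
SELECT
    @MaxQunNum / @Step                     AS boundary_count,          -- 20
    @MaxQunNum / @Step + 1                 AS partition_count,         -- 21
    @TotalRows / (@MaxQunNum / @Step + 1)  AS avg_rows_per_partition;  -- 71428571
```

The average of about 71 million rows per partition matches the "roughly 70 million" estimate, under the even-distribution assumption.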
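6. The SQL below is a script generator: running it prints a new script, and running that generated script creates the 21 filegroups and their files, the partition function, and the partition scheme;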
--Generate the partitioning script
DECLARE @DataBaseName NVARCHAR(50)   --database name
DECLARE @TableName NVARCHAR(50)      --table name
DECLARE @ColumnName NVARCHAR(50)     --partitioning column
DECLARE @PartNumber INT              --highest partition number (number of partitions)
DECLARE @PartNumberBegin INT         --first partition number
DECLARE @PartNumberBeginTemp INT     --working copy of the partition number
DECLARE @PartNumberStr NVARCHAR(50)  --partition number as a zero-padded string
DECLARE @Location NVARCHAR(50)       --folder that holds the partition files
DECLARE @Size NVARCHAR(50)           --initial file size
DECLARE @FileGrowth NVARCHAR(50)     --file growth increment
DECLARE @FunValue INT                --increment between boundary values
DECLARE @FunValueBegin INT           --first boundary value
DECLARE @i INT                       --loop counter
DECLARE @sql NVARCHAR(max)

--Set the variables below
SET @DataBaseName = 'GroupData'
SET @TableName = 'Group'
SET @ColumnName = 'QunNum'
SET @PartNumber = 21
SET @PartNumberBegin = 1
SET @Location = 'D:\DBBackup\FG_Group\'
SET @Size = '4096MB'
SET @FileGrowth = '1024MB'
SET @FunValueBegin = 5000000
SET @FunValue = 5000000

--GO must sit on its own line in the generated script
SET @sql = 'USE ['+@DataBaseName +']' + CHAR(13) + CHAR(10) + 'GO'
PRINT @sql + CHAR(13)

--1. Create the filegroups
SET @i = 1
SET @PartNumberBeginTemp = @PartNumberBegin
PRINT '--1. Create the filegroups'
WHILE @i <= @PartNumber
BEGIN
    SET @PartNumberStr = RIGHT('0' + CONVERT(NVARCHAR,@PartNumberBeginTemp),2)
    SET @sql = 'ALTER DATABASE ['+@DataBaseName +'] ADD FILEGROUP [FG_'+@TableName+'_'+@ColumnName+'_'+@PartNumberStr+']'
    PRINT @sql + CHAR(13)
    SET @i=@i+1
    SET @PartNumberBeginTemp = @PartNumberBeginTemp+1
END

--2. Create the files
SET @i = 1
SET @PartNumberBeginTemp = @PartNumberBegin
PRINT CHAR(13)+'--2. Create the files'
WHILE @i <= @PartNumber
BEGIN
    SET @PartNumberStr = RIGHT('0' + CONVERT(NVARCHAR,@PartNumberBeginTemp),2)
    SET @sql = 'ALTER DATABASE ['+@DataBaseName +'] ADD FILE (NAME = N''FG_'+@TableName+'_'+@ColumnName+'_'+@PartNumberStr+'_data'',FILENAME = N'''+@Location+'FG_'+@TableName+'_'+@ColumnName+'_'+@PartNumberStr+'_data.ndf'',SIZE = '+@Size+', FILEGROWTH = '+@FileGrowth+' ) TO FILEGROUP [FG_'+@TableName+'_'+@ColumnName+'_'+@PartNumberStr+'];'
    PRINT @sql + CHAR(13)
    SET @i=@i+1
    SET @PartNumberBeginTemp = @PartNumberBeginTemp+1
END

--3. Create the partition function (@PartNumber - 1 boundary values)
PRINT CHAR(13)+'--3. Create the partition function'
DECLARE @FunValueStr NVARCHAR(MAX)
DECLARE @PNB INT
SET @i = 1
SET @PNB = 1
SET @FunValueStr = convert(NVARCHAR(50),@FunValueBegin) + ','
WHILE @i < @PartNumber-1
BEGIN
    SET @FunValueStr = @FunValueStr + convert(NVARCHAR(50),(@FunValueBegin+@PNB*@FunValue)) + ','
    SET @i=@i+1
    SET @PNB=@PNB+1
END
--Strip the trailing comma
SET @FunValueStr = substring(@FunValueStr,1,len(@FunValueStr)-1)
SET @sql = 'CREATE PARTITION FUNCTION [Fun_'+@TableName+'_'+@ColumnName+'](INT) AS RANGE RIGHT FOR VALUES('+@FunValueStr+')'
PRINT @sql + CHAR(13)

--4. Create the partition scheme
PRINT CHAR(13)+'--4. Create the partition scheme'
DECLARE @FileGroupStr NVARCHAR(MAX)
SET @i = 1
SET @PartNumberBeginTemp = @PartNumberBegin
SET @FileGroupStr = ''
WHILE @i <= @PartNumber
BEGIN
    SET @PartNumberStr = RIGHT('0' + CONVERT(NVARCHAR,@PartNumberBeginTemp),2)
    SET @FileGroupStr = @FileGroupStr + '[FG_'+@TableName+'_'+@ColumnName+'_'+@PartNumberStr+'],'
    SET @i=@i+1
    SET @PartNumberBeginTemp = @PartNumberBeginTemp+1
END
--Strip the trailing comma
SET @FileGroupStr = substring(@FileGroupStr,1,len(@FileGroupStr)-1)
SET @sql = 'CREATE PARTITION SCHEME [Sch_'+@TableName+'_'+@ColumnName+'] AS PARTITION [Fun_'+@TableName+'_'+@ColumnName+'] TO('+@FileGroupStr+')'
PRINT @sql + CHAR(13)

--5. Row count per partition
PRINT CHAR(13)+'--5. Row count per partition'
SET @sql = 'SELECT $PARTITION.[Fun_'+@TableName+'_'+@ColumnName+']('+@ColumnName+') AS Partition_num, MIN('+@ColumnName+') AS Min_value,MAX('+@ColumnName+') AS Max_value,COUNT(1) AS Record_num FROM dbo.['+@TableName+'] GROUP BY $PARTITION.[Fun_'+@TableName+'_'+@ColumnName+']('+@ColumnName+') ORDER BY $PARTITION.[Fun_'+@TableName+'_'+@ColumnName+']('+@ColumnName+');'
PRINT @sql + CHAR(13)
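With the settings above, the generator emits 20 boundary values (5000000, 10000000, up to 100000000) for a RANGE RIGHT function, so each boundary value belongs to the partition on its right. Once the generated script has been run, a quick sanity check along these lines shows how sample group numbers map to partitions:

```sql
USE [GroupData];
GO
-- RANGE RIGHT: a value equal to a boundary falls into the higher partition.
SELECT
    $PARTITION.[Fun_Group_QunNum](4999999)   AS below_first_boundary,  -- 1
    $PARTITION.[Fun_Group_QunNum](5000000)   AS on_first_boundary,     -- 2
    $PARTITION.[Fun_Group_QunNum](100219998) AS max_qunnum;            -- 21
```

The largest known QunNum lands in partition 21, so the last filegroup is not left empty and no value falls outside the scheme.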
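7. Next, redesign the Group table. The changes are:
1) Create the partitioned table Group in the GroupData database. The ID column of the original table has been dropped because it carries little meaning;
2) Use [QunNum] and [QQNum] as the clustered index, and make it unique. This requires IGNORE_DUP_KEY = ON so that duplicate rows are skipped during the bulk insert;
3) Change the data types of the original [Age], [Gender], and [Auth] columns to reduce the space they take;
4) Place the table on the partition scheme just created, so that indexes created afterwards are aligned with it;
5) Apply row compression to the table to reduce the space the database uses;
6) Would page compression on the table save even more space?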
--Create the optimized Group table
CREATE TABLE [dbo].[Group](
    [QunNum] [int] NOT NULL,
    [QQNum] [int] NOT NULL,
    [Nick] [varchar](20) NULL,
    [Age] [tinyint] NULL,
    [Gender] [tinyint] NULL,
    [Auth] [tinyint] NULL,
    CONSTRAINT [PK_Group] PRIMARY KEY CLUSTERED
    (
        [QunNum] ASC,
        [QQNum] ASC
    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = ON, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = ROW) ON [Sch_Group_QunNum]([QunNum])
) ON [Sch_Group_QunNum]([QunNum])
GO
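The effect of IGNORE_DUP_KEY = ON is easy to see in isolation. The sketch below uses a hypothetical temp table, #DupDemo, with the same key shape as the Group table. It is not part of the original scripts:

```sql
-- Hypothetical temp table with the same two-column unique clustered key.
CREATE TABLE #DupDemo (
    QunNum INT NOT NULL,
    QQNum  INT NOT NULL,
    PRIMARY KEY CLUSTERED (QunNum, QQNum) WITH (IGNORE_DUP_KEY = ON)
);

-- The third row repeats the key of the first one.
INSERT INTO #DupDemo (QunNum, QQNum)
VALUES (1, 100), (1, 101), (1, 100);
-- SQL Server reports "Duplicate key was ignored." and keeps the other rows.

SELECT COUNT(*) AS row_count FROM #DupDemo;  -- 2

DROP TABLE #DupDemo;
```

Without the option, the whole INSERT statement would fail with a primary key violation, which is why the merge relies on it.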
(Figure: original GroupData table structure)
(Figure: new GroupData table structure)
8. Merge all 11 databases into the Group table of the newly created GroupData database;
--Merge the data
DECLARE @tablename sysname
DECLARE @dbname sysname
DECLARE @sql NVARCHAR(max)

--Cursor over the tables not merged yet, in database and table name order
DECLARE @itemCur CURSOR
SET @itemCur = CURSOR FOR
    SELECT [db_name], [table_name]
    FROM [GroupData].[dbo].[tables]
    WHERE [status] = 0
    ORDER BY [db_name], [table_name]

OPEN @itemCur
FETCH NEXT FROM @itemCur INTO @dbname,@tablename
WHILE @@FETCH_STATUS=0
BEGIN
    SET @sql = '
    INSERT INTO [GroupData].[dbo].[Group]
        ([QunNum] ,[QQNum] ,[Nick] ,[Age] ,[Gender] ,[Auth])
    SELECT [QunNum] ,[QQNum] ,[Nick] ,[Age] ,[Gender] ,[Auth]
    FROM ['+@dbname+'].[dbo].['+@tablename+']'
    EXEC(@sql)

    --Mark the source table as merged
    UPDATE [GroupData].[dbo].[tables] SET [status] = 1
    WHERE [db_name] = @dbname AND [table_name] = @tablename

    --Echo the SQL that was run
    PRINT @sql
    PRINT 'GO' + CHAR(13)
    FETCH NEXT FROM @itemCur INTO @dbname,@tablename
END

CLOSE @itemCur
DEALLOCATE @itemCur
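Because the merge runs for a long time, the status column of the work table can be used to watch progress from another session. A small sketch:

```sql
-- Number of source tables per database and how many have been merged so far.
SELECT [db_name],
       COUNT(*)                    AS table_count,
       SUM(CAST([status] AS INT))  AS merged_count
FROM [GroupData].[dbo].[tables]
GROUP BY [db_name]
ORDER BY [db_name];
```

When merged_count equals table_count for every database, the merge has finished.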
9. Create an index on the QQNum column of the Group table. This index is used when the table is joined to other tables;
--Index with row compression
CREATE NONCLUSTERED INDEX [IX_Group_QQNum] ON [dbo].[Group]
(
    [QQNum] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = ROW) ON [Sch_Group_QunNum]([QunNum])
GO
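To confirm which compression setting each partition of the table and its indexes actually ended up with, the catalog views can be queried. This sketch only assumes the object names used above:

```sql
USE [GroupData];
GO
-- Compression setting and row count for every partition of every index.
SELECT i.name AS index_name,
       p.partition_number,
       p.data_compression_desc,
       p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
  ON i.object_id = p.object_id AND i.index_id = p.index_id
WHERE p.object_id = OBJECT_ID(N'dbo.[Group]')
ORDER BY i.index_id, p.partition_number;
```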
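(Figure: row counts per partition of the GroupData table)
(Figure: GroupData data before row compression)
(Figure: GroupData data after row compression)
(Figure: GroupData data after page compression)
(Figure: GroupData index before row compression)
(Figure: GroupData index after row compression)
Why does the index take up more space after row compression than before?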
--Index with page compression
--DROP_EXISTING = ON because IX_Group_QQNum already exists from the row compression step
CREATE NONCLUSTERED INDEX [IX_Group_QQNum] ON [dbo].[Group]
(
    [QQNum] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, IGNORE_DUP_KEY = OFF, DROP_EXISTING = ON, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, DATA_COMPRESSION = PAGE) ON [Sch_Group_QunNum]([QunNum])
GO
(Figure: GroupData index after page compression)
Why does the index take up more space after page compression than before?
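One way to look into these size questions is to measure the space per index directly and to ask SQL Server for an estimate before rebuilding. The sketch below does not explain the result. It only shows where the numbers can be read, assuming the object names used above and an edition that supports data compression:

```sql
USE [GroupData];
GO
-- Space used per index of dbo.[Group], summed over all partitions (8 KB pages).
SELECT i.name AS index_name,
       SUM(ps.used_page_count) * 8 / 1024 AS used_mb,
       SUM(ps.row_count)                  AS row_count
FROM sys.dm_db_partition_stats AS ps
JOIN sys.indexes AS i
  ON i.object_id = ps.object_id AND i.index_id = ps.index_id
WHERE ps.object_id = OBJECT_ID(N'dbo.[Group]')
GROUP BY i.name;

-- Estimated size of every index and partition under page compression.
EXEC sp_estimate_data_compression_savings
     @schema_name = N'dbo',
     @object_name = N'Group',
     @index_id = NULL,
     @partition_number = NULL,
     @data_compression = N'PAGE';
```

Comparing used_mb before and after each rebuild, using the same measurement, makes it easier to tell whether the compression setting itself or something else changed between the screenshots.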