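Preface

This article summarizes what I have learned, at work and in study, about removing duplicate rows in SQL Server.

Because I do not use SQL much in my daily work, I ran into a problem while testing the queries for this article. I would be very grateful for any pointers from experienced readers.

Experienced readers can skim the opening, skip the body, and jump to the problem described at the end. Thank you!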

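Preparation

This article uses SQL Server 2008 with Microsoft's Northwind sample database. The data set is the first 10 rows of the Products table, as shown in the figure below: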

Distinct

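DISTINCT removes duplicates based on the columns that follow the keyword, and it must be placed before all of the columns in the select list. Two rows count as different if any one of the columns after DISTINCT differs.

Example: remove duplicates based on SupplierID and CategoryID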

Select distinct a.SupplierID,a.CategoryID from (SELECT TOP 10 [ProductID]
      ,[ProductName]
      ,[SupplierID]
      ,[CategoryID]
      ,[QuantityPerUnit]
      ,[UnitPrice]
      ,[UnitsInStock]
      ,[UnitsOnOrder]
      ,[ReorderLevel]
      ,[Discontinued]
  FROM [Northwind].[dbo].[Products]) a

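Result:

Note: DISTINCT applies to all of the columns that follow it, not to just one or two of them.

As a consequence, when a query has to return many columns but duplicates should be removed based on only one or two of them, DISTINCT cannot produce the desired result.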
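The limitation can be seen in a minimal sketch that adds ProductName to the select list. The ORDER BY ProductID inside the derived table is an assumption added here so that the 10 sample rows are deterministic (TOP without ORDER BY does not guarantee which rows are returned); it is not part of the original query.

```sql
-- Sketch: DISTINCT now compares (ProductName, SupplierID, CategoryID) as a whole.
-- If ProductName differs on every row, no rows are collapsed, even though
-- several rows share the same SupplierID and CategoryID.
SELECT DISTINCT a.ProductName, a.SupplierID, a.CategoryID
FROM (SELECT TOP 10 ProductName, SupplierID, CategoryID
      FROM [Northwind].[dbo].[Products]
      ORDER BY ProductID) AS a;  -- ORDER BY is an assumption, see above
```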

Group by

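GROUP BY specifies the groups into which the rows returned by the query (SELECT) expression are placed. Combined with an aggregate function, GROUP BY can be used to remove duplicates.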

Select Max(a.ProductID) as ID,a.CategoryID ,a.SupplierID from (SELECT TOP 10 [ProductID]
      ,[ProductName]
      ,[SupplierID]
      ,[CategoryID]
      ,[QuantityPerUnit]
      ,[UnitPrice]
      ,[UnitsInStock]
      ,[UnitsOnOrder]
      ,[ReorderLevel]
      ,[Discontinued]
  FROM [Northwind].[dbo].[Products]) a
  group by a.CategoryID ,a.SupplierID

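Result:

This time we get, for each group of duplicates, the row with the largest ID (use the Min function to get the row with the smallest ID). With the ID as a unique identifier column, the problem left over from DISTINCT can be solved.

Inner joining back to the original table then retrieves the value of any column you want.

For reference, here is the result using the Min function: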
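A minimal sketch of that inner join follows. The aliases p and g and the ORDER BY ProductID in the innermost query are assumptions introduced for illustration (the ORDER BY only makes the TOP 10 sample deterministic); the grouping itself is the same as in the query above.

```sql
-- Sketch: keep one ProductID per (CategoryID, SupplierID) group,
-- then join back to Products to read every column of the surviving rows.
SELECT p.*
FROM [Northwind].[dbo].[Products] AS p
INNER JOIN (
    SELECT MAX(a.ProductID) AS ID          -- use MIN for the smallest ID
    FROM (SELECT TOP 10 ProductID, SupplierID, CategoryID
          FROM [Northwind].[dbo].[Products]
          ORDER BY ProductID) AS a         -- assumption: "first 10" means lowest ProductID
    GROUP BY a.CategoryID, a.SupplierID
) AS g ON p.ProductID = g.ID;
```

Because ProductID is the key of Products, the join returns exactly one full row per group.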

Row_Number() over()

OVER() takes two clauses:

Partition by value_expression

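Divides the result set produced by the FROM clause into partitions to which the ROW_NUMBER function is applied. value_expression specifies the column by which the result set is partitioned. If PARTITION BY is not specified, the function treats all rows of the query result set as a single group.

In other words, the columns after PARTITION BY are the columns to deduplicate on. For more details, click here.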

Order by

The ORDER BY clause determines the order in which rows are assigned their unique ROW_NUMBER within a given partition. It is required.

Select a.ProductID,a.SupplierID,a.CategoryID, ROW_NUMBER() over(partition by CategoryID ,SupplierID order by ProductID)as RowN from (
SELECT TOP 10 [ProductID]
      ,[ProductName]
      ,[SupplierID]
      ,[CategoryID]
      ,[QuantityPerUnit]
      ,[UnitPrice]
      ,[UnitsInStock]
      ,[UnitsOnOrder]
      ,[ReorderLevel]
      ,[Discontinued]
  FROM [Northwind].[dbo].[Products]) a

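Result:

Note: the data here is slightly off; this is discussed at the end.

This query does not remove duplicates yet, but a quick look shows an extra column, RowN.

RowN is the row number obtained by partitioning on SupplierID and CategoryID and ordering by ProductID ascending within each partition. Removing duplicates is then straightforward.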

Select * from (
Select a.ProductID,a.SupplierID,a.CategoryID, ROW_NUMBER() over(partition by CategoryID ,SupplierID order by ProductID)as RowN from (
SELECT TOP 10 [ProductID]
      ,[ProductName]
      ,[SupplierID]
      ,[CategoryID]
      ,[QuantityPerUnit]
      ,[UnitPrice]
      ,[UnitsInStock]
      ,[UnitsOnOrder]
      ,[ReorderLevel]
      ,[Discontinued]
  FROM [Northwind].[dbo].[Products]) a) b where b.RowN=1

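Result:

For reference, here is the result for b.RowN=2:

Interview question

Retrieve the first record of each day in a given month of a given year

Assume that the first record of each day is the one with the smallest ID on that day. The following is the test data set: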

SELECT  [OrderID]
      ,[CustomerID]
      ,[EmployeeID]
      ,[OrderDate]  
  FROM [Northwind].[dbo].[Orders]
  where DATEPART(YEAR,OrderDate)=1997 AND DATEPART(MONTH,OrderDate)=1
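The same January 1997 filter can also be written as a half-open date range. This is only a sketch of an equivalent predicate, not the author's query; it compares OrderDate directly instead of wrapping it in DATEPART, which allows SQL Server to use an index on OrderDate if one exists.

```sql
-- Sketch: rows with OrderDate on or after 1997-01-01 and before 1997-02-01,
-- i.e. the same rows as DATEPART(YEAR,...)=1997 AND DATEPART(MONTH,...)=1.
SELECT [OrderID]
      ,[CustomerID]
      ,[EmployeeID]
      ,[OrderDate]
  FROM [Northwind].[dbo].[Orders]
  WHERE OrderDate >= '19970101'
    AND OrderDate <  '19970201';
```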

Method 1: Group by

  with Dataset as (SELECT  [OrderID]
      ,[CustomerID]
      ,[EmployeeID]
      ,[OrderDate]  
  FROM [Northwind].[dbo].[Orders]
  where DATEPART(YEAR,OrderDate)=1997 AND DATEPART(MONTH,OrderDate)=1)
  Select a.* from Dataset a,
  (SELECT Min([OrderID]) as ID     
      ,DATEPART(DAYOFYEAR,OrderDate) as dayofOrder  
  FROM [Northwind].[dbo].[Orders]
  where DATEPART(YEAR,OrderDate)=1997 AND DATEPART(MONTH,OrderDate)=1
  group by DATEPART(DAYOFYEAR,OrderDate)) b
  where a.OrderID=b.ID

Result:

Method 2: Row_Number() over()

 with Dataset as (SELECT  [OrderID]
      ,[CustomerID]
      ,[EmployeeID]
      ,[OrderDate]  
  FROM [Northwind].[dbo].[Orders]
  where DATEPART(YEAR,OrderDate)=1997 AND DATEPART(MONTH,OrderDate)=1)
 select a.* from(Select *,ROW_NUMBER() over(Partition by DatePart(dayofyear,OrderDate) 
 order by OrderID) as RowN from Dataset) a where a.RowN=1