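Recently I built a feature for a university website: automatically matching the positions posted by employers against every student at the university. A student logs in to the careers site, sees the positions that suit them, and can then apply online.
The university has tens of thousands of students, and registered companies have posted more than ten thousand positions. How can an existing matching model be applied to student-position pairs quickly enough that students browsing the site are not affected?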
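- Modeling
I once developed automatic real-estate valuation software for a bank. The standard approach there is the Euclidean closeness algorithm or the Hamming closeness degree, but those algorithms are too complex: they belong to applied mathematics and depend on precise modeling. The teachers in our careers office are practitioners without an advanced theoretical background, so the model had to be simple. After some research, I found that combining Hamming distance with weighting is fairly simple and converts easily into a match percentage. The algorithm comes from the paper 《模糊匹配中的匹配度計算方法》 (A Method for Calculating the Matching Degree in Fuzzy Matching), published in 《計算機工程》 (Computer Engineering) in March 2010.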
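The basic idea is that every item carries a weight. For each item, the degree of match between the student and the position is multiplied by the item's weight; the sum of these products, divided by the total weight, gives the student's match for that position.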
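As a rough reconstruction from the description above and from the stored procedure later in this post (an assumption, not the formula printed in the cited paper), the weighted score for a student $s$ and a position $p$ can be written as:

```latex
S(s,p) \;=\; \frac{\sum_{i \in I} w_i \, d_i(s,p)}{\sum_{i \in I} w_i},
\qquad
d_i(s,p) \;=\; \lvert 10 - v_i(s,p) \rvert
```

Here $I$ is the set of items for which a value could be looked up, $w_i$ is the weight of item $i$, and $v_i \in [0, 10]$ is the looked-up match value (10 means a perfect match). Because $d_i$ is a distance, a lower $S$ means a closer match, which is why the procedure sorts by score in ascending order. With hypothetical weights $w_{\text{salary}} = 20$ and $w_{\text{education}} = 10$, a salary value of 8 and an education value of 10 give $S = (20 \cdot 2 + 10 \cdot 0) / 30 \approx 1.33$. One possible conversion to a percentage, again an assumption, is $(1 - S/10) \times 100\% \approx 86.7\%$.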
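If a student has set up a job searcher on the careers site, matching uses the items configured in that searcher. The items are:

For example, the "salary" item is matched as follows:

If the student has not set up a searcher, a different set of dimensions is used, drawing on big-data analysis of the employment records; the details are omitted here.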
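- Implementation
The site runs on .NET C# + SQL Server. Every request has to be matched against all active positions. With that much computation, the conventional approach would certainly be slow and could even become a performance bottleneck, so I considered two ways to speed it up:
- Use MongoDB.
- Use the memory-optimized tables in SQL Server 2014 or SQL Server 2016.
MongoDB is a typical NoSQL database that exchanges data as JSON. Reads and writes are very fast, and it does not carry the overhead of SQL Server's elaborate permission, concurrency, locking and storage-engine machinery, so it suits high-throughput data storage well.
Microsoft introduced memory-optimized tables and natively compiled stored procedures in SQL Server 2014 and extended them in SQL Server 2016; both perform very well.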
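(A side complaint: people online say that in Oracle a single command turns a table into an in-memory table and a single command turns a stored procedure into a natively compiled one, whereas SQL Server imposes too many restrictions here: memory-optimized tables cannot have indexes or CHECK constraints added (the 2016 version allows this), and natively compiled stored procedures are far more restricted still, with no functions, no cursors, no linked databases, and so on. I think this comes from the two databases' different implementation mechanisms. Page 151 of 《SQL編程風格》 (SQL Programming Style) puts it this way: "T-SQL is a simple one-pass compiler modeled on the C and Algol languages... PL/SQL in Oracle is modeled on ADA and SQL/PSM; it is a complex language that can be used to develop applications." So upgrading a stored procedure in Oracle is effortless.)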
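We chose SQL Server memory-optimized tables as the data buffer pool first. I had intended to implement the matching algorithm as a natively compiled stored procedure, but the restrictions were simply too many, so I fell back on an ordinary stored procedure. A daily batch job synchronizes the position data into the memory-optimized tables, the match is computed after a student logs in, and positions are also pushed to students every week.
By popular request, here is the algorithm code. (Because of a confidentiality agreement with the university, some lines have been removed; thank you for understanding.)
The basic table-creation and data-initialization script: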
--Create the database and tables
CREATE DATABASE [DataAnalysis]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'DataAnalysis', FILENAME = N'd:\DATA\DataAnalysis.mdf' , SIZE = 5120KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),
FILEGROUP [DataAnalysisFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA DEFAULT
( NAME = N'DataAnalysisContainer', FILENAME = N'd:\DATA\HashCollisionsContainer' , MAXSIZE = UNLIMITED)
LOG ON
( NAME = N'DataAnalysis_log', FILENAME = N'd:\DATA\DataAnalysis_log.ldf' , SIZE = 2304KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO

ALTER DATABASE [DataAnalysis] SET COMPATIBILITY_LEVEL = 120
GO


Use [DataAnalysis]
GO

if exists(select * from sysobjects where id=object_id('EnterPrisePositions'))
    DROP TABLE EnterPrisePositions
GO

CREATE TABLE EnterPrisePositions
(
    [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
    [EntID] uniqueidentifier NOT NULL,
    [EntUserID] uniqueidentifier NOT NULL,
    [EntName] [nvarchar](80) NOT NULL,
    [PosiID] [uniqueidentifier] NOT NULL,
    [PosiName] [nvarchar](40) NULL,
    [JobTypeID] [uniqueidentifier] NULL,
    --.......
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO

INSERT INTO [EnterPrisePositions] ([EntID], [EntUserID], [EntName], [OrgCode]..........)
SELECT e.[EntID], p.[EntUserID], e.[EntName], e.[OrgCode], e.[HyID], e.[SubHyID], e.[ThirdHyID],..........
FROM .....[dbo].[Position] p INNER JOIN ......[dbo].[Enterprise] e ON p.[EntUserID] = e.[EntUserID]
WHERE p.DelFlag <> 1 AND p.EffectiveDate <= GetDate() AND GetDate() <= p.ExpiryDate AND e.[CheckFlag] = 1 AND e.[DelFlag] = 0 AND e.[IsBlack] = 0
GO


if exists(select * from sysobjects where id=object_id('UserPositionResult'))
    DROP TABLE UserPositionResult
GO

CREATE TABLE UserPositionResult
(
    [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
    [Score] FLOAT,
    [PosiID] [uniqueidentifier] NOT NULL,
    [PosiName] [nvarchar](40) NULL,
    --............
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO


--Helper function
-- =============================================
-- Author: Ben
-- Create date: 2016-08-08
-- Description: Splits a comma-separated list of specialty (major) IDs into rows
-- =============================================
CREATE FUNCTION [GetPositionSpecialty] (
    @String NVARCHAR(150)
) RETURNS @temptable TABLE (ID INT IDENTITY(1,1), Specialty NVARCHAR(8)) AS
BEGIN

    DECLARE @idx INT=1
    DECLARE @slice NVARCHAR(150)
    DECLARE @Delimiter NVARCHAR(1) = ','
    -- An empty or NULL list yields the single row '0' (treated later as "any specialty")
    IF LEN(@String) < 1 OR LEN(ISNULL(@String,'')) = 0
    BEGIN
        INSERT INTO @temptable(Specialty) VALUES('0')
        RETURN
    END
    WHILE @idx != 0
    BEGIN
        SET @idx = CHARINDEX(@Delimiter,@String)
        IF @idx != 0
            SET @slice = LEFT(@String,@idx - 1)
        ELSE
            SET @slice = @String
        IF LEN(@slice) > 0
            INSERT INTO @temptable(Specialty) VALUES(@slice)
        SET @String = RIGHT (@String, LEN(@String) - @idx)
        IF LEN(@String) = 0
            BREAK
    END
    RETURN
END

GO


-- =============================================
-- Author: Ben
-- Create date: 2016-08-08
-- Description: Parses a '|'-delimited job searcher string into its fields
-- =============================================
CREATE PROCEDURE [GetStudentSearch]
    -- Add the parameters for the stored procedure here
    @Ssqtj NVARCHAR(1000), @ProvinceId NCHAR(6) OUTPUT, @Zydm NVARCHAR(10) OUTPUT, @JobNature NVARCHAR(8) OUTPUT, @JobTypeID uniqueidentifier OUTPUT, @SubJobTypeID uniqueidentifier OUTPUT,
    @Salary NVARCHAR(30) OUTPUT, @Computer NVARCHAR(50) OUTPUT, @Language NVARCHAR(50) OUTPUT, @Education NVARCHAR(4) OUTPUT, @HyID uniqueidentifier OUTPUT, @SubHyID uniqueidentifier OUTPUT,
    @ThirdHyID uniqueidentifier OUTPUT
AS
BEGIN
    DECLARE @idx INT=1, @StartPos INT
    DECLARE @Delimiter NVARCHAR(1) = '|'

    IF LEN(@Ssqtj) < 1 OR LEN(ISNULL(@Ssqtj, '')) = 0
        RETURN

    SET @idx = CHARINDEX(@Delimiter, @Ssqtj, @idx)
    IF @idx != 0
        SET @ProvinceId = LEFT(@Ssqtj, @idx - 1)


    SET @StartPos = @idx + 1
    SET @idx = CHARINDEX(@Delimiter, @Ssqtj, @StartPos)
    IF @idx != 0
        SET @Zydm = SUBSTRING(@Ssqtj, @StartPos, @idx - @StartPos)

    --........ (lines omitted in the original post, including the BEGIN TRY that pairs with the END TRY below)

    SET @StartPos = @idx + 1
    SET @ThirdHyID = SUBSTRING(@Ssqtj, @StartPos, 36)
    END TRY
    BEGIN CATCH
    END CATCH
END
GO

--Important: the parameter table
if exists(select * from sysobjects where id=object_id('Parameters'))
    DROP TABLE Parameters
GO
CREATE TABLE [Parameters]
(
    [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
    [Type] NVARCHAR(50) NOT NULL,
    [Set1] NVARCHAR(50) NULL,
    [Set2] NVARCHAR(50) NULL,
    [Value] FLOAT NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

--For 'Salary' rows, Set1 is the salary band offered by the position and Set2 is the band the student asked for.
--N'面議' means "negotiable"; N'8000以上' means "above 8000". The N prefix keeps the literals Unicode.
INSERT INTO [Parameters] ([Type], [Set1], [Set2], [Value])
SELECT 'Salary', N'面議', N'面議', 10
UNION ALL SELECT 'Salary', N'面議', '1500~2000', 10
UNION ALL SELECT 'Salary', N'面議', '2000~3000', 10
UNION ALL SELECT 'Salary', N'面議', '3000~4000', 10
UNION ALL SELECT 'Salary', N'面議', '4000~5000', 10
UNION ALL SELECT 'Salary', N'面議', '5000~6000', 10
UNION ALL SELECT 'Salary', N'面議', '6000~7000', 10
UNION ALL SELECT 'Salary', N'面議', '7000~8000', 10
UNION ALL SELECT 'Salary', N'面議', N'8000以上', 10

UNION ALL SELECT 'Salary', '1500~2000', N'面議', 10
UNION ALL SELECT 'Salary', '1500~2000', '1500~2000', 10
UNION ALL SELECT 'Salary', '1500~2000', '2000~3000', 5
UNION ALL SELECT 'Salary', '1500~2000', '3000~4000', 2
UNION ALL SELECT 'Salary', '1500~2000', '4000~5000', 2
UNION ALL SELECT 'Salary', '1500~2000', '5000~6000', 2
UNION ALL SELECT 'Salary', '1500~2000', '6000~7000', 4
UNION ALL SELECT 'Salary', '1500~2000', '7000~8000', 3
UNION ALL SELECT 'Salary', '1500~2000', N'8000以上', 2

UNION ALL SELECT 'Salary', '2000~3000', N'面議', 10
UNION ALL SELECT 'Salary', '2000~3000', '1500~2000', 10
UNION ALL SELECT 'Salary', '2000~3000', '2000~3000', 10
UNION ALL SELECT 'Salary', '2000~3000', '3000~4000', 6
UNION ALL SELECT 'Salary', '2000~3000', '4000~5000', 5
UNION ALL SELECT 'Salary', '2000~3000', '5000~6000', 4
UNION ALL SELECT 'Salary', '2000~3000', '6000~7000', 5
UNION ALL SELECT 'Salary', '2000~3000', '7000~8000', 4
UNION ALL SELECT 'Salary', '2000~3000', N'8000以上', 3

UNION ALL SELECT 'Salary', '3000~4000', N'面議', 10
UNION ALL SELECT 'Salary', '3000~4000', '1500~2000', 10
UNION ALL SELECT 'Salary', '3000~4000', '2000~3000', 10
UNION ALL SELECT 'Salary', '3000~4000', '3000~4000', 10
UNION ALL SELECT 'Salary', '3000~4000', '4000~5000', 8
UNION ALL SELECT 'Salary', '3000~4000', '5000~6000', 6
UNION ALL SELECT 'Salary', '3000~4000', '6000~7000', 6
UNION ALL SELECT 'Salary', '3000~4000', '7000~8000', 5
UNION ALL SELECT 'Salary', '3000~4000', N'8000以上', 4

--........

UNION ALL SELECT 'Weight', 'HasNoSearch', 'Education', 20
UNION ALL SELECT 'Weight', 'HasNoSearch', 'Profession', 20
UNION ALL SELECT 'Weight', 'HasNoSearch', 'Industry', 8
UNION ALL SELECT 'Weight', 'HasNoSearch', 'Enterprise', 12
GO


--Other temporary tables, used for the big-data analysis
CREATE TABLE IndutryRanking
(
    Gzydm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, SubIndustry uniqueidentifier NOT NULL, Ranking TINYINT NOT NULL,
    CONSTRAINT [PK_IndutryRanking] PRIMARY KEY NONCLUSTERED HASH
    (
        Gzydm ,
        SubIndustry
    )WITH ( BUCKET_COUNT = 2048)
--........ (lines omitted in the original post)

CREATE TABLE EnterpriseRanking
(
    Gzydm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, Zzjgdm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, Ranking FLOAT NOT NULL,
    CONSTRAINT [PK_EnterpriseRanking] PRIMARY KEY NONCLUSTERED HASH
    (
        Gzydm ,
        Zzjgdm
    )WITH ( BUCKET_COUNT = 131072)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO

INSERT INTO EnterpriseRanking (Gzydm, Zzjgdm, Ranking)
SELECT Gzydm, Zzjgdm, LOG(Count(*)+1, 1.8)
FROM ..........[dbo].[vm_AllSyEmployment]
WHERE Gzydm IS NOT NULL AND ZZJGDM IS NOT NULL
GROUP BY Gzydm,Zzjgdm
ORDER BY Count(*) DESC

GO


--Returns the current graduation year: rolls over to the next calendar year after 25 August
CREATE FUNCTION [dbo].[GetCurrentBynf]()
RETURNS char(4)
AS
BEGIN
    -- Declare the return variable here
    DECLARE @Bynf int

    -- Add the T-SQL statements to compute the return value here
    SELECT @Bynf = YEAR(GETDATE())

    IF MONTH(GETDATE()) >= 9 OR ( MONTH(GETDATE()) = 8 AND DAY(GETDATE()) >25 )
    BEGIN
        SET @Bynf = @Bynf + 1
    END

    RETURN CAST(@Bynf AS char(4))
END

GO
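Set up a batch job that loads the data every time the database restarts and refreshes the relevant tables on a daily schedule.
The position-matching script: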
CREATE PROCEDURE RetirePositionsByXsxh
    --Parameters
    -- Add the parameters for the stored procedure here
    @Xsxh NVARCHAR(20), @ResultType INT = 0, @PageSize INT = 99999, @StartPage INT = 0, @ReleaseDateRange SMALLINT = 9999
WITH ENCRYPTION
AS
BEGIN
    -- SET NOCOUNT ON added to prevent extra result sets from
    -- interfering with SELECT statements.
    SET NOCOUNT ON;

    --Declare variables
    -- Insert statements for procedure here
    DECLARE @Ssqtj NVARCHAR(1000), @TableRows int, @PositionRows int,
        @ProvinceId NCHAR(6), @Zydm NVARCHAR(10), @JobNature NVARCHAR(8), @JobTypeID uniqueidentifier, @SubJobTypeID uniqueidentifier,@Salary NVARCHAR(30),
        @Computer NVARCHAR(50), @Language NVARCHAR(50), @Education NVARCHAR(4), @HyID uniqueidentifier, @SubHyID uniqueidentifier, @ThirdHyID uniqueidentifier,
        --........

    DECLARE @StudentSsqtj TABLE (SsqtjID INT IDENTITY(1,1) NOT NULL Primary Key, Ssqtj NVARCHAR(1000))
    DECLARE @PositionSpecialty TABLE (ID INT NOT NULL Primary Key, Specialty NVARCHAR(1000))
    --...........

    INSERT INTO @StudentSsqtj (Ssqtj)
    SELECT Ssqtj
    FROM .......[dbo].[PosiSearch] WITH (SNAPSHOT)
    WHERE Xsxh = @Xsxh

    SELECT @TableRows = Count(*) FROM @StudentSsqtj
    SELECT @PositionRows = Count(*) FROM [EnterPrisePositions]

    IF @TableRows > 0
    BEGIN
        SELECT @Salary_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'Salary'
        SELECT @JobLocation_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'JobLocation'
        SELECT @Education_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'Education'
        --........

        SELECT @JobLocation_Same_Province = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'SameProvince'
        SELECT @JobLocation_Same_City = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'SameCity'
        SELECT @JobLocation_Not_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'NotSame'
        SELECT @Education_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Education' AND [Set1] = 'Same'
        SELECT @Education_Not_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Education' AND [Set1] = 'NotSame'
        --..........

        WHILE @TableRows != 0
        BEGIN
            SELECT @Ssqtj = Ssqtj
            FROM @StudentSsqtj
            WHERE SsqtjID = @TableRows

            --Parse the job searcher
            EXEC [GetStudentSearch] @Ssqtj, @ProvinceId OUTPUT, @Zydm OUTPUT, @JobNature OUTPUT, @JobTypeID OUTPUT , @SubJobTypeID OUTPUT, @Salary OUTPUT,
                @Computer OUTPUT, @Language OUTPUT, @Education OUTPUT, @HyID OUTPUT, @SubHyID OUTPUT,@ThirdHyID OUTPUT

            --Reset the position counter so that every searcher is matched against all positions
            --(without this, the inner loop runs only for the first searcher processed)
            SELECT @PositionRows = Count(*) FROM [EnterPrisePositions]

            WHILE @PositionRows != 0
            BEGIN

                --Fetch each position
                SELECT @Pos_HyID = [HyID], @Pos_SubHyID = [SubHyID], @Pos_ThirdHyID = --.......
                FROM [EnterPrisePositions]
                WHERE ID = @PositionRows AND DATEDIFF(hour, ReleaseDate, GETDATE()) <= 24 * @ReleaseDateRange

                IF @@ROWCOUNT <> 0
                BEGIN
                    SET @Salary_Score = NULL
                    SET @JobLocation_Score = NULL
                    SET @Education_Score = NULL
                    SET @Profession_Score = NULL
                    SET @Industry_Score = NULL
                    SET @JobNature_Score = NULL
                    SET @JobType_Score = NULL
                    SET @Computer_Score = NULL
                    SET @Language_Score = NULL
                    SET @CurrentValue = 0
                    SET @WeightSummary = 0
                    SET @ParaValue = NULL
                    SET @CurrentScore = NULL

                    --Compute salary

                    SELECT @ParaValue = [Value] FROM [Parameters] WHERE [Type] = 'Salary' AND [Set1] = @Pos_Salary AND [Set2] = @Salary
                    IF @ParaValue IS NOT NULL
                    BEGIN
                        SET @Salary_Score = ABS(10 - @ParaValue)*@Salary_Weight
                    END


                    --Compute job location: equal 4-character prefixes count as the same city, equal 2-character prefixes as the same province
                    --NOTE: the two-character list below appears to be the municipalities; under GB/T 2260 Beijing is '11',
                    --so '10' may need checking against the codes this site actually uses
                    IF @ProvinceId <> ''
                    BEGIN
                        IF (LEFT(@Pos_ProvinceId,4) = LEFT(@ProvinceId, 4) OR (
                            LEFT(@Pos_ProvinceId,2) = LEFT(@ProvinceId, 2) AND LEFT(@Pos_ProvinceId,2) IN ('10','12','31','50')))
                            AND @JobLocation_Same_City IS NOT NULL
                        BEGIN
                            SET @JobLocation_Score = ABS(10 - @JobLocation_Same_City)*@JobLocation_Weight
                        END
                        ELSE IF LEFT(@Pos_ProvinceId,2) = LEFT(@ProvinceId, 2) AND @JobLocation_Same_Province IS NOT NULL
                        BEGIN
                            SET @JobLocation_Score = ABS(10 - @JobLocation_Same_Province)*@JobLocation_Weight
                        END
                        ELSE IF @JobLocation_Not_Same IS NOT NULL
                        BEGIN
                            SET @JobLocation_Score = ABS(10 - @JobLocation_Not_Same)*@JobLocation_Weight
                        END
                    END

                    --Compute education (N'不限' means "no restriction")
                    IF @Education NOT IN ('', N'不限')
                    BEGIN
                        IF @Pos_Education = @Education
                        BEGIN
                            SET @Education_Score = ABS(10 - @Education_Same) * @Education_Weight
                        END
                        ELSE
                        BEGIN
                            SET @Education_Score = ABS(10 - @Education_Not_Same)*@Education_Weight
                        END
                    END

                    --Compute specialty (major): a match on any of the position's specialties wins
                    IF @Zydm <> ''
                    BEGIN
                        DELETE FROM @PositionSpecialty
                        INSERT INTO @PositionSpecialty (ID, Specialty)
                        SELECT ID, Specialty FROM [GetPositionSpecialty](@Pos_SpecialtyIds)
                        SELECT @Specialty_Rows = Count(*) FROM @PositionSpecialty
                        WHILE @Specialty_Rows <> 0
                        BEGIN
                            SELECT @Pos_SpecialtyId = Specialty FROM @PositionSpecialty WHERE ID = @Specialty_Rows
                            IF @Pos_SpecialtyId = '0' OR LEFT(@Pos_SpecialtyId,2) = LEFT(@Zydm,2)
                            BEGIN
                                SET @Profession_Score = ABS(10 - @Profession_Match) * @Profession_Weight
                            END
                            ELSE
                            BEGIN
                                IF @Profession_Score <> 0 OR @Profession_Score IS NULL
                                    SET @Profession_Score = ABS(10 - @Profession_Not_Match) * @Profession_Weight
                            END
                            SET @Specialty_Rows = @Specialty_Rows - 1
                        END

                    END

                    --Other items omitted


                    --Compute the match score and add it to the temporary score table
                    IF @Salary_Score IS NOT NULL
                    BEGIN
                        SET @CurrentValue = @CurrentValue + @Salary_Score
                        SET @WeightSummary = @WeightSummary + @Salary_Weight
                    END
                    IF @JobLocation_Score IS NOT NULL
                    BEGIN
                        SET @CurrentValue = @CurrentValue + @JobLocation_Score
                        SET @WeightSummary = @WeightSummary + @JobLocation_Weight
                    END
                    IF @Education_Score IS NOT NULL
                    BEGIN
                        SET @CurrentValue = @CurrentValue + @Education_Score
                        SET @WeightSummary = @WeightSummary + @Education_Weight
                    END

                    IF @WeightSummary != 0
                    BEGIN
                        SET @CurrentScore = @CurrentValue / @WeightSummary
                        SET @ParaValue = NULL
                        SELECT @ParaValue = [Score] FROM @PositionScore WHERE ID = @PositionRows
                        BEGIN
                            IF @ParaValue IS NULL
                            BEGIN
                                INSERT INTO @PositionScore(ID, Score, Salary_Score, Province_Score, Education_Score , Profession_Score , Industry_Score , JobNature_Score , JobType_Score , Computer_Score , Language_Score)
                                VALUES (@PositionRows, @CurrentScore, @Salary_Score, @JobLocation_Score,@Education_Score , @Profession_Score , @Industry_Score , @JobNature_Score , @JobType_Score , @Computer_Score , @Language_Score)
                            END
                            ELSE
                            BEGIN
                                --Keep the lowest (best) score across the student's searchers
                                IF @CurrentScore < @ParaValue
                                BEGIN
                                    UPDATE @PositionScore SET Score = @CurrentScore WHERE ID = @PositionRows
                                END
                            END
                        END
                    END
                END
                SET @PositionRows = @PositionRows - 1
            END

            --SELECT @ProvinceId, @Zydm, @JobNature, @JobTypeID, @SubJobTypeID,@Salary, @Computer, @Language , @Education, @HyID, @SubHyID, @ThirdHyID
            SET @TableRows = @TableRows - 1
        END
    END
    ELSE --The student has not created a job searcher
    BEGIN
        SELECT @Top1_Industry = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Industry' AND [Set1] = 'Top1'
        SELECT @Top2_Industry = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Industry' AND [Set1] = 'Top2'
        --.......
        SELECT @Enterprise_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasNoSearch' AND [Set2] = 'Enterprise'

        --Map the education code to a label: N'研究生' = postgraduate, N'本科' = bachelor's, N'高職' = higher vocational
        SELECT @Education = (CASE Xldm WHEN '01' THEN N'研究生' WHEN '11' THEN N'研究生' WHEN '31' THEN N'本科' WHEN '61' THEN N'高職' ELSE '' END), @Zydm = Gzydm
        FROM .......[dbo].[StudentBasic] WITH (SNAPSHOT)
        WHERE Xsxh = @Xsxh

        WHILE @PositionRows != 0
        BEGIN

            --Same basic approach as above, item by item

            SELECT @Pos_SubHyID = [SubHyID], @Pos_Education = [Education], @Pos_SpecialtyIds = [SpecialtyIds], @Pos_Zzjgdm = [OrgCode]
            FROM [EnterPrisePositions] WITH(SNAPSHOT)
            WHERE ID = @PositionRows
                AND DATEDIFF(hour, ReleaseDate, GETDATE()) <= 24 * @ReleaseDateRange

            IF @@ROWCOUNT <> 0
            BEGIN
                SET @Education_Score = NULL
                SET @Profession_Score = NULL
                SET @Industry_Score = NULL
                SET @Enterprise_Score = NULL

                --..........

                IF @WeightSummary != 0
                BEGIN
                    SET @CurrentScore = @CurrentValue / @WeightSummary
                    SET @ParaValue = NULL
                    SELECT @ParaValue = [Score] FROM @PositionScore WHERE ID = @PositionRows
                    BEGIN
                        IF @ParaValue IS NULL
                        BEGIN
                            INSERT INTO @PositionScore(ID, Score, Education_Score , Profession_Score , Industry_Score , Enterprise_Score)
                            VALUES (@PositionRows, @CurrentScore, @Education_Score , @Profession_Score , @Industry_Score , @Enterprise_Score)
                        END
                        ELSE
                        BEGIN
                            IF @CurrentScore < @ParaValue
                            BEGIN
                                UPDATE @PositionScore SET Score = @CurrentScore WHERE ID = @PositionRows
                            END
                        END
                    END
                END
            END

            SET @PositionRows = @PositionRows - 1
        END

        --SELECT @Education, @Zydm

    END


    --Output according to the input parameters
    IF @ResultType = 1
        SELECT s.*, p.*
        FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
        WHERE s.Score IS NOT NULL
        ORDER BY Score
        offset @PageSize*@StartPage rows fetch next @PageSize rows only --OFFSET/FETCH paging, new in SQL Server 2012; very efficient
    ELSE IF @ResultType = 2
    BEGIN
        if exists(select * from sysobjects where id=object_id('UserPositionResult'))
            DROP TABLE UserPositionResult

        CREATE TABLE UserPositionResult
        (
            [ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
            [Score] FLOAT,
            [PosiID] [uniqueidentifier] NOT NULL,
            [PosiName] [nvarchar](40) NULL,
            [EntName] [nvarchar](80) NOT NULL,
            --......

        WHERE s.Score IS NOT NULL
        ORDER BY Score
    END
    ELSE
        SELECT s.Score, p.PosiID, p.PosiName, p.EntName, p.EntUserID, p.JobNature, p.Number, p.Salary, p.Education, p.Specialty
        FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
        WHERE s.Score IS NOT NULL
        ORDER BY Score
        offset @PageSize*@StartPage rows fetch next @PageSize rows only

    --Output the total record count
    SELECT Count(*) AS [TotalCount] FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
    WHERE s.Score IS NOT NULL
END
GO
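The procedure above visits positions one at a time in nested WHILE loops. As an alternative sketch only (not the author's implementation), the same weighted distance can be expressed as a single set-based query. Everything declared below is a hypothetical, simplified stand-in: the table variables replace `EnterPrisePositions` and the `Salary` rows of `Parameters`, only two items are scored, and the weights and education values are invented for illustration.

```sql
-- Hypothetical stand-ins for the article's tables (two scored items only).
DECLARE @Positions TABLE (ID INT PRIMARY KEY, Salary NVARCHAR(50), Education NVARCHAR(4));
DECLARE @SalaryValue TABLE (PosSalary NVARCHAR(50), WantedSalary NVARCHAR(50), [Value] FLOAT);

INSERT INTO @Positions VALUES (1, N'3000~4000', N'本科'), (2, N'1500~2000', N'高職');
INSERT INTO @SalaryValue VALUES (N'3000~4000', N'3000~4000', 10), (N'1500~2000', N'3000~4000', 2);

-- The student's searcher settings and the assumed weights / match values.
DECLARE @Salary NVARCHAR(50) = N'3000~4000', @Education NVARCHAR(4) = N'本科';
DECLARE @Salary_Weight FLOAT = 20, @Education_Weight FLOAT = 10;
DECLARE @Education_Same FLOAT = 10, @Education_Not_Same FLOAT = 0;

SELECT p.ID,
       -- Weighted distance: sum of item scores divided by the sum of the weights used.
       (ISNULL(x.Salary_Score, 0) + x.Education_Score)
       / (CASE WHEN x.Salary_Score IS NULL THEN 0 ELSE @Salary_Weight END + @Education_Weight) AS Score
FROM @Positions AS p
LEFT JOIN @SalaryValue AS v
       ON v.PosSalary = p.Salary AND v.WantedSalary = @Salary
CROSS APPLY (SELECT ABS(10 - v.[Value]) * @Salary_Weight AS Salary_Score,   -- NULL when no lookup row exists
                    ABS(10 - CASE WHEN p.Education = @Education
                                  THEN @Education_Same ELSE @Education_Not_Same END)
                        * @Education_Weight AS Education_Score) AS x
ORDER BY Score;   -- lower score = closer match, as in the procedure
```

As in the procedure, an item without a lookup value is left out of both the numerator and the denominator. With the sample rows, position 1 (same salary band, same education) gets a distance of 0 and sorts ahead of position 2.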
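SQL Server's memory-optimized tables performed satisfactorily: computing the match against every position takes less than 1 second. Hash buckets are powerful. Where a lookup used to go through a B-tree index, a hash index brings the time complexity of a point lookup down to approximately O(1), and this holds for much larger data volumes too, provided a suitable bucket count is configured.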
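One way to check whether a bucket count is suitable is the `sys.dm_db_xtp_hash_index_stats` dynamic management view. A minimal sketch, assuming it is run in the database that holds the memory-optimized tables; the interpretation in the comments is a common rule of thumb rather than a measurement from this project:

```sql
-- Hash index health for every memory-optimized table in the current database.
SELECT OBJECT_NAME(s.object_id) AS table_name,
       i.name                   AS index_name,
       s.total_bucket_count,
       s.empty_bucket_count,
       s.avg_chain_length,   -- long chains with few empty buckets suggest BUCKET_COUNT is too low
       s.max_chain_length
FROM sys.dm_db_xtp_hash_index_stats AS s
JOIN sys.indexes AS i
  ON i.object_id = s.object_id AND i.index_id = s.index_id
ORDER BY s.avg_chain_length DESC;
```

For example, a table created with `BUCKET_COUNT = 4096` that ends up holding more than ten thousand rows has, on average, more than two rows per bucket, so lookups have to walk a chain instead of landing on a single row.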
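Screenshot taken during the computation:

With the data sample enlarged to more than ten thousand rows, it still finishes in about 1 second.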
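Takeaways:
- For data that does not need to be persisted, SQL Server memory-optimized tables are an excellent fit. Create the table with the DURABILITY = SCHEMA_ONLY option so that data changes are not logged; reads and writes are very fast;
- In long stored procedures, never use cursors: they perform very poorly and also produce a pile of locks that affect other processes. (Use a WHILE loop instead.);
- Read memory-optimized tables under the SNAPSHOT isolation level; for ordinary reads where real-time accuracy is not critical (data analysis, for example), use the NOLOCK hint, which is equivalent to READ UNCOMMITTED.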
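The last point can be sketched as follows. `dbo.EmploymentHistory` is a hypothetical disk-based table used only for illustration; `[Parameters]` is the memory-optimized table defined earlier.

```sql
-- Memory-optimized table: request SNAPSHOT explicitly with a table hint.
SELECT [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = N'Weight';

-- Alternatively, let the database elevate such reads to SNAPSHOT automatically
-- (a database option available since SQL Server 2014).
ALTER DATABASE CURRENT SET MEMORY_OPTIMIZED_ELEVATE_TO_SNAPSHOT = ON;

-- Disk-based table where reading uncommitted data is acceptable.
-- dbo.EmploymentHistory is a hypothetical name.
SELECT COUNT(*) FROM dbo.EmploymentHistory WITH (NOLOCK);
```

With the database option switched on, the `WITH (SNAPSHOT)` hints on memory-optimized tables become optional. NOLOCK reads can return uncommitted rows, so it only suits reports that tolerate small inaccuracies.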
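The front-end page looks like this:

We also built email push, which periodically sends positions to students.
I hope this article serves as a starting point for discussion, and I welcome everyone's suggestions.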



