Weighted Median


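Problem statement: we are given an unsorted sequence of numbers, each with a corresponding weight. A weight is a non-negative integer giving the number of times that number occurs in the sequence. The task is to find the median of this unsorted sequence.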
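To make the definition concrete, here is a minimal brute-force sketch (not part of the original post): it repeats every number as many times as its weight says and takes the ordinary median of the expanded sequence. The function name `brute_force_weighted_median` is hypothetical, and the sketch assumes integer weights as in the problem statement. It is only practical when the total weight is small, but it serves as a reference to check faster solutions against.

```python
import numpy as np


def brute_force_weighted_median(data, weights):
    # Repeat each number `weight` times, then take the ordinary median.
    # Assumes integer weights; a weight of 0 drops the number entirely.
    expanded = np.repeat(np.asarray(data), np.asarray(weights, dtype=int))
    return float(np.median(expanded))


# 1 occurs 10 times and 5 occurs 9 times, so the 22 expanded values are
# 1 (x10), 2, 3, 4, 5 (x9); the 11th and 12th values are 2 and 3.
print(brute_force_weighted_median([1, 2, 3, 4, 5], [10, 1, 1, 1, 9]))  # 2.5
```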

 

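1. Direct approach: sort the numbers together with their weights, then scan for the number at which the cumulative weight reaches half of the total weight. Sorting costs O(n log n) and the scan costs O(n), so the overall time complexity is O(n log n).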

import numpy as np


def weighted_median(data, weights):
    """Return the weighted median of data.

    Args:
      data (list or numpy.array): data
      weights (list or numpy.array): non-negative weights, one per data
        item; at least one weight must be positive
    """
    data, weights = np.asarray(data).ravel(), np.asarray(weights).ravel()
    # Items with zero weight never occur in the sequence, so drop them;
    # otherwise one of them could be picked as a neighbour when averaging.
    keep = weights > 0
    data, weights = data[keep], weights[keep]
    # Sort the values and carry their weights along.
    order = np.argsort(data, kind="stable")
    s_data, s_weights = data[order], weights[order]
    cs_weights = np.cumsum(s_weights)
    midpoint = 0.5 * cs_weights[-1]
    if np.any(s_weights > midpoint):
        # A single item holding more than half of the total weight
        # is the median.
        w_median = s_data[np.argmax(s_weights)]
    else:
        # Last position whose cumulative weight does not exceed the midpoint.
        idx = np.where(cs_weights <= midpoint)[0][-1]
        if cs_weights[idx] == midpoint:
            # The midpoint falls exactly between two items: average them.
            w_median = np.mean(s_data[idx:idx+2])
        else:
            w_median = s_data[idx+1]
    return w_median


def test_weighted_median():
    data = [
        [7, 1, 2, 4, 10],
        [7, 1, 2, 4, 10],
        [7, 1, 2, 4, 10, 15],
        [1, 2, 4, 7, 10, 15],
        [0, 10, 20, 30],
        [1, 2, 3, 4, 5],
        [30, 40, 50, 60, 35],
        [2, 0.6, 1.3, 0.3, 0.3, 1.7, 0.7, 1.7, 0.4],
    ]
    weights = [
        [1, 1/3, 1/3, 1/3, 1],
        [1, 1, 1, 1, 1],
        [1, 1/3, 1/3, 1/3, 1, 1],
        [1/3, 1/3, 1/3, 1, 1, 1],
        [30, 191, 9, 0],
        [10, 1, 1, 1, 9],
        [1, 3, 5, 4, 2],
        [2, 2, 0, 1, 2, 2, 1, 6, 0],
    ]
    answers = [7, 4, 8.5, 8.5, 10, 2.5, 50, 1.7]
    for datum, weight, answer in zip(data, weights, answers):
        assert weighted_median(datum, weight) == answer


if __name__ == "__main__":
    test_weighted_median()

 

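2. Quicksort-style approach: pick one number as a pivot and partition the sequence into a left part and a right part around it. Then compare the total weights of the two parts and recurse into whichever side must contain the median.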
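The partition idea can be sketched as follows. This is an illustrative sketch under stated assumptions, not the original author's implementation: the names `weighted_select` and `weighted_median_select` are hypothetical, weights are assumed to be non-negative integers as in the problem statement, the pivot is chosen at random, and a loop stands in for the recursion. When the total weight is even, the two middle values are averaged, which matches the behaviour of the sorting solution.

```python
import random


def weighted_select(pairs, k):
    """Return the k-th smallest value (1-based) of the sequence in which
    each value of `pairs` = [(value, weight), ...] occurs `weight` times."""
    pairs = [(v, w) for v, w in pairs if w > 0]
    while True:
        pivot = random.choice(pairs)[0]
        left = [(v, w) for v, w in pairs if v < pivot]
        right = [(v, w) for v, w in pairs if v > pivot]
        left_weight = sum(w for _, w in left)
        pivot_weight = sum(w for v, w in pairs if v == pivot)
        if k <= left_weight:
            # The target lies among the values smaller than the pivot.
            pairs = left
        elif k <= left_weight + pivot_weight:
            # The target is one of the copies of the pivot itself.
            return pivot
        else:
            # Skip everything up to and including the pivot.
            k -= left_weight + pivot_weight
            pairs = right


def weighted_median_select(data, weights):
    total = sum(weights)
    if total <= 0:
        raise ValueError("at least one weight must be positive")
    pairs = list(zip(data, weights))
    if total % 2 == 1:
        return weighted_select(pairs, (total + 1) // 2)
    # Even total weight: average the two middle values.
    lower = weighted_select(pairs, total // 2)
    upper = weighted_select(pairs, total // 2 + 1)
    return (lower + upper) / 2


print(weighted_median_select([30, 40, 50, 60, 35], [1, 3, 5, 4, 2]))  # 50
```

Each pass discards at least the pivot, so the loop always terminates. With a random pivot the expected running time is linear in the number of distinct items, compared with O(n log n) for the sorting approach.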

 

 

