Solving the Top-K Problem in Java (Using the Collections Library and a Direct Implementation)

Reposted article, 2017-04-14 18:00. Tags: algorithms, java, data structures and algorithms, data structures

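When processing large amounts of data, we often need only the top few items. Sorting the whole data set to get them is often impractical for massive inputs: comparison-based sorting takes O(n log n) time at best, which is wasteful when n is far larger than the number of items actually needed.
A min-heap or a max-heap solves the Top-large problem (the largest items) or the Top-small problem (the smallest items) well.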

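Top-large approach: keep a fixed-size min-heap. Once the heap is full, compare each new element with the top of the heap. If it is not larger than the top, discard it; if it is larger, remove the top, insert the new element, and let the heap restore its order.
Top-small approach: keep a fixed-size max-heap. Once the heap is full, compare each new element with the top of the heap. If it is not smaller than the top, discard it; if it is smaller, remove the top, insert the new element, and let the heap restore its order.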

For n numbers, taking the top m costs O(n log m) time, which beats the O(n log n) of a full sort when n is much larger than m.

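For example, taking the 100 largest of 10000 values costs O(10000 log 100).
Every one of the 10000 elements has to be visited once, which is O(10000), and each insertion into the heap needs an adjustment costing O(log 100), so the total is O(10000 log 100).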
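
To make the comparison concrete, here is a minimal sketch that evaluates both estimates for n = 10000 and m = 100. The class and method names (TopKCost, heapCost, sortCost) are hypothetical, and the numbers are rough step counts with base-2 logarithms, not measured running times.

```java
// Hypothetical helper for illustration only; not part of the original article.
class TopKCost {
    static double log2(double x) {
        return Math.log(x) / Math.log(2);
    }

    // Rough step count for the heap approach: n insertions, each about log2(m).
    static double heapCost(long n, long m) {
        return n * log2(m);
    }

    // Rough step count for sorting all n elements.
    static double sortCost(long n) {
        return n * log2(n);
    }

    public static void main(String[] args) {
        long n = 10_000, m = 100;
        System.out.printf("heap: %.0f, sort: %.0f%n", heapCost(n, m), sortCost(n));
    }
}
```

Under these assumptions the heap estimate is about 66,000 steps against about 133,000 for the full sort. The ratio is 2 here because log 10000 = 2 log 100, and the gap widens as n grows relative to m.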

Implementation with the Java Collections Library

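PriorityQueue in the Java collections can serve as either a max-heap or a min-heap. As the name says, it is a priority queue, and priority queues are typically implemented with a heap; java.util.PriorityQueue keeps a binary heap in an array, as the fields below show.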

// Elements are stored in an underlying Object array
transient Object[] queue;

// The comparison rule is supplied through a Comparator
private final Comparator<? super E> comparator;

// One of the constructors
public PriorityQueue(int initialCapacity,
                     Comparator<? super E> comparator) {
    // Note: This restriction of at least one is not actually needed,
    // but continues for 1.5 compatibility
    if (initialCapacity < 1)
        throw new IllegalArgumentException();
    this.queue = new Object[initialCapacity];
    this.comparator = comparator;
}

Next, we use PriorityQueue to implement a min-heap and a max-heap.

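When constructing the PriorityQueue, pass an initial capacity and a comparator that defines how heap elements are ordered. The initial capacity is only a sizing hint: PriorityQueue is unbounded, so the size limit has to be enforced by the calling code.
In compare(o1, o2), compare o1 with o2 for a min-heap and o2 with o1 for a max-heap. For integers this is often written as o1 - o2 and o2 - o1, but the subtraction can overflow, so o1.compareTo(o2) or Integer.compare is safer.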
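
The following is a small sketch of how the comparator decides which element sits at the top of the heap. HeapOrderDemo and its top method are hypothetical names chosen for this example; Comparator.naturalOrder() and Comparator.reverseOrder() are standard library methods (Java 8 and later).

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical demo class; not part of the original article.
class HeapOrderDemo {
    // Builds a heap with the given ordering and returns the element at its top.
    // Assumes at least one value is passed.
    static int top(Comparator<Integer> order, int... values) {
        PriorityQueue<Integer> heap = new PriorityQueue<>(values.length, order);
        for (int v : values) {
            heap.add(v);
        }
        return heap.peek();
    }

    public static void main(String[] args) {
        System.out.println(top(Comparator.naturalOrder(), 4, 1, 7)); // min-heap top: 1
        System.out.println(top(Comparator.reverseOrder(), 4, 1, 7)); // max-heap top: 7

        // A subtraction-based comparator overflows for extreme values:
        Comparator<Integer> bySubtraction = (a, b) -> a - b;
        // Integer.MIN_VALUE - 1 wraps around to a positive number.
        System.out.println(bySubtraction.compare(Integer.MIN_VALUE, 1) > 0); // true (wrong order)
        System.out.println(Integer.compare(Integer.MIN_VALUE, 1) < 0);       // true (correct order)
    }
}
```

Using the library comparators avoids the overflow entirely and states the intent (min-heap or max-heap) more directly than hand-written arithmetic.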

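import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

public class TopK<E extends Comparable<? super E>> {
    private final PriorityQueue<E> queue;
    private final int maxSize; // maximum number of elements kept in the heap

    public TopK(int maxSize) {
        if (maxSize <= 0) {
            throw new IllegalArgumentException("maxSize must be positive");
        }
        this.maxSize = maxSize;
        this.queue = new PriorityQueue<>(maxSize, new Comparator<E>() {
            @Override
            public int compare(E o1, E o2) {
                // Min-heap: o1.compareTo(o2). For a max-heap use o2.compareTo(o1).
                return o1.compareTo(o2);
            }
        });
    }

    public void add(E e) {
        if (queue.size() < maxSize) {
            // Heap not full yet: always keep the element.
            queue.add(e);
        } else {
            // Heap full: replace the smallest kept element only if e is larger.
            E peek = queue.peek();
            if (e.compareTo(peek) > 0) {
                queue.poll();
                queue.add(e);
            }
        }
    }

    // Returns the kept elements in ascending order.
    public List<E> sortedList() {
        List<E> list = new ArrayList<>(queue);
        Collections.sort(list);
        return list;
    }

    public static void main(String[] args) {
        int[] array = {4, 5, 1, 6, 2, 7, 3, 8};
        TopK<Integer> pq = new TopK<>(4);
        for (int n : array) {
            pq.add(n);
        }
        System.out.println(pq.sortedList()); // prints [5, 6, 7, 8]
    }
}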

Output:

[5, 6, 7, 8]
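
The class above covers the Top-large case. The Top-small strategy described earlier can be sketched the same way by switching to a max-heap and reversing the comparison against the top. TopKSmallest and its smallest method are hypothetical names for this illustration, not part of the original code.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical helper class; a sketch of the Top-small variant.
class TopKSmallest {
    // Returns the k smallest values of data in ascending order.
    static List<Integer> smallest(int[] data, int k) {
        if (k <= 0) {
            throw new IllegalArgumentException("k must be positive");
        }
        // Max-heap: the largest of the kept values sits at the top.
        PriorityQueue<Integer> heap = new PriorityQueue<>(k, Comparator.reverseOrder());
        for (int x : data) {
            if (heap.size() < k) {
                heap.add(x);
            } else if (x < heap.peek()) {
                heap.poll(); // evict the current largest kept value
                heap.add(x);
            }
        }
        List<Integer> result = new ArrayList<>(heap);
        Collections.sort(result);
        return result;
    }

    public static void main(String[] args) {
        int[] array = {4, 5, 1, 6, 2, 7, 3, 8};
        System.out.println(smallest(array, 4)); // prints [1, 2, 3, 4]
    }
}
```

Only two things change relative to the Top-large version: the comparator passed to PriorityQueue and the direction of the comparison with the top of the heap.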

Direct Implementation in Java

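With the above, we have a basic picture of max-heaps and min-heaps and how they relate to the Top-K problem. The previous section used a library collection; this section implements a min-heap by hand in Java and uses it to solve the Top-large problem.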

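Limit the amount of data stored to a fixed size.
If the heap is full, compare each inserted element with the top of the heap and act accordingly.
After each removal of the top element, adjust the heap once to restore the min-heap property.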

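public class TopK {
    int[] items;
    int currentSize = 0;

    // Allocate size + 1 slots; elements are stored starting at index 1.
    public TopK(int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        items = new int[size + 1];
    }

    // Insert an element, keeping only the largest values that fit.
    public void insert(int x) {
        if (currentSize == items.length - 1) {
            // Heap is full: ignore x unless it is larger than the current minimum.
            // Values equal to the minimum are discarded too, so the array never overflows.
            if (compare(x, items[1]) <= 0) {
                return;
            }
            deleteMin();
        }

        // Percolate up. items[0] holds x as a sentinel so the loop stops at the root.
        int hole = ++currentSize;
        for (items[0] = x; compare(x, items[hole / 2]) < 0; hole /= 2) {
            items[hole] = items[hole / 2];
        }
        items[hole] = x;
    }

    // Remove and return the smallest element of the min-heap.
    public int deleteMin() {
        if (currentSize == 0) {
            throw new IllegalStateException("heap is empty");
        }
        int min = items[1];
        items[1] = items[currentSize--];
        percolateDown(1);
        return min;
    }

    // Percolate down from the given position.
    public void percolateDown(int hole) {
        int child;
        int temp = items[hole];

        for (; hole * 2 <= currentSize; hole = child) {
            child = 2 * hole;
            // Pick the smaller of the two children.
            if (child != currentSize && compare(items[child + 1], items[child]) < 0) {
                child++;
            }
            if (compare(items[child], temp) < 0) {
                items[hole] = items[child];
            } else {
                break;
            }
        }
        items[hole] = temp;
    }

    // Comparison rule: negative if a < b, positive if a > b, zero if equal.
    public static int compare(int a, int b) {
        if (a < b) {
            return -1;
        } else if (a > b) {
            return 1;
        }
        return 0;
    }

    public static void main(String[] args) {
        TopK topK = new TopK(10);
        for (int i = 1; i <= 100; i++) {
            topK.insert(i);
        }
        // Prints the kept values in heap (array) order, not sorted order.
        for (int j = 1; j <= topK.currentSize; j++) {
            System.out.print(topK.items[j] + " ");
        }
        System.out.println();
    }
}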

Output: the ten largest values, 91 through 100, printed in heap (array) order rather than sorted order.
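
Because the hand-written heap stores its elements in heap order, a caller who wants them sorted needs one more step. Here is a minimal sketch, assuming the same 1-based array layout as the class above; HeapArrays and sortedCopy are hypothetical names, not part of the original code.

```java
import java.util.Arrays;

// Hypothetical helper; not part of the original class.
class HeapArrays {
    // Copies slots 1..currentSize of a 1-based heap array and sorts the copy ascending.
    static int[] sortedCopy(int[] items, int currentSize) {
        int[] result = Arrays.copyOfRange(items, 1, currentSize + 1);
        Arrays.sort(result);
        return result;
    }

    public static void main(String[] args) {
        int[] items = {0, 3, 5, 4, 9}; // slot 0 is the sentinel slot and is skipped
        System.out.println(Arrays.toString(sortedCopy(items, 4))); // prints [3, 4, 5, 9]
    }
}
```

Sorting the m kept elements adds O(m log m) work, which is small next to the O(n log m) spent building the heap when n is much larger than m.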
