Java Interview: A Detailed Look at List.sort


I recently read a few articles on sorting, which made me curious about how Java itself sorts. This article covers the sort method of List in JDK 1.8.

First, let's look at how sort is written in List:

    @SuppressWarnings({"unchecked", "rawtypes"})
    default void sort(Comparator<? super E> c) {
        Object[] a = this.toArray();
        Arrays.sort(a, (Comparator) c);
        ListIterator<E> i = this.listIterator();
        for (Object e : a) {
            i.next();
            i.set((E) e);
        }
    }

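First, you pass in a comparator, which makes sense because the sort needs an ordering (passing null falls back to the elements' natural ordering, as Arrays.sort below shows). The list is then copied into an array, the array is sorted, and finally a ListIterator writes the sorted elements back into the list.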
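As a quick illustration of the call itself, here is a minimal sketch of List.sort with a null comparator and with an explicit one. The class name ListSortDemo and the helper sortedCopy are made up for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class ListSortDemo {
    // Hypothetical helper: sorts a copy so the caller's list is left untouched.
    // A null comparator means "use the natural ordering of the elements".
    static List<String> sortedCopy(List<String> input, Comparator<? super String> c) {
        List<String> copy = new ArrayList<>(input);
        copy.sort(c);
        return copy;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "fig", "banana", "kiwi");
        Comparator<String> byLength = Comparator.comparingInt(String::length);

        System.out.println(sortedCopy(words, null));     // [banana, fig, kiwi, pear]
        System.out.println(sortedCopy(words, byLength)); // [fig, pear, kiwi, banana]
    }
}
```

In the second result "pear" stays ahead of "kiwi" although both have length 4, because the sort is stable. Note that ArrayList overrides sort and works on its backing array directly, so the default method shown above is what other List implementations inherit.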

Next, let's look at Arrays.sort:

    public static <T> void sort(T[] a, Comparator<? super T> c) {
        if (c == null) {
            sort(a);
        } else {
            if (LegacyMergeSort.userRequested)
                legacyMergeSort(a, c);
            else
                TimSort.sort(a, 0, a.length, c, null, 0, 0);
        }
    }

    public static void sort(Object[] a) {
        if (LegacyMergeSort.userRequested)
            legacyMergeSort(a);
        else
            ComparableTimSort.sort(a, 0, a.length, null, 0, 0);
    }

    static final class LegacyMergeSort {
        private static final boolean userRequested =
            java.security.AccessController.doPrivileged(
                new sun.security.action.GetBooleanAction(
                    "java.util.Arrays.useLegacyMergeSort")).booleanValue();
    }

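So the core of the sort is TimSort. LegacyMergeSort is used only when the user explicitly sets the system property java.util.Arrays.useLegacyMergeSort to true. It exists for compatibility with the old merge sort, and by default it is never used.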
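To make the switch concrete: userRequested comes from a JVM system property, not from the JDK version. The sketch below only reads that property the way any application code could. The class LegacyFlagDemo is a made-up name, and the code does not touch the JDK's private LegacyMergeSort class.

```java
class LegacyFlagDemo {
    // The property name is taken from the Arrays source shown above.
    static final String FLAG = "java.util.Arrays.useLegacyMergeSort";

    // Boolean.getBoolean returns true only if the system property exists
    // and equals "true" (ignoring case).
    static boolean legacyRequested() {
        return Boolean.getBoolean(FLAG);
    }

    public static void main(String[] args) {
        System.out.println(FLAG + " = " + legacyRequested());
    }
}
```

Running it with `java -Djava.util.Arrays.useLegacyMergeSort=true LegacyFlagDemo` should report true, and without the option it reports false. Since the JDK stores the value in a static final field, it is read once when LegacyMergeSort is initialized, so the command line is the reliable place to set it.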

Now let's look at the TimSort implementation:

    private static final int MIN_MERGE = 32;
    static <T> void sort(T[] a, int lo, int hi, Comparator<? super T> c,
                         T[] work, int workBase, int workLen) {
        assert c != null && a != null && lo >= 0 && lo <= hi && hi <= a.length;

        int nRemaining  = hi - lo;
        if (nRemaining < 2)
            return;  // Arrays of size 0 and 1 are always sorted

        // If array is small, do a "mini-TimSort" with no merges
        if (nRemaining < MIN_MERGE) {
            // Find the run starting at lo (a descending run is reversed in place)
            int initRunLen = countRunAndMakeAscending(a, lo, hi, c);
            binarySort(a, lo, hi, lo + initRunLen, c);
            return;
        }

        /**
         * March over the array once, left to right, finding natural runs,
         * extending short natural runs to minRun elements, and merging runs
         * to maintain stack invariant.
         */
        TimSort<T> ts = new TimSort<>(a, c, work, workBase, workLen);
        int minRun = minRunLength(nRemaining);
        do {
            // Identify next run
            int runLen = countRunAndMakeAscending(a, lo, hi, c);

            // If run is short, extend to min(minRun, nRemaining)
            if (runLen < minRun) {
                int force = nRemaining <= minRun ? nRemaining : minRun;
                binarySort(a, lo, lo + force, lo + runLen, c);
                runLen = force;
            }

            // Push run onto pending-run stack, and maybe merge
            ts.pushRun(lo, runLen);
            ts.mergeCollapse();

            // Advance to find next run
            lo += runLen;
            nRemaining -= runLen;
        } while (nRemaining != 0);

        // Merge all remaining runs to complete sort
        assert lo == hi;
        ts.mergeForceCollapse();
        assert ts.stackSize == 1;
    }

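If there are fewer than 2 elements, nothing needs sorting. If there are fewer than 32, an optimized binary insertion sort is used. How is it optimized? It first finds the initial run, i.e. the already ordered prefix starting at lo: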

    private static <T> int countRunAndMakeAscending(T[] a, int lo, int hi,
                                                    Comparator<? super T> c) {
        assert lo < hi;
        int runHi = lo + 1;
        if (runHi == hi)
            return 1;

        // Find end of run, and reverse range if descending
        if (c.compare(a[runHi++], a[lo]) < 0) { // Descending
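            // The run starts out descending: advance runHi to the end of the strictly descending run
            while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) < 0)
                runHi++;
            // Reverse the descending run in place so that it becomes ascending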
            reverseRange(a, lo, runHi);
        } else {                              // Ascending
            while (runHi < hi && c.compare(a[runHi], a[runHi - 1]) >= 0)
                runHi++;
        }

        return runHi - lo;
    }
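To see what this method does to real data, here is a simplified int[] version of the same logic. It is a sketch for illustration only: the class RunDemo is made up, and the comparator is replaced by plain integer comparison.

```java
import java.util.Arrays;

class RunDemo {
    // Simplified int[] variant of countRunAndMakeAscending.
    static int countRunAndMakeAscending(int[] a, int lo, int hi) {
        int runHi = lo + 1;
        if (runHi == hi)
            return 1;
        if (a[runHi++] < a[lo]) {            // strictly descending run
            while (runHi < hi && a[runHi] < a[runHi - 1])
                runHi++;
            reverseRange(a, lo, runHi);
        } else {                             // non-descending run
            while (runHi < hi && a[runHi] >= a[runHi - 1])
                runHi++;
        }
        return runHi - lo;
    }

    // Reverses a[lo, hi) in place.
    static void reverseRange(int[] a, int lo, int hi) {
        hi--;
        while (lo < hi) {
            int t = a[lo];
            a[lo++] = a[hi];
            a[hi--] = t;
        }
    }

    public static void main(String[] args) {
        int[] a = {9, 7, 4, 4, 8, 1};
        int runLen = countRunAndMakeAscending(a, 0, a.length);
        System.out.println(runLen);             // 3
        System.out.println(Arrays.toString(a)); // [4, 7, 9, 4, 8, 1]
    }
}
```

The descending run stops at the second 4 because the comparison is strict. Equal elements are never part of a run that gets reversed, which is what keeps the reversal from breaking stability.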

Then comes the binary insertion sort:

    private static <T> void binarySort(T[] a, int lo, int hi, int start,
                                       Comparator<? super T> c) {
        assert lo <= start && start <= hi;
        if (start == lo)
            start++;
        for ( ; start < hi; start++) {
            T pivot = a[start];

            // Set left (and right) to the index where a[start] (pivot) belongs
            int left = lo;
            int right = start;
            assert left <= right;
            /*
             * Invariants:
             *   pivot >= all in [lo, left).
             *   pivot <  all in [right, start).
             */
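            // start is the index of the first element after the sorted prefix [lo, start)
            // Binary-search the sorted prefix for the slot where pivot (a[start]) belongs
            while (left < right) {
                // >>> is the unsigned right shift, so mid stays correct even if left + right overflows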
                int mid = (left + right) >>> 1;
                if (c.compare(pivot, a[mid]) < 0)
                    right = mid;
                else
                    left = mid + 1;
            }
            assert left == right;

            /*
             * The invariants still hold: pivot >= all in [lo, left) and
             * pivot < all in [left, start), so pivot belongs at left.  Note
             * that if there are elements equal to pivot, left points to the
             * first slot after them -- that's why this sort is stable.
             * Slide elements over to make room for pivot.
             */
            int n = start - left;  // The number of elements to move
            // Switch is just an optimization for arraycopy in default case
            // Shift the elements greater than pivot one slot to the right
            switch (n) {
                case 2:  a[left + 2] = a[left + 1];
                case 1:  a[left + 1] = a[left];
                         break;
                default: System.arraycopy(a, left, a, left + 1, n);
            }
            a[left] = pivot;
        }
    }
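The stability remark in the JDK comment is easy to check on a small input. Below is a condensed sketch of the same binary insertion sort (the switch optimization is dropped in favor of a single arraycopy), run on strings compared by length only. The class name BinarySortDemo is made up for this example.

```java
import java.util.Arrays;
import java.util.Comparator;

class BinarySortDemo {
    // Condensed binary insertion sort: a[lo, start) must already be sorted.
    static <T> void binarySort(T[] a, int lo, int hi, int start, Comparator<? super T> c) {
        if (start == lo)
            start++;
        for (; start < hi; start++) {
            T pivot = a[start];
            int left = lo;
            int right = start;
            while (left < right) {
                int mid = (left + right) >>> 1;
                if (c.compare(pivot, a[mid]) < 0)
                    right = mid;
                else
                    left = mid + 1;  // equal elements stay to the left of pivot
            }
            // Make room at index left, then drop pivot in.
            System.arraycopy(a, left, a, left + 1, start - left);
            a[left] = pivot;
        }
    }

    public static void main(String[] args) {
        Comparator<String> byLength = Comparator.comparingInt(String::length);
        // The prefix {"aa", "cccc"} is already sorted by length, so start = 2.
        String[] a = {"aa", "cccc", "bb", "d", "ee"};
        binarySort(a, 0, a.length, 2, byLength);
        System.out.println(Arrays.toString(a)); // [d, aa, bb, ee, cccc]
    }
}
```

The three strings of length 2 come out in their original order (aa, bb, ee), because the search always places pivot after any elements that compare equal to it.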

That covers inputs with fewer than 32 elements. Now for the case of 32 or more. The first step is to compute something called minRun. What does it mean?

    int minRun = minRunLength(nRemaining);
    private static int minRunLength(int n) {
        assert n >= 0;
        int r = 0;      // Becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            // OR instead of plain assignment: r must stay 1 once ANY shifted-off bit was 1, not just the last one
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

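Lots of bit operations here. MIN_MERGE is 32, and if n is smaller than that, the method returns n itself. Otherwise n is shifted right repeatedly until it drops below MIN_MERGE, while r records whether any of the bits shifted off was 1. That is why the code ORs into r instead of assigning to it: an assignment would only remember the last bit.
The Javadoc comment makes this easier to follow: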

Returns the minimum acceptable run length for an array of the specified length. Natural runs shorter than this will be extended with binarySort.
Roughly speaking, the computation is: If n < MIN_MERGE, return n (it's too small to bother with fancy stuff).
Else if n is an exact power of 2, return MIN_MERGE/2.
Else return an int k, MIN_MERGE/2 <= k <= MIN_MERGE, such that n/k is close to, but strictly less than, an exact power of 2. For the rationale, see listsort.txt.

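The result is the minimum run length used by the merge phase that follows: any natural run shorter than minRun is extended to that length with binarySort.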
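A few concrete values help here. The sketch below copies minRunLength as shown above and adds a deliberately wrong variant, minRunLengthLastBitOnly, that assigns the last bit instead of ORing. That variant and the class name MinRunDemo are made up for this comparison.

```java
class MinRunDemo {
    static final int MIN_MERGE = 32;

    // Same logic as TimSort.minRunLength above.
    static int minRunLength(int n) {
        int r = 0;                 // becomes 1 if any 1 bits are shifted off
        while (n >= MIN_MERGE) {
            r |= (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    // Hypothetical variant for comparison only: remembers just the last bit.
    static int minRunLengthLastBitOnly(int n) {
        int r = 0;
        while (n >= MIN_MERGE) {
            r = (n & 1);
            n >>= 1;
        }
        return n + r;
    }

    public static void main(String[] args) {
        System.out.println(minRunLength(31));    // 31
        System.out.println(minRunLength(32));    // 16
        System.out.println(minRunLength(64));    // 16
        System.out.println(minRunLength(65));    // 17
        System.out.println(minRunLength(100));   // 25
        System.out.println(minRunLength(1000));  // 32
        System.out.println(minRunLengthLastBitOnly(65)); // 16
    }
}
```

For n = 65 the real method returns 17, so 65 / 17 is a little under 4 and the array splits into 4 runs. The assignment variant returns 16, which gives 65 / 16, a little over 4: four full runs plus one leftover element. The final merge would then combine a 64-element run with a 1-element run, exactly the unbalanced case the OR is there to avoid.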

Next comes the do-while loop:

        do {
            // Identify next run
            // Find the next natural run starting at lo (reversed if it was descending)
            int runLen = countRunAndMakeAscending(a, lo, hi, c);

            // If run is short, extend to min(minRun, nRemaining)
            // If the natural run is shorter than minRun, extend it to force elements with binarySort
            if (runLen < minRun) {
                int force = nRemaining <= minRun ? nRemaining : minRun;
                binarySort(a, lo, lo + force, lo + runLen, c);
                runLen = force;
            }

            // Push run onto pending-run stack, and maybe merge
            // The run starts at lo and is runLen elements long
            ts.pushRun(lo, runLen);
            // Merge adjacent runs, but only while the stack invariants are violated
            ts.mergeCollapse();

            // Advance to find next run
            lo += runLen;
            nRemaining -= runLen;
        } while (nRemaining != 0);

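So whenever two runs r1 and r2 are merged, r1 is already sorted and r2 has already been made ascending as well. That makes this merge sort a bit special: it only ever merges runs that are already in order.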
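The basic step of merging two adjacent sorted runs can be sketched as follows. This is a plain stable merge on int[], assuming the left run is copied to a temporary array. The real TimSort merge (mergeLo and mergeHi) adds galloping and other optimizations that are not shown, and the class name MergeRunsDemo is made up.

```java
import java.util.Arrays;

class MergeRunsDemo {
    // Merges the sorted runs a[lo, mid) and a[mid, hi) into a[lo, hi).
    static void mergeRuns(int[] a, int lo, int mid, int hi) {
        int[] left = Arrays.copyOfRange(a, lo, mid); // temp copy of the left run
        int i = 0;    // next unread element of the left copy
        int j = mid;  // next unread element of the right run
        int k = lo;   // next slot to write
        while (i < left.length && j < hi) {
            // <= takes from the left run on ties, which keeps the merge stable
            a[k++] = (left[i] <= a[j]) ? left[i++] : a[j++];
        }
        while (i < left.length)
            a[k++] = left[i++];
        // Any remaining right-run elements are already in their final place.
    }

    public static void main(String[] args) {
        int[] a = {1, 4, 9, 2, 3, 10};
        mergeRuns(a, 0, 3, a.length);
        System.out.println(Arrays.toString(a)); // [1, 2, 3, 4, 9, 10]
    }
}
```

Writing into a[k] never overwrites an unread right-run element, because k can only catch up with j once the whole left copy has been consumed.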

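Why merge sort? Because its worst-case time complexity is O(n log n) and it is stable, which matters when sorting objects. The price is O(n) extra space, trading space for time. TimSort improves on the plain version at both ends: on nearly sorted input it needs far fewer than n log n comparisons, and its temporary storage is at most n/2 object references.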
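One way to observe how much the input order matters is to count comparator calls. The sketch below is only a rough experiment: the class ComparisonCountDemo and the helper countComparisons are made up, and exact counts may differ between JDK versions, so none are printed as expected values.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

class ComparisonCountDemo {
    // Sorts a copy of data and returns how often the comparator was called.
    static long countComparisons(List<Integer> data) {
        long[] count = {0};
        List<Integer> copy = new ArrayList<>(data);
        copy.sort((x, y) -> {
            count[0]++;
            return Integer.compare(x, y);
        });
        return count[0];
    }

    public static void main(String[] args) {
        int n = 1000;
        List<Integer> sorted = new ArrayList<>();
        for (int i = 0; i < n; i++)
            sorted.add(i);

        List<Integer> shuffled = new ArrayList<>(sorted);
        Collections.shuffle(shuffled, new Random(42)); // fixed seed for repeatability

        System.out.println("sorted input:   " + countComparisons(sorted));
        System.out.println("shuffled input: " + countComparisons(shuffled));
    }
}
```

On already sorted input the whole array is a single natural run, so countRunAndMakeAscending does about n - 1 comparisons and nothing is left to merge. The shuffled input needs several times as many.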

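That wraps up this look at sorting in Java. The merge code itself has not been analyzed here, and to be honest I have not fully worked out why minRun is chosen the way it is, which is also related to stackLen in TimSort. If you are interested, leave a comment below and we can work through it together.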

