Demystifying Random Number Generators (2): The Linear Congruential Algorithm in the Java Source

This article is a repost. 2016-08-31 00:21. Tags: Random, compareAndSet, AtomicInteger, congruential algorithm

Random

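Java's Random class generates pseudorandom numbers. It uses a 48-bit seed, which is advanced by a linear congruential formula (see Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1).

If two Random instances are created with the same seed and the same sequence of method calls is made on each, they produce identical sequences of numbers.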
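The reproducibility claim above is easy to check. The following is a minimal sketch; the class name SameSeedDemo and the seed 42 are illustrative choices, not part of the original article.

```java
import java.util.Random;

final class SameSeedDemo {
    // Returns true when two generators built from the same seed
    // agree on their first `count` values.
    static boolean sameSequence(long seed, int count) {
        Random a = new Random(seed);
        Random b = new Random(seed);
        for (int i = 0; i < count; i++) {
            if (a.nextInt() != b.nextInt()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(sameSequence(42L, 1000));
    }
}
```

This property is what makes a seeded Random useful for repeatable tests and simulations, and also what makes it unsuitable for secrets.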

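Math.random() can also be used to generate random numbers.

Random instances are thread-safe, but sharing one Random instance across threads causes contention and hurts performance; consider java.util.concurrent.ThreadLocalRandom (since JDK 1.7) instead.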

/**
 * A random number generator isolated to the current thread.  Like the
 * global {@link java.util.Random} generator used by the {@link
 * java.lang.Math} class, a {@code ThreadLocalRandom} is initialized
 * with an internally generated seed that may not otherwise be
 * modified. When applicable, use of {@code ThreadLocalRandom} rather
 * than shared {@code Random} objects in concurrent programs will
 * typically encounter much less overhead and contention.  Use of
 * {@code ThreadLocalRandom} is particularly appropriate when multiple
 * tasks (for example, each a {@link ForkJoinTask}) use random numbers
 * in parallel in thread pools.
 *
 * <p>Usages of this class should typically be of the form:
 * {@code ThreadLocalRandom.current().nextX(...)} (where
 * {@code X} is {@code Int}, {@code Long}, etc).
 * When all usages are of this form, it is never possible to
 * accidently share a {@code ThreadLocalRandom} across multiple threads.
 *
 * <p>This class also provides additional commonly used bounded random
 * generation methods.
 *
 * <p>Instances of {@code ThreadLocalRandom} are not cryptographically
 * secure.  Consider instead using {@link java.security.SecureRandom}
 * in security-sensitive applications. Additionally,
 * default-constructed instances do not use a cryptographically random
 * seed unless the {@linkplain System#getProperty system property}
 * {@code java.util.secureRandomSeed} is set to {@code true}.
 *
 * @since 1.7
 * @author Doug Lea
 */
public class ThreadLocalRandom extends Random {

Typical usage:

 int nextInt = ThreadLocalRandom.current().nextInt(10);

Random instances are not cryptographically secure. For security-sensitive uses, java.security.SecureRandom provides a cryptographically strong generator.
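As a rough illustration of the SecureRandom alternative, here is a small sketch that produces a random hexadecimal token. The class name SecureTokenDemo, the method name, and the token format are assumptions made for this example only.

```java
import java.security.SecureRandom;

final class SecureTokenDemo {
    // Builds a hex string from `numBytes` cryptographically strong random bytes.
    static String randomHexToken(int numBytes) {
        SecureRandom rng = new SecureRandom(); // seeded by the platform, not by us
        byte[] bytes = new byte[numBytes];
        rng.nextBytes(bytes);
        StringBuilder sb = new StringBuilder(numBytes * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b)); // two hex digits per byte
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(randomHexToken(16));
    }
}
```

Unlike Random, the output here cannot be reproduced from a known seed, which is the point for session tokens and similar values.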

Random implements Serializable, so instances can be serialized.

The seed is held in an AtomicLong, an atomic variable.

Demystifying Random Number Generators (2): The Linear Congruential Algorithm in the Java Source

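In the previous post we looked at true random number generators based on physical phenomena. True random numbers are slow to produce, however, so for practical computation the random numbers in a computer are generated by algorithms, that is, by formulas. For a given seed and function the resulting sequence is fully determined, so the numbers are predictable and periodic. They are not truly random and are therefore called pseudorandom numbers.

Do not look down on them because of the word "pseudo". There is real scholarship here: a few simple-looking formulas may be the result of generations of work, and the related research could fill several books.
Incidentally, the Turing Award laureate Andrew Chi-Chih Yao (姚期智) worked on exactly this subject, the theory of pseudorandom number generation.
Here I focus on two commonly used algorithms: the congruential method and the Mersenne Twister.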

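1. The congruential method

The congruential method is a very common way to generate random numbers and is used in many programming languages. The most obvious example is Java: the java.util.Random class uses one member of the family, the linear congruential method. Other members include the multiplicative congruential method and the mixed congruential method. Now let us open the Java source code and see what the linear congruential method really looks like.

In Eclipse, type java.util.Random and press F3 to jump to the source of the Random class: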

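First, we see this description in the class documentation:

An instance of this class is used to generate a stream of pseudorandom numbers. The class uses a 48-bit seed, which is modified using a linear congruential formula. (See Donald Knuth, The Art of Computer Programming, Volume 2, Section 3.2.1.)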

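Clearly, Java's Random class uses the linear congruential method to produce random numbers.

Reading on, we find its constructors and a few methods, which include the process for obtaining the 48-bit seed: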

private static final AtomicLong seedUniquifier = new AtomicLong(8682522807148012L);
/**
 * Creates a new random number generator. This constructor sets
 * the seed of the random number generator to a value very likely
 * to be distinct from any other invocation of this constructor.
 */
public Random() {
    this(seedUniquifier() ^ System.nanoTime());
}

private static long seedUniquifier() {
    // L'Ecuyer, "Tables of Linear Congruential Generators of
    // Different Sizes and Good Lattice Structure", 1999
    for (;;) {
        long current = seedUniquifier.get();
        long next = current * 181783497276652981L;
        if (seedUniquifier.compareAndSet(current, next))
            return next;
    }
}

public Random(long seed) {
    if (getClass() == Random.class)
        this.seed = new AtomicLong(initialScramble(seed));
    else {
        // subclass might have overriden setSeed
        this.seed = new AtomicLong();
        setSeed(seed);
    }
}
private static long initialScramble(long seed) {
    return (seed ^ multiplier) & mask;
}

java.util.concurrent.atomic.AtomicLong
public final boolean compareAndSet(long expect,
                                   long update)
Atomically sets the value to the given updated value if the current value == the expected value.
Parameters:
expect - the expected value
update - the new value
Returns:
true if successful. False return indicates that the actual value was not equal to the expected value.
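The compareAndSet contract quoted above can be seen in a few lines. This is a minimal sketch; the class name CasDemo and the values 5, 6, and 7 are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasDemo {
    // Returns { first CAS result, second CAS result, final value is 6 }.
    static boolean[] run() {
        AtomicLong value = new AtomicLong(5L);
        // Expected value matches the current value, so the update is applied.
        boolean first = value.compareAndSet(5L, 6L);
        // Expected value is now stale (current value is 6), so nothing changes.
        boolean second = value.compareAndSet(5L, 7L);
        return new boolean[] { first, second, value.get() == 6L };
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1] + " " + r[2]);
    }
}
```

The failed second call is exactly the case that the for(;;) loop in seedUniquifier() handles by rereading the value and trying again.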

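Here System.nanoTime() supplies a nanosecond-scale time value that is mixed into the 48-bit seed. Before that, seedUniquifier() multiplies a shared static value by 181783497276652981L. The for(;;) loop is not there to multiply repeatedly: it is a compare-and-set retry loop, so the multiplication normally happens once per call and is repeated only when another thread changed seedUniquifier in between. This gives every call to the no-argument constructor a different value. The nanoTime value is not a true random number, but it is hard to predict in practice. Note that nanoTime differs from the familiar currentTimeMillis: it does not return the time elapsed since January 1, 1970, but the nanoseconds counted from some fixed, arbitrary origin. It is meaningful only for comparing two readings to measure an interval, such as the running time of a line of code or of a database import, and it cannot tell you what day it is.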

    /**
     * Returns the current value of the running Java Virtual Machine's
     * high-resolution time source, in nanoseconds.
     *
     * <p>This method can only be used to measure elapsed time and is
     * not related to any other notion of system or wall-clock time.
     * The value returned represents nanoseconds since some fixed but
     * arbitrary <i>origin</i> time (perhaps in the future, so values
     * may be negative).  The same origin is used by all invocations of
     * this method in an instance of a Java virtual machine; other
     * virtual machine instances are likely to use a different origin.
     *
     * <p>This method provides nanosecond precision, but not necessarily
     * nanosecond resolution (that is, how frequently the value changes)
     * - no guarantees are made except that the resolution is at least as
     * good as that of {@link #currentTimeMillis()}.
     *
     * <p>Differences in successive calls that span greater than
     * approximately 292 years (2<sup>63</sup> nanoseconds) will not
     * correctly compute elapsed time due to numerical overflow.
     *
     * <p>The values returned by this method become meaningful only when
     * the difference between two such values, obtained within the same
     * instance of a Java virtual machine, is computed.
     *
     * <p> For example, to measure how long some code takes to execute:
     *  <pre> {@code
     * long startTime = System.nanoTime();
     * // ... the code being measured ...
     * long estimatedTime = System.nanoTime() - startTime;}</pre>
     *
     * <p>To compare two nanoTime values
     *  <pre> {@code
     * long t0 = System.nanoTime();
     * ...
     * long t1 = System.nanoTime();}</pre>
     *
     * one should use {@code t1 - t0 < 0}, not {@code t1 < t0},
     * because of the possibility of numerical overflow.
     *
     * @return the current value of the running Java Virtual Machine's
     *         high-resolution time source, in nanoseconds
     * @since 1.5
     */
    public static native long nanoTime();

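So far, then, the no-argument constructor has mixed three ingredients into the seed:

1. A long value used as the initial "uniquifier" (8682522807148012L by default).

2. Multiplication by the odd-looking constant 181783497276652981L. The source comment cites L'Ecuyer's tables of linear congruential generators, so the number was not typed at random. The product is stored back into the static AtomicLong field seedUniquifier. Because the compare-and-set can fail under multithreading, the for(;;) loop retries until the update succeeds.

3. An XOR with the value returned by System.nanoTime(), which gives the final seed.

Further down are the methods we normally use to obtain random numbers. I first looked up the most common one, nextInt(). Its code is: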

public int nextInt() {
    return next(32);
}

The code is short and goes straight to next:

protected int next(int bits) {
    long oldseed, nextseed;
    AtomicLong seed = this.seed;
    do {
        oldseed = seed.get();
        nextseed = (oldseed * multiplier + addend) & mask;
    } while (!seed.compareAndSet(oldseed, nextseed));
    return (int)(nextseed >>> (48 - bits));
}

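This is worth a small celebration, because we have reached the core of the linear congruential method: these few lines of code.

Before analyzing the code, here is a brief introduction to the linear congruential method.

In programs we often take a remainder to keep the result of an expression below some value, so that all results are remainders with respect to the same divisor. Methods built on this idea are called congruential methods.

The linear congruential method is a very old random number generation algorithm. Its mathematical form is: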

X_{n+1} = (a * X_n + c) mod m

where

m > 0, 0 < a < m, 0 <= c < m, 0 <= X_0 < m

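Here the sequence X_n is the stream of random numbers and X_0 is the seed. The quality of the numbers depends heavily on the choice of m, a, and c. The numbers are not truly random: they repeat with some period, which is at most m. By the Hull-Dobell theorem, for c != 0 the period equals m for every seed if and only if:

1. c and m are coprime;

2. a-1 is divisible by every prime factor of m;

3. a-1 is a multiple of 4 whenever m is a multiple of 4.

For this reason m is usually chosen very large, to lengthen the period.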
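The Hull-Dobell conditions can be checked by brute force on a tiny modulus. The sketch below is illustrative only: the class name LcgPeriod and the parameter sets (m = 16 with a = 5 or a = 3, c = 3) are chosen for this example and do not come from the JDK.

```java
final class LcgPeriod {
    // Counts the steps until the sequence first returns to x0.
    // Returns -1 if it does not return within m steps.
    static int period(long a, long c, long m, long x0) {
        long x = x0;
        int steps = 0;
        do {
            x = (a * x + c) % m;
            steps++;
        } while (x != x0 && steps <= m);
        return x == x0 ? steps : -1;
    }

    public static void main(String[] args) {
        // a - 1 = 4 is divisible by 2 (the only prime factor of 16) and by 4:
        // all three conditions hold, so the period is the full m = 16.
        System.out.println(period(5, 3, 16, 0)); // prints 16
        // a - 1 = 2 is not a multiple of 4: condition 3 fails, the period is shorter.
        System.out.println(period(3, 3, 16, 0)); // prints 8
    }
}
```

With m = 2^48, as in java.util.Random, the same conditions reduce to: c is odd and a-1 is a multiple of 4.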

Now look back at the program and note this line:

nextseed = (oldseed * multiplier + addend) & mask;

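It looks very much like X_{n+1} = (a * X_n + c) mod m!

Indeed, this line applies the linear congruential formula. One question remains: where is the remainder operator? Let us look at how the three constants are declared: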

    private static final long multiplier = 0x5DEECE66DL;
    private static final long addend = 0xBL;
    private static final long mask = (1L << 48) - 1;

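multiplier and addend correspond to a and c in the formula, which is easy to see. But what is mask? In fact, x & ((1L << 48) - 1) is equivalent to x mod 2^48. The explanation is as follows.

Take x modulo 2 to the power N. The divisor is a power of two, such as:

0001, 0010, 0100, 1000, ...

Dividing by it amounts to shifting the binary form of x right by N bits, and the bits that end up to the right of the binary point are the remainder. For example:

13 = 1101, 8 = 1000

13 / 8 = 1.101, so the 101 to the right of the binary point is the remainder, which is 5 in decimal.

In C and Java alike, however, the bits moved out by a shift are simply lost. (What, you say they are in the carry flag CF? That is a bit too low-level, and there is a handier way.) So how can we keep the bits that are about to disappear?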

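Following the mask above, let us try subtracting one from a power of two:

0000, 0001, 0011, 0111, 01111, 011111, ...

Does that give you an idea?

A single bit (0 or 1) ANDed with 1 is unchanged, and ANDed with 0 is always 0:

a & 1 = a, a & 0 = 0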

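What we want from x mod 2^N can be stated as:

1. All bits at or above the 2^N position are set to 0.

2. All bits below the 2^N position are left unchanged.

Therefore x & (2^N - 1) is equivalent to x mod 2^N. Using 13 and 8 again:

1101 % 1000 = 0101
1101 & 0111 = 0101

The two results agree.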
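The equivalence can also be checked in code. The sketch below is illustrative (the class name MaskVsMod is made up); it compares the mask against Math.floorMod, because Java's % operator returns a negative remainder for a negative x, whereas the mask always yields the non-negative residue.

```java
final class MaskVsMod {
    // True when x & (2^n - 1) equals the non-negative residue of x modulo 2^n.
    // Valid for 0 <= n <= 62 so that 1L << n stays positive.
    static boolean agree(long x, int n) {
        long mask = (1L << n) - 1;
        return (x & mask) == Math.floorMod(x, 1L << n);
    }

    public static void main(String[] args) {
        System.out.println(13 & 7);          // prints 5, same as 13 % 8
        System.out.println(agree(13L, 3));   // prints true
        System.out.println(agree(-13L, 3));  // prints true, although -13 % 8 is -5
    }
}
```

In Random the state is always masked to 48 bits, so the negative case never arises there.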

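With the AND operation explained, the meaning of that line of code should be clear. It is a direct application of the linear congruential formula with a = 0x5DEECE66DL, c = 0xBL, and m = 2^48, which yields a 48-bit state. The surrounding do-while loop does not iterate to add randomness: it is a compare-and-set retry loop that keeps the update safe under concurrent access, and it normally runs once. Finally, the state is shifted right by 48 - bits, so that its high-order bits become a result with the requested number of bits.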
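To confirm that nothing else is going on, the generator can be rebuilt by hand from the constants shown earlier and compared with java.util.Random. This is a sketch: the class name HandRolledLcg is made up, and it deliberately drops the AtomicLong, so unlike Random it is not thread-safe.

```java
import java.util.Random;

final class HandRolledLcg {
    private static final long MULTIPLIER = 0x5DEECE66DL; // a
    private static final long ADDEND = 0xBL;             // c
    private static final long MASK = (1L << 48) - 1;     // reduces modulo 2^48

    private long state;

    HandRolledLcg(long seed) {
        // Same scrambling step as Random.initialScramble.
        state = (seed ^ MULTIPLIER) & MASK;
    }

    int next(int bits) {
        state = (state * MULTIPLIER + ADDEND) & MASK;
        return (int) (state >>> (48 - bits)); // keep the high-order bits
    }

    int nextInt() {
        return next(32);
    }

    // True when the first `count` values match java.util.Random for this seed.
    static boolean matchesJdk(long seed, int count) {
        HandRolledLcg mine = new HandRolledLcg(seed);
        Random jdk = new Random(seed);
        for (int i = 0; i < count; i++) {
            if (mine.nextInt() != jdk.nextInt()) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(matchesJdk(42L, 1000));
    }
}
```

The match also shows why Random is predictable: anyone who learns the 48-bit state can compute every later output.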

Next, let us study a more commonly used method, nextInt with a parameter n:

    public int nextInt(int n) {
        if (n <= 0)
            throw new IllegalArgumentException("n must be positive");
 
        if ((n & -n) == n)  // i.e., n is a power of 2
            return (int)((n * (long)next(31)) >> 31);
 
        int bits, val;
        do {
            bits = next(31);
            val = bits % n;
        } while (bits - val + (n-1) < 0);
        return val;
    }

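The basic idea is the same. next(31) produces a 31-bit random number, that is, a non-negative int. If n is a power of two, the result is obtained by multiplying by n and shifting right by 31 bits, which keeps the high-order bits of the random value. If n is not a power of two, the value is reduced modulo n so that the result falls in [0, n). The do-while loop is not there to prevent negative results as such: it rejects draws from the incomplete last block of n values below 2^31, which would otherwise make small remainders slightly more likely. For such draws bits - val + (n-1) overflows to a negative int, and the loop draws again.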
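The effect of the rejection step is easier to see on a toy scale. The sketch below is an illustration, not JDK code: it replaces the 31-bit source with a hypothetical source that yields each value in [0, range) once, and replaces the int overflow test with the equivalent comparison against range.

```java
import java.util.Arrays;

final class ModuloBiasDemo {
    // How often each remainder occurs when every source value is reduced with %.
    static int[] countPlainModulo(int range, int n) {
        int[] counts = new int[n];
        for (int bits = 0; bits < range; bits++) {
            counts[bits % n]++;
        }
        return counts;
    }

    // Same count, but values in the incomplete last block are rejected.
    static int[] countWithRejection(int range, int n) {
        int[] counts = new int[n];
        for (int bits = 0; bits < range; bits++) {
            int val = bits % n;
            // Analogue of `bits - val + (n-1) < 0` with range standing in for 2^31.
            if (bits - val + (n - 1) >= range) {
                continue;
            }
            counts[val]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(countPlainModulo(8, 3)));   // prints [3, 3, 2]
        System.out.println(Arrays.toString(countWithRejection(8, 3))); // prints [2, 2, 2]
    }
}
```

With a 3-bit source and n = 3, plain modulo favours 0 and 1, while rejection discards the values 6 and 7 and leaves every remainder equally likely.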

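You may wonder why (n & -n) == n tests whether a number is a power of two. It took me some study to work it out as well. It follows from the properties of two's complement.

Negative numbers are stored in two's complement form. A few examples: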

 2: 0000 0010     -2: 1111 1110
 8: 0000 1000     -8: 1111 1000
18: 0001 0010    -18: 1110 1110
20: 0001 0100    -20: 1110 1100

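Notice a property of two's complement: for a number n and its negation -n, the lowest bit that is 1 is the same in both, all lower bits are 0 in both, and all higher bits differ. ANDing the two therefore leaves exactly one bit set, the lowest set bit of n. The result can equal n only when n had a single 1 bit to begin with, which means n is a power of two.

Personally I find another power-of-two test neater:

(n & (n-1)) == 0

The parentheses are required in Java because == binds more tightly than &. Like the first test, this one also accepts n = 0, so it relies on n > 0 having been checked beforehand. If you are interested, work out why it holds ^o^.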
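Both tests can be compared side by side, together with a third based on Integer.bitCount. This is an illustrative sketch; the class and method names are made up, and each method adds the n > 0 guard explicitly.

```java
final class PowerOfTwo {
    // n & -n isolates the lowest set bit; it equals n only if that is the only bit.
    static boolean viaNegation(int n) {
        return n > 0 && (n & -n) == n;
    }

    // n - 1 flips the lowest set bit and every bit below it, so the AND clears that bit.
    static boolean viaDecrement(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    // A power of two has exactly one bit set.
    static boolean viaBitCount(int n) {
        return n > 0 && Integer.bitCount(n) == 1;
    }

    public static void main(String[] args) {
        System.out.println(viaNegation(8) + " " + viaDecrement(8));   // prints "true true"
        System.out.println(viaNegation(18) + " " + viaDecrement(18)); // prints "false false"
    }
}
```

All three agree for every positive int; which one to use is mostly a matter of taste.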

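That concludes the linear congruential method. Next, a brief look at another congruential method, the multiplicative congruential method.

The linear congruential method above is mainly used to generate integers. In some settings, such as scientific computing, one often needs only fractions in (0, 1), and the multiplicative congruential method is a common choice there. Its basic formula closely resembles the linear one:

X_{n+1} = (a * X_n) mod m

It simply sets c = 0 in the linear formula. To obtain a fraction, one more step is added:

Y_n = X_n / m

Since X_n is a remainder modulo m, it is smaller than m, so Y_n lies between 0 and 1. Provided X_n never becomes 0 (for example, when m is prime, a is not a multiple of m, and the seed is nonzero), this gives a random sequence on the interval (0, 1).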
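As a concrete instance of the two formulas above, here is a minimal sketch of a multiplicative congruential generator. The parameters a = 16807 and m = 2^31 - 1 are an assumption for this example: they are the classic Park-Miller "minimal standard" values and are not taken from the article or from the JDK. The class name MinStd is likewise made up.

```java
final class MinStd {
    private static final long A = 16807L;      // assumed multiplier (Park-Miller)
    private static final long M = 2147483647L; // assumed modulus, the prime 2^31 - 1

    private long x;

    MinStd(long seed) {
        if (seed <= 0 || seed >= M) {
            throw new IllegalArgumentException("seed must be in [1, m-1]");
        }
        x = seed;
    }

    // X_{n+1} = (a * X_n) mod m; the product fits comfortably in a long.
    long nextLong() {
        x = (A * x) % M;
        return x;
    }

    // Y_n = X_n / m, strictly between 0 and 1 because X_n is in [1, m-1].
    double nextDouble() {
        return (double) nextLong() / M;
    }

    public static void main(String[] args) {
        MinStd g = new MinStd(1L);
        System.out.println(g.nextLong()); // prints 16807
        System.out.println(g.nextLong()); // prints 282475249
    }
}
```

Because m is prime and the seed is nonzero, the state never reaches 0, which is what keeps Y_n inside the open interval.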

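There are also the mixed congruential method, the quadratic congruential method, the cubic congruential method, and other similar methods. Their formulas are alike and each has its strengths and weaknesses, so they are not covered in detail here.

The strengths of congruential methods are fast computation and low memory use. However, consecutive numbers are not independent and the sequence is strongly correlated, so these methods are unsuitable for applications that demand high-quality random numbers, notably many areas of scientific research.

Stay tuned: the next post introduces a more powerful algorithm, the Mersenne Twister.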

http://www.myexception.cn/program/1609435.html

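Atomic

Since JDK 5, the java.util.concurrent package has provided many classes for concurrent programming. These classes tend to perform well on multi-core machines,
mainly because most of them use optimistic locking (fail and retry) rather than the pessimistic locking of synchronized.

Today I had time to trace the implementation of AtomicInteger's incrementAndGet.
I do not know concurrent programming especially well, so these are notes for later, deeper study.

1. The implementation of incrementAndGet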

    public final int incrementAndGet() {
        for (;;) {
            int current = get();
            int next = current + 1;
            if ( compareAndSet(current, next))
                return next;
        }
    }

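As you can see, it spins in an infinite loop until the increment succeeds.
Each pass through the loop:
1. Reads the current value.
2. Computes the value plus 1.
3. If the current value is still valid (it has not been changed by another thread), sets the incremented value.
4. If the assignment failed (the current value is stale because another thread changed it; the expect parameter is what checks for such a change), starts again from step 1.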
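The four steps can be exercised under real contention. The sketch below is illustrative: the class name CasCounter and the helper runThreads are made up, and the counter simply wraps an AtomicInteger and repeats the loop shown above.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Read, compute, compare-and-set, and retry on failure.
    int incrementAndGet() {
        for (;;) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;
            }
        }
    }

    int get() {
        return value.get();
    }

    // Starts `threads` workers that each increment `perThread` times; returns the total.
    static int runThreads(int threads, int perThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.incrementAndGet();
                }
            });
            workers[i].start();
        }
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runThreads(4, 10000)); // prints 40000
    }
}
```

No increment is lost even though no lock is taken: a thread whose compareAndSet fails simply rereads the value and tries again.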

2. The implementation of compareAndSet

    public final boolean compareAndSet(int expect, int update) {
        return unsafe.compareAndSwapInt(this, valueOffset, expect, update);
    }

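It directly calls the compareAndSwapInt method of the Unsafe class,
whose full name is sun.misc.Unsafe. This class is the implementation provided by Oracle (Sun); JDKs from other vendors may use a different class.

3. The implementation of compareAndSwapInt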

    /**
     * Atomically update Java variable to <tt>x</tt> if it is currently
     * holding <tt>expected</tt>.
     * @return <tt>true</tt> if successful
     */
    public final native boolean compareAndSwapInt(Object o, long offset,
                                                  int expected,
                                                  int x);

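As you can see, it is not implemented in Java. It is a native method implemented in the JVM's own native code.

4. The native implementation of compareAndSwapInt
If you have downloaded the OpenJDK source code, you can find unsafe.cpp in the hotspot\src\share\vm\prims\ directory.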

UNSAFE_ENTRY(jboolean, Unsafe_CompareAndSwapInt(JNIEnv *env, jobject unsafe, jobject obj, jlong offset, jint e, jint x))
  UnsafeWrapper("Unsafe_CompareAndSwapInt");
  oop p = JNIHandles::resolve(obj);
  jint* addr = (jint *) index_oop_from_field_offset_long(p, offset);
  return (jint)(Atomic::cmpxchg(x, addr, e)) == e;
UNSAFE_END

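As you can see, it actually calls the cmpxchg method of the Atomic class.

5. Atomic's cmpxchg
The implementation of this class depends on both the operating system and the CPU architecture. For x86 on Windows,
the implementation is in the file atomic_windows_x86.inline.hpp in the hotspot\src\os_cpu\windows_x86\vm\ directory.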

inline jint     Atomic::cmpxchg    (jint     exchange_value, volatile jint*     dest, jint     compare_value) {
  // alternative for InterlockedCompareExchange
  int mp = os::is_MP();
  __asm {
    mov edx, dest
    mov ecx, exchange_value
    mov eax, compare_value
    LOCK_IF_MP(mp)
    cmpxchg dword ptr [edx], ecx
  }
}

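Here it is implemented with inline assembly, and the key CPU instruction is cmpxchg.
There is no code to follow beyond this point. In other words, the atomicity of CAS is provided by the CPU. There is still exclusive locking at this level (the LOCK prefix that LOCK_IF_MP adds on multiprocessor systems), but it lasts far less time than a synchronized block, which is why performance is better under multithreading.

The code contains the comment "alternative for InterlockedCompareExchange".
InterlockedCompareExchange is a WINAPI function that does the same thing as the assembly above.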
http://msdn.microsoft.com/en-us/library/windows/desktop/ms683560%28v=vs.85%29.aspx

6. Finally, the x86 cmpxchg instruction for reference

Opcode CMPXCHG

CPU: I486+
Type of Instruction: User

Instruction: CMPXCHG dest, src

Description: Compares the accumulator with dest. If equal the "dest"
is loaded with "src", otherwise the accumulator is loaded
with "dest".

Flags Affected: AF, CF, OF, PF, SF, ZF

CPU mode: RM,PM,VM,SMM
+++++++++++++++++++++++
Clocks:
CMPXCHG reg, reg 6
CMPXCHG mem, reg 7 (10 if comparison fails)

http://www.blogjava.net/mstar/archive/2013/04/24/398351.html

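Andrew Chi-Chih Yao (姚期智):
He first joined the Center for Advanced Study at Tsinghua University as a full-time professor, and then led the founding of the "Yao Class".
He started this experimental class because he felt that undergraduate computer science education in China still lagged behind that of leading universities abroad such as MIT and Stanford. He hoped to use his many years of theoretical research and teaching experience abroad to train the students of this class into world-class computer scientists on a par with those from MIT and Stanford.
In a letter to all Tsinghua students he wrote:
"Our goal is not to train excellent software programmers; we want to train first-rate computer science talent of international standing."

He said: "I felt that physics research was somewhat different from what I had imagined. Just at that time computers were emerging, and there were many interesting problems waiting to be solved. I happened upon this field, and I think the choice was the right one."
"Life is like an egg: broken from the outside, it is pressure; broken from the inside, it is growth. Only by continually correcting yourself will you have the strength to climb upward."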
