PHP source analysis: the base64_encode function

Reposted article, dated 2015-12-08 20:20. Category: PHP source analysis.

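Patterns in base64_encode output

The length of the encoded string equals the input length divided by 3, rounded up, then multiplied by 4:

ceil(strlen($string)/3)*4 == strlen(base64_encode($string));

For example, base64_encode("abcd") == "YWJjZA==", and ceil(4/3)*4 = 2*4 = 8.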

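If the input length modulo 3 is 0, the output contains no "=" sign. In addition, identical 3-character input blocks produce identical 4-character output groups, for example base64_encode("abcabc") == "YWJjYWJj".
If the input length modulo 3 is 1, the output ends with two "=" signs, for example base64_encode("abcd") == "YWJjZA==".
If the input length modulo 3 is 2, the output ends with one "=" sign, for example base64_encode("abcde") == "YWJjZGU=".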
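The three rules above can be written down as a small C sketch. The helper names `b64_encoded_length` and `b64_padding_count` are hypothetical and do not appear in the PHP source. They only restate the arithmetic, and very large inputs (where the length computation could overflow) are ignored here.

```c
#include <stddef.h>

/* Hypothetical helper: encoded length for n input bytes, i.e. ceil(n / 3) * 4.
 * Integer division of (n + 2) by 3 rounds n / 3 up. */
size_t b64_encoded_length(size_t n)
{
    return (n + 2) / 3 * 4;
}

/* Hypothetical helper: number of '=' characters appended for n input bytes. */
size_t b64_padding_count(size_t n)
{
    switch (n % 3) {
    case 1:  return 2;  /* one leftover byte  -> two '=' */
    case 2:  return 1;  /* two leftover bytes -> one '=' */
    default: return 0;  /* whole 3-byte blocks only -> no padding */
    }
}
```

For the 4-byte input "abcd", `b64_encoded_length(4)` gives 8 and `b64_padding_count(4)` gives 2, matching "YWJjZA==".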

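I derived these rules from reading the PHP source code. If you are interested, follow along with the analysis below.

First, open the source file that implements base64_encode (ext/standard/base64.c in the PHP source tree).

The main parts of the code are reproduced here.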

..........................

static const char base64_table[] = {
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '+', '/', '\0'
};

static const char base64_pad = '=';

.............................................

PHPAPI unsigned char *php_base64_encode(const unsigned char *str, int length, int *ret_length) /* {{{ */
{
    const unsigned char *current = str;
    unsigned char *p;
    unsigned char *result;

    if (length < 0) {
        if (ret_length != NULL) {
            *ret_length = 0;
        }
        return NULL;
    }

    result = (unsigned char *) safe_emalloc((length + 2) / 3, 4 * sizeof(char), 1);
    p = result;

    while (length > 2) { /* keep going until we have less than 24 bits */
        *p++ = base64_table[current[0] >> 2];
        *p++ = base64_table[((current[0] & 0x03) << 4) + (current[1] >> 4)];
        *p++ = base64_table[((current[1] & 0x0f) << 2) + (current[2] >> 6)];
        *p++ = base64_table[current[2] & 0x3f];

        current += 3;
        length -= 3; /* we just handle 3 octets of data */
    }

    /* now deal with the tail end of things */
    if (length != 0) {
        *p++ = base64_table[current[0] >> 2];
        if (length > 1) {
            *p++ = base64_table[((current[0] & 0x03) << 4) + (current[1] >> 4)];
            *p++ = base64_table[(current[1] & 0x0f) << 2];
            *p++ = base64_pad;
        } else {
            *p++ = base64_table[(current[0] & 0x03) << 4];
            *p++ = base64_pad;
            *p++ = base64_pad;
        }
    }
    if (ret_length != NULL) {
        *ret_length = (int)(p - result);
    }
    *p = '\0';
    return result;
}

................................................

PHP_FUNCTION(base64_encode)
{
    char *str;
    unsigned char *result;
    int str_len, ret_length;

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &str_len) == FAILURE) {
        return;
    }
    result = php_base64_encode((unsigned char*)str, str_len, &ret_length);
    if (result != NULL) {
        RETVAL_STRINGL((char*)result, ret_length, 0);
    } else {
        RETURN_FALSE;
    }
}

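PHP_FUNCTION(base64_encode) defines the C function that implements the userland base64_encode function.

It first declares a char pointer str (the string argument passed to base64_encode), an unsigned char pointer result (the encoded string), and two ints: str_len (the length of the argument) and ret_length (the length of the encoded string).

zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "s", &str, &str_len) stores the string argument in str and its length in str_len.

php_base64_encode is then called, and the encoded string is stored in result.

If result is not NULL, it is set as the return value through RETVAL_STRINGL; otherwise the function returns false.

php_base64_encode(const unsigned char *str, int length, int *ret_length) performs the actual encoding.

The part to focus on is this loop: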

// I will call this the first code block
while (length > 2) { /* keep going until we have less than 24 bits */
    *p++ = base64_table[current[0] >> 2];
    *p++ = base64_table[((current[0] & 0x03) << 4) + (current[1] >> 4)];
    *p++ = base64_table[((current[1] & 0x0f) << 2) + (current[2] >> 6)];
    *p++ = base64_table[current[2] & 0x3f];

    current += 3;
    length -= 3; /* we just handle 3 octets of data */
}

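Suppose the string to encode is the 3 characters "abc".

First encoded character

current[0] is a (binary 01100001)

current[0] >> 2 is 24 (00011000)

So the first encoded character is Y (base64_table[24] is Y in the base64_table array defined earlier)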

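Second encoded character

(current[0] & 0x03) << 4 is 16 (01100001 & 00000011 is 00000001, and 00000001 << 4 is 00010000)

current[1] >> 4 is 6 (01100010 >> 4 is 00000110)

The sum is 16 + 6 = 22, so the second encoded character is W (base64_table[22] is W)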

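Third encoded character

current[1] & 0x0f is 2 (01100010 & 00001111 is 00000010)

(current[1] & 0x0f) << 2 is 8 (00000010 << 2 is 00001000)

current[2] >> 6 is 1 (01100011 >> 6 is 00000001)

The sum is 8 + 1 = 9, so the third encoded character is J (base64_table[9] is J)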

Fourth encoded character

current[2] & 0x3f is 35 (01100011 & 00111111 is 00100011)

So the fourth encoded character is j (base64_table[35] is j)

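Finally, current advances by 3 and now points at the terminating '\0'; length - 3 is 0, which ends the while loop.

At this point the encoded output is YWJj. (For a string such as "abcabcabc" the output is YWJjYWJjYWJj, because every 3-character step of current runs exactly the same computation.)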
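The four index computations walked through above can be checked with a short C sketch. `b64_group_indices` is a hypothetical helper, not part of base64.c; it applies the same shifts and masks as the loop body to a single 3-byte group and returns the table indices rather than the characters.

```c
/* Hypothetical helper: split one 3-byte group into four 6-bit indices
 * into base64_table, using the same shifts and masks as the while loop. */
void b64_group_indices(const unsigned char in[3], unsigned char out[4])
{
    out[0] = in[0] >> 2;                            /* top 6 bits of byte 0 */
    out[1] = ((in[0] & 0x03) << 4) + (in[1] >> 4);  /* low 2 of byte 0, top 4 of byte 1 */
    out[2] = ((in[1] & 0x0f) << 2) + (in[2] >> 6);  /* low 4 of byte 1, top 2 of byte 2 */
    out[3] = in[2] & 0x3f;                          /* low 6 bits of byte 2 */
}
```

Called with the bytes of "abc", it fills out with 24, 22, 9 and 35, the indices of Y, W, J and j. Note that the parentheses around the masks matter: in C, << binds more tightly than &, so in[0] & 0x03 << 4 would be evaluated as in[0] & 0x30.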

Let us keep reading.

Because length became 0 after subtracting 3,

// I will call this the second code block
if (length != 0) {
    *p++ = base64_table[current[0] >> 2];
    if (length > 1) {
        *p++ = base64_table[((current[0] & 0x03) << 4) + (current[1] >> 4)];
        *p++ = base64_table[(current[1] & 0x0f) << 2];
        *p++ = base64_pad;
    } else {
        *p++ = base64_table[(current[0] & 0x03) << 4];
        *p++ = base64_pad;
        *p++ = base64_pad;
    }
}

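is not executed, and the final encoding of abc is YWJj. Feel free to verify this yourself.

If the string is abcd instead of abc, the encoding starts with YWJj and is followed by the encoding of the trailing d. Let us look again at what I called the second code block.

if (length != 0) // at this point length can only be 0, 1, or 2

For abcd, length is 4 - 3 = 1 after the while loop, and current points at the character d.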

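In the second code block:

*p++ = base64_table[current[0] >> 2] yields Z (d is 01100100, and 01100100 >> 2 is 00011001, which is 25)

if (length > 1) // false, because length == 1

So the else branch is executed:

*p++ = base64_table[(current[0] & 0x03) << 4] yields A (01100100 & 00000011 is 0, and base64_table[0] is A)

*p++ = base64_pad is executed twice and yields "=="

So the encoding of abcd is "YWJjZA==".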

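In the while loop, each block of 3 input characters is encoded into 4 output characters. The leftover characters are handled by the if statement: one leftover character becomes two encoded characters followed by two "=" signs, and two leftover characters become three encoded characters followed by one "=" sign.

This confirms the rules stated at the beginning.

Thank you for reading patiently. If anything is poorly worded, please point it out in the comments and I will correct it.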
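The loop-plus-tail structure summarized above can be reproduced outside PHP as a rough standalone sketch. This is not the PHP implementation: `b64_encode_sketch` and `b64_alphabet` are hypothetical names, plain malloc stands in for safe_emalloc, the length is a size_t rather than an int, and the caller is assumed to free the result.

```c
#include <stdlib.h>

/* Same 64-character alphabet as base64_table, written as a string literal. */
static const char b64_alphabet[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical standalone encoder mirroring the structure of php_base64_encode.
 * Returns a NUL-terminated string the caller must free, or NULL on allocation failure. */
char *b64_encode_sketch(const unsigned char *src, size_t len)
{
    char *out = malloc((len + 2) / 3 * 4 + 1);
    char *p = out;

    if (out == NULL) {
        return NULL;
    }

    /* Whole 3-byte blocks: each becomes 4 output characters. */
    while (len > 2) {
        *p++ = b64_alphabet[src[0] >> 2];
        *p++ = b64_alphabet[((src[0] & 0x03) << 4) + (src[1] >> 4)];
        *p++ = b64_alphabet[((src[1] & 0x0f) << 2) + (src[2] >> 6)];
        *p++ = b64_alphabet[src[2] & 0x3f];
        src += 3;
        len -= 3;
    }

    /* Tail: 1 leftover byte -> 2 characters + "==", 2 leftover bytes -> 3 characters + "=". */
    if (len != 0) {
        *p++ = b64_alphabet[src[0] >> 2];
        if (len > 1) {
            *p++ = b64_alphabet[((src[0] & 0x03) << 4) + (src[1] >> 4)];
            *p++ = b64_alphabet[(src[1] & 0x0f) << 2];
            *p++ = '=';
        } else {
            *p++ = b64_alphabet[(src[0] & 0x03) << 4];
            *p++ = '=';
            *p++ = '=';
        }
    }

    *p = '\0';
    return out;
}
```

Passing the bytes of "abc", "abcd" and "abcde" with lengths 3, 4 and 5 returns "YWJj", "YWJjZA==" and "YWJjZGU=", the same strings used in the examples of this article.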
