Understanding NumPy in Depth

Reposted article. 2016-12-16 13:22

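NumPy is a very large library, and learning all of it is unrealistic; the practical goal is to know the commonly used features. When you run into something you do not understand, work it out, and let the knowledge accumulate.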

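Tuples do not need parentheses. Although we usually wrap tuples in parentheses in Python, the syntax only requires the items to be separated by commas. For example, x, y = y, x swaps two variables by way of tuples.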

I. Why NumPy is needed

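Python values elegance and simplicity, but in the end it still has to care about efficiency.
Python lists, like Java's, are really only one-dimensional. A "multidimensional" list is a list whose elements are themselves lists, so reaching an element means following several references, which is slow.
Everything in Python is a reference: every int is a separate object that is reached through a pointer, which wastes both space and time. Reading an element means first reading the reference and then reading the int value at the address it points to.
NumPy essentially adopts the C array model: values of one type stored side by side in a single block of memory.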
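
To make the space argument concrete, here is a rough sketch that compares the memory used by a list of 1000 integers with an ndarray holding the same values. The exact list figure is CPython-specific and is only an illustration; the names py_list, arr and list_bytes are made up for this example.

```python
import sys
import numpy as np

n = 1000
py_list = list(range(n))
arr = np.arange(n, dtype=np.int64)

# A list stores n references plus n separate int objects.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# An ndarray stores n raw 8-byte integers in one contiguous buffer.
print(arr.nbytes)                 # 8000
print(list_bytes > arr.nbytes)    # True
```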

II. NumPy principles

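Everything is one-dimensional; multiple dimensions are only a wrapper
Internally, a multidimensional array is stored as a one-dimensional one.
Fixed length: everything is a rectangle, everything is a cuboid
For example, if a is an array with three rows, then len(a[0]) == len(a[1]) == len(a[2]); the rows cannot have different lengths. This fixed-length rule is precisely what makes the "everything is one-dimensional" principle possible.
All elements have the same type and the same size
Every NumPy array is a one-dimensional array underneath, and every element of that array has the same size and belongs to the same type.
An element of a NumPy array resembles a C struct: the number of bytes it occupies is fixed. NumPy also lets users define their own struct types.
An array is just a block of memory
Interpret it however you like, and dress it in whatever wrapper you like.
An array of 24 elements can be treated as a 4×6, 2×12, or 3×8 two-dimensional array, or as a 2×2×6 or 3×2×4 three-dimensional array.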
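
The "one block, many wrappers" idea can be checked directly. The sketch below reshapes one 24-element array in two ways and confirms that all three arrays look at the same memory; the variable names are only for illustration.

```python
import numpy as np

a = np.arange(24)            # one flat block of 24 integers

m = a.reshape(4, 6)          # the same block read as 4 rows of 6
t = a.reshape(2, 3, 4)       # ... or as 2 blocks of 3 rows of 4

print(np.shares_memory(a, m), np.shares_memory(a, t))   # True True

# Element [i, j, k] of t sits at flat position i*12 + j*4 + k.
t[1, 2, 3] = -1
print(a[23])                 # -1
print(m[3, 5])               # -1

# strides record how many bytes to skip per step along each axis.
print(m.strides == (6 * a.itemsize, a.itemsize))         # True
```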

III. NumPy concepts

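ndarray.ndim
The n prefix means "number of" and dim means dimension, so ndim is the number of dimensions.
ndarray.shape
The dimensions of the array: a tuple of integers giving the size of the array along each axis. The length of this tuple is the rank, that is, the number of dimensions, which is the ndim attribute.
ndarray.size
The total number of elements, equal to the product of the entries of shape.
ndarray.dtype
An object describing the type of the elements. A dtype can be created or specified using standard Python types, and NumPy also provides its own data types.
ndarray.itemsize
The size in bytes of each element. For example, an array of float64 elements has itemsize 8 (=64/8), and an array of complex64 elements also has itemsize 8 (=64/8).
ndarray.data
The buffer containing the actual array elements. We normally do not need to use this attribute, because we access the elements through indexing.
Conceptually, though, it is the most important part: an ndarray is only a wrapper, and data holds the real values. Attributes such as shape and dtype act as a "data interpreter" that describes how the bytes in data are organized.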

An example

>>> import numpy as np
>>> a = np.arange(15).reshape(3, 5)
>>> a
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14]])
>>> a.shape
(3, 5)
>>> a.ndim
2
>>> a.dtype.name  # platform-dependent; 'int64' on most 64-bit builds
'int32'
>>> a.itemsize    # 4 for int32, 8 for int64
4
>>> a.size
15
>>> type(a)
<class 'numpy.ndarray'>
>>> b = np.array([6, 7, 8])
>>> b
array([6, 7, 8])
>>> type(b)
<class 'numpy.ndarray'>
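
The relationships between these attributes can be verified with a few assertions. This is a small sketch; the dtype is pinned to int32 so the numbers do not depend on the platform, and as_bytes is a name made up for the example.

```python
import numpy as np

a = np.arange(15, dtype=np.int32).reshape(3, 5)

# ndim is the length of shape, and size is the product of its entries.
assert a.ndim == len(a.shape) == 2
assert a.size == 3 * 5

# itemsize follows from the dtype; nbytes is the size of the data buffer.
assert a.itemsize == 4
assert a.nbytes == a.size * a.itemsize == 60

# The same 60 bytes can be described by a different dtype.
as_bytes = a.view(np.uint8)
print(as_bytes.shape)                              # (3, 20)
print(np.zeros(2, dtype=np.complex64).itemsize)    # 8
```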

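Two ways to change the dimensions of an array:

a.reshape(d1, d2, ...) leaves a itself unchanged and returns the reshaped array
a.shape = tuple changes a itself; it is equivalent to a = a.reshape(tuple)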

a=np.arange(24)

b=a.reshape(4,-1)

a.shape,b.shape
Out[21]: ((24,), (4, 6))

a.shape=-1,5
Traceback (most recent call last):

  File "<iPython-input-22-0a3b0c92d497>", line 1, in <module>
    a.shape=-1,5

ValueError: total size of new array must be unchanged


a.shape=-1,4

a
Out[24]: 
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11],
       [12, 13, 14, 15],
       [16, 17, 18, 19],
       [20, 21, 22, 23]])

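A common task is turning a multidimensional array into a one-dimensional one. There are several ways to do it:

a.shape = -1 flattens a itself
b = a.reshape(-1) leaves a unchanged
b = a.ravel(): b shares the same data as a whenever a view is possible
b = a.flatten(): b's data is a copy of a's data
a.resize((d1, d2, ...)) has the same effect as a = a.reshape((d1, d2, ...)) or a.shape = d1, d2, ...; it changes the shape of a directly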
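
The difference between ravel and flatten is easiest to see by writing through the result. A minimal sketch, using np.shares_memory to check whether two arrays use the same buffer:

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

r = a.ravel()      # a view here, because a is contiguous
f = a.flatten()    # always a copy
print(np.shares_memory(a, r), np.shares_memory(a, f))   # True False

r[0] = 99
print(a[0, 0])     # 99
f[1] = -1
print(a[0, 1])     # 1

# ravel falls back to a copy when no view can describe the result.
rt = a.T.ravel()
print(np.shares_memory(a, rt))   # False
```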

IV. Creating ndarray objects

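np.array([[1,2],[3,4]], dtype=userType)
Use array to wrap a Python list, tuple, or range object.
Creating one-dimensional arrays
np.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None) produces an arithmetic sequence from the first element, the last element, and the number of elements.
np.logspace produces a geometric sequence. Its arguments resemble those of linspace, except that start and stop are exponents of base (10 by default).
np.arange(start, stop, step, dtype=None) produces an arithmetic sequence; start defaults to 0 and step to 1, and stop itself is excluded.
Creating arrays of a given shape
np.ones: an array of all ones
np.zeros: an array of all zeros
np.empty: an uninitialized array; it only allocates the memory, so it is faster than the previous two
Creating random ndarrays from various distributions is covered in the next section
Creating ndarrays from byte sequences
np.fromstring, np.frombuffer, np.fromfile
Unless the dtype says otherwise, the bytes are read in the machine's native byte order, which is little-endian on common x86 hardware
Creating ndarrays from iterators
np.fromiter: np.fromiter(range(100), dtype=np.int32, count=10)
Creating ndarrays from functions
np.fromfunction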

>>> def func2(i, j):
...     return (i+1) * (j+1)
...
>>> a = np.fromfunction(func2, (9,9))
>>> a
array([[  1.,   2.,   3.,   4.,   5.,   6.,   7.,   8.,   9.],
       [  2.,   4.,   6.,   8.,  10.,  12.,  14.,  16.,  18.],
       [  3.,   6.,   9.,  12.,  15.,  18.,  21.,  24.,  27.],
       [  4.,   8.,  12.,  16.,  20.,  24.,  28.,  32.,  36.],
       [  5.,  10.,  15.,  20.,  25.,  30.,  35.,  40.,  45.],
       [  6.,  12.,  18.,  24.,  30.,  36.,  42.,  48.,  54.],
       [  7.,  14.,  21.,  28.,  35.,  42.,  49.,  56.,  63.],
       [  8.,  16.,  24.,  32.,  40.,  48.,  56.,  64.,  72.],
       [  9.,  18.,  27.,  36.,  45.,  54.,  63.,  72.,  81.]])
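
The other constructors listed in this section can be sketched in the same way. The input values below are chosen only for illustration, and "<i2" (little-endian 16-bit integer) is used so the byte example does not depend on the machine.

```python
import numpy as np

lin = np.linspace(0, 1, 5)        # 5 evenly spaced points, endpoint included
log = np.logspace(0, 2, 3)        # 10**0, 10**1, 10**2
rng = np.arange(0, 10, 2)         # half-open: 10 itself is excluded

# Round-trip three little-endian 16-bit integers through raw bytes.
raw = np.array([1, 2, 3], dtype="<i2").tobytes()
from_bytes = np.frombuffer(raw, dtype="<i2")

# count=10 stops after the first ten items of the iterator.
from_iter = np.fromiter(range(100), dtype=np.int32, count=10)

print(lin.tolist())         # [0.0, 0.25, 0.5, 0.75, 1.0]
print(rng.tolist())         # [0, 2, 4, 6, 8]
print(len(raw))             # 6
print(from_bytes.tolist())  # [1, 2, 3]
print(from_iter.tolist())   # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```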

V. Creating random ndarray objects

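The uniform generators random and rand draw values from the half-open interval [0, 1).

Uniform distribution
np.random.random(size=None) returns a single number in [0, 1) by default; pass a shape to get an array of random values with those dimensions.
np.random.randint(low, high, size) returns an int array whose elements all lie in [low, high). By contrast, random.randint(low, high) from the Python standard library returns a value in [low, high] (note the closed interval).
np.random.rand is equivalent to np.random.random((d1, d2, d3)), except that it takes the dimensions as separate arguments rather than as a single tuple.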

import numpy as np
print(np.random.random((1,2,3)))
print(np.random.rand(1,2,3))

Normal distribution
np.random.randn
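
The interval conventions above are easy to mix up, so here is a short sketch that exercises each function. The seed and sample sizes are arbitrary choices for this example.

```python
import random
import numpy as np

np.random.seed(0)                        # fixed seed so the run is repeatable

u = np.random.random((2, 3))             # uniform on [0, 1), shape as a tuple
r = np.random.rand(2, 3)                 # same distribution, shape as arguments
k = np.random.randint(0, 3, size=1000)   # integers from {0, 1, 2}; 3 is excluded
z = np.random.randn(2, 3)                # standard normal samples

py = [random.randint(0, 3) for _ in range(1000)]   # {0, 1, 2, 3}; 3 is included

print(u.shape, r.shape, z.shape)   # (2, 3) (2, 3) (2, 3)
print(k.min() >= 0, k.max() <= 2)  # True True
print(max(py) <= 3)                # True
```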

VI. Accessing elements

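1. Slices share storage
NumPy is designed around efficiency. Slicing a Python list copies the selected elements, which is slow. A slice of a NumPy array shares memory with the original array, so it is very cheap.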

>>> a = np.arange(10)
>>> a[2:4] = 100, 101
>>> b = a[3:7]  # a slice creates a new array b that shares its data with a
>>> b
array([101,   4,   5,   6])
>>> b[2] = -10  # set b[2] to -10
>>> b
array([101,   4, -10,   6])
>>> a  # a[5] has also become -10
array([  0,   1, 100, 101,   4, -10,   6,   7,   8,   9])
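
For contrast, the sketch below runs the same experiment on a Python list and on an ndarray, and shows how to request an independent copy when sharing is not wanted. The variable names are only illustrative.

```python
import numpy as np

py_list = list(range(10))
py_slice = py_list[3:7]      # a list slice is a new list
py_slice[0] = -1
print(py_list[3])            # 3: the original list is untouched

arr = np.arange(10)
view = arr[3:7]              # an ndarray slice is a view on the same buffer
view[0] = -1
print(arr[3])                # -1: the write is visible through arr
print(view.base is arr)      # True

independent = arr[3:7].copy()   # ask for a copy when sharing is not wanted
independent[1] = -2
print(arr[4])                # 4
```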

2. NumPy indexing fully covers Python indexing

>>> a = np.arange(10)
>>> a[5]    # an integer index returns a single element
5
>>> a[3:5]  # a range returns a slice that includes a[3] but not a[5]
array([3, 4])
>>> a[:5]   # omitting the start means starting from a[0]
array([0, 1, 2, 3, 4])
>>> a[:-1]  # negative indices count from the end of the array
array([0, 1, 2, 3, 4, 5, 6, 7, 8])
>>> a[2:4] = 100,101    # indexing can also be used to assign elements
>>> a
array([  0,   1, 100, 101,   4,   5,   6,   7,   8,   9])
>>> a[1:-1:2]   # the third value is the step; 2 takes every other element
array([  1, 101,   5,   7])
>>> a[::-1] # omitting start and stop with step -1 reverses the array
array([  9,   8,   7,   6,   5,   4, 101, 100,   1,   0])
>>> a[5:1:-2] # with a negative step, start must be greater than stop
array([  5, 101])

3. NumPy indexing is more flexible
On top of Python's indexing, NumPy allows indexing along several axes at once:

a=np.arange(25).reshape(5,5)

a
Out[63]: 
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])

a[3,3]
Out[65]: 18

a[:,2]
Out[66]: array([ 2,  7, 12, 17, 22])

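4. Getting an array of elements with an index array
NumPy offers two ways:

Using a sequence of integers
Using a bool array

Indexing with an integer sequence selects several elements and returns a copy of the original data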

>>> x = np.arange(10,1,-1)
>>> x
array([10,  9,  8,  7,  6,  5,  4,  3,  2])
>>> x[[3, 3, 1, 8]] # take the elements at indices 3, 3, 1, 8 as a new array
array([7, 7, 9, 2])
>>> b = x[np.array([3,3,-3,8])]  # indices may be negative
>>> b[2] = 100
>>> b
array([  7,   7, 100,   2])
>>> x   # b and x do not share data, so x is unchanged
array([10,  9,  8,  7,  6,  5,  4,  3,  2])
>>> x[[3,5,1]] = -1, -2, -3 # an integer sequence index can also assign elements
>>> x
array([10, -3,  8, -1,  6, -2,  4,  3,  2])

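Indexing with a boolean array selects the elements where the mask is True. In the old NumPy release this article was written against, a mask that was too short was padded with False and a boolean list was read as integers; current releases treat a boolean list as a mask and raise IndexError when the mask length does not match.

>>> x = np.arange(5,0,-1)
>>> x
array([5, 4, 3, 2, 1])
>>> x[np.array([True, False, True, False, False])]
>>> # the mask is True at indices 0 and 2, so x[0] and x[2] are selected
array([5, 3])
>>> x[[True, False, True, False, False]]
>>> # current NumPy treats a boolean list as a mask too
>>> # (old releases read it as the integers 1, 0, 1, 0, 0 and returned array([4, 5, 4, 5, 5]))
array([5, 3])
>>> # x[np.array([True, False, True, True])] raises IndexError in current NumPy: the mask is shorter than x
>>> # (old releases treated the missing entries as False and returned array([5, 3, 2]))
>>> x[np.array([True, False, True, True, False])] = -1, -2, -3
>>> # a boolean mask can also be used to assign elements
>>> x
array([-1,  4, -2, -3,  1])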

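5. Multidimensional arrays
Accessing multidimensional arrays works much like accessing one-dimensional ones. Because a multidimensional array has several axes, its index needs several values, and NumPy uses a tuple as the index.

Multidimensional arrays can also be indexed with index arrays and mask arrays, but note that the result is then a copy of the original data.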
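
A short sketch of these multidimensional cases, using the same 5×5 array as earlier. The names idx, picked and evens are made up for the example.

```python
import numpy as np

a = np.arange(25).reshape(5, 5)

idx = (3, slice(None))        # the tuple form of a[3, :]
print(a[idx].tolist())        # [15, 16, 17, 18, 19]

# Two index arrays are paired element-wise: positions (0, 1) and (2, 3).
picked = a[[0, 2], [1, 3]]
print(picked.tolist())        # [1, 13]

# A boolean mask with the same shape as a selects the matching elements.
evens = a[a % 2 == 0]
print(evens.size)             # 13

# Both results are copies, so writing to them leaves a untouched.
picked[0] = -1
print(a[0, 1])                        # 1
print(np.shares_memory(a, picked))    # False
```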

VII. NumPy data types

bool_ Boolean (True or False) stored as a byte
int_ Default integer type (same as C long; normally either int64 or int32)
intc Identical to C int (normally int32 or int64)
intp Integer used for indexing (same as C ssize_t; normally either int32 or int64)
int8 Byte (-128 to 127)
int16 Integer (-32768 to 32767)
int32 Integer (-2147483648 to 2147483647)
int64 Integer (-9223372036854775808 to 9223372036854775807)
uint8 Unsigned integer (0 to 255)
uint16 Unsigned integer (0 to 65535)
uint32 Unsigned integer (0 to 4294967295)
uint64 Unsigned integer (0 to 18446744073709551615)
float_ Shorthand for float64 (alias removed in NumPy 2.0).
float16 Half precision float: sign bit, 5 bits exponent, 10 bits mantissa
float32 Single precision float: sign bit, 8 bits exponent, 23 bits mantissa
float64 Double precision float: sign bit, 11 bits exponent, 52 bits mantissa
complex_ Shorthand for complex128 (alias removed in NumPy 2.0).
complex64 Complex number, represented by two 32-bit floats (real and imaginary components)
complex128 Complex number, represented by two 64-bit floats (real and imaginary components)

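Note that assigning directly to an ndarray's dtype attribute does not change the data. Once again: an ndarray is a wrapper that describes a block of memory.

a.dtype = np.float32: does not change the data; the same bytes are simply reinterpreted
b = a.astype(np.float32): does not change a; it converts a's data to the new type and copies the result into b's data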

a=np.random.random(4)

a.dtype
Out[21]: dtype('float64')

a.dtype=np.float32

len(a)
Out[23]: 8

a.dtype=np.float16

len(a)
Out[25]: 16
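
The session above shows reinterpretation only. The following sketch puts it next to astype, using view() as a non-destructive stand-in for assigning to a.dtype; the input values are chosen arbitrarily.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0, 4.0])        # float64, 32 bytes in total

# view() reinterprets the same bytes, like assigning to a.dtype,
# but leaves a itself alone.
v = a.view(np.float32)
print(v.shape, np.shares_memory(a, v))     # (8,) True

# astype() converts each value and stores the result in a new buffer.
b = a.astype(np.float32)
print(b.tolist())                          # [1.0, 2.0, 3.0, 4.0]
print(b.nbytes, np.shares_memory(a, b))    # 16 False
print(a.dtype)                             # float64
```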

VIII. Universal functions

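A universal function operates on every element of an array.
Universal functions usually accept an out argument; passing it avoids allocating new memory for the result.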
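
As a minimal sketch of the out argument, the example below uses np.sqrt and np.add; any other ufunc should behave the same way. The buffer name result is an assumption of this example.

```python
import numpy as np

x = np.array([0.0, 1.0, 4.0, 9.0])
result = np.empty_like(x)           # preallocated output buffer

returned = np.sqrt(x, out=result)   # sqrt is applied element by element
print(returned is result)           # True: no new array was allocated
print(result.tolist())              # [0.0, 1.0, 2.0, 3.0]

np.add(x, 1.0, out=x)               # out may be an input: in-place update
print(x.tolist())                   # [1.0, 2.0, 5.0, 10.0]
```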

IX. Broadcasting

Adding two arrays with different dimensions

import numpy as np

a = np.arange(5)
b = np.arange(6).reshape(-1, 1)


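def rep(a, c):
    # Stretch every length-1 axis of a to match the shape of c.
    for i in range(a.ndim - 1, -1, -1):
        if a.shape[i] == c.shape[i]:
            continue
        if a.shape[i] == 1:
            a = a.repeat(c.shape[i], axis=i)
        else:
            raise ValueError("dimension mismatch")
    return a


def add(a, b):
    # Make a the array with fewer dimensions, then pad its shape with leading 1s.
    if a.ndim > b.ndim:
        a, b = b, a
    ashape = [1] * (b.ndim - a.ndim) + list(a.shape)
    a = a.reshape(ashape)
    # The result takes the larger size along each axis.
    cshape = [max(a.shape[i], b.shape[i]) for i in range(a.ndim)]
    c = np.empty(cshape, dtype=np.result_type(a, b))
    a = rep(a, c)
    b = rep(b, c)
    # Add element by element over the flattened data.
    a = a.reshape(-1)
    b = b.reshape(-1)
    cc = c.reshape(-1)  # a view of c, so the writes below land in c
    for i in range(len(cc)):
        cc[i] = a[i] + b[i]
    return c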


print(add(a, b))
print(a + b)
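
NumPy itself does not materialize the repeated data the way the hand-written version does. As a rough sketch, np.broadcast_arrays exposes the stretched operands, and their strides show that the repeated axis costs no extra memory.

```python
import numpy as np

a = np.arange(5)                  # shape (5,)
b = np.arange(6).reshape(-1, 1)   # shape (6, 1)

# broadcast_arrays returns both operands stretched to the common shape.
a2, b2 = np.broadcast_arrays(a, b)
print(a2.shape, b2.shape)         # (6, 5) (6, 5)

# The stretched operands are views with stride 0 along the repeated axis.
print(a2.strides[0], b2.strides[1])   # 0 0

total = a + b
print(total.shape)                # (6, 5)
print(total[5, 4])                # 9

# Shapes that cannot be aligned are rejected.
try:
    np.arange(5) + np.arange(6)
except ValueError as exc:
    print("ValueError:", exc)
```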

X. Combining and splitting ndarrays

np.hstack
np.vstack
np.dstack
np.column_stack
np.row_stack

np.hsplit
np.vsplit
np.dsplit
np.split
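
Here is a brief sketch of how these functions fit together, with small arrays chosen so the shapes are easy to follow. np.row_stack is left out because it behaves like np.vstack.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)
b = a + 10

h = np.hstack((a, b))      # side by side along axis 1
v = np.vstack((a, b))      # on top of each other along axis 0
d = np.dstack((a, b))      # stacked along a new third axis
print(h.shape, v.shape, d.shape)   # (2, 6) (4, 3) (2, 3, 2)

# column_stack turns 1-D inputs into the columns of a 2-D array.
cols = np.column_stack((np.array([1, 2, 3]), np.array([4, 5, 6])))
print(cols.tolist())       # [[1, 4], [2, 5], [3, 6]]

left, right = np.hsplit(h, 2)     # undo hstack: two equal halves
top, bottom = np.vsplit(v, 2)     # undo vstack
print(np.array_equal(left, a), np.array_equal(bottom, b))   # True True

# split cuts along axis 0 by default; the list gives the cut points.
parts = np.split(np.arange(10), [3, 7])
print([p.tolist() for p in parts])   # [[0, 1, 2], [3, 4, 5, 6], [7, 8, 9]]
```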
