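I. Supplementary notes on the basic data types
1.1 Strings
We have already covered the most important string methods. A few more are less critical but still come up fairly often, so they are added here; try to remember them.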

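    # capitalize, swapcase, title
    name = 'taiBai'              # sample value; the original snippet does not define name
    print(name.capitalize())     # Taibai  (first letter upper-case, the rest lower-case)
    print(name.swapcase())       # TAIbAI  (upper and lower case swapped)
    msg = 'taibai say hi'
    print(msg.title())           # Taibai Say Hi  (first letter of every word upper-case)

    # center: centre the content in a field of the given total width, padding with the fill character
    a1 = 'taibai'                # sample value; not defined in the original
    ret2 = a1.center(20, "*")
    print(ret2)                  # *******taibai*******

    # find / index: check whether a substring is present
    a4 = 'dkfjdkfasf54'          # sample value; not defined in the original
    ret6 = a4.find("fjdk", 1, 6)
    print(ret6)                  # 2  (index of the match; -1 when nothing is found)
    # ret61 = a4.index("fjdk", 4, 6)   # same search, but raises ValueError when nothing is found
    # print(ret61)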
1.2 Tuples
Python tuples have one quirk: a single value in parentheses without a trailing comma is not a tuple at all; it keeps the type of that value. Add the comma and it becomes a tuple.

    tu = (1)
    print(tu, type(tu))      # 1 <class 'int'>
    tu1 = ('alex')
    print(tu1, type(tu1))    # alex <class 'str'>
    tu2 = ([1, 2, 3])
    print(tu2, type(tu2))    # [1, 2, 3] <class 'list'>

    tu = (1,)
    print(tu, type(tu))      # (1,) <class 'tuple'>
    tu1 = ('alex',)
    print(tu1, type(tu1))    # ('alex',) <class 'tuple'>
    tu2 = ([1, 2, 3],)
    print(tu2, type(tu2))    # ([1, 2, 3],) <class 'tuple'>
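Tuples also have a couple of methods of their own:
index: finds the index of an element (an optional start/end range can be given). It returns the index of the first match and raises ValueError if the element is not present.

    tu = ('太白', [1, 2, 3, ], 'WuSir', '女神')
    print(tu.index('太白'))  # 0
count: returns how many times an element occurs in the tuple.

    tu = ('太白', '太白', 'WuSir', '吳超')
    print(tu.count('太白'))  # 2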
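Since index raises an error for a missing element while count does not, a lookup is often guarded first. The following is only a small sketch; the target value 'alex' and the -1 fallback are choices made for this illustration, not something the tuple API prescribes.

```python
tu = ('太白', '太白', 'WuSir', '吳超')

# index accepts an optional start position (and end position), like a slice.
print(tu.index('太白', 1))   # 1: the search starts at position 1

# Guard with `in` (or catch ValueError) when the element may be missing.
target = 'alex'              # illustrative value that is not in the tuple
position = tu.index(target) if target in tu else -1
print(position)              # -1
print(tu.count(target))      # 0: count never raises for a missing element
```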
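1.3 Lists
Other list methods:
count: returns how many times an element occurs in the list.

    a = ["q", "w", "q", "r", "t", "y"]
    print(a.count("q"))  # 2
index: returns the index of the first item that matches a value.

    a = ["q", "w", "r", "t", "y"]
    print(a.index("r"))  # 2
sort: sorts the list in place.
reverse: reverses the order of the elements in place.

    a = [2, 1, 3, 4, 5]
    a.sort()      # returns None, so print a itself
    print(a)      # [1, 2, 3, 4, 5]
    a.reverse()   # also returns None, so print a itself
    print(a)      # [5, 4, 3, 2, 1]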
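Because sort and reverse return None, assigning their result is a common slip. The sketch below contrasts them with the built-ins sorted and reversed, which leave the original list alone; the variable names are chosen for this illustration only.

```python
a = [2, 1, 3, 4, 5]
wrong = a.sort()                 # sort() works in place and returns None
print(wrong)                     # None
print(a)                         # [1, 2, 3, 4, 5]

b = [2, 1, 3, 4, 5]
ascending = sorted(b)                  # new list; b is untouched
descending = sorted(b, reverse=True)   # new list in descending order
backwards = list(reversed(b))          # reversed() returns an iterator
print(b)                         # [2, 1, 3, 4, 5]
print(ascending)                 # [1, 2, 3, 4, 5]
print(descending)                # [5, 4, 3, 2, 1]
print(backwards)                 # [5, 4, 3, 1, 2]
```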
Lists can also be concatenated with + and repeated by multiplying with an integer.

    l1 = [1, 2, 3]
    l2 = [4, 5, 6]
    print(l1 + l2)  # [1, 2, 3, 4, 5, 6]
    print(l1 * 3)   # [1, 2, 3, 1, 2, 3, 1, 2, 3]
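Changing a list's size while looping over it
Before getting to the point, try a small exercise first:
Given l1 = [11, 22, 33, 44, 55], delete the elements at odd indexes. (Deleting them one by one by hand is not allowed; l1 is only an example and its contents may vary.)
"That's easy," someone says. "I loop over the list, check each index, and delete the element whenever the index is odd." Fine, go ahead and try it.
The expected result is l1 = [11, 33, 55], but what you actually get is l1 = [11, 33, 44]. Why?
Take this list as the example: when the loop reaches 22 and deletes it, 33, 44 and 55 all move one place forward, so their indexes change from 2, 3, 4 to 1, 2, 3. From then on the loop is checking indexes that no longer point at the elements you meant, and the result comes out wrong.
So how do we fix it? There are three ways: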

Want to see them? Not a chance; come and hear it in class.

Did you really think I would write the answers down?

Ha, not telling.
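So, to summarise:
If you change the size of a list (adding or deleting items) while looping over it, the result is likely to be wrong, or the loop may raise an error.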
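For readers who want something to experiment with, here is a sketch of three approaches that are commonly used for the exercise above. They are not necessarily the three presented in class; the names a, b and c are introduced only for this illustration.

```python
l1 = [11, 22, 33, 44, 55]

# Approach 1: slice deletion removes every odd index in a single step.
a = l1.copy()
del a[1::2]

# Approach 2: loop over the indexes backwards, so a deletion never
# shifts an element that has not been visited yet.
b = l1.copy()
for i in range(len(b) - 1, -1, -1):
    if i % 2 == 1:
        b.pop(i)

# Approach 3: build a new list that keeps only the even indexes.
c = [value for i, value in enumerate(l1) if i % 2 == 0]

print(a)  # [11, 33, 55]
print(b)  # [11, 33, 55]
print(c)  # [11, 33, 55]
```

All three avoid deleting from a list in the same forward pass that is reading it, which is the root of the wrong result.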
1.4 dict
First, a few more dictionary methods for adding, deleting, updating and looking up entries:

    # popitem: up to Python 3.5 it removes an arbitrary key-value pair; from 3.6 on it
    # removes the pair inserted last. It returns the removed pair as a tuple.
    dic = {'name': '太白', 'age': 18}
    ret = dic.popitem()
    print(ret, dic)  # ('age', 18) {'name': '太白'}

    # update with keyword arguments
    dic = {'name': '太白', 'age': 18}
    dic.update(sex='男', height=175)
    print(dic)  # {'name': '太白', 'age': 18, 'sex': '男', 'height': 175}

    # update with an iterable of (key, value) pairs
    dic = {'name': '太白', 'age': 18}
    dic.update([(1, 'a'), (2, 'b'), (3, 'c'), (4, 'd')])
    print(dic)  # {'name': '太白', 'age': 18, 1: 'a', 2: 'b', 3: 'c', 4: 'd'}

    # update with another dict: existing keys are overwritten, new keys are added
    dic1 = {"name": "jin", "age": 18, "sex": "male"}
    dic2 = {"name": "alex", "weight": 75}
    dic1.update(dic2)
    print(dic1)  # {'name': 'alex', 'age': 18, 'sex': 'male', 'weight': 75}
    print(dic2)  # {'name': 'alex', 'weight': 75}
fromkeys: creates a dictionary whose keys come from an iterable and whose values are all the same single value.

    dic = dict.fromkeys('abcd', '太白')
    print(dic)  # {'a': '太白', 'b': '太白', 'c': '太白', 'd': '太白'}

    dic = dict.fromkeys([1, 2, 3], '太白')
    print(dic)  # {1: '太白', 2: '太白', 3: '太白'}

    # Pitfall: be careful when the value passed to fromkeys is mutable,
    # because every key then refers to the same object.
    dic = dict.fromkeys([1, 2, 3], [])
    dic[1].append(666)
    print(id(dic[1]), id(dic[2]), id(dic[3]))  # three identical ids: one shared list
    print(dic)  # {1: [666], 2: [666], 3: [666]}
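One way to sidestep the shared-value pitfall is a dict comprehension, which evaluates [] once per key. This is a minimal sketch; the names shared and separate are made up for the comparison.

```python
# fromkeys stores the very same list object under every key.
shared = dict.fromkeys([1, 2, 3], [])
shared[1].append(666)
print(shared)                    # {1: [666], 2: [666], 3: [666]}
print(shared[1] is shared[2])    # True

# A comprehension creates a fresh list for each key.
separate = {key: [] for key in [1, 2, 3]}
separate[1].append(666)
print(separate)                  # {1: [666], 2: [], 3: []}
print(separate[1] is separate[2])  # False
```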
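Changing a dictionary's size while looping over it
Let's start with another small exercise. Given this dictionary:
dic = {'k1':'太白','k2':'barry','k3': '白白', 'age': 18}, delete every key-value pair whose key contains 'k'. Deleting them one by one by hand won't do, because the dictionary is only an example and its contents are not fixed. So what do you do? You loop over all the keys and delete the ones that match, right? Good, you have taken the bait... er, I mean you are on the right track. Show me what you've got.

    dic = {'k1': '太白', 'k2': 'barry', 'k3': '白白', 'age': 18}
    for i in dic:
        if 'k' in i:
            del dic[i]   # deleting while iterating over dic
    print(dic)

You will find that this raises an error:

    RuntimeError: dictionary changed size during iteration

That is, the dictionary changed size while it was being iterated.
What does that mean? Simply this: do not change the size of a dictionary while looping over it; as soon as the size changes, Python raises the error. So how do we get around it?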

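Hahahaha, how naive.
So a dictionary behaves much like a list here, only more strictly. In short:
Do not change the size of a dictionary (adding or deleting entries) while looping over it; doing so raises an error straight away.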
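Two commonly used ways around the RuntimeError are sketched below, offered as a starting point rather than as the course's official answer. The names dic2 and filtered are introduced only for this example.

```python
dic = {'k1': '太白', 'k2': 'barry', 'k3': '白白', 'age': 18}

# Approach 1: loop over a snapshot of the keys, so deleting from dic
# does not disturb the thing being iterated.
for key in list(dic):
    if 'k' in key:
        del dic[key]
print(dic)        # {'age': 18}

# Approach 2: leave the original alone and build a filtered copy.
dic2 = {'k1': '太白', 'k2': 'barry', 'k3': '白白', 'age': 18}
filtered = {key: value for key, value in dic2.items() if 'k' not in key}
print(filtered)   # {'age': 18}
```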
II. Converting between data types
The data types covered so far are int, bool, str, list, tuple, dict and set. Values can be converted between these types; some conversions matter a great deal and others are hardly ever used. This section goes through the important ones.
Converting between int, bool and str

    # int ---> bool
    i = 100
    print(bool(i))   # True   any non-zero number is True
    i1 = 0
    print(bool(i1))  # False  zero is False

    # bool ---> int
    t = True
    print(int(t))    # 1  True --> 1
    t = False
    print(int(t))    # 0  False --> 0

    # int ---> str
    i1 = 100
    print(str(i1))   # 100 (the string '100')

    # str ---> int
    # the string must look like an integer, typically all digits
    s1 = '90'
    print(int(s1))   # 90

    # str ---> bool
    s1 = '太白'
    s2 = ''
    print(bool(s1))  # True   any non-empty string is True
    print(bool(s2))  # False

    # bool ---> str
    t1 = True
    print(str(t1))   # True (the string 'True')
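The str to int conversion is the one that fails at run time, so it is often wrapped in a small guard. to_int below is a hypothetical helper written for this sketch, not a built-in.

```python
def to_int(text, default=None):
    """Hypothetical helper: return int(text), or default if text is not an integer."""
    try:
        return int(text)
    except ValueError:
        return default

print(to_int('90'))     # 90
print(to_int(' 90 '))   # 90   surrounding whitespace is accepted by int()
print(to_int('-5'))     # -5   a leading sign is accepted too
print(to_int('9a'))     # None
print(to_int(''))       # None
```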
Converting between str and list

    # str ---> list
    s1 = 'alex 太白 武大'
    print(s1.split())    # ['alex', '太白', '武大']

    # list ---> str
    # precondition: every element of the list must be a str
    l1 = ['alex', '太白', '武大']
    print(' '.join(l1))  # alex 太白 武大
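The precondition on join is easy to trip over when a list holds numbers or booleans. A minimal sketch, assuming a mixed list made up for the example:

```python
l1 = ['alex', 18, True]   # illustrative list with non-str elements

# join refuses anything that is not a str.
try:
    ' '.join(l1)
except TypeError as exc:
    print('join failed:', exc)

# Converting each element first makes it work.
joined = ' '.join(str(item) for item in l1)
print(joined)                   # alex 18 True

# split with an explicit separator keeps empty fields.
print('a,b,,c'.split(','))      # ['a', 'b', '', 'c']
```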
Converting between list and set

    # list ---> set
    s1 = [1, 2, 3]
    print(set(s1))      # {1, 2, 3}

    # set ---> list
    set1 = {1, 2, 3, 3,}
    print(list(set1))   # [1, 2, 3]  (the duplicate 3 is gone)
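A frequent use of the list to set conversion is removing duplicates. Note that a set does not promise any particular order; when the original order matters, dict.fromkeys is one alternative. The sample list here is made up for the sketch.

```python
l1 = [3, 1, 3, 2, 1]

# Through a set: duplicates disappear, but the order is not guaranteed.
unordered = list(set(l1))
print(sorted(unordered))     # [1, 2, 3]

# Through dict keys: duplicates disappear and first-seen order is kept
# (dicts keep insertion order in Python 3.7+).
ordered = list(dict.fromkeys(l1))
print(ordered)               # [3, 1, 2]
```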
Converting between str and bytes

    # str ---> bytes
    s1 = '太白'
    print(s1.encode('utf-8'))  # b'\xe5\xa4\xaa\xe7\x99\xbd'

    # bytes ---> str
    b = b'\xe5\xa4\xaa\xe7\x99\xbd'
    print(b.decode('utf-8'))   # 太白
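Every value can be converted to a bool.
The values that convert to False are: '', 0, (), {}, [], set(), None
The remaining types can be converted to one another as well; they are not all covered here.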
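The list of falsy values can be checked directly. The second half of this sketch shows a few values that look empty but are not; they are examples picked for illustration.

```python
falsy = ['', 0, (), {}, [], set(), None]
print([bool(value) for value in falsy])
# [False, False, False, False, False, False, False]

# Non-empty containers and non-empty strings are True,
# whatever they happen to contain.
print(bool(' '), bool('0'), bool([0]), bool((None,)))   # True True True True

# This is why an emptiness check is usually written like this:
items = []
if not items:
    print('items is empty')
```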
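III. Summary of the basic data types
By storage footprint (from low to high)
Numbers
Strings
Sets: unordered, so no index information needs to be stored
Tuples: ordered, index information must be stored, immutable
Lists: ordered, index information must be stored, mutable, must support adding, deleting and updating items
Dictionaries: must store the key-to-value mapping, mutable, must support adding, deleting and updating items (insertion order is kept from 3.6 on)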
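By number of values stored
Scalar / atomic types | numbers, strings |
Container types | lists, tuples, dictionaries |
By mutability
Mutable | lists, dictionaries |
Immutable | numbers, strings, tuples, booleans |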
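The mutable/immutable split can be observed directly. This is a small sketch with made-up variable names, showing that string methods return new objects, that two names for one list see the same change, and that a tuple blocks item assignment while still allowing changes to a mutable object stored inside it.

```python
# Immutable: methods return a new object and leave the original alone.
s = 'taibai'
upper = s.upper()
print(s, upper)            # taibai TAIBAI

# Mutable: two names for the same list see the same change.
l1 = [1, 2, 3]
alias = l1
alias.append(4)
print(l1)                  # [1, 2, 3, 4]
print(alias is l1)         # True

# A tuple cannot be reassigned item by item ...
tu = (1, [2, 3])
try:
    tu[0] = 100
except TypeError as exc:
    print('tuple is immutable:', exc)

# ... but a mutable object stored inside it can still change.
tu[1].append(4)
print(tu)                  # (1, [2, 3, 4])
```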
By access method
Direct access | numbers |
Sequential access (sequence types) | strings, lists, tuples |
Key access (mapping types) | dictionaries |
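IV. More on encodings
We covered encodings a couple of days ago, so you should already have some feel for them. A quick recap first:
An encoding is essentially a codebook: it records the correspondence between binary values and characters. The encodings in use include:
ASCII: maps English letters, digits and special characters to binary values.
a 01100001 one character, one byte.
GBK: maps Chinese characters (plus English letters, digits and special characters) to binary values.
a 01100001 a character from ASCII: one character, one byte.
中 11010110 11010000 a Chinese character: one character, two bytes.
Unicode: maps the characters of every writing system in the world to binary values (shown here as four bytes per character, as in UTF-32).
a 00000000 00000000 00000000 01100001
b 00000000 00000000 00000000 01100010
中 00000000 00000000 01001110 00101101
UTF-8: also covers the characters of every writing system, using a variable number of bytes (at least 8 bits, that is one byte, per character).
a 01100001 a character from ASCII: one character, one byte.
é 11000011 10101001 (European scripts such as Portuguese and Spanish): one character, two bytes.
中 11100100 10111000 10101101 Asian scripts: one character, three bytes.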
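The byte counts in this recap can be checked from Python itself. The sketch below prints the real bit patterns; bits is a hypothetical helper written for this example, 'é' is a sample European character chosen here, and 'utf-32-be' is used only as a stand-in for "Unicode as four bytes per character".

```python
def bits(data):
    """Hypothetical helper: show each byte of data as 8 binary digits."""
    return ' '.join(f'{byte:08b}' for byte in data)

print(bits('a'.encode('ascii')))        # 01100001
print(bits('中'.encode('gbk')))         # 11010110 11010000
print(bits('中'.encode('utf-32-be')))   # 00000000 00000000 01001110 00101101

# UTF-8 uses 1, 2 or 3 bytes for these three characters.
for ch in ['a', 'é', '中']:
    encoded = ch.encode('utf-8')
    print(ch, len(encoded), bits(encoded))
# a 1 01100001
# é 2 11000011 10101001
# 中 3 11100100 10111000 10101101
```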
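With that recap done, here are two more points worth knowing:
1. In memory, text is handled as Unicode. When it has to be saved to disk or sent over a network, it is converted to a non-Unicode encoding such as UTF-8.
There is no need to dig deeper; treat it as a rule. For example, when you edit a file in an editor (Word, WPS and so on), the data in the file, which is in a non-Unicode encoding (UTF-8 or GBK, depending on the editor's settings), is converted to Unicode characters as it is read into memory. You edit it there, and when you save, the Unicode is converted back to a non-Unicode encoding (UTF-8, GBK and so on) and written to the file.
2. Data in different encodings cannot be read directly as one another.
Suppose you encode the message '老鐵沒毛病' as UTF-8 and send it to a friend. What you send is the binary 01010101... produced by the UTF-8 encoding. To read it, your friend has to turn that binary back into Chinese characters, and that must also be done with UTF-8. Decoding it with GBK instead will produce garbled text or an error.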
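The second point can be reproduced in a few lines. This sketch encodes the message as UTF-8 and then tries both decoders; whether the GBK attempt raises or returns garbage depends on the bytes, so both outcomes are handled. The variable names are made up for the example.

```python
message = '老鐵沒毛病'
data = message.encode('utf-8')        # what actually travels over the wire

# Decoding with the same encoding gives the text back.
print(data.decode('utf-8'))           # 老鐵沒毛病

# Decoding with a different encoding either raises ...
try:
    print(data.decode('gbk'))
except UnicodeDecodeError as exc:
    print('gbk could not decode it:', exc)

# ... or, if errors are replaced instead of raised, yields garbled text.
garbled = data.decode('gbk', errors='replace')
print(garbled == message)             # False
```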
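With those two points in mind, we come to the most important part of this section.
Prerequisite: Python 3.x (Python 2.x behaves differently).
Main use: storing or transmitting data.
As just mentioned, text in memory is Unicode, and when it has to be saved to disk or sent over a network it is converted to a non-Unicode encoding such as UTF-8.
Take network transmission as the example:
Before going on, one point needs stating: the 'data' referred to here is, strictly speaking, data of a (special) string type. Some of you will ask: Python has many data types, int, bool, list, dict, str and so on; can't I send a list over the network to my classmate Xiao Ming? Strictly speaking, no. You must first convert the list to that special string type, and only then can it be sent. The same goes for storing data.
That makes things a little clearer: the data you store or transmit is a special string type. Then why not just send a string as it is? Say I have '今晚10點吃雞,大吉大利'. Isn't that a string? Can't I send it straight to Xiao Ming? No. The point you have missed is the word "special". Why? Because a str lives in memory as Unicode, and Unicode text is not what gets written to disk or put on the wire.
So what is the solution? Convert the str to the bytes type before storing or sending it.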
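So what kind of type is bytes? It is simply another of Python's basic data types.
bytes is almost identical to str. Look at the source stub for bytes below: the methods it offers are much the same as those of str.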

class bytes(object): """ bytes(iterable_of_ints) -> bytes bytes(string, encoding[, errors]) -> bytes bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer bytes(int) -> bytes object of size given by the parameter initialized with null bytes bytes() -> empty bytes object Construct an immutable array of bytes from: - an iterable yielding integers in range(256) - a text string encoded using the specified encoding - any object implementing the buffer API. - an integer """ def capitalize(self): # real signature unknown; restored from __doc__ """ B.capitalize() -> copy of B Return a copy of B with only its first character capitalized (ASCII) and the rest lower-cased. """ pass def center(self, width, fillchar=None): # real signature unknown; restored from __doc__ """ B.center(width[, fillchar]) -> copy of B Return B centered in a string of length width. Padding is done using the specified fill character (default is a space). """ pass def count(self, sub, start=None, end=None): # real signature unknown; restored from __doc__ """ B.count(sub[, start[, end]]) -> int Return the number of non-overlapping occurrences of subsection sub in bytes B[start:end]. Optional arguments start and end are interpreted as in slice notation. """ return 0 def decode(self, *args, **kwargs): # real signature unknown """ Decode the bytes using the codec registered for encoding. encoding The encoding with which to decode the bytes. errors The error handling scheme to use for the handling of decoding errors. The default is 'strict' meaning that decoding errors raise a UnicodeDecodeError. Other possible values are 'ignore' and 'replace' as well as any other name registered with codecs.register_error that can handle UnicodeDecodeErrors. """ pass def endswith(self, suffix, start=None, end=None): # real signature unknown; restored from __doc__ """ B.endswith(suffix[, start[, end]]) -> bool Return True if B ends with the specified suffix, False otherwise. 
With optional start, test B beginning at that position. With optional end, stop comparing B at that position. suffix can also be a tuple of bytes to try. """ return False def expandtabs(self, tabsize=8): # real signature unknown; restored from __doc__ """ B.expandtabs(tabsize=8) -> copy of B Return a copy of B where all tab characters are expanded using spaces. If tabsize is not given, a tab size of 8 characters is assumed. """ pass def find(self, sub, start=None, end=None): # real signature unknown; restored from __doc__ """ B.find(sub[, start[, end]]) -> int Return the lowest index in B where subsection sub is found, such that sub is contained within B[start,end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. """ return 0 @classmethod # known case def fromhex(cls, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ """ Create a bytes object from a string of hexadecimal numbers. Spaces between two numbers are accepted. Example: bytes.fromhex('B9 01EF') -> b'\\xb9\\x01\\xef'. """ pass def hex(self): # real signature unknown; restored from __doc__ """ B.hex() -> string Create a string of hexadecimal numbers from a bytes object. Example: b'\xb9\x01\xef'.hex() -> 'b901ef'. """ return "" def index(self, sub, start=None, end=None): # real signature unknown; restored from __doc__ """ B.index(sub[, start[, end]]) -> int Return the lowest index in B where subsection sub is found, such that sub is contained within B[start,end]. Optional arguments start and end are interpreted as in slice notation. Raises ValueError when the subsection is not found. """ return 0 def isalnum(self): # real signature unknown; restored from __doc__ """ B.isalnum() -> bool Return True if all characters in B are alphanumeric and there is at least one character in B, False otherwise. 
""" return False def isalpha(self): # real signature unknown; restored from __doc__ """ B.isalpha() -> bool Return True if all characters in B are alphabetic and there is at least one character in B, False otherwise. """ return False def isdigit(self): # real signature unknown; restored from __doc__ """ B.isdigit() -> bool Return True if all characters in B are digits and there is at least one character in B, False otherwise. """ return False def islower(self): # real signature unknown; restored from __doc__ """ B.islower() -> bool Return True if all cased characters in B are lowercase and there is at least one cased character in B, False otherwise. """ return False def isspace(self): # real signature unknown; restored from __doc__ """ B.isspace() -> bool Return True if all characters in B are whitespace and there is at least one character in B, False otherwise. """ return False def istitle(self): # real signature unknown; restored from __doc__ """ B.istitle() -> bool Return True if B is a titlecased string and there is at least one character in B, i.e. uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return False otherwise. """ return False def isupper(self): # real signature unknown; restored from __doc__ """ B.isupper() -> bool Return True if all cased characters in B are uppercase and there is at least one cased character in B, False otherwise. """ return False def join(self, *args, **kwargs): # real signature unknown; NOTE: unreliably restored from __doc__ """ Concatenate any number of bytes objects. The bytes whose method is called is inserted in between each pair. The result is returned as a new bytes object. Example: b'.'.join([b'ab', b'pq', b'rs']) -> b'ab.pq.rs'. """ pass def ljust(self, width, fillchar=None): # real signature unknown; restored from __doc__ """ B.ljust(width[, fillchar]) -> copy of B Return B left justified in a string of length width. 
Padding is done using the specified fill character (default is a space). """ pass def lower(self): # real signature unknown; restored from __doc__ """ B.lower() -> copy of B Return a copy of B with all ASCII characters converted to lowercase. """ pass def lstrip(self, *args, **kwargs): # real signature unknown """ Strip leading bytes contained in the argument. If the argument is omitted or None, strip leading ASCII whitespace. """ pass @staticmethod # known case def maketrans(*args, **kwargs): # real signature unknown """ Return a translation table useable for the bytes or bytearray translate method. The returned table will be one where each byte in frm is mapped to the byte at the same position in to. The bytes objects frm and to must be of the same length. """ pass def partition(self, *args, **kwargs): # real signature unknown """ Partition the bytes into three parts using the given separator. This will search for the separator sep in the bytes. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it. If the separator is not found, returns a 3-tuple containing the original bytes object and two empty bytes objects. """ pass def replace(self, *args, **kwargs): # real signature unknown """ Return a copy with all occurrences of substring old replaced by new. count Maximum number of occurrences to replace. -1 (the default value) means replace all occurrences. If the optional argument count is given, only the first count occurrences are replaced. """ pass def rfind(self, sub, start=None, end=None): # real signature unknown; restored from __doc__ """ B.rfind(sub[, start[, end]]) -> int Return the highest index in B where subsection sub is found, such that sub is contained within B[start,end]. Optional arguments start and end are interpreted as in slice notation. Return -1 on failure. 
""" return 0 def rindex(self, sub, start=None, end=None): # real signature unknown; restored from __doc__ """ B.rindex(sub[, start[, end]]) -> int Return the highest index in B where subsection sub is found, such that sub is contained within B[start,end]. Optional arguments start and end are interpreted as in slice notation. Raise ValueError when the subsection is not found. """ return 0 def rjust(self, width, fillchar=None): # real signature unknown; restored from __doc__ """ B.rjust(width[, fillchar]) -> copy of B Return B right justified in a string of length width. Padding is done using the specified fill character (default is a space) """ pass def rpartition(self, *args, **kwargs): # real signature unknown """ Partition the bytes into three parts using the given separator. This will search for the separator sep in the bytes, starting and the end. If the separator is found, returns a 3-tuple containing the part before the separator, the separator itself, and the part after it. If the separator is not found, returns a 3-tuple containing two empty bytes objects and the original bytes object. """ pass def rsplit(self, *args, **kwargs): # real signature unknown """ Return a list of the sections in the bytes, using sep as the delimiter. sep The delimiter according which to split the bytes. None (the default value) means split on ASCII whitespace characters (space, tab, return, newline, formfeed, vertical tab). maxsplit Maximum number of splits to do. -1 (the default value) means no limit. Splitting is done starting at the end of the bytes and working to the front. """ pass def rstrip(self, *args, **kwargs): # real signature unknown """ Strip trailing bytes contained in the argument. If the argument is omitted or None, strip trailing ASCII whitespace. """ pass def split(self, *args, **kwargs): # real signature unknown """ Return a list of the sections in the bytes, using sep as the delimiter. sep The delimiter according which to split the bytes. 
None (the default value) means split on ASCII whitespace characters (space, tab, return, newline, formfeed, vertical tab). maxsplit Maximum number of splits to do. -1 (the default value) means no limit. """ pass def splitlines(self, *args, **kwargs): # real signature unknown """ Return a list of the lines in the bytes, breaking at line boundaries. Line breaks are not included in the resulting list unless keepends is given and true. """ pass def startswith(self, prefix, start=None, end=None): # real signature unknown; restored from __doc__ """ B.startswith(prefix[, start[, end]]) -> bool Return True if B starts with the specified prefix, False otherwise. With optional start, test B beginning at that position. With optional end, stop comparing B at that position. prefix can also be a tuple of bytes to try. """ return False def strip(self, *args, **kwargs): # real signature unknown """ Strip leading and trailing bytes contained in the argument. If the argument is omitted or None, strip leading and trailing ASCII whitespace. """ pass def swapcase(self): # real signature unknown; restored from __doc__ """ B.swapcase() -> copy of B Return a copy of B with uppercase ASCII characters converted to lowercase ASCII and vice versa. """ pass def title(self): # real signature unknown; restored from __doc__ """ B.title() -> copy of B Return a titlecased version of B, i.e. ASCII words start with uppercase characters, all remaining cased characters have lowercase. """ pass def translate(self, *args, **kwargs): # real signature unknown """ Return a copy with each character mapped by the given translation table. table Translation table, which must be a bytes object of length 256. All characters occurring in the optional argument delete are removed. The remaining characters are mapped through the given translation table. 
""" pass def upper(self): # real signature unknown; restored from __doc__ """ B.upper() -> copy of B Return a copy of B with all ASCII characters converted to uppercase. """ pass def zfill(self, width): # real signature unknown; restored from __doc__ """ B.zfill(width) -> copy of B Pad a numeric string B with zeros on the left, to fill a field of the specified width. B is never truncated. """ pass def __add__(self, *args, **kwargs): # real signature unknown """ Return self+value. """ pass def __contains__(self, *args, **kwargs): # real signature unknown """ Return key in self. """ pass def __eq__(self, *args, **kwargs): # real signature unknown """ Return self==value. """ pass def __getattribute__(self, *args, **kwargs): # real signature unknown """ Return getattr(self, name). """ pass def __getitem__(self, *args, **kwargs): # real signature unknown """ Return self[key]. """ pass def __getnewargs__(self, *args, **kwargs): # real signature unknown pass def __ge__(self, *args, **kwargs): # real signature unknown """ Return self>=value. """ pass def __gt__(self, *args, **kwargs): # real signature unknown """ Return self>value. """ pass def __hash__(self, *args, **kwargs): # real signature unknown """ Return hash(self). """ pass def __init__(self, value=b'', encoding=None, errors='strict'): # known special case of bytes.__init__ """ bytes(iterable_of_ints) -> bytes bytes(string, encoding[, errors]) -> bytes bytes(bytes_or_buffer) -> immutable copy of bytes_or_buffer bytes(int) -> bytes object of size given by the parameter initialized with null bytes bytes() -> empty bytes object Construct an immutable array of bytes from: - an iterable yielding integers in range(256) - a text string encoded using the specified encoding - any object implementing the buffer API. - an integer # (copied from class doc) """ pass def __iter__(self, *args, **kwargs): # real signature unknown """ Implement iter(self). 
""" pass def __len__(self, *args, **kwargs): # real signature unknown """ Return len(self). """ pass def __le__(self, *args, **kwargs): # real signature unknown """ Return self<=value. """ pass def __lt__(self, *args, **kwargs): # real signature unknown """ Return self<value. """ pass def __mod__(self, *args, **kwargs): # real signature unknown """ Return self%value. """ pass def __mul__(self, *args, **kwargs): # real signature unknown """ Return self*value.n """ pass @staticmethod # known case of __new__ def __new__(*args, **kwargs): # real signature unknown """ Create and return a new object. See help(type) for accurate signature. """ pass def __ne__(self, *args, **kwargs): # real signature unknown """ Return self!=value. """ pass def __repr__(self, *args, **kwargs): # real signature unknown """ Return repr(self). """ pass def __rmod__(self, *args, **kwargs): # real signature unknown """ Return value%self. """ pass def __rmul__(self, *args, **kwargs): # real signature unknown """ Return self*value. """ pass def __str__(self, *args, **kwargs): # real signature unknown """ Return str(self). """ pass
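So what exactly are the differences and connections between str and bytes? The table below compares them.
Item | str | bytes | Notes |
Name | str, string, text literal | bytes, bytes literal | Different; told apart as text versus bytes |
Unit | character | byte | Different |
Literal form | '' or "" or ''' ''' or """ """ | b'' or b"" or b''' ''' or b""" """ | Different; a bytes literal puts b (or B) in front of the quotes |
Appearance | English: 'alex' Chinese: '中国' | English: b'alex' Chinese: b'\xe4\xb8\xad\xe5\x9b\xbd' | A bytes literal shows ASCII characters directly, but anything outside ASCII appears as hexadecimal escapes, which are hard to read. |
Encoding | Unicode | a specified encoding other than Unicode, such as UTF-8 or GBK | Different |
Methods | upper, lower, split and so on | upper, lower, split and so on | Almost the same |
Escaping | prefix r makes a raw literal (escape sequences are not processed) | prefix r makes a raw literal (escape sequences are not processed) | Same |
Main use | a basic Python data type for holding small amounts of everyday data | records an object as a sequence of binary bytes; what that sequence stands for (which characters, say) is decided by the encoding used to decode it. In Python 3, bytes is typically used for network transmission and for saving binary images and files | bytes is what is used for storing data and sending it over a network |
More | ...... | ...... |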
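There is no need to memorise all of the above; a passing acquaintance is enough for some of it, but a few points do need to be understood:
bytes is also called byte text. Its main uses are sending data over a network and storing data. Some of you will surely ask: since bytes is so close to str, with very similar methods and just a b in front of the literal, why does Python need both types? Can't I just use bytes?
Developing with bytes alone would be awkward, because any text outside ASCII is shown only as hexadecimal escapes.

    s1 = '中国'
    b1 = b'\xe4\xb8\xad\xe5\x9b\xbd'  # the same text encoded as UTF-8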
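A few quick experiments make the difference between the two types concrete. This sketch uses the simplified spelling '中国' because its UTF-8 bytes are exactly the b'\xe4\xb8\xad\xe5\x9b\xbd' shown in this section; the checks themselves are chosen for illustration.

```python
s1 = '中国'
b1 = s1.encode('utf-8')

# str counts characters, bytes counts bytes.
print(len(s1), len(b1))    # 2 6

# Indexing a str gives a character; indexing bytes gives an int (0-255).
print(s1[0])               # 中
print(b1[0])               # 228  (0xe4)

# Many str-like methods exist on bytes, but they work on ASCII only.
print(b'alex'.upper())     # b'ALEX'

# The two types never mix implicitly.
try:
    s1 + b1
except TypeError as exc:
    print('cannot mix str and bytes:', exc)
```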
Good. By now you should have a rough idea of bytes and of how it compares with str, so the question we set out to answer can finally be tackled: converting between str and bytes.
A str cannot be written to a file or sent out directly; as the figure above showed, it has to be converted to bytes first.
str ----> bytes

    # encode: convert a str to bytes
    s1 = '中国'
    b1 = s1.encode('utf-8')   # bytes in the UTF-8 encoding
    print(s1)  # 中国
    print(b1)  # b'\xe4\xb8\xad\xe5\x9b\xbd'

    s1 = '中国'
    b1 = s1.encode('gbk')     # bytes in the GBK encoding
    print(s1)  # 中国
    print(b1)  # b'\xd6\xd0\xb9\xfa'
bytes ---> str

    # decode: convert bytes to a str
    b1 = b'\xe4\xb8\xad\xe5\x9b\xbd'
    s1 = b1.decode('utf-8')
    print(s1)  # 中国
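Returning to the earlier question about sending a list: one common route is list to str to bytes on the sending side and the reverse on the receiving side. The sketch below uses the standard json module for the list/str step; that choice, and the names scores, payload and received, are assumptions of this example rather than something these notes prescribe.

```python
import json

scores = [11, 22, 33]

# Sender: list -> str -> bytes
text = json.dumps(scores)             # '[11, 22, 33]'
payload = text.encode('utf-8')        # b'[11, 22, 33]'
print(type(payload), payload)

# Receiver: bytes -> str -> list, using the same encoding
received = json.loads(payload.decode('utf-8'))
print(received)                       # [11, 22, 33]
```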
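There is one more thing here, the most important of all and a headache you will meet often at work: converting GBK-encoded data to UTF-8-encoded data. "Teacher, I'm a bit lost. What does that mean?" Let's walk through it. bytes is byte text, and its encoding is a non-Unicode one, which can be GBK, UTF-8, GB2312 and so on.

    b1 = b'\xe4\xb8\xad\xe5\x9b\xbd'  # 中国 as UTF-8 bytes
    b2 = b'\xd6\xd0\xb9\xfa'          # 中国 as GBK bytes
So how do you turn GBK bytes into UTF-8 bytes?
Data in different encodings cannot be read directly as one another.
As said above, different encodings cannot read one another directly. "Not directly" leaves room for "indirectly", so how? What do all the encodings in existence have in common? They all relate to Unicode, the universal character set, so the conversion has to go through Unicode: decode the GBK bytes to a str, then encode that str as UTF-8.
The figure below shows the idea.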
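In code, going through Unicode means one decode followed by one encode. A minimal sketch, using the GBK and UTF-8 byte strings quoted in this section:

```python
gbk_bytes = b'\xd6\xd0\xb9\xfa'          # 中国 encoded as GBK

# Step 1: GBK bytes -> str (Unicode)
text = gbk_bytes.decode('gbk')
print(text)                              # 中国

# Step 2: str (Unicode) -> UTF-8 bytes
utf8_bytes = text.encode('utf-8')
print(utf8_bytes)                        # b'\xe4\xb8\xad\xe5\x9b\xbd'

# The reverse direction works the same way.
print(utf8_bytes.decode('utf-8').encode('gbk') == gbk_bytes)   # True
```

The intermediate str is what makes the two encodings comparable; there is no direct bytes-to-bytes step.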