Below are the three ways of importing data that I commonly use in my studies.
Method 1: read the file with the csv module
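import csv
import numpy as np

# Read every row of the CSV file into a list, then convert the list to an array
with open('ML2017Data/testTarget.csv', 'r', newline='') as c:
    file = csv.reader(c)
    data_set = []
    for line in file:
        data_set.append(line)
data_set = np.array(data_set)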
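The code above reads the text of the CSV file row by row. Each row is a list of fields obtained by splitting on the comma delimiter ','. The values in line are all of type string.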
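To see the string typing concretely, here is a small sketch that parses an in-memory CSV instead of a file on disk. The sample text and the name `sample` are made up for illustration and are not taken from the data sets used in this post.

```python
import csv
import io

# Hypothetical two-row CSV held in memory, standing in for a real file
sample = io.StringIO("1,2.5,abc\n4,5.5,def\n")

rows = []
for line in csv.reader(sample):
    rows.append(line)

# Every field comes back as a str, even the ones that look numeric
print(rows[0])           # ['1', '2.5', 'abc']
print(type(rows[0][0]))  # <class 'str'>
```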
To convert the string data to float:
import csv

c = open('ml-latest-small/ratings.csv', 'r')
file = csv.reader(c)
data_set = []
for line in file:
    # skip the first line (the header row)
    if file.line_num == 1:
        continue
    # change the strings to floats
    line = list(map(float, line))
    data_set.append(line)
c.close()
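The same idea can be written a little more compactly. In this sketch, `next(reader, None)` consumes the header row instead of checking `line_num`, and the function accepts any open text file, so it can be called inside a `with open(...)` block that closes the file automatically. The function name `read_float_rows` and the sample data are my own assumptions for the example.

```python
import csv
import io

def read_float_rows(f):
    """Read CSV rows from an open text file, skip the header, return lists of floats."""
    reader = csv.reader(f)
    next(reader, None)  # drop the header row; None avoids an error on an empty file
    return [list(map(float, row)) for row in reader]

# Hypothetical sample; the column names are invented for illustration
sample = io.StringIO("a,b,c\n1,31,2.5\n1,1029,3.0\n")
data_set = read_float_rows(sample)
print(data_set)  # [[1.0, 31.0, 2.5], [1.0, 1029.0, 3.0]]
```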
Method 2: read the file with numpy. The numpy package has to be imported first.
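import numpy as np
# delimiter=',' is needed for comma-separated files; loadtxt splits on whitespace by default
trainInput_cvs = np.loadtxt('ML2017Data/trainInput.csv', dtype='str', delimiter=',')
trainInput = trainInput_cvs.astype('float')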
This method returns the data as a numpy array (ndarray).
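If the file is comma-separated and has a header row, `np.loadtxt` can handle both through its `delimiter` and `skiprows` parameters, and it parses the fields as floats by default. A minimal sketch with made-up in-memory data:

```python
import io
import numpy as np

# Hypothetical CSV text with one header row
sample = io.StringIO("x,y\n1,2.5\n3,4.5\n")

# delimiter=',' splits on commas; skiprows=1 skips the header row
arr = np.loadtxt(sample, delimiter=',', skiprows=1)

print(type(arr).__name__)  # ndarray
print(arr.dtype)           # float64
print(arr.shape)           # (2, 2)
```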
Method 3: read the data with pandas
import pandas as pd
ratings = pd.read_csv('ml-latest-small/ratings.csv')
# read_csv already parses numeric columns; .values converts the DataFrame to a numpy array
dataset = ratings.values
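By default `read_csv` treats the first row as column names and infers a numeric dtype for each column, so no manual string-to-float step is needed. A small sketch on made-up data follows; it uses `to_numpy()`, which the pandas documentation recommends over `.values`.

```python
import io
import pandas as pd

# Hypothetical CSV text; the column names are invented for this example
sample = io.StringIO("a,b\n1,2.5\n3,4.5\n")

df = pd.read_csv(sample)  # first row becomes the column names
print(list(df.columns))   # ['a', 'b']

dataset = df.to_numpy()   # mixed int/float columns are upcast to float64
print(dataset.dtype)      # float64
print(dataset.shape)      # (2, 2)
```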
