SKlearning大部分的输入数据都是M * N数组.
然而我们从数据库或文件读取得来的通常是Python内定的类型tuple或list
它们的优势就不说了,但是直接把list或tuple构成的二维数组传入scikit是会出问题的.
如:
1
2
|
DeprecationWarning:
Passing 1d arrays as data is deprecated in 0.17
and will raise ValueError in 0.19.
Reshape your data either using X.reshape(-1,
1) if your
data has a single feature or X.reshape(1, -1) if it
contains a single sample. DeprecationWarning) |
下面贴上如何把list/tuple转为scikit使用的array
首先, 准备数据如下:
读取一行数据变为一维数组
conn = sql.connect('result_sale.db') conn.text_factory = str dataSet = conn.execute('select * from sampleData') tpRows = dataSet.fetchone() conn.close() print type(tpRows) print tpRows lstRows = list(tpRows) aryRows1 = np.array(lstRows) # 转成数组 #aryRows2 = np.array(lstRows).reshape(1, -1) # 转成1行N列 (二维数组) #aryRows3 = np.array(lstRows).reshape(-1, 1) # 转成N行1列 (二维数组) print lstRows print aryRows1
输入如下: 请留意输入的不同点 :)
1
2
3
4
5
|
( '00' , '01' , '02' , '03' , '04' , '05' , '06' , '07' , '08' )
(tuple) [ '00' , '01' , '02' , '03' , '04' , '05' , '06' , '07' , '08' ]
(list) [ '00' '01' '02' '03' '04' '05' '06' '07' '08' ]
(array) Process
finished with exit code 0 |
一次性转换整个数据集
conn = sql.connect('result_sale.db') conn.text_factory = str dataSet = conn.execute('select * from sampleData') tpRows = dataSet.fetchall() conn.close() aryRows1 = np.array(tpRows) # 转成数组 #aryRows2 = np.array(tpRows).reshape(1, -1) # 转成1行N列 (二维数组) #aryRows3 = np.array(tpRows).reshape(-1, 1) # 转成N行1列 (二维数组) print aryRows1 #print aryRows2 #print aryRows3
输入如下:
1
2
3
4
5
6
7
8
9
10
11
|
[[ '00' '01' '02' '03' '04' '05' '06' '07' '08' ] [ '10' '11' '12' '13' '14' '15' '16' '17' '18' ] [ '20' '21' '22' '23' '24' '25' '26' '27' '28' ] [ '30' '31' '32' '33' '34' '35' '36' '37' '38' ] [ '40' '41' '42' '43' '44' '45' '46' '47' '48' ] [ '50' '51' '52' '53' '54' '55' '56' '57' '58' ] [ '60' '61' '62' '63' '64' '65' '66' '67' '68' ] [ '70' '71' '72' '73' '74' '75' '76' '77' '78' ] [ '80' '81' '82' '83' '84' '85' '86' '87' '88' ]] Process
finished with exit code 0 |
逐条纪录转换, 可以用下标来引用数组
conn = sql.connect('result_sale.db') conn.text_factory = str dataSet = conn.execute('select * from sampleData') tpRows = dataSet.fetchall() conn.close() #aryRows = np.zeros([len(tpRows), len(tpRows[0])]) aryRows = np.ones_like(tpRows) #亦可使用 empty, empty_like, zeros, zeros_like 等方法 j=0 for row in tpRows: aryRows[j][:] = row j += 1 print aryRows
输入如下:
1
2
3
4
5
6
7
8
9
10
11
|
[[ '00' '01' '02' '03' '04' '05' '06' '07' '08' ] [ '10' '11' '12' '13' '14' '15' '16' '17' '18' ] [ '20' '21' '22' '23' '24' '25' '26' '27' '28' ] [ '30' '31' '32' '33' '34' '35' '36' '37' '38' ] [ '40' '41' '42' '43' '44' '45' '46' '47' '48' ] [ '50' '51' '52' '53' '54' '55' '56' '57' '58' ] [ '60' '61' '62' '63' '64' '65' '66' '67' '68' ] [ '70' '71' '72' '73' '74' '75' '76' '77' '78' ] [ '80' '81' '82' '83' '84' '85' '86' '87' '88' ]] Process
finished with exit code 0 |