Saving arrays and matrices to a CSV file


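When training models, we usually process the data into a matrix first and then hand it to whoever builds the model. The person cleaning the data typically delivers a file containing the saved matrix, usually TXT or CSV. Below I go over some problems I ran into while doing this.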

import numpy
# my_matrix is the array to save, built earlier during data cleaning
numpy.savetxt('new.csv', my_matrix, delimiter=',')

Look at the code above: we would normally save the data with just these two lines. In practice, though, two kinds of problems tend to come up:

1. Type mismatch error

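The cause: by default, savetxt() assumes every data element is numeric and formats each one with '%.18e'. As soon as it meets a str or any other non-numeric element, it raises a TypeError reporting a mismatch between the array dtype and the format specifier. The fix is to specify the format explicitly. The correct code is as follows: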

import numpy
numpy.savetxt('new.csv', my_matrix, fmt='%s', delimiter=',')
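
The following is a minimal sketch that reproduces the type mismatch and the fix side by side. The sample matrix, the temporary directory and the variable names are assumptions made for illustration; they are not from the original post.

```python
import os
import tempfile

import numpy

# Hypothetical sample data: a matrix whose elements are strings.
my_matrix = numpy.array([["alice", "1.5"], ["bob", "2.0"]])

out_dir = tempfile.mkdtemp()
path = os.path.join(out_dir, "new.csv")

# With the default fmt='%.18e', savetxt cannot format string elements.
try:
    numpy.savetxt(path, my_matrix, delimiter=",")
    default_fmt_failed = False
except TypeError as err:
    default_fmt_failed = True
    print("default fmt failed:", err)

# fmt='%s' formats each element as a string, so non-numeric data is accepted.
numpy.savetxt(path, my_matrix, fmt="%s", delimiter=",")

with open(path) as f:
    saved_lines = f.read().splitlines()
print(saved_lines)
```

Note that fmt='%s' also applies to numeric columns, so floats are written in their default string form rather than in '%.18e' notation.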

2. Part of the data is lost when writing to a CSV file

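The cause: CSV uses the comma as its default separator, so any comma inside your data is read as a field boundary, and savetxt does not quote or escape it. Before saving, replace the commas in the data or escape them.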
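
As a rough illustration of the comma problem, the sketch below first shows how a field containing a comma gets split after savetxt, then shows one possible workaround. The workaround swaps in Python's standard csv module, which quotes such fields; it is an alternative to the replace-or-escape approach described above, not the method used in the original post. The sample data is hypothetical.

```python
import csv
import io

import numpy

# Hypothetical sample data: the first field contains a comma.
my_matrix = numpy.array([["Smith, John", "42"], ["Lee", "7"]])

# savetxt writes the comma as-is, so the first row appears to have 3 fields.
naive = io.StringIO()
numpy.savetxt(naive, my_matrix, fmt="%s", delimiter=",")
naive_fields = naive.getvalue().splitlines()[0].split(",")
print(naive_fields)

# csv.writer quotes any field that contains the delimiter.
buffer = io.StringIO()
csv.writer(buffer).writerows(my_matrix.tolist())

# Reading it back with csv.reader recovers the original two columns.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

When writing to a real file instead of an in-memory buffer, the csv documentation recommends opening it with newline='' so that line endings are not translated twice.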

For reference, here is the signature of savetxt:

numpy.savetxt

numpy.savetxt(fname, X, fmt='%.18e', delimiter=' ', newline='\n', header='', footer='', comments='# ')

Save an array to a text file.

Parameters:

fname : filename or file handle

If the filename ends in .gz, the file is automatically saved in compressed gzip format. loadtxt understands gzipped files transparently.

X : array_like

Data to be saved to a text file.

fmt : str or sequence of strs, optional

A single format (%10.5f), a sequence of formats, or a multi-format string, e.g. 'Iteration %d -- %10.5f', in which case delimiter is ignored. For complex X, the legal options for fmt are:

  1. a single specifier, fmt='%.4e', resulting in numbers formatted like ' (%s+%sj)' % (fmt, fmt)

  2. a full string specifying every real and imaginary part, e.g. ' %.4e %+.4ej %.4e %+.4ej %.4e %+.4ej' for 3 columns

  3. a list of specifiers, one per column - in this case, the real and imaginary part must have separate specifiers, e.g. ['%.3e + %.3ej', '(%.15e%+.15ej)'] for 2 columns

delimiter : str, optional

String or character separating columns.

newline : str, optional

String or character separating lines.

New in version 1.5.0.

header : str, optional

String that will be written at the beginning of the file.

New in version 1.7.0.

footer : str, optional

String that will be written at the end of the file.

New in version 1.7.0.

comments : str, optional

String that will be prepended to the header and footer strings, to mark them as comments. Default: '# ', as expected by e.g. numpy.loadtxt.

New in version 1.7.0.
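
To make the parameters above more concrete, here is a small sketch that combines a per-column fmt sequence with header and comments. The sample array and column names are assumptions chosen for illustration. Passing comments='' is one way to get a plain CSV header row without the default '# ' prefix.

```python
import io

import numpy

# Hypothetical sample data: an id column and a value column.
data = numpy.array([[1, 2.5], [2, 3.25]])

buf = io.StringIO()
numpy.savetxt(
    buf,
    data,
    fmt=["%d", "%.2f"],   # one format per column
    delimiter=",",
    header="id,value",    # written before the data rows
    comments="",          # suppress the default '# ' prefix on the header
)
csv_text = buf.getvalue()
print(csv_text)
```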

  

Notes

Further explanation of the fmt parameter (%[flag]width[.precision]specifier):

flags:

- : left justify

+ : Forces to precede result with + or -.

0 : Left pad the number with zeros instead of space (see width).

width:
Minimum number of characters to be printed. The value is not truncated if it has more characters.
precision:
  • For integer specifiers (eg. d,i,o,x), the minimum number of digits.
  • For e, E and f specifiers, the number of digits to print after the decimal point.
  • For g and G, the maximum number of significant digits.
  • For s, the maximum number of characters.
specifiers:

c : character

d or i : signed decimal integer

e or E : scientific notation with e or E.

f : decimal floating point

g,G : use the shorter of e,E or f

o : signed octal

s : string of characters

u : unsigned decimal integer

x,X : unsigned hexadecimal integer

This explanation of fmt is not complete, for an exhaustive specification see [R280].
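
The flags, width and precision described in the notes can be tried out with a short sketch like the one below. The helper name render and the sample values are hypothetical; the helper formats a one-row array with a given fmt and returns the resulting line.

```python
import io

import numpy

# Hypothetical sample values used to compare format strings.
values = numpy.array([[3.14159, 42.0]])


def render(fmt):
    """Format `values` with savetxt using `fmt` and return the written line."""
    buf = io.StringIO()
    numpy.savetxt(buf, values, fmt=fmt, delimiter=",")
    return buf.getvalue().rstrip("\n")


fixed = render("%.2f")      # precision: 2 digits after the decimal point
signed = render("%+.1e")    # '+' flag with scientific notation
padded = render("%08.3f")   # '0' flag with a minimum width of 8
general = render("%g")      # shorter of e or f notation

for line in (fixed, signed, padded, general):
    print(line)
```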

