Original link: http://tecdat.cn/?p=3014
Preface

A forecast is a statement about what will happen, made on the basis of information about past and current states.

Everyone solves forecasting problems every day, with varying degrees of success: predicting the weather, harvests, energy consumption, the movements of foreign-exchange (forex) currency pairs or stocks, earthquakes, and many other things. ...
Predictive Analytics

Through classification, deep learning can establish a correlation between, say, the pixels in an image and a person's name; you could call this static prediction. By the same token, exposed to enough of the right data, deep learning can establish correlations between current events and future events. In a sense, future events act like labels. Deep learning does not particularly care about time, or about the fact that an event has not yet happened. Given a time series, deep learning can read a string of numbers and predict the number most likely to occur next.
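The sliding-window idea behind "read a string of numbers and predict the next one" can be sketched without any deep-learning framework. The function and variable names below are illustrative, and a simple least-squares linear model stands in for the neural network; the windowing step is the same either way.

```python
import numpy as np

def make_windows(series, window=3):
    """Split a sequence into (input window, next value) training pairs."""
    X, y = [], []
    for i in range(len(series) - window):
        X.append(series[i:i + window])
        y.append(series[i + window])
    return np.array(X), np.array(y)

series = [1, 2, 3, 4, 5, 6, 7, 8]
X, y = make_windows(series, window=3)

# Fit a least-squares linear model: next value ~ w . window + b
A = np.hstack([X, np.ones((len(X), 1))])          # append a bias column
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
pred = A[-1] @ coef                               # predict from the last window
```

A recurrent network would replace the linear fit, but the "window of past values in, next value out" framing is identical.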
Sample data:

2011001;3;9;20;24;26;32;10
2011002;6;8;12;17;28;33;5
2011003;13;14;21;22;23;27;4
2011004;4;6;8;10;13;26;5
2011005;6;9;12;14;20;22;13
2011006;1;3;5;13;16;18;5
2011007;1;9;17;24;26;31;5
2011008;10;12;13;17;24;31;15
2011009;17;18;23;24;25;26;4
2011010;1;4;5;9;15;19;13
2011011;1;12;18;19;21;24;10
2011012;7;8;11;13;15;26;13
2011013;1;3;13;16;21;22;8
2011014;5;7;10;11;23;26;1
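Judging by the later code (six red balls plus one blue ball per draw), each semicolon-separated record appears to be a draw ID, six red-ball numbers, and one blue-ball number. Under that assumption, a record can be parsed like this:

```python
record = "2011001;3;9;20;24;26;32;10"
fields = record.split(";")

draw_id = fields[0]                         # draw identifier, e.g. "2011001"
red_balls = [int(x) for x in fields[1:7]]   # six red balls (assumed layout)
blue_ball = int(fields[7])                  # one blue ball (assumed layout)
```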
Screenshot:
import random

# Ball pools (ranges assumed for a 6-red-plus-1-blue lottery)
redBalls = list(range(1, 34))   # red balls: 1-33
blueBalls = list(range(1, 17))  # blue balls: 1-16

for x in range(0, 6):  # NUM_OF_RED = 6
    choice_num_red = random.choice(redBalls)
    print(choice_num_red)
    redBalls.remove(choice_num_red)  # draw without replacement

for y in range(0, 1):  # NUM_OF_BLUE = 1
    choice_num_blue = random.choice(blueBalls)
    print(choice_num_blue)

# scipy test code
# matplotlib test
# print(pylab.plot(abs(b)))
# show()
# from matplotlib.mlab import normpdf
# import matplotlib.numerix as nx
# import pylab as p
#
# x = nx.arange(-4, 4, 0.01)
# y = normpdf(x, 0, 1)  # unit normal
# p.plot(x, y, color='red', lw=2)
# p.show()
plt.plot(dfs_blue_balls_count_values, 'x', label='Dot plot')
plt.legend()
plt.ylabel('Y-axis, number of blue balls')
plt.xlabel('X-axis, number of duplications')
plt.show()

# Jitter plot
idx_min = min(dfs_blue_balls_count_values)
idx_max = max(dfs_blue_balls_count_values)
idx_len = idx_max - idx_min
print("min:", idx_min, "max:", idx_max)
num_jitter = 0
samplers = random.sample(range(idx_min, idx_max), idx_len)
while num_jitter < 5:  # lots of jitter effect
    samplers += random.sample(range(idx_min, idx_max), idx_len)
    num_jitter += 1
print("samplers:", samplers)
# plt.plot(samplers, 'ro', label='Jitter plot')
# plt.ylabel('Y-axis, number of blue balls')
# plt.xlabel('X-axis, number of duplications')
# plt.legend()
# plt.show()

# Histograms and kernel density estimates.
# Scott's rule assumes the data follow a Gaussian distribution.
# Plot how often each blue ball number appears.
# @see http://pandas.pydata.org/pandas-docs/dev/basics.html#value-counts-histogramming
num_of_bin = len(series_blue_balls_value_counts)
array_of_ball_names = series_blue_balls_value_counts.keys()
print("Blue ball names:", array_of_ball_names)
list_merged_by_ball_id = []
for x in range(0, num_of_bin):  # xrange in the original Python 2 code
    num_index = x + 1.5
    list_merged_by_ball_id += [num_index] * dfs_blue_balls_count_values[x]
print("list_merged_by_ball_id:", list_merged_by_ball_id)

# Histogram plotting
plt.hist(list_merged_by_ball_id, bins=num_of_bin)
plt.xlabel('Histogram: blue ball number')
plt.ylabel('Histogram: count of appearances')
plt.show()

# Gaussian KDE
# CDF (the cumulative distribution function)
from scipy.stats import cumfreq
idx_max = max(dfs_blue_balls_count_values)
hi = idx_max
a = numpy.arange(hi) ** 2
# for nbins in (2, 20, 100):
for nbins in dfs_blue_balls_count_values:
    cf = cumfreq(a, nbins)  # bin values, lowerlimit, binsize, extrapoints
    w = hi / nbins
    x = numpy.linspace(w / 2, hi - w / 2, nbins)  # care
    # print(x, cf)
    plt.plot(x, cf[0], label=str(nbins))
plt.legend()
plt.xlabel('CDF: number of appearances by blue ball number')
plt.ylabel('CDF: cumulative count of appearances')
plt.show()

# Optional: comparing distributions with probability plots and QQ plots.
# A quantile plot of the data is a graph of the CDF with the x and y axes interchanged.
# Probability plot of the data set against a standard normal distribution:
# @see http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html
import scipy.stats as stats
prob_measurements = numpy.random.normal(loc=20, scale=5, size=num_of_bin)
stats.probplot(prob_measurements, dist="norm", plot=plt)
plt.show()
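The CDF step above leans on scipy's cumfreq. As a minimal illustration of the same idea, an empirical CDF can also be computed directly with numpy (the names here are illustrative, not part of the original script):

```python
import numpy as np

def ecdf(values):
    """Empirical CDF: sorted values and their cumulative proportions."""
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)  # step heights 1/n, 2/n, ..., 1
    return x, y

x, y = ecdf([3, 1, 2, 2])
```

Plotting `x` against `y` with a step style gives the same staircase shape that cumfreq produces bin-wise.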
Roadmap:
Phase I. Graphics: looking at the data;
1. A single variable: shape and distribution; (dot/jitter plots, histograms and kernel density estimates, cumulative distribution functions, rank plots, ...)
2. Two variables: establishing relationships; (scatter plots, conquering noise, log plots, banking, ...)
3. Time as a variable: time-series analysis; (smoothing, correlation, filters, convolution, ...)
4. More than two variables: graphical multivariate analysis; (false-color plots, multi-plots, ...)
5. Intermezzo: a data analysis session; (session, gnuplot, ...)
6. ...
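The smoothing step named in Phase I can be sketched with a simple trailing moving average; this is only one of many smoothers (the roadmap also mentions filters and convolution), and the function name is illustrative:

```python
import numpy as np

def moving_average(series, window=3):
    """Smooth a series by convolving it with a uniform window."""
    kernel = np.ones(window) / window          # equal weights summing to 1
    return np.convolve(series, kernel, mode="valid")

sm = moving_average([1, 2, 3, 4, 5], window=3)
```

`mode="valid"` keeps only positions where the window fully overlaps the data, so the output is shorter than the input by `window - 1` points.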
Phase II. Analytics: modeling the data;
1. Estimation and back-of-the-envelope calculations;
2. Models built from scaling arguments;
3. Analysis with probability models;
4. ...
Phase III. Computation: mining the data;
1. Simulations;
2. Finding clusters;
3. Finding decision trees in the forest;
4. ...
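The "finding clusters" step in Phase III can be illustrated with a tiny k-means on one-dimensional data. This is a generic sketch, not the article's own method; the function name and toy data are made up for the example:

```python
import numpy as np

def kmeans_1d(data, k=2, iters=10, seed=0):
    """Tiny k-means on 1-D data: assign each point to its nearest
    centroid, then recompute each centroid as its cluster mean."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    centroids = rng.choice(data, size=k, replace=False)  # random init
    for _ in range(iters):
        # distance of every point to every centroid, then nearest index
        labels = np.argmin(np.abs(data[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = data[labels == j].mean()
    return centroids, labels

centroids, labels = kmeans_1d([1.0, 1.1, 0.9, 10.0, 10.2, 9.8], k=2)
```

On this toy data the two centroids converge to the means of the low and high groups after a few iterations.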
Phase IV. Applications: using the data;
1. Reporting, BI (business intelligence), dashboards;
2. Financial calculations and modeling;
3. Predictive analytics;
4. ...
=======
References
TensorFlow tutorials for time-series prediction: https://github.com/tgjeon/TensorFlow-Tutorials-for-Time-Series
Further reading:
http://deeplearning4j.org/usingrnns.html
http://www.scriptol.com/programming/list-algorithms.php
http://www.ipedr.com/vol25/54-ICEME2011-N20032.pdf
http://www.brightpointinc.com/flexdemos/chartslicer/chartslicersample.html
http://blog.lookbackon.com/?page_id=2506
http://stats.stackexchange.com/questions/68662/using-deep-learning-for-time-series-prediction