[java] Data Processing



Background:

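There is a set of temperature-versus-time data covering 30 days. It is a CSV file with a header row. Judging from the code below, the first column is a record index, the third column is the temperature, and the fifth column is a quoted timestamp of the form yyyy-MM-dd HH:mm:ss.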

Details: 8k+ records in total over the 30 days, 260+ records per day, and every timestamp is accurate to the second.

The task is to find a pattern in this data so that, given a specific time, we can predict the temperature at that time.

Approach: first plot the data as a time-series chart and see what pattern it follows.

The code is as follows:

import java.awt.Font;
import java.io.BufferedReader;
import java.io.FileReader;

import org.jfree.chart.ChartFactory;
import org.jfree.chart.ChartPanel;
import org.jfree.chart.JFreeChart;
import org.jfree.chart.axis.DateAxis;
import org.jfree.chart.axis.ValueAxis;
import org.jfree.chart.plot.XYPlot;
import org.jfree.data.time.Day;
import org.jfree.data.time.Hour;
import org.jfree.data.time.Minute;
import org.jfree.data.time.Second;
import org.jfree.data.time.TimeSeries;
import org.jfree.data.time.TimeSeriesCollection;
import org.jfree.data.xy.XYDataset;


public class TimeSeriesChart {
    ChartPanel frame1;

    public TimeSeriesChart(){
        XYDataset xydataset = createDataset();
        JFreeChart jfreechart = ChartFactory.createTimeSeriesChart("temperature-time", "time", "temperature", xydataset, true, true, true);
        XYPlot xyplot = (XYPlot) jfreechart.getPlot();
        DateAxis dateaxis = (DateAxis) xyplot.getDomainAxis();
        frame1 = new ChartPanel(jfreechart, true);

        // Label of the horizontal (time) axis
        dateaxis.setLabelFont(new Font("SimHei", Font.BOLD, 14));
        // Tick labels of the time axis
        dateaxis.setTickLabelFont(new Font("SimSun", Font.BOLD, 12));
        // Label of the vertical (temperature) axis
        ValueAxis rangeAxis = xyplot.getRangeAxis();
        rangeAxis.setLabelFont(new Font("SimHei", Font.BOLD, 15));
        // Legend font
        jfreechart.getLegend().setItemFont(new Font("SimHei", Font.BOLD, 15));
        // Chart title font
        jfreechart.getTitle().setFont(new Font("SimSun", Font.BOLD, 20));
    }

    private static XYDataset createDataset()
    {
        TimeSeries timeseries = new TimeSeries("Temperature over time");
        // try-with-resources closes the file even if a line fails to parse
        try (BufferedReader reader = new BufferedReader(new FileReader("C:\\Users\\lichaoxing\\Desktop\\52001848#2018-07-01-00-00-00_2018-07-31-00-00-00.csv")))
        {
            reader.readLine();   // skip the header row
            String line = null;
            //int i = 0;
            while((line = reader.readLine()) != null)
            {
                String item[] = line.split(",");               // CSV is comma-separated, so split on commas
                double value = Double.parseDouble(item[2]);    // column 3: temperature
                String time = item[4].replace("\"", "");       // column 5: "yyyy-MM-dd HH:mm:ss"
                String tmp_split1[] = time.split(" ");         // date part, time part
                String tmp_split2[] = tmp_split1[0].split("-");   // year, month, day
                String tmp_split3[] = tmp_split1[1].split(":");   // hour, minute, second

                // Build a one-second JFreeChart period; note that Day takes (day, month, year)
                Day day = new Day(Integer.parseInt(tmp_split2[2]), Integer.parseInt(tmp_split2[1]), Integer.parseInt(tmp_split2[0]));
                Hour hour = new Hour(Integer.parseInt(tmp_split3[0]), day);
                Minute minute = new Minute(Integer.parseInt(tmp_split3[1]), hour);
                Second second = new Second(Integer.parseInt(tmp_split3[2]), minute);

                timeseries.add(second, value);

                // Uncomment these lines (and "int i" above) to plot only about one day (~260 records)
                //if(i++ > 260)
                //    break;
            }
        }
        catch(Exception e)
        {
            e.printStackTrace();
        }
        TimeSeriesCollection timeseriescollection = new TimeSeriesCollection();
        timeseriescollection.addSeries(timeseries);

        return timeseriescollection;
    }

    public ChartPanel getChartPanel()
    {
        return frame1;
    }
}

 

import java.awt.GridLayout;
import javax.swing.JFrame;

public class tmp {

    public static void main(String[] args) throws Exception
    {
        JFrame frame = new JFrame("Statistics chart");
        frame.setLayout(new GridLayout(1, 1, 10, 10));
        /* Add the time-series line chart */
        frame.add(new TimeSeriesChart().getChartPanel());
        frame.setBounds(50, 50, 800, 600);
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);   // exit the JVM when the window is closed
        frame.setVisible(true);
    }
}

Running it produces the following time-series chart:

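Analysis: apart from a few anomalous points, the curve looks very smooth, but the behavior within each day cannot be seen at this scale. Since the temperature changes in roughly the same way every day, I added an early-exit condition to the while loop (the commented-out code above) to look at a single day.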
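The fixed cut-off of 260 rows only approximates one day, since the number of records per day varies (260+). As an alternative, one could keep exactly the rows whose timestamp falls on a chosen date. This is a minimal sketch that assumes the same column layout as the parsing code (temperature in column 3, quoted yyyy-MM-dd HH:mm:ss timestamp in column 5); OneDayFilter and rowsOn are hypothetical names, not part of the original code.

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: keeps only the CSV rows recorded on one calendar day. */
class OneDayFilter {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Assumes column 3 is the temperature and column 5 a quoted timestamp (header row already skipped). */
    static List<double[]> rowsOn(List<String> csvLines, LocalDate day) {
        List<double[]> out = new ArrayList<>();   // each entry: {second of the day, temperature}
        for (String line : csvLines) {
            String[] item = line.split(",");
            LocalDateTime t = LocalDateTime.parse(item[4].replace("\"", ""), FMT);
            if (t.toLocalDate().equals(day)) {
                out.add(new double[] { t.toLocalTime().toSecondOfDay(), Double.parseDouble(item[2]) });
            }
        }
        return out;
    }
}
```

Filtering by date rather than by count keeps the cut exactly at midnight, whatever the number of records on that day.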

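Analysis: a single day's data is much clearer. It can be regarded as roughly sinusoidal, and if high precision isn't required, it can simply be treated as periodic data.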

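So I take a segment containing one trough (a local minimum) as one period and approximate it with a quadratic function. That quadratic is then fitted using polynomial fitting (a degree-2 PolynomialCurveFitter from Apache Commons Math in the code below).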
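To show what a polynomial fit computes without depending on a library, here is a minimal sketch of an ordinary least-squares quadratic fit that solves the 3x3 normal equations directly. This is a swapped-in, library-free technique for illustration only; the article's code uses Apache Commons Math instead. QuadraticFit is a hypothetical name, and the coefficients come back in the same order PolynomialCurveFitter uses: c0, c1, c2.

```java
/** Hypothetical helper: least-squares fit of y = c0 + c1*x + c2*x^2 via the normal equations. */
class QuadraticFit {
    static double[] fit(double[] x, double[] y) {
        // Normal equations (A^T A) c = A^T y, where row i of A is [1, x_i, x_i^2]
        double[] s = new double[5];   // s[k] = sum of x^k, k = 0..4
        double[] b = new double[3];   // b[k] = sum of x^k * y, k = 0..2
        for (int i = 0; i < x.length; i++) {
            double p = 1;             // p = x_i^k
            for (int k = 0; k < 5; k++) {
                s[k] += p;
                if (k < 3) b[k] += p * y[i];
                p *= x[i];
            }
        }
        double[][] m = new double[3][4];   // augmented matrix [A^T A | A^T y]
        for (int r = 0; r < 3; r++) {
            for (int c = 0; c < 3; c++) m[r][c] = s[r + c];
            m[r][3] = b[r];
        }
        // Gaussian elimination with partial pivoting
        for (int col = 0; col < 3; col++) {
            int piv = col;
            for (int r = col + 1; r < 3; r++)
                if (Math.abs(m[r][col]) > Math.abs(m[piv][col])) piv = r;
            double[] tmp = m[col]; m[col] = m[piv]; m[piv] = tmp;
            for (int r = col + 1; r < 3; r++) {
                double f = m[r][col] / m[col][col];
                for (int c = col; c < 4; c++) m[r][c] -= f * m[col][c];
            }
        }
        // Back substitution
        double[] coef = new double[3];
        for (int r = 2; r >= 0; r--) {
            double v = m[r][3];
            for (int c = r + 1; c < 3; c++) v -= m[r][c] * coef[c];
            coef[r] = v / m[r][r];
        }
        return coef;   // {c0, c1, c2}
    }
}
```

The x values should be small offsets such as hours since the first record. With raw millisecond offsets (one day is about 8.6e7 ms, whose fourth power exceeds 10^31), the sums in the normal equations become so large that precision is lost.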

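The period itself is found from two consecutive occurrences of a local minimum. (The two minimum values will very likely differ, but that doesn't matter: they are only used to estimate the period roughly.)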
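On an in-memory array the same idea takes only a few lines: find a local minimum, find the next one, and take the difference of their indices. The sketch below is a hypothetical array-based version (PeriodFinder is not from the original code, which streams rows from the file instead). It treats a point as a local minimum when it is lower than its predecessor and no higher than its successor.

```java
/** Hypothetical helper: estimates the period, in samples, from two consecutive local minima. */
class PeriodFinder {
    /** Index of the first local minimum at or after start, or -1 if there is none. */
    static int nextLocalMin(double[] v, int start) {
        for (int i = Math.max(start, 1); i < v.length - 1; i++) {
            if (v[i] < v[i - 1] && v[i] <= v[i + 1]) return i;
        }
        return -1;
    }

    /** Number of samples between the first two local minima, or -1 if fewer than two exist. */
    static int periodInSamples(double[] v) {
        int first = nextLocalMin(v, 1);
        if (first < 0) return -1;
        int second = nextLocalMin(v, first + 1);
        return second < 0 ? -1 : second - first;
    }
}
```

On noisy sensor data a small dip can register as a spurious minimum and give a far too short period; smoothing the series first (for example with a moving average) is one way to guard against this.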

The code is as follows:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.IOException;
import java.text.SimpleDateFormat;
import java.util.Date;

import org.apache.commons.math3.fitting.PolynomialCurveFitter;
import org.apache.commons.math3.fitting.WeightedObservedPoints;

// CSV columns used: item[0] = record index, item[2] = temperature, item[4] = quoted timestamp
public class predict_temperature
{
    // Time offsets are fitted in hours: squaring millisecond offsets (up to ~8.6e7 per day)
    // would make the least-squares problem badly conditioned
    private static final double MS_PER_HOUR = 3_600_000.0;

    /*
     * Walks forward from the reader's current position and records two consecutive turning
     * points of the same kind: min -> max -> min if the data starts out falling (flag > 0),
     * max -> min -> max otherwise. Returns {index1, time1, index2, time2}; the row recorded
     * is the first row after each turning point.
     */
    private static String[] observed_data(double flag, BufferedReader reader) throws IOException
    {
        // Negating rising data turns "local maximum" into "local minimum",
        // so one set of loops handles both directions
        double sign = flag > 0 ? 1 : -1;
        String[] i_want = new String[4];
        String line = null;
        double tmp = Double.POSITIVE_INFINITY;

        // 1) Descend to the first turning point and record it
        while((line = reader.readLine()) != null)
        {
            String item[] = line.split(",");
            double value = sign * Double.parseDouble(item[2]);
            if(value <= tmp)
                tmp = value;
            else
            {
                i_want[0] = item[0];
                i_want[1] = item[4].replace("\"", "");
                break;
            }
        }
        // 2) Climb over the opposite turning point (the midpoint); nothing is recorded
        while((line = reader.readLine()) != null)
        {
            double value = sign * Double.parseDouble(line.split(",")[2]);
            if(value >= tmp)
                tmp = value;
            else
                break;
        }
        // 3) Descend to the next turning point of the same kind and record it
        while((line = reader.readLine()) != null)
        {
            String item[] = line.split(",");
            double value = sign * Double.parseDouble(item[2]);
            if(value <= tmp)
                tmp = value;
            else
            {
                i_want[2] = item[0];
                i_want[3] = item[4].replace("\"", "");
                break;
            }
        }
        return i_want;
    }

    public static void main(String[] args) throws Exception
    {
        // Usage: java predict_temperature <csv file> <yyyy/MM/dd> <HH:mm:ss>
        String input_time = args[1] + " " + args[2];
        File file = new File(args[0]);
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
        SimpleDateFormat input_time_format = new SimpleDateFormat("yyyy/MM/dd HH:mm:ss");
        Date input_time_ = input_time_format.parse(input_time);

        WeightedObservedPoints points = new WeightedObservedPoints();
        long time_diff = 0;

        try (BufferedReader reader = new BufferedReader(new FileReader(file)))
        {
            reader.readLine();                   // skip the header row
            reader.mark((int) file.length());    // remember where the data starts

            /* Compute the period */
            double compare_item1 = Double.parseDouble(reader.readLine().split(",")[2]);
            double compare_item2 = Double.parseDouble(reader.readLine().split(",")[2]);
            String[] cycle_result = observed_data(compare_item1 - compare_item2, reader);
            int start_num = Integer.parseInt(cycle_result[0]);
            int end_num = Integer.parseInt(cycle_result[2]);
            Date start_now = sdf.parse(cycle_result[1]);
            Date end_now = sdf.parse(cycle_result[3]);
            int cycle = end_num - start_num;                            // number of points in one period
            long period_ms = end_now.getTime() - start_now.getTime();   // length of one period

            /* Collect the first period of data, with x = hours since the first record */
            reader.reset();
            Date start_time = null;
            String line = null;
            int i = 0;
            while((line = reader.readLine()) != null)
            {
                String item[] = line.split(",");
                double value = Double.parseDouble(item[2]);
                Date now = sdf.parse(item[4].replace("\"", ""));
                if(i == 0)
                    start_time = now;
                points.add((now.getTime() - start_time.getTime()) / MS_PER_HOUR, value);
                if(i++ > cycle)
                    break;
            }

            /* Map the requested time into the first period (floorMod stays non-negative for earlier times) */
            time_diff = Math.floorMod(input_time_.getTime() - start_time.getTime(), period_ms);
        }

        PolynomialCurveFitter fitter = PolynomialCurveFitter.create(2);
        double[] result = fitter.fit(points.toList());   // coefficients in the order c0, c1, c2
        double t = time_diff / MS_PER_HOUR;
        double result_time = result[2] * t * t + result[1] * t + result[0];
        System.out.println(result_time);
    }
}

Let me explain the observed_data method.

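Since we don't know in advance whether the data starts out increasing or decreasing, we first read two consecutive temperatures to decide which it is. That is what these two lines do: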

double compare_item1 = Double.parseDouble(reader.readLine().split(",")[2]);
double compare_item2 = Double.parseDouble(reader.readLine().split(",")[2]);
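A positive difference compare_item1 - compare_item2 means the data starts out falling, so observed_data looks for a minimum first; zero or a negative difference selects the rising branch. If the first two readings happen to be equal, the data is therefore treated as rising regardless of what follows. A hypothetical variant (StartDirection is not part of the original code) that skips equal leading readings could look like this, using the same sign convention: +1 for falling, -1 for rising.

```java
/** Hypothetical helper: decides the starting direction of a series, skipping equal leading values. */
class StartDirection {
    /** Returns +1 if the series starts out falling, -1 if rising, 0 if it never changes. */
    static int startSign(double[] v) {
        for (int i = 1; i < v.length; i++) {
            double diff = v[i - 1] - v[i];   // same convention as compare_item1 - compare_item2
            if (diff != 0) return diff > 0 ? 1 : -1;
        }
        return 0;
    }
}
```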

 

My method for finding the period is simple. First, find a local minimum (maximum) and record its index and time.

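Then keep moving forward along the curve. The next turning point must be a local maximum (minimum); it is just the midpoint, so nothing is recorded.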

Continuing further, we reach another local minimum (maximum); record its index and time.

Now take the difference between the two records: this gives both the number of points in one period and the period's duration.
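Each record is effectively an (index, timestamp) pair: the number of points per period is the index difference, and the period length is the time difference. A minimal java.time sketch, assuming timestamps in the yyyy-MM-dd HH:mm:ss form used by the data; CyclePeriod is a hypothetical name, and the original code does the same arithmetic with SimpleDateFormat and Date.getTime().

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: turns two recorded turning points into a period (sample count and duration). */
class CyclePeriod {
    private static final DateTimeFormatter FMT = DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /** Number of points in one period: the difference of the two record indices. */
    static int samples(int startIndex, int endIndex) {
        return endIndex - startIndex;
    }

    /** Length of one period: the time between the two recorded timestamps. */
    static Duration length(String startTime, String endTime) {
        return Duration.between(LocalDateTime.parse(startTime, FMT), LocalDateTime.parse(endTime, FMT));
    }
}
```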

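For prediction, take the difference between the requested time and the start time of the data (its first record), reduce it modulo the period length to map it into the first period, and substitute the remainder into the fitted function to obtain an approximate temperature.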
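The mapping step is a modulo on time offsets followed by evaluating the quadratic. A hedged sketch, assuming the coefficients {c0, c1, c2} were fitted with x measured in hours since the first record; PeriodicPredictor and its methods are hypothetical names. It uses Math.floorMod rather than %, because % returns a negative remainder for times earlier than the first record.

```java
/** Hypothetical helper: maps a time into the first period and evaluates the fitted quadratic there. */
class PeriodicPredictor {
    private static final double MS_PER_HOUR = 3_600_000.0;

    /** Offset of tMs from startMs, folded into [0, periodMs); all values in milliseconds. */
    static long foldIntoFirstPeriod(long startMs, long tMs, long periodMs) {
        return Math.floorMod(tMs - startMs, periodMs);
    }

    /** Evaluates c0 + c1*x + c2*x^2 in Horner form, with x = folded offset in hours. */
    static double predict(double[] c, long startMs, long tMs, long periodMs) {
        double x = foldIntoFirstPeriod(startMs, tMs, periodMs) / MS_PER_HOUR;
        return c[0] + x * (c[1] + x * c[2]);
    }
}
```

For example, with a one-day period, a time exactly two days and two hours after the first record maps to x = 2 hours.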

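With that, the temperature can be predicted. For example, run the program with the CSV path, a date (yyyy/MM/dd) and a time (HH:mm:ss) as arguments: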

The actual observed value is:

The predicted value is:

As you can see, the result is reasonably good (although at some time points the error is between 1 and 2).


That's all for this section.

