Spark DataFrame type conversion


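I read a table and want to apply a binarization feature transform to it. Binarization requires the input column to be of type double, so how do we convert the column's type?

A Spark Column can do the conversion directly with cast: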

 

DataFrame dataset = hive.sql("select age,sex,race from hive_race_sex_bucktizer ");

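/**
 * Type conversion: cast the age column to double.
 * Assumes a static import of org.apache.spark.sql.types.DataTypes.DoubleType.
 */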

dataset = dataset.select(dataset.col("age").cast(DoubleType).as("age"),dataset.col("sex"),dataset.col("race"));
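To make the motivation concrete, here is a minimal plain-Java sketch of what binarization does to a single value once it is a double. It is not Spark code: the class name `BinarizeSketch`, the method `binarize`, and the threshold of 30.0 are all assumptions for illustration. The comparison rule (strictly greater than the threshold gives 1.0, otherwise 0.0) follows my understanding of how Spark ML's Binarizer behaves.

```java
// Hypothetical helper, not part of Spark: a plain-Java sketch of thresholding.
class BinarizeSketch {

    // Returns 1.0 when value is strictly greater than threshold, otherwise 0.0.
    static double binarize(double value, double threshold) {
        return value > threshold ? 1.0 : 0.0;
    }

    public static void main(String[] args) {
        long age = 42L;                    // an integer-typed column value
        double ageAsDouble = (double) age; // per-value analogue of cast(DoubleType)
        // The threshold 30.0 is an arbitrary value chosen for this example.
        System.out.println(binarize(ageAsDouble, 30.0));
    }
}
```

The widening conversion from long to double is the per-value analogue of the column cast above; the transformer then only has to compare doubles.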

 

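The cast is a one-liner. Compare that with the way I used to convert types: iterate over the rows to build another RDD that satisfies the type requirements, then create a DataFrame from that RDD. So complicated!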

 

		JavaRDD<Row> parseDataset =   dataset.toJavaRDD().map(new Function<Row,Row>() {

			@Override
			public Row call(Row row) throws Exception {
				// age is read with getLong, so the Hive column is assumed to be a bigint
				long age = row.getLong(row.fieldIndex("age"));
				String sex = row.getAs("sex");
				String race =row.getAs("race");
				double raceV  = -1;
				if("white".equalsIgnoreCase(race)){
					raceV = 1;
				} else if("black".equalsIgnoreCase(race)) {
					raceV = 2;
				} else if("yellow".equalsIgnoreCase(race)) {
					raceV = 3;
				} else if("Asian-Pac-Islander".equalsIgnoreCase(race)) {
					raceV = 4;
				}else if("Amer-Indian-Eskimo".equalsIgnoreCase(race)) {
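					raceV = 3; // NOTE: same code as "yellow" above; use a distinct value if the two categories must stay separate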
				}else {
					raceV = 0;
				}
				
				return RowFactory.create(age,("male".equalsIgnoreCase(sex)?1:0),raceV);
			}
		});
		
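		// Schema for the re-typed rows; createStructField and the *Type constants
		// are assumed to be statically imported from org.apache.spark.sql.types.DataTypes
		StructType schema = new StructType(new StructField[]{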
				 createStructField("_age", LongType, false),
				  createStructField("_sex", IntegerType, false),
				  createStructField("_race", DoubleType, false)
				});
		
		DataFrame  df  =  hive.createDataFrame(parseDataset, schema);
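The per-row logic inside the map function above can also be pulled out into small pure helpers, which makes it testable without a Spark cluster. The following is only a sketch under that assumption: the class name `RowEncodingSketch` and the method names `encodeRace` and `encodeSex` are hypothetical, while the category strings and numeric codes are copied from the code above.

```java
// Hypothetical helper names; only the category strings and codes come from the post.
class RowEncodingSketch {

    // Maps a race label to its numeric code, ignoring case.
    // Unknown or null labels map to 0, matching the final else branch above.
    static double encodeRace(String race) {
        if (race == null) {
            return 0;
        }
        switch (race.toLowerCase(java.util.Locale.ROOT)) {
            case "white":
                return 1;
            case "black":
                return 2;
            case "yellow":
                return 3;
            case "asian-pac-islander":
                return 4;
            case "amer-indian-eskimo":
                return 3; // kept identical to the original mapping
            default:
                return 0;
        }
    }

    // Encodes sex as 1 for "male" (case-insensitive) and 0 for anything else.
    static int encodeSex(String sex) {
        return "male".equalsIgnoreCase(sex) ? 1 : 0;
    }
}
```

With helpers like these, the body of `call` would reduce to reading the three fields and returning `RowFactory.create(age, encodeSex(sex), encodeRace(race))`.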

Keep exploring, keep experimenting!

 

