Fixing ValueError: Some of types cannot be determined by the first 100 rows



Reposted article. 2020-06-17 10:52, Python, by 舊舊的

When converting an RDD to a DataFrame in Spark, you may sometimes see: ValueError: Some of types cannot be determined by the first 100 rows, please try again with sampling

Cause

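The internal structure of an RDD's elements is unknown to Spark: it does not know which fields each element contains or what type each field has. A DataFrame, by contrast, requires complete knowledge of that structure.

When no schema is given, Spark infers the field types from the data. If the first 100 rows are still not enough to determine a field's type (for example, because that field is None in every one of those rows), it raises this error.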
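As a rough diagnostic, and assuming the usual trigger is a column that holds None throughout the inspected rows, the sketch below lists such columns. The helper name `undetermined_columns` and the sample data are hypothetical; in practice the rows would come from something like `rdd.take(100)`.

```python
def undetermined_columns(rows):
    """Return the indexes of columns that are None in every given row."""
    if not rows:
        return []
    width = len(rows[0])
    return [i for i in range(width) if all(row[i] is None for row in rows)]


# Hypothetical sample standing in for rdd.take(100).
sample = [("a", None, 1), ("b", None, 2)]
print(undetermined_columns(sample))  # [1]
```

Any column reported here gives type inference nothing to work with, so it is a natural candidate for an explicit type.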

Solutions

1. Increase the sampling ratio

sqlContext.createDataFrame(rdd, samplingRatio=0.2)

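The samplingRatio parameter is the fraction of rows sampled for type inference. Try 0.2 first; if that is not enough, keep increasing it.

The drawback is that inference only covers the sampled rows. If rows outside the sample hold a different type, the program will crash later; and if the sample still cannot determine a type, it fails all the same.

Hence the second solution.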

2. Explicitly declare the structure of the DataFrame to create, i.e. its schema

# Import the type classes needed to describe each column
# (StructType, StructField, StringType, IntegerType, and so on).
from pyspark.sql.types import StructType, StructField, StringType, IntegerType

# Build the schema
schema = StructType([
    StructField("column_1", StringType(), True),
    StructField("column_2", IntegerType(), True),
    # ... remaining fields
])

# Pass the schema in
df = sqlContext.createDataFrame(rdd, schema=schema)

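Once a schema is declared explicitly and passed to createDataFrame, the samplingRatio parameter is no longer needed.

In real projects, declaring the schema explicitly is the recommended approach, since it avoids errors caused by odd data.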
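To make the explicit-schema approach concrete, here is a minimal sketch under a few assumptions: it takes a `SparkSession` named `spark` (the article itself uses the older `sqlContext`), and `ROWS` and `build_dataframe` are hypothetical names invented for illustration. The sample data leaves the second column as None well past the first 100 rows, which is the kind of input that type inference cannot resolve.

```python
# Hypothetical data: column_2 is None for the first 150 rows.
ROWS = [("a", None)] * 150 + [("b", 7)]


def build_dataframe(spark, rows):
    """Create a DataFrame from rows using an explicit schema.

    `spark` is assumed to be an existing SparkSession.
    """
    # Imported here so the sketch can be loaded without PySpark installed.
    from pyspark.sql.types import (
        IntegerType,
        StringType,
        StructField,
        StructType,
    )

    schema = StructType([
        StructField("column_1", StringType(), True),
        StructField("column_2", IntegerType(), True),
    ])
    rdd = spark.sparkContext.parallelize(rows)
    # No inference takes place: the declared types are used as given.
    return spark.createDataFrame(rdd, schema=schema)
```

With a running session, `build_dataframe(spark, ROWS).printSchema()` should list column_1 as a string and column_2 as an integer, even though column_2 is None in the first 150 rows.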

Author: 舊舊的 <393210556@qq.com>. The way to solve a problem is to solve it once.
