Retraining and parameter tuning
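The file UsedCarsPricePredictionMLModel.training.cs contains the training configuration and the method that trains the model.
The BuildPipeline method holds the training configuration that ML.NET generated automatically: which input columns are used, which field is predicted, and the call to the LightGbm method, with the following options: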
{
NumberOfLeaves=17,
MinimumExampleCountPerLeaf=25,
NumberOfIterations=6019,
MaximumBinCountPerFeature=24,
LearningRate=1F,
LabelColumnName=@"Price",
FeatureColumnName=@"Features",
Booster=new GradientBooster.Options()
{
SubsampleFraction=0.706948120047722F,
FeatureFraction=0.521537449021549F,
L1Regularization=0.00247814105551342F,
L2Regularization=0.00137211480690565F
}
}
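These are the recommended settings that ML.NET generated automatically. If you have some background in machine learning, you can adjust them from this starting point to improve the model.
Reference: LightGbmExtensions.LightGbm method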
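As a sketch of what such a modification might look like, the snippet below rebuilds the same options with a smaller learning rate. The class and helper names (TunedLightGbmOptions, BuildTunedOptions) and the value 0.1F are assumptions chosen for illustration, not a recommendation from ML.NET; every other value is copied from the generated configuration above.

```csharp
using Microsoft.ML.Trainers.LightGbm;

public static class TunedLightGbmOptions
{
    // Hypothetical helper: returns the generated options with one value changed.
    public static LightGbmRegressionTrainer.Options BuildTunedOptions()
    {
        return new LightGbmRegressionTrainer.Options()
        {
            NumberOfLeaves = 17,
            MinimumExampleCountPerLeaf = 25,
            NumberOfIterations = 6019,
            MaximumBinCountPerFeature = 24,
            // Assumption: 0.1F is only an illustrative value; the generated value was 1F.
            LearningRate = 0.1F,
            LabelColumnName = @"Price",
            FeatureColumnName = @"Features",
            Booster = new GradientBooster.Options()
            {
                SubsampleFraction = 0.706948120047722F,
                FeatureFraction = 0.521537449021549F,
                L1Regularization = 0.00247814105551342F,
                L2Regularization = 0.00137211480690565F
            }
        };
    }
}
```

The returned object would take the place of the inline options passed to mlContext.Regression.Trainers.LightGbm in BuildPipeline. Whether a change like this actually helps should be checked by retraining and evaluating on data the model has not seen.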
The complete training code is as follows:
public static IEstimator<ITransformer> BuildPipeline(MLContext mlContext)
{
// Data process configuration with pipeline data transformations
var pipeline = mlContext.Transforms.Categorical.OneHotEncoding(new []{new InputOutputColumnPair(@"Fuel_Type", @"Fuel_Type"),new InputOutputColumnPair(@"Transmission", @"Transmission"),new InputOutputColumnPair(@"Owner_Type", @"Owner_Type")})
.Append(mlContext.Transforms.ReplaceMissingValues(new []{new InputOutputColumnPair(@"Year", @"Year"),new InputOutputColumnPair(@"Kilometers_Driven", @"Kilometers_Driven"),new InputOutputColumnPair(@"Seats", @"Seats")}))
.Append(mlContext.Transforms.Text.FeaturizeText(@"Name", @"Name"))
.Append(mlContext.Transforms.Text.FeaturizeText(@"Location", @"Location"))
.Append(mlContext.Transforms.Text.FeaturizeText(@"Engine", @"Engine"))
.Append(mlContext.Transforms.Text.FeaturizeText(@"Power", @"Power"))
.Append(mlContext.Transforms.Concatenate(@"Features", new []{@"Fuel_Type",@"Transmission",@"Owner_Type",@"Year",@"Kilometers_Driven",@"Seats",@"Name",@"Location",@"Engine",@"Power"}))
.Append(mlContext.Regression.Trainers.LightGbm(new LightGbmRegressionTrainer.Options(){NumberOfLeaves=17,MinimumExampleCountPerLeaf=25,NumberOfIterations=6019,MaximumBinCountPerFeature=24,LearningRate=1F,LabelColumnName=@"Price",FeatureColumnName=@"Features",Booster=new GradientBooster.Options(){SubsampleFraction=0.706948120047722F,FeatureFraction=0.521537449021549F,L1Regularization=0.00247814105551342F,L2Regularization=0.00137211480690565F}}));
return pipeline;
}
You can then call the RetrainPipeline method to train again and obtain a new model:
public static ITransformer RetrainPipeline(MLContext context, IDataView trainData)
{
var pipeline = BuildPipeline(context);
var model = pipeline.Fit(trainData);
return model;
}
Once you have the model, save it to a file:
// Note: use a .txt or .tsv file here
string trainCsvPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TrainData", "train-data.txt");
string testCsvPath = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "TrainData", "test-data2.txt");
string modelDirectory = Path.Combine(AppDomain.CurrentDomain.BaseDirectory, "Model");
string modelPath = Path.Combine(modelDirectory, "UsedCarsPricePredictionMLModel.zip");
MLContext mlContext = new MLContext(seed: 0);
IDataView trainingDataView = mlContext.Data.LoadFromTextFile<ModelInput>(trainCsvPath, hasHeader: true);
var model = UsedCarsPricePredictionMLModel.RetrainPipeline(mlContext, trainingDataView);
if (!Directory.Exists(modelDirectory))
Directory.CreateDirectory(modelDirectory);
mlContext.Model.Save(model, trainingDataView.Schema, modelPath);
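The snippet above defines testCsvPath but never uses it. One possible way to check the retrained model, sketched here, is to score that file and look at the regression metrics. The names ModelEvaluation, EvaluateModel and PrintMetrics are assumptions; the sketch also assumes the test file has the same columns as the training file and that the label column is Price, as in BuildPipeline.

```csharp
using System;
using Microsoft.ML;
using Microsoft.ML.Data;

public static class ModelEvaluation
{
    // Hypothetical helper: scores held-out rows and returns regression metrics.
    public static RegressionMetrics EvaluateModel(MLContext mlContext, ITransformer model, IDataView testData)
    {
        // Apply the trained pipeline to the test rows.
        IDataView predictions = model.Transform(testData);

        // "Price" is the label column used in BuildPipeline;
        // the score column is left at its default name, "Score".
        return mlContext.Regression.Evaluate(predictions, labelColumnName: @"Price");
    }

    // Hypothetical helper: prints a few commonly used metrics.
    public static void PrintMetrics(RegressionMetrics metrics)
    {
        Console.WriteLine($"R^2:  {metrics.RSquared:0.###}");
        Console.WriteLine($"RMSE: {metrics.RootMeanSquaredError:0.###}");
        Console.WriteLine($"MAE:  {metrics.MeanAbsoluteError:0.###}");
    }
}
```

It could be called with a view loaded through mlContext.Data.LoadFromTextFile<ModelInput>(testCsvPath, hasHeader: true) together with the model returned by RetrainPipeline.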
Minor issues
Issue 1:
Property 'Column1' is missing the LoadColumnAttribute attribute
As the message says, every property of the ModelInput input class needs a LoadColumn attribute that indicates which column it is read from.
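A minimal sketch of what the annotated input class might look like. The property names follow the columns used in BuildPipeline; the column indices and the property types are assumptions and have to be adjusted to the actual column order and contents of the training file.

```csharp
using Microsoft.ML.Data;

public class ModelInput
{
    // Assumption: all indices below are illustrative. Replace each one with
    // the real 0-based position of the column in the training file.
    [LoadColumn(0), ColumnName(@"Name")]
    public string Name { get; set; }

    [LoadColumn(1), ColumnName(@"Location")]
    public string Location { get; set; }

    [LoadColumn(2), ColumnName(@"Year")]
    public float Year { get; set; }

    [LoadColumn(3), ColumnName(@"Kilometers_Driven")]
    public float Kilometers_Driven { get; set; }

    [LoadColumn(4), ColumnName(@"Fuel_Type")]
    public string Fuel_Type { get; set; }

    [LoadColumn(5), ColumnName(@"Transmission")]
    public string Transmission { get; set; }

    [LoadColumn(6), ColumnName(@"Owner_Type")]
    public string Owner_Type { get; set; }

    [LoadColumn(7), ColumnName(@"Engine")]
    public string Engine { get; set; }

    [LoadColumn(8), ColumnName(@"Power")]
    public string Power { get; set; }

    [LoadColumn(9), ColumnName(@"Seats")]
    public float Seats { get; set; }

    // Label column predicted by the pipeline.
    [LoadColumn(10), ColumnName(@"Price")]
    public float Price { get; set; }
}
```

The types are guessed from the pipeline: columns passed to FeaturizeText and OneHotEncoding are declared as string, and columns passed to ReplaceMissingValues, plus the Price label, are declared as float.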
Issue 2:
Schema mismatch for input column 'Name_CharExtractor': expected Expected known-size vector of Single, got Vector<Single> Arg_ParamName_Name
According to "ML.NET: Schema mismatch for input column 'AnswerFeaturized_CharExtractor': expected Expected Single or known-size vector of Single, got Vector", the fix is to stop using a .csv file and switch to a .txt or .tsv file.
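LoadFromTextFile uses a tab as its default separatorChar, which fits the advice to use tab-separated .txt or .tsv files. The sketch below spells the separator out so the expected file format is visible in the code. The names TextFileLoading and LoadTabSeparated are assumptions introduced for illustration.

```csharp
using Microsoft.ML;

public static class TextFileLoading
{
    // Hypothetical helper: loads a tab-separated file that has a header row.
    public static IDataView LoadTabSeparated<TInput>(MLContext mlContext, string path)
        where TInput : class
    {
        // '\t' is already the default separatorChar; it is written out here
        // so the expected file format is explicit at the call site.
        return mlContext.Data.LoadFromTextFile<TInput>(path, separatorChar: '\t', hasHeader: true);
    }
}
```

With the ModelInput class from the project, the call would be TextFileLoading.LoadTabSeparated<ModelInput>(mlContext, trainCsvPath).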
Sample code
References
Get started with ML.NET in 10 minutes
Official samples: machinelearning-samples
Tutorial: Predict prices using regression with ML.NET