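Schemaless Mode
Schemaless mode is a set of Solr features that, used together, let users rapidly build an effective schema simply by indexing sample data, without editing the schema by hand. These features are all configured in solrconfig.xml. They are:
Managed schema: schema modifications are made through Solr APIs rather than by manual edits. See Managed Schema Definition in SolrConfig.
Field value class guessing: previously unseen fields are run through a cascading set of value-based parsers, which guess the Java class of the field values. Parsers are available for Boolean, Integer, Long, Float, Double, and Date.
Automatic schema field addition based on field value class: previously unseen fields are added to the schema, with a field type chosen according to the Java class of their values.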
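To make the last two features concrete, here is a simplified Python illustration of a parser cascade followed by a class-to-field-type mapping. It is not Solr's implementation (Solr does this in Java, in update processors configured in solrconfig.xml). The parser order, the accepted formats, folding Integer/Float into Long/Double, and the `booleans` type name are assumptions; the `tlongs`, `tdoubles`, `tdates`, and `text_general` names match the example output later in this section.

```python
import re
from datetime import datetime

# Hypothetical, simplified stand-ins for Solr's value parsers (not Solr code).
# Each returns a Java-style class name if it accepts the whole string, else None.
# Integer and Float are folded into Long and Double to keep the sketch short.

def parse_boolean(value):
    return "Boolean" if value.lower() in ("true", "false") else None

def parse_long(value):
    # Whole numbers that fit in a signed 64-bit Java long.
    if re.fullmatch(r"[+-]?[0-9]+", value) and -2**63 <= int(value) < 2**63:
        return "Long"
    return None

def parse_double(value):
    # Decimal or scientific notation, e.g. "0.01" or "1e3".
    pattern = r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?"
    return "Double" if re.fullmatch(pattern, value) else None

def parse_date(value):
    # Only two ISO-8601 shapes are tried here (an assumption).
    for fmt in ("%Y-%m-%d", "%Y-%m-%dT%H:%M:%SZ"):
        try:
            datetime.strptime(value, fmt)
            return "Date"
        except ValueError:
            continue
    return None

# Order matters: "14" must be claimed by the Long parser before the Double parser sees it.
CASCADE = (parse_boolean, parse_long, parse_double, parse_date)

# Guessed class -> fieldType; "booleans" is assumed, the rest match the example output.
TYPE_MAPPING = {"Boolean": "booleans", "Long": "tlongs", "Double": "tdoubles", "Date": "tdates"}
DEFAULT_FIELD_TYPE = "text_general"  # used when no parser accepts the value (String)

def guess_class(value):
    for parser in CASCADE:
        guessed = parser(value)
        if guessed is not None:
            return guessed
    return "String"

def guess_field_type(value):
    return TYPE_MAPPING.get(guess_class(value), DEFAULT_FIELD_TYPE)

for sample in ("14", "0.01", "1988-08-13", "Old Shews"):
    print(sample, "->", guess_field_type(sample))
# 14 -> tlongs
# 0.01 -> tdoubles
# 1988-08-13 -> tdates
# Old Shews -> text_general
```

The first parser that accepts a value wins, which is why a cascade needs the narrower types (Long) ahead of the wider ones (Double).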
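All three features are preconfigured in the example/example-schemaless/solr/ directory. To use this preconfigured schemaless mode, go to the example directory and start Solr with the following command, which sets the solr.solr.home system property to that directory: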
java -Dsolr.solr.home=example-schemaless/solr -jar start.jar
The schema in example-schemaless/solr/collection1/conf/ starts out with only two fields, id and _version_, as a call to the Schema API's /schema/fields endpoint shows. Running curl http://localhost:8983/solr/schema/fields returns:
{
"responseHeader":{
"status":0,
"QTime":1},
"fields":[{
"name":"_version_",
"type":"long",
"indexed":true,
"stored":true},
{
"name":"id",
"type":"string",
"multiValued":false,
"indexed":true,
"required":true,
"stored":true,
"uniqueKey":true}]}
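If you want to inspect this listing from a script rather than by eye, a small Python helper can pull out each field's type and the uniqueKey field. This is a sketch that works on the response text shown above (pasted in as a string); the helper name `summarize_fields` is hypothetical.

```python
import json

# The /schema/fields response shown above, pasted in as a string.
RESPONSE = """{
"responseHeader":{"status":0,"QTime":1},
"fields":[{"name":"_version_","type":"long","indexed":true,"stored":true},
{"name":"id","type":"string","multiValued":false,"indexed":true,"required":true,"stored":true,"uniqueKey":true}]}"""

def summarize_fields(response_text):
    """Return ({field name: field type}, name of the uniqueKey field or None)."""
    body = json.loads(response_text)
    status = body["responseHeader"]["status"]
    if status != 0:
        raise RuntimeError(f"Schema API returned status {status}")
    types = {f["name"]: f["type"] for f in body["fields"]}
    unique_key = next((f["name"] for f in body["fields"] if f.get("uniqueKey")), None)
    return types, unique_key

types, unique_key = summarize_fields(RESPONSE)
print(types)       # {'_version_': 'long', 'id': 'string'}
print(unique_key)  # id
```

Against a running instance, the same helper could be fed `urllib.request.urlopen("http://localhost:8983/solr/schema/fields").read().decode("utf-8")` instead of the pasted string.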
Now add a CSV document whose fields are not yet in the schema. Each new field is added with a field type based on its value:
curl "http://localhost:8983/solr/update?commit=true" -H "Content-type:application/csv" \
  -d 'id,Artist,Album,Released,Rating,FromDistributor,Sold
44C,Old Shews,Mead for Walking,1988-08-13,0.01,14,0'
The output indicates success:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"><int name="status">0</int><int name="QTime">106</int></lst> </response>
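The same update can be issued from Python's standard library. The sketch below only builds the request that mirrors the curl command above (same URL, header, and CSV body); the function name `build_csv_update` is hypothetical, and actually sending the request, shown in the comment, needs a running Solr.

```python
import urllib.request

def build_csv_update(csv_text, base_url="http://localhost:8983/solr", commit=True):
    """Build (but do not send) a POST equivalent to the curl command above."""
    url = f"{base_url}/update?commit={'true' if commit else 'false'}"
    return urllib.request.Request(
        url,
        data=csv_text.encode("utf-8"),
        headers={"Content-type": "application/csv"},
        method="POST",
    )

CSV_DOC = (
    "id,Artist,Album,Released,Rating,FromDistributor,Sold\n"
    "44C,Old Shews,Mead for Walking,1988-08-13,0.01,14,0"
)

req = build_csv_update(CSV_DOC)
# With Solr running, send it and print the XML response:
#   with urllib.request.urlopen(req) as resp:
#       print(resp.read().decode("utf-8"))
```

Note that `urlopen` raises `urllib.error.HTTPError` for non-2xx responses, such as a status 400 from a rejected document.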
The fields now in the schema (output of curl http://localhost:8983/solr/schema/fields):
{
"responseHeader":{
"status":0,
"QTime":1},
"fields":[{
"name":"Album",
"type":"text_general"}, // Field value guessed as String -> text_general fieldType
{
"name":"Artist",
"type":"text_general"}, // Field value guessed as String -> text_general fieldType
{
"name":"FromDistributor",
"type":"tlongs"}, // Field value guessed as Long -> tlongs fieldType
{
"name":"Rating",
"type":"tdoubles"}, // Field value guessed as Double -> tdoubles fieldType
{
"name":"Released",
"type":"tdates"}, // Field value guessed as Date -> tdates fieldType
{
"name":"Sold",
"type":"tlongs"}, // Field value guessed as Long -> tlongs fieldType
{
"name":"_version_",
...
},
{
"name":"id",
...
}]}
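To see exactly which fields schemaless mode added, one option is to compare the two listings by name. This Python sketch uses the names and types from the two /schema/fields outputs above; attributes elided with "..." in the second listing are left out, and the helper name `added_fields` is hypothetical.

```python
# "fields" entries from the two /schema/fields listings above (only name/type kept).
BEFORE = [
    {"name": "_version_", "type": "long"},
    {"name": "id", "type": "string"},
]
AFTER = [
    {"name": "Album", "type": "text_general"},
    {"name": "Artist", "type": "text_general"},
    {"name": "FromDistributor", "type": "tlongs"},
    {"name": "Rating", "type": "tdoubles"},
    {"name": "Released", "type": "tdates"},
    {"name": "Sold", "type": "tlongs"},
    {"name": "_version_"},  # details elided in the listing
    {"name": "id"},         # details elided in the listing
]

def added_fields(before, after):
    """Return {name: type} for fields present in `after` but not in `before`."""
    known = {f["name"] for f in before}
    # The filter runs first, so f["type"] is only read for newly added fields.
    return {f["name"]: f["type"] for f in after if f["name"] not in known}

for name, field_type in added_fields(BEFORE, AFTER).items():
    print(f"{name}: {field_type}")
```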
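Once a field has been added to the schema, its field type is fixed. For example, after the document above has been added, the Sold field has the field type tlongs, but the document below has a non-integer value (a decimal) in that field: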
curl "http://localhost:8983/solr/update?commit=true" -H "Content-type:application/csv" \
  -d 'id,Description,Sold
19F,Cassettes by the pound,4.93'
The output indicates failure:
<?xml version="1.0" encoding="UTF-8"?> <response> <lst name="responseHeader"> <int name="status">400</int> <int name="QTime">7</int> </lst> <lst name="error"> <str name="msg">ERROR: [doc=19F] Error adding field 'Sold'='4.93' msg=For input string: "4.93"</str> <int name="code">400</int> </lst> </response>
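The failure follows from the fixed type: "4.93" is a valid double but not a valid long. The Python sketch below models that check for the two numeric types in this example only; it is a simplified illustration with a hypothetical helper name, not how Solr validates values internally (Solr's message above comes from Java's number parsing).

```python
import re

# Simplified acceptance checks for the two numeric field types in this example.
# Field types not listed here (text_general, tdates, ...) are not checked.
VALIDATORS = {
    "tlongs": re.compile(r"[+-]?[0-9]+"),
    "tdoubles": re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)([eE][+-]?[0-9]+)?"),
}

def rejected_fields(doc, schema):
    """Return (name, value) pairs whose value does not fit the field's fixed type."""
    bad = []
    for name, value in doc.items():
        pattern = VALIDATORS.get(schema.get(name))
        if pattern is not None and not pattern.fullmatch(value):
            bad.append((name, value))
    return bad

# Field types as they stand after the first document was indexed.
schema = {"Sold": "tlongs", "Rating": "tdoubles", "Released": "tdates"}
doc = {"id": "19F", "Description": "Cassettes by the pound", "Sold": "4.93"}
print(rejected_fields(doc, schema))  # [('Sold', '4.93')]
```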