Customizing Elasticsearch scoring

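In some cases we need to customize the score in order to personalize search. For example, suppose machine learning gives us a feature vector for each user and a feature vector for each product. How do we compute the similarity between these two vectors? The more similar the two vectors are, the higher the score, so the products that most closely match the user are recommended first.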
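As a minimal sketch of the idea, independent of Elasticsearch, the snippet below ranks a few products by cosine similarity to a user vector. The class name `CosineRanking`, the product ids, and the sample vectors are made up for illustration and do not come from the plugin.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the plugin: ranks items by cosine similarity.
final class CosineRanking {

    // Cosine similarity of two equal-length dense vectors; returns 0 when either norm is 0.
    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns product ids ordered from most to least similar to the user vector.
    static List<String> rank(double[] user, Map<String, double[]> products) {
        List<String> ids = new ArrayList<>(products.keySet());
        ids.sort(Comparator.comparingDouble((String id) -> -cosine(user, products.get(id))));
        return ids;
    }

    public static void main(String[] args) {
        Map<String, double[]> products = new LinkedHashMap<>();
        products.put("p1", new double[] {0.0, 1.0});
        products.put("p2", new double[] {1.0, 0.1});
        products.put("p3", new double[] {-1.0, 0.0});
        // Prints [p2, p1, p3]: p2 points almost the same way as the user vector.
        System.out.println(rank(new double[] {1.0, 0.0}, products));
    }
}
```

A scoring plugin does the same thing inside the search engine: it returns the similarity as the document score, and Elasticsearch does the sorting.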

Reading the plugin source

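According to the official documentation, a script must be run through a “ScriptEngine”. To develop a custom plugin, we implement the “ScriptEngine” interface and load our engine through the getScriptEngine() method. See reference [1] for details of the ScriptEngine interface. Below is a concrete example from the official documentation: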

  private static class MyExpertScriptEngine implements ScriptEngine {
    // The name used in the script API to refer to this script backend.
    @Override
    public String getType() {
        return "expert_scripts";
    }

    // Core method; the factory below is implemented with a Java lambda expression.
    @Override
    public <T> T compile(String scriptName, String scriptSource, ScriptContext<T> context, Map<String, String> params) {
        if (context.equals(SearchScript.CONTEXT) == false) {
            throw new IllegalArgumentException(getType() + " scripts cannot be used for context [" + context.name + "]");
        }
        // we use the script "source" as the script identifier
        if ("pure_df".equals(scriptSource)) {
            // p gives access to the values in params; lookup gives access to the values in the document
            SearchScript.Factory factory = (p, lookup) -> new SearchScript.LeafFactory() {
                final String field;
                final String term;
                {
                    if (p.containsKey("field") == false) {
                        throw new IllegalArgumentException("Missing parameter [field]");
                    }
                    if (p.containsKey("term") == false) {
                        throw new IllegalArgumentException("Missing parameter [term]");
                    }
                    field = p.get("field").toString();
                    term = p.get("term").toString();
                }

                @Override
                public SearchScript newInstance(LeafReaderContext context) throws IOException {
                    PostingsEnum postings = context.reader().postings(new Term(field, term));
                    if (postings == null) {
                        // the field and/or term don't exist in this segment, so always return 0
                        return new SearchScript(p, lookup, context) {
                            @Override
                            public double runAsDouble() {
                                return 0.0d;
                            }
                        };
                    }
                    return new SearchScript(p, lookup, context) {
                        int currentDocid = -1;
                        @Override
                        public void setDocument(int docid) {
                            // advance has undefined behavior calling with a docid <= its current docid
                            if (postings.docID() < docid) {
                                try {
                                    postings.advance(docid);
                                } catch (IOException e) {
                                    throw new UncheckedIOException(e);
                                }
                            }
                            currentDocid = docid;
                        }
                        @Override
                        public double runAsDouble() {
                            if (postings.docID() != currentDocid) {
                                // advance moved past the current doc, so this doc has no occurrences of the term
                                return 0.0d;
                            }
                            try {
                                return postings.freq();
                            } catch (IOException e) {
                                throw new UncheckedIOException(e);
                            }
                        }
                    };
                }

                @Override
                public boolean needs_score() {
                    return false;
                }
            };
            return context.factoryClazz.cast(factory);
        }
        throw new IllegalArgumentException("Unknown script name " + scriptSource);
    }

    @Override
    public void close() {
        // optionally close resources
    }
}
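The setDocument/runAsDouble pair in the example relies on a forward-only iterator: advance moves to the first posting at or after the requested document, and a mismatch afterwards means the term does not occur in that document. The sketch below imitates that pattern with plain arrays. `PostingsSketch` is a simplified stand-in written for this illustration, not Lucene's PostingsEnum.

```java
// Simplified stand-in for a postings iterator; this is NOT Lucene's real API.
final class PostingsSketch {
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds; // sorted ids of the documents that contain the term
    private final int[] freqs;  // term frequency in each of those documents
    private int idx = -1;       // -1 means the iterator has not been positioned yet

    PostingsSketch(int[] docIds, int[] freqs) {
        this.docIds = docIds;
        this.freqs = freqs;
    }

    // Current document id, -1 before the first advance, NO_MORE_DOCS when exhausted.
    int docID() {
        if (idx < 0) {
            return -1;
        }
        return idx < docIds.length ? docIds[idx] : NO_MORE_DOCS;
    }

    // Moves forward to the first posting whose document id is >= target.
    int advance(int target) {
        do {
            idx++;
        } while (idx < docIds.length && docIds[idx] < target);
        return docID();
    }

    int freq() {
        return freqs[idx];
    }

    // Mirrors setDocument followed by runAsDouble in the example above.
    double score(int docId) {
        if (docID() < docId) {
            advance(docId); // only ever move forward
        }
        // If advance skipped past docId, the term does not occur in this document.
        return docID() == docId ? freq() : 0.0;
    }
}
```

Documents must be scored in increasing id order, which is how a segment is traversed during a search.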

Based on the code above and our business requirements, we wrote the following scripts:

Script 1

    package com;
    
    import org.apache.logging.log4j.LogManager;
    import org.apache.logging.log4j.Logger;
    import org.apache.lucene.index.LeafReaderContext;
    import org.elasticsearch.script.ScriptContext;
    import org.elasticsearch.script.ScriptEngine;
    import org.elasticsearch.script.SearchScript;
    
    import java.io.IOException;
    import java.util.*;
    
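    /**
     * Created with IntelliJ IDEA.
     * User: 0.0
     * Date: 18-8-9
     * Time: 2:32 PM
     * Description: To personalize search results, we compute the similarity between the user
     * vector and the feature vector of each product. The higher the similarity, the higher
     * the final score and the higher the product ranks.
     */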

    public class FeatureVectorScoreSearchScript implements ScriptEngine {
        private final static Logger logger = LogManager.getLogger(FeatureVectorScoreSearchScript.class);
        @Override
        public String getType() {
            return "feature_vector_scoring_script";
        }
    @Override
    public <T> T compile(String scriptName, String scriptSource, ScriptContext<T> context, Map<String, String> params) {
        logger.info("The feature_vector_scoring_script is calculating the similarity of users and commodities");
        if (!context.equals(SearchScript.CONTEXT)) {
            throw new IllegalArgumentException(getType() + " scripts cannot be used for context [" + context.name + "]");
        }
        if("whb_fvs".equals(scriptSource)) {
            SearchScript.Factory factory = (p, lookup) -> new SearchScript.LeafFactory() {
                // Validate the input parameters
                final Map<String, Object> inputFeatureVector;
                final String field;
                {
                    if (p.containsKey("field") == false) {
                        throw new IllegalArgumentException("Missing parameter [field]");
                    }
                    if(p.containsKey("inputFeatureVector") == false){
                        throw new IllegalArgumentException("Missing parameter [inputFeatureVector]");
                    }
                    field = p.get("field").toString();
                    inputFeatureVector = (Map<String,Object>) p.get("inputFeatureVector");

                }
                @Override
                public SearchScript newInstance(LeafReaderContext context) throws IOException {
                    return new SearchScript(p, lookup, context) {
                        @Override
                        public double runAsDouble() {
                            if(lookup.source().containsKey(field)==true){
                                final Map<String, Double> productFeatureVector = (Map<String, Double>) lookup.source().get(field);
                                return calculateVectorSimilarity(inputFeatureVector, productFeatureVector);
                            }else {
                                logger.info("The field " + field + " does not exist in the product");
                                return 0.0D;
                            }
                        }
                    };
                }

                @Override
                public boolean needs_score() {
                    return false;
                }
            };
            return context.factoryClazz.cast(factory);
        }
        throw new IllegalArgumentException("Unknown script name " + scriptSource);

    }

    @Override
    public void close() {
    }

    // Compute the cosine similarity of two vectors
    public double calculateVectorSimilarity(Map<String, Object> inputFeatureVector, Map<String, Double> productFeatureVector) {
        double sumOfProduct = 0.0D;
        double sumOfUser = 0.0D;
        double dotProduct = 0.0D;
        if (inputFeatureVector != null && productFeatureVector != null) {
            for (Map.Entry<String, Object> entry : inputFeatureVector.entrySet()) {
                String dimName = entry.getKey();
                double dimScore = Double.parseDouble(entry.getValue().toString());
                // A dimension missing from the product vector counts as 0 rather than
                // triggering a NullPointerException when the null value is unboxed
                Double productValue = productFeatureVector.get(dimName);
                double itemDimScore = productValue == null ? 0.0D : productValue;
                sumOfUser += dimScore * dimScore;
                sumOfProduct += itemDimScore * itemDimScore;
                dotProduct += dimScore * itemDimScore;
            }
            if (sumOfUser * sumOfProduct == 0.0D) {
                return 0.0D;
            }
            return dotProduct / (Math.sqrt(sumOfUser) * Math.sqrt(sumOfProduct));
        } else {
            return 0.0D;
        }
    }

    }
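One thing to watch with Script 1 is that values read from a JSON document are not always Double: a stored value such as 1 typically arrives as an Integer, so a map cast to Map<String, Double> can fail when a value is unboxed. The following standalone sketch, which is an assumption-laden variant and not the plugin's code, accepts any Number and treats a dimension missing from one side as 0. Unlike Script 1, it computes the product norm over every product dimension, not only over the dimensions present in the user vector. The class name `SparseCosine` is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical standalone variant of calculateVectorSimilarity, for illustration only.
final class SparseCosine {

    // Cosine similarity over named dimensions; a dimension missing from one side counts as 0.
    static double similarity(Map<String, ? extends Number> user, Map<String, ? extends Number> product) {
        if (user == null || product == null) {
            return 0.0;
        }
        double dot = 0.0;
        double userNorm = 0.0;
        double productNorm = 0.0;
        for (Map.Entry<String, ? extends Number> entry : user.entrySet()) {
            double u = entry.getValue().doubleValue();
            userNorm += u * u;
            Number p = product.get(entry.getKey());
            if (p != null) {
                dot += u * p.doubleValue();
            }
        }
        // The product norm covers all product dimensions, including those the user lacks.
        for (Number p : product.values()) {
            productNorm += p.doubleValue() * p.doubleValue();
        }
        if (userNorm == 0.0 || productNorm == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(userNorm) * Math.sqrt(productNorm));
    }

    public static void main(String[] args) {
        Map<String, Number> user = new HashMap<>();
        user.put("price", 1);      // Integer, as a JSON parser might produce
        user.put("brand", 2.0);    // Double
        Map<String, Number> product = new HashMap<>();
        product.put("price", 1.0);
        product.put("brand", 2);
        System.out.println(similarity(user, product));
    }
}
```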

Script 2 (fast-vector-distance)


package com;

import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.index.LeafReaderContext;
import org.elasticsearch.common.settings.Settings;
import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.plugins.ScriptPlugin;
import org.elasticsearch.script.ScriptContext;
import org.elasticsearch.script.ScriptEngine;
import org.elasticsearch.script.SearchScript;
import org.apache.lucene.index.BinaryDocValues;
import org.apache.lucene.store.ByteArrayDataInput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.DoubleBuffer;
import java.util.*;

/**
 * Created with IntelliJ IDEA.
 * User: 王火斌
 * Date: 18-8-9
 * Time: 2:32 PM
 * Description: To personalize search results, we compute the similarity between the user
 * vector and the feature vector of each product. The higher the similarity, the higher
 * the final score and the higher the product ranks.
 *
 * This class is instantiated when Elasticsearch loads the plugin for the
 * first time. If you change the name of this plugin, make sure to update
 * src/main/resources/es-plugin.properties file that points to this class.
 */
public final class FastVectorDistance extends Plugin implements ScriptPlugin {

    @Override
    public ScriptEngine getScriptEngine(Settings settings, Collection<ScriptContext<?>> contexts) {
        return new FastVectorDistanceEngine();
    }

    private static class FastVectorDistanceEngine implements ScriptEngine {
        private final static Logger logger = LogManager.getLogger(FastVectorDistance.class);
        private static final int DOUBLE_SIZE = 8;

        double queryVectorNorm;

        @Override
        public String getType() {
            return "feature_vector_scoring_script";
        }

        @Override
        public <T> T compile(String scriptName, String scriptSource, ScriptContext<T> context, Map<String, String> params) {
            logger.info("The feature_vector_scoring_script is calculating the similarity of users and commodities");
            if (!context.equals(SearchScript.CONTEXT)) {
                throw new IllegalArgumentException(getType() + " scripts cannot be used for context [" + context.name + "]");
            }
            if ("whb_fvd".equals(scriptSource)) {
                SearchScript.Factory factory = (p, lookup) -> new SearchScript.LeafFactory() {
                    // The field to compare against
                    final String field;
                    //Whether this search should be cosine or dot product
                    final Boolean cosine;
                    //The query embedded vector
                    final Object vector;
                    Boolean exclude;
                    //The final comma delimited vector representation of the query vector
                    double[] inputVector;

                    {
                        if (p.containsKey("field") == false) {
                            throw new IllegalArgumentException("Missing parameter [field]");
                        }

                        //Determine if cosine
                        final Object cosineBool = p.get("cosine");
                        cosine = cosineBool != null ? (boolean) cosineBool : true;

                        //Get the field value from the query
                        field = p.get("field").toString();

                        final Object excludeBool = p.get("exclude");
                        exclude = excludeBool != null ? (boolean) excludeBool : true;

                        //Get the query vector embedding
                        vector = p.get("vector");

                        //Determine if raw comma-delimited vector or embedding was passed
                        if (vector != null) {
                            final ArrayList<Double> tmp = (ArrayList<Double>) vector;
                            inputVector = new double[tmp.size()];
                            for (int i = 0; i < inputVector.length; i++) {
                                inputVector[i] = tmp.get(i);
                            }
                        } else {
                            final Object encodedVector = p.get("encoded_vector");
                            if (encodedVector == null) {
                                throw new IllegalArgumentException("Must have 'vector' or 'encoded_vector' as a parameter");
                            }
                            inputVector = Util.convertBase64ToArray((String) encodedVector);
                        }

                        //If cosine calculate the query vec norm
                        if (cosine) {
                            queryVectorNorm = 0d;
                            // compute query inputVector norm once
                            for (double v : inputVector) {
                                queryVectorNorm += Math.pow(v, 2.0);
                            }
                        }
                    }

                    @Override
                    public SearchScript newInstance(LeafReaderContext context) throws IOException {

                        return new SearchScript(p, lookup, context) {
                            Boolean is_value = false;

                            // Use Lucene LeafReadContext to access binary values directly.
                            BinaryDocValues accessor = context.reader().getBinaryDocValues(field);

                            @Override
                            public void setDocument(int docId) {
                                // advanceExact returns false when this document has no value for the field;
                                // the accessor is null when the segment has no doc values for the field at all
                                try {
                                    is_value = accessor != null && accessor.advanceExact(docId);
                                } catch (IOException e) {
                                    is_value = false;
                                }
                            }


                            @Override
                            public double runAsDouble() {

                                //If there is no field value return 0 rather than fail.
                                if (!is_value) return 0.0d;

                                final int inputVectorSize = inputVector.length;
                                final byte[] bytes;

                                try {
                                    bytes = accessor.binaryValue().bytes;
                                } catch (IOException e) {
                                    return 0d;
                                }


                                final ByteArrayDataInput byteDocVector = new ByteArrayDataInput(bytes);

                                byteDocVector.readVInt();

                                final int docVectorLength = byteDocVector.readVInt(); // returns the number of bytes to read

                                if (docVectorLength != inputVectorSize * DOUBLE_SIZE) {
                                    return 0d;
                                }

                                final int position = byteDocVector.getPosition();

                                final DoubleBuffer doubleBuffer = ByteBuffer.wrap(bytes, position, docVectorLength).asDoubleBuffer();

                                final double[] docVector = new double[inputVectorSize];

                                doubleBuffer.get(docVector);

                                double docVectorNorm = 0d;
                                double score = 0d;

                                //calculate dot product of document vector and query vector
                                for (int i = 0; i < inputVectorSize; i++) {

                                    score += docVector[i] * inputVector[i];

                                    if (cosine) {
                                        docVectorNorm += Math.pow(docVector[i], 2.0);
                                    }
                                }

                                //If cosine, calculate cosine score
                                if (cosine) {

                                    if (docVectorNorm == 0 || queryVectorNorm == 0) return 0d;

                                    score = score / (Math.sqrt(docVectorNorm) * Math.sqrt(queryVectorNorm));
                                }

                                return score;
                            }
                        };
                    }

                    @Override
                    public boolean needs_score() {
                        return false;
                    }
                };
                return context.factoryClazz.cast(factory);
            }
            throw new IllegalArgumentException("Unknown script name " + scriptSource);
        }

        @Override
        public void close() {}
    }
}
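Script 2 calls Util.convertBase64ToArray, which is not shown in this article. The sketch below is one plausible shape for such a helper, assuming the vector is stored as consecutive 8-byte big-endian doubles (the default byte order of ByteBuffer, which is also what the script uses when it wraps the doc value). The class name `VectorCodec` and both method names are hypothetical; the project's own Util class may differ.

```java
import java.nio.ByteBuffer;
import java.util.Base64;

// Hypothetical codec; the project's own Util class may be implemented differently.
final class VectorCodec {

    // Packs each double as 8 big-endian bytes, then Base64-encodes the result.
    static String encode(double[] vector) {
        ByteBuffer buffer = ByteBuffer.allocate(vector.length * Double.BYTES);
        for (double v : vector) {
            buffer.putDouble(v);
        }
        return Base64.getEncoder().encodeToString(buffer.array());
    }

    // Reverses encode: Base64 text back to an array of doubles.
    static double[] decode(String base64) {
        byte[] bytes = Base64.getDecoder().decode(base64);
        if (bytes.length % Double.BYTES != 0) {
            throw new IllegalArgumentException("Byte length is not a multiple of " + Double.BYTES);
        }
        double[] vector = new double[bytes.length / Double.BYTES];
        ByteBuffer.wrap(bytes).asDoubleBuffer().get(vector);
        return vector;
    }

    public static void main(String[] args) {
        String encoded = encode(new double[] {0.25, -1.5, 3.0});
        System.out.println(encoded);
        System.out.println(decode(encoded).length);
    }
}
```

Under this assumption, the same encode step could produce both the value indexed into the binary field and the encoded_vector parameter of a query, as long as both sides use the same number of dimensions.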

Deployment

The plugin is built and packaged with Maven. The steps are as follows:

1. Configure the pom file
Declare the dependencies and set the build output directory.

 <modelVersion>4.0.0</modelVersion>
 <groupId>es-plugin</groupId>
 <artifactId>elasticsearch-plugin</artifactId>
 <version>1.0-SNAPSHOT</version>

 <dependencies>
     <dependency>
         <groupId>org.elasticsearch</groupId>
         <artifactId>elasticsearch</artifactId>
         <version>6.1.1</version>
     </dependency>
     <dependency>
         <groupId>junit</groupId>
         <artifactId>junit</artifactId>
         <version>4.12</version>
         <scope>test</scope>
     </dependency>
 </dependencies>
 <build>
     <plugins>
         <plugin>
             <artifactId>maven-assembly-plugin</artifactId>
             <version>2.3</version>
             <configuration>
                 <appendAssemblyId>false</appendAssemblyId>
                 <outputDirectory>${project.build.directory}/releases/</outputDirectory>
                 <descriptors>
                     <descriptor>${basedir}/src/assembly/plugin.xml</descriptor>
                 </descriptors>
             </configuration>
             <executions>
                 <execution>
                     <phase>package</phase>
                     <goals>
                         <goal>single</goal>
                     </goals>
                 </execution>
             </executions>
         </plugin>
         <plugin>
             <groupId>org.apache.maven.plugins</groupId>
             <artifactId>maven-compiler-plugin</artifactId>
             <configuration>
                 <source>1.8</source>
                 <target>1.8</target>
             </configuration>
         </plugin>
     </plugins>
 </build>

2. Create the assembly descriptor (src/assembly/plugin.xml)

<?xml version="1.0"?>
<assembly>
    <id>plugin</id>
    <formats>
        <format>zip</format>
    </formats>
    <includeBaseDirectory>false</includeBaseDirectory>
    <fileSets>
        <fileSet>
            <directory>${project.basedir}/src/main/resources</directory>
            <outputDirectory>feature-vector-score</outputDirectory>
        </fileSet>
    </fileSets>
    <dependencySets>
        <dependencySet>
            <outputDirectory>feature-vector-score</outputDirectory>
            <useProjectArtifact>true</useProjectArtifact>
            <useTransitiveFiltering>true</useTransitiveFiltering>
            <excludes>
                <exclude>org.elasticsearch:elasticsearch</exclude>
                <exclude>org.apache.logging.log4j:log4j-api</exclude>
            </excludes>
        </dependencySet>
    </dependencySets>
</assembly>

3. Create the plugin-descriptor.properties file

description=feature-vector-similarity
version=1.0
name=feature-vector-score
site=${elasticsearch.plugin.site}
jvm=true
classname=com.FeatureVectorScoreSearchPlugin
java.version=1.8
elasticsearch.version=6.1.1
isolated=${elasticsearch.plugin.isolated}

description:simple summary of the plugin
version(String):plugin’s version
name(String):the plugin name
classname(String):the name of the class to load, fully-qualified.
java.version(String):version of java the code is built against. Use the system property java.specification.version. Version string must be a sequence of nonnegative decimal integers separated by "."'s and may have leading zeros.
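Before packaging, it can be handy to check the descriptor for the keys described above. The following is a small sketch of such a check, written for this article and not taken from Elasticsearch, which performs its own validation when the plugin is installed. The class name `DescriptorCheck` and the wording of the messages are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.regex.Pattern;

// Hypothetical pre-packaging check; Elasticsearch validates the descriptor itself at install time.
final class DescriptorCheck {
    // The keys described in the list above.
    private static final String[] REQUIRED = {"description", "version", "name", "classname", "java.version"};
    // Nonnegative decimal integers separated by dots; leading zeros are allowed.
    private static final Pattern DOTTED_VERSION = Pattern.compile("\\d+(\\.\\d+)*");

    // Returns a list of human-readable problems; an empty list means the checks passed.
    static List<String> problems(String descriptorText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(descriptorText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                problems.add("missing property: " + key);
            }
        }
        String javaVersion = props.getProperty("java.version");
        if (javaVersion != null && !DOTTED_VERSION.matcher(javaVersion.trim()).matches()) {
            problems.add("malformed java.version: " + javaVersion);
        }
        return problems;
    }
}
```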

Testing

Create the index

create_index = {
    "settings": {
        "analysis": {
            "analyzer": {
                # this configures the custom analyzer we need to parse vectors such that the scoring
                # plugin will work correctly
                "payload_analyzer": {
                    "type": "custom",
                    "tokenizer":"whitespace",
                    "filter":"delimited_payload_filter"
                }
            }
        }
    },
    "mappings": {
           "movies": {
            # this mapping definition sets up the metadata fields for the movies
            "properties": {
                "movieId": {
                    "type": "integer"
                },
                "tmdbId": {
                    "type": "keyword"
                },
                "genres": {
                    "type": "keyword"
                },
                "release_date": {
                    "type": "date",
                    "format": "year"
                },
                "@model": {
                    # this mapping definition sets up the fields for movie factor vectors of our model
                    "properties": {
                        "factor": {
                            "type": "binary",
                            "doc_values": True
                        },
                        "version": {
                            "type": "keyword"
                        },
                        "timestamp": {
                            "type": "date"
                        }
                    }
                }
            }}
}}

Query

You can execute the script by specifying its lang as feature_vector_scoring_script and the name of the script, whb_fvd, as the script source:

{
  "query": {
     
     "function_score": {
      "query": {
        "match_all": {  
        }
      },
        "functions": [
          {
            "script_score": {
              "script": {
                  "source": "whb_fvd",
                  "lang" : "feature_vector_scoring_script",
                  "params": {
                      "field": "@model.factor",
                      "cosine": true,
                      "encoded_vector" :"v9EUmGAAAAC/6f9VAAAAAL/j+OOgAAAAv+m6+oAAAAA/lTSDIAAAAL/FdkTAAAAAv7rKHKAAAAA/0iyEYAAAAD/ZUY6gAAAAP7TzYoAAAAA/1K4IAAAAAD+yH9XgAAAAv6QRBSAAAAA/vRiiwAAAAL/mRhzgAAAAv9WxpiAAAAC/8YD+QAAAAL/jpbtgAAAAv+zmD+AAAAC/1eqtIAAAAA==" 
                  }
              }
            }
          }
        ]
    }
  }
}

Version notes

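Elasticsearch has iterated quickly over the past year. The plugins above rely mainly on the SearchScript class, which applies to v5.4 through v6.4. Versions below v5.4 mainly use the ExecutableScript class. Versions above v6.4 introduce a new class, ScoreScript, for implementing custom scoring scripts.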

See the project on GitHub for details

https://github.com/SnailWhb/elasticsearch_pulgine_fast-vector-distance

References

[1]https://static.javadoc.io/org.elasticsearch/elasticsearch/6.0.1/org/elasticsearch/script/ScriptEngine.html
[2]https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting-engine.html
[3]https://github.com/jiashiwen/elasticsearchpluginsample
[4]https://www.elastic.co/guide/en/elasticsearch/plugins/6.3/plugin-authors.html
