Knowledge Graph Reasoning and Practice (3) -- Custom Builtins in Jena

Reposted article. 2019-09-12 09:29. Tags: reasoning / Java programming / knowledge graph / Jena

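Part 2 introduced Jena's general purpose rule engine and how to use it. This part continues from there and looks at how to define a custom builtin.

Introduction to builtins

First, a recap of what a builtin is. The official documentation calls them builtin primitives; they can be thought of as built-in functions or instructions. In a rule body, a builtin returns true or false to decide whether the rule matches. The official documentation lists the following primitives: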

Builtin	Operations
isLiteral(?x) notLiteral(?x) isFunctor(?x) notFunctor(?x) isBNode(?x) notBNode(?x)	Test whether the single argument is or is not a literal, a functor-valued literal or a blank-node, respectively.
bound(?x...) unbound(?x..)	Test if all of the arguments are bound (not bound) variables
equal(?x,?y) notEqual(?x,?y)	Test if x=y (or x != y). The equality test is semantic equality so that, for example, the xsd:int 1 and the xsd:decimal 1 would test equal.
lessThan(?x, ?y), greaterThan(?x, ?y) le(?x, ?y), ge(?x, ?y)	Test if x is <, >, <= or >= y. Only passes if both x and y are numbers or time instants (can be integer or floating point or XSDDateTime).
sum(?a, ?b, ?c) addOne(?a, ?c) difference(?a, ?b, ?c) min(?a, ?b, ?c) max(?a, ?b, ?c) product(?a, ?b, ?c) quotient(?a, ?b, ?c)	Sets c to be (a+b), (a+1), (a-b), min(a,b), max(a,b), (a*b), (a/b). Note that these do not run backwards: if in `sum` a and c are bound and b is unbound then the test will fail rather than bind b to (c-a). This could be fixed.
strConcat(?a1, .. ?an, ?t) uriConcat(?a1, .. ?an, ?t)	Concatenates the lexical form of all the arguments except the last, then binds the last argument to a plain literal (strConcat) or a URI node (uriConcat) with that lexical form. In both cases if an argument node is a URI node the URI will be used as the lexical form.
regex(?t, ?p) regex(?t, ?p, ?m1, .. ?mn)	Matches the lexical form of a literal (?t) against a regular expression pattern given by another literal (?p). If the match succeeds, and if there are any additional arguments, then it will bind the first n capture groups to the arguments ?m1 to ?mn. The regular expression pattern syntax is that provided by java.util.regex. Note that the capture groups are numbered from 1 and the first capture group will be bound to ?m1; the implicit capture group 0, which corresponds to the entire matched string, is ignored. So for example regex('foo bar', '(.*) (.*)', ?m1, ?m2) will bind `m1` to `"foo"` and `m2` to `"bar"`.
now(?x)	Binds ?x to an xsd:dateTime value corresponding to the current time.
makeTemp(?x)	Binds ?x to a newly created blank node.
makeInstance(?x, ?p, ?v) makeInstance(?x, ?p, ?t, ?v)	Binds ?v to be a blank node which is asserted as the value of the ?p property on resource ?x and optionally has type ?t. Multiple calls with the same arguments will return the same blank node each time - thus allowing this call to be used in backward rules.
makeSkolem(?x, ?v1, ... ?vn)	Binds ?x to be a blank node. The blank node is generated based on the values of the remaining ?vi arguments, so the same combination of arguments will generate the same bNode.
noValue(?x, ?p) noValue(?x ?p ?v)	True if there is no known triple (x, p, *) or (x, p, v) in the model or the explicit forward deductions so far.
remove(n, ...) drop(n, ...)	Remove the statement (triple) which caused the n'th body term of this (forward-only) rule to match. Remove will propagate the change to other consequent rules including the firing rule (which must thus be guarded by some other clauses). In particular, if the removed statement (triple) appears in the body of a rule that has already fired, the consequences of such rule are retracted from the deducted model. Drop will silently remove the triple(s) from the graph but not fire any rules as a consequence. These are clearly non-monotonic operations and, in particular, the behaviour of a rule set in which different rules both drop and create the same triple(s) is undefined.
isDType(?l, ?t) notDType(?l, ?t)	Tests if literal ?l is (or is not) an instance of the datatype defined by resource ?t.
print(?x, ...)	Print (to standard out) a representation of each argument. This is useful for debugging rather than serious IO work.
listContains(?l, ?x) listNotContains(?l, ?x)	Passes if ?l is a list which contains (does not contain) the element ?x, both arguments must be ground, can not be used as a generator.
listEntry(?list, ?index, ?val)	Binds ?val to the ?index'th entry in the RDF list ?list. If there is no such entry the variable will be unbound and the call will fail. Only usable in rule bodies.
listLength(?l, ?len)	Binds ?len to the length of the list ?l.
listEqual(?la, ?lb) listNotEqual(?la, ?lb)	listEqual tests if the two arguments are both lists and contain the same elements. The equality test is semantic equality on literals (sameValueAs) but will not take into account owl:sameAs aliases. listNotEqual is the negation of this (passes if listEqual fails).
listMapAsObject(?s, ?p ?l) listMapAsSubject(?l, ?p, ?o)	These can only be used as actions in the head of a rule. They deduce a set of triples derived from the list argument ?l : listMapAsObject asserts triples (?s ?p ?x) for each ?x in the list ?l, listMapAsSubject asserts triples (?x ?p ?o).
table(?p) tableAll()	Declare that all goals involving property ?p (or all goals) should be tabled by the backward engine.
hide(p)	Declares that statements involving the predicate p should be hidden. Queries to the model will not report such statements. This is useful to enable non-monotonic forward rules to define flag predicates which are only used for inference control and do not "pollute" the inference results.
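
The capture-group numbering described for regex(?t, ?p, ?m1, .. ?mn) can be tried out with java.util.regex alone. The sketch below is not Jena code: RegexCaptureDemo and captureGroups are hypothetical names, and it assumes the pattern has to match the whole lexical form.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Jena: mirrors the capture-group numbering
// described for the regex builtin, using only java.util.regex.
final class RegexCaptureDemo {

    // Returns capture groups 1..n of a whole-string match, or null if the
    // text does not match (assumption: the whole string must match).
    static List<String> captureGroups(String text, String pattern) {
        Matcher m = Pattern.compile(pattern).matcher(text);
        if (!m.matches()) {
            return null;
        }
        List<String> groups = new ArrayList<>();
        // Group 0 is the entire matched string and is skipped, so the first
        // returned element corresponds to ?m1.
        for (int i = 1; i <= m.groupCount(); i++) {
            groups.add(m.group(i));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(captureGroups("foo bar", "(.*) (.*)")); // [foo, bar]
    }
}
```

Here the first list element plays the role of ?m1 and the second of ?m2.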

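Defining a custom builtin

Defining one is straightforward: implement the Builtin interface, then register the implementation with BuiltinRegistry.theRegistry.register.

The Builtin interface is defined as follows: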

public interface Builtin {

    /**
     * Return a convenient name for this builtin, normally this will be the name of the 
     * functor that will be used to invoke it and will often be the final component of the
     * URI.
     */
    public String getName();
    
    /**
     * Return the full URI which identifies this built in.
     */
    public String getURI();
    
    /**
     * Return the expected number of arguments for this functor or 0 if the number is flexible.
     */
    public int getArgLength();
    
    /**
     * This method is invoked when the builtin is called in a rule body.
     * @param args the array of argument values for the builtin, this is an array 
     * of Nodes, some of which may be Node_RuleVariables.
     * @param length the length of the argument list, may be less than the length of the args array
     * for some rule engines
     * @param context an execution context giving access to other relevant data
     * @return return true if the builtin predicate is deemed to have succeeded in
     * the current environment
     */
    public boolean bodyCall(Node[] args, int length, RuleContext context);
    
    /**
     * This method is invoked when the builtin is called in a rule head.
     * Such a use is only valid in a forward rule.
     * @param args the array of argument values for the builtin, this is an array 
     * of Nodes.
     * @param length the length of the argument list, may be less than the length of the args array
     * for some rule engines
     * @param context an execution context giving access to other relevant data
     */
    public void headAction(Node[] args, int length, RuleContext context);
    
    /**
     * Returns false if this builtin has side effects when run in a body clause,
     * other than the binding of environment variables.
     */
    public boolean isSafe();
    
    /**
     * Returns false if this builtin is non-monotonic. This includes non-monotonic checks like noValue
     * and non-monotonic actions like remove/drop. A non-monotonic call in a head is assumed to 
     * be an action and makes the overall rule and ruleset non-monotonic. 
     * Most JenaRules are monotonic deductive closure rules in which this should be true.
     */
    public boolean isMonotonic();
}

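In practice there is usually no need to implement the interface directly. Extend the default implementation BaseBuiltin instead; typically it is enough to override getName to supply the builtin's name and to implement bodyCall to supply the logic that runs when the builtin is called.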

    @Override
    public String getName() {
        return "semsim";
    }

For example, let's define a custom builtin that computes the pairwise semantic similarity of two values:

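// Assumptions: Jena 3.x package names; JSON/JSONObject come from fastjson;
// HttpClientUtil is the author's own HTTP helper, which is not shown in this post.
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONObject;
import org.apache.jena.graph.Node;
import org.apache.jena.reasoner.rulesys.RuleContext;
import org.apache.jena.reasoner.rulesys.builtins.BaseBuiltin;

public class SemanticSimilarityBuiltin extends BaseBuiltin {
    /**
     * Return a convenient name for this builtin, normally this will be the name of the
     * functor that will be used to invoke it and will often be the final component of the
     * URI.
     */
    @Override
    public String getName() {
        return "semsim";
    }

    /** semsim takes exactly three arguments: two values and a threshold. */
    @Override
    public int getArgLength() {
        return 3;
    }

    /**
     * This method is invoked when the builtin is called in a rule body.
     *
     * @param args    the array of argument values for the builtin, this is an array
     *                of Nodes, some of which may be Node_RuleVariables.
     * @param length  the length of the argument list
     * @param context an execution context giving access to other relevant data
     * @return true if the similarity of the first two arguments reaches the
     * threshold given as the third argument
     */
    @Override
    public boolean bodyCall(Node[] args, int length, RuleContext context) {
        checkArgs(length, context);
        Node n1 = getArg(0, args, context);
        Node n2 = getArg(1, args, context);
        Node score = getArg(2, args, context);

        // The third argument must be a literal that holds the threshold.
        if (!score.isLiteral() || score.getLiteral().getValue() == null) {
            return false;
        }
        double threshold;
        try {
            threshold = Double.parseDouble(score.getLiteralValue().toString());
        } catch (NumberFormatException e) {
            return false; // the threshold is not numeric
        }

        if (n1.isLiteral() && n2.isLiteral()) {
            String v1 = n1.getLiteralValue().toString();
            String v2 = n2.getLiteralValue().toString();

            // Call the service that computes the similarity
            String requestUrl = "http://API-URL:5101/similarity/cosine?s1=" + v1 + "&s2=" + v2;
            String result = HttpClientUtil.doGet(requestUrl);
            JSONObject json = JSON.parseObject(result);
            Double similarity = (json == null) ? null : json.getDouble("similarity");

            // Succeed only when the score reaches the threshold. The original
            // version returned true here unconditionally.
            return similarity != null && similarity >= threshold;
        }
        return false;
    }
}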
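
One detail worth checking in bodyCall is the request URL, which concatenates the literal values directly into the query string. Values containing spaces, `&`, or non-ASCII text would normally need percent-encoding first. The following is a minimal sketch of that step, assuming the service expects UTF-8 encoded query parameters; SimilarityUrlBuilder and its methods are hypothetical names, not part of the original code.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper (not from the original post): builds the similarity
// request URL with both values percent-encoded as UTF-8.
final class SimilarityUrlBuilder {

    static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on the Java platform, so this is unexpected.
            throw new IllegalStateException(e);
        }
    }

    // baseUrl is a placeholder such as "http://API-URL:5101/similarity/cosine".
    static String build(String baseUrl, String s1, String s2) {
        return baseUrl + "?s1=" + encode(s1) + "&s2=" + encode(s2);
    }

    public static void main(String[] args) {
        // prints http://API-URL:5101/similarity/cosine?s1=a+b&s2=c%26d
        System.out.println(build("http://API-URL:5101/similarity/cosine", "a b", "c&d"));
    }
}
```

Whether the author's HttpClientUtil already encodes the URL is not shown in the post, so this step may or may not be needed in practice.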

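In SemanticSimilarityBuiltin, getArgLength and checkArgs(length, context) work together to restrict the number of arguments: checkArgs checks that the call supplies exactly the declared number.
getArg(idx, args, context) retrieves each argument to be evaluated.
The similarity computation calls an external service that returns the cosine score of the semantic vectors of the two values; if the score reaches the threshold, the rule is considered to match.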
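
The service itself is not shown in the post. As a rough illustration of the cosine score and threshold test described here, the sketch below computes the cosine of two plain double vectors. CosineSketch is a hypothetical name, and how the real service turns text into semantic vectors is outside its scope.

```java
// Hypothetical stand-in for the remote service: cosine similarity of two
// vectors, followed by the same kind of threshold test that semsim applies.
final class CosineSketch {

    static double cosine(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen in this sketch for zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // True when the cosine score reaches the threshold.
    static boolean matches(double[] a, double[] b, double threshold) {
        return cosine(a, b) >= threshold;
    }

    public static void main(String[] args) {
        double[] v1 = {1.0, 1.0};
        double[] v2 = {1.0, 0.0};
        System.out.println(cosine(v1, v2));       // about 0.7071
        System.out.println(matches(v1, v2, 0.6)); // true
    }
}
```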

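Testing

Let's test the semsim builtin defined above, reusing the example from part 2.

We add two new properties, 主要業務 (main business) and 競爭對手 (competitor), and stipulate that if the main businesses of two companies are semantically similar, the two companies are competitors.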

        Property 主要業務 = myMod.createProperty(finance + "主要業務");
        Property 競爭對手 = myMod.createProperty(finance + "競爭對手");

        // Add triples
      
        myMod.add(萬達集團, 主要業務, "房地產，文娛");
        myMod.add(融創中國, 主要業務, "房地產");

Then define the rule:

[ruleCompetitor: (?c1 :主要業務 ?b1) (?c2 :主要業務 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6)  -> (?c1 :競爭對手 ?c2)]

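The rule reads: company c1 has main business b1, company c2 has main business b2, and c1 and c2 are not the same company; if the similarity of b1 and b2 is at least 0.6, then c1 and c2 are competitors.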
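
To make the reading of the rule concrete, here is a plain-Java sketch of the same logic. It is not how Jena evaluates rules: CompetitorRuleSketch is a hypothetical name, the company names are English stand-ins for the resources in the example, each company is assumed to have a single business description, and the similarity function is passed in as a parameter with a made-up score in place of the remote service.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Plain-Java sketch of what ruleCompetitor expresses; every name here is hypothetical.
final class CompetitorRuleSketch {

    // For every ordered pair of distinct companies whose business descriptions
    // score at or above the threshold, derive "c1 competitor c2".
    static List<String> infer(Map<String, String> businessByCompany,
                              ToDoubleBiFunction<String, String> similarity,
                              double threshold) {
        List<String> derived = new ArrayList<>();
        for (Map.Entry<String, String> c1 : businessByCompany.entrySet()) {
            for (Map.Entry<String, String> c2 : businessByCompany.entrySet()) {
                if (c1.getKey().equals(c2.getKey())) {
                    continue; // corresponds to notEqual(?c1,?c2)
                }
                double score = similarity.applyAsDouble(c1.getValue(), c2.getValue());
                if (score >= threshold) { // corresponds to semsim(?b1,?b2,0.6)
                    derived.add(c1.getKey() + " competitor " + c2.getKey());
                }
            }
        }
        return derived;
    }

    public static void main(String[] args) {
        Map<String, String> business = new LinkedHashMap<>();
        business.put("Wanda Group", "real estate, entertainment");
        business.put("Sunac China", "real estate");
        // 0.8 is a made-up score standing in for the remote similarity service.
        System.out.println(infer(business, (a, b) -> 0.8, 0.6));
        // [Wanda Group competitor Sunac China, Sunac China competitor Wanda Group]
    }
}
```

Because the rule body is symmetric in c1 and c2, the relation is derived in both directions.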

Complete test code:

        // Register the custom builtin
        BuiltinRegistry.theRegistry.register(new SemanticSimilarityBuiltin());

        Model myMod = ModelFactory.createDefaultModel();
        String finance = "http://www.example.org/kse/finance#";
        Resource 孫宏斌 = myMod.createResource(finance + "孫宏斌");
        Resource 融創中國 = myMod.createResource(finance + "融創中國");
        Resource 樂視網 = myMod.createResource(finance + "樂視網");
        Property 執掌 = myMod.createProperty(finance + "執掌");
        Resource 賈躍亭 = myMod.createResource(finance + "賈躍亭");
        Resource 地產公司 = myMod.createResource(finance + "地產公司");
        Resource 公司 = myMod.createResource(finance + "公司");
        Resource 法人實體 = myMod.createResource(finance + "法人實體");
        Resource 人 = myMod.createResource(finance + "人");
        Property 主要收入 = myMod.createProperty(finance + "主要收入");
        Resource 地產事業 = myMod.createResource(finance + "地產事業");
        Resource 王健林 = myMod.createResource(finance + "王健林");
        Resource 萬達集團 = myMod.createResource(finance + "萬達集團");
        Property 主要資產 = myMod.createProperty(finance + "主要資產");


        Property 股東 = myMod.createProperty(finance + "股東");
        Property 關聯交易 = myMod.createProperty(finance + "關聯交易");
        Property 收購 = myMod.createProperty(finance + "收購");

        Property 主要業務 = myMod.createProperty(finance + "主要業務");
        Property 競爭對手 = myMod.createProperty(finance + "競爭對手");

        // Add triples
        myMod.add(孫宏斌, 執掌, 融創中國);
        myMod.add(賈躍亭, 執掌, 樂視網);
        myMod.add(王健林, 執掌, 萬達集團);
        myMod.add(樂視網, RDF.type, 公司);
        myMod.add(萬達集團, RDF.type, 公司);
        myMod.add(融創中國, RDF.type, 地產公司);
        myMod.add(地產公司, RDFS.subClassOf, 公司);
        myMod.add(公司, RDFS.subClassOf, 法人實體);
        myMod.add(孫宏斌, RDF.type, 人);
        myMod.add(賈躍亭, RDF.type, 人);
        myMod.add(王健林, RDF.type, 人);
        myMod.add(萬達集團, 主要資產, 地產事業);
        myMod.add(萬達集團, 主要業務, "房地產，文娛");
        myMod.add(融創中國, 主要收入, 地產事業);
        myMod.add(融創中國, 主要業務, "房地產");
        myMod.add(孫宏斌, 股東, 樂視網);
        myMod.add(孫宏斌, 收購, 萬達集團);

        PrintUtil.registerPrefix("", finance);

        // Print the current model
        StmtIterator i = myMod.listStatements(null, null, (RDFNode) null);
        while (i.hasNext()) {
            System.out.println(" - " + PrintUtil.print(i.nextStatement()));
        }


        GenericRuleReasoner reasoner = (GenericRuleReasoner) GenericRuleReasonerFactory.theInstance().create(null);
        reasoner.setRules(Rule.parseRules(
            "[ruleHoldShare: (?p :執掌 ?c) -> (?p :股東 ?c)] \n"
                + "[ruleConnTrans: (?p :收購 ?c) -> (?p :股東 ?c)] \n"
                + "[ruleConnTrans: (?p :股東 ?c) (?p :股東 ?c2) -> (?c :關聯交易 ?c2)] \n"
                + "[ruleCompetitor: (?c1 :主要業務 ?b1) (?c2 :主要業務 ?b2) notEqual(?c1,?c2) semsim(?b1,?b2,0.6)  -> (?c1 :競爭對手 ?c2)] \n"
                + "-> tableAll()."));
        reasoner.setMode(GenericRuleReasoner.HYBRID);

        InfGraph infgraph = reasoner.bind(myMod.getGraph());
        infgraph.setDerivationLogging(true);

        System.out.println("After inference...\n");

        Iterator<Triple> tripleIterator = infgraph.find(null, null, null);
        while (tripleIterator.hasNext()) {
            System.out.println(" - " + PrintUtil.print(tripleIterator.next()));
        }

Output:

 - (:萬達集團 :關聯交易 :樂視網)
 - (:萬達集團 :關聯交易 :融創中國)
 - (:萬達集團 :競爭對手 :融創中國)
 - (:萬達集團 :關聯交易 :萬達集團)
 - (:孫宏斌 :股東 :萬達集團)
 - (:孫宏斌 :股東 :融創中國)
 - (:融創中國 :關聯交易 :萬達集團)
 - (:融創中國 :競爭對手 :萬達集團)
 - (:融創中國 :關聯交易 :樂視網)
 - (:融創中國 :關聯交易 :融創中國)
 - (:樂視網 :關聯交易 :萬達集團)
 - (:樂視網 :關聯交易 :融創中國)
 - (:樂視網 :關聯交易 :樂視網)
 - (:賈躍亭 :股東 :樂視網)
 - (:王健林 :股東 :萬達集團)
 - (:公司 rdfs:subClassOf :法人實體)
 - (:萬達集團 :主要業務 '房地產，文娛')
 - (:萬達集團 :主要資產 :地產事業)
 - (:萬達集團 rdf:type :公司)
 - (:地產公司 rdfs:subClassOf :公司)
 - (:融創中國 :主要業務 '房地產')
 - (:融創中國 :主要收入 :地產事業)
 - (:融創中國 rdf:type :地產公司)
 - (:孫宏斌 :收購 :萬達集團)
 - (:孫宏斌 :股東 :樂視網)
 - (:孫宏斌 rdf:type :人)
 - (:孫宏斌 :執掌 :融創中國)
 - (:樂視網 rdf:type :公司)
 - (:賈躍亭 rdf:type :人)
 - (:賈躍亭 :執掌 :樂視網)
 - (:王健林 rdf:type :人)
 - (:王健林 :執掌 :萬達集團)

More builtins can be added as needed, for example one that runs JavaScript or one that issues HTTP requests.

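Author: Jadepeng
Source: jqpeng的技術記事本 (jqpeng's tech notebook) -- http://www.cnblogs.com/xiaoqi
Your support is the greatest encouragement to the author; thank you for reading carefully.
Copyright of this article belongs to the author. Reposting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original must be given in a prominent position on the page; otherwise the author reserves the right to pursue legal liability.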
