Background:
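I have recently been trying to reproduce the tree-based clone detection paper: L. Jiang, G. Misherghi, Z. Su, and S. Glondu. Deckard: Scalable and accurate tree-based detection of code clones. In Proceedings of ICSE, 2007.
To do so, I need to build an AST for every method in a Java class, convert each AST to DOT format, and finally convert that into a vector (the authors have already implemented this last step on GitHub (https://github.com/skyhover/Deckard): just run vdbgen).
The similarity between two pieces of code is then judged by the similarity between their vectors.
This pipeline is the core idea of tree-based clone detection.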
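To make the "compare vectors" step concrete, here is a minimal sketch that treats each method as a vector of node-type counts and measures the Euclidean distance between two such vectors. This is only an illustration under my own assumptions: the class name `VectorSimilarity`, the sparse `Map` representation, and the sample counts are all hypothetical, and this is not Deckard's actual implementation, which builds and clusters its vectors in a more elaborate way.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Deckard or of my repository.
final class VectorSimilarity {

    /** Euclidean distance between two sparse count vectors keyed by AST node type. */
    static double distance(Map<Integer, Integer> a, Map<Integer, Integer> b) {
        // Walk over every node type that occurs in either vector.
        Set<Integer> keys = new HashSet<>(a.keySet());
        keys.addAll(b.keySet());
        double sum = 0.0;
        for (int key : keys) {
            int diff = a.getOrDefault(key, 0) - b.getOrDefault(key, 0);
            sum += (double) diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        // Made-up counts: node type -> number of occurrences in a method.
        Map<Integer, Integer> m1 = Map.of(32, 2, 42, 5, 45, 1);
        Map<Integer, Integer> m2 = Map.of(32, 2, 42, 4, 45, 1);
        Map<Integer, Integer> m3 = Map.of(21, 3, 60, 3);

        System.out.println(distance(m1, m2)); // prints 1.0 (only one count differs, by 1)
        System.out.println(distance(m1, m3)); // a much larger distance
    }
}
```

A small distance suggests that two methods have a similar mix of AST node types, which makes them clone candidates; the threshold to use is a tuning decision that this sketch does not address.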
The complete source code is on my GitHub: https://github.com/XBWer/JDT_AST_DOT
First, let's take a class file as an example:
Input: test.java
public class test {
    int i = 1;

    public void testNonEscaped() {
        startServer(NonEscapedURIResource.class);

        WebResource r = Client.create().resource(getUri().userInfo("x.y").path("x%20y").build());
        assertEquals("CONTENT", r.get(String.class));
    }
}
Output: test.java_testNonEscaped.dot
digraph "DirectedGraph" {
  graph [label = "testNonEscaped", labelloc=t, concentrate = true];
  "13329486" [ type=31 line=4 ]
  "327177752" [ type=83 line=4 ]
  "1458540918" [ type=39 line=4 ]
  "1164371389" [ type=42 line=4 ]
  "517210187" [ type=8 line=4 ]
  "267760927" [ type=21 line=5 ]
  "633070006" [ type=32 line=5 ]
  "1459794865" [ type=42 line=5 ]
  "1776957250" [ type=57 line=5 ]
  "1268066861" [ type=43 line=5 ]
  "827966648" [ type=42 line=5 ]
  "1938056729" [ type=60 line=7 ]
  "1273765644" [ type=43 line=7 ]
  "701141022" [ type=42 line=7 ]
  "1447689627" [ type=59 line=7 ]
  "112061925" [ type=42 line=7 ]
  "764577347" [ type=32 line=7 ]
  "1344645519" [ type=32 line=7 ]
  "1234776885" [ type=42 line=7 ]
  "540159270" [ type=42 line=7 ]
  "422250493" [ type=42 line=7 ]
  "1690287238" [ type=32 line=7 ]
  "1690254271" [ type=32 line=7 ]
  "1440047379" [ type=32 line=7 ]
  "343965883" [ type=32 line=7 ]
  "230835489" [ type=42 line=7 ]
  "280884709" [ type=42 line=7 ]
  "1847509784" [ type=45 line=7 ]
  "2114650936" [ type=42 line=7 ]
  "1635756693" [ type=45 line=7 ]
  "504527234" [ type=42 line=7 ]
  "101478235" [ type=21 line=8 ]
  "540585569" [ type=32 line=8 ]
  "1007653873" [ type=42 line=8 ]
  "836514715" [ type=45 line=8 ]
  "1414521932" [ type=32 line=8 ]
  "828441346" [ type=42 line=8 ]
  "1899073220" [ type=42 line=8 ]
  "555826066" [ type=57 line=8 ]
  "174573182" [ type=43 line=8 ]
  "858242339" [ type=42 line=8 ]
  "13329486" -> "327177752"
  "13329486" -> "1458540918"
  "13329486" -> "1164371389"
  "13329486" -> "517210187"
  "517210187" -> "267760927"
  "267760927" -> "633070006"
  "633070006" -> "1459794865"
  "633070006" -> "1776957250"
  "1776957250" -> "1268066861"
  "1268066861" -> "827966648"
  "517210187" -> "1938056729"
  "1938056729" -> "1273765644"
  "1273765644" -> "701141022"
  "1938056729" -> "1447689627"
  "1447689627" -> "112061925"
  "1447689627" -> "764577347"
  "764577347" -> "1344645519"
  "1344645519" -> "1234776885"
  "1344645519" -> "540159270"
  "764577347" -> "422250493"
  "764577347" -> "1690287238"
  "1690287238" -> "1690254271"
  "1690254271" -> "1440047379"
  "1440047379" -> "343965883"
  "343965883" -> "230835489"
  "1440047379" -> "280884709"
  "1440047379" -> "1847509784"
  "1690254271" -> "2114650936"
  "1690254271" -> "1635756693"
  "1690287238" -> "504527234"
  "517210187" -> "101478235"
  "101478235" -> "540585569"
  "540585569" -> "1007653873"
  "540585569" -> "836514715"
  "540585569" -> "1414521932"
  "1414521932" -> "828441346"
  "1414521932" -> "1899073220"
  "1414521932" -> "555826066"
  "555826066" -> "174573182"
  "174573182" -> "858242339"
}
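In the dot file, type is the node type of the AST node (for the definitions, see: https://github.com/eclipse/eclipse.jdt.core/blob/master/org.eclipse.jdt.core/dom/org/eclipse/jdt/core/dom/ASTNode.java), and line is the node's position in the source file (its line number). For example, type 31 is METHOD_DECLARATION, type 32 is METHOD_INVOCATION, and type 42 is SIMPLE_NAME.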
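Since every node carries its type, it is easy to tally how often each node type occurs in a method. The sketch below does this with a regular expression over the node declarations of a dot file in the format shown above. The class name `DotTypeCounter` and the regex are my own illustration, assuming exactly the `"id" [ type=N line=M ]` layout of this output; this is not how vdbgen reads the file.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts node types in a dot file of the format shown above.
final class DotTypeCounter {

    // Matches node declarations such as: "13329486" [ type=31 line=4 ]
    // Edge lines ("a" -> "b") have no attribute list and therefore do not match.
    private static final Pattern NODE =
            Pattern.compile("\"(\\d+)\"\\s*\\[\\s*type=(\\d+)\\s+line=(\\d+)\\s*\\]");

    /** Returns a map from node type to its number of occurrences, sorted by type. */
    static Map<Integer, Integer> countTypes(String dot) {
        Map<Integer, Integer> counts = new TreeMap<>();
        Matcher m = NODE.matcher(dot);
        while (m.find()) {
            counts.merge(Integer.parseInt(m.group(2)), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // A few nodes and one edge taken from the example output.
        String dot = "\"13329486\" [ type=31 line=4 ] "
                + "\"327177752\" [ type=83 line=4 ] "
                + "\"1164371389\" [ type=42 line=4 ] "
                + "\"827966648\" [ type=42 line=5 ] "
                + "\"13329486\" -> \"327177752\"";
        System.out.println(countTypes(dot)); // prints {31=1, 42=2, 83=1}
    }
}
```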
The dot file can be visualized with http://www.webgraphviz.com/
Main steps:
1. Parse the Java code into an AST;
2. Override the visit methods of ASTVisitor to traverse the AST as needed;
3. Convert the AST to .dot format.
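Step 3 can be sketched without any JDT dependency. In the example below, `Node` is a hypothetical stand-in for a JDT ASTNode (just a type code, a line number, and children), and `toDot` walks the tree in pre-order and writes node and edge lines in the same layout as the output above. Sequential ids are an assumption made for readability; the ids in the real output are different. The actual code in the repository works on JDT's own node classes through an ASTVisitor, so treat this only as an outline of the serialization logic.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the "AST to .dot" step on a simplified tree.
final class TreeToDot {

    /** Stand-in for a JDT ASTNode: node type code, source line, children. */
    static final class Node {
        final int type;
        final int line;
        final List<Node> children = new ArrayList<>();

        Node(int type, int line) {
            this.type = type;
            this.line = line;
        }

        Node add(Node child) {
            children.add(child);
            return this;
        }
    }

    /** Serializes the tree rooted at root: all node lines first, then all edge lines. */
    static String toDot(String graphLabel, Node root) {
        StringBuilder nodes = new StringBuilder();
        StringBuilder edges = new StringBuilder();
        visit(root, -1, new int[] {0}, nodes, edges);
        return "digraph \"DirectedGraph\" {\n"
                + "graph [label = \"" + graphLabel + "\", labelloc=t, concentrate = true];\n"
                + nodes + edges + "}\n";
    }

    // Pre-order traversal; nextId is a one-element array used as a mutable counter.
    private static void visit(Node n, int parentId, int[] nextId,
                              StringBuilder nodes, StringBuilder edges) {
        int id = nextId[0]++;
        nodes.append('"').append(id).append("\" [ type=").append(n.type)
             .append(" line=").append(n.line).append(" ]\n");
        if (parentId >= 0) {
            edges.append('"').append(parentId).append("\" -> \"").append(id).append("\"\n");
        }
        for (Node child : n.children) {
            visit(child, id, nextId, nodes, edges);
        }
    }

    public static void main(String[] args) {
        // A tiny tree: METHOD_DECLARATION (31) -> MODIFIER (83), BLOCK (8) -> EXPRESSION_STATEMENT (21)
        Node method = new Node(31, 4)
                .add(new Node(83, 4))
                .add(new Node(8, 4).add(new Node(21, 5)));
        System.out.print(toDot("testNonEscaped", method));
    }
}
```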
Main classes: