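On GitHub, others have already published Python libraries for converting between these two file formats, for example liac-arff: https://github.com/renatopp/liac-arff. But when a program is already fairly complex, pulling in that many extra external files feels cumbersome. These ARFF libraries also raise an error when the number of values in a row does not match the number of attributes. So, with support from a senior labmate, I wrote two simple conversion functions, using Stack Overflow as a reference. (It took more than five hours... I need to be more efficient next time.)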
arff2txt():
Convert an ARFF file to txt format:
def arff2txt(filename):
    rows = []
    with open(filename) as arff_file:
        for line in arff_file:
            line = line.strip()
            # Skip blank lines, header lines (@...) and comments (%...).
            # The original dropped the first remaining row with `del arr[0]`,
            # presumably a blank line; skipping all blank lines is safer.
            if not line or line.startswith(('@', '%')):
                continue
            rows.append(line.split(','))

    lines = []
    for row in rows:
        del row[10]  # drop the 11th column (the trailing {weight} field)
        row[9] = 1 if row[9] == "True" else 0  # class label -> 1/0
        lines.append('\t'.join(map(str, row)))

    result = '\n'.join(lines)
    print(result)
    with open('./generatedtxt.txt', 'w') as txtfile:
        txtfile.write(result)
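To make the per-row handling concrete, here is a minimal sketch of what happens to a single data line. The helper name `arff_row_to_txt` and the sample row are hypothetical, made up for illustration; the layout (ten attributes followed by a trailing `{weight}` field) is assumed from the header that txt2arff() writes.

```python
def arff_row_to_txt(line):
    """Convert one ARFF data line to a tab-separated line (hypothetical helper)."""
    fields = line.strip().split(',')
    # Assumption: index 10 holds the trailing weight field such as "{3}"; drop it.
    del fields[10]
    # Map the class label in column 9 to 1/0.
    fields[9] = '1' if fields[9] == 'True' else '0'
    return '\t'.join(fields)


# Made-up sample row: ID, eight numeric features, class label, weight.
sample = "m1,0,1,0,12,3,0,1,0,True,{3}\n"
converted = arff_row_to_txt(sample)
print(converted)
```

A row with fewer than eleven fields would raise an IndexError here, so this only fits files with exactly that layout.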
txt2arff():
Convert a txt file to ARFF format:
def txt2arff(filename, value):
    with open('./generatedarff.arff', 'w') as fp:
        # Write the fixed ARFF header: one declaration per line.
        fp.write('''@relation ExceptionRelation

@attribute ID string
@attribute Thrown numeric
@attribute SetLogicFlag numeric
@attribute Return numeric
@attribute LOC numeric
@attribute NumMethod numeric
@attribute EmptyBlock numeric
@attribute RecoverFlag numeric
@attribute OtherOperation numeric
@attribute class-att {True,False}

@data
''')
        with open(filename) as f:
            for content in f:
                if not content.strip():
                    continue  # skip blank lines
                fields = [field.strip() for field in content.split('\t')]
                if fields[9] == '1':
                    fields[9] = "True"
                    # Rows of the True class get {value} appended.
                    fields.append('{' + str(value) + '}')
                else:
                    fields[9] = "False"
                    fields.append('{1}')
                fp.write("%s\n" % ','.join(fields))
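The role of the `value` parameter is easiest to see on single rows. The sketch below uses a hypothetical helper `txt_row_to_arff` and made-up sample rows; it assumes the trailing `{...}` field is meant as a per-instance weight (Weka's ARFF format accepts a weight in braces at the end of a data line), so rows labelled 1 get `{value}` and all other rows get `{1}`.

```python
def txt_row_to_arff(line, value):
    """Convert one tab-separated line to an ARFF data line (hypothetical helper)."""
    fields = [field.strip() for field in line.split('\t')]
    if fields[9] == '1':
        fields[9] = 'True'
        # Assumption: {value} acts as the weight for rows of the True class.
        fields.append('{%s}' % value)
    else:
        fields[9] = 'False'
        fields.append('{1}')
    return ','.join(fields)


# Made-up sample rows: ID, eight numeric features, class label (1 or 0).
positive = txt_row_to_arff("m1\t0\t1\t0\t12\t3\t0\t1\t0\t1\n", 3)
negative = txt_row_to_arff("m2\t1\t0\t0\t40\t7\t1\t0\t0\t0\n", 3)
print(positive)
print(negative)
```

Passing a larger `value` would give the True rows more weight than the False rows, which is one common way to compensate for class imbalance.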