The shlex module provides a simple lexer (also called a tokenizer) for languages whose syntax is based on the Unix shell.
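Here is a minimal sketch of the basic API. The sample string is made up for illustration. You can pass a string directly to shlex.shlex and then either read one token at a time with get_token() or iterate over the lexer:

```python
import shlex

# shlex.shlex defaults to non-POSIX mode, where quote characters stay in the token.
lexer = shlex.shlex('echo "hello world" done')

# get_token() returns one token per call; in non-POSIX mode it returns ''
# (the value of lexer.eof) once the input is exhausted.
first = lexer.get_token()

# Iterating the lexer consumes the remaining tokens until eof.
rest = list(lexer)

print(first)  # echo
print(rest)   # ['"hello world"', 'done']
```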
Here is an example.
Suppose there is a text file, quotes.txt, with the following contents:
This string has embedded "double quotes" and 'single quotes' in it,
and even "a 'nested example'".
The Python script, test.py:
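#!/usr/bin/env python3
import shlex
import sys

# Expect exactly one argument: the name of the file to tokenize.
if len(sys.argv) != 2:
    print('usage: test.py FILENAME', file=sys.stderr)
    sys.exit(1)

filename = sys.argv[1]
with open(filename, 'rt') as f:
    body = f.read()

print('ORIGINAL:', repr(body))
print()
print('TOKENS:')
# shlex.shlex defaults to non-POSIX mode, so quotes are kept in the tokens.
lexer = shlex.shlex(body)
for token in lexer:
    print(repr(token))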
Run the script (make it executable first with chmod +x test.py):
./test.py quotes.txt
ORIGINAL: 'This string has embedded "double quotes" and \'single quotes\' in it,\nand even "a \'nested example\'".\n'

TOKENS:
'This'
'string'
'has'
'embedded'
'"double quotes"'
'and'
"'single quotes'"
'in'
'it'
','
'and'
'even'
'"a \'nested example\'"'
'.'
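You can reproduce the same token list without a file by passing the text straight to shlex.shlex. This sketch assumes quotes.txt contains exactly the text shown above, including the newline after "it," and a trailing newline:

```python
import shlex

# Same text as quotes.txt: a newline after "it," and a trailing newline.
body = ('This string has embedded "double quotes" and \'single quotes\' in it,\n'
        'and even "a \'nested example\'".\n')

# Newlines count as whitespace, so they simply separate tokens.
tokens = list(shlex.shlex(body))
for token in tokens:
    print(repr(token))
```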
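As the output shows, shlex handles quoting intelligently. Each quoted string comes back as a single token with its quotes intact, even when it contains nested quotes of the other kind. Punctuation such as ',' and '.' is split off into separate tokens. Getting the same result with regular expressions would take considerably more work.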
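To make the comparison concrete, here is a small sketch using the last line of quotes.txt. It contrasts plain str.split() with shlex and also shows shlex's posix=True mode, which strips the outer quotes the way a POSIX shell would:

```python
import shlex

line = 'and even "a \'nested example\'".'

# Plain whitespace splitting tears the quoted phrase apart.
naive = line.split()

# shlex in its default non-POSIX mode keeps the quoted phrase, quotes included, as one token.
non_posix = list(shlex.shlex(line))

# In POSIX mode the enclosing double quotes are removed from the token.
posix = list(shlex.shlex(line, posix=True))

print(naive)      # ['and', 'even', '"a', "'nested", 'example\'".']
print(non_posix)  # ['and', 'even', '"a \'nested example\'"', '.']
print(posix)      # ['and', 'even', "a 'nested example'", '.']
```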