sed -e 's/\(^\|[^0-9]\)\(13[0-9][0-9]\{8\}\|14[579][0-9]\{8\}\|15[0-35-9][0-9]\{8\}\|16[6][0-9]\{8\}\|17[0135678][0-9]\{8\}\|18[0-9][0-9]\{8\}\|19[89][0-9]\{8\}\)\($\|[^0-9]\)/\nfind_phone:\2\n/g' test2.html | sed -e 's/\(^\|[^0-9]\)\([0-9]\{6\}[1-2][0-9]\{3\}\(\(0[1-9]\)\|\(10\|11\|12\)\)\(\([0-2][1-9]\)\|10\|20\|30\|31\)[0-9]\{3\}[0-9Xx]\)\($\|[^0-9]\)/\nfind_idcard:\2\n/g' | awk '/find_/{printf "%s\t", $1}'
Contents of the test file test2.html:
dddd
bbb131102198910084421ccc eee13611112222fff13133334444
h15855556666j
aaaa
13177778888
13199990000
18611112222
370785199507319527
Test result:
find_idcard:131102198910084421 find_phone:13611112222 find_phone:13133334444 find_phone:15855556666 find_phone:13177778888 find_phone:13199990000 find_phone:18611112222 find_idcard:370785199507319527
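To make the test above easy to repeat, here is a minimal sketch that recreates the sample file and wraps the pipeline in a function. The function name find_numbers and the temporary file are illustrative assumptions, not part of the original note. The sketch writes the 15x alternative as 15[0-35-9] (a comma inside the bracket would also match a literal comma) and gives awk's printf an explicit "%s" format. It assumes GNU sed, which is needed for \| and for \n in the replacement.

```shell
# Recreate the sample input from this note in a temporary file.
sample=$(mktemp)
cat > "$sample" <<'EOF'
dddd
bbb131102198910084421ccc eee13611112222fff13133334444
h15855556666j
aaaa
13177778888
13199990000
18611112222
370785199507319527
EOF

# find_numbers FILE (hypothetical helper name):
#   1st sed: put every 11-digit mobile number on its own "find_phone:" line
#   2nd sed: put every 18-character ID card number on its own "find_idcard:" line
#   awk:     keep only the tagged lines and print them tab-separated
find_numbers() {
  sed -e 's/\(^\|[^0-9]\)\(13[0-9][0-9]\{8\}\|14[579][0-9]\{8\}\|15[0-35-9][0-9]\{8\}\|16[6][0-9]\{8\}\|17[0135678][0-9]\{8\}\|18[0-9][0-9]\{8\}\|19[89][0-9]\{8\}\)\($\|[^0-9]\)/\nfind_phone:\2\n/g' "$1" |
    sed -e 's/\(^\|[^0-9]\)\([0-9]\{6\}[1-2][0-9]\{3\}\(\(0[1-9]\)\|\(10\|11\|12\)\)\(\([0-2][1-9]\)\|10\|20\|30\|31\)[0-9]\{3\}[0-9Xx]\)\($\|[^0-9]\)/\nfind_idcard:\2\n/g' |
    awk '/find_/ { printf "%s\t", $1 }'
}

find_numbers "$sample"
echo   # finish the tab-separated line with a newline
```

The temporary file can be deleted afterwards with rm -f "$sample".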
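ID card number regex: https://www.jb51.net/article/109384.htm
For reference only; it cannot be used as-is. In sed's default (basic) regular expression syntax, the alternation | must be written \|, the grouping parentheses must be written \( and \), and "eight digits" is written [0-9]\{8\} (https://zhidao.baidu.com/question/1115861792946350259.html). ^ matches the start of a line and $ the end; neither needs a backslash.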
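The escaping rules can be seen side by side in a small sketch. The input string and the <PHONE> placeholder are made up for illustration, and the pattern is deliberately much shorter than the real phone regex. Note that \| in basic syntax is a GNU extension, while sed -E switches to extended syntax, where the same operators are written without backslashes.

```shell
# Basic regular expressions (sed default): \( \), \| and \{ \} need backslashes.
bre_out=$(printf 'tel 13611112222 end\n' | sed 's/\(136\|186\)[0-9]\{8\}/<PHONE>/')

# Extended regular expressions (sed -E): the same pattern without backslashes.
ere_out=$(printf 'tel 13611112222 end\n' | sed -E 's/(136|186)[0-9]{8}/<PHONE>/')

echo "$bre_out"   # tel <PHONE> end
echo "$ere_out"   # tel <PHONE> end
```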
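Mobile phone number regex: https://blog.csdn.net/voidmain_123/article/details/78962164 Likewise for reference only; it cannot be used as-is.
awk command: reads the input line by line; lines that do not match the pattern are dropped. https://www.cnblogs.com/xudong-bupt/p/3721210.html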
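A minimal sketch of that filtering behaviour, using made-up input lines: awk runs the action only for lines matching /find_/, and $1 is the first whitespace-separated field of such a line.

```shell
# Three input lines; only the two containing "find_" reach the action block.
filtered=$(printf 'find_phone:13611112222 extra\nnoise line\nfind_idcard:370785199507319527\n' |
  awk '/find_/ { print $1 }')

echo "$filtered"
# find_phone:13611112222
# find_idcard:370785199507319527
```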
sed command: worked out by trial and error.
awk, sed, grep, fgrep, egrep:
https://www.cnblogs.com/EasonJim/p/8282511.html
https://blog.csdn.net/qq504196282/article/details/52995198
https://www.cnblogs.com/moveofgod/p/3540575.html
Match lines containing both ABC and 123:
sed -n '/ABC/{/123/p}'
awk '/ABC/&&/123/{ print $0 }'
grep -E '(ABC.*123|123.*ABC)'
Match lines containing ABC or 123:
sed -n '/\(ABC\|123\)/p'
awk '/ABC/||/123/{ print $0 }'
grep -E '(ABC|123)' or egrep 'ABC|123'
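The AND and OR variants can be checked against each other on a few made-up lines. This is only a sketch: the variable names are illustrative, and the sed forms assume GNU sed (for \| and for the closing brace directly after p).

```shell
sample_lines='ABC only
123 only
ABC then 123
123 then ABC
neither'

# AND: lines that contain both ABC and 123, in either order.
both_sed=$(printf '%s\n' "$sample_lines" | sed -n '/ABC/{/123/p}')
both_awk=$(printf '%s\n' "$sample_lines" | awk '/ABC/ && /123/ { print $0 }')
both_grep=$(printf '%s\n' "$sample_lines" | grep -E '(ABC.*123|123.*ABC)')

# OR: lines that contain ABC, 123, or both.
either_sed=$(printf '%s\n' "$sample_lines" | sed -n '/\(ABC\|123\)/p')
either_awk=$(printf '%s\n' "$sample_lines" | awk '/ABC/ || /123/ { print $0 }')
either_grep=$(printf '%s\n' "$sample_lines" | grep -E '(ABC|123)')

echo "$both_sed"     # the two lines with both tokens
echo "$either_sed"   # every line except "neither"
```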
In awk, print appends a newline to its output, while printf does not; separate consecutive statements with a semicolon.
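A small sketch of the difference, on a made-up two-line input. It passes an explicit "%s" format to printf, because printf $1 would treat the field itself as the format string and misbehave if it contained a % sign.

```shell
# print: one field per output line (a newline is appended each time).
with_print=$(printf 'a\nb\n' | awk '{ print $1 }')

# printf: no newline is added; two statements separated by a semicolon.
with_printf=$(printf 'a\nb\n' | awk '{ printf "%s", $1; printf "\t" }')

echo "$with_print"
echo "$with_printf"   # a<TAB>b<TAB> on a single line
```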