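At work I needed to split records into multiple lines based on one field of a file. awk seemed like the right tool, so here are my notes:
Problem:
Source file: each record has 2 fields separated by "|". The second field holds values separated by "#", but it can also hold a single value without "#" or be empty.
Requirement: if the second field contains "#", split that record into multiple records, one per "#"-separated value. Records whose second field has no "#" or is empty stay unchanged.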
202143108500|#0_1000_VOICE#0_1000_VOICE#0_1000_VOICE#0_TRAFFIC#0_TRAFFIC#0_TRAFFIC
202121366359|#0_1000_VOICE#0_TRAFFIC
202143108500|#0_1000_VOICE#0_1000_VOICE#0_1000_VOICE#0_TRAFFIC#0_TRAFFIC#0_TRAFFIC
202121366359|#0_1000_VOICE#0_TRAFFIC
202113492312|W_GH_YYM
202132164529|
Solving it with awk:
1. Turn each line that contains "#" into multiple lines
awk -F "|" -vOFS="|" '{l=split($2,arr,"#");for(i=1;i<l;i++){$2=arr[i+1];print}}' ./test.txt
Result:
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
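The loop in step 1 reads arr[i+1] rather than arr[i] because, in the sample data, the second field starts with "#", so split() returns an empty first piece. A small sketch of that behaviour follows. The sample value '#a#b' and the variable name pieces are made up for illustration and are not part of the original data.

```shell
# Split a made-up value that starts with "#" and print the piece count
# followed by each piece in brackets.
pieces=$(echo '#a#b' | awk '{ n = split($0, arr, "#"); printf "%d [%s] [%s] [%s]", n, arr[1], arr[2], arr[3] }')
echo "$pieces"   # prints: 3 [] [a] [b]
```

One consequence: if a second field did not start with "#" (for example "a#b"), the loop in step 1 would skip its first value.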
2. Select the lines that do not contain "#"
awk -F "|" '$2!~/#/{print}' ./test.txt
Result:
202113492312|W_GH_YYM
202132164529|
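Together these two steps solve the problem. Write the results to a new file, a.txt. Note that the unchanged lines are appended after the split ones, so the original record order is not preserved: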
awk -F "|" -vOFS="|" '{l=split($2,arr,"#");for(i=1;i<l;i++){$2=arr[i+1];print}}' ./test.txt >a.txt
awk -F "|" '$2!~/#/{print}' ./test.txt >>a.txt
a.txt:
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_1000_VOICE
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202143108500|0_TRAFFIC
202121366359|0_1000_VOICE
202121366359|0_TRAFFIC
202113492312|W_GH_YYM
202132164529|
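If the original record order matters, both cases can probably be handled in a single pass instead of two. The following is only a sketch under some assumptions: the function name split_field2 is hypothetical, and empty pieces (such as the one before a leading "#") are skipped. This differs slightly from the two-step version, which would emit an empty second field for a value like "a##b".

```shell
# split_field2 is a hypothetical helper name, not from the original notes.
# It reads "|"-separated records from the given files (or stdin) and prints
# one line per non-empty "#"-separated piece of field 2.
split_field2() {
  awk -F '|' -v OFS='|' '
    $2 !~ /#/ { print; next }           # no "#" in field 2: keep the line as is
    {
      n = split($2, parts, "#")         # parts[1] is empty when field 2 starts with "#"
      for (i = 1; i <= n; i++) {
        if (parts[i] == "") continue    # assumption: empty pieces are dropped
        $2 = parts[i]                   # assigning $2 rebuilds the line using OFS
        print
      }
    }
  ' "$@"
}
```

Usage would be along the lines of: split_field2 ./test.txt >a.txt. With this approach the unchanged lines stay in their original positions rather than being appended at the end.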