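I. String.matches() ## Used to filter the log lines that need handling (e.g. whitespace, blank lines, malformed characters)
Statement:
// matches() returns true only when the whole string matches the regex
"!123".matches("[a-zA-Z0-9]{4}") // false
"34Az".matches("[a-zA-Z0-9]{4}") // true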
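Because matches() tests the whole string, it behaves differently from a search for the pattern somewhere inside the string. A minimal sketch of the difference follows; the object and value names are made up for illustration.

```scala
object MatchesVsFindSketch {
  // Same pattern as in the statement above: exactly four alphanumeric characters.
  val fourAlnum: String = "[a-zA-Z0-9]{4}"

  // Five characters in total, so the whole-string match fails.
  val wholeMatch: Boolean = "34Az!".matches(fourAlnum)

  // An unanchored search still finds the first four characters.
  val partial: Option[String] = fourAlnum.r.findFirstIn("34Az!")

  // A blank line never matches, which is why matches() works as a line filter.
  val blankRejected: Boolean = !"".matches(fourAlnum)

  def main(args: Array[String]): Unit = {
    println(wholeMatch)
    println(partial)
    println(blankRejected)
  }
}
```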
// Application:
// 1. Read the log file in Scala
def readFromTxt(filePath: String): Array[String] = {
  import scala.io.Source
  val source = Source.fromFile(filePath, "UTF-8")
  // Materialize the lines before closing, and close even if reading fails
  try source.getLines().toArray
  finally source.close()
}
// 2. Filter the log down to the information needed
// Inside triple quotes (""") the regex needs no extra backslash escaping
val reg = """([A-Z]+) ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) requestURI:(.*)""".r
// Filter first (drops blank and malformed lines), then map.
// lines is the Array[String] returned by readFromTxt
lines.filter(_.matches(reg.regex))
  .map { case reg(level, logdate, addr) => (level, logdate, addr) }
  .foreach(println)
---- Sample log lines ----
INFO 2000-10-01 requestURI:/c?app=0&p=1&did=180042334&industry=45Z
INFO 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
WARN 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
ERROR 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z
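The filter-then-map step can also be written as a single collect, which keeps only the lines the regex extractor matches. This is a sketch of an alternative, not the approach used above; the object name and the in-memory `lines` array (two of the sample lines plus two deliberately bad lines) are assumptions for illustration.

```scala
object LogFilterSketch {
  // Same pattern as above: level, date, and everything after "requestURI:".
  val reg = """([A-Z]+) ([0-9]{4}-[0-9]{1,2}-[0-9]{1,2}) requestURI:(.*)""".r

  // Hypothetical input: two sample lines, one blank line, one malformed line.
  val lines: Array[String] = Array(
    "INFO 2000-10-01 requestURI:/c?app=0&p=1&did=180042334&industry=45Z",
    "",
    "not a log line",
    "ERROR 2012-11-11 requestURI:/c?app=2&p=3&did=140042334&industry=42Z"
  )

  // collect applies the partial function only where the pattern matches,
  // so non-matching lines are dropped without a separate filter.
  val parsed: Array[(String, String, String)] =
    lines.collect { case reg(level, logdate, addr) => (level, logdate, addr) }

  def main(args: Array[String]): Unit = parsed.foreach(println)
}
```

With collect the regex is evaluated once per line instead of twice.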
II. case pattern matching (recommended; the most convenient)
Pattern matching / pattern guards / type matching: https://blog.csdn.net/lyq7269/article/details/107759026
Example 1
// Statement 1:
val pattern = "([a-zA-Z][0-9][a-zA-Z] [0-9][a-zA-Z][0-9])".r
"L3R 6M2" match {
  case pattern(x) => println("Valid zip-code: " + x) // x is the first capture group; several groups can be bound
  case x => println("Invalid zip-code: " + x)
}
// Statement 2:
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-05-23" match {
  case date(year, month, day) => println((year, month, day))
}
"2014-05-23" match {
  case date(year, _*) => println("The year of the date is " + year)
}
"2014-05-23" match {
  case date(_*) => println("It is a date")
}
Example 2
val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r (.*),.*""".r
rdd.foreach {
  case reg(zs, stu, ques, sa) => println((zs, stu, ques, sa))
  case _ => // skip lines that do not match instead of throwing scala.MatchError
}
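The log line to match is shown below; the fields to extract are 34434412, 8195023659593, 8080 and 1 (the value before the comma).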
2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14
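Note: pattern matching is convenient, but every pair of capturing parentheses in reg counts as a group, nested ones included, and the case must bind exactly as many variables as there are groups. For example, when matching an integer or a decimal, ([0-9](\.[0-9])?) contributes two groups, so a case that binds only one variable for it does not match and the match fails with scala.MatchError. Make the inner group non-capturing, ([0-9](?:\.[0-9])?), or simply use (.*).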
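The effect of a nested group on the number of variables can be checked with a small sketch. The object and method names below are hypothetical and the inputs are made up for illustration.

```scala
import scala.util.matching.Regex

object NestedGroupSketch {
  // Two capturing groups: the whole number and the optional fractional part.
  val nested: Regex = """([0-9](\.[0-9])?)""".r
  // One capturing group: the inner group is non-capturing, written (?: ... ).
  val flat: Regex = """([0-9](?:\.[0-9])?)""".r

  // One variable against two groups never matches, so the fallback case is taken.
  def withNested(s: String): String = s match {
    case nested(n) => n
    case _         => "no match"
  }

  // One variable against one group matches as expected.
  def withFlat(s: String): String = s match {
    case flat(n) => n
    case _       => "no match"
  }

  // Binding both groups also works; the inner group is null when there is no fraction,
  // so it is wrapped in Option here.
  def withBoth(s: String): Option[(String, Option[String])] = s match {
    case nested(n, frac) => Some((n, Option(frac)))
    case _               => None
  }

  def main(args: Array[String]): Unit = {
    println(withNested("1.5"))
    println(withFlat("1.5"))
    println(withBoth("1"))
  }
}
```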
III. import scala.util.matching.Regex API
1) findFirstMatchIn() returns the first match (Option[Match])
Statement:
import scala.util.matching.Regex
val numberPattern: Regex = "[0-9]".r
numberPattern.findFirstMatchIn("awesomepassword") match {
  case Some(_) => println("Password OK") // a match was found
  case None => println("Password must contain a number") // no match
}
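The Match inside the Option also carries the matched text and its position. A minimal sketch, assuming a hypothetical helper `firstNumber` and made-up input strings:

```scala
import scala.util.matching.Regex

object FirstMatchSketch {
  val numberPattern: Regex = "[0-9]+".r

  // Start index and text of the first run of digits, or None when there is none.
  def firstNumber(s: String): Option[(Int, String)] =
    numberPattern.findFirstMatchIn(s).map(m => (m.start, m.matched))

  def main(args: Array[String]): Unit = {
    println(firstNumber("pass123word45"))
    println(firstNumber("awesomepassword"))
  }
}
```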
2) Working with groups
findAllMatchIn() returns an Iterator[Regex.Match]; findAllMatchIn().toList => List[Regex.Match]
Example 1
Statement:
import scala.util.matching.Regex
val studentPattern: Regex = "([0-9a-zA-Z-#() ]+):([0-9a-zA-Z-#() ]+)".r
val input = "name:Jason,age:19,weight:100"
// group(1) is the key, group(2) is the value
for (patternMatch <- studentPattern.findAllMatchIn(input)) {
  println(s"key: ${patternMatch.group(1)} value: ${patternMatch.group(2)}")
}
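Example 2
// Groups are read by index here, so the nested group (group 5) does no harm
val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9](\.[0-9])?),.*""".r
rdd.map { line =>
  reg.findAllMatchIn(line)
    .map(x => (x.group(1), x.group(2), x.group(3), x.group(4)).productIterator.mkString("\t"))
    .mkString("") // empty string for lines that do not match
}.foreach(println)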
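The log line to match is shown below; the fields to extract are 34434412, 8195023659593, 8080 and 1 (the value before the comma).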
2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 [SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14
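The same extraction can be tried without Spark on the log line above. The sketch below is an assumption-laden variant rather than the code from Example 2: it uses findFirstMatchIn and returns an Option so that non-matching lines are explicit. The object and method names are hypothetical.

```scala
object SsdbLogSketch {
  // Same regex as in Example 2; the nested group is group 5 and is simply not read.
  val reg = """.* set se[0-9]_([0-9]+)_([0-9]+)_([0-9]+)r ([0-9](\.[0-9])?),.*""".r

  // Tab-separated groups 1 to 4, or None when the line does not match.
  def extract(line: String): Option[String] =
    reg.findFirstMatchIn(line).map(m => (1 to 4).map(i => m.group(i)).mkString("\t"))

  val sample: String =
    "2019-06-16 14:24:34 INFO com.noriental.praxissvr.answer.util.PraxisSsdbUtil:45 " +
      "[SimpleAsyncTaskExecutor-1] [020765925160] req: set se0_34434412_8195023659593_8080r 1,resp: ok 14"

  def main(args: Array[String]): Unit = {
    println(extract(sample))
    println(extract("unrelated line"))
  }
}
```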
3) String processing
1. Replacing within a string
replaceFirstIn("the input string", "the replacement text")
replaceAllIn("the input string", "the replacement text")
Statement 1:
"[0-9]+".r.replaceFirstIn("234 Main Street Suite 2034", "567") // 234 -> 567: "567 Main Street Suite 2034"
"[0-9]+".r.replaceAllIn("234 Main Street Suite 2034", "567") // 234 and 2034 -> 567: "567 Main Street Suite 567"
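The replacement does not have to be a fixed string. The sketch below shows three variations that may be useful: a replacement computed from each match, a replacement that refers back to groups with $1, $2, $3, and a literal dollar sign protected with Regex.quoteReplacement. The object name, value names and inputs are made up for illustration.

```scala
import scala.util.matching.Regex

object ReplaceSketch {
  // Compute the replacement from each match: every number is doubled.
  val doubled: String =
    "[0-9]+".r.replaceAllIn("234 Main Street Suite 2034",
      (m: Regex.Match) => (m.matched.toInt * 2).toString)

  // $1, $2, $3 in the replacement text refer to the capture groups.
  val reordered: String =
    """(\d{4})-(\d{2})-(\d{2})""".r.replaceAllIn("2014-05-23", "$3/$2/$1")

  // A literal "$" in the replacement must be quoted, otherwise it is read as a group reference.
  val literalDollar: String =
    "[0-9]+".r.replaceFirstIn("price 10", Regex.quoteReplacement("$5"))

  def main(args: Array[String]): Unit = {
    println(doubled)
    println(reordered)
    println(literalDollar)
  }
}
```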
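2. Searching within a string: findAllIn().toList => List[String]
Matching with case: _ discards a group that is not needed; _* discards all remaining groups and goes at the end of the pattern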
Statement 1:
val nums = "[0-9]+".r.findAllIn("123 Main Street Suite 2012").toList
nums.foreach(println)
Statement 2:
val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r
"2014-05-23" match {
  case date(year, month, day) => println((year, month, day))
}
"2014-05-23" match {
  case date(year, _*) => println("The year of the date is " + year)
}
"2014-05-23" match {
  case date(_*) => println("It is a date")
}
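The two statements above can be combined into a small, testable sketch: findAllIn yields the matched strings, which can be converted on the way to a list, and _* lets a case keep only the first group. The object name and the helper `yearOf` are hypothetical.

```scala
object FindAllSketch {
  // findAllIn returns an iterator of matched strings; convert each one to Int.
  val numbers: List[Int] =
    "[0-9]+".r.findAllIn("123 Main Street Suite 2012").map(_.toInt).toList

  val date = """(\d\d\d\d)-(\d\d)-(\d\d)""".r

  // Keep the first group and drop the remaining ones with _*.
  def yearOf(s: String): Option[String] = s match {
    case date(year, _*) => Some(year)
    case _              => None
  }

  def main(args: Array[String]): Unit = {
    println(numbers)
    println(yearOf("2014-05-23"))
    println(yearOf("23/05/2014"))
  }
}
```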