Starting from unmarshaling JSON that contains a JSON-string field


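Here is the situation: a piece of JSON has a field that should have been an object, but it was treated as a string when it was encoded, so it ended up looking like this: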

{"body":"{\"sn\":\"aaaa\\\/bbbb\"}"}

Parsing it with json.Unmarshal obviously means mapping it onto a struct like this:

	rawStr := `
{"body":"{\"sn\":\"aaaa\\\/bbbb\"}"}
`
	data := struct {
		Body string `json:"body"`
	}{}
	json.Unmarshal([]byte(rawStr), &data)

That way I have to define a second struct and then parse the string held in body:

	body := struct {
		Sn string
	}{}
	json.Unmarshal([]byte(data.Body), &body)

Can this be done in one go, by defining the structs up front and decoding in a single pass?

I had previously implemented the encoding.TextMarshaler interface to customize how a string field in a struct is marshaled, so naturally I thought of implementing encoding.TextUnmarshaler to customize unmarshaling:

type dataEx struct {
	Body bodyEx
}

type bodyEx struct {
	Sn string
}

func (p *bodyEx) UnmarshalText(text []byte) error {
	return nil
}

func marshalEx(rawStr string) {
	data := &dataEx{}
	err := json.Unmarshal([]byte(rawStr), data)
	if err != nil {
		panic(err)
	}
}

As a first test, I set a breakpoint in the UnmarshalText method, and sure enough execution stopped there.

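Now to implement UnmarshalText. Decoding straight into dataEx without a custom method cannot work, because when the JSON parser scans the value of the body field it treats it as a JSON string. What we receive in UnmarshalText is therefore that string, already unquoted, so parsing it once more into bodyEx should be all that is needed.
I expected this to be enough: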

func (p *bodyEx) UnmarshalText(text []byte) error {
	return json.Unmarshal(text, p)
}

Running it actually produces an error:

json: cannot unmarshal object into Go struct field dataEx.Body of type *main.bodyEx
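To reproduce the failure in isolation, here is a minimal sketch. The names naiveData, naiveBody, decodeNaive and isObjectTypeError are hypothetical and only used for illustration; the exact wording of the error message may differ between Go versions, so the sketch checks the error type rather than the text.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// naiveBody is a hypothetical stand-in for bodyEx with the naive UnmarshalText.
type naiveBody struct {
	Sn string
}

// UnmarshalText decodes straight back into the receiver, whose type
// itself implements encoding.TextUnmarshaler.
func (p *naiveBody) UnmarshalText(text []byte) error {
	return json.Unmarshal(text, p)
}

type naiveData struct {
	Body naiveBody
}

// decodeNaive returns the error produced by the naive approach.
func decodeNaive(raw string) error {
	var data naiveData
	return json.Unmarshal([]byte(raw), &data)
}

// isObjectTypeError reports whether err is a *json.UnmarshalTypeError
// that was raised for a JSON object.
func isObjectTypeError(err error) bool {
	var typeErr *json.UnmarshalTypeError
	return errors.As(err, &typeErr) && typeErr.Value == "object"
}

func main() {
	err := decodeNaive(`{"body":"{\"sn\":\"aaaa\\\/bbbb\"}"}`)
	fmt.Println(isObjectTypeError(err))
	fmt.Println(err)
}
```

The outer decode reaches UnmarshalText without trouble; the error comes from the inner json.Unmarshal call, which is handed a JSON object and a target that implements encoding.TextUnmarshaler.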

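This JSON really should decode into such a struct without trouble, so the error can only be caused by the added UnmarshalText method. For now I worked around it like this: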

type dataEx struct {
	Body bodyEx
}

type bodyEx struct {
	Sn string
}
type bodyEx2 bodyEx

func (p *bodyEx) UnmarshalText(text []byte) error {
	t := bodyEx2{}
	err := json.Unmarshal(text, &t)
	if err != nil {
		return err
	}
	*p = bodyEx(t)
	return nil
}

With that, the escaped JSON string embedded in the JSON is decoded into the struct in a single pass.
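Putting the pieces together, a complete program could look like the sketch below. The decodeOnce helper and the explicit json tags are my additions for illustration (the decoder already matches field names case-insensitively); everything else follows the workaround above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type dataEx struct {
	Body bodyEx `json:"body"`
}

type bodyEx struct {
	Sn string `json:"sn"`
}

// bodyEx2 has the same fields as bodyEx but none of its methods.
type bodyEx2 bodyEx

// UnmarshalText receives the unquoted contents of the body string and
// decodes them through bodyEx2, then copies the result into the receiver.
func (p *bodyEx) UnmarshalText(text []byte) error {
	var t bodyEx2
	if err := json.Unmarshal(text, &t); err != nil {
		return err
	}
	*p = bodyEx(t)
	return nil
}

// decodeOnce is a hypothetical helper that decodes the whole document
// in a single json.Unmarshal call.
func decodeOnce(raw string) (dataEx, error) {
	var data dataEx
	err := json.Unmarshal([]byte(raw), &data)
	return data, err
}

func main() {
	data, err := decodeOnce(`{"body":"{\"sn\":\"aaaa\\\/bbbb\"}"}`)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Println(data.Body.Sn) // prints aaaa/bbbb
}
```

Both layers of escaping are undone along the way: the outer decode turns `\\\/` into `\/`, and the inner decode turns `\/` into `/`.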

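Since the bodyEx2 workaround above was only a guess and an experiment, I wanted to find out why decoding stops working once UnmarshalText is implemented. So I read through the decoding source behind json.Unmarshal in encoding/json.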

scanner

Parsing a JSON string really means lexing it: recognizing the objects, numbers, arrays, keys, values and so on.
The json package has a scanner, which is a state machine:

// A scanner is a JSON scanning state machine.
// Callers call scan.reset() and then pass bytes in one at a time
// by calling scan.step(&scan, c) for each byte.
// The return value, referred to as an opcode, tells the
// caller about significant parsing events like beginning
// and ending literals, objects, and arrays, so that the
// caller can follow along if it wishes.
// The return value scanEnd indicates that a single top-level
// JSON value has been completed, *before* the byte that
// just got passed in.  (The indication must be delayed in order
// to recognize the end of numbers: is 123 a whole value or
// the beginning of 12345e+6?).

The scanner struct looks like this:

type scanner struct {
	// step is the transition function; it is reassigned to a different
	// implementation as the state changes.
	step func(*scanner, byte) int
	// Reached end of top-level value.
	endTop bool
	// Stack of what we're in the middle of - array values, object keys, object values.
	parseState []int
	// Error that happened, if any.
	err error
	// total bytes consumed, updated by decoder.Decode
	bytes int64
}

Take a quick look at the stateBeginValue state function:


// stateBeginValue is the state at the beginning of a value.
func stateBeginValue(s *scanner, c byte) int {
	if c <= ' ' && isSpace(c) {
		return scanSkipSpace
	}
	switch c {
	case '{':
		s.step = stateBeginStringOrEmpty
		s.pushParseState(parseObjectKey)
		return scanBeginObject
	case '[':
		s.step = stateBeginValueOrEmpty
		s.pushParseState(parseArrayValue)
		return scanBeginArray
	case '"':
		s.step = stateInString
		return scanBeginLiteral
	case '-':
		s.step = stateNeg
		return scanBeginLiteral
	case '0': // beginning of 0.123
		s.step = state0
		return scanBeginLiteral
	case 't': // beginning of true
		s.step = stateT
		return scanBeginLiteral
	case 'f': // beginning of false
		s.step = stateF
		return scanBeginLiteral
	case 'n': // beginning of null
		s.step = stateN
		return scanBeginLiteral
	}
	if '1' <= c && c <= '9' { // beginning of 1234.5
		s.step = state1
		return scanBeginLiteral
	}
	return s.error(c, "looking for beginning of value")
}

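For well-formed JSON, once reading starts (after skipping whitespace), a '{' means an object and a '[' means an array; any other valid first byte returns the scanBeginLiteral opcode. That opcode determines how the value is mapped onto the target struct during unmarshal.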
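The dispatch on the first byte can be mimicked with a toy function. This is only a sketch: valueKind is a hypothetical helper, not part of encoding/json, and it collapses the scanner's many literal states into one label.

```go
package main

import "fmt"

// valueKind is a hypothetical helper that mirrors the first-byte dispatch
// of stateBeginValue: '{' starts an object, '[' starts an array, and every
// other valid first byte starts a literal.
func valueKind(c byte) string {
	switch {
	case c == '{':
		return "object"
	case c == '[':
		return "array"
	case c == '"', c == '-', c == 't', c == 'f', c == 'n', '0' <= c && c <= '9':
		return "literal"
	}
	return "invalid"
}

func main() {
	docs := []string{`{"sn":"x"}`, `[1,2]`, `"{\"sn\":\"x\"}"`, `42`, `true`}
	for _, doc := range docs {
		fmt.Printf("%s -> %s\n", doc, valueKind(doc[0]))
	}
}
```

Note the third document: a JSON object wrapped in quotes starts with '"', so the decoder sees it as a literal rather than an object. That is exactly the shape of the body field in this article.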
The decodeState.literalStore method handles the various literal cases:


// literalStore decodes a literal stored in item into v.
//
// fromQuoted indicates whether this literal came from unwrapping a
// string from the ",string" struct tag option. this is used only to
// produce more helpful error messages.
func (d *decodeState) literalStore(item []byte, v reflect.Value, fromQuoted bool) error {
	// Check for unmarshaler.
	if len(item) == 0 {
		//Empty string given
		d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
		return nil
	}
	isNull := item[0] == 'n' // null
	u, ut, pv := indirect(v, isNull)
	if u != nil {
		return u.UnmarshalJSON(item)
	}
	if ut != nil {
		if item[0] != '"' {
			if fromQuoted {
				d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
				return nil
			}
			val := "number"
			switch item[0] {
			case 'n':
				val = "null"
			case 't', 'f':
				val = "bool"
			}
			d.saveError(&UnmarshalTypeError{Value: val, Type: v.Type(), Offset: int64(d.readIndex())})
			return nil
		}
		s, ok := unquoteBytes(item)
		if !ok {
			if fromQuoted {
				return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
			}
			panic(phasePanicMsg)
		}
		return ut.UnmarshalText(s)
	}

	v = pv

	switch c := item[0]; c {
	case 'n': // null
		// The main parser checks that only true and false can reach here,
		// but if this was a quoted string input, it could be anything.
		if fromQuoted && string(item) != "null" {
			d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
			break
		}
		switch v.Kind() {
		case reflect.Interface, reflect.Ptr, reflect.Map, reflect.Slice:
			v.Set(reflect.Zero(v.Type()))
			// otherwise, ignore null for primitives/string
		}
	case 't', 'f': // true, false
		value := item[0] == 't'
		// The main parser checks that only true and false can reach here,
		// but if this was a quoted string input, it could be anything.
		if fromQuoted && string(item) != "true" && string(item) != "false" {
			d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
			break
		}
		switch v.Kind() {
		default:
			if fromQuoted {
				d.saveError(fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type()))
			} else {
				d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
			}
		case reflect.Bool:
			v.SetBool(value)
		case reflect.Interface:
			if v.NumMethod() == 0 {
				v.Set(reflect.ValueOf(value))
			} else {
				d.saveError(&UnmarshalTypeError{Value: "bool", Type: v.Type(), Offset: int64(d.readIndex())})
			}
		}

	case '"': // string
		s, ok := unquoteBytes(item)
		if !ok {
			if fromQuoted {
				return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
			}
			panic(phasePanicMsg)
		}
		switch v.Kind() {
		default:
			d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
		case reflect.Slice:
			if v.Type().Elem().Kind() != reflect.Uint8 {
				d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
				break
			}
			b := make([]byte, base64.StdEncoding.DecodedLen(len(s)))
			n, err := base64.StdEncoding.Decode(b, s)
			if err != nil {
				d.saveError(err)
				break
			}
			v.SetBytes(b[:n])
		case reflect.String:
			v.SetString(string(s))
		case reflect.Interface:
			if v.NumMethod() == 0 {
				v.Set(reflect.ValueOf(string(s)))
			} else {
				d.saveError(&UnmarshalTypeError{Value: "string", Type: v.Type(), Offset: int64(d.readIndex())})
			}
		}

	default: // number
		if c != '-' && (c < '0' || c > '9') {
			if fromQuoted {
				return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
			}
			panic(phasePanicMsg)
		}
		s := string(item)
		switch v.Kind() {
		default:
			if v.Kind() == reflect.String && v.Type() == numberType {
				v.SetString(s)
				if !isValidNumber(s) {
					return fmt.Errorf("json: invalid number literal, trying to unmarshal %q into Number", item)
				}
				break
			}
			if fromQuoted {
				return fmt.Errorf("json: invalid use of ,string struct tag, trying to unmarshal %q into %v", item, v.Type())
			}
			d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
		case reflect.Interface:
			n, err := d.convertNumber(s)
			if err != nil {
				d.saveError(err)
				break
			}
			if v.NumMethod() != 0 {
				d.saveError(&UnmarshalTypeError{Value: "number", Type: v.Type(), Offset: int64(d.readIndex())})
				break
			}
			v.Set(reflect.ValueOf(n))

		case reflect.Int, reflect.Int8, reflect.Int16, reflect.Int32, reflect.Int64:
			n, err := strconv.ParseInt(s, 10, 64)
			if err != nil || v.OverflowInt(n) {
				d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
				break
			}
			v.SetInt(n)

		case reflect.Uint, reflect.Uint8, reflect.Uint16, reflect.Uint32, reflect.Uint64, reflect.Uintptr:
			n, err := strconv.ParseUint(s, 10, 64)
			if err != nil || v.OverflowUint(n) {
				d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
				break
			}
			v.SetUint(n)

		case reflect.Float32, reflect.Float64:
			n, err := strconv.ParseFloat(s, v.Type().Bits())
			if err != nil || v.OverflowFloat(n) {
				d.saveError(&UnmarshalTypeError{Value: "number " + s, Type: v.Type(), Offset: int64(d.readIndex())})
				break
			}
			v.SetFloat(n)
		}
	}
	return nil
}

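It first checks whether the target value implements json.Unmarshaler or encoding.TextUnmarshaler. If it implements the former, UnmarshalJSON is called directly. Otherwise, if it implements the latter and the literal starts with a quote (a quoted JSON string), the string is unquoted and passed to UnmarshalText, which is the custom method we implemented earlier.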

That explains why the extension point works. But why did unmarshaling the string directly into the type that implements UnmarshalText fail at first?

When we call unmarshal inside our custom method, the JSON being parsed is now an ordinary JSON object rather than a quoted string, so it goes through the decodeState.object method:

// object consumes an object from d.data[d.off-1:], decoding into v.
// The first byte ('{') of the object has been read already.
func (d *decodeState) object(v reflect.Value) error {
	// Check for unmarshaler.
	u, ut, pv := indirect(v, false)
	if u != nil {
		start := d.readIndex()
		d.skip()
		return u.UnmarshalJSON(d.data[start:d.off])
	}
	if ut != nil {
		d.saveError(&UnmarshalTypeError{Value: "object", Type: v.Type(), Offset: int64(d.off)})
		d.skip()
		return nil
	}
	... // the rest is omitted
}

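As the code above shows, when the value is an object and the target implements encoding.TextUnmarshaler, the decoder records an UnmarshalTypeError, skips the object, and returns. The bodyEx2 workaround gets past this check because a type defined as `type bodyEx2 bodyEx` shares the fields of bodyEx but not its methods, so it does not implement the interface and is decoded as an ordinary struct.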
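The method-set difference between bodyEx and bodyEx2 can be checked with a type assertion. In this sketch, implementsTextUnmarshaler is a hypothetical helper and UnmarshalText is stubbed out, since only the method set matters here.

```go
package main

import (
	"encoding"
	"fmt"
)

type bodyEx struct {
	Sn string
}

// UnmarshalText is a stub; its presence alone puts *bodyEx in the
// encoding.TextUnmarshaler interface.
func (p *bodyEx) UnmarshalText(text []byte) error {
	return nil
}

// bodyEx2 is a new defined type: same underlying struct, empty method set.
type bodyEx2 bodyEx

// implementsTextUnmarshaler is a hypothetical helper that reports whether
// v satisfies encoding.TextUnmarshaler.
func implementsTextUnmarshaler(v interface{}) bool {
	_, ok := v.(encoding.TextUnmarshaler)
	return ok
}

func main() {
	fmt.Println(implementsTextUnmarshaler(&bodyEx{}))  // prints true
	fmt.Println(implementsTextUnmarshaler(&bodyEx2{})) // prints false
}
```

Because *bodyEx2 fails this check, the decoder finds no TextUnmarshaler when it reaches the object branch and falls through to normal struct decoding.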

