Garbled Characters in JSON Data
并发编程网 (ifeve.com), 2015-10-03
Background
Every programmer is familiar with character encodings. GBK, UTF-8, and ASCII are in daily use, yet garbled text still turns up from time to time. The usual response is to search Google or Baidu, dig a hint out of some obscure corner of the web, and copy it step by step. The problem often does go away, the matter is dropped, and the next time something similar happens the whole routine is repeated. Few people take the time to understand the root cause of these problems or why the fix works. This article uses a real case to explain what an encoding is, how garbled text arises, and how to deal with it. The case starts from the lua_cjson.c library. You do not need to know that library; it is only a vehicle for explaining the problem, and following the line of reasoning is enough.
Some time ago a colleague working on a new project used lua_cjson.c (cjson for short) to convert JSON strings into native Lua data structures. Chinese text came out garbled, but oddly only for a handful of characters, including "朶"; every other character was fine. It turned out that the JSON strings were GBK-encoded. So why does GBK cause this problem, what is the underlying reason, and how should it be fixed?
To answer that, we first look at what the specification requires of a JSON string.
The JSON specification
JSON (JavaScript Object Notation) is a text format for serializing structured data. It can represent four primitive types (strings, numbers, booleans, and null) and two structured types (objects and arrays).
RFC 4627 contains this sentence:
A string is a sequence of zero or more Unicode characters.
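A string is a sequence of zero or more Unicode characters.
A brief word on what a Unicode character is. ASCII covers letters, digits, and so on, but it defines only a little over a hundred characters. Chinese characters are not ASCII characters, but Unicode includes them, so a Chinese character can be a Unicode character. The point to note is that Unicode characters are abstract symbols, not bytes.
That raises the next question: how should these characters be represented in a JSON text?
The Encoding section of the specification says: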
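JSON text SHALL be encoded in Unicode. The default encoding is UTF-8.
JSON text SHALL be encoded in one of the Unicode encodings, and UTF-8 is the default.
Note the keyword SHALL [RFC2119], which marks an absolute requirement: characters must be encoded with a Unicode encoding before they can appear in a JSON text, and UTF-8 is the default choice.
How can we tell which Unicode encoding is in use?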
Since the first two characters of a JSON text will always be ASCII characters [RFC0020],
it is possible to determine whether an octet stream is UTF-8, UTF-16 (BE or LE), or
UTF-32 (BE or LE) by looking at the pattern of nulls in the first four octets.
Because the first two characters (characters, not bytes) of a JSON text are always ASCII, the first four bytes (bytes this time) of an octet stream are enough to tell whether the stream is encoded as UTF-8, UTF-16 (BE or LE), or UTF-32 (BE or LE).
00 00 00 xx UTF-32BE (UTF-32, big-endian)
xx 00 00 00 UTF-32LE (UTF-32, little-endian)
00 xx 00 xx UTF-16BE (UTF-16, big-endian)
xx 00 xx 00 UTF-16LE (UTF-16, little-endian)
xx xx xx xx UTF-8
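The table above can be sketched as a small C function. This is only an illustration of the RFC 4627 null-pattern rule; the enum and the function name detect_json_encoding are hypothetical and do not come from cjson or any other library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names for illustration; RFC 4627 only defines the byte patterns. */
typedef enum {
    JSON_ENC_UNKNOWN,   /* fewer than four octets, or no pattern matched */
    JSON_ENC_UTF8,
    JSON_ENC_UTF16BE,
    JSON_ENC_UTF16LE,
    JSON_ENC_UTF32BE,
    JSON_ENC_UTF32LE
} json_encoding;

/* Classify a JSON octet stream by the pattern of nulls in its first four octets. */
json_encoding detect_json_encoding(const unsigned char *buf, size_t len)
{
    assert(buf != NULL);
    if (len < 4)
        return JSON_ENC_UNKNOWN;

    /* z0..z3 are 1 when the corresponding octet is zero. */
    int z0 = (buf[0] == 0), z1 = (buf[1] == 0);
    int z2 = (buf[2] == 0), z3 = (buf[3] == 0);

    if ( z0 &&  z1 &&  z2 && !z3) return JSON_ENC_UTF32BE;  /* 00 00 00 xx */
    if (!z0 &&  z1 &&  z2 &&  z3) return JSON_ENC_UTF32LE;  /* xx 00 00 00 */
    if ( z0 && !z1 &&  z2 && !z3) return JSON_ENC_UTF16BE;  /* 00 xx 00 xx */
    if (!z0 &&  z1 && !z2 &&  z3) return JSON_ENC_UTF16LE;  /* xx 00 xx 00 */
    if (!z0 && !z1 && !z2 && !z3) return JSON_ENC_UTF8;     /* xx xx xx xx */
    return JSON_ENC_UNKNOWN;
}
```

The check relies on the RFC 4627 guarantee that the first two characters are ASCII. Note that GBK input contains no null bytes either, so this rule would classify it as UTF-8; the pattern test cannot tell the two apart.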
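ps:
UTF-32 represents each character as one 32-bit (4-byte) integer.
UTF-16 represents each character as one 16-bit (2-byte) integer; when two bytes are not enough, it uses two consecutive 16-bit integers. A character can therefore take 4 bytes in UTF-16, but unlike the four bytes of UTF-32, there is no byte-order question between the first pair of bytes and the second pair: the two 16-bit units always appear in the same order.
UTF-8 represents a character as a sequence of one or more 8-bit (1-byte) units, so it has no byte-order issue at all.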
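To make the note above concrete, here is a minimal sketch of how one code point becomes UTF-8 bytes or UTF-16 units. The function names utf8_encode and utf16_encode are hypothetical helpers written for this illustration, not part of cjson.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if cp is a Unicode scalar value (in range and not a surrogate). */
static int is_scalar_value(uint32_t cp)
{
    return cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}

/* Encode cp as UTF-8. Returns the number of bytes written (1..4), or 0 on error. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    assert(out != NULL);
    if (!is_scalar_value(cp))
        return 0;
    if (cp < 0x80) {
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encode cp as UTF-16. Returns the number of 16-bit units written (1 or 2), or 0 on error. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    assert(out != NULL);
    if (!is_scalar_value(cp))
        return 0;
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                                  /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));       /* high surrogate, always first */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));     /* low surrogate, always second */
    return 2;
}
```

For example, the character 中 (U+4E2D) takes one 16-bit unit in UTF-16 and the three bytes E4 B8 AD in UTF-8, while a code point above U+FFFF needs two 16-bit units (a surrogate pair) in UTF-16 and four bytes in UTF-8. Byte order only affects how each individual 16-bit or 32-bit unit is laid out in memory, which is why UTF-8 is unaffected.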
So far nothing in the specification says that GBK may be used. Does that mean JSON text cannot be GBK-encoded? And if it really cannot, why does cjson not turn all GBK text into garbage instead of garbling only a few characters?
The specification describes a JSON parser as follows:
A JSON parser transforms a JSON text into another representation.
A JSON parser MUST accept all texts that conform to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
A JSON parser transforms a JSON text into another representation.
A JSON parser MUST accept every text that conforms to the JSON grammar.
A JSON parser MAY accept non-JSON forms or extensions.
The cause of the garbled text
The specification's description of a parser does not require it to validate the encoding of the text, and it allows a parser to accept non-JSON forms if it chooses to.
Now let us see what the cjson parser does. The comment at the top of cjson says:
Invalid UTF-8 characters are not detected and will be passed untouched.
If required, UTF-8 error checking should be done outside this library.
Invalid UTF-8 sequences are passed through as they are; if UTF-8 validation is needed, it has to be done outside the library.
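Since cjson leaves validation to the caller, a check of this kind could run before the text is handed to the parser. The following is a minimal sketch of a UTF-8 well-formedness test; is_valid_utf8 is a hypothetical helper written for this article, not a cjson function.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise.
 * Rejects stray continuation bytes, truncated sequences, overlong forms,
 * surrogates, and code points above U+10FFFF. */
int is_valid_utf8(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    assert(buf != NULL || len == 0);
    while (i < len) {
        unsigned char c = buf[i];
        size_t need, k;          /* need = number of continuation bytes */
        unsigned long cp, min;   /* min = smallest code point allowed for this length */

        if (c < 0x80) {          /* ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            need = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            need = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            need = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;            /* continuation byte or invalid lead byte */
        }

        if (len - i <= need)
            return 0;            /* sequence is cut off by the end of input */

        for (k = 1; k <= need; k++) {
            if ((buf[i + k] & 0xC0) != 0x80)
                return 0;        /* not a continuation byte */
            cp = (cp << 6) | (buf[i + k] & 0x3F);
        }

        if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return 0;            /* overlong, out of range, or surrogate */

        i += need + 1;
    }
    return 1;
}
```

As an illustration, the UTF-8 bytes E4 B8 AD for 中 pass this check, while the GBK bytes D6 D0 for the same character fail it, because D0 is not a valid continuation byte. GBK text will usually be rejected this way, although a byte sequence that happens to be valid in both encodings would still pass.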
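The cjson comment is explicit: text that is not valid UTF-8 is passed through without any check. That explains why GBK input, although it does not conform to the specification, can still be parsed. But then why do characters such as "朶" come out garbled? Before answering, let us look at how cjson treats the other two encodings named in the specification, UTF-16 and UTF-32.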
At the start of its parse function, cjson does this:
/* Detect Unicode other than UTF-8 (see RFC 4627, Sec 3)
 *
 * CJSON can support any simple data type, hence only the first
 * character is guaranteed to be ASCII (at worst: '"'). This is
 * still enough to detect whether the wrong encoding is in use.
 */
if (json_len >= 2 && (!json.data[0] || !json.data[1]))
    luaL_error(l, "JSON parser does not support UTF-16 or UTF-32");
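As noted earlier, the first two characters of a JSON string are always ASCII, so a JSON string has at least two bytes. The code therefore first checks that the length is at least 2, and then looks at whether either of the first two bytes is zero to decide whether the text is in an encoding other than UTF-8. The outcome is plain: cjson does not support the UTF-16 and UTF-32 encodings that the specification allows.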
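The effect of that check can be tried in isolation. The sketch below restates the same condition as a standalone predicate; the name looks_like_utf16_or_utf32 is hypothetical, and the function only mirrors the test quoted above rather than reproducing cjson's actual code path.

```c
#include <assert.h>
#include <stddef.h>

/* Returns 1 when the input would be rejected by the quoted check,
 * that is, when one of the first two bytes is a null byte. */
int looks_like_utf16_or_utf32(const char *data, size_t len)
{
    assert(data != NULL || len == 0);
    return len >= 2 && (data[0] == '\0' || data[1] == '\0');
}
```

With this predicate, the UTF-8 text {} is accepted, while the same text in UTF-16LE (7B 00 7D 00) or UTF-16BE (00 7B 00 7D) is rejected. GBK bytes never contain a null, so GBK input slips through this check just as UTF-8 does.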