ASPIRE

Reading and Writing JSON Containing Chinese Text with Python

To read and write JSON that contains Chinese text, you can use Python's built-in json library. To read data, use json.load.

f = open(fileName)  # prefer open() over file(); see http://stackoverflow.com/questions/6859499/difference-between-python-file-operation-modules-open-and-file

data = json.load(f)

The JSON data is loaded into a Python object; when the top-level JSON value is an object, the result is a dict.
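As a minimal sketch of what json.load returns, the snippet below parses two small documents. It assumes Python 3, and the sample data and variable names are made up for illustration; io.StringIO stands in for a file opened with open().

```python
import io
import json

# File-like objects standing in for opened UTF-8 JSON files (illustrative data).
object_source = io.StringIO('{"name": "张三", "tags": ["中文", "json"]}')
array_source = io.StringIO('["你好", 1, null]')

loaded_object = json.load(object_source)  # top-level JSON object -> dict
loaded_array = json.load(array_source)    # top-level JSON array  -> list

print(type(loaded_object).__name__, loaded_object["name"])
print(type(loaded_array).__name__, loaded_array)
```

The Python type you get back follows the top-level JSON value, so it is worth checking the type before indexing by key.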

'''
In [5]: json.load??
Type:        function
String form: <function load at 0x0272DAB0>
File:        c:\python27\lib\json\__init__.py
Definition:  json.load(fp, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Source:
def load(fp, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
'''

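Use json.dump to write JSON. By default, however, the output does not contain the Chinese characters themselves: every non-ASCII character is escaped as a \uXXXX sequence. To keep the original characters, call it like this: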

json.dump(jsonData, targetFile, ensure_ascii=False, indent=4)

In [3]: import json

In [4]: json.dump??
Type:        function
String form: <function dump at 0x0272DA30>
File:        c:\python27\lib\json\__init__.py
Definition:  json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding='utf-8', default=None, sort_keys=False, **kw)
Source:
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only.  If ``ensure_ascii`` is
    ``False``, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.

    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).

    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
    in strict compliance of the JSON specification, instead of using the
    JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

    If ``indent`` is a non-negative integer, then JSON array elements and
    object members will be pretty-printed with that indent level. An indent
    level of 0 will only insert newlines. ``None`` is the most compact
    representation.  Since the default item separator is ``', '``,  the
    output might include trailing whitespace when ``indent`` is specified.
    You can use ``separators=(',', ': ')`` to avoid this.

    If ``separators`` is an ``(item_separator, dict_separator)`` tuple
    then it will be used instead of the default ``(', ', ': ')`` separators.
    ``(',', ':')`` is the most compact JSON representation.

    ``encoding`` is the character encoding for str instances, default is UTF-8.

    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.

    If *sort_keys* is ``True`` (default: ``False``), then the output of
    dictionaries will be sorted by key.

    To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
    ``.default()`` method to serialize additional types), specify it with
    the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.

    """

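Writing JSON with Chinese text: passing ensure_ascii=False writes the characters in their original script. The indent parameter sets the number of spaces per indentation level.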
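The effect of ensure_ascii is easiest to see with json.dumps, which returns the text instead of writing it to a file. This is a small sketch assuming Python 3; the record is illustrative data.

```python
import json

record = {"city": "北京"}  # illustrative data

escaped = json.dumps(record)                       # default: ensure_ascii=True
readable = json.dumps(record, ensure_ascii=False)  # keep the characters as-is
pretty = json.dumps(record, ensure_ascii=False, indent=4)

print(escaped)   # {"city": "\u5317\u4eac"}
print(readable)  # {"city": "北京"}
print(pretty)
```

Both forms are valid JSON and parse back to the same object; the choice only affects how readable the file is to a person.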

Changing how the file is written: writing the string from the previous step straight to a file raises an error (possibly only on Python 2.7):

UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-9: ordinal not in range(128)

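The error occurs because the file object falls back to the ascii codec, which cannot encode the Chinese characters in the output.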
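To see the failure in isolation, the sketch below encodes a string explicitly. In Python 2.7 the encoding step happens implicitly when unicode text is written to a plain file; here it is spelled out, assuming Python 3 and an illustrative string.

```python
text = '{"greeting": "你好"}'  # illustrative JSON text

try:
    text.encode("ascii")  # the ascii codec only covers code points 0..127
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print(exc)

utf8_bytes = text.encode("utf-8")  # UTF-8 can represent every character
print(len(text), len(utf8_bytes))
```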

The fix is to open the output file with an explicit UTF-8 encoding:

import codecs
with codecs.open(path_to_fileName, 'w', 'utf-8') as fp:
    json.dump(jsonData, fp, ensure_ascii=False, indent=4)  # write to fp
'''
In [1]: import codecs

In [2]: codecs.open??
Type:        function
String form: <function open at 0x025A8C30>
File:        c:\python27\lib\codecs.py
Definition:  codecs.open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
Source:
def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
...
'''

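#!/usr/bin/env python
# coding=utf-8
import json, codecs

# Read the input JSON.
with open('json.txt', 'r') as f_in:
    data_in = json.load(f_in)

# Default output: non-ASCII characters are escaped as \uXXXX.
with open('json_out.txt', 'w') as f_out:
    json.dump(data_in, f_out, indent=4)

# Keep the Chinese characters; the file is opened with a UTF-8 writer.
with codecs.open('json_out2.txt', 'w', 'utf-8') as f_out2:
    json.dump(data_in, f_out2, ensure_ascii=False, indent=4)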
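The script above targets Python 2.7. As a hedged sketch of the same round trip on Python 3, the built-in open() accepts an encoding argument, so codecs.open is not required. The data and the temporary output path are assumptions made for the example.

```python
import json
import os
import tempfile

data = {"title": "中文标题", "count": 8}  # illustrative data

# A temporary path, so the sketch does not overwrite any real file.
out_path = os.path.join(tempfile.mkdtemp(), "json_out.txt")

# Passing encoding= to the built-in open() plays the role of codecs.open().
with open(out_path, "w", encoding="utf-8") as fp:
    json.dump(data, fp, ensure_ascii=False, indent=4)

# Read the file back with the same encoding and parse it.
with open(out_path, "r", encoding="utf-8") as fp:
    raw_text = fp.read()

reloaded = json.loads(raw_text)
print(raw_text)
```

Stating the encoding on both the write and the read avoids depending on the platform's default encoding.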


24 October 2014

admin

Programing, Tools

json, Python

8 comments on "Reading and Writing JSON Containing Chinese Text with Python"

hi says:

2018-12-14 20:50

How to read and write JSON objects to a file in Python (python read write json file)
https://stackoverflow.com/questions/12309269/how-do-i-write-json-data-to-a-file
`
# Write
import json
with open('data.json', 'w') as fp:
    json.dump(data, fp)

# Read
with open('data.json') as fp:
    data_loaded = json.load(fp)
`

Reading and Writing JSON to a File in Python
https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/

Reading and Writing JSON through Python
https://stackoverflow.com/questions/45791891/reading-and-writing-json-through-python

hi says:

2018-12-19 16:30

The jsonlines library: storing many Python objects efficiently
https://mp.weixin.qq.com/s/fq5BMnC2FZyb3X4bWgX5uw
`
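JSON files are concise and compact, which makes them very popular on the web; when writing crawlers we often meet sites that transfer data as JSON. But if the data to store is 1 GB, reading it back from a single JSON file means loading it all at once, which takes a lot of memory and puts heavy pressure on the machine. So we store the data as many separate objects and read them line by line to reduce memory use. That is what the jsonlines library is for; I hope you find it useful.

jsonlines
1. Each line is one JSON value (a serialized Python object)
2. The file is UTF-8 encoded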
`
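The line-per-record idea can be sketched with the standard json module alone. This is not the jsonlines library's API; the helper names write_json_lines and read_json_lines are hypothetical, and io.StringIO stands in for a file opened with UTF-8 encoding (Python 3 assumed).

```python
import io
import json


def write_json_lines(records, fp):
    """Write one JSON value per line."""
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_json_lines(fp):
    """Yield one parsed value per non-empty line, one record at a time."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


buffer = io.StringIO()  # stands in for open(path, "w", encoding="utf-8")
write_json_lines([{"id": 1, "text": "你好"}, {"id": 2, "text": "世界"}], buffer)

buffer.seek(0)
records = list(read_json_lines(buffer))
print(records)
```

Because read_json_lines is a generator, a caller that loops over it directly never holds more than one record in memory; list() is used here only to show the result.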

hi says:

2018-12-26 23:02

Error when writing JSON to a file in Python
python TypeError: set([]) is not JSON serializable
`
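The error is caused by a value of type set inside the data being serialized. There are two ways to fix it:
1. Convert the set to a list;
2. Subclass JSONEncoder and override its default() method;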
`
https://stackoverflow.com/questions/8230315/how-to-json-serialize-sets
https://codeday.me/bug/20170625/31452.html
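Both fixes can be sketched as follows, assuming Python 3. The class name SetEncoder and the sample payload are hypothetical, and sorting assumes the set's elements can be compared with each other.

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes sets as sorted lists."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


payload = {"tags": {"b", "a"}}  # illustrative data

# Fix 1: convert the set to a list before serializing.
as_list = json.dumps({"tags": sorted(payload["tags"])})

# Fix 2: let a custom encoder handle sets wherever they appear.
via_encoder = json.dumps(payload, cls=SetEncoder)

print(as_list)
print(via_encoder)
```

The encoder approach is convenient when sets are nested deep inside the data, since default() is called for every object the base encoder cannot handle.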

hi says:

2019-06-11 13:32

Can comments be used in JSON?
https://stackoverflow.com/questions/244777/can-comments-be-used-in-json
`
No.

The JSON should all be data, and if you include a comment, then it will be data too.

You could have a designated data element called “_comment” (or something) that would be ignored by apps that use the JSON data.

You would probably be better having the comment in the processes that generates/receives the JSON, as they are supposed to know what the JSON data will be in advance, or at least the structure of it.

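In short: no.
Everything in JSON must be data, so a comment you include is treated as data too. You can, for example, add a "_comment" element as an explanatory note, as long as the program does not process that field.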
`
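A small sketch of the "_comment" convention, assuming Python 3 and an illustrative config string. It also shows that Python's json module rejects a JavaScript-style comment.

```python
import json

config_text = '{"_comment": "retry settings", "retries": 3}'  # illustrative data
config = json.loads(config_text)

# The application simply ignores the designated comment key.
settings = {key: value for key, value in config.items() if key != "_comment"}

try:
    json.loads('{"retries": 3} // retry settings')
    comment_accepted = True
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    comment_accepted = False

print(settings, comment_accepted)
```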

hi says:

2019-06-13 19:45

How to pretty-print a JSON value or dict that contains Chinese text in Python?
https://stackoverflow.com/questions/12943819/how-to-prettyprint-a-json-file
https://docs.python.org/2/library/json.html#json.dumps
`
import json
your_json = '["foo", {"bar":["你好", null, 1.0, 2]}]'  # JSON array string
parsed = json.loads(your_json)  # type(parsed) == list

print(parsed)
# Python 2 output: [u'foo', {u'bar': [u'\u4f60\u597d', None, 1.0, 2]}]
print(json.dumps(parsed, indent=4))  # default ensure_ascii=True escapes non-ASCII characters
# [
#     "foo",
#     {
#         "bar": [
#             "\u4f60\u597d",
#             null,
#             1.0,
#             2
#         ]
#     }
# ]
print(json.dumps(parsed, indent=4, ensure_ascii=False))  # keeps the Chinese characters
# [
#     "foo",
#     {
#         "bar": [
#             "你好",
#             null,
#             1.0,
#             2
#         ]
#     }
# ]
`

abc says:

2022-10-27 15:46

How to write a dict to a file in Python (stored as a JSON-formatted string)
Writing a dictionary to a text file?
https://stackoverflow.com/questions/36965507/writing-a-dictionary-to-a-text-file
`
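import json
# exDict is the dictionary to save; 'file.txt' is an example file name
with open('file.txt', 'w') as file:
    file.write(json.dumps(exDict))  # use `json.loads` to do the reverse
# json.dumps converts the dict into a string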
`

abc says:

2022-10-27 15:47

How do you convert a dict to bytes in Python? It is simple: serialize it to a string with json.dumps, then call .encode('utf-8') on that string.
Python: Convert dictionary to bytes
https://stackoverflow.com/questions/55277431/python-convert-dictionary-to-bytes
`
import json

user_dict = {'name': 'dinesh', 'code': 'dr-01'}

user_encode_data = json.dumps(user_dict, indent=2).encode('utf-8')
print(user_encode_data)

user_dict_bytes = json.dumps(user_dict).encode('utf-8')
print(user_dict_bytes)
`

hi says:

2024-09-05 17:44

Five ways to check whether a character is Chinese in Python, with examples
https://www.jb51.net/python/290637ks9.htm
`
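1. Use Python's built-in ord(): ord() converts a character to its Unicode code point, which is then checked against the range of Chinese characters. The snippet given compares the characters directly, which is equivalent: if '\u4e00' <= char <= '\u9fff': return True

2. Use the built-in unicodedata module: if 'CJK' in unicodedata.name(char): return True

3. Use a regular expression: [^\u4e00-\u9fa5] matches every non-Chinese character, while [^\x00-\xff] matches every double-byte character, including Chinese characters and symbols

4. Use a Chinese character set: if b'\xb0\xa1' <= word.encode('gb2312') <= b'\xd7\xf9': return True

5. Use a third-party library: for example, the xpinyin library converts a string to pinyin, which can be used to tell whether the string is Chinese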
`
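The first three approaches can be sketched as below, assuming Python 3. The function names are hypothetical. The range check covers only the CJK Unified Ideographs block U+4E00..U+9FFF, so characters in the extension blocks are not detected, and the name check here uses the stricter substring "CJK UNIFIED IDEOGRAPH" rather than "CJK".

```python
import re
import unicodedata


def is_cjk_by_range(char):
    """True if char lies in the block U+4E00..U+9FFF."""
    return "\u4e00" <= char <= "\u9fff"


def is_cjk_by_name(char):
    """True if the Unicode name marks char as a CJK unified ideograph."""
    # The second argument is returned for characters that have no name.
    return "CJK UNIFIED IDEOGRAPH" in unicodedata.name(char, "")


HAN_RE = re.compile("[\u4e00-\u9fff]")


def contains_chinese(text):
    """True if any character of text falls in the range above."""
    return HAN_RE.search(text) is not None


print(is_cjk_by_range("中"), is_cjk_by_name("中"), contains_chinese("json 中文"))
```

Comparing single-character strings, as in is_cjk_by_range, compares their code points, which is why it gives the same answer as an ord()-based check.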
