Reading and Writing JSON Containing Chinese Text with Python


To read and write JSON that contains Chinese text, you can use Python's json library. To read data, use json.load.

f = open(fileName)  # prefer open() over file(): http://stackoverflow.com/questions/6859499/difference-between-python-file-operation-modules-open-and-file

data = json.load(f)

The JSON data is loaded into a Python object; when the top-level JSON value is an object, the result is a dict.
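A minimal sketch of the read step, assuming the file is UTF-8 encoded. The file name sample.json and its contents are made up for illustration, and io.open is used here (rather than plain open) only so that the encoding can be stated explicitly on both Python 2 and Python 3.

```python
# -*- coding: utf-8 -*-
import io
import json
import os
import tempfile

# Hypothetical sample file, created here only so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'sample.json')
with io.open(path, 'w', encoding='utf-8') as f:
    f.write(u'{"name": "张三", "tags": ["中文", "json"]}')

# Read it back, stating the encoding instead of relying on the platform default.
with io.open(path, encoding='utf-8') as f:
    data = json.load(f)

print(type(data))    # a dict, because the top-level JSON value is an object
print(data['name'])  # the decoded Chinese text
```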

'''
In [5]: json.load??
Type:        function
String form: <function load at 0x0272DAB0>
File:        c:\python27\lib\json\__init__.py
Definition:  json.load(fp, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Source:
def load(fp, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
'''

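Use json.dump to write JSON. By default, however, the output does not contain the Chinese characters themselves; they are escaped as ASCII \uXXXX sequences. To keep the original characters, call it like this: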

json.dump(jsonData, targetFile, ensure_ascii=False, indent=4)
In [3]: import json

In [4]: json.dump??
Type:        function
String form: <function dump at 0x0272DA30>
File:        c:\python27\lib\json\__init__.py
Definition:  json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding='utf-8', default=None, sort_keys=False, **kw)
Source:
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only.  If ``ensure_ascii`` is
    ``False``, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.

    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).

    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
    in strict compliance of the JSON specification, instead of using the
    JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

    If ``indent`` is a non-negative integer, then JSON array elements and
    object members will be pretty-printed with that indent level. An indent
    level of 0 will only insert newlines. ``None`` is the most compact
    representation.  Since the default item separator is ``', '``,  the
    output might include trailing whitespace when ``indent`` is specified.
    You can use ``separators=(',', ': ')`` to avoid this.

    If ``separators`` is an ``(item_separator, dict_separator)`` tuple
    then it will be used instead of the default ``(', ', ': ')`` separators.
    ``(',', ':')`` is the most compact JSON representation.

    ``encoding`` is the character encoding for str instances, default is UTF-8.

    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.

    If *sort_keys* is ``True`` (default: ``False``), then the output of
    dictionaries will be sorted by key.

    To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
    ``.default()`` method to serialize additional types), specify it with
    the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.

    """

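This outputs JSON with the Chinese text intact: ensure_ascii=False writes the characters in their original script, and indent sets the number of spaces per indentation level.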
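To make the effect of the two parameters concrete, here is a small sketch using json.dumps, which takes the same ensure_ascii and indent arguments as json.dump but returns a string instead of writing to a file. The sample dict is invented for illustration.

```python
# -*- coding: utf-8 -*-
import json

data = {u'city': u'北京'}  # made-up sample data

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep non-ASCII characters
pretty = json.dumps(data, ensure_ascii=False, indent=4)

print(escaped)   # {"city": "\u5317\u4eac"}
print(readable)  # {"city": "北京"}
print(pretty)    # same content, spread over several indented lines
```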

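Changing the file encoding for writing: writing the output of the previous step directly to a file raises an error (possibly only on Python 2.7):

UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-9: ordinal not in range(128)

The error occurs because the output contains non-ASCII characters, and a file opened with plain open() encodes unicode text with the ASCII codec, which cannot represent them.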
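The same failure can be reproduced without a file by encoding the text with the ASCII codec directly, which is roughly what Python 2 does implicitly in the situation above. A minimal sketch with a made-up string:

```python
# -*- coding: utf-8 -*-
text = u'{"city": "北京"}'  # made-up JSON text containing Chinese characters

try:
    text.encode('ascii')  # ASCII has no code for the Chinese characters
except UnicodeEncodeError as exc:
    error = exc
    print(exc)

utf8_bytes = text.encode('utf-8')  # UTF-8 can represent every character
print(len(text), len(utf8_bytes))
```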

The fix is to open the output file with an explicit UTF-8 encoding:

import codecs
import json
with codecs.open(path_to_fileName, 'w', 'utf-8') as fp:
    # write to fp, for example:
    json.dump(jsonData, fp, ensure_ascii=False, indent=4)
'''
In [1]: import codecs

In [2]: codecs.open??
Type:        function
String form: <function open at 0x025A8C30>
File:        c:\python27\lib\codecs.py
Definition:  codecs.open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
Source:
def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
...
'''
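#!/usr/bin/env python
# coding=utf-8
import codecs
import json

# Read the input as UTF-8 so the Chinese text is decoded correctly
with codecs.open('json.txt', 'r', 'utf-8') as f_in:
    data_in = json.load(f_in)

# Default ensure_ascii=True: Chinese characters are written as \uXXXX escapes
with open('json_out.txt', 'w') as f_out:
    json.dump(data_in, f_out, indent=4)

# ensure_ascii=False with a UTF-8 writer: Chinese characters are written as-is
with codecs.open('json_out2.txt', 'w', 'utf-8') as f_out2:
    json.dump(data_in, f_out2, ensure_ascii=False, indent=4)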
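
On Python 3 the built-in open() accepts an encoding argument, so the same round trip can be written without codecs. The following is a sketch under that assumption; the file name and sample data are hypothetical.

```python
import json
import os
import tempfile

data_in = {'greeting': '你好', 'numbers': [1, 2, 3]}  # made-up sample data
path = os.path.join(tempfile.mkdtemp(), 'json_out3.txt')  # hypothetical file name

# Write readable Chinese text, encoded as UTF-8 on disk.
with open(path, 'w', encoding='utf-8') as fp:
    json.dump(data_in, fp, ensure_ascii=False, indent=4)

# Read it back with the same encoding and check that nothing was lost.
with open(path, encoding='utf-8') as fp:
    text = fp.read()
data_back = json.loads(text)

print('你好' in text)        # the file holds the characters, not \uXXXX escapes
print(data_back == data_in)
```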

 


8 comments on "Reading and Writing JSON Containing Chinese Text with Python"

  1. How to read and write JSON objects to a file in Python (python read write json file)
    https://stackoverflow.com/questions/12309269/how-do-i-write-json-data-to-a-file
    `
    # Write (data is any JSON-serializable object)
    import json
    with open('data.json', 'w') as fp:
        json.dump(data, fp)

    # Read
    with open('data.json') as fp:
        data_loaded = json.load(fp)
    `

    Reading and Writing JSON to a File in Python
    https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/

    Reading and Writing JSON through Python
    https://stackoverflow.com/questions/45791891/reading-and-writing-json-through-python

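  2. The jsonlines library: an efficient way to save many Python objects
    https://mp.weixin.qq.com/s/fq5BMnC2FZyb3X4bWgX5uw
    `
    JSON files are concise and very popular on the web; when writing crawlers we often run into sites that transfer data as JSON. But if the data to store is 1 GB, an ordinary JSON file has to be read in one go, which takes a lot of memory and puts heavy pressure on the machine. So we store the data as many separate objects instead and read them line by line to reduce memory use. That is what the jsonlines library, introduced here, is for.

    jsonlines
    1. Each line is one JSON value (one serialized Python object)
    2. The file is UTF-8 encoded
    `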
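
The line-by-line idea can be sketched with the standard json module alone. This is not the jsonlines library's own API, only an illustration of the file format it handles; the file name and records are made up, and Python 3 is assumed.

```python
import json
import os
import tempfile

records = [{'id': 1, 'title': '第一条'}, {'id': 2, 'title': '第二条'}]  # made-up data
path = os.path.join(tempfile.mkdtemp(), 'records.jsonl')  # hypothetical file name

# Write one JSON object per line, UTF-8 encoded.
with open(path, 'w', encoding='utf-8') as fp:
    for record in records:
        fp.write(json.dumps(record, ensure_ascii=False) + '\n')

def iter_records(file_path):
    """Yield one parsed record at a time instead of loading the whole file."""
    with open(file_path, encoding='utf-8') as fp:
        for line in fp:
            if line.strip():  # skip blank lines
                yield json.loads(line)

loaded = list(iter_records(path))
print(loaded == records)
```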

  3. Can comments be used in JSON?
    https://stackoverflow.com/questions/244777/can-comments-be-used-in-json
    `
    No.

    The JSON should all be data, and if you include a comment, then it will be data too.

    You could have a designated data element called “_comment” (or something) that would be ignored by apps that use the JSON data.

    You would probably be better having the comment in the processes that generates/receives the JSON, as they are supposed to know what the JSON data will be in advance, or at least the structure of it.

    `

  4. How to pretty-print a JSON string or dict containing Chinese text in Python?
    https://stackoverflow.com/questions/12943819/how-to-prettyprint-a-json-file
    https://docs.python.org/2/library/json.html#json.dumps
    `
    import json
    your_json = '["foo", {"bar":["你好", null, 1.0, 2]}]' # json array string
    parsed = json.loads(your_json) # type(parsed) == list

    print(parsed)
    # Python 2 prints: [u'foo', {u'bar': [u'\u4f60\u597d', None, 1.0, 2]}]
    print(json.dumps(parsed, indent=4))  # default ensure_ascii=True
    # [
    #     "foo",
    #     {
    #         "bar": [
    #             "\u4f60\u597d",
    #             null,
    #             1.0,
    #             2
    #         ]
    #     }
    # ]
    print(json.dumps(parsed, indent=4, ensure_ascii=False))
    # [
    #     "foo",
    #     {
    #         "bar": [
    #             "你好",
    #             null,
    #             1.0,
    #             2
    #         ]
    #     }
    # ]
    `

  5. How do you convert a dict to bytes in Python? Simple: serialize it to a JSON string with json.dumps, then call .encode('utf-8') on that string
    Python: Convert dictionary to bytes
    https://stackoverflow.com/questions/55277431/python-convert-dictionary-to-bytes
    `
    import json

    user_dict = {'name': 'dinesh', 'code': 'dr-01'}

    user_encode_data = json.dumps(user_dict, indent=2).encode('utf-8')
    print(user_encode_data)

    user_dict_bytes = json.dumps(user_dict).encode('utf-8')
    print(user_dict_bytes)
    `

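  6. Five ways to check whether a character is a Chinese character in Python, with examples
    https://www.jb51.net/python/290637ks9.htm
    `
    1. Use Python's built-in ord(): ord() converts a character to its Unicode code point, which is then checked against the Chinese character range: if '\u4e00' <= char <= '\u9fff': return True

    2. Use Python's built-in unicodedata library: if 'CJK' in unicodedata.name(char): return True

    3. Use regular expressions: [^\u4e00-\u9fa5] matches every non-Chinese character, while [^\x00-\xff] matches every double-byte character, including Chinese characters and symbols

    4. Use a Chinese character set: if b'\xb0\xa1' <= word.encode('gb2312') <= b'\xd7\xf9': return True

    5. Use a third-party library: for example, the xpinyin library converts a string to pinyin, which can be used to judge whether the string is Chinese
    `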
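
Three of the approaches listed above (range check, unicodedata, regular expression) can be sketched as follows, assuming Python 3. The function names are hypothetical, and the range check only covers the basic CJK Unified Ideographs block (U+4E00 to U+9FFF), so rarer characters in the extension blocks are not detected.

```python
import re
import unicodedata

def is_hanzi_by_range(char):
    """Range check against the basic CJK Unified Ideographs block."""
    return '\u4e00' <= char <= '\u9fff'

def is_hanzi_by_name(char):
    """Check the Unicode character name; unnamed characters count as non-Hanzi."""
    return 'CJK UNIFIED IDEOGRAPH' in unicodedata.name(char, '')

HANZI_RE = re.compile('[\u4e00-\u9fff]')

def contains_hanzi(text):
    """True if at least one character of text is in the basic CJK block."""
    return HANZI_RE.search(text) is not None

print(is_hanzi_by_range('中'), is_hanzi_by_range('a'))
print(is_hanzi_by_name('中'), is_hanzi_by_name('。'))
print(contains_hanzi('json 中文'), contains_hanzi('json'))
```

The name check here looks for 'CJK UNIFIED IDEOGRAPH' rather than just 'CJK', a deliberately stricter choice so that CJK-named symbols and strokes are not counted as Chinese characters.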
