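To read and write JSON that contains Chinese text, use Python's built-in json module. To read data, use json.load.
f = open(fileName)  # open() is recommended over file(): http://stackoverflow.com/questions/6859499/difference-between-python-file-operation-modules-open-and-file
data = json.load(f)
The JSON data is loaded into a Python object: a dict when the top-level JSON value is an object (or a list when it is an array).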
'''
In [5]: json.load??
Type: function
String form: <function load at 0x0272DAB0>
File: c:\python27\lib\json\__init__.py
Definition: json.load(fp, encoding=None, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw)
Source:
def load(fp, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
'''
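As a small illustration of what json.load returns, the following sketch (Python 3, with made-up sample strings wrapped in io.StringIO to stand in for an open file) shows that the resulting type follows the top-level JSON value.

```python
import io
import json

# io.StringIO stands in for a file object; the sample data is hypothetical.
obj = json.load(io.StringIO('{"城市": "北京", "人口": 2100}'))
arr = json.load(io.StringIO('["北京", "上海"]'))

print(type(obj).__name__)  # a JSON object becomes a dict
print(type(arr).__name__)  # a JSON array becomes a list
print(obj["城市"])
```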
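Use json.dump to write JSON. By default, however, the output does not contain the Chinese characters themselves: every non-ASCII character is escaped as a \uXXXX sequence. To keep the original characters, call it like this: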
json.dump(jsonData, targetFile, ensure_ascii=False, indent=4)
In [3]: import json
In [4]: json.dump??
Type: function
String form: <function dump at 0x0272DA30>
File: c:\python27\lib\json\__init__.py
Definition: json.dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, encoding='utf-8', default=None, sort_keys=False, **kw)
Source:
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        encoding='utf-8', default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).

    If ``skipkeys`` is true then ``dict`` keys that are not basic types
    (``str``, ``unicode``, ``int``, ``long``, ``float``, ``bool``, ``None``)
    will be skipped instead of raising a ``TypeError``.

    If ``ensure_ascii`` is true (the default), all non-ASCII characters in the
    output are escaped with ``\uXXXX`` sequences, and the result is a ``str``
    instance consisting of ASCII characters only. If ``ensure_ascii`` is
    ``False``, some chunks written to ``fp`` may be ``unicode`` instances.
    This usually happens because the input contains unicode strings or the
    ``encoding`` parameter is used. Unless ``fp.write()`` explicitly
    understands ``unicode`` (as in ``codecs.getwriter``) this is likely to
    cause an error.

    If ``check_circular`` is false, then the circular reference check
    for container types will be skipped and a circular reference will
    result in an ``OverflowError`` (or worse).

    If ``allow_nan`` is false, then it will be a ``ValueError`` to
    serialize out of range ``float`` values (``nan``, ``inf``, ``-inf``)
    in strict compliance of the JSON specification, instead of using the
    JavaScript equivalents (``NaN``, ``Infinity``, ``-Infinity``).

    If ``indent`` is a non-negative integer, then JSON array elements and
    object members will be pretty-printed with that indent level. An indent
    level of 0 will only insert newlines. ``None`` is the most compact
    representation. Since the default item separator is ``', '``, the
    output might include trailing whitespace when ``indent`` is specified.
    You can use ``separators=(',', ': ')`` to avoid this.

    If ``separators`` is an ``(item_separator, dict_separator)`` tuple
    then it will be used instead of the default ``(', ', ': ')`` separators.
    ``(',', ':')`` is the most compact JSON representation.

    ``encoding`` is the character encoding for str instances, default is UTF-8.

    ``default(obj)`` is a function that should return a serializable version
    of obj or raise TypeError. The default simply raises TypeError.

    If *sort_keys* is ``True`` (default: ``False``), then the output of
    dictionaries will be sorted by key.

    To use a custom ``JSONEncoder`` subclass (e.g. one that overrides the
    ``.default()`` method to serialize additional types), specify it with
    the ``cls`` kwarg; otherwise ``JSONEncoder`` is used.
    """
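This writes JSON with the Chinese text intact. Passing ensure_ascii=False makes the encoder emit the original characters instead of escapes, and the indent parameter sets the number of spaces used for each indentation level.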
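To make the effect of ensure_ascii concrete, here is a minimal sketch using json.dumps (the string-returning counterpart of json.dump) under Python 3. The sample dictionary is made up for illustration.

```python
import json

# Hypothetical sample data containing Chinese text.
data = {"greeting": "你好"}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep the original characters

print(escaped)   # {"greeting": "\u4f60\u597d"}
print(readable)  # {"greeting": "你好"}

# Both strings are valid JSON and decode to the same Python object.
same = json.loads(escaped) == json.loads(readable) == data
print(same)
```

The escaped form is therefore not wrong, only harder for a person to read; any JSON parser recovers the same text from it.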
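Changing how the file is written: writing the string produced in the previous step directly to a file raises an error (this probably only happens in Python 2.7):
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-9: ordinal not in range(128)
In Python 2, a file opened with plain open() implicitly encodes unicode text with the ASCII codec, which cannot represent Chinese characters, hence the error.
The fix is to open the output file with an explicit UTF-8 encoding: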
import codecs
with codecs.open(path_to_fileName, 'w', 'utf-8') as fp:
    # write to fp, for example with the json.dump call shown earlier
    json.dump(jsonData, fp, ensure_ascii=False, indent=4)
'''
In [1]: import codecs
In [2]: codecs.open??
Type: function
String form: <function open at 0x025A8C30>
File: c:\python27\lib\codecs.py
Definition: codecs.open(filename, mode='rb', encoding=None, errors='strict', buffering=1)
Source:
def open(filename, mode='rb', encoding=None, errors='strict', buffering=1):
    ...
'''
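#!/usr/bin/env python
# coding=utf-8
import json, codecs

f_in = open('json.txt', 'r')
data_in = json.load(f_in)
f_in.close()

# Default ensure_ascii=True: Chinese text is written as \uXXXX escapes.
f_out = open('json_out.txt', 'w')
json.dump(data_in, f_out, indent=4)
f_out.close()

# UTF-8 writer plus ensure_ascii=False: Chinese text is written as-is.
f_out2 = codecs.open('json_out2.txt', 'w', 'utf-8')
json.dump(data_in, f_out2, ensure_ascii=False, indent=4)
f_out2.close()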
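The script above targets Python 2.7. For comparison, a rough Python 3 equivalent might look like the sketch below, where the built-in open() takes an encoding argument and codecs is not needed. The sample data and the temporary file location are assumptions made so the example is self-contained.

```python
import json
import os
import tempfile

# Hypothetical sample data; a temporary directory avoids touching real files.
data_in = {"name": "张三", "tags": ["中文", "json"]}
out_path = os.path.join(tempfile.mkdtemp(), "json_out.txt")

# Write: explicit UTF-8 encoding plus ensure_ascii=False keeps the Chinese text.
with open(out_path, "w", encoding="utf-8") as f_out:
    json.dump(data_in, f_out, ensure_ascii=False, indent=4)

# Read back with the same encoding.
with open(out_path, encoding="utf-8") as f_in:
    data_out = json.load(f_in)

# Inspect the raw bytes to confirm the text was stored as UTF-8, not as escapes.
with open(out_path, "rb") as f_raw:
    raw_bytes = f_raw.read()

print(data_out == data_in)
```

Passing the encoding explicitly on both the write and the read avoids depending on the platform's default encoding.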
8 comments on "Reading and writing JSON containing Chinese text with Python"
How to read and write a JSON object to a file in Python (python read write json file)
https://stackoverflow.com/questions/12309269/how-do-i-write-json-data-to-a-file
`
# Write
import json
with open('data.json', 'w') as fp:
    json.dump(data, fp)
# Read
with open('data.json') as fp:
    data_loaded = json.load(fp)
`
Reading and Writing JSON to a File in Python
https://stackabuse.com/reading-and-writing-json-to-a-file-in-python/
Reading and Writing JSON through Python
https://stackoverflow.com/questions/45791891/reading-and-writing-json-through-python
The jsonlines library: storing many Python objects efficiently
https://mp.weixin.qq.com/s/fq5BMnC2FZyb3X4bWgX5uw
`
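JSON files are concise and compact, which makes them very popular on the web; when writing crawlers we often run into sites that transfer data as JSON. But if the data to store is 1 GB, a single JSON file has to be read in one go, which takes a lot of memory and puts heavy pressure on the machine. So we store the data as many separate objects and read them line by line to reduce memory use. That is the purpose of the jsonlines library covered here, and hopefully it is useful to you.
jsonlines
1. Each line is one JSON value (one serialized Python object)
2. The file is UTF-8 encoded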
`
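The line-per-object idea described above can also be sketched with only the standard json module, without the jsonlines library itself. The helper names write_jsonl and read_jsonl, the sample records, and the file name are all hypothetical.

```python
import json
import os
import tempfile

# Hypothetical sample records and output location.
records = [{"id": 1, "text": "你好"}, {"id": 2, "text": "世界"}]
path = os.path.join(tempfile.mkdtemp(), "records.jsonl")


def write_jsonl(path, items):
    """Write one compact JSON document per line, UTF-8 encoded."""
    with open(path, "w", encoding="utf-8") as fp:
        for item in items:
            # Without indent, json.dumps never emits a raw newline,
            # so each object stays on a single line.
            fp.write(json.dumps(item, ensure_ascii=False) + "\n")


def read_jsonl(path):
    """Yield objects one at a time so the whole file is never held in memory."""
    with open(path, encoding="utf-8") as fp:
        for line in fp:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


write_jsonl(path, records)
loaded = list(read_jsonl(path))
print(loaded == records)
```

Because read_jsonl is a generator, a consumer that processes records one by one only keeps a single line in memory at a time; the list() call here is just for the demonstration.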
Exception when writing JSON to a file in Python
python TypeError: set([]) is not JSON serializable
`
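The exception is raised because the data being serialized contains a value of type set. There are two ways to fix it:
1. Convert the set to a list;
2. Subclass JSONEncoder and override its default() method;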
`
https://stackoverflow.com/questions/8230315/how-to-json-serialize-sets
https://codeday.me/bug/20170625/31452.html
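Both fixes for the set error can be sketched as follows. The class name SetEncoder and the sample data are hypothetical, and sorting the set is a choice made here only to get deterministic output (it assumes the elements are mutually comparable).

```python
import json


class SetEncoder(json.JSONEncoder):
    """Hypothetical encoder that serializes sets as sorted JSON arrays."""

    def default(self, obj):
        if isinstance(obj, (set, frozenset)):
            return sorted(obj)  # assumes the elements can be compared
        # Defer to the base class, which raises TypeError for unknown types.
        return super().default(obj)


data = {"tags": {"b", "a", "c"}}

# Option 1: convert the set to a list before serializing.
as_list = json.dumps({"tags": sorted(data["tags"])})

# Option 2: let a custom encoder handle sets wherever they appear.
via_encoder = json.dumps(data, cls=SetEncoder)

print(as_list)
print(via_encoder)
```

Note that the type information is lost either way: reading the JSON back yields a list, not a set.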
Can comments be used in JSON?
https://stackoverflow.com/questions/244777/can-comments-be-used-in-json
`
No.
The JSON should all be data, and if you include a comment, then it will be data too.
You could have a designated data element called “_comment” (or something) that would be ignored by apps that use the JSON data.
You would probably be better having the comment in the processes that generates/receives the JSON, as they are supposed to know what the JSON data will be in advance, or at least the structure of it.
`
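A short sketch of the "_comment" convention mentioned in the answer, assuming Python 3 and made-up configuration content: the standard parser rejects real comment syntax, while a designated key is ordinary data that the application chooses to ignore.

```python
import json

# A JavaScript-style comment is not valid JSON and is rejected by the parser.
with_comment = '{"name": "demo" /* not allowed */}'
try:
    json.loads(with_comment)
    rejected = False
except ValueError:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True

# The "_comment" key is plain data; the application simply drops it.
text = '{"_comment": "port used by the demo server", "port": 8080}'
config = json.loads(text)
config.pop("_comment", None)

print(rejected, config)
```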
How to pretty-print a JSON document or dict containing Chinese text in Python?
https://stackoverflow.com/questions/12943819/how-to-prettyprint-a-json-file
https://docs.python.org/2/library/json.html#json.dumps
`
import json
your_json = '["foo", {"bar":["你好", null, 1.0, 2]}]' # JSON array string
parsed = json.loads(your_json) # type(parsed) == list
print(parsed)
# Python 2 output: [u'foo', {u'bar': [u'\u4f60\u597d', None, 1.0, 2]}]
print(json.dumps(parsed, indent=4)) # default ensure_ascii=True escapes non-ASCII characters
# [
#     "foo",
#     {
#         "bar": [
#             "\u4f60\u597d",
#             null,
#             1.0,
#             2
#         ]
#     }
# ]
print(json.dumps(parsed, indent=4, ensure_ascii=False)) # keeps the Chinese characters
# [
#     "foo",
#     {
#         "bar": [
#             "你好",
#             null,
#             1.0,
#             2
#         ]
#     }
# ]
`
How to write a dict to a file in Python (stored as a JSON-formatted string)
Writing a dictionary to a text file?
https://stackoverflow.com/questions/36965507/writing-a-dictionary-to-a-text-file
`
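import json
exDict = {'exDict': 'example'}  # hypothetical sample dictionary
with open('file.txt', 'w') as file:  # hypothetical file name
    # json.dumps converts the dict to a string; use `json.loads` to do the reverse
    file.write(json.dumps(exDict))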
`
How to convert a dict to bytes in Python? It is simple: serialize the dict to a string with json.dumps, then call .encode('utf-8') on that string.
Python: Convert dictionary to bytes
https://stackoverflow.com/questions/55277431/python-convert-dictionary-to-bytes
`
import json
user_dict = {'name': 'dinesh', 'code': 'dr-01'}
user_encode_data = json.dumps(user_dict, indent=2).encode('utf-8')
print(user_encode_data)
user_dict_bytes = json.dumps(user_dict).encode('utf-8')
print(user_dict_bytes)
`
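The reverse direction, from bytes back to a dict, is not shown in the snippet; a minimal sketch under the same assumptions (UTF-8 bytes produced by json.dumps) could be:

```python
import json

user_dict = {"name": "dinesh", "code": "dr-01"}

# dict -> JSON string -> UTF-8 bytes
user_bytes = json.dumps(user_dict).encode("utf-8")

# UTF-8 bytes -> JSON string -> dict
restored = json.loads(user_bytes.decode("utf-8"))

print(type(user_bytes).__name__, restored == user_dict)
```

Decoding explicitly with the same encoding that was used for encoding keeps the round trip unambiguous.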
Five ways to check whether a character is a Chinese character in Python, with examples
https://www.jb51.net/python/290637ks9.htm
`
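1. Use the character's Unicode code point (which the built-in ord() returns) and check whether it falls within the range of Chinese characters: if '\u4e00' <= char <= '\u9fff': return True
2. Use the built-in unicodedata module: if 'CJK' in unicodedata.name(char): return True
3. Use regular expressions: [^\u4e00-\u9fa5] matches every non-Chinese character, while [^\x00-\xff] matches every character outside the single-byte range (the so-called double-byte characters), including Chinese characters and full-width symbols
4. Use a Chinese character set: if b'\xb0\xa1' <= word.encode('gb2312') <= b'\xd7\xf9': return True
5. Use a third-party library: for example, the xpinyin library converts a string to pinyin, which can be used to judge whether the string is Chinese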
`
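The first three approaches in the list can be sketched as below for Python 3. The function names are hypothetical, and the range U+4E00 to U+9FFF only covers the basic CJK Unified Ideographs block, so rarer characters in the extension blocks are not detected.

```python
import re
import unicodedata


def is_chinese_by_range(char):
    """Check a single character against the range U+4E00..U+9FFF."""
    return "\u4e00" <= char <= "\u9fff"


def is_chinese_by_name(char):
    """Check the Unicode name of a single character."""
    # The default "" avoids the ValueError raised for unnamed characters.
    return "CJK UNIFIED IDEOGRAPH" in unicodedata.name(char, "")


# Precompiled pattern matching any character in the same basic range.
HAN_RE = re.compile(r"[\u4e00-\u9fff]")


def contains_chinese(text):
    """Return True if the text contains at least one Chinese character."""
    return HAN_RE.search(text) is not None


print(is_chinese_by_range("你"), is_chinese_by_range("a"))
print(is_chinese_by_name("好"), is_chinese_by_name("。"))
print(contains_chinese("hello 世界"), contains_chinese("hello"))
```

Matching on the full name "CJK UNIFIED IDEOGRAPH" is a slightly stricter choice than the bare 'CJK' test in the list, made here so that CJK punctuation and symbols are not counted as Chinese characters.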