=Start=
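Background:
A quick summary of the small points about bytes and strings in Python that I have run into a lot recently, for fast reference and study later.
Main text:
Reference answers:
The bytes and str types in Python 3
Probably the most important new feature of Python 3 is the much clearer distinction between text and binary data. Text is always Unicode and is represented by the str type; binary data is represented by the bytes type. Python 3 never mixes str and bytes implicitly, which is what makes the distinction so clear. You cannot concatenate a string and a bytes object, you cannot search for a string inside a bytes object (or vice versa), and you cannot pass a string to a function that expects bytes (or vice versa). This is a good thing.
A string can be encoded into bytes with encode(), and bytes can be decoded into a string with decode().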
# Python 3 interactive session
>>> website = 'https://ixyzero.com/blog/'
>>> type(website)
<class 'str'>
>>> website
'https://ixyzero.com/blog/'
# Convert a string to bytes with the .encode() method
>>> website_bytes_utf8 = website.encode(encoding="utf-8")
>>> type(website_bytes_utf8)
<class 'bytes'>
>>> website_bytes_utf8
b'https://ixyzero.com/blog/'
# Convert bytes to a string with the .decode() method (the default encoding is UTF-8)
>>> website_string = website_bytes_utf8.decode()
>>> type(website_string)
<class 'str'>
>>> website_string
'https://ixyzero.com/blog/'
>>>
&
>>> b"abcde"
b'abcde'
# utf-8 is used here because it is a very common encoding, but you
# need to use the encoding your data is actually in.
>>> b"abcde".decode("utf-8")
'abcde'
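To illustrate the point that Python 3 does not mix str and bytes implicitly, here is a minimal sketch that tries to concatenate and search across the two types. The variable names are made up for this example; the behavior shown is what a default Python 3 interpreter does.

```python
text = "abc"    # str: Unicode text
data = b"abc"   # bytes: raw binary data

# Concatenating str and bytes raises TypeError instead of guessing an encoding.
try:
    text + data
    concat_failed = False
except TypeError:
    concat_failed = True

# Searching for a str inside a bytes object is rejected the same way.
try:
    "a" in data
    search_failed = False
except TypeError:
    search_failed = True

# By default, comparing str with bytes does not raise; the two are simply unequal.
same = (text == data)

# Converting explicitly makes the operations legal again.
joined = text + data.decode("utf-8")
found = "a".encode("utf-8") in data

print(concat_failed, search_failed, same, joined, found)
```

The explicit decode() or encode() call is the only place where an encoding is chosen, which is what keeps the two types from being confused.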
How do you get the "byte length" of a string in Python? (python get string byte length)
def utf8len(s):
    # Number of bytes the string occupies once encoded as UTF-8
    return len(s.encode('utf-8'))
&
# getsizeof(object, default) -> int
# Return the size of object in bytes.
# This returns the size of the Python object in bytes, which is not what we want here; the value also differs across Python versions and platforms
import sys
sys.getsizeof(s)
>>> len("hello".encode("utf8"))
5
>>> len("你好".encode("utf8"))
6
####
Python 2.7.10 (default, Aug 17 2018, 19:45:58)
[GCC 4.2.1 Compatible Apple LLVM 10.0.0 (clang-1000.0.42)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> reload(sys)
<module 'sys' (built-in)>
>>> sys.setdefaultencoding('utf-8')
>>>
>>> utf8len('你好')
6
>>> utf8len('hello')
5
>>> sys.getsizeof('你好')
43
>>>
>>> sys.getsizeof('hello')
42
>>>
####
Python 3.6.5 (default, Apr 10 2018, 20:17:30)
[GCC 4.2.1 Compatible Apple LLVM 9.1.0 (clang-902.0.39.1)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.getsizeof('你好')
78
>>> sys.getsizeof('hello')
54
>>>
>>> utf8len('你好')
6
>>> utf8len('hello')
5
>>>
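The byte length depends on the encoding as well as on the string. A minimal sketch, assuming a hypothetical helper byte_len that generalizes the utf8len function above to any codec name:

```python
def byte_len(s, encoding="utf-8"):
    """Return the number of bytes s occupies once encoded with the given codec."""
    return len(s.encode(encoding))

for sample in ("hello", "你好"):
    # Compare the character count with the encoded size under three codecs.
    sizes = {enc: byte_len(sample, enc) for enc in ("utf-8", "utf-16-le", "gbk")}
    print(sample, len(sample), sizes)
```

len(s) counts code points, so len("你好") is 2 while its UTF-8 encoding takes 6 bytes and its GBK encoding takes 4. The plain "utf-16" codec prepends a 2-byte BOM, so "utf-16-le" is used here to count only the characters themselves.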
Reference links:
- The bytes/str dichotomy in Python 3
- http://eli.thegreenplace.net/2012/01/30/the-bytesstr-dichotomy-in-python-3/
- The bytes and str types in Python 3
- Converting between bytes and string in Python 3
- Convert bytes to a string?
- https://stackoverflow.com/questions/30686701/python-get-size-of-string-in-bytes
- How many bytes does a string have
- How to get the size of a string in Python?
- https://docs.python.org/3/library/sys.html#sys.getsizeof
- https://docs.python.org/3/library/functions.html#len
=END=
3 comments on "Bytes and strings in Python"
Python 3’s f-Strings: An Improved String Formatting Syntax (Guide)
https://realpython.com/python-f-strings/
`
* “Old-school” String Formatting in Python
____* Option #1: %-formatting
____* Option #2: str.format()
* f-Strings: A New and Improved Way to Format Strings in Python
____* Simple Syntax
____* Arbitrary Expressions
____* Multiline f-Strings
____* Speed
* Python f-Strings: The Pesky Details
____* Quotation Marks
____* Dictionaries
____* Braces
____* Backslashes
____* Inline Comments
* Go Forth and Format!
* Further Reading
`
An overview of Python f-string formatting
https://blog.csdn.net/sunxb10/article/details/81036693
`
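An f-string, also called a formatted string literal, is a string formatting method introduced in Python 3.6. It originates from PEP 498 (Literal String Interpolation), and its main purpose is to make string formatting simpler. Syntactically, an f-string is a string prefixed with f or F (f'xxx' or F'xxx') that uses braces {} to mark the replacement fields. In essence an f-string is not a string constant but an expression evaluated at run time.
f-strings are no less capable than traditional %-formatting and str.format(), perform better than both, and are more concise to use, so for Python 3.6 and later f-strings are the recommended way to format strings.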
`
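The description above can be illustrated with a short sketch. The variable names and values are invented for this example; the syntax itself is standard for Python 3.6 and later.

```python
name = "ixyzero"
visits = 1234567
ratio = 0.98765

greeting = f"Hello, {name}!"                 # plain substitution
arith = f"{visits} * 2 = {visits * 2}"       # expressions are evaluated at run time
grouped = f"{visits:,}"                      # format spec: thousands separator
percent = f"{ratio:.1%}"                     # format spec: percentage, one decimal
braces = f"{{literal braces}} and {name!r}"  # doubled braces are literal; !r applies repr()

# The same greeting with the two older styles, for comparison.
old_percent = "Hello, %s!" % name
old_format = "Hello, {}!".format(name)

print(greeting, arith, grouped, percent, braces, sep="\n")
```

All three styles produce the same greeting; the f-string keeps the value next to the place where it is used.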
The magical f-strings
https://zhuanlan.zhihu.com/p/62774871
How to Add New Line in Python f-strings
https://towardsdatascience.com/how-to-add-new-line-in-python-f-strings-7b4ccc605f4a
`
Essentially, you have three options;
The first is to define a new line as a string variable and reference that variable in f-string curly braces.
The second workaround is to use os.linesep that returns the new line character
and the final approach is to use chr(10) that corresponds to the Unicode new line character.
In short, defining a string variable whose value is the newline character and then referencing it in the f-string is the simplest of the three.
`
https://stackoverflow.com/questions/44780357/how-to-use-newline-n-in-f-string-to-format-output-in-python-3-6
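The three options quoted above can be sketched as follows. Before Python 3.12 a backslash was not allowed inside the braces of an f-string, which is why these workarounds exist; the items list and variable names are made up for this example.

```python
import os

items = ["alpha", "beta", "gamma"]

# Option 1: keep the newline in a variable and reference it inside the braces.
nl = "\n"
via_variable = f"Items:{nl}{nl.join(items)}"

# Option 2: os.linesep is the platform's line separator ("\r\n" on Windows),
# so it only equals "\n" on POSIX systems.
via_linesep = f"Items:{os.linesep}{os.linesep.join(items)}"

# Option 3: chr(10) is the line feed character, identical to "\n".
via_chr = f"Items:{chr(10)}{chr(10).join(items)}"

# A backslash in the literal part of an f-string (outside the braces) is fine,
# so doing the join beforehand also works.
joined = "\n".join(items)
via_literal = f"Items:\n{joined}"

print(via_variable)
```

Options 1 and 3 and the pre-joined variant give identical strings. The os.linesep variant differs on Windows, and the Python documentation advises against using os.linesep as a line terminator when writing to files opened in text mode.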
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa5 in position 0: invalid start byte
https://www.w3docs.com/snippets/python/unicodedecodeerror-utf8-codec-cant-decode-byte-0xa5-in-position-0-invalid-start-byte.html
`
byte_string = b'\xa5'
text = byte_string.decode('utf8', errors='ignore')
print('done')
print(text) # prints nothing
byte_string = b'\xa5'
text = byte_string.decode('utf8', errors='replace')
print('done')
print(text) #� (U+FFFD, the official REPLACEMENT CHARACTER)
`
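For comparison, here is a minimal sketch of how the standard error handlers treat the same invalid byte. Decoding as Latin-1 at the end is only an assumption about where such data might have come from; as noted earlier, you need to know the encoding your data is actually in.

```python
raw = b"\xa5"  # invalid as UTF-8: 0xa5 is a continuation byte, not a start byte

# The default handler, errors="strict", raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = raw.decode("utf-8", errors="ignore")            # drops the bad byte
replaced = raw.decode("utf-8", errors="replace")          # substitutes U+FFFD
escaped = raw.decode("utf-8", errors="backslashreplace")  # keeps the byte visible as an escape

# If the data was really Latin-1, decoding with that codec loses nothing.
as_latin1 = raw.decode("latin-1")

print(strict_failed, repr(ignored), repr(replaced), repr(escaped), repr(as_latin1))
```

Both "ignore" and "replace" discard information, so they suit display or logging better than data that must round-trip. "backslashreplace" at least keeps the original byte value readable.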
Test if a python string is printable
https://stackoverflow.com/questions/3636928/test-if-a-python-string-is-printable/50731077#50731077
`
>>> hello = 'Hello World!'
>>> bell = chr(7)
>>> import string
>>> all(c in string.printable for c in hello)
True
>>> all(c in string.printable for c in bell)
False
>>> printset = set(string.printable)
>>> helloset = set(hello)
>>> bellset = set(bell)
>>> helloset
set(['!', ' ', 'e', 'd', 'H', 'l', 'o', 'r', 'W'])
>>> helloset.issubset(printset)
True
>>> set(bell).issubset(printset)
False
import string
printset = set(string.printable)
isprintable = set(yourstring).issubset(printset)
`
Python String isprintable() Method
https://www.w3schools.com/python/ref_string_isprintable.asp
`
txt = "Hello! Are you #1?"
x = txt.isprintable()
print(x)
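In Python, you can use the string method isprintable() to check whether a string contains non-printable characters. It returns True if the string contains no non-printable characters, and False otherwise.
Note that isprintable() only checks whether characters are printable, not whether they are ASCII. To check whether a string contains only ASCII characters, use the isascii() method (available since Python 3.7).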
`
https://docs.python.org/3/library/string.html#string.printable
`
string.printable
String of ASCII characters which are considered printable. This is a combination of digits, ascii_letters, punctuation, and whitespace.
`
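The two checks quoted in these comments do not agree on every input. A minimal sketch comparing them, where is_ascii_printable is a hypothetical helper name wrapping the string.printable approach:

```python
import string

def is_ascii_printable(s):
    """True if every character of s is in string.printable
    (ASCII only; whitespace such as tab and newline counts as printable)."""
    return all(c in string.printable for c in s)

samples = ["Hello World!", "line1\nline2", "你好", chr(7)]
for s in samples:
    # Columns: the string, the string.printable check, the str.isprintable() check
    print(repr(s), is_ascii_printable(s), s.isprintable())
```

A newline passes the string.printable check but fails isprintable(), while non-ASCII text such as "你好" does the opposite. Which check fits depends on whether "printable" should mean "ASCII and safe for a terminal" or "printable according to Unicode".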