Programming Languages - An Introduction to New Features in Python 3

Into Python3

NOTES

This article lists changes made in Python 3.6 relative to Python 2.x. It does not cover every change; for the complete list, see the official What's New documents.

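Why Python 3?

Python 2.0 was released in 2000 and Python 3.0 in 2008; the latest version (as of November 2017) is 3.6.3.
Most important packages in the Python community already support Python 3. Progress on Python 3 support across packages can be tracked here and here.
Python 2.7 will no longer be supported after 2020: from then on, the Python community will not provide bug fixes or security patches for Python 2.
Python 3 provides native asynchronous I/O support through asyncio.
New releases of packages such as Django and NumPy no longer support Python 2.
Many exciting improvements and standard library additions.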

New syntax

The `print()` function

# python 2
>>> print "hello", "world"
hello world
>>> print("hello", "world")
('hello', 'world')

# python 3
>>> print("hello", "world")
hello world

`//` and `/`

>>> 5 // 2
2
>>> 5 / 2
2.5

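Integer types

Python 3 no longer distinguishes between the int and long integer types. It provides a single int type, which behaves like the Python 2 long.

Python 3 removes the sys.maxint constant; sys.maxsize can be used instead, although int values themselves no longer have an upper limit.

The spelling of octal literals has changed: for example, 0720 is now written 0o720.

Binary literals are written like 0b1010, and the built-in function bin() returns the binary literal of any integer as a string: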

>>> bin(8410)
'0b10000011011010'

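Default encoding

CPython 3 uses UTF-8 as its default encoding (see sys.getdefaultencoding()), which means:

There is no need to add # coding=utf-8 at the top of source files;
Unicode strings can be written directly to a file or printed to the terminal (the encoding actually used there follows the locale settings);
encode() and decode() use UTF-8 by default, and so on.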
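The last point can be illustrated with a minimal sketch. The sample string is arbitrary and chosen only for this example; the calls themselves are plain str and bytes methods.

```python
text = "你好, Python 3"

# With no argument, str.encode() and bytes.decode() use UTF-8.
data = text.encode()
roundtrip = data.decode()

# str counts Unicode code points, bytes counts raw octets.
print(type(text).__name__, len(text))
print(type(data).__name__, len(data))
print(roundtrip == text)
```

Each of the two Chinese characters occupies three bytes in UTF-8, which is why the byte length is larger than the character count.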

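`True` and `False`

In CPython 2, True and False are ordinary built-in names that can be rebound. Python 3 makes True and False language keywords, so they can no longer be reassigned.

The `nonlocal` keyword

Python 3 adds the nonlocal keyword for declaring non-local variables (a variable in an outer but non-global scope).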

def foo():
    c = 1
    def bar():
        nonlocal c
        c = 12
    bar()
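
A typical use of nonlocal is a closure that keeps state between calls. The sketch below is only an illustration; make_counter and increment are names invented for it.

```python
def make_counter():
    count = 0

    def increment():
        # Without this declaration, "count += 1" would raise UnboundLocalError,
        # because assignment makes count local to increment().
        nonlocal count
        count += 1
        return count

    return increment


counter = make_counter()
first = counter()
second = counter()
print(first, second)
```

Each call to make_counter() creates an independent count variable, so two counters do not interfere with each other.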

unpacking

>>> a, b, *rest = range(10)
>>> a, *rest, b = range(10)
>>> *rest, a, b = range(10)
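
The three forms above bind the following values. This is a small sketch using range(5) instead of range(10) to keep the results short; the tuple names are only for display.

```python
a, b, *rest = range(5)
head = (a, b, rest)

a, *rest, b = range(5)
middle = (a, rest, b)

*rest, a, b = range(5)
tail = (rest, a, b)

# The starred name always receives a list, possibly an empty one.
x, *empty = [1]

print(head)
print(middle)
print(tail)
print(empty)
```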

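Keyword-only arguments

# Python 3 only: parameters declared after *args are keyword-only
def f(a, b, *args, option=True):
    ...

# Python 3 only: a bare * ends the positional parameters without collecting extras
def g(a, b, *, option=True):
    ...

f(1, 2, option=True)    # Yes
g(1, 2, option=True)    # Yes
g(1, 2, False)          # No: raises TypeError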

> If you somehow are writing for a Python 3 only codebase, I highly recommend making all your
> keyword arguments keyword only, especially keyword arguments that represent "options".
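
The rejected call can be observed directly. The sketch below gives g a body so that it can be run; that body is an assumption made only for this example.

```python
def g(a, b, *, option=True):
    return a, b, option


ok = g(1, 2, option=False)

try:
    g(1, 2, False)          # a third positional argument is not accepted
except TypeError as exc:
    error = str(exc)
else:
    error = None

print(ok)
print(error)
```

Because option can only be passed by name, a caller cannot accidentally supply it as a stray positional value.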

Dict and set comprehensions

{x for x in stuff}          # Set comprehension
{k: v for k, v in stuff}    # Dict comprehension

Ordered dictionaries

The collections.OrderedDict class implements a dictionary that remembers insertion order. See PEP 372 for details.

Set literals

{}              # Empty dictionary (use set() for an empty set)
{1, 2}          # Same as set([1, 2])

The `with` statement

>>> with open('mylog.txt') as infile, open('a.out', 'w') as outfile:
...     for line in infile:
...         if '<critical>' in line:
...             outfile.write(line)

Exceptions

New OSError subclasses distinguish individual failure scenarios:

+-- OSError
|    +-- BlockingIOError
|    +-- ChildProcessError
|    +-- ConnectionError
|    |    +-- BrokenPipeError
|    |    +-- ConnectionAbortedError
|    |    +-- ConnectionRefusedError
|    |    +-- ConnectionResetError
|    +-- FileExistsError
|    +-- FileNotFoundError
|    +-- InterruptedError
|    +-- IsADirectoryError
|    +-- NotADirectoryError
|    +-- PermissionError
|    +-- ProcessLookupError
|    +-- TimeoutError
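
With the subclasses in the tree above, code can catch a specific failure instead of inspecting errno by hand. The following is a minimal sketch; the file name is made up and is guaranteed not to exist because it lives in a freshly created temporary directory.

```python
import errno
import os
import tempfile

missing = os.path.join(tempfile.mkdtemp(), "no-such-file.txt")

# Python 3 style: catch the specific subclass.
try:
    open(missing)
except FileNotFoundError as exc:
    caught = exc

# The subclass is still an OSError and still carries the errno value
# that older code had to compare against errno.ENOENT.
print(type(caught).__name__)
print(isinstance(caught, OSError), caught.errno == errno.ENOENT)
```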

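Python 3 requires every exception class to derive from BaseException, and string exceptions are no longer allowed. Application exceptions should normally use Exception as their base class; BaseException should be used directly only when defining top-level exceptions such as SystemExit and KeyboardInterrupt.

Likewise, the recommended way to catch all non-top-level exceptions is except Exception.

PEP 3134 defines exception chaining. An exception raised while another one is being handled is chained to it implicitly, and the new raise ... from syntax chains exceptions explicitly: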

>>> try:
...     raise Exception("A")
... except:
...     raise Exception("B")
...
Traceback (most recent call last):
  File "<stdin>", line 2, in <module>
Exception: A

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
Exception: B

Or

>>> raise Exception("B") from Exception("A")

Or

>>> raise Exception("B") from None      # To suppress original exception context.
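
The chain is also available programmatically through the __cause__ and __context__ attributes. In the sketch below, parse_port is a hypothetical helper invented for the example.

```python
def parse_port(value):
    try:
        return int(value)
    except ValueError as exc:
        # "from exc" records the original error as __cause__.
        raise RuntimeError("invalid port: %r" % value) from exc


try:
    parse_port("http")
except RuntimeError as exc:
    chained = exc

print(repr(chained))
print(repr(chained.__cause__))
```

Wrapping a low-level error this way keeps the original traceback available to whoever debugs the higher-level failure.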

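Iterators

In Python 3, range, zip, map, dict.values and similar functions return lazy iterable objects instead of lists.

Python 3 implements range(), zip(), map() and friends as classes: zip and map objects are iterators, while a range object is a lazily evaluated sequence. The xrange() function has been removed.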

>>> type(range)
<class 'type'>

dict.keys(), dict.items() and dict.values() return iterable view objects. Python 3 also removes the dict.iterkeys(), dict.iteritems() and dict.itervalues() methods.
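
The practical differences from the Python 2 list-returning versions can be seen in a short sketch. The variable names are arbitrary.

```python
d = {"a": 1}
keys = d.keys()          # a view, not a list copy
d["b"] = 2               # the view reflects later changes
live_keys = sorted(keys)

r = range(10**9)         # no list of a billion items is built
last = r[-1]

m = map(str, [1, 2])     # map and zip objects are one-shot iterators
first_pass = list(m)
second_pass = list(m)    # already exhausted

print(live_keys, len(r), last)
print(first_pass, second_pass)
```

When a real list is needed, for example to index into the result of map(), wrap the call in list().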

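Comparing heterogeneous types

In Python 2, objects of arbitrary different types can be compared with each other, which can hide many bugs. Python 3 raises TypeError when an ordering comparison (<, <=, >=, >) is applied to types that have no meaningful natural ordering.

# Python 2
>>> 1 > "2"
False
>>> zip > 1
True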

# Python 3
>>> 1 > "2"
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: '>' not supported between instances of 'int' and 'str'

Expressions such as 1 < '', 0 > None and None < None raise TypeError.
sorted() and list.sort() no longer accept the cmp argument, and their key and reverse arguments are now keyword-only.
Classes no longer support the __cmp__() method, and the built-in cmp() function has been removed (if you really need the cmp() functionality, you could use the expression (a > b) - (a < b) as the equivalent for cmp(a, b)).
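
Code that still relies on an old-style comparison function can be adapted with functools.cmp_to_key. The sketch below is only an illustration: by_length_then_alpha and the word list are invented for it, and cmp is re-created with the expression quoted above.

```python
from functools import cmp_to_key


def cmp(a, b):
    # Stand-in for the removed built-in.
    return (a > b) - (a < b)


def by_length_then_alpha(a, b):
    # Old-style comparison function: negative, zero or positive result.
    return cmp(len(a), len(b)) or cmp(a, b)


words = ["pear", "fig", "apple", "kiwi"]
with_cmp = sorted(words, key=cmp_to_key(by_length_then_alpha))
with_key = sorted(words, key=lambda w: (len(w), w))   # usually simpler

print(with_cmp)
print(with_key)
```

A key function is called once per element, whereas a comparison function is called once per comparison, so the key form is usually both shorter and faster.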

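The `yield from` expression

yield from is the syntactic foundation of the asyncio library. It extends the yield syntax and gives Python a convenient basis for implementing full-featured native coroutines. See PEP 380 for details.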

# Instead of writing
for i in gen():
    yield i

# Just write
yield from gen()

# Also
def dup(n):
    for i in range(n):
        yield from [i, i]
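
Beyond replacing a loop, yield from also forwards values sent with send() to the inner generator and evaluates to the inner generator's return value, which is what makes it usable for coroutines. The following sketch shows both; averager and collect are names made up for the example.

```python
def averager():
    total, count = 0, 0
    while True:
        value = yield
        if value is None:
            break
        total += value
        count += 1
    return total / count       # becomes the value of "yield from"


def collect(results):
    # Values sent to collect() are forwarded to averager().
    average = yield from averager()
    results.append(average)


results = []
gen = collect(results)
next(gen)                      # advance to the first yield
for v in (10, 20, 30):
    gen.send(v)
try:
    gen.send(None)             # lets the inner generator finish
except StopIteration:
    pass

print(results)
```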

The `memoryview` class

>>> with memoryview(b'abcdefgh') as v:
...     print(v.tolist())
[97, 98, 99, 100, 101, 102, 103, 104]

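Function annotations

The Python interpreter currently only stores annotations in the function object's __annotations__ attribute; third-party libraries and IDEs can use this information to implement other features.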

def f(a: int, b: int = 2) -> int:
    ...
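
A short sketch of what "only stores" means in practice: the annotations can be read back, but nothing checks them when the function is called. The function body here is an assumption added so that the example runs.

```python
def f(a: int, b: int = 2) -> int:
    return a + b


annotations = f.__annotations__
print(annotations)

# Annotations are not enforced at runtime: str arguments are accepted.
result = f("x", "y")
print(result)
```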

Type annotations

from typing import Dict, List

ls: List[int] = []
dt: Dict[str, int] = {}

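Relative imports

Python 3 only allows relative imports written as from .[module] import name. Any module path that does not start with . is treated as an absolute import. See PEP 328 for details.

Specifically, given the following package layout: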

package/
    __init__.py
    subpackage1/
        __init__.py
        moduleX.py
        moduleY.py
    subpackage2/
        __init__.py
        moduleZ.py
    moduleA.py

Assuming the current file is moduleX.py, the following relative imports are all valid:

# A single leading dot indicates a relative import, starting with the current package. Two or
# more leading dots give a relative import to the parent(s) of the current package, one level
# per dot after the first.
from .moduleY import spam
from .moduleY import spam as ham
from . import moduleY
from ..subpackage1 import moduleY
from ..subpackage2.moduleZ import eggs
from ..moduleA import foo
from ...package import bar
from ...sys import path

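`__pycache__`

.pyc file names now include the interpreter version, and the files are stored together in a __pycache__ directory inside the package directory.

A module's __cached__ attribute gives the path of its cache file, and imp.get_tag() returns the interpreter version tag. (The imp module has been deprecated since Python 3.4; sys.implementation.cache_tag holds the same value.)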

>>> import collections
>>> collections.__cached__
'/usr/lib/python3.6/collections/__pycache__/__init__.cpython-36.pyc'
>>> import imp
>>> imp.get_tag()
'cpython-36'

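The `functools.lru_cache` decorator

A function decorator that provides an LRU cache (memoization). It can save expensive work inside a function, such as I/O. Internally the decorator stores cached results in a dictionary keyed by the arguments of the decorated function, so those arguments must be hashable.

The decorator adds cache_info and cache_clear attributes to the decorated function. Call cache_info() to get statistics such as hits and misses, and cache_clear() to empty the cache.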

>>> import functools
>>> @functools.lru_cache(maxsize=32)
... def get(num):
...     return "HI"
...
>>> get(1)
'HI'
>>> get.cache_info()
CacheInfo(hits=0, misses=1, maxsize=32, currsize=1)
>>> get.cache_clear()
>>> get.cache_info()
CacheInfo(hits=0, misses=0, maxsize=32, currsize=0)
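
The effect is easier to see on a function that recomputes the same values many times. The sketch below uses a naive recursive Fibonacci function, chosen purely as an illustration, with an unbounded cache (maxsize=None).

```python
import functools


@functools.lru_cache(maxsize=None)
def fib(n):
    return n if n < 2 else fib(n - 1) + fib(n - 2)


value = fib(30)
info = fib.cache_info()
print(value)
print(info)
```

Without the cache this call would make over a million recursive calls; with it, each of the 31 distinct arguments is computed once and every other lookup is a cache hit.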

The `threading.Barrier` class

The threading module has a new Barrier synchronization class for making multiple threads wait until all of them have reached a common barrier point. Barriers are useful for making sure that a task with multiple preconditions does not run until all of the predecessor tasks are complete.

Barriers can work with an arbitrary number of threads. This is a generalization of a Rendezvous which is defined for only two threads.
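
A minimal sketch of the behaviour described above, with three worker threads. The worker function and the events list are assumptions made for the example; the lock only protects the shared list.

```python
import threading

barrier = threading.Barrier(3)
events = []
lock = threading.Lock()


def worker(name):
    with lock:
        events.append(("ready", name))
    barrier.wait()            # blocks until all 3 threads have called wait()
    with lock:
        events.append(("go", name))


threads = [threading.Thread(target=worker, args=(i,)) for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()

kinds = [kind for kind, _ in events]
print(kinds)
```

No thread can record "go" before every thread has recorded "ready", regardless of how the scheduler interleaves them.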

The `datetime.timezone` class

The datetime module adds a timezone class that implements the tzinfo interface. It can be used to attach time zone information to datetime objects.

>>> from datetime import datetime, timezone, timedelta
>>> datetime.now()
datetime.datetime(2017, 11, 23, 20, 32, 28, 388215)
>>> datetime.now(timezone(timedelta(hours=8)))
datetime.datetime(2017, 11, 23, 20, 33, 11, 393631, tzinfo=datetime.timezone(datetime.timedelta(0, 28800)))
>>> datetime.now(timezone(timedelta(hours=9)))
datetime.datetime(2017, 11, 23, 21, 33, 33, 458728, tzinfo=datetime.timezone(datetime.timedelta(0, 32400)))
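
Once a datetime carries time zone information, it can be converted between zones with astimezone(). The sketch below uses a fixed date so the result is reproducible; the zone name "CST" is just a label chosen for the example.

```python
from datetime import datetime, timedelta, timezone

utc_time = datetime(2017, 11, 23, 12, 0, tzinfo=timezone.utc)
beijing = timezone(timedelta(hours=8), "CST")

local_time = utc_time.astimezone(beijing)
print(local_time.isoformat())
print(local_time == utc_time)   # aware datetimes compare by absolute time
```

Note that timezone only models fixed UTC offsets; it has no knowledge of daylight saving rules.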

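The `ContextDecorator` class

A function decorated with contextlib.contextmanager() can be used both as a context manager and as a decorator. This dual role is implemented by the ContextDecorator class.

from contextlib import contextmanager

@contextmanager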
def track_entry_and_exit(name):
    ...
    yield
    ...

# Usable as a context manager
with track_entry_and_exit("widget loader"):
    load_widget()

# Also usable as a decorator
@track_entry_and_exit("widget loader")
def activity():
    load_widget()
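
A runnable variant of the snippet above, with the elided parts filled in by assumption: entry and exit are recorded in a list named log, and load_widget is a stand-in that only records that it ran.

```python
from contextlib import contextmanager

log = []


@contextmanager
def track_entry_and_exit(name):
    log.append("enter " + name)
    try:
        yield
    finally:
        log.append("exit " + name)


def load_widget():
    log.append("loading")


# Used as a context manager
with track_entry_and_exit("widget loader"):
    load_widget()


# Used as a decorator: the whole function body runs inside the context
@track_entry_and_exit("widget loader")
def activity():
    load_widget()


activity()
print(log)
```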

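The `html.escape` function

The html module provides the escape() function, which escapes HTML special characters (html.unescape(), added in Python 3.4, performs the reverse conversion).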

>>> import html
>>> html.escape('x > 2 && x < 7')
'x &gt; 2 &amp;&amp; x &lt; 7'

The `super` function

super() can be called without arguments and automatically selects the right class and instance. See PEP 3135 for details.
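
A minimal sketch of the zero-argument form; the Base and Child classes are invented for the example.

```python
class Base:
    def __init__(self, name):
        self.name = name


class Child(Base):
    def __init__(self, name, size):
        super().__init__(name)      # Python 2 required super(Child, self)
        self.size = size


c = Child("widget", 3)
print(c.name, c.size)
```

The explicit form super(Child, self) still works in Python 3, so code written for both versions does not need to change.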

Removed functions

apply()
coerce()
execfile() - use exec(open(fn).read()) instead.
file() - use open() or io.open() instead.
reduce() - use functools.reduce() or a for loop instead.
reload() - use importlib.reload() instead (imp.reload() before Python 3.4).
dict.has_key() - use the in operator instead.

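Standard library changes

PEP 3108 gives the full description of the standard library changes. Only some of them are listed below:

The bsddb3 package was removed; the same functionality is available from pybsddb <https://www.jcea.es/programacion/pybsddb.htm>;

Some packages and modules were renamed because their names violated the PEP 8 conventions:

Old name	New name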
_winreg	winreg
ConfigParser	configparser
copy_reg	copyreg
Queue	queue
SocketServer	socketserver
markupbase	_markupbase
repr	reprlib
test.test_support	test.support

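A common pattern in Python 2.x is for a library to have both a pure Python implementation and a faster C implementation, such as the pickle and cPickle modules in the standard library. Users of such modules usually have to try importing the faster implementation first and fall back to the pure Python version if it is unavailable. Python 3 moves this work into the pure Python version, which imports the accelerated implementation itself when it is available. Similar pairs include profile and cProfile, StringIO and cStringIO, and so on.

Some related modules were merged into packages (with correspondingly simplified module names):
- dbm (anydbm, dbhash, dbm, dumbdbm, gdbm, whichdb)
- html (HTMLParser, htmlentitydefs)
- urllib (urllib, urllib2, urlparse, robotparser)
- xmlrpc (xmlrpclib, DocXMLRPCServer, SimpleXMLRPCServer)

Some changes not mentioned in PEP 3108:
- The sets module was removed. Use the built-in set() and frozenset() classes instead.
- sys.exitfunc(), sys.exc_clear(), sys.exc_type, sys.exc_traceback and others were removed from the sys module.
- The read() and write() methods were removed from the array.array class.
- The acquire_lock() and release_lock() methods were removed from the thread module.
- The new module was removed.
- The math module gained new functions: isfinite(), expm1(), erf(), erfc(), gamma(), lgamma() and log2().
- The abc module gained new functions: abstractclassmethod() and abstractstaticmethod().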

New standard library modules

numbers - defines the abstract base classes for numeric types and the hierarchy between them.
io - handles several kinds of I/O: text I/O, binary I/O, raw I/O and so on. Each kind can be backed by various types of storage.
- A concrete object belonging to any of these categories is called a file object. Other common terms are stream and file-like object.
- All streams are careful about the type of data you give to them. For example, giving a str object to the write() method of a binary stream will raise a TypeError. So will giving a bytes object to the write() method of a text stream.
```
import io

# Text I/O expects and produces ``str`` objects.
f = open("myfile.txt", "r", encoding="utf-8")
f = io.StringIO("some initial text data")

# Binary I/O (also called buffered I/O) expects bytes-like object and produces ``bytes``
# object.
f = open("myfile.jpg", "rb")
f = io.BytesIO(b"some initial binary data: \x00\x01")

# Raw I/O (also called unbuffered I/O)
f = open("myfile.jpg", "rb", buffering=0)
```
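
The type strictness of streams can be checked with in-memory streams, which need no files on disk. This is a small sketch; the variable names are arbitrary.

```python
import io

text_stream = io.StringIO()
binary_stream = io.BytesIO()

text_stream.write("text")
binary_stream.write(b"bytes")

# Writing the wrong type to either kind of stream raises TypeError.
errors = []
for stream, wrong in ((text_stream, b"bytes"), (binary_stream, "text")):
    try:
        stream.write(wrong)
    except TypeError as exc:
        errors.append(type(exc).__name__)

print(errors)
print(text_stream.getvalue(), binary_stream.getvalue())
```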

asyncio - This module provides infrastructure for writing single-threaded concurrent code using coroutines, multiplexing I/O access over sockets and other resources, running network clients and servers, and other related primitives.

# Taken from the slides of “Tulip: Async I/O for Python 3” by Guido
# van Rossum, at LinkedIn, Mountain View, Jan 23, 2014
from asyncio import coroutine, open_connection  # asyncio.coroutine was removed in Python 3.11

@coroutine
def fetch(host, port):
    r, w = yield from open_connection(host,port)
    w.write(b'GET / HTTP/1.0\r\n\r\n')
    while (yield from r.readline()).decode('latin-1').strip():
        pass
    body = yield from r.read()
    return body

@coroutine
def start():
    data = yield from fetch('python.org', 80)
    print(data.decode('utf-8'))
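
Since Python 3.5 the same style of code is usually written with the async and await keywords instead of a decorator and yield from. The sketch below is not taken from the slides: fetch_length is a hypothetical coroutine that simulates network latency with asyncio.sleep() rather than opening a real connection.

```python
import asyncio


async def fetch_length(label, delay):
    await asyncio.sleep(delay)      # stands in for real network I/O
    return label, len(label)


async def main():
    # Both coroutines run concurrently on one thread; results keep argument order.
    return await asyncio.gather(
        fetch_length("python.org", 0.02),
        fetch_length("pypi.org", 0.01),
    )


loop = asyncio.new_event_loop()
try:
    results = loop.run_until_complete(main())
finally:
    loop.close()

print(results)
```

On Python 3.7 and later, asyncio.run(main()) can replace the explicit event loop handling.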

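faulthandler - explicitly dumps the tracebacks of all threads in the process when the program crashes.
ipaddress - creates and manipulates IPv4/IPv6 host and network addresses.
enum - adds enumeration types to Python.
pathlib - object-oriented filesystem paths.
collections.abc - abstract base classes for containers.
ensurepip - installs pip for the current interpreter:

python -m ensurepip

selectors - a high-level abstraction over the select module. Users are encouraged to use this module instead, unless they want precise control over the OS-level primitives used.
concurrent.futures - provides process pools and thread pools for running asynchronous tasks.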
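
As an illustration of the last entry, the following sketch runs a trivial function on a thread pool. The square function and the inputs are assumptions made for the example; ProcessPoolExecutor offers the same interface for CPU-bound work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def square(n):
    return n * n


with ThreadPoolExecutor(max_workers=4) as pool:
    # map() returns results in input order.
    ordered = list(pool.map(square, range(5)))

    # submit() returns Future objects; as_completed() yields them as they finish.
    futures = {pool.submit(square, n): n for n in (10, 20)}
    completed = {futures[f]: f.result() for f in as_completed(futures)}

print(ordered)
print(completed)
```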