April 30, 2018

Type Annotations in Python

This is an edited live blog of a Python London presentation by Bernat Gabor of Bloomberg. Sorry it's taken so long to prepare. A great talk, very well presented. Thanks Bernat! Many thanks to Babylon Health for being a welcoming host.

PEP 484 introduced type hinting. Function annotations from PEP 3107 and variable annotations from PEP 526 have come together, and are supported by mypy. Why? Primarily to make code easier to debug, maintain and understand. Type hints can also be written in Python 2.7, as structured comments, so there's no excuse for not using them!
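
To make the two kinds of annotation concrete, here is a minimal sketch. The function and variable names are invented for illustration and were not part of the talk.

```python
from typing import List


def mean(values: List[float]) -> float:
    """Function annotations: parameter and return types follow ':' and '->'."""
    return sum(values) / len(values)


# Variable annotation (PEP 526 syntax, Python 3.6 and later).
threshold: float = 0.5

# The hints are stored on the function object for tools to inspect.
hints = mean.__annotations__
```

A checker such as mypy reads these hints statically; the interpreter simply records them.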

What do annotations NOT provide?

  • Runtime type inference.
  • Performance improvements. 

In fact, annotations are effectively treated as commentary during Python parsing by the interpreter, other than ensuring that their syntax is valid. The mypy project can be used to verify that code calling functions conforms to the type annotations thereon. You can also implement "gradual typing": the mypy linter will report errors only in code that is type-hinted. It won't complain about untyped values and functions.
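
A short sketch of what "treated as commentary" means in practice. The function is invented for illustration; the point is only that the interpreter runs a call that a type checker would reject.

```python
def double(n: int) -> int:
    return n * 2


# The hint says int, but nothing stops a str at runtime.
# A static checker such as mypy would flag this call; the interpreter does not.
result = double("ab")

# The annotations are still available as plain data.
recorded = double.__annotations__
```

Here `result` ends up as the string `"abab"`, which is exactly the kind of surprise a type checker is meant to catch before the code runs.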

Type annotations as implemented by recent Python 3 releases (3.5 and later) are the canonical way to do it, but require you to import all type dependencies, and the parser has to parse them. This adds a measurable time penalty, though 3.7's PEP 563 implementation will lead to increased speed.

For older Python implementations, annotations in comments will work under any Python version (simply being ignored as comments by those versions that don't understand them), but they're "kinda ugly" and create noise around your program logic. They can also lead to compatibility problems with established linters.
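
For comparison, this is roughly what the comment form looks like. It is a sketch with an invented function, using the PEP 484 type-comment syntax that mypy understands.

```python
from typing import Dict, List


def count_words(lines):
    # type: (List[str]) -> Dict[str, int]
    """Signature hint written as a comment, so Python 2.7 can run it too."""
    counts = {}  # type: Dict[str, int]
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts
```

The import from `typing` is still needed for the checker, which is part of the noise mentioned above.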

A further option is to write interface or stub files - you can even create stubs for code you don't own or whose source you can't access. This again works in any version because the stub files are simply ignored by the interpreter, and it requires no change to existing source while letting you use the latest Python features. It does, however, create a maintenance burden. Further, if your stubs don't match the code (if a function name is changed in the main file but not in the stub, for example), there are no checks to alert the programmer.
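
As an illustration, a stub for a hypothetical greeting.py module might look like the sketch below. It would normally live in a file named greeting.pyi next to the module; every name in it is made up for the example.

```python
# greeting.pyi (hypothetical stub for a hypothetical greeting.py)
from typing import Optional


# Bodies and default values are replaced by '...': only signatures matter.
def greet(name: str, punctuation: Optional[str] = ...) -> str: ...


class Greeter:
    default_name: str

    def __init__(self, default_name: str = ...) -> None: ...
    def greet(self, name: Optional[str] = ...) -> str: ...
```

Nothing ties this file to the real module, which is why a renamed function can silently leave the stub out of date.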

Any solution not using the latest syntax is likely to cause problems in the long term, though none of them should be insurmountable.

PEP 561, due in Python 3.7, will allow you to distribute any package with type information. Unfortunately, this will not allow annotation of local variables, only interfaces. Annotations can be incorporated into docstrings, but this isn't an especially good option, and isn't recommended.

So, what kind of types can you add? Nominal types (int, float, object, etc.) are complemented by generic containers such as Tuple[int, float], Dict[str, int], MutableMapping[str, int], List[int], Iterable[Text], and so on. Since these types are Python objects, you can alias them by assignment.
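
Aliasing by assignment can be sketched like this. The alias names and the function are invented for illustration.

```python
from typing import List, Tuple

# Aliases are ordinary assignments, because the types are ordinary objects.
Point = Tuple[float, float]
Path = List[Point]


def path_length(path: Path) -> float:
    """Sum the straight-line distances between consecutive points."""
    total = 0.0
    for (x1, y1), (x2, y2) in zip(path, path[1:]):
        total += ((x2 - x1) ** 2 + (y2 - y1) ** 2) ** 0.5
    return total
```

The alias keeps signatures readable: `Path` says more than `List[Tuple[float, float]]` repeated everywhere.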

Further nominal types include Callable and generics built with TypeVar, as well as Any, which effectively disables type checking for specific names. PEP 544 will specify protocols, and this will allow the introduction of structural typing.
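
A small sketch of Callable, TypeVar and Any working together. The helper names are invented for the example.

```python
from typing import Any, Callable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def apply_all(func: Callable[[T], U], items: List[T]) -> List[U]:
    """Generic: whatever type goes in, the checker tracks what comes out."""
    return [func(item) for item in items]


def describe(value: Any) -> str:
    """Any opts this parameter out of checking: every argument is accepted."""
    return "value: {!r}".format(value)
```

With these hints a checker can infer that `apply_all(str, [1, 2])` produces a `List[str]`, while anything at all may be passed to `describe`.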

What are the gotchas?

  • Python 2 and 3 require version-dependent checks in code intended for both environments.
  • It's difficult to handle functions that have multiple return types, encouraging programmers to short-circuit type checking. While the interpreter won't complain at the resulting necessary shenanigans, various linters will be unhappy.
  • Type lookup can be problematic: type names are resolved in the same scopes as runtime names, so the programmer can shadow the name of a type without realising it.
  • Subclasses can support supersets of the types supported by their superclasses, but not subsets of those types. Further, subtypes whose methods have signatures requiring additional arguments must make those additional arguments optional for type checking to succeed.

Type hinting is fun, but may cause you (like David Beazley) to wish you had a desk-side bridge to jump off. If all else fails, use # type: ignore to just shut the noise up, though this might be regarded as an admission of defeat.
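
Two of the gotchas above can be sketched in a few lines. The classes and function are invented for illustration; the comments describe how a checker such as mypy is expected to treat them.

```python
from typing import Union


def parse(text: str) -> Union[int, str]:
    """Multiple return types: callers must narrow the result before using it."""
    try:
        return int(text)
    except ValueError:
        return text


class Shape:
    def describe(self, prefix: str) -> str:
        return prefix + " shape"


class Circle(Shape):
    # The extra parameter has a default, so every call that is valid on
    # Shape.describe is still valid here and the override type-checks.
    def describe(self, prefix: str, radius: float = 1.0) -> str:
        return "{} circle of radius {}".format(prefix, radius)


value = parse("42")
# isinstance() narrows the Union so the arithmetic is acceptable to a checker.
doubled = value * 2 if isinstance(value, int) else value
```

Dropping the default from `radius` would leave the code runnable but would be reported as an incompatible override.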

PEP 257 defines docstring conventions, and Sphinx can create documentation from annotated code that includes type information. Just install the sphinx-autodoc-typehints extension and add it to your conf.py file.
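
A sketch of the conf.py change, assuming the module referred to above is the sphinx-autodoc-typehints package (imported as sphinx_autodoc_typehints) and that it is installed alongside Sphinx:

```python
# conf.py (fragment). Assumption: sphinx-autodoc-typehints is installed.
extensions = [
    "sphinx.ext.autodoc",        # pulls documentation from docstrings
    "sphinx_autodoc_typehints",  # adds the annotated types to that output
]
```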

What's next? mypy is getting close to a 1.0 release, and the main focus now is on improving performance, enabling incremental typing, and defining and implementing an API and a plugin system to open up the ecosystem.

October 19, 2017

What's In a Namespace?

Python programmers talk about namespaces a lot. The Zen of Python* ends with
Namespaces are one honking great idea—let’s do more of those!
and if Tim Peters thinks namespaces are such a good idea, who am I to disagree?

Resolution of Unqualified Names

Python programmers learned at their mothers' knees that Python looks up unqualified names in three namespaces—first, the local namespace of the currently-executing function or method; second, the global namespace of the module containing the executing code; third and last, the built-in namespace that holds the built-in functions and exceptions. So, it makes sense to understand the various namespaces that the interpreter can use. Note that when we talk about name resolution we are talking about how a value is associated with an unadorned name in the code.

In the main module of a running program there is no local namespace. A name must be present in either the module's global namespace or, if not there, in the built-in namespace that holds functions like len, the standard exceptions, and so on. In other words, when __name__ == '__main__' the local and global namespaces are the same.
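
You can check this for yourself. The sketch below uses exec with a single dictionary to stand in for a module's top level, which is a simplification, and compares it with what a function sees.

```python
# A fresh dictionary plays the part of a module's global namespace.
module_namespace = {}
exec("same_at_top_level = locals() is globals()", module_namespace)


def inside_function():
    # A function call gets its own local namespace.
    return locals() is globals()


top_level_result = module_namespace["same_at_top_level"]
function_result = inside_function()
```

At the top level the two namespaces are the same object; inside the function they are not.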

When the interpreter compiles a function it keeps track of names which are bound inside the function body (this includes the parameters, which are established in the local namespace before execution begins) and aren't declared as either global or (in Python 3) nonlocal.  Because it knows the local names the interpreter can assign them a pre-defined place in the stack frame (where local data is kept for each function call), and does not generally need to perform a lookup. This is the main reason local access is faster than global access.
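
The compiler's bookkeeping is visible on the function's code object. In this sketch (names invented for the example) the parameters and the bound name show up as locals, while the unbound name does not.

```python
SCALE = 2


def area(width, height):
    result = width * height * SCALE  # SCALE is never bound here, so not local
    return result


# Parameters come first, followed by the other names bound in the body.
local_names = area.__code__.co_varnames
```

Each entry in `local_names` corresponds to a fixed slot in the frame, which is what lets the interpreter skip the dictionary lookup.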

Although the interpreter identifies local names by the presence of bindings within a function body, there is nothing to stop you writing code that references the names before they are bound. Under those circumstances you will see an UnboundLocalError exception raised with a message like "local variable 'b' referenced before assignment".
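
A minimal way to provoke the error, as a sketch. The exact wording of the message differs between Python versions, so the example only captures it rather than relying on it.

```python
def show_then_bind():
    try:
        print(b)  # b is local (it is bound below) but has no value yet
    except UnboundLocalError as exc:
        message = str(exc)
    b = 1  # this binding is what makes b a local name
    return message


captured = show_then_bind()
```

Remove the `b = 1` line and the same reference becomes a global lookup, failing with NameError instead.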

For non-local names, something very like a dictionary lookup takes place first in the module's global namespace and then in the built-ins. If neither search yields a result then the interpreter raises a NameError exception with a message like "name 'nosuch' is not defined."
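
The search order can be modelled in a few lines. This is only a sketch of the behaviour described above, not how the interpreter is implemented, and the `lookup` function is invented for the purpose.

```python
import builtins


def lookup(name, module_globals):
    """Model the global-then-built-in search for a non-local name."""
    if name in module_globals:
        return module_globals[name]
    if name in vars(builtins):
        return vars(builtins)[name]
    raise NameError("name {!r} is not defined".format(name))
```

Because the module's globals are consulted first, a global called `len` hides the built-in of the same name.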

Resolution of Qualified Names

Resolution of qualified names (those consisting of a sequence of names or expressions delimited by dots, such as os.path.join) starts by locating the first object's namespace (in this case os) in the standard way described above. Thereafter the mechanism can get complex because, like many Python features, you can control how it works for your own objects by defining __getattr__ and/or __getattribute__ methods, and because descriptors (primarily used in accessing properties) can cloud the picture.

In essence, though, the mechanism is that the interpreter, having located the object bound to the unqualified name, then makes a getattr call for the second name (in this case, path) in that namespace, yielding another object, against which a further getattr call is made with the third component of the name, and so on. If at any point a getattr fails then the interpreter raises an AttributeError exception with a message such as "'module' object has no attribute 'name'."
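
The chain of getattr calls can be written out directly. This sketch ignores the complications mentioned earlier, and simplifies the first lookup to a single dictionary; `resolve` is an invented helper.

```python
import os
from functools import reduce


def resolve(dotted_name, namespace):
    """Resolve 'a.b.c' as one plain lookup followed by repeated getattr calls."""
    first, *rest = dotted_name.split(".")
    obj = namespace[first]  # the unqualified lookup, reduced to one dict
    return reduce(getattr, rest, obj)


joiner = resolve("os.path.join", {"os": os})
```

Asking for a component that does not exist, such as `"os.nosuch"`, raises AttributeError from the failing getattr call.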

Understanding Expression Values

Once you understand the mechanisms for looking up the values of names it becomes a little easier to understand how Python computes expression values. Once a name is resolved there may be other methods to apply such as __getitem__ for subscripting or __call__ for function calls. These operations also yield values, whose namespaces can again be used to look up further names. So, for example, when you see an expression like

    e.orig.args[0].startswith('UNIQUE constraint failed')

you understand that the name e.orig.args is looked up by going through a sequence of namespaces and evaluates to a sequence object (typically a tuple), to which a subscripting operation is applied to get the first element, in whose namespace the name startswith is resolved (hopefully to something callable) to a value that is finally called with a string argument.
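
Taking the expression apart one step at a time looks like this. The exception class here is a made-up stand-in, just enough to give e an orig attribute.

```python
class WrappedError(Exception):
    """Hypothetical wrapper that keeps the original exception in .orig."""

    def __init__(self, orig):
        super().__init__(str(orig))
        self.orig = orig


e = WrappedError(ValueError("UNIQUE constraint failed: users.email"))

original = e.orig                  # attribute lookup on e
arguments = original.args          # attribute lookup on the original exception
first = arguments[0]               # subscripting, via __getitem__
method = first.startswith          # attribute lookup on the string
outcome = method("UNIQUE constraint failed")  # the call, via __call__
```

Each intermediate name holds exactly one object, and `outcome` is the same value the one-line expression produces.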

Ultimately, by decomposing the expressions in this way you end up only dealing with one object at a time. Knowing how these mechanisms work in principle can help you to decipher complex Python code.

* Just type import this into a Python interpreter, or enter python -m this at the shell prompt, and hit return.

August 12, 2015

Pro tip: Use CDPATH in Your Shell

This is a tip I first picked up about thirty years ago (my God!) when I worked at Sun Microsystems and used the C shell fairly extensively. Fortunately there's no need to do so any more, as many of its more desirable features have been incorporated into bash.

There's nothing worse, even with tab-completion, than having to stab through a sequence of directories to find the location you want. If you're anything like me there are typically three or four main directories that you use for about 80% of the work that you do.

The CDPATH environment variable works pretty much like your PATH setting, except that instead of being used to locate executables it's used to locate directories. It comes into play when you issue a cd or pushd command. So, for example, on my personal machine my .bash_profile contains the following line:

    export CDPATH=.:~/Projects:~/Projects/Python/

So when I try to change to a new directory the shell first looks in the current directory (it can cause real confusion if you don't look there first), then in my Projects directory, then in its Python sub-directory. So the command

    pushd PytDj

takes me to my ~/Projects/Python/PytDj project directory with no need to specify the path. I estimate this saves me at least a minute a day, so over thirty years it's saved me a substantial amount of time. Try it, and see what you think.
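
If it helps to see the search spelled out, here is a rough Python model of the lookup. It is a sketch of the idea only, not the shell's implementation; the `resolve_cd` function is invented, and the final fallback to the current directory is an assumption of the sketch.

```python
import os


def resolve_cd(target, cdpath, cwd):
    """Return the first existing directory for target, CDPATH-style, or None."""
    if os.path.isabs(target) or target.startswith(("./", "../")):
        # Explicit paths bypass the search entirely.
        candidates = [os.path.join(cwd, target)]
    else:
        # An empty entry counts as the current directory.
        entries = [entry or "." for entry in cdpath.split(":")] if cdpath else []
        candidates = [
            os.path.join(cwd, os.path.expanduser(entry), target)
            for entry in entries
        ]
        candidates.append(os.path.join(cwd, target))  # assumed last resort
    for candidate in candidates:
        if os.path.isdir(candidate):
            return os.path.normpath(candidate)
    return None
```

With the CDPATH shown above, asking for PytDj tries the current directory first, then ~/Projects, then ~/Projects/Python, and stops at the first match.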

June 14, 2015

A Short Musical(?) Introduction to Infinity

Wondering how best to convey the bigness of infinity, I came upon the idea of using a musical exposition. Please forgive the inadequate nature of the performance - like present-giving it's the thought that counts. It's a monologue on the nature of the smallest infinite cardinal. Enjoy. Or not ...