
Sorry, you’re just not my type


Since the release of Python 3.5 (September 2015), it has been possible to add type hints to your code. The full description of the goals and implementation can be read in PEP 484, but it basically boils down to support for static type checking. Static type checking is extremely powerful, and in this article I will look at some advanced techniques to get the most out of the typing module, discuss the benefits typing can bring you, and consider what typing might be able to facilitate in the future. In our organization, we try to incorporate typing in all projects and enforce it with pre-commit hooks. Typing can be introduced gradually, and it makes understanding code much easier, especially when multiple developers are working on the same codebase.

The practice of typing in Python consists of adding a type to a variable, like so:

a: int = 1

This was introduced with PEP 484 and was inspired by a project called mypy. With mypy and type hints, you as a developer can do static type checking, and the reason you would want to do this is to catch bugs before shipping your code. The following bug would be caught by the mypy type checker:

a: int = "a"

Assigning a string value to an int-typed variable results in an error. The typing module introduces annotations to deal with more complex types:
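Something along these lines (the variable names here are illustrative):

```python
from typing import Dict, List, Optional, Union

scores: List[int] = [1, 2, 3]              # a list of integers
prices: Dict[str, float] = {"apple": 0.5}  # str keys mapped to float values
maybe_id: Optional[int] = None             # an int, or None
identifier: Union[int, str] = "abc-123"    # either an int or a str
```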

These are some of the most basic types in the typing module, but the rabbit hole is quite deep. The documentation is a really good reference on the possibilities, though personally I think it is a little lacking on the topic of generics. My main criticism of the documentation is that it makes generics look much more complex than they actually are, and as a result obfuscates their use case and the motivation for using them. In my role as software engineer at Lab Digital, I deal with a lot of backend Python projects, and introducing typing has helped me onboard myself onto new projects. In this article I will be sharing some of the cool features and odd quirks that I have found with regard to static type annotations in Python. I will not be discussing the more basic types, as there are plenty of other resources that do a really good job of explaining the basic usage and the value of type annotations. Instead, I will look at the stuff that I don't see in every article (along with some features from Python 3.8) and try to shed some light on the use cases.

Dataclasses

Dataclasses were introduced in Python 3.7. Though not necessarily related to typing, I find that dataclasses work toward the same goal as typing in that they allow you to define your data structures very concisely. I mention them first because I will be using them in the more complex examples later on.

Where in the past I would have used a dict with a bunch of key/value pairs (because who has the time to write the boilerplate __init__ method), I now almost always opt for a dataclass, as they not only tell you more about the available keys, but also about what you can expect from the values. They are also much cleaner than a regular class in Python:
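A sketch of such a dataclass (the Animal example is my own, chosen to match the later snippets):

```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str
    legs: int = 4

# __init__, __repr__ and __eq__ are generated for us
rex = Animal(name="Rex")
assert rex == Animal("Rex", 4)
```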

Classes (and by extension, dataclasses) are valid types for mypy. So in the following example we would get a mypy error:
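For instance (a sketch; the runtime call is fine, the commented call is what mypy rejects):

```python
from dataclasses import dataclass

@dataclass
class Animal:
    name: str

def greet(animal: Animal) -> str:
    return f"Hello {animal.name}"

greet(Animal(name="Rex"))  # fine
# greet("Rex")             # mypy error: incompatible type "str"; expected "Animal"
```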

I especially like dataclasses when dealing with JSON-like responses. Where I used to have to consult the API or the code's documentation to figure out what to expect in a response, I can now just look at the dataclass definition. Personally I think it is good practice to keep dataclasses mostly for storing data, meaning that I try not to add functions to the class and definitely never mutate dataclass fields from within the class's own methods. It is also important to note that the typing in dataclasses is not restrictive, meaning that the following would run without exceptions:

Animal(name=12345)

Mypy would catch this error, but Python would not bat an eye. Python 3.8 also brings us the TypedDict annotation, which in many ways can fulfil the same use case as a dataclass. I would recommend using dataclasses unless performance is a really big issue (dataclasses have slightly more overhead); TypedDict lacks instance and class checks, and has some quirky type-checking behaviour which I will discuss later on alongside the other new features of Python 3.8.

Type Aliasing and Complex Types

The first thing I want to address with regard to typing is type aliasing, specifically how it can be used to simplify data structures and make code more readable. Aliasing is pretty much identical to a variable declaration:
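For example (the Vector alias is my own):

```python
from typing import List

# an alias is just an assignment of a type to a name
Vector = List[float]

def scale(vector: Vector, factor: float) -> Vector:
    return [value * factor for value in vector]
```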

Though typing often improves the readability of your code, it can also do the exact opposite. Consider the following example:
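Something like this nested annotation (a sketch of the kind of signature meant):

```python
from typing import Dict, List, Tuple, Union

def process(
    data: Dict[str, List[Union[Tuple[str, int, int], Tuple[int, int, str]]]],
) -> None:
    # a wall of brackets: technically precise, practically unreadable
    ...
```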

If you take the time to analyze this annotation you can figure out roughly what you can expect, but it is far from readable. Type aliasing allows us to simplify this annotation. With type aliasing we can rewrite this annotation to something like this:
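A sketch of the aliased version (using the StandardRow/SpecialRow names discussed below):

```python
from typing import Dict, List, Tuple, Union

StandardRow = Tuple[str, int, int]
SpecialRow = Tuple[int, int, str]
Row = Union[StandardRow, SpecialRow]

def process(data: Dict[str, List[Row]]) -> None:
    ...
```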

The annotation is still the same, just broken apart into smaller and more identifiable chunks, which makes it much more readable, especially if you are not interested in the entire data structure. But aliasing can take us even further: we can alias primitive types too! This can be tremendously useful in clarifying variables that have obscure names.
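A sketch of the situation (the function and alias names are my own):

```python
# Originally, "id" could only mean one thing in the application:
def get_details(id: int) -> dict:
    ...

# Aliasing the primitive documents the intent without touching call sites:
UserId = int

def get_details_typed(id: UserId) -> dict:
    return {"id": id}
```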

Take the example above: maybe when this function was written, the entire application dealt with only one type of id and this was fine, but let's assume the app has grown and more ids are now used throughout. Refactoring may not be a good option, because the function is re-used in many places or because insufficient test data exists to find all the implications. Commenting is a relatively good option, but personally I think comments are very easy to miss due to varying commenting styles.

In such a case I would prefer a type alias as you can immediately see what is expected from the function, rather than having to read its documentation. Apart from clarifying variable types, you can also add more context to your data structure. When considering StandardRow and SpecialRow from the example above you could think that the tuples are just inverted, but this doesn't have to be true:
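For instance, with aliased primitives the two row shapes stop looking like mere inversions of each other (the Name/Age/Weight aliases are my own):

```python
from typing import Tuple

Name = str
Age = int
Weight = int

StandardRow = Tuple[Name, Age, Weight]
# Not simply StandardRow in reverse (that would be Weight, Age, Name):
SpecialRow = Tuple[Age, Weight, Name]
```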

Describing your data with meaningful names can add a lot of readability, but it can also introduce unnecessary complexity. Aliasing a primitive type to a custom type requires the person reading the code to know what that primitive type is, or to locate it somewhere in the project. In this case you might be better served by renaming StandardRow to something like PersonRow and SpecialRow to AnimalRow, achieving the same result in terms of readability. Finding the balance between when to use type aliasing and when not to can be quite hard and mostly comes down to style.

Generics

Simple annotations and type aliasing can go a long way toward clarifying your data structures and your intents, but in some cases you have to deal with more abstract types; think of a function that accepts a Union of types and, based upon the input type, spits out a specific return value. Consider the following code:
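A sketch of such a function (the rename_it name appears later in the article; the dataclasses are minimal stand-ins):

```python
from dataclasses import dataclass, replace
from typing import Union

@dataclass
class Person:
    name: str

@dataclass
class Animal:
    name: str

def rename_it(it: Union[Person, Animal], name: str) -> Union[Person, Animal]:
    return replace(it, name=name)
```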

When reading this code, we can assume the intention of the function is to rename a Person or an Animal. However, according to mypy the following would be valid:
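For instance, an implementation like the following satisfies the annotation, because nothing in a Union ties the input type to the return type (a sketch):

```python
from dataclasses import dataclass
from typing import Union

@dataclass
class Person:
    name: str

@dataclass
class Animal:
    name: str

def rename_it(it: Union[Person, Animal], name: str) -> Union[Person, Animal]:
    return Animal(name=name)   # always an Animal, even when given a Person

# mypy accepts this: a Person goes in, an Animal comes out
renamed = rename_it(Person(name="Alice"), "Bob")
```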

To clarify this we can introduce generic TypeVars to the code. In terms of readability they might not add a whole lot compared to Union, but they can be useful in static type checking and other tooling, as we will see later on. A TypeVar is a generic type that can have two or more constraints (with a single constraint you are effectively type aliasing). We can introduce a TypeVar to the code example above:
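A sketch of the TypeVar version:

```python
from dataclasses import dataclass, replace
from typing import TypeVar

@dataclass
class Person:
    name: str

@dataclass
class Animal:
    name: str

# a TypeVar constrained to exactly these two types
PersonOrAnimal = TypeVar("PersonOrAnimal", Person, Animal)

def rename_it(it: PersonOrAnimal, name: str) -> PersonOrAnimal:
    return replace(it, name=name)
```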

This would have the following implications for rename_it:
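A reconstruction of those implications (the exact calls are my own; the final isinstance-based variant is the "last example" discussed below):

```python
from dataclasses import dataclass, replace
from typing import TypeVar

@dataclass
class Person:
    name: str

@dataclass
class Animal:
    name: str

PersonOrAnimal = TypeVar("PersonOrAnimal", Person, Animal)

def rename_it(it: PersonOrAnimal, name: str) -> PersonOrAnimal:
    return replace(it, name=name)

person = rename_it(Person(name="Alice"), "Bob")   # mypy infers Person
animal = rename_it(Animal(name="Rex"), "Fido")    # mypy infers Animal
# rename_it("a string", "x")   # mypy error: str does not satisfy the constraints

# An isinstance-based implementation also type checks:
def rename_it_branched(it: PersonOrAnimal, name: str) -> PersonOrAnimal:
    if isinstance(it, Person):
        return Person(name=name)
    return Animal(name=name)
```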

The last example is valid according to mypy, however this is where one of the main shortcomings of mypy comes into play: dealing with control flow and generics at the same time. In the case of the last example, the following would also be valid:
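A sketch of the gap: swapping the branches returns the wrong type for a Person input, yet (per the article's observation about mypy at the time) the checker does not flag it:

```python
from dataclasses import dataclass
from typing import TypeVar

@dataclass
class Person:
    name: str

@dataclass
class Animal:
    name: str

PersonOrAnimal = TypeVar("PersonOrAnimal", Person, Animal)

def rename_it(it: PersonOrAnimal, name: str) -> PersonOrAnimal:
    if isinstance(it, Person):
        return Animal(name=name)   # branches swapped on purpose: no mypy error
    return Person(name=name)
```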

It should be noted that mypy handles simple control flow just fine, the problem only surfaces when dealing with generics as in the example above. The lack of support does not mean that you should not use this pattern, as it still provides a lot of context to your function call.

TypeVars can be used in conjunction with aliasing for some very concise data type hints:
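For example (a sketch; the alias names are my own):

```python
from typing import Dict, List, TypeVar

T = TypeVar("T", int, float)

Row = List[T]                # the alias stays parameterized by T
Table = Dict[str, Row[T]]    # and composes into larger generic aliases

def first_cells(table: Table[int]) -> List[int]:
    return [row[0] for row in table.values() if row]
```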

Protocols

Protocols are effectively what you would call traits or interfaces in other languages and are a derivative of generics. In the context of Python you can think of protocols as static duck typing. A very simple example would look like this:
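A sketch (class and method names are my own):

```python
from typing import Protocol

class HasName(Protocol):
    def get_name(self) -> str: ...

class Dog:
    def get_name(self) -> str:
        return "Rex"

def greet(obj: HasName) -> str:
    # Dog never inherits from HasName; matching the signature is enough
    return f"Hello {obj.get_name()}"
```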

You can even add required variables to your protocol:
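For instance (a sketch):

```python
from typing import Protocol

class Named(Protocol):
    name: str                                 # a required attribute

    def rename(self, name: str) -> None: ...  # plus a required method

class Dog:
    def __init__(self) -> None:
        self.name = "Rex"

    def rename(self, name: str) -> None:
        self.name = name

def shout(obj: Named) -> str:
    return obj.name.upper()
```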

Protocols can be your best friend when dealing with code that has gone down the path of inheritance and mixins. If you are not the one who wrote it (or you wrote it more than two weeks ago), it can be really hard to reason about what is going on:
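A sketch of the kind of class hierarchy meant (the fn_a/fn_b names follow the article; the mixins are my own):

```python
class Base:
    def fn_a(self) -> int:
        return 1

class LoggingMixin:
    def log(self, msg: str) -> None:
        print(msg)

class ValueMixin:
    def fn_b(self) -> str:   # somewhere along the way fn_b came to return a str
        return "2"

class Obj(Base, LoggingMixin, ValueMixin):
    pass
```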

Then you end up looking at a function somewhere deep in the project that uses this Obj class and has a bug, and the function may look something like this:
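Something like this (a sketch; at runtime the int + str addition blows up):

```python
class Base:
    def fn_a(self) -> int:
        return 1

class ValueMixin:
    def fn_b(self) -> str:
        return "2"

class Obj(Base, ValueMixin):
    pass

def total(obj: Obj) -> int:
    return obj.fn_a() + obj.fn_b()   # bug: fn_b returns a str these days
```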

In this case it is apparent that the override of fn_b is the problem, but the code might be untyped, in which case you would have to read through the function and reason about why it is doing what it is doing; in many cases this also means figuring out how the functions relate to other functions in the class and all of its parents. In actuality, when looking at this function you care about only two things:

  1. The object has two specific function definitions

  2. One of them returns an int

We can introduce a protocol like so:
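A sketch of such a protocol (the SupportsAB name is my own):

```python
from typing import Protocol

class SupportsAB(Protocol):
    def fn_a(self) -> int: ...
    def fn_b(self) -> int: ...

def total(obj: SupportsAB) -> int:
    return obj.fn_a() + obj.fn_b()

# Passing an object whose fn_b is annotated "-> str" now gives a clear
# mypy error: the argument is incompatible with SupportsAB because of fn_b.
class Good:
    def fn_a(self) -> int:
        return 1

    def fn_b(self) -> int:
        return 2
```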

And suddenly mypy catches your error: the signature for fn_b is wrong. A very big caveat is that this only works for functions with annotated signatures. So if we were to remove the return type annotations, mypy would not catch this error.

An additional feature of protocols is that they can be checked at runtime when you apply the runtime_checkable decorator.
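For example (a sketch):

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class HasName(Protocol):
    def get_name(self) -> str: ...

class Dog:
    def get_name(self) -> str:
        return "Rex"

# isinstance now performs a structural check: no inheritance required
assert isinstance(Dog(), HasName)
assert not isinstance(object(), HasName)
```

Note that the runtime check only verifies that the members exist, not their signatures.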

Unfortunately, this decorator only works for protocols and cannot be used with something like TypedDict for instance.

Hypothesis-auto

Hypothesis is a library inspired by QuickCheck from Haskell. It generates a bunch of test cases for a function and tries to find a scenario where the function yields undesired behavior or raises an error. This is called property-based testing. Introducing hypothesis tests into a codebase can be quite a chore, as you have to learn how hypothesis works, how to generate meaningful call strategies, and how to validate the outcomes. Suddenly QuickCheck is not so quick anymore.

Luckily, someone created hypothesis-auto. This package takes typed functions as input and automatically generates test cases by filling in random values for the typed function arguments and validating that the return value matches the annotation and that no errors are raised. Consider this function:
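A minimal typed function of the kind meant here (my own example):

```python
def add_numbers(a: int, b: int) -> int:
    return a + b
```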

To generate a slew of test cases, we could add hypothesis auto testing:
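A sketch of the usage, assuming hypothesis-auto is installed (the call is shown commented out so the snippet stands alone):

```python
def add_numbers(a: int, b: int) -> int:
    return a + b

# With hypothesis-auto installed, a single call generates a batch of test
# cases with random ints and verifies that each call returns an int
# without raising:
#
#   from hypothesis_auto import auto_test
#   auto_test(add_numbers)
```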

Obviously this is a very simple example, but auto-hypothesis works with complex type hints too:
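A sketch of the sum_row function over the aliased row types from earlier (the field order in the comments is my own assumption):

```python
from typing import Tuple, Union

StandardRow = Tuple[str, int, int]   # (name, age, weight)
SpecialRow = Tuple[int, int, str]    # (age, weight, name)

def sum_row(row: Union[StandardRow, SpecialRow]) -> int:
    if isinstance(row[0], str):
        _, a, b = row
    else:
        a, b, _ = row
    return a + b

# auto_test(sum_row) would generate valid rows of both shapes
```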

This particular case would pass all the generated tests, but let’s imagine that we have discovered a third type of Row in our dataset:
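For instance (a sketch; the old implementation misreads the new shape and hypothesis-auto would surface the resulting TypeError):

```python
from typing import Tuple, Union

StandardRow = Tuple[str, int, int]
SpecialRow = Tuple[int, int, str]
VerySpecialRow = Tuple[int, str, int]   # new shape: (age, name, weight)

def sum_row(row: Union[StandardRow, SpecialRow, VerySpecialRow]) -> int:
    if isinstance(row[0], str):
        _, a, b = row
    else:
        a, b, _ = row          # wrong for VerySpecialRow: "b" is the name
    return a + b               # generated tests hit a TypeError here
```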

Just by changing the types, the hypothesis tests will find at least 1 failing condition and you will know that you have to refactor the sum_row function to deal with the scenario of being passed a VerySpecialRow. Below is another example that deals with dataclasses and custom validation:
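A sketch of that pattern, assuming hypothesis-auto is installed (the auto_allow_exceptions_ parameter whitelists expected validation errors):

```python
from dataclasses import dataclass

@dataclass
class Person:
    name: str
    age: int

    def __post_init__(self) -> None:
        if self.age < 0:
            raise ValueError("age must be non-negative")

def describe(person: Person) -> str:
    return f"{person.name} is {person.age}"

# With hypothesis-auto installed, Person arguments are built from the field
# annotations, and the validation error is declared acceptable:
#
#   from hypothesis_auto import auto_test
#   auto_test(describe, auto_allow_exceptions_=(ValueError,))
```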

These tests work for pure functions and may not be very applicable in web development. Personally I have a hard time finding the right use cases for it, as so much of web development deals with external API calls, database transactions, and other stateful operations. That said, whenever I am dealing with pure computation or data transformation, I always try to incorporate this in my work. hypothesis-auto also cannot infer Protocol annotations, so functions that make use of those cannot be tested adequately with it.

Final, Literal, and TypedDict

With the latest version of Python (3.8 at the time of writing), you have access to Final, Literal, and TypedDict. I mention these separately for two reasons. First, they are only available in Python 3.8, so it is unlikely you will be able to incorporate them into an existing project. Second, I have some small concerns about their implementation and the implications this has for their intended goal.

A Final annotation indicates that a variable cannot be re-assigned, which sounds interesting from a functional programming point of view.
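For example (a sketch):

```python
from typing import Final

MAX_RETRIES: Final = 3
# MAX_RETRIES = 5   # mypy error: Cannot assign to final name "MAX_RETRIES"
```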

Literal is an enum-like annotation that indicates that a variable can only be one of a few values.
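For example (a sketch; the mode names are my own):

```python
from typing import Literal

Mode = Literal["r", "w", "a"]

def describe_mode(mode: Mode) -> str:
    return {"r": "read", "w": "write", "a": "append"}[mode]

describe_mode("r")     # fine
# describe_mode("x")   # mypy error: "x" is not one of the allowed literals
```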

TypedDict is a way to annotate a dictionary.
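For example (a sketch):

```python
from typing import TypedDict

class MovieDict(TypedDict):
    title: str
    year: int

# at runtime this is just a plain dict; the structure only exists for mypy
movie: MovieDict = {"title": "Alien", "year": 1979}
```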

Having played around with these for a bit, I have to say I am not all that excited about using them. Final seems incomplete to me: it works well with neither dataclasses nor TypedDicts:
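For instance (a sketch; only the name binding is final, not the object behind it):

```python
from dataclasses import dataclass
from typing import Final

@dataclass
class Config:
    retries: int

config: Final = Config(retries=3)
config.retries = 5   # no mypy error: the *name* is final, the fields are not
```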

You are not really freezing an entire value, which undercuts Final's usefulness tremendously. Literal has some very select use cases, such as dealing with string arguments for functions like open; specifying a Literal for all the possible read modes can be pretty powerful. With that said, I cannot remember the last time I used a string argument when dealing with option-like structures; in all cases I would choose an Enum over a Literal. As for TypedDict, I don't really see the point. The main argument is that it can be useful for legacy codebases where dataclasses are not available, but TypedDict ships with Python 3.8 while dataclasses already come with Python 3.7. My main criticism is that its type checking is neither intuitive nor strict. Consider the following annotation:
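A sketch of the annotation meant here (field_c matches the discussion below; the other field names are my own):

```python
from typing import Optional, TypedDict

class MyDict(TypedDict):
    field_a: str
    field_b: int
    field_c: Optional[str]
```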

I would assume that field_c is optional, but instantiating a dict with no field_c raises an error.
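For instance (a sketch; the "error" here is a mypy error, since at runtime these are plain dicts):

```python
from typing import Optional, TypedDict

class MyDict(TypedDict):
    field_a: str
    field_b: int
    field_c: Optional[str]

# mypy: Missing key "field_c" for TypedDict "MyDict"
# incomplete: MyDict = {"field_a": "x", "field_b": 1}

# Optional only says the *value* may be None; the key itself stays required:
complete: MyDict = {"field_a": "x", "field_b": 1, "field_c": None}
```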

Furthermore, if a field is missing, a TypedDict ignores other type errors.
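For instance (a sketch of the behaviour described; the claim about mypy's output reflects the article's observations at the time of writing):

```python
from typing import Optional, TypedDict

class MyDict(TypedDict):
    field_a: str
    field_b: int
    field_c: Optional[str]

# field_c is missing AND field_b has the wrong type, but mypy only reports
# the missing key and stays silent about "field_b":
broken: MyDict = {"field_a": "x", "field_b": "not an int"}
```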

From my point of view, this behaviour makes TypedDict actually less reliable in communicating the intent of the data and for that reason alone I would avoid using it.

Closing Remarks and Resources

Next to hypothesis-auto, there are more projects that “abuse” the power of typing; deserializing JSON is one that comes to mind and makes a lot of sense. A Python compiler is in the works (mypyc) that aims to compile Python code to C modules. Typing can even facilitate macros.

Personally I think type-hints (and dataclasses) facilitate a much cleaner way of writing code and should be embraced by everyone, regardless of the ways in which they will be abused in the future. Being able to reason about the program flow just by looking at data types and not having to figure out what happens to your data with each function call is an immeasurable time-saver. Introducing typing to a project is also a great way of getting yourself familiarized with the various data-flows. Outside of its practical uses for maintaining code, I think forcing yourself to incorporate typing into projects forces you to think differently about data (and data flow), because you have to make all your intentions explicit. Pure functions are a consequence of good typing practices for instance. Things like hypothesis-auto are just a great bonus.

With typing being optional, it is also very easy to gradually introduce typing into projects. If you want to take it a step further, you could employ pre-commit hooks to check whether all new code is typed. It is important to note that mypy does not catch all errors; especially when dealing with control flow and generics, mypy can fall short and give you a false sense of security. There is no real way to mitigate this: I have looked into using multiple type-checkers at the same time, but this proved unreliable because there is no feature-complete type-checker yet. Pyre-check, for instance, deals better with control flow than mypy, but cannot reason about TypeVar generics as well as mypy can and raises incorrect errors. The conclusion here is that you should be aware of the limitations of your type-checker before trusting it blindly. Personally, whenever I am dealing with a tricky or experimental annotation, I will make a separate script where I can validate whether an error is actually caught by mypy before integrating the annotation into my code.

Relevant projects and further reading: PEP 484, mypy, hypothesis, hypothesis-auto, Pyre-check, and mypyc.

Thanks for reading my blog!

Tags used in this article:
python, software-development, technology