A report from the conference
I attended the main portion of EuroPython, July 25-27, 2018. It’s a conference organized by the EuroPython Society. They’re headquartered in my home city of Gothenburg (℅ Open End AB) but the conference is ambulatory. The 2018 edition took place in Edinburgh.
Due to high environmental costs, I normally avoid flying. Shamefully, I took 5 flights to get to Edinburgh and back. I have nothing to say in my defence. I publish the following notes on the talks I attended mainly as an aid to memory.
I wrote this article before the organizers posted recordings of the talks. Their doing so is a great service to the community and the environment. As you can see below, the video work is very good, much better than PyCon 2018 and 80% as good as being there in the room. I missed a lot of interesting talks simply by attending others, so I have happily been watching more following my return home.
On an experimental thredo (thread redo) library that replaces the standard-library threading module with a more intuitive API and support for thread groups and elegant cancellation of tasks, except for tasks that are actually running on the CPU. The back end is an async event loop made possible by recent developments in Python. The library is on GitHub.
Personally, I am not attached to the concept of object-oriented threads for concurrency. Even if thredo were to grow into a stable project, the Trio library (see below) appears more promising if you have a chance to make a fresh start and you need an intuitive framework over the new primitives. Nonetheless, it is a good thought experiment. Beazley’s style of speaking, with live coding in the Mu editor, was excellent. Alas, he did not have time to explain the back end.
Starting from an xkcd strip on the coding style of a self-taught programmer, Procida criticized the attitude expressed in the strip, which is also common in the Python community and computer programming in general. Acknowledging that the failure of an artist is debatable while the failure of an engineer is not—the distinction between an aesthetic and a practical craft—Procida pointed out that although programming is practical, it is creative. In fact, programming will almost inherently minimize those aspects of the craft that are repetitive: Those get automated. This is in contrast to the craft of a surgeon or a pilot, which is technical but repetitive.
The quality of work in any craft can be discussed in terms of technical proficiency (technique), innovation (creativity), critical skills (judgement), etc. One aspect, sophistication, was stipulatively defined as the craftsman’s level of engagement with the current conventions of their profession. Such sophistication is a social phenomenon, usually stemming from an education. It is the opposite of naïveté.
Procida took Georges Braque as an example of a consummately sophisticated artist and then took the contrasting examples of Mike Disfarmer, Henri Rousseau and the Talking Heads to illustrate that new insights and approaches often spring from naïveté. Disfarmer’s photography is the best of these examples: The photographer worked in an obscure periphery with antiquated methods and was never recognized as an artist in his lifetime, probably not even in his own mind, but through perseverance (grit) and talent he made strikingly honest, beautiful portraits of ordinary people, enhanced by a lack of conventional poses etc.
Procida’s core argument is simple: Conventions can stifle innovation. It would be easy to extend this argument by adding that many programmers exhibit a guild mentality stemming from the higher difficulty and lower accessibility of programming in past decades and the desire of any guild to limit its own size, controlling supply and therefore wages (or social esteem). There are areas of programming, particularly security, embedded and high-performance applications, where the term “engineering” is easily justified and the naïve programmer would do more harm than good, but for the purpose of everyday utility programming, Procida’s argument is adequate.
Procida’s main project, Django, actually illustrates his point: Its initial design was somewhat naïve, created for the day-to-day needs of journalists at the Lawrence Journal-World newspaper. Each part of it (ORM models, views, URL routing) has a single purpose, to the point that newcomers to it often get a sense of inelegant bulk next to competitors like Flask. With years of refactoring to iron out some of the early problems, Django became a popular framework. The seemingly naïve simplicity of its individual, loosely coupled parts is optional, but is more of a strength than a weakness.
On benchmarking Python code, primarily with cProfile and timeit for convenience, sys for low-level hooks, and CPython bytecode disassembly with dis to inspect the actual low-level algorithmic complexity resulting from high-level Python code.
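A minimal sketch of two of those standard-library tools in action, timing and disassembling two ways of building the same string. The comparands (an f-string versus str.format) and all names are my own choice for illustration, not taken from Kąkol’s slides; absolute timings depend on the machine and the CPython version.

```python
import dis
import timeit

count, city = 3, "Edinburgh"


def with_format():
    return "{} talks in {}".format(count, city)


def with_fstring():
    return f"{count} talks in {city}"


def compare(number=100_000):
    """Seconds taken by `number` calls of each function (machine-dependent)."""
    return {fn.__name__: timeit.timeit(fn, number=number)
            for fn in (with_format, with_fstring)}


def opnames(fn):
    """Names of the bytecode instructions CPython compiled fn to."""
    return [instruction.opname for instruction in dis.get_instructions(fn)]


if __name__ == "__main__":
    print(compare())
    dis.dis(with_fstring)  # no method call: values are formatted and joined in place
```

Under CPython, the f-string version typically disassembles to formatting and string-building instructions with no method call, which is one plausible source of its speed advantage.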
Very little of this was new to me. My main takeaways were:
List creation is faster than tuple creation and, although it takes more memory, excess memory consumed was less than 110% of tuple consumption in Kąkol’s example.
f-strings are much faster than
f-strings are also less prone to error and often less verbose; use them when you can.
A straight implementation of binary search, even in pure Python, is orders of magnitude faster than an idiomatic O(n) membership check with the in keyword on a large list, which in turn is slightly faster than a loop of equality checks. This does not take into account that binary search requires the data to be sorted before checking.
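To make the binary-search takeaway concrete, here is a minimal pure-Python binary search next to the standard library’s bisect module, which does the same job in C. The data set and its size are arbitrary assumptions; the list must already be sorted, and sorting costs O(n log n) up front, which only pays off when many lookups share one sort.

```python
import timeit
from bisect import bisect_left


def binary_search(sorted_items, target):
    """Return True if target is in sorted_items, using O(log n) comparisons."""
    lo, hi = 0, len(sorted_items)
    while lo < hi:
        mid = (lo + hi) // 2
        if sorted_items[mid] < target:
            lo = mid + 1  # target, if present, lies to the right of mid
        else:
            hi = mid  # target, if present, lies at mid or to its left
    return lo < len(sorted_items) and sorted_items[lo] == target


def bisect_search(sorted_items, target):
    """The same check via the C-implemented bisect module."""
    i = bisect_left(sorted_items, target)
    return i < len(sorted_items) and sorted_items[i] == target


data = list(range(0, 400_000, 2))  # sorted even numbers; odd targets are misses

if __name__ == "__main__":
    worst_case = 399_998  # last element: the linear scan walks the whole list
    for label, check in [("in", lambda: worst_case in data),
                         ("binary_search", lambda: binary_search(data, worst_case)),
                         ("bisect", lambda: bisect_search(data, worst_case))]:
        print(label, timeit.timeit(check, number=100))
```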
Surprisingly basic. I learned nothing new.
On Trio, an async development framework. It provides “nursery” context managers that handle multiple coroutines in such a way that a single Ctrl+C covers the whole set and exceptions propagate as would be expected of a single-threaded program. Leblond showed a Trio implementation of Happy Eyeballs (the algorithm that races connection attempts over IPv6 and IPv4 instead of waiting for one to time out) that was substantially lighter than either version in Twisted.
While it does not circumvent the GIL, Trio looks very promising as a general-purpose replacement for function-oriented use of threading and other earlier forms of concurrency.
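Trio itself is a third-party package, so as a rough standard-library comparison, here is a minimal asyncio sketch of the nursery idea: run several coroutines together, and if one fails, cancel its siblings and re-raise. The helper run_all is hypothetical and is not Trio’s API; it also ignores Trio’s Ctrl+C handling. (Python 3.11 later added asyncio.TaskGroup along similar lines.)

```python
import asyncio


async def run_all(*coros):
    """Run coroutines concurrently; if any fails, cancel the rest and re-raise.

    A rough, hypothetical approximation of a Trio nursery's error handling.
    """
    tasks = [asyncio.create_task(coro) for coro in coros]
    try:
        return await asyncio.gather(*tasks)
    except BaseException:
        for task in tasks:
            task.cancel()  # no-op for tasks that have already finished
        # Wait until every task has really stopped, so none outlives this call.
        await asyncio.gather(*tasks, return_exceptions=True)
        raise
```

The point of the design is the same as a nursery’s: no child task is left running in the background once run_all has returned or raised.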
An introduction to resolved PEPs and other features, plus brief notes on performance enhancements.
breakpoint() looks good. So do the
importlib.resources, and the provisional
Beyond what was mentioned in the talk, I look forward to using pathlib.Path.is_mount() in production, the shell-like -k flag for CLI use of the unittest module etc.
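The -k option of python -m unittest (new in Python 3.7) selects tests by substring or shell-style wildcard. A minimal sketch of the programmatic equivalent, assuming the TestLoader.testNamePatterns attribute added in the same release; the test class is a made-up example.

```python
import unittest


class CheeseTests(unittest.TestCase):  # hypothetical example tests
    def test_add_cheddar(self):
        self.assertEqual(1 + 1, 2)

    def test_add_brie(self):
        self.assertTrue("brie")

    def test_remove_stilton(self):
        self.assertFalse("")


def selected_suite(*patterns):
    """Suite of only the CheeseTests methods matching the wildcard patterns,
    roughly what `python -m unittest -k add` selects by substring."""
    loader = unittest.TestLoader()
    loader.testNamePatterns = list(patterns)  # matched against module.Class.method
    return loader.loadTestsFromTestCase(CheeseTests)


if __name__ == "__main__":
    unittest.TextTestRunner(verbosity=2).run(selected_suite("*add*"))
```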
On quantum computers approaching practical usefulness. The focus was on misconceptions around Hugh Everett's many-worlds hypothesis and the poorly conceived metaphor of quantum computing as free parallel computing, common in popular media. Shor’s algorithm was discussed as a sort of reverse Fourier analysis to solve prime number factorization by finding periodicity, based on one of Euler’s additions to number theory. I did not know that; very cool.
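The number theory behind Shor’s algorithm can be sketched classically. If a is coprime to N, Euler’s theorem guarantees some period r with a^r ≡ 1 (mod N); if r is even and a^(r/2) is not ≡ -1 (mod N), then gcd(a^(r/2) - 1, N) is a nontrivial factor. In this minimal sketch the period is found by brute force, which is exactly the step a quantum computer would speed up; everything else is ordinary arithmetic. It is not a quantum algorithm, and the function names are my own.

```python
from math import gcd


def find_period(a, n):
    """Smallest r > 0 with a**r % n == 1, by brute force (a and n coprime).

    This is the step Shor's algorithm hands to the quantum computer."""
    value, r = a % n, 1
    while value != 1:
        value = (value * a) % n
        r += 1
    return r


def split_with_period(n, a):
    """Try to find a factor pair of n from the period of a modulo n.

    Returns None when this choice of a is unlucky and another should be tried."""
    g = gcd(a, n)
    if g != 1:
        return g, n // g  # a already shares a factor with n
    r = find_period(a, n)
    if r % 2:
        return None  # odd period: no square root to exploit
    half = pow(a, r // 2, n)
    if half == n - 1:
        return None  # a**(r/2) is -1 mod n: only trivial factors
    p = gcd(half - 1, n)
    return p, n // p
```

For example, split_with_period(15, 7) finds the period 4 and returns (3, 5).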
Radcliffe’s conclusion was that while this stuff is based on uncommonly reliable science, many engineering challenges remain before there is a useful number of qubits and appropriate algorithms available. Even then, encryption will not suddenly break.
If I ever need to program a quantum computer, I would be disappointed if I had to use Python rather than a Lisp dialect.
A good talk on technical debt and “code rot”. Cheung used Martin Fowler’s definition of code smells from Refactoring (1999), i.e. a surface indication that corresponds to a deeper problem, usually debt.
As one reason to wash away such smells, Cheung offered the broken windows theory by name. She then showed a number of cheese-themed examples of comments as deodorant to compensate for poor nomenclature, dead code, code duplication, needless conditional complexity etc. She offered bad names in particular as a smell for diffusion of responsibility.
Cheung advocated guard clauses (e.g. if not has_money: return None), enums (over primitive constants with special significance), keyword arguments over positional arguments (on the general Python principle that explicit is better than implicit), and the single responsibility principle. She mentioned related tools developed at Yelp, her employer: Undebt and a related “branch debt” tracker that combines several metrics, such as looking for noqa tags, though I did not find the latter online.
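A cheese-themed sketch of three of those suggestions together: an enum, keyword-only arguments, and guard clauses replacing nested conditionals. The names are hypothetical, in the spirit of the talk’s examples rather than copied from them.

```python
from enum import Enum
from typing import Optional


class Cheese(Enum):
    """An enum instead of magic strings or integers with special meaning."""
    BRIE = "brie"
    CHEDDAR = "cheddar"
    STILTON = "stilton"


def buy_nested(cheese, has_money, in_stock):
    # Before: the happy path is buried in nested conditionals.
    if has_money:
        if in_stock:
            return "one " + cheese + ", please"
        else:
            return None
    else:
        return None


def buy(*, cheese: Cheese, has_money: bool, in_stock: bool) -> Optional[str]:
    # After: keyword-only arguments, an enum, and guard clauses that exit early.
    if not has_money:
        return None
    if not in_stock:
        return None
    return f"one {cheese.value}, please"
```

Because of the bare * in its signature, buy(Cheese.BRIE, True, True) raises a TypeError, so every call site has to spell out what each flag means.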
Personally, I enjoy tinkering to improve code that already “works”. I don’t argue for this on the basis of a spurious criminological theory, and I recognize that at worst, cleaning code can be mere pedantry or procrastination. I advocate clean code mainly because it makes it easier and more fun to read the source and reason about its effects, which is enormously important in any maintenance work.
A shared talk over remote link to Antarctica, on scientific computing and other facets of life at Concordia Base. This sort of talk provides no technically useful knowledge of Python but is excellent for building esprit de corps; conference attendees were clapping and waving enthusiastically for each of the speakers, who were caught in midwinter isolation.
The history of Python package indexing, centralization and the new (2018) Warehouse. This was an excellent illustration of how community engagement and funding are needed to run the package repository services that have become standard in popular modern programming languages.
Anderle suggested using Bower over minified jQuery etc. to manage front-end requirements, running bower install from a requirements file outside your JS code.
yarn compete as package managers,
grunt similarly compete as development task managers for linting, compiling Less and much else. Anderle suggested starting with Webpack, which is less featureful but can often replace the larger package managers. Webpack compiles complex dependencies to static assets.
Angular/Ember are MVCs for one-page apps, React/Vue.js are mainly front-end. Test tools are similarly varied. Django Pipeline and Django Compressor were mentioned without description but seem promising enough.
An introduction to
ExpressionWrapper. The official documentation on some of these topics is thin. Donchev did not cast much light on them.
At one point, Donchev solved a common type of problem resulting from the Django ORM through code duplication. Observe the slide with real_length = F('length') * Value(0.8); this duplicates a multiplication (a contrived example symbolic of any processing of stored raw data) defined separately in Song.real_length in the same example. Generally, it would be better to factor out the manipulation step as a reusable “DRY” function, but it is hard to do so in a way that covers both pure Python and a Value expression that transpiles to SQL. I asked Donchev about this in the Q&A following the talk, but evidently I did not make myself clear. If anybody knows a good method, please drop me an email.
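One possible direction, untested against a real Django project: write the manipulation once as a plain function that uses only arithmetic operators, since Django’s F() expressions overload those operators and wrap plain numbers in Value() automatically. To keep the sketch runnable without Django, Expr below is a hypothetical stand-in for an ORM expression, not Django’s API. This only covers manipulations built entirely from overloaded operators; anything calling round() or other Python functions would still need a separate SQL-side definition, and Django may also require an ExpressionWrapper with an explicit output_field when field types are mixed.

```python
class Expr:
    """Hypothetical stand-in for an ORM expression such as F('length').
    It records arithmetic as SQL-like text instead of computing it."""

    def __init__(self, sql):
        self.sql = sql

    def __mul__(self, other):
        return Expr(f"({self.sql} * {other})")


def real_length(length):
    """The single definition of the manipulation: works on plain numbers
    and on anything else that overloads `*`."""
    return length * 0.8
```

In Django, the same function would presumably be called as real_length(song.length) in Python code and as real_length(F('length')) inside an annotate() call.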
This talk addressed the basic issue I raised at the preceding session, namely how to organize a Django project to avoid code duplication, without answering the more specific question I had about database queries versus non-portable data manipulation.
The talk presented the HackSoft project style guide focused on business logic, defined as constraints and relationships in an application: Everything except frameworks and utilities. Georgiev reasoned that such logic should not go in templates, for obvious reasons; it should not go in model methods except to define the literal data model and its database-level relationships; and most especially it should not go in the save method on a model, because standard methods like that should not have side effects.
Business logic shouldn’t go in views either, because they’re supposed to define an API for HTTP. A Django application may reasonably need to support multiple APIs, like REST, CLI and internal calls. This is certainly true of the Django application that serves the article you are now reading.
When using the Django REST framework plugin for terse CRUD, data is going to be saved by a specified serializer. In this case, additions to a REST view should not go into customizations of the serializer. Georgiev also said not to use a REST class API view to create objects (e.g. POST).
What Georgiev proposed doing instead was to create a module for your app called services.py to centralize non-data-model business logic in a manner roughly analogous to the way a normal non-Django Python application might do it. This module should be filled with keyword-only functions that are type-annotated and use domain-specific nomenclature. These functions should work mostly with models. For example, you might have a services.create_user() function that instantiates User with all the side effects you want. Every non-trivial operation that writes to the DB should go in services, centralizing core logic for all uses elsewhere.
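A minimal sketch of such a service function. To keep it runnable without Django, User is a plain dataclass standing in for the model and outbox is a list standing in for side effects such as sending mail; in a real project the body would presumably call User.objects.create() instead. All names here are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class User:
    """Stand-in for a Django model (hypothetical)."""
    email: str
    full_name: str
    is_active: bool = True


# Stand-in for side effects such as welcome e-mails or audit logs (hypothetical).
outbox: List[str] = []


def create_user(*, email: str, full_name: str) -> User:
    """Service function: keyword-only, type-annotated, named in domain terms.

    Write-side rules live here rather than in views, serializers or save()."""
    email = email.strip().lower()
    if "@" not in email:
        raise ValueError(f"not an e-mail address: {email!r}")
    user = User(email=email, full_name=full_name)  # real code: User.objects.create(...)
    outbox.append(f"Welcome, {user.full_name}!")
    return user
```

Views, management commands and tests would all call create_user(), so the normalization and the welcome message cannot be forgotten on any one path.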
Georgiev proposed a similar app/selectors.py that centralizes fetching from the DB, but personally I would prefer following the modular pattern already established in Django, which would make services (perhaps better named logic?) a package, in which case services.create_user() might become services.create.user() with a corresponding services.select.user(). This was mentioned in the Q&A.
Boundaries were fairly well defined. For example, the style guide—incidentally based in part on ideas from the Ruby/Rails communities—suggests that a relationship between two models can exist in a property on one of these models but if it spans multiple relations, it should go in
Following this pattern, which is partly based on Gary Bernhardt’s functional core/imperative shell distinction, decision-heavy core logic testing becomes fast, not hitting the DB. API testing takes on an integrative role, with heavy dependencies, and is correspondingly slow. General exceptions should be caught at the API level. Tasks and forms were not covered in the talk and circular imports were acknowledged as a potential problem.
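A minimal sketch of the functional core/imperative shell split, with hypothetical names: the pure core makes a pricing decision and can be unit-tested without any database, while the thin shell does the fetching and writing. A plain dict stands in for the database, and amounts are integer cents to avoid floating-point money.

```python
from typing import Dict


def discount_cents(*, total_cents: int, is_member: bool) -> int:
    """Functional core: pure decision logic, no I/O, so tests are fast."""
    if total_cents <= 0:
        return 0
    percent = 10 if is_member else 0
    if total_cents >= 10_000:  # orders of 100.00 or more get an extra 5%
        percent += 5
    return total_cents * percent // 100


def apply_discount(order_id: int, db: Dict[int, dict]) -> int:
    """Imperative shell: fetch, delegate the decision to the core, write back."""
    order = db[order_id]
    order["total_cents"] -= discount_cents(
        total_cents=order["total_cents"], is_member=order["is_member"])
    return order["total_cents"]
```

Most test cases would target discount_cents directly; only a few slower integration tests need to exercise apply_discount against a real database.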
The Q&A session brought up custom QuerySet classes as an alternative. Service functions could then wrap such managers as necessary, but OOP strikes me as more difficult to reason about and would not serve as many purposes as a service library.
I enjoyed this talk. There is nothing sexy about expanding Django’s already verbose separation of concerns into yet more nested modules, but I will certainly try this approach in my next Django app.
On a pre-alpha project to build a Python interpreter in Rust, with incomplete syntax support and no roadmap for C extensions. The implementation was built to replace CPython directly, not to gradually migrate the existing reference implementation’s C library.
Performance for a simple nested loop was 5x slower than CPython, which is “promising”. No debugger had been applied yet and no disassembly. The project, if pursued, would probably be person-years of work away from deployable maturity but the talk was very pleasant, humble and enlightening.
On less-used features of unittest.mock, including specs, locking, and safety features for making assertions on mocks without accidentally creating a new mock instead. In my opinion this talk did more to illustrate the weaknesses of the standard module than anything else, but it was good either way.
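A short sketch of three such safety features from the standard library: create_autospec restricts a mock to the real class’s attributes and call signatures, seal (Python 3.7+) stops a plain Mock from inventing new attributes, and misspelled assertion names such as assret_called_once raise instead of silently passing. Greeter is a made-up class for illustration.

```python
from unittest import mock


class Greeter:  # hypothetical class under test
    def greet(self, name: str) -> str:
        return f"Hello, {name}"


# spec: only attributes (and call signatures) that exist on Greeter are allowed.
greeter = mock.create_autospec(Greeter, instance=True)
greeter.greet("Ada")
greeter.greet.assert_called_once_with("Ada")

# seal: after configuring a Mock, forbid any further auto-created attributes.
service = mock.Mock()
service.fetch.return_value = 42
mock.seal(service)
```

With these in place, a typo such as greeter.gret(...) or service.fetsh() raises AttributeError instead of quietly returning yet another mock.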
On applying Jupyter etc. to gather and digest data and communicate effectively. According to Ozsvald, merely graphing data is a good start for a medical diagnosis, finding orangutans, or discovering that Skopje’s air is terrible. Python being popular in science, there are several conferences devoted to Python and data science.
While it would have served as a good general introduction to planning APIs, for me, this talk was mainly an introduction to the metadata-heavy and badly named “JSON API” standard. Laks held this up as predictable and widely used, with standardization of pagination, common and self-documenting parameters, errors with machine-readable offender ID etc.
For most applications, Laks recommended cursors over page numbers in pagination since cursors can be session-specific, allowing inferences from context. The apparent bloat in “JSON API” can be alleviated with compression. When there is lots of data, that data does not have to be in JSON format; your API endpoint may return a light, standard-compliant JSON object that contains a link to a tgz or whatever serves best.
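A minimal sketch of cursor-based pagination in a JSON:API-shaped response, using only the standard library. The cursor format (base64-encoded JSON holding the last id seen), the page[cursor] query parameter and the /articles route are assumptions for illustration; JSON:API itself reserves the page query parameter for pagination and defines links such as next, but leaves the pagination strategy open.

```python
import base64
import json
from typing import Any, Dict, List, Optional
from urllib.parse import urlencode


def encode_cursor(last_id: int) -> str:
    """Opaque token for clients; here just base64-encoded JSON (an assumption)."""
    raw = json.dumps({"after": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: Optional[str]) -> int:
    if not cursor:
        return 0  # no cursor: start from the beginning (ids assumed positive)
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after"]


def list_articles(rows: List[Dict[str, Any]], cursor: Optional[str] = None,
                  size: int = 2) -> Dict[str, Any]:
    """One page of `rows` (sorted by id) as a JSON:API-style document."""
    after = decode_cursor(cursor)
    page = [row for row in rows if row["id"] > after][:size]
    document: Dict[str, Any] = {
        "data": [{"type": "articles", "id": str(row["id"]),  # ids are strings
                  "attributes": {"title": row["title"]}} for row in page],
        "links": {},
    }
    if len(page) == size:  # may yield one final empty page; fine for a sketch
        query = urlencode({"page[cursor]": encode_cursor(page[-1]["id"])})
        document["links"]["next"] = "/articles?" + query
    return document
```

Unlike a page number, the cursor stays valid when rows are inserted before it, because it names a position in the data rather than an offset.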
Laks mentioned GraphQL as an alternative. He also discussed OpenAPI, Swagger, and OAuth 2.0, for which there is a Django module. The Django REST framework plugin provides documentation autogeneration features.
oauthlib is BSD-licensed and deprecated; the newer authlib is LGPL and maintained.
On the subject of versioning, Laks talked about two patterns:
Accept Header versioning (implemented in the Django REST framework plugin as AcceptHeaderVersioning) with metadata in HTTP headers. Hard to test.
In-URL versioning (URLPathVersioning), embedding an API version ID in your URL. This is less clear semantically but is generally to be preferred, which means you should stick a v0 or a v1 in your initial routing paths.
Transformations between versions should be handled by the server translating to use the latest version for all operations and translating responses back to the requested version. In Q&A, an attendee brought up the Sunset HTTP Header, for informing a user about API service deprecation; it is not widely used.
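A minimal sketch of that arrangement with URL-path versioning: the handler always produces the latest representation, and a chain of downgrade functions translates it back to whatever version the path asks for. The /vN/ prefix, the two versions and the user fields are all hypothetical.

```python
import re
from typing import Callable, Dict

LATEST = 2


def version_from_path(path: str) -> int:
    match = re.match(r"/v(\d+)/", path)
    if not match or not 1 <= int(match.group(1)) <= LATEST:
        raise ValueError(f"unsupported API version in {path!r}")
    return int(match.group(1))


def v2_to_v1(doc: dict) -> dict:
    # Hypothetical change: v1 had one "name" field, v2 splits it in two.
    return {"id": doc["id"], "name": f"{doc['first_name']} {doc['last_name']}"}


# DOWNGRADES[n] converts a version-n document into a version-(n-1) document.
DOWNGRADES: Dict[int, Callable[[dict], dict]] = {2: v2_to_v1}


def get_user(path: str) -> dict:
    wanted = version_from_path(path)
    doc = {"id": 1, "first_name": "Ada", "last_name": "Lovelace"}  # latest format
    for version in range(LATEST, wanted, -1):
        doc = DOWNGRADES[version](doc)
    return doc
```

Each new version then only needs one new downgrade step, rather than a separate code path per old version.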
By way of case studies, Laks talked briefly about AWS and Google speech recognition services as good examples with published idiomatic Python SDKs and predictable structures.
On control of Docker dependencies and mock Flask servers as pytest fixtures, using the example of a production system (a banking rule engine). An impressive demo with obvious practical applicability wherever the setup overhead can be justified.
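Setting pytest and Flask aside, the fixture idea can be sketched with the standard library alone: a context manager that starts a throwaway HTTP server returning canned JSON, yields its URL and shuts the server down afterwards. In pytest, the same generator body would sit under a @pytest.fixture decorator; the routes and payloads here are hypothetical.

```python
import json
import threading
from contextlib import contextmanager
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import Dict, Iterator


@contextmanager
def mock_http_server(routes: Dict[str, dict]) -> Iterator[str]:
    """Serve canned JSON for GET requests on a free local port; yield the base URL."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            body = routes.get(self.path)
            status = 200 if body is not None else 404
            payload = json.dumps(body if body is not None else {"error": "not found"}).encode()
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(payload)))
            self.end_headers()
            self.wfile.write(payload)

        def log_message(self, format, *args):
            pass  # keep test output quiet

    server = HTTPServer(("127.0.0.1", 0), Handler)  # port 0: the OS picks a free port
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        yield f"http://127.0.0.1:{server.server_port}"
    finally:
        server.shutdown()  # stops serve_forever and waits for it to return
        server.server_close()
```

The code under test only needs to accept a configurable base URL, which it arguably should anyway.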
On metaprogramming using abstract syntax trees, specifically with Python’s built-in ast module, or astor, which does better visualization and round trips, plus Jupyter notebooks, where %%showast can draw the trees visually.
showast, were authored by Stevens himself. Not explored in this talk: Green Tree Snakes and python-ast-explorer.com, among other projects mentioned only briefly.
As an example use case, pytest looks attractive because it applies AST manipulation to expose hidden state, such as the intermediate values in an assert statement, and to author good error messages; your elegant-seeming pytest cases are dynamically rebuilt by the framework, becoming significantly less elegant before they actually run. As another case, if clauses in multilevel list comprehensions can appear at any level (with performance consequences), and you can learn this just by looking at their ASTs. There are limits: ASTs ignore whitespace, underscores in large integers etc., and they drop type info but not type annotations. The concrete (full) syntax tree includes whitespace, so it is better for round-tripping than the AST.
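A small illustration with the standard ast module: each for clause of a comprehension becomes its own generator node carrying its own if tests, and source details such as spacing and underscores in integer literals leave no trace in the tree. The example comprehension is arbitrary.

```python
import ast

source = "[x * y for x in range(10) if x % 2 for y in range(x) if y > 1]"
comprehension = ast.parse(source, mode="eval").body  # an ast.ListComp node

# Each `for` becomes one generator; each `if` is attached to the level it follows.
for level, generator in enumerate(comprehension.generators):
    print(level, generator.target.id, [ast.dump(test) for test in generator.ifs])

# Whitespace and digit-grouping underscores disappear from the AST.
same = ast.dump(ast.parse("n = 1_000_000")) == ast.dump(ast.parse("n   =   1000000"))
```

The first if is tested once per x, before the inner loop runs; moving it after the second for would re-test it for every y, which is the kind of performance consequence visible from the tree.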
When it comes to static analysis, inspect.getsource is a good starting point for retrieving the source code of an object selected for analysis. Thomas Kluyver’s astsearch then allows you to find, for instance, any variable being set to 1 (?=1), or any number that is not assigned to anything, or all decorated functions containing a for loop. For the infamous PEP 572 (assignment expressions), which led to Guido van Rossum’s resignation as BDFL shortly before the conference, astpath can programmatically find places in your code where the new functionality is applicable, using xpath syntax on the command line.
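astsearch and astpath are third-party tools, so here is a rough standard-library stand-in for two of the searches described: assignments of the literal 1, and decorated functions containing a for loop. The helper names and the sample source are hypothetical; in practice the source string could come from inspect.getsource() on the object being analysed.

```python
import ast

SAMPLE = '''
import functools

count = 1
flag = True

@functools.lru_cache()
def total(n):
    acc = 0
    for i in range(n):
        acc += i
    return acc

def plain():
    for _ in range(3):
        pass
'''


def lines_assigning(value, tree):
    """Line numbers of plain `name = value` statements (like astsearch's ?=1)."""
    return sorted(
        node.lineno for node in ast.walk(tree)
        if isinstance(node, ast.Assign)
        and isinstance(node.value, ast.Constant)
        and type(node.value.value) is type(value)  # so True does not match 1
        and node.value.value == value)


def decorated_functions_with_loops(tree):
    """Names of decorated functions whose bodies (including nested code) contain a for loop."""
    return [node.name for node in ast.walk(tree)
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
            and node.decorator_list
            and any(isinstance(inner, (ast.For, ast.AsyncFor)) for inner in ast.walk(node))]


tree = ast.parse(SAMPLE)
```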
astpath to lint code, for instance with context-specific deprecation warnings. In this role it might extend flake8. There is a more primitive ast.NodeTransformer in the standard library that will actively change your code. Its visitor methods can return a list of nodes to replace a single statement, but where one node must stand in for several statements, it is necessary to provide a wrapping if 1: or some similarly neutral umbrella node, which can make the output verbose and suboptimal.
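A minimal sketch of a NodeTransformer that replaces one statement with two by returning a list, which the ast documentation allows for statement nodes: every simple assignment gets followed by a call to a tracing function. The trace name is a hypothetical hook supplied by the caller.

```python
import ast


class TraceAssignments(ast.NodeTransformer):
    """After every `name = value`, insert a call `trace('name', name)`."""

    def visit_Assign(self, node):
        if len(node.targets) == 1 and isinstance(node.targets[0], ast.Name):
            name = node.targets[0].id
            call = ast.Expr(ast.Call(
                func=ast.Name(id="trace", ctx=ast.Load()),
                args=[ast.Constant(name), ast.Name(id=name, ctx=ast.Load())],
                keywords=[]))
            return [node, call]  # one statement in, two statements out
        return node  # tuple unpacking etc. is left untouched


def run_traced(source):
    """Transform, compile and run `source`, returning the (name, value) pairs seen."""
    tree = ast.fix_missing_locations(TraceAssignments().visit(ast.parse(source)))
    seen = []
    exec(compile(tree, "<traced>", "exec"), {"trace": lambda n, v: seen.append((n, v))})
    return seen
```

For example, run_traced("a = 1\nb = a + 1") returns [("a", 1), ("b", 2)].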
Stevens used the example of programmatically checking, at an API boundary, that Python data passed into Google’s protobuf will meet the strict type checks of that framework or else provoke a warning. For any such transformation, you may want to maintain the original, terse hand-crafted code and the transformer separately from the result, as with pytest test cases, where the transformed version isn’t source-controlled. Stevens mentioned an alternative, which would be to implement the AST manipulation in a decorator, but this makes diffing the original and the production code slightly harder than it probably should be.
asttools.quoted_template does this semi-elegantly.
Another use case is also related to static analysis: Checking the validity of embedded DSL code ahead of runtime. AST analysis can’t do this particularly well with mere strings (e.g. SQL, xpath, regex), but with tools like pony.orm to generate SQL from Python or xpyth to generate xpath, DSLs are exposed to static analysis. (SQLAlchemy is possibly less powerful than pony.orm for this purpose, but Stevens was not sure about that, recommending pony for its idiomatic Python syntax.)
Another case again is retrofitting future functionality; in the Q&A, Stevens mentioned having ported yield from into Python 2.7 by programmatically injecting its verbose equivalent from the reference documentation. Real-world usage has otherwise been mostly in linting.
For validating unit tests, Stevens suggested using cosmic-ray: A tool that randomly mutates your code through AST manipulation, thus checking that your regression tests will actually catch regressions as expected.
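The principle behind mutation testing fits in a few lines of standard-library code. This toy, which is not cosmic-ray’s interface, applies a single mutation operator (every binary + becomes -) and checks whether a test notices: a weak test lets the mutant survive, a stronger one kills it.

```python
import ast

SOURCE = "def add(a, b):\n    return a + b\n"


class AddToSub(ast.NodeTransformer):
    """A single mutation operator: every binary `+` becomes `-`."""

    def visit_BinOp(self, node):
        self.generic_visit(node)
        if isinstance(node.op, ast.Add):
            node.op = ast.Sub()
        return node


def load_add(tree):
    """Compile a module AST and return its `add` function."""
    namespace = {}
    exec(compile(ast.fix_missing_locations(tree), "<mutant>", "exec"), namespace)
    return namespace["add"]


def weak_test(add):
    return add(0, 0) == 0  # passes for a + b and for a - b alike


def strong_test(add):
    return add(2, 3) == 5


def mutant_killed(test):
    """True if `test` fails on the mutated code, i.e. the test notices the change."""
    mutant = load_add(AddToSub().visit(ast.parse(SOURCE)))
    return not test(mutant)
```

Both tests pass on the original code, but only strong_test kills the mutant, which is exactly the gap mutation testing is designed to reveal.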
This was a great talk: Novel (to me) and very technically dense but presented with humour and verve. All hail Prometheus. Still, I have to gaze longingly at Lisp. Python was clearly not built for metaprogramming.
If you want to watch this talk, first watch “How to use web-sockets in Python” by Anton Caceres. I did not do so at the conference, but Jones brought it up. Caceres demonstrated the use of Tornado for asynchronous web sockets, as well as Django Channels, wherein Django is served asynchronously, mainly for protocols other than HTTP. (In the Q&A for Jones’s talk, an attendee mentioned Sanic as another alternative, an async framework and server in combination, the same sort of combo that led to the development of WSGI for loose coupling. Sanic has its own novel API, unlike Quart.)
Jones’s talk was mainly an introduction to ASGI as an asynchronous alternative to WSGI, with several servers available. The fastest—though least featureful—of these servers is Uvicorn, about 7 times faster than raw Flask and three times faster than the combination of Flask with Gunicorn and Eventlet for a web server that just returns a trivial string. Daphne is popular in the Django Channels community.
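For reference, an ASGI application is just an async callable taking a connection scope plus receive and send channels. Below is a minimal sketch of the sort of trivial-string app used in such benchmarks, written against the raw ASGI 3 interface with no framework.

```python
async def app(scope, receive, send):
    """Minimal ASGI 3 application answering every HTTP request with plain text."""
    if scope["type"] != "http":
        return  # lifespan and websocket events are not handled in this sketch
    await send({
        "type": "http.response.start",
        "status": 200,
        "headers": [(b"content-type", b"text/plain; charset=utf-8")],
    })
    await send({"type": "http.response.body", "body": b"Hello, world!"})
```

Assuming the code is saved as hello.py (a hypothetical module name) and Uvicorn is installed, something like uvicorn hello:app should serve it; Daphne and Hypercorn accept the same kind of callable.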
Quart is a drop-in replacement for Flask that uses ASGI. You can port a Flask app to it by adding the await keywords. Web sockets are easy and require no extensions. The Quart test client is modelled after Flask, Quart tries to match the private API of Flask, and Quart will even try to monkey-patch itself over Flask when it is used with Flask extensions, so it isn’t necessary to port all of the individual extensions. There are native Quart extensions for OpenAPI (REST) and cross-origin resource sharing.
All of this makes Quart suitable for streaming servers. For simpler uses where Quart is behind a server that supports HTTP/2 (e.g. Hypercorn, Daphne), Quart allows request.push_promises.add(...) to prepare CSS and other auxiliary files beyond the HTML before the client knows to request them, saving time and interactions over the wire.
The difficulty of mixing synchronous and asynchronous functions in one application is the whole reason for Quart. In my opinion, it is a sad state of affairs that a near-complete replacement should be needed for this purpose (compare Stackless Python), but the job looks well done and Quart is fully type-hinted to ease the transition, supporting mypy. Jones was talking to the Flask authors about a possible merge, and mentioned Trio (see above) and Curio as frameworks that Quart could possibly support in the future, which sounds nice but is probably unnecessary.
Cautionary tales of p-hacking and misrepresentations. This had nothing to do with Python specifically, but I love a good talk about scientific scepticism. No subject is more broadly applicable.
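A small simulation of the most basic form of p-hacking: testing many hypotheses on pure noise and reporting whichever comes out “significant”. With 20 independent tests at α = 0.05, the chance of at least one false positive is 1 - 0.95^20 ≈ 0.64. The z-test with known standard deviation, the sample sizes and the seed are simplifying assumptions for illustration.

```python
import math
import random


def z_test_p_value(sample, sigma=1.0):
    """Two-sided p-value for 'the true mean is 0', assuming a known sigma."""
    z = (sum(sample) / len(sample)) / (sigma / math.sqrt(len(sample)))
    return math.erfc(abs(z) / math.sqrt(2))


def chance_of_a_finding(n_hypotheses=20, n_experiments=2000, n_samples=10,
                        alpha=0.05, seed=0):
    """Fraction of experiments in which at least one of n_hypotheses tests
    on pure noise (true mean 0) comes out 'significant' at level alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_experiments):
        p_values = [z_test_p_value([rng.gauss(0.0, 1.0) for _ in range(n_samples)])
                    for _ in range(n_hypotheses)]
        if min(p_values) < alpha:
            hits += 1
    return hits / n_experiments


if __name__ == "__main__":
    print("simulated:", chance_of_a_finding())
    print("1 - 0.95**20 =", 1 - 0.95 ** 20)
```

Dividing α by the number of tests (the Bonferroni correction) brings the overall false-positive rate back to roughly 0.05.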
I enjoyed the conference very much. The atmosphere was excellent. I thank my employer, Icomera, for allowing me and a colleague to go, and I thank the many volunteers who organized it all. However, it was a huge amount of greenhouse gas emissions for three days of talks and three lovely summer nights of sightseeing, haggis and battered Mars bars.
These notes are not likely to save future emissions. In retrospect, given the damage I did by flying and the hours I wasted at airports, I personally would have been better served by a couple of days off work to watch the videos. If you want to fly to a conference, you had better make sure it’s for high-bandwidth activities you can’t do nearly as well on the Internet. For EuroPython, that means networking, recruitment and hands-on work, particularly classes and sprints on open-source projects where you get to interact synchronously with your fellow developers.