I was fortunate enough to attend PyCon 2022 this year in Salt Lake City (thanks BetterUp!). This is the annual Python conference, put on by the Python Software Foundation. It was my first time going and I had few expectations going into it. I really enjoyed my time, and learned a number of new things and ways of thinking. Here are some highlights and learnings from a subset of the talks I attended.
|Łukasz Langa||Type Annotations||This was a pretty technical deep-dive into Python’s typing system. It was a good reminder to me that this is a (useful) feature in Python that is here to stay. Check out mypy for implementation details. This was my first time hearing of Łukasz, who is a core developer of Python and known by many.|
|Trey Hunner||Python Oddities Explained||A good reminder that Python is quirky, and to always be on the lookout for things that don’t match your expectations. See also Python’s Horror Show from Pablo S (talk below).|
|Brandt Bucher||Structural Pattern Matching||Pattern matching (new
|Reuven Lerner||Understanding Python attributes||
Ask yourself, how do a class object, an instance, an instance variable, an attribute, and a descriptor all relate? This is a surprisingly complex question, in my opinion. It’s one of those oddly confusing computer science-y things (similar to the reasons why I didn’t want to study CS). Class objects support two kinds of operations: attribute references and instantiation. With the former, you can access class attributes and variables. With the latter, once we have an instance object, the only operation it supports is attribute reference. There are two kinds of instance attributes: data attributes (sometimes called instance variables or data members) and methods. Instance variables are data unique to an instance, and class variables are data shared by all instances of the class. But what about a descriptor?
A descriptor is what we call any object that defines `__get__()`, `__set__()`, or `__delete__()` … The main motivation for descriptors is to provide a hook allowing objects stored in class variables to control what happens during attribute lookup.
If you’re thoroughly confused, I understand. I recommend the Python docs tutorial on classes as a starting point for clarification.
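To make the vocabulary a bit more concrete, here is a minimal sketch of my own (not from the talk). The names `Positive` and `Order` are made up for illustration. It shows a class variable, an instance attribute, and a descriptor stored in a class variable that hooks into attribute lookup and assignment.

```python
class Positive:
    """Hypothetical data descriptor that rejects non-positive values."""

    def __set_name__(self, owner, name):
        # Called once when the owning class is created; remember where
        # the per-instance value will be stored.
        self._name = "_" + name

    def __get__(self, instance, owner=None):
        if instance is None:
            # Looked up on the class itself: return the descriptor object.
            return self
        return getattr(instance, self._name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("value must be positive")
        setattr(instance, self._name, value)


class Order:
    tax_rate = 0.1         # class variable, shared by all instances
    quantity = Positive()  # descriptor object stored in a class variable

    def __init__(self, quantity):
        # This assignment is routed through Positive.__set__.
        self.quantity = quantity


a = Order(2)
b = Order(5)
```

Reading `a.quantity` goes through `Positive.__get__`, while `a.tax_rate` falls back to the plain class variable because the instance has no attribute of that name.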
|Paul Ganssle||What to Do When the Bug Is in Someone Else’s Code||Somewhat relevant to something I’ve been doing, a good new way to look at OSS and how to work with it. The speaker discussed five ways that you can overcome bugs in code that you don’t maintain. In order of best to worst: patching upstream (fix the bug, then make a PR on the project), wrapper functions (encapsulate buggy code in updated code), monkey patching (assign the global variable a fixed version), vendoring (clone a copy, then patch it locally), and maintaining a fork (fork and patch a copy of the project). I’m guilty of some of the worse behaviors, but now I’ve got some perspective on why they’re “bad” that I didn’t have before.|
|Nir Barazida||Dock your Jupyter Notebook||This talk proposed hosting your Jupyter-based data science and research projects in Docker images. I’d thought of running scripts in containers, but this was specifically about running notebooks. The speaker introduced docker-stacks. I thought this was all very fascinating for creating reproducible research. However, I tried to quickly implement this with my own work and it wasn’t so easy. I still find Docker tricky as soon as you want to make even the slightest customization to the image or runtime. Also, Conda already overcomes many barriers to reproducibility, so Docker seems to contribute only marginally.|
|Maria Jose Molina Contreras||Creating an indoor air quality monitoring and predictive system||This was an exposition of a data science project. The speaker basically linked up some CO2 and climate sensors to a microcontroller and slapped a prediction algorithm on top. A couple of learnings. First, open the windows, get some air inside your room/office! Second, machine learning projects don’t have to be crazy complex. You can get detailed, but it’s not necessary to make a point. The most interesting projects are the ones with real-world consequences, not just the Titanic dataset or housing prices. Third, so much effort goes into everything that sets up any prediction model. Always remember that.|
|Sara Issaoun||Hands down one of the best talks of the weekend. Dr. Issaoun’s talk was an inspiration, and for an extremely complex subject (astrophysics, astronomy, black holes, etc.) she made the audience (or at least me) feel like both experts and curious children. If I had to pick one talk to re-attend, this would be the one. It was the only one where I went up to talk to the speaker at the end.|
|Peter Wang||Peter gave an introduction to and live-demo of PyScript, “a framework that allows users to create rich Python applications in the browser using HTML’s interface.” This was another extremely inspiring talk. I don’t know exactly how I would use Python-in-the-browser but I can imagine it will open a world of possibilities, similar to Node. I’m very eager to see where this project goes, and perhaps sometime soon I’ll give it a try and write up my experience.|
|Fred Phillips||Hooking into Imports||This speaker introduced the idea of “hooking” into the Python import process. For example, it might be necessary to create a blocklist of packages that can’t be imported, or to load package code from a remote database. In both cases, it can be helpful to modify or override the default package search and execution process. I currently have no use case for it, but it was good to learn about what’s happening under the hood. I had never considered the process by which Python resolves and loads the modules we import.|
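For the blocklist case, here is a minimal sketch of what such a hook could look like, written by me rather than taken from the talk. The `BlocklistFinder` class and the package name `forbidden_pkg` are hypothetical; the only real machinery is `sys.meta_path` and the `find_spec` protocol from `importlib`.

```python
import importlib.abc
import sys


class BlocklistFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder that refuses to import blocklisted packages."""

    def __init__(self, blocked):
        self.blocked = set(blocked)

    def find_spec(self, fullname, path=None, target=None):
        # Compare on the top-level package name, so submodules are blocked too.
        if fullname.split(".")[0] in self.blocked:
            raise ImportError(f"import of {fullname!r} is blocked")
        # Returning None defers to the next finder on sys.meta_path.
        return None


# "forbidden_pkg" is a made-up name used only for this demonstration.
finder = BlocklistFinder({"forbidden_pkg"})
# Put our finder first so it is consulted before the default finders.
sys.meta_path.insert(0, finder)
```

Note that finders are only consulted for modules that are not already cached in `sys.modules`, so a hook like this would need to be installed before the blocked package is first imported.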
|Antoine Toubhans||Flexible ML Experiment Tracking||Data Version Control (DVC) is something I’ve been thinking about for years but hadn’t tried out. It seems like it could help solve a number of common data science and machine learning problems. I just need to learn it. I’ve decided to try it out in a current project I’m working on. There is a learning curve, but it is not nearly as confusing as something like Docker; it is more comparable to git. So far I’m finding some value in the experiment tracking functionality. I don’t really want to implement my own visualizations to track experiments, as was proposed in this talk (with streamlit), and I’m hoping the maintainers of the DVC library add in more viz tools.|
|Ryan Kuhl||GraphQL||I had heard of GraphQL before but hadn’t spent any time learning about it. I thought this would be a good crash course on the tool, but it wasn’t. GraphQL is a query language for APIs that can sit in front of any data store (despite the name, it is not specific to graph databases). My takeaways could be summed up as: this is a fascinating technology with a clear implementation benefit over other query languages, but it seems highly engineered, clunky, and probably only worth it if you’re dealing with a lot of data in a production environment.|
|Pablo Galindo Salgado||Making Python better w/ Errors||Python 3.11 and 3.12 will contain some big makeovers in the traceback and error diagnostic capabilities in Python. I’m looking forward to having clearer and more specific errors and exceptions.|
|Kelly Schuster, Sean Paredes||Python like a 12-year-old||The speakers were two Python educators who have experience teaching Python to middle schoolers. Their top learnings from working with these kids: Be curious! Take risks! Kids think broadly, unlike us adults. Engage all senses, not just what you can see (e.g. build something with your hands). Always be on the lookout for unexpected behavior. When solving a problem, think: what’s the worst way to do this? (to force yourself to put yourself in others’ shoes.)|
|Jeremiah Paige||Intro to Introspection||Python has many tools available for debugging, and although Python can be frustrating with its lack of verbosity sometimes, we must remember what we have at our disposal to dissect issues. For example, Python has the
|Joseph Lucas||Serialization||This talk introduced serialization and ways to do this in Python. Serialization refers to breaking down data and objects (in Python, or elsewhere) and storing them with the intent to deserialize them later for use. The speaker first talked about the built-in pickle module. This is a Python-specific module for serializing objects and data structures into a binary byte stream. Unlike JSON (another serialization format), pickles are binary, not unicode; pickles are not human readable; pickles can represent a wide range of Python objects; and deserializing pickles can pose an execution safety risk, unlike JSON. If we’re trying to serialize an object that pickle cannot handle, there is the dill module.|
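Here is a small sketch of my own comparing the two built-in modules on the points above. The `record` and `simple` dictionaries are made-up sample data.

```python
import datetime
import json
import pickle

# Made-up sample data containing types JSON has no native representation for.
record = {
    "name": "example",
    "tags": {"pickle", "json"},              # a set
    "created": datetime.date(2022, 1, 1),    # a date object
}

# pickle produces bytes and round-trips the set and the date unchanged.
as_pickle = pickle.dumps(record)
restored = pickle.loads(as_pickle)  # only unpickle data you trust

# json produces a str, but rejects the set and the date out of the box.
try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False

# Plain dicts, lists, strings, and numbers work fine with json.
simple = {"name": "example", "tags": ["json", "pickle"]}
as_json = json.dumps(simple)
```

The comment on `pickle.loads` is the safety point from the talk: unpickling can execute arbitrary code, so it is only appropriate for data from a trusted source.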
|Tetsuya “Jesse” Hirata||Productionize Research-Oriented Code||I went into this thinking it would cover how to write research code for your production engineers. Instead, it was about how to read research code from your researchers as an engineer. This was a pleasant surprise, and was almost a lesson in empathy. It reinforced my belief that my code and work are most valuable when they can be (quickly and easily) taken to a place of impact. Some reminders for myself: write clean code, and document it; modularize when possible; separate loading, cleaning, processing, and modeling; if you have time, look for ways to refactor.|
|Cillian Kieran||Open-Source Tools For Data Privacy||The Ethyca Fides folks have laid out a really appealing taxonomy of data privacy in the way of privacy-as-code. I love data privacy, and I love categorizing things, especially data, so this is like candy to me. I don’t know if I would try this out in reality, as it seems like a pretty heavy lift, but maybe one day.|