I did one of the keynotes at PyData Berlin yesterday, called "How to trick a neural network". I spent the first while talking about tricking neural networks, and then we talked about how having black box models can be dangerous, and a few strategies for making black box models more interpretable. There’s a ton of interesting work there, and here are a few links:
- if you want to understand exactly how we tricked a neural network, Breaking Linear Classifiers on ImageNet uses exactly the same technique, but on a simpler linear classifier instead of on a neural network. In general Andrej Karpathy’s blog is great.
- the blog post I wrote about how to trick a neural network
- the original paper I read called "Explaining and Harnessing Adversial Examples" . This paper is pretty short and I found it easy to read.
some fun art projects that generate images with neural networks:
- deep dream
- A Neural Algorithm of Artistic Style and the corresponding website deepart.io . Someone told me that the "artistic style" paper is actually super readable and interesting!
In general, I became more convinced at this conference that sometimes people do really cool things (like training a machine learning model to play space invaders), and those things are actually possible to reproduce for yourself if you read the paper and invest a bunch of time! I think it can often take like 2 months to fully read and understand the paper and get it to work, but it’s not impossible, which is cool.
We talked about how machine learning sometimes doesn’t work or can have unintended effects. I mentioned
- Carina Zona: Consequences of an insightful algorithm
- this paper by Google on building reliable production machine learning systems is great Machine Learning: the high interest credit card of technical debt
Here are a few methods to get more interpretability out of black box machine learning models:
- this paper called A Model Explanation System has some really good ideas and is very short. We’ve used some of these ideas at work and it’s been helpful so far.
- someone linked me to this other model explanation system called LIME which I haven’t looked at yet
- a package for scikit-learn that produces more interpretable results (the "Bayesian rule list" classifier)
This problem of how to make a complicated machine learning model more interpretable definitely isn’t solved, and I’d love to have more conversations about it. If you have other useful links you think I should include, let me know!
thanks to Piotr Migdal & Thomas Friedel for sending me links!