Micropackages and Open Source Trust Scaling

This post was originally published on this blog

Like everybody else this week we had fun

with the left-pad disaster.

We’re from the Python community and our exposure to the node ecosystem is

primarily for the client side. We’re big fans of the ecosystem that

develops around react and as such quite a bit of our daily workflow

involves npm.

What frustrated me personally about the conversation that took place on

the internets over the last few days, however, has nothing to do with npm,

the guy who deleted his packages, any potential trademark disputes or the

supposed inability of the JavaScript community to write functions to pad

strings. It has more to do with how the ecosystem evolving around npm has

created the most dangerous and irresponsible environment which in many

ways leaves me scared.

My opinion quickly went from “Oh that’s funny” to

“This concerns me”.

Dependency Explosion

When the “left-pad” disaster struck I had a brief look at Sentry’s dependency

tree. I should probably have done that before, but as long as things work

you don’t really tend to do that. At the time of writing we have 39

dependencies in our package.json. These dependencies are strongly

vetted in the sense that we do not include anything there we did not

investigate properly. What we cannot do, however, is also investigate

every single dependency there is. The reason for this is how these node

dependencies explode. While we have 39 direct dependencies, we have more

than a thousand dependencies in total as it turns out.

To give you a comparison: the Sentry backend (Sentry server) has 45 direct

dependencies. If you resolve all dependencies and install them as well

you end up with a total of 65 packages, which is significantly fewer. We

only get a total of 20 packages over what we depend on ourselves. The

typical Python project would be similar. For instance the Flask framework

depends on three (soon to be four with Click added) other packages:

Werkzeug, Jinja2 and itsdangerous. Jinja2 additionally depends on

MarkupSafe. All of those packages are written by the same author, however,

and are merely split up by rough responsibility.

Why is that important?

  • dependencies incur cost.
  • every dependency is a liability.

The Cost of Dependencies

Let’s talk about the cost of dependencies first. There are a few costs

associated with every dependency and most of you who have been programming

for a few years will have encountered this.

The most obvious cost is that packages need to be downloaded from

somewhere. This is a direct cost. The most shocking example I

encountered for this is the isarray

npm package. It’s currently being downloaded just short of 19 million times a

month from npm. The entire contents of that package can fit into a single

line:


module.exports = Array.isArray || function(a) { return {}.toString.call(a) == '[object Array]' }

However, in addition to this there is a bunch of extra content in it.

You actually end up downloading a 2.5KB tarball because of all the extra

metadata, readme, license file, travis config, unittests and makefile. On

top of that npm adds 6KB for its own metadata. Let’s round it to 8KB that

needs to be downloaded. Multiplied by the total number of downloads last

month, the node community downloaded 140GB worth of isarray. That’s half

of what Flask achieves in monthly downloads measured by size.
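
The back-of-the-envelope calculation can be written down as a small sketch. It assumes the rounded figures from the text (about 8KB per download, roughly 18 million downloads a month) and decimal gigabytes; the function name is made up for illustration.

```javascript
// Rough estimate of the monthly transfer volume caused by one package.
// Inputs are the rounded figures from the text, not measured values.
function monthlyTransferGB(downloadsPerMonth, kilobytesPerDownload) {
  const bytes = downloadsPerMonth * kilobytesPerDownload * 1000;
  return bytes / 1e9; // decimal gigabytes
}

// Roughly 18 million downloads at about 8KB each.
const isarrayGB = monthlyTransferGB(18e6, 8);
console.log(isarrayGB.toFixed(0) + " GB per month");
```

With these inputs the estimate lands in the same ballpark as the 140GB quoted above; the exact number depends on how the download count and tarball size are rounded.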

The footprint of Sentry’s server component is big when you add up all the

dependencies. Yet the entire installation of Sentry from pypi takes about

30 seconds including compiling lxml. Installing the over 1000

dependencies for the UI though takes I think about 5 minutes even though

you end up with a fraction of the code afterwards. Also, the further away you

are from the npm CDN node, the higher the price you pay for each network

roundtrip. I threw away my node cache for fun and ran npm install

on Sentry. Takes about 4.5 minutes. And that’s with good latency to npm,

on an above-average network connection and a top-of-the-line MacBook Pro with

an SSD. I don’t want to know what the experience is for people on

unreliable network connections. Afterwards I end up with 165MB in

node_modules. For comparison, the entirety of Sentry’s backend

dependencies on the file system and all metadata is 60MB.

When we have a thousand different dependencies we have a thousand

different licenses and copyright files. Really makes me wonder what the

license screen of a node powered desktop application would look like. But

it’s not just a thousand licenses, it’s a huge number of independent

developers.


Trust and Auditing

This leads me to what my actual issue with micro-dependencies is: we do not

have trust solved. Every once in a while people will bring up how we all

would be better off if we PGP signed our Python packages. I think what a

lot of people miss in the process is that signatures were never a

technical problem but a trust and scaling problem.

I want to give you a practical example of what I mean with this. Say you

build a program based on the Flask framework. You pull in a total of 4-5

dependencies for Flask alone, which are all signed off by me. The attack

vectors to get untrusted code into Flask are:

  • get a backdoor into a pull request and get it merged
  • steal my credentials to PyPI and publish a new release with a backdoor
  • put a backdoor into one of my dependencies

All of those attack vectors I cover. I use my own software, monitor what

releases are on PyPI, which is also the only place to install my software

from. I use 2FA for all my logins where possible, and long randomly generated

passwords where I cannot, etc. None of my libraries use a dependency I do

not trust the developer of. In essence if you use Flask you only need to

trust me to not be malicious or idiotic. Generally by vetting me as a

person (or maybe at a later point an organization that releases my

libraries) you can be reasonably sure that what you install is what you

expect and not something dangerous. If you develop large scale Python

applications you can do this for all your dependencies and you end up with

a reasonably short list. More than that: because Python’s import system

is very limited you end up with only one version of each library, so when

you want to go into detail and sign off on releases you only need to do it

once.


Back to Sentry’s use of npm. It turns out we have four different versions

of the same query string library because of different version pinning by

different libraries. Fun.
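
Finding such duplicates can be automated. The following is a minimal sketch, assuming a dependency tree shaped like the output of `npm ls --json` (nested `dependencies` objects keyed by package name, each with a `version`). The function names and the example tree, including the package name `some-querystring-lib`, are made up for illustration.

```javascript
// Walk the tree and collect every distinct version seen per package name.
function collectVersions(node, seen = new Map()) {
  for (const [name, dep] of Object.entries(node.dependencies || {})) {
    if (!seen.has(name)) seen.set(name, new Set());
    if (dep.version) seen.get(name).add(dep.version);
    collectVersions(dep, seen);
  }
  return seen;
}

// Keep only the packages that are installed in more than one version.
// Versions are sorted as plain strings, which is enough for a report.
function duplicatedPackages(tree) {
  const result = {};
  for (const [name, versions] of collectVersions(tree)) {
    if (versions.size > 1) result[name] = [...versions].sort();
  }
  return result;
}

// Hypothetical tree; package names and versions are invented.
const exampleTree = {
  name: "app",
  version: "1.0.0",
  dependencies: {
    "lib-a": {
      version: "2.0.0",
      dependencies: { "some-querystring-lib": { version: "2.4.2" } },
    },
    "lib-b": {
      version: "1.3.0",
      dependencies: { "some-querystring-lib": { version: "5.2.0" } },
    },
    "some-querystring-lib": { version: "6.1.0" },
  },
};

console.log(duplicatedPackages(exampleTree));
```

In a real project the tree would come from parsing the JSON that npm prints rather than from a hand-written object.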

Those dependencies can easily end up being high value targets because of

how few people know about them. juliangruber’s “isarray” has 15 stars on

GitHub and only two people watch the repository. It’s downloaded 18

million times a month. Sentry depends on it 20 times. 14 times it’s a

pin for 0.0.1, once it’s a pin for ^1.0.0 and 5 times for

~1.0.0. Any pin for anything other than a strict version match is a

disaster waiting to happen if someone managed to push out a point

release for it by stealing juliangruber’s credentials.
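
To make the difference between the three kinds of pins concrete, here is a small sketch of how tilde and caret ranges behave. It is a simplified reimplementation for plain `x.y.z` versions only, not the `semver` library that npm itself uses, and the release number `1.0.1` is hypothetical.

```javascript
// Parse "x.y.z" into three numbers; returns null for anything else.
function parseVersion(v) {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  return m ? m.slice(1).map(Number) : null;
}

// Compare two parsed versions like a sort callback.
function compare(a, b) {
  for (let i = 0; i < 3; i++) {
    if (a[i] !== b[i]) return a[i] - b[i];
  }
  return 0;
}

// Exclusive upper bound of a "~x.y.z" or "^x.y.z" range.
function upperBound(op, [major, minor, patch]) {
  if (op === "~") return [major, minor + 1, 0];
  // Caret: the leftmost non-zero component must stay the same.
  if (major > 0) return [major + 1, 0, 0];
  if (minor > 0) return [0, minor + 1, 0];
  return [0, 0, patch + 1];
}

// Does `spec` (exact, "~" or "^") allow `version` to be installed?
function allows(spec, version) {
  const op = spec[0] === "~" || spec[0] === "^" ? spec[0] : "";
  const base = parseVersion(spec.slice(op.length));
  const candidate = parseVersion(version);
  if (!base || !candidate) throw new Error("unsupported version syntax");
  if (op === "") return compare(candidate, base) === 0;
  return (
    compare(candidate, base) >= 0 &&
    compare(candidate, upperBound(op, base)) < 0
  );
}

// The three specifiers from the text against a hypothetical point release.
for (const spec of ["0.0.1", "~1.0.0", "^1.0.0"]) {
  console.log(spec, "accepts 1.0.1:", allows(spec, "1.0.1"));
}
```

Only the exact pin rejects the new release; both range specifiers would pick it up on the next fresh install without anyone changing a line in package.json.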

Now one could argue that the same problem applies if people hack my

account and push out a new Flask release. But I can promise you I will

notice a release from one of my ~5 libraries because a) I monitor those

packages, b) other people would notice a release. I doubt people would

notice a new isarray release. Yet isarray is not sandboxed and runs

with the same rights as the rest of the code you have.

For instance sindresorhus maintains 827 npm packages, most of which are probably

one-liners. I have no idea how good his opsec is, but my assumption is that

it’s significantly harder for him to ensure that all of those are actually

his releases than it is for me as I only have to look over a handful.


It is commonly said that package signatures would solve a lot of those

issues but at the end of the day because of the trust we get from PyPI and

npm we get very little extra security from a package signature compared to

just trusting the username/password auth on package publish.

Why package signatures are not the holy grail was

covered by Donald

aka Mr PyPI. You should definitely read that since he’s describing the

overarching issue much better than I could ever do.

Future of Micro-Dependencies

To be perfectly honest: I’m legitimately scared about the integrity of

node’s ecosystem, and this worry does not go away. Among other things I’m

using keybase and keybase uses unpinned node libraries left and right.

keybase has 225 node dependencies from a quick look. Among those are many

partially pinned one-liner libraries for which it would be easy enough

to roll out a backdoored update if one got hold of the credentials.

If micro-dependencies want to have a future then something must change in

npm. Maybe they would have to get a specific tag so that the system can

run automated analysis to spot unexpected updates. Probably

they should require a CC0 license to simplify copyright dialogs etc.

But as it stands right now I feel like this entire thing is a huge

disaster waiting to happen and if you are not using npm shrinkwrap yet

you better get started quickly.
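
Once versions are frozen, spotting an unexpected update becomes a matter of comparing two snapshots. The following is a minimal sketch of that idea, assuming two trees in the nested `dependencies` layout that `npm-shrinkwrap.json` uses. The function names and the example data (including the `1.0.1` release) are hypothetical.

```javascript
// Flatten a shrinkwrap-style tree into "parent > child" -> version entries.
function flatten(node, prefix = "", out = new Map()) {
  for (const [name, dep] of Object.entries(node.dependencies || {})) {
    const path = prefix ? prefix + " > " + name : name;
    out.set(path, dep.version);
    flatten(dep, path, out);
  }
  return out;
}

// Report every dependency whose version differs between two snapshots,
// including dependencies that were added or removed (reported as null).
function diffSnapshots(before, after) {
  const a = flatten(before);
  const b = flatten(after);
  const changes = [];
  for (const path of new Set([...a.keys(), ...b.keys()])) {
    const from = a.has(path) ? a.get(path) : null;
    const to = b.has(path) ? b.get(path) : null;
    if (from !== to) changes.push({ path, from, to });
  }
  return changes;
}

// Hypothetical snapshots: a nested one-liner silently moved to a new release.
const lastReviewed = {
  dependencies: {
    "lib-a": { version: "2.0.0", dependencies: { isarray: { version: "1.0.0" } } },
  },
};
const freshInstall = {
  dependencies: {
    "lib-a": { version: "2.0.0", dependencies: { isarray: { version: "1.0.1" } } },
  },
};

console.log(diffSnapshots(lastReviewed, freshInstall));
```

A check like this only tells you that something changed, not whether the change is malicious; somebody still has to look at the new release.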

