I recently had to set up a new postgres instance at work. While we’re already using Heroku Postgres for production loads, we had some requirements for this one that couldn’t be solved without being database admin, and thus had to go self-hosted. That process in itself is fairly straightforward and not worthy of a post in itself, but this instance was one we’d use to run ad-hoc analytics queries on, and thus had to be able to log on to from anywhere. Also not a problem in itself, Postgres can run all connections over TLS, thus we’d be able to both securely authenticate ourselves to the server and verify that we’re talking to the correct server. Just a matter of creating a self-signed certificate, configuring it with the key on the database and put it in
~/.postgresql/root.crt , and use
?sslmode=verify-ca when we’re connecting. But as with everything, this is easy in retrospect – as I started on this I had no idea this was how Postgres did certificate validations. But now I know a bit more, so indulge me for a minute and let me try to explain how this all fits together, then we’ll see how some Amazon RDS and Heroku Postgres compare in terms of authenticating their connections, before we try to break stuff.
As with most TLS-based protocols, the onus is on the connecting party to authenticate the server. The server dictates whether it accepts authentication over unencrypted connections or not, but how the connection is established and verified is up to the client. When the client connects it picks a mode to use when verifying the connection, specified with the connection parameter
sslmode , which defaults to
prefer . From the excellent documentation on TLS modes we see that this simply first attempts to establish a TLS connection, and if that fails falls back to connect in the clear. We can specify looser modes like
disable that will respectively first try plaintext and then TLS, or disable it completely. But simply connecting over TLS doesn’t give us any indications about the trustworthyness of the connection, because we haven’t specified a trust root to use. The most familiar TLS-tunneled protocol people know of is HTTPS, where the trust root is usually pre-installed on your operating system or web browser, and includes hundreds of certificate authorities (CAs) which can sign certificate requests for any domain on the internet. For Postgres there is no such collection of certificate authorities, thus you have to manually specify your trust root. If you don’t put anything in
~/.postgresql/root.crt or point the connection to a different root through the
sslrootcert parameter, any certificate presented will be considered valid (if using any mode aparty from
This means that in the face of an active attacker, the default settings provide no guarantees of either confidentiality, integrity or authenticity (the CIA triad ). And to be clear, your threat model doesn’t need to include three-letter agencies before active network attackers become a problem, anyone with a $99 Pineapple can probably position themselves in the middle of your traffic if you ever use WiFis without a VPN. I’m guessing that includes quite a lot of developers that like to work out of coffee shops, co-working spaces, and any other location where it’s reasonable easy for anyone to get access. Thus we must do something to authenticate our connection, which luckily is as easy as putting the database’s certificate in
~/.postgresql/root.crt . This will authenticate all outgoing connections from your machine against this trust root, so if you only have an established trust root for some of your databases (and for some reason can’t establish a trust root for the rest), put the certificate somewhere else and use the
sslrootcert parameter to indicate to
psql where your trust root is. This illustrates the difference between the modes
verify-ca , the two former modes will authenticate the connection if a trust root is present, but will accept in anything if a trust root is absent.
verify-ca will fail without a specified trust root.
The difference between the modes
verify-full is less obvious and there’s only a difference if you have multiple databases. If you only have one database you’d just generate a self-signed certificate for it and specify that as your trust root,
verify-ca would ensure that the only accepted connection would only be to that database (or potentially to multiple databases sharing the same certificate and key). But if you have multiple databases you might find that managing the trust root becomes burdensome since it must be updated everytime a database is added or removed, and thus you should probably create your own certificate authority to sign database certificates. Anyone can be their own certificate authority, it’s just a matter of creating a self-signed certificate with privileges to sign other certificates and put that as your trust root, and now any database that can authenticate itself with a certificate signed by your CA will be accepted. However, with mode
verify-ca this does not guarantee that you’re talking to the database you tried to connect to, only that the database has a certificate issued by the CA in your trust root. If you tried to connect to
db1.example.com , the certificate issued to
db2.example.com would be considered valid for that connection. If these provide the same service you’d probably trust them both equally much, but if
db2 has to be publicly accessible while
db1 is internal only,
db2 has a much higher risk of being hacked and thus have the private key for its certificate compromised. In that case you would not trust them equally much, and either split the trust so that the two are issued by different authorities and thus have different trust roots, or you need domain validation as well, which is provided by the
verify-full option, which ensures that the hostname you’re trying to connect to is the same as the Common Name (CN) field in the database’s certificate.
Comparing Amazon RDS and Heroku Postgres
While there are lots of companies offering hosted Postgres, I’m only going to compare two popular choices (because I’m lazy), Amazon RDS and Heroku Postgres. RDS makes it fairly easy to do things correctly, on the instance details page there’s a group dedicated to security options, where you can see the CA that signed the instance’s certificate.
Unfortunately there’s no link to download the CA certificate, but a quick search will lead you to this page , which explains how to download the certificate and configure the connection. I wrote a small script to dump the certificate presented by a database here , let’s use that to take a look at the certificate our RDS instance presents:
$ ./postgres_get_server_cert.py mitm.chjg7xaetv8u.eu-central-1.rds.amazonaws.com Certificate: Data: Version: 3 (0x2) Serial Number: ab:eb:fd:ef:19:b0:83:df:bc:44:b1:3e:52:9f:71 Signature Algorithm: sha1WithRSAEncryption Issuer: C=US, ST=Washington, L=Seattle, O=Amazon Web Services, Inc., OU=Amazon RDS, CN=Amazon RDS eu-central-1 CA Validity Not Before: Jun 5 04:50:59 2016 GMT Not After : Mar 5 22:03:31 2020 GMT Subject: CN=mitm.chjg7xaetv8u.eu-central-1.rds.amazonaws.com, OU=RDS, O=Amazon.com, L=Seattle, ST=Washington, C=US Subject Public Key Info: Public Key Algorithm: rsaEncryption Public-Key: (1024 bit) Modulus: 00:8d:29:7c:de:a2:bf:69:28:61:63:70:e2:19:c9: da:d3:66:14:0a:50:e6:fc:8c:42:21:e8:9b:74:dd: f8:42:31:1c:4d:b4:37:89:30:91:eb:db:1c:2d:7d: bc:12:01:be:0f:04:8d:fe:11:69:ee:b3:e1:3f:cb: <...>
The database certificate has the correct hostname in the Common Name, and you’re encouraged to use
sslmode=verify-full in the docs. Apart from using sha1 and 1024-bit RSA in 2016, everything A-OK for RDS in terms of configuring their Postgres connection.
Now let’s turn to Heroku. Heroku runs on AWS, and thus should provide equally good security, right? Not entirely. Well, not at all, actually. Let’s take a look at the certificate for our Heroku instance:
$ database_url=$(heroku config:get DATABASE_URL) $ ./postgres_get_server_cert.py "$database_url" / > | openssl x509 -noout -text Certificate: Data: Version: 3 (0x2) Serial Number: 11658775476923737329 (0xa1cc4a86821324f1) Signature Algorithm: sha256WithRSAEncryption Issuer: CN=ip-10-99-YY-XX.ec2.internal Validity Not Before: May 12 22:27:13 2016 GMT Not After : May 10 22:27:13 2026 GMT Subject: CN=ip-10-99-YY-XX.ec2.internal Subject Public Key Info: Public Key Algorithm: rsaEncryption Public-Key: (2048 bit) Modulus: 00:cc:10:26:b4:b5:c7:e4:45:08:ce:c3:0c:23:25: 0d:cd:40:9b:05:ed:e2:fb:df:69:89:8d:35:1c:38: 47:98:72:a8:44:47:a1:47:95:ac:64:19:66:69:97: d2:5f:03:46:17:00:61:d6:38:77:2d:31:33:ca:2d: b0:53:82:cf:09:02:37:87:f6:73:53:ab:1a:f2:04: <..>
Well, at least they do sha-256 and 2048-bit RSA, right? Apart from that we see that the database has a self-signed certificate (issuer is the same as subject), which practically means that there’s no reliable trust root to use when connecting, as Heroku moves the databases around between hosts as part of maintenance. Thus adding the above certificate to your trust root would break your app once the database is moved around. We can see this from the Heroku Toolbelt too, here they set the
PGSSLMODE environment varible (equivalent to the
sslmode connection parameter) to
require , before forking out to
psql , without any specified trust root. This is particularly bad for Heroku, because unlike Amazon, they provide tools like
heroku pg:psql to easily get a database prompt from basically anywhere, which is very convenient for doing ad-hoc stuff on the database. Problem is, since this is a CLI tool it’s very likely it’ll be run from developer laptops and not only from the dynos, and thus is very susceptible to a Man-in-the-Middle (MitM) attack on the connection. This is fortunately fixable, if Heroku creates their own CA like Amazon has done, they can sign their databases’ certificates and ship the CA certificate in an update to the Toolbelt, after which all connections through
pg:psql can be authenticated with full verification of both CA and hostname. If they also provide the CA certificate through the database details page on the dashboard, other external applications could also authenticate their connections to the databases, but until then they recommend using
heroku run bash -c 'psql $DATABASE_URL' . Running psql through bash on a dyno will ensure that the TLS connection is established on their network, which means that an attacker would have to be between the dyno and the database, a harder challenge than being between your laptop and the database.
Proof of concept
But while words are nice and well, let’s see if we can demonstrate the issue further and develop a proof-of-concept of an attack on a Postgres connection that doesn’t authenticate towards a trust root. Since we’ll be hacking ourselves for demonstration purposes, we can avoid the step of trying to position a device in between ourselves and the database, and rather instruct our app to connect through our MitM-service and see if we can establish a connection to the database. If the MitM-service works and we don’t provide a trust root for the connection this should be entirely transparent to the application, but the MitM-service should have captured the credentials needed to connect to the database. This means that we need to know a couple of things; firstly how the server authenticates a client, and secondly how the wire protocol looks to be able to parse messages from both the client and the database.
While postgres supports many ways of authenticating clients , including Kerberos, client TLS certificates, LDAP and PAM, the two I’m assuming are most popular are password and MD5. Both of these require the client to prove their identity through a password, but while the former method sends the password in plaintext on the wire, the latter performs a double-hashing of the password with a connection-specific salt, preventing both sending the password in the clear and replay attacks.
While MD5 is not the strongest hash algorithm available, I have no desire to try to brute-force Heroku’s 26-character passwords from a 64 character alphabet, since that equals roughly 2^156 bits of entropy. Luckily you’d on average only have to try half of the candidates before you find the correct one, but that’s still nothing I can complete before the next episode of Game of Thrones airs tonight. Thus since we’re going to impersonate a server anyway, let’s just ask the client for the plaintext password, and then compute the hash ourselves if the database asks for it. Thus the setup will look like this, with TLS on all connections:
+-----+ +------+ +-----------------+ | App | -- Password -> | MitM | -- MD5(password) -> | Heroku Postgres | +-----+ +------+ +-----------------+
So how does the Postgres wire protocol look? A presentation by Jan Urbański at PGCon 2014 describes the relevant details, all messages exchanged are of the format
|char tag|int32 len|payload| , except for a special startup message which determines which version of the protocol will be used and sends connection details like username and target database, which is on the format
|int32 len|int32 version|[payload]| . There’s three major protocol versions supported by the database, but we’re only going to bother with version 3.0, which is currently the latest version, introduced in 2003. The protocol version is encoded with the major version in the upper 16 bits of the int, and the minor version in the lower 16, meaning the format allows 65536 major and minor versions. Should probably be enough for a couple of years still, given that Postgres has only gone through three major protocol versions in its 19 years of existence. There’s also the magical version
1234.5679 , which is used to signal that the client wants to connect over TLS.
The payload of the startup message is null-separated key-value pairs, with a trailing null. If the database is configured to require authentication for the given database it’ll respond with an authentication request (tag
'R' ), specifying one of the possible methods mentioned earlier. Heroku’s Postgres only asks for MD5, thus that’s the one we’re going to implement. The client responds with a password message with tag
'p' , and a payload which in the case of MD5 will be a hash
md5(md5(password + username), salt) . These three messages concludes what we have to understand to MitM the connection, since after we’ve intercepted the
'p' from the client and encoded it as a MD5
'p' to the server, we simply forward the response from the server to the client, letting it reply with whether the password was accepted or not. We could inspect this message to determine whether the password was valid or not, but that’s left as an excercise for the reader.
We’re thus playing both sides of the initial phases of the protocol, we’re performing the server side with the client until it has sent us the password, and then we perform the client part of the protocol with the submitted password to the actual database. After that we don’t care much, but there’s lots of fun that could be had at this stage, like replacing all urls in a response with lolcat pictures, subtracting 1 from all numbers or whatever you find fun on a Sunday evening.
The script is provided here . To MitM our own Heroku connection we’ll give the script the IP of the database, and then route all local DNS for the database to localhost. If you want to test this with other applications simply give the script the hostname of your database, and configure your app with localhost as the database host.
The example database I’ve configured for the project has hostname
ec2-54-163-238-215.compute-1.amazonaws.com , which conveniently encodes the IP as part of the hostname, thus I’ll start the script like this:
./postgres_mitm.py 220.127.116.11 , and I’ll put the following line in
127.0.0.1 ec2-54-163-238-215.compute-1.amazonaws.com . This will ensure that when
psql tries to look up the IP address of the database it will be directed to
127.0.0.1 , but since our script is started with the target IP it will be able to connect.
How does this fare?
# Terminal 1 $ heroku pg:psql ---> Connecting to DATABASE_URL psql (9.4.6) SSL connection (protocol: TLSv1.2, cipher: ECDHE-ECDSA-AES256-GCM-SHA384, bits: 256, compression: off) Type "help" for help. mega-mitm::DATABASE=> select user; current_user ---------------- fxutzshsavlfyj (1 row) # Terminal 2 $ ./postgres_mitm.py 18.104.22.168 2016-06-05 01:16:15,740 [INFO] Listening for connections 2016-06-05 01:16:40,809 [INFO] Intercepted auth: postgres://fxutzshsavlfyj:Yp1DA<..>eMOz7JfAtu@22.214.171.124:5432/d4ajdorhb758hq
Success! Let’s see how it turns out if we add the database’s certificate as the trust root.
# Terminal 1 $ ./postgres_get_server_cert.py 126.96.36.199 > ~/.postgresql/root.crt $ heroku pg:psql ---> Connecting to DATABASE_URL psql: SSL error: certificate verify failed # Terminal 2 $ ./postgres_mitm.py 188.8.131.52 2016-06-05 08:32:36,623 [INFO] Listening for connections 2016-06-05 08:32:46,759 [INFO] Client had an established trust root, could not intercept details.
Excellent, we can prevent the MitM by pinning our database’s certificate, since the random certificate presented by the MitM will no longer be accepted as valid. Note that doing this for Heroku means your app will suddently break when your database changes host as part of Heroku maintenance, do not rely on this in production!
Heroku has been made aware of these issues but have not given any indication as to when a solution is available, until then either use a VPN whenever you’re running
heroku pg:psql over WiFi, or use `heroku run bash -c ‘psql $DATABASE_URL’.
: Woot, Postgres turns 20 on July 8th.