The Border Gateway Protocol (BGP) exchanges routing information between autonomous systems. Routers use it to decide locally, among a set of neighboring routers, which router to send IP (and other) traffic to, based on the target network prefix. In our BGP blog post, we describe how BGP selects routers based on the best path selection.
At Datapath.io, we built our own implementation of BGP. This is necessary because we use the protocol in a non-standard way: we alter the best path selection on a per-customer basis. Datapath.io accomplishes this in the following way.
High Speed vs Open Source
If the features of the protocol (they vary a little depending on the router vendor) do not meet the requirements of your use case, you have to write your own implementation of BGP or ask your vendor to add the features you need. Do not waste time trying the latter :-) For the sake of this post, we will stick to the first option.
Why? Because there are plenty of excellent open source implementations of the protocol that can serve as a starting point: Quagga, BIRD, OpenBGPD, ExaBGP, XORP… All of them have reached a level of maturity that allows them to be used in production. But there is more to it.
BGP usually runs on routers that route hundreds of gigabits of traffic per second. To achieve such speeds, router vendors employ special ASICs (application-specific integrated circuits). Those ASICs are made specifically to match and change bits in protocol headers at a very high rate, and they are expensive for router vendors to produce. However, their customers tend to buy software features rather than look solely at the specs of the ASICs. This is why router vendors tie their software to the ASICs, and you are probably stuck with the features your vendor sells. They will not let you change the software on your hardware. Does that mean that there is no open source software that can be run on routers?
The software-defined networking (SDN) principle promises to change this: it decouples the data plane (vendor silicon) from the control plane (custom software), with an SDN protocol (like OpenFlow) in between. We use OpenFlow, and let me say it right away: we love it! It allows us to run our very own implementation of BGP (and other protocols) on a dedicated server that programs the data plane within our switch remotely. That puts us in a comfortable position, because every time we change something in our software, we can roll it out to our servers without changing anything on the switch.
How do you actually run BGP on top of OpenFlow?
Take a look at this picture. On the left are the routers of our transit providers, such as Cogent, Level3, GTT and Hibernia. They all connect to ports on our HP 5406zl2 switch. On dedicated servers, we run multiple apps that add features to the OpenFlow switch.
First, there is the FIB Handler App. It takes care of the database that defines which customer is routed over which link to the outside world per destination network prefix. This app employs two methods for doing this: installRoute() and removeRoute().
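To make the idea of a per-customer forwarding database more tangible, here is a minimal sketch in Python. Only the method names installRoute() and removeRoute() come from our app; the class name, the parameters, the lookup() helper and the in-memory dictionary are assumptions made for illustration, and the sketch does not talk to a switch at all.

```python
import ipaddress


class FibHandler:
    """Hypothetical in-memory table: customer -> {prefix -> outgoing link}."""

    def __init__(self):
        self._routes = {}

    def installRoute(self, customer, prefix, link):
        # Normalise the prefix so "203.0.113.0/24" always maps to the same key.
        net = ipaddress.ip_network(prefix)
        self._routes.setdefault(customer, {})[net] = link

    def removeRoute(self, customer, prefix):
        # Returns True if a route was actually removed.
        net = ipaddress.ip_network(prefix)
        return self._routes.get(customer, {}).pop(net, None) is not None

    def lookup(self, customer, address):
        # Longest-prefix match within one customer's table (illustrative helper).
        addr = ipaddress.ip_address(address)
        best = None
        for net, link in self._routes.get(customer, {}).items():
            if net.version == addr.version and addr in net:
                if best is None or net.prefixlen > best[0].prefixlen:
                    best = (net, link)
        return best[1] if best else None


fib = FibHandler()
fib.installRoute("customer-a", "203.0.113.0/24", "cogent")
fib.installRoute("customer-a", "203.0.113.128/25", "gtt")
fib.installRoute("customer-b", "203.0.113.0/24", "level3")
print(fib.lookup("customer-a", "203.0.113.200"))
print(fib.lookup("customer-b", "203.0.113.200"))
```

The point of the sketch is that the same destination prefix can resolve to a different outgoing link depending on the customer, which is exactly what a single shared routing table cannot express.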
The BGP Router App hosts our actual BGP implementation. When it starts up, it installs a rule in our OpenFlow switch that effectively says: “Whatever you see related to BGP, please hand it to me using the PACKET_IN/PACKET_OUT channel”. The PACKET_IN/PACKET_OUT channel is a feature of OpenFlow, and the rule that feeds it can match on header fields of Ethernet frames and IP packets. The app then receives full Ethernet frames with BGP data units inside from the switch. Now we can use the BGP data units and talk to the other routers from within our BGP Router App, right?
Nope! IP packets can be fragmented, which means they are spread over multiple frames. Packets might also arrive out of order or be retransmitted. Because of this, we need a full-featured network stack. There are open source libraries that implement one, but our software already runs on an operating system with a full-featured network stack: Linux. Let’s make a detour via the operating system.
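To illustrate both the kind of match such a rule performs and why raw frames are not enough, here is a small Python sketch that classifies a single untagged Ethernet frame. It is not our switch rule or our app code: the function names are made up for illustration, and it assumes IPv4 only. BGP runs over TCP port 179, so the check looks at the EtherType, the IP protocol number and the TCP ports.

```python
import struct

ETH_TYPE_IPV4 = 0x0800
IP_PROTO_TCP = 6
BGP_PORT = 179  # well-known TCP port for BGP


def is_bgp_frame(frame: bytes) -> bool:
    """Return True if an untagged Ethernet frame carries IPv4/TCP to or from port 179."""
    if len(frame) < 14 + 20:
        return False
    (eth_type,) = struct.unpack_from("!H", frame, 12)
    if eth_type != ETH_TYPE_IPV4 or frame[14] >> 4 != 4:
        return False
    ihl = (frame[14] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[14 + 9] != IP_PROTO_TCP:
        return False
    (flags_frag,) = struct.unpack_from("!H", frame, 14 + 6)
    if flags_frag & 0x1FFF:
        # Non-first fragment: it has no TCP header, so the ports are unknown.
        return False
    tcp_off = 14 + ihl
    if len(frame) < tcp_off + 4:
        return False
    src_port, dst_port = struct.unpack_from("!HH", frame, tcp_off)
    return BGP_PORT in (src_port, dst_port)


def make_tcp_frame(src_port: int, dst_port: int, frag_offset: int = 0) -> bytes:
    """Build a minimal test frame (checksums left at zero, addresses are examples)."""
    eth = b"\x02" * 6 + b"\x04" * 6 + struct.pack("!H", ETH_TYPE_IPV4)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, 0, frag_offset, 64,
                     IP_PROTO_TCP, 0, bytes([192, 0, 2, 1]), bytes([192, 0, 2, 2]))
    tcp = struct.pack("!HHIIBBHHH", src_port, dst_port, 0, 0, 5 << 4, 0x02, 65535, 0, 0)
    return eth + ip + tcp


print(is_bgp_frame(make_tcp_frame(40000, 179)))
print(is_bgp_frame(make_tcp_frame(40000, 179, frag_offset=10)))
```

The second call shows the problem: a later fragment of the same packet cannot even be recognised by its port, let alone parsed as BGP, without the reassembly work a real network stack does.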
To employ the operating system’s network stack, we use a TAP interface. To the operating system, the TAP interface looks like a physical network card, but frames can be pushed to and pulled from it using API calls from user space. The operating system then delivers a reliable, ordered and de-duplicated stream of BGP data units back to our BGP Router App through the socket API. Inside our app, we then do our BGP magic. The results are pushed out to the FIB Handler App and installed on our OpenFlow switch.
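For readers who have not used a TAP interface before, the following Python sketch shows roughly how one is opened on Linux. It is an assumption-laden illustration rather than our production code: the interface name "bgp0" and the function names are hypothetical, the constants are the ones defined in Linux's if_tun.h header, and open_tap() needs root privileges (or CAP_NET_ADMIN) to succeed.

```python
import fcntl
import os
import struct

# Constants from Linux's <linux/if_tun.h>.
TUNSETIFF = 0x400454CA
IFF_TAP = 0x0002    # layer-2 device: whole Ethernet frames are exchanged
IFF_NO_PI = 0x1000  # no extra packet-information header before each frame


def build_ifreq(name: str) -> bytes:
    """Pack a struct ifreq (40 bytes on 64-bit Linux) that requests a TAP device."""
    encoded = name.encode("ascii")
    if not 0 < len(encoded) < 16:
        raise ValueError("interface name must be 1 to 15 ASCII characters")
    return struct.pack("16sH22x", encoded, IFF_TAP | IFF_NO_PI)


def open_tap(name: str = "bgp0") -> int:
    """Open /dev/net/tun and attach it to a TAP interface; returns a file descriptor."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    try:
        fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    except OSError:
        os.close(fd)
        raise
    return fd
```

With the descriptor in hand, os.write(fd, frame) pushes a frame into the kernel's network stack and os.read(fd, 2048) pulls out a frame the kernel wants to send. The interface still has to be brought up and given an address before a regular TCP socket can accept BGP sessions over it.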
You have now seen a solution that enables us to run our own implementation of the Border Gateway Protocol on our own dedicated servers. To achieve this, we use the OpenFlow protocol to gain access to BGP data units on our HP 5406zl2 switch. The data units are delivered as raw frames, so we curate them by taking a detour via the operating system. The operating system provides a full network stack that does the heavy lifting for us. Using the socket API, we get the curated stream of BGP data units back into our system. Per-customer selected routes are translated to OpenFlow flows and placed inside our switch using the OpenFlow protocol.