For the past 30 years, BGP has allowed network operators to apply and enforce a wide variety of inter-AS routing policies. It is remarkable how efficiently this protocol has sustained the ever-increasing number of subnets and AS’s, as well as the evolution of the Internet from a mostly hierarchical structure made of customers and providers to one where peering and IXPs become more important every day.
Despite all its good qualities, BGP has several vulnerabilities which, if exploited, can cause ripple effects all over the Internet. The root of the problem is that BGP was conceived at an early stage of the Internet’s development, when there were only a few players. Consequently, its design did not include protection against deliberate or accidental errors, so malicious or misconfigured sources can exploit this gap to propagate bogus routing information all over the Internet. Even worse, the source of bogus or malicious routing information could be either a real BGP peer or a fake one, since BGP runs on TCP/IP and is therefore subject to every classic TCP/IP attack, such as IP spoofing.
Malicious Route Hijacking
Prefix hijacks are the deliberate generation of bogus routing information. An attacker could announce routes to disrupt the services running on top of the IP space covered by those routes, or hijack the traffic to analyze confidential information flowing towards a service. The attacker could also announce routes with a crafted AS path to show fake neighboring connections on well-known websites, such as Hurricane Electric’s BGP toolkit. Or, even worse, the attacker could hijack the traffic to manipulate the packets in transit at will, or simply exploit unused address space to send spam.
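To make the mechanics concrete, here is a minimal Python sketch (all prefixes and AS numbers are hypothetical, drawn from documentation and private ranges) of why announcing a more-specific prefix attracts traffic: routers forward on the longest matching prefix, so a bogus /25 overrides the legitimate /24.

```python
import ipaddress

# A toy routing table: prefix -> origin AS. All values below are made up
# for illustration only; this is not any operator's real table.
routes = {
    ipaddress.ip_network("203.0.113.0/24"): 65001,   # legitimate origin
}

def best_route(ip, table):
    """Return the (prefix, origin AS) entry with the longest prefix covering ip."""
    candidates = [(p, asn) for p, asn in table.items() if ip in p]
    return max(candidates, key=lambda e: e[0].prefixlen, default=None)

victim_ip = ipaddress.ip_address("203.0.113.10")
print(best_route(victim_ip, routes))   # /24 wins: traffic reaches the legitimate AS 65001

# The attacker (AS 65099) announces two more-specific /25s covering the same space.
routes[ipaddress.ip_network("203.0.113.0/25")] = 65099
routes[ipaddress.ip_network("203.0.113.128/25")] = 65099
print(best_route(victim_ip, routes))   # /25 now wins: traffic is hijacked to AS 65099
```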
Unintentional Route Leaks
Route leaks are the unintentional generation of bogus routing information caused by router misconfigurations, such as typos in the filter configuration or mis-origination of someone else’s network (fat finger). Even if unintentional, the consequences of a route leak can be the same as those of a prefix hijack.
Let’s consider a scenario to better understand how route leaks can occur. In the diagram above, AS 5 is a normal network operator that simply applied the wrong BGP filters, such as “accept everything from my provider, announce everything to my provider.” This is sadly not an uncommon case, and it is an error that many AS’s make when switching from a single provider (where this rule works fine) to multiple providers (where this rule turns the AS into a transit network between its providers).
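The sketch below contrasts that leaky rule with a safe export policy. It is a toy model (the relationship labels and function names are assumptions for illustration, not real router configuration): routes learned from a provider or a peer should only be re-announced to customers, never to another provider or peer.

```python
def safe_export(learned_from: str, announce_to: str) -> bool:
    """May a route learned from a neighbor of type `learned_from` be announced
    to a neighbor of type `announce_to`? Types: "customer", "peer", "provider"."""
    if announce_to == "customer":
        return True                       # customers may receive everything
    # Toward peers and providers, only export routes learned from customers.
    return learned_from == "customer"

def leaky_export(learned_from: str, announce_to: str) -> bool:
    """AS 5's broken policy: announce everything to everyone."""
    return True

# The mistake: a route learned from provider A is exported to provider B,
# turning AS 5 into unintended transit between its two providers.
print(leaky_export("provider", "provider"))  # True  -> route leak
print(safe_export("provider", "provider"))   # False -> leak prevented
```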
Due to that mistake, AS 5 will now propagate everything it receives from one provider towards the other provider, clearly violating the valley-free property. This piece of routing information will then spread all over the Internet, and AS’s will start routing traffic depending on the outcome of each AS’s own BGP decision process.
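For reference, here is a minimal sketch of the valley-free check itself, assuming we already know the business relationship of each adjacent AS pair (an assumption; relationships are not carried in BGP). Listed in the direction the announcement propagates, a valid path is zero or more customer-to-provider (“up”) links, at most one peer link, then zero or more provider-to-customer (“down”) links.

```python
def is_valley_free(link_types):
    """link_types: relationship of each hop along the propagation path,
    e.g. ["up", "peer", "down"]. Valid pattern: up* peer? down*."""
    state = "up"                       # phases: up -> peer -> down
    for link in link_types:
        if link == "up":
            if state != "up":          # climbing again after a peer/down step: a valley
                return False
        elif link == "peer":
            if state != "up":          # a second peer link, or a peer after a down step
                return False
            state = "peer"
        elif link == "down":
            state = "down"
        else:
            raise ValueError(f"unknown link type: {link}")
    return True

print(is_valley_free(["up", "peer", "down"]))  # True: a normal path
print(is_valley_free(["down", "up"]))          # False: AS 5's leak (provider -> AS 5 -> provider)
```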
Now think again about the 60,000 AS’s in the Internet and imagine that AS 5 is a rural service provider with few resources, both technical and economic. This would mean that the upstream connectivity it bought from its providers is probably very limited, making those two links a bottleneck in this route leak scenario. In this case it is possible that AS 5 will not be able to handle the amount of traffic directed to P, causing not only additional delay but also significant packet loss.
This was the case with the route leak of June 24, 2019. In that incident, misconfigured routes announced by DQE Communications were picked up by Allegheny Technologies and then forwarded to Verizon due to bad filtering. When Verizon accepted these routes and advertised them to its own neighbors, it created a bottleneck that affected other networks, including Cloudflare, AWS, Facebook, Comcast, T-Mobile, Bloomberg, Fastly, and private networks belonging to at least nine American banking institutions.
Outages like these bring the spotlight back to monitoring strategies and how the right monitoring strategy can mitigate, or even prevent, such incidents. Visibility at different layers of the application stack is necessary, and that is only possible with proactive synthetic testing from end users’ perspectives (wherever those end users may be). By monitoring only from cloud locations or from a limited set of geographical locations, you’re bound to miss critical network-level data.
Comprehensive end-user monitoring is the only way you can take control during a major incident; this monitoring strategy is what answers crucial questions such as: