Why does my blog return HTTP 503 Service Unavailable when I click around using Firefox on iPhone?

After migrating my blog to a new host, I noticed my website would slow to a crawl while I clicked around. I would see hangs ranging from 4 to 15 seconds. If I clicked enough, I’d eventually see HTTP 503 Service Unavailable. Why was visiting my website causing it to hang? Here’s my debugging story.

It was strange. In my previous post, I shared some tests measuring my website’s response times and the numbers were beautifully performant. Yet, I was seeing and feeling something very different. CURL would return within one second, but my real-world experience was terrible. So what was the difference?

At first, I just assumed that it was because I’m on a shared instance and something was going on with my host. Maybe they were getting DDoSed or another website on the host was using too many resources. After I contacted support, they told me I was occasionally running up to my CPU and memory limits.

My first thought was: “that’s really weird”. I’m pretty much the only user of my own blog, so why was I already hitting resource issues? Was a search engine crawling my website and eating up all my resources? Was someone DDoSing me? How was I able to send so many CURL requests successfully but fail after clicking a few links?

After playing around with my website a bunch, I realized I could only trigger this failure from Firefox on my iPhone. It wouldn’t reproduce in Chrome or Safari on the same phone, and it wouldn’t reproduce in any browser on my Mac, Firefox included.

I loaded up the access and error logs to see what was going on. What I found in the error logs was A LOT of requests to apple-touch-icon.png and apple-touch-icon-precomposed.png, all of which returned HTTP 404 Not Found. These are the icons iOS requests for home screen bookmarks, an alternative to the favicon. I hadn’t set a site icon yet, so they were correctly erroring out. I suspect what this blog post suggests: WordPress spends a lot of time processing 404s. I believe this is something core to WordPress because it occurs even when I disable all my plugins. Either way, once I set my site’s icon, I no longer got the CPU run-up from these requests. Problem solved. 🙂
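For anyone wanting to do the same kind of digging, here is a minimal sketch of how missing paths could be tallied from an access log. It assumes the common "combined" log format, where the 7th whitespace-separated field is the request path and the 9th is the status code; the function name `count_404s` is made up for illustration, and your host's log layout may differ.

```shell
# Count 404 responses per requested path, most frequent first.
# Assumption: combined log format (field 7 = path, field 9 = status).
# Reads the files given as arguments, or stdin when there are none.
count_404s() {
  awk '$9 == 404 { hits[$7]++ } END { for (p in hits) print hits[p], p }' "$@" |
    sort -k1,1nr -k2,2
}
```

Something like `count_404s access.log | head` would then list the noisiest missing paths, where `access.log` is a placeholder for wherever your host keeps its logs.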

But wait, there’s more!

While investigating this, I dug into my raw access logs and found each time I loaded a page, I was making almost 60 requests to my server! I made nearly a thousand requests in a minute by just clicking around my website.

I decided to break down the requests.

  • 64% were to /favicon.ico, /apple-touch-icon.png, or /apple-touch-icon-precomposed.png, which resulted in either HTTP 200 with zero bytes or HTTP 404 because I didn’t have a site icon set.
  • 22% were to static content (CSS, pictures, etc.), which seemed pretty normal.
  • 7% were for the original page, also pretty normal.
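A breakdown like the one above could be produced along these lines. This is only a sketch: it assumes combined log format again, the function name `request_breakdown` is hypothetical, and the list of file extensions treated as "static" is a guess that would need adjusting for a real site.

```shell
# Print the share of requests in each bucket: icon, static, or page.
# Assumptions: combined log format (field 7 = path); the "static"
# extension list is illustrative, not exhaustive.
request_breakdown() {
  awk '
    {
      path = $7
      sub(/\?.*/, "", path)   # ignore query strings such as ?ver=6.1
      if (path ~ /^\/(favicon\.ico|apple-touch-icon(-precomposed)?\.png)$/) kind = "icon"
      else if (path ~ /\.(css|js|png|jpe?g|gif|svg|woff2?)$/) kind = "static"
      else kind = "page"
      n[kind]++; total++
    }
    END { for (k in n) printf "%s %.0f%%\n", k, 100 * n[k] / total }
  ' "$@" | sort
}
```

The icon check runs before the static check on purpose, since the touch icons end in .png and would otherwise be counted as ordinary images.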

For every page load, I’m requesting some form of the site icon 15 times. But why?

Who’s asking?

Digging into the user-agents for the favicon.ico and Apple touch icon requests shows 90% come from the user-agent “com.apple.WebKit.Networking/8614.2.9.0.10 CFNetwork/1399 Darwin/22.1.0” rather than Firefox. Not super helpful. This is the default NSURLSession user-agent for WebKit, which provides the default web view on iOS.
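To see who is asking for the icons, the log can be split on double quotes instead of whitespace, which keeps each user-agent string in one piece. The sketch below assumes combined log format, where the user-agent is the last quoted field; `ua_for_icons` is a hypothetical name.

```shell
# Share of icon requests per user-agent, largest first.
# Assumption: combined log format. Splitting on double quotes makes
# field 2 the request line and field 6 the user-agent.
ua_for_icons() {
  awk -F'"' '
    $2 ~ /(favicon\.ico|apple-touch-icon)/ { ua[$6]++; total++ }
    END { for (u in ua) printf "%.0f%% %s\n", 100 * ua[u] / total, u }
  ' "$@" | sort -rn
}
```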

I also noticed a weird user-agent that appears in 23% of all my requests: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_1) AppleWebKit/601.2.4 (KHTML, like Gecko) Version/9.0.1 Safari/601.2.4 facebookexternalhit/1.1 Facebot Twitterbot/1.0

Specifically, I noticed it’s Facebot, Facebook’s Messenger preview generator, but also, weirdly, Twitterbot. That particular user-agent accounts for roughly a quarter of all requests to my blog. It looks like an internet crawler, but the requests are coming from my own device’s IP address? I guess it’s client side, but what client is triggering that? Surely not Firefox, the privacy-oriented browser. After Googling for some time, I didn’t find anything suggesting that was the case. But I don’t have anything Facebook related embedded in my website either, so where was this coming from?

At this point, I kind of wonder if it’s a syncing thing. When I open the page on my phone, maybe Firefox also opens it on my laptop and that triggers a flood of requests for the site icons that didn’t exist. Or maybe my device is syncing to Firefox servers? But the requests are coming from my device, so this probably is not server related. Or maybe it’s because I have other open tabs and it’s refreshing all the favicons?

One weird thought I had was maybe my adblocker is preventing me from reproducing this on my Mac. I was wrong, but it was easy to check.

I decided to disconnect from Wi-Fi on my phone so that I could isolate which requests were coming from my iPhone versus my MacBook. If it’s syncing, I’m not sure who is making which request, and the user agents really are not helping. Since both devices are connected to the same Wi-Fi access point, their public IP will be the same. If I drop to cellular data, I’ll get my cell carrier’s IP address instead, which I can check with https://whatsmyipaddress.com.

So who is making all these requests? Definitely my iPhone.

I still don’t get how this all ties together. Why is a Facebook web crawler meant to generate the metadata in Messenger links querying my website when I visit it in Firefox while syncing is enabled? Why is Twitter involved?

So What Triggers It?

I decided to make a list of actions that might be triggering this bug and do them with a 45 second gap in between so I could isolate the logs more easily.

  • Open Firefox
  • Make a new tab
  • Type aggressivelyparaphrasing.me into the URL bar
  • Open my site
  • Refresh
  • Sync tabs
  • Close tabs

Turns out, every one of these actions triggers the flood of requests. That’s right, opening and closing my tabs triggers a flood of requests to my website. As long as my website is open in any tab, opening or closing any other tab will trigger a flood. This made me suspect that other websites are also getting this flood of requests whenever I do anything in Firefox.

Another thing I realized is that about 8 seconds after these actions, a larger flood of requests comes in. I believe this is the background sync for tabs, but that’s just a guess.
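Spacing the actions out only helps if the log can be read as a timeline. One way to do that, sketched below, is to count requests per fixed time window so that each action (and any delayed follow-up burst) shows up as its own spike. It assumes combined log format with timestamps like [01/Dec/2022:10:00:05 +0000] in field 4, and it ignores the date, so it only makes sense for a log slice from a single day. The name `requests_per_window` is made up.

```shell
# Count requests per time window of WINDOW seconds.
# Usage: requests_per_window WINDOW [FILE...]
# Assumptions: combined log format; all entries fall on the same day.
requests_per_window() {
  window=$1; shift
  awk -v w="$window" '
    {
      split($4, t, ":")   # t[2], t[3], t[4] = hour, minute, second
      secs = t[2] * 3600 + t[3] * 60 + t[4]
      n[int(secs / w) * w]++
    }
    END {
      for (b in n)
        printf "%02d:%02d:%02d %d\n", int(b / 3600), int((b % 3600) / 60), b % 60, n[b]
    }
  ' "$@" | sort
}
```

With a 5 second window, an action followed by a second burst 8 seconds later would appear as two separate lines rather than one blurred total.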

So with all this in hand, I filed a bug, which pretty soon got marked as a duplicate of another, more active one. I’m not alone!

I think this is a good example of where filing a bug is super helpful. You can see here that I was really struggling to figure out what was going on. Other people are probably struggling too and experiencing the same thing but worse because their websites are so much more popular. For me, I lucked out because I could super easily isolate my traffic and reproduce this with my own device on demand. Others will just see their error and access logs. Maybe they’ll be able to guess the right keywords and the issue will rise to the top, but I think it’s esoteric enough that this would be hard to come by. Filing a bug just gets us closer to sharing the knowledge that this is a problem. Filing a bug helps people converge on where the problem is and show impact. It gives people a place to say “I’m not alone”.

Why all the computing?

At some point along the way, I started asking myself, why is my site unable to handle all these repeated requests? Why does my CPU spike, and my site eventually become unavailable, when a client requests a missing favicon.ico 40+ times?

At first, I thought it might have been something specific to 404. I decided to install some tools to help me investigate. code-profiler can break down the time spent on a request by plugin. Xdebug is a more classical profiler that I probably would have preferred, but I had trouble configuring it on my host because I don’t have access to my php.ini.

I was able to reproduce the issue with just CURL by running the request 40 times in parallel.

for i in {1..40}; do curl https://aggressivelyparaphrasing.me/missing.ico & done
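The loop above fires the requests but doesn’t record how each one fared. A slightly extended sketch is shown here: one hypothetical function, `burst`, sends N parallel requests and prints the status code and total time for each, and a second, `summarize`, condenses those lines. The `-w` variables `%{http_code}` and `%{time_total}` are standard curl write-out variables; the function names and the default of 40 are assumptions for illustration.

```shell
# Fire N parallel requests at URL and print "status seconds" for each.
# Usage: burst URL [N]   (N defaults to 40)
burst() {
  url=$1; n=${2:-40}; i=0
  while [ "$i" -lt "$n" ]; do
    curl -s -o /dev/null -w '%{http_code} %{time_total}\n' "$url" &
    i=$((i + 1))
  done
  wait   # block until every background curl has finished
}

# Summarize "status seconds" lines: request count and slowest time per status.
summarize() {
  awk '{ n[$1]++; if ($2 + 0 > max[$1]) max[$1] = $2 + 0 }
       END { for (s in n) printf "%s count=%d slowest=%.2fs\n", s, n[s], max[s] }' |
    sort
}
```

Running `burst https://aggressivelyparaphrasing.me/missing.ico 40 | summarize` would then show at a glance whether the burst produced only 404s or started tipping over into 503s, and how slow the worst request was.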

However, after profiling, testing, and measuring, I realized that a request that 404’s costs the same as an uncached request. The profiles between an uncached request and a 404 are very similar, as are the times.

The problem is that the result of a 404 is never cached by WP Super Cache, my caching plugin. I confirmed this by checking for the header and the metadata that the cache usually provides. After looking at some of the settings, I don’t think caching 404s is configurable.
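A check like this can be scripted by dumping the response headers and looking for anything cache related. The sketch below is hedged accordingly: the header names in the pattern are assumptions about what common caching layers emit, not a confirmed list for any particular plugin, so compare against a page you know is cached first. The function name `cache_headers` is made up.

```shell
# Print cache-related response headers from stdin, or a note if none appear.
# Assumption: the header names below are guesses at what common caches send;
# verify them against a known-cached page before trusting the result.
cache_headers() {
  tr -d '\r' | grep -Ei '^(x-litespeed-cache|wp-super-cache|x-cache|age):' ||
    echo 'no cache headers found'
}
```

For example, `curl -s -D - -o /dev/null https://aggressivelyparaphrasing.me/missing.ico | cache_headers` sends a normal GET, prints only the headers, and filters them. Running it twice in a row is useful, since the first request may be the one that populates the cache.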

So what now?

I filed the bug, I’ve isolated the issue, and it’s someone else’s code. I can’t directly address the bug, but it did expose several interesting issues with my website. In this situation, it’s important to focus on things I can do. The five whys (pdf) is a good framework to arrive at those actions because we tend to have less control over the higher level reasons. Some say this is how you get to the “root cause”, but I think every level is a root cause where mitigations should be considered. In my case, I can’t directly control things like “PHP uses a lot of CPU for requests” or “Firefox on iPhone is spamming me with requests”. Instead, here’s a list of things I did or considered doing.

I didn’t have a site icon, which meant that requests for the site icon (favicon.ico, etc.) returned HTTP 404 Not Found. I addressed this by adding a site icon. This cleared up my error logs and reduced the amount of PHP running for these requests.

The missing icon was really just a trigger of another problem: 404’s are expensive. Part of why 404’s are expensive is that they aren’t cached. I tried to configure this in my current cache, but it doesn’t look like a supported feature. As an alternative, I decided to try the cache that my host recommends: LiteSpeed Cache. Empirically, it does cache the result of a request even if it 404’s, so this should resolve any similar bug where someone somewhere is requesting the same resource over and over again even if it doesn’t exist. Success!

While this solves the current problem of missing site icons, someone could still request random pages to force 404’s and take down my site, so I tried to optimize my 404 page. In profiling, though, I didn’t find any plugins I could disable cheaply enough; the best savings was about 30% of the runtime. I tried it, but even though it was slightly better, I couldn’t feel the difference. Network latency made that gain too small to notice. I felt like I needed something that would make an order of magnitude of difference. I also didn’t find any code in my 404_template.php that took much time. Even making it static made little measurable difference, since most of the time is spent in WordPress or plugins. I considered several options here, but nothing seemed good enough.

The last thing I’ve arrived at is that my “Shared Unlimited” hosting plan can serve maybe a dozen uncached requests per second. That still leaves me open to a denial-of-service attack wherein someone just requests a bunch of uncached pages, or even random pages generating 404’s. Maybe I could optimize my 404 page, pay for more serious hosting, use something other than WordPress, or use some sort of CDN like Cloudflare. But hey, this is just a hobby site. I think it’s time for me to move on.
