UESPWiki:Administrator Noticeboard/Archives/Blocking Rogue IPs at the Server

The UESPWiki – Your source for The Elder Scrolls since 1995
This is an archive of past UESPWiki:Administrator Noticeboard discussions. Do not edit the contents of this page, except for maintenance such as updating links.

Blocking Rogue IPs at the Server

Given that the site is generally struggling so much, it really irks me to notice IPs that are clearly trying to systematically download large fractions of the site (even though they keep getting blocked by 503 errors), or that are repeatedly trying to post spam (even though the attempts are all getting blocked by captcha). Yet these unquestionably bot-controlled IPs keep showing up in the server logs. For example, 72.165.35.198 was denied access by the server 353 times during a 12-hour period today; the articles that this IP was so interested in obtaining included <sarcasm>highly popular</sarcasm> pages such as Category:Oblivion-Factions-Nine_Divines-Primate and Special:Recentchangeslinked/Oblivion:Esbern (each was denied 6 separate times). And these were just the requests denied by mod_limitipconn (denied because the IP was trying to open too many connections at the same time).

Using iptables, it is possible to completely block certain IPs. This is a block at the server level, not just at the wiki, and completely denies the IP all access to the site. The IP would no longer be able to view a single wiki page, view any of the old site, view the forums, or anything else. If used against a legitimate user, that user would have no way to contact the site to point out the mistake. It's a pretty extreme measure, but one that has been used in a few past cases (as documented at Bad Addresses).
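As a rough illustration of what such a server-level block looks like, here is a minimal sketch. It assumes the rules live in the default INPUT chain; the helper names block_cmd and unblock_cmd are hypothetical, and the helpers only print the commands rather than running them, since the site's actual firewall setup is not described here.

```shell
#!/bin/sh
# Dry-run helpers (hypothetical names): they only PRINT the iptables
# commands, so nothing changes until an administrator runs a printed
# line as root.

# Drop every packet from one source address before Apache ever sees it.
block_cmd() {
    printf 'iptables -I INPUT -s %s -j DROP\n' "$1"
}

# Delete that same rule again to lift the block.
unblock_cmd() {
    printf 'iptables -D INPUT -s %s -j DROP\n' "$1"
}

block_cmd 72.165.35.198
unblock_cmd 72.165.35.198
```

Because the packets are dropped rather than rejected, the blocked client gets no response at all, which is why no explanation can be shown to it.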

So what I'd like to throw open for debate is: Should we start blocking a few more of these IPs? And if we want to start doing it more widely, should there be a protocol in place to prevent the possibility of an IP used by a real reader from getting blocked?

A few ideas:

  • Before blocking an IP at the server level, add a message to the IP's talk page. For example "Unusual server activity has been reported for this IP, as a result of which we believe that this IP is being used by a bot to monopolize system resources. To protect the site, this IP address is about to be completely blocked from any further access to UESP. If you have been directed to this page because you are using this IP address, please post a message here immediately to tell us that a legitimate reader is using this IP."
  • If after an hour (?) no responses appear on the IP talk page, and the IP is clearly continuing to download site content, then proceed to block.
  • Keep track of the IP, date, and time of all such blocks on Bad Addresses (tweak the table format perhaps, or add a new table to mark the start of this new protocol).
  • After sufficient time has elapsed (one week? one month?), lift the block, again recording the info at Bad Addresses.
  • As long as the IP resumes its suspicious activity, continue to reinstate blocks. I'm really reluctant to impose such an extreme block on a permanent basis. I think it's worth the small amount of extra effort to lift any such blocks periodically, even if the block just needs to be reinstated again the next day.

As for what types of behaviour would trigger this, unfortunately, I'm not sure that it's easy to come up with a clear set of rules. I think it will ultimately have to be a judgment call on the part of the person who makes the block. However, an IP would have to trigger numerous error messages (hundreds) over a period of several hours. We clearly want to avoid at all costs blocking a legitimate user who just hit refresh too many times while trying to get a page to load when the site was busy. Also, I'd say the downloaded pages would have to appear "unusual"... which is where the judgment comes in.
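To get the "hundreds of errors" figure for a given IP, something along these lines could be run against the error log. This is only a sketch: the function name count_by_client is hypothetical, the sample lines are invented, and it assumes the Apache 2.x error_log layout in which each line carries a "[client <ip>]" field.

```shell
#!/bin/sh
# Count error_log lines per client IP, busiest first.
# Reads log text on stdin and prints one "<count> <ip>" pair per line.
count_by_client() {
    sed -n 's/.*\[client \([0-9.]*\)\].*/\1/p' | sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }'
}

# Invented sample lines; the message text is a placeholder, not the
# real wording logged by the connection-limit module.
count_by_client <<'EOF'
[Thu Jan 17 01:00:01 2008] [error] [client 72.165.35.198] request denied
[Thu Jan 17 01:00:02 2008] [error] [client 72.165.35.198] request denied
[Thu Jan 17 01:00:03 2008] [error] [client 10.0.0.7] request denied
EOF
```

On the server, the same function could be fed the tail of the real log (for example the last 5000 lines); the log's location depends on the installation. The counts only narrow down the candidates, and whether the requested pages look "unusual" remains a judgment call.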

At the moment, the only person who can do iptables blocks is Daveh. If we wish to move forward with this, I'd like to request that I also be given permissions to add/delete IPs. If other admins notice highly suspicious behaviour from an IP in the server logs, they could post the user talk page warning and add a request (e.g., at UESPWiki talk:Bad Addresses); then Daveh or I could take care of the actual block.

Until we try it, it's hard to say whether this will have a noticeable effect on site performance. Worst case, it will at least reduce the frustration of seeing bots show up in the server logs when you're unable yourself to connect to the site. Even in the best case, I doubt it will fix all the server slowdowns (I'd like to believe that the majority of the connections to the site are coming from legitimate users rather than bots!), but maybe it can at least make it so that the site no longer refuses to respond to anything for 15 minutes at a time.

(P.S., I've also been posting a series of other more mundane/technically obscure suggestions for performance tweaks at UESPWiki talk:Upgrade History. So this isn't the only option for how to improve the site's performance.) --NepheleTalk 03:20, 17 January 2008 (EST)

Support: Not sure if this is a voting one but hey... As we discussed earlier, I'm in favour of this. I'm not going to deny that such an extreme measure makes me feel a bit nervous but I can't think of anything else that's going to have the desired effect and the safeguards you've mentioned seem adequate. My only remaining concern is that it's yet more work being loaded on to you and Daveh. –RpehTCE 04:44, 17 January 2008 (EST)
As an addendum to that, I'd suggest that any IP already blocked, say as a nonsense bot or for spam, can be added immediately without the hour waiting period. If they're blocked, a legitimate user would already have appealed. I'm seeing several known nonsense bots accessing the site and it seems a waste of time to ask them if they'll be inconvenienced :-) –RpehTCE 06:13, 17 January 2008 (EST)
What would a blocked IP see if they tried to access the site? If it's some sort of error message (404, 503, etc.), can we customize that error message to explain to them exactly why they've been blocked, and maybe give them a means of contacting someone to contest it? I mean, I'm all for going gung-ho against bots whenever possible, as anyone knows who's seen some of my more extreme suggestions for dealing with them, but leaving people without any explanation or way to contest a block makes even me a bit nervous. I know it's possible to make your own 404, 503, etc. error messages instead of using the browser-default, and it seems to me that this would be one way to at least leave some sort of recourse on the off-chance that a legit user is somehow affected. (It's possible that a legit user might have a trojan that is running from their IP, or that a proxy could fake its IP from another location, or even that certain dynamic IPs which get moved around and used by many separate locations might be affected in this way.) All of our other methods of blocking, such as those used on Nonsense/Spam bots and other open proxies, still allow the blocked IP to post on the talk page if they wish to contest the block, but this would prevent any such chance, and has the potential to affect legitimate users if we're not extra careful about it. --TheRealLurlock Talk 13:43, 17 January 2008 (EST)
I have seen scripts that automatically IP block an address at the server level by monitoring server logs for DoS-like events (like the ones Nephele was talking about). This sort of block results in no error page (that I'm aware of)...it's just as if the server does not exist (the web server never sees the request). The 503 error page results from the web server DoS module kicking in but if the client is running some sort of download software (or whatever) it probably wouldn't make any difference. Perhaps a temporary automatic IP block (for a few days) is more appropriate in such an event. -- Daveh 13:56, 17 January 2008 (EST)
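The core of such an automatic script might look like the sketch below. It is not one of the scripts referred to above: the function name suggest_blocks and the limit of 300 are hypothetical, the input is assumed to be "<count> <ip>" lines produced by some log summary, and the commands are printed rather than executed.

```shell
#!/bin/sh
# Turn "<count> <ip>" lines into dry-run block commands for every IP
# whose count reaches the limit given as the first argument.
# Nothing is executed; the iptables lines are only printed.
suggest_blocks() {
    awk -v limit="$1" '$1 + 0 >= limit + 0 {
        print "iptables -I INPUT -s " $2 " -j DROP"
    }'
}

# Invented counts; 300 is an arbitrary example threshold.
printf '%s\n' '353 72.165.35.198' '4 10.0.0.7' | suggest_blocks 300
```

A real temporary block would also need to record when each rule was added and delete it after the chosen period, which this sketch leaves out.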
It's quite clear from the error logs that whatever bots are involved here are basically ignoring the 503 error message. They just keep trying again and again until they get the page they're trying to access. So it seems likely that ultimately mod_limitipconn isn't doing anything to limit the number of IP connections; in fact, it's really doing the opposite since the bots will now make 5 or 10 HTTP requests instead of 1 to obtain a single page. To the extent that it's true that the bots keep trying, it may not be doing anything to limit the bandwidth use either, because they still get the document in the end. Not to say that mod_limitipconn is doing nothing. At least it's slowing down their requests: the downloads are spread out over a longer period of time, and in the meantime more regular users can get in (hopefully).
I'm also concerned, although I haven't been able to confirm it yet, that when the bots are blocked by mod_limitipconn, the bots are somehow forcing the 503 connection to stay open until apache forcibly times out the connection. It is clear that when the site gets busy there is a problem with incoming "R" requests hanging in "R" mode for a full 300 seconds; when one quarter of the site's connections are stuck open for 5 minutes at a time that's definitely going to have an impact on site accessibility. Unfortunately, the apache server status doesn't allow you to see the IP address of these "R" requests so I can't confirm where they're coming from. All I can say is that times when a lot of "R's" show up in the server status reports do correspond to times when a lot of mod_limitipconn blocks show up in the error logs (which admittedly could also just be that when the site is busy, there's more of everything going on).
In any case, staring at the logs too much over the last few days does make me think that we need something that's more effective against these bots. Even if it's only a short term measure until we can find other ways to improve the site performance: if the site was running smoothly 100% (or even 95%!) of the time, I wouldn't really care about them. But right now, it seems to me very likely that legitimate readers (and editors) of the site are being denied access because of these bots every time there's a site slowdown. I'd much rather take the (small) chance of locking out a real person with a bot-infested computer than continue to very certainly turn away real users day after day.
More specific comments:
  • It's true that IPs who have already been blocked as nonsense bots on the wiki probably don't need an extra message. But as I've been pondering the feedback, I think it may still be worth adding the extra message just in case there is a legit user who never cared about the wiki block, but suddenly notices the problem when he loses all access. In this case, it wouldn't necessarily be an ahead-of-time warning, but more of an after-the-fact explanation once the user gets access again (yes, the wording of the message would also need to be tweaked accordingly... assuming we go with the manual approach instead of some newfangled automatic apache mod!). Also, just to be clear, I don't think we need to go through and do a server-level block on every IP that's ever been used by a nonsense bot. I'd say that only bots that continue to show up in the logs need further action (and, again, only with temporary blocks that get reinstated as long as activity continues).
  • We could customize the 503 error messages that are currently being displayed when IPs get blocked. Which might in fact be helpful, since it's clear that most editors don't know what the messages mean when they first see them.
  • The whole point of a server-level block is to completely prevent our computer from having to do any work at all. Apache (the web server that provides all HTTP responses) never even needs to see the connection, and therefore apache doesn't need to waste any of its resources deciding how to respond. Therefore, it's not really possible to provide a friendly explanation message. Thus the caution about extended length blocks and trying to notify ahead of time.
--NepheleTalk 19:57, 17 January 2008 (EST)
I too have been wondering about those lingering 'R' connections which I don't recall seeing before, at least in the numbers there have been lately. If you're familiar with netstat you can log in to the server and do more specific lists of IPs connected to the server. While I haven't noticed anything recently I have used it in the past to catch 'bad' addresses DoSing the server in some manner. For example:
   netstat -an | grep ESTABLISHED | sort -k5 | more
lists all established connections sorted by IP. Note that it's not too unusual to see a few IPs with a dozen connections since the OB/SI maps can easily generate a dozen server requests for each view. -- Daveh 09:22, 23 January 2008 (EST)
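When the list is long, a per-IP tally can be easier to read than the sorted listing. The following is a sketch only: the function name conns_by_ip is hypothetical, the sample rows are invented, and it assumes the Linux netstat column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State).

```shell
#!/bin/sh
# Summarise "netstat -an" output as "<connections> <remote ip>", busiest
# first. Reads netstat text on stdin so it can be tried on saved output.
conns_by_ip() {
    awk '$6 == "ESTABLISHED" { sub(/:[0-9]+$/, "", $5); print $5 }' |
        sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Invented sample rows using documentation-only addresses.
conns_by_ip <<'EOF'
tcp 0 0 192.0.2.1:80 203.0.113.85:40001 ESTABLISHED
tcp 0 0 192.0.2.1:80 203.0.113.85:40002 ESTABLISHED
tcp 0 0 192.0.2.1:80 198.51.100.51:51000 ESTABLISHED
tcp 0 0 0.0.0.0:80 0.0.0.0:* LISTEN
EOF
```

On the server itself the live output would be piped in, as in `netstat -an | conns_by_ip`. An IP holding a dozen connections is not suspicious on its own, for the map-related reason given above.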
OK, I just happened to catch one of the particularly suspicious clusters in action. On server-status 20 R connections appeared within 10 seconds of each other and, when I noticed them, had been lingering for 199-208 seconds. The server was otherwise pretty quiet (only 17 other active requests) and had been quiet for a while, so it's unlikely that these were triggered by the server getting bogged down. When I used the netstat command, lo and behold, there were 20 established connections from 89.128.216.85. Then in the process of writing this, even more R's appeared, and netstat is showing a huge burst of connections from both 24.201.104.51 and 89.128.216.85. Neither of these IPs is being reported by server-status (i.e., the connections do seem to correspond to the unidentified Rs). In netstat, both the sendQ and recvQ columns are 0 for all of these connections which (if I'm reading the man pages properly) says that neither direction claims that more data needs to be sent. Most of the other established connections had non-zero values in the sendQ column.
The final interesting piece of the puzzle is checking the error_log file for apache. Just doing a grep on the last 5000 lines of the error log, 89.128.216.85 is only showing up once as being blocked by apache for exceeding the connection limit; 24.201.14.51 is showing up 6 times, but all from 4 hours ago. (Both do come up more times as I scan deeper back into the error log). Which means that I'm not sure that mod_limitipconn is doing anything about these connections. I'm guessing that mod_limitipconn is waiting for the IP to send a request before trying to block or get rid of them (since the mod_limitipconn criteria are all based upon which files are being requested). As long as they just hang there, the server's letting them monopolize our connections until finally the connection times out.
Everything seems to confirm that the lingering R connections can be tied to one or two IPs that are misbehaving. And our current measures aren't doing much to control these IPs. --NepheleTalk 02:58, 24 January 2008 (EST)
Last night I specifically checked during a time when the site was quiet to be sure there weren't other extenuating factors; on the other hand, it meant that having two IPs block 40 of our 100 connections for 5 minutes at a time wasn't really interfering with any other readers. Today I figured I'd snoop as the site got busy and confirm whether the same activity is happening when the site starts to bog down.
In the server-status snapshot, all 100 connections on the server are now busy. 55 of those connections are lingering R connections that are more than 2 minutes old. netstat shows 28 established connections from IP 71.180.214.84 and 27 established connections from IP 81.214.45.167. Neither IP is visible in server-status, so these two IPs are indeed responsible for all 55 R connections. With them blocking more than half of our connections from legitimate users, it's no real surprise that we're all having trouble accessing the site. In the time it's taken me to write this, all of those 55 connections timed out. But now 71.180.214.84 is back again with 65+ connections, from that one IP alone. Needless to say, server-status is completely clogged up using all 100 connections, but the vast majority are Rs.
Just to do some quick math: the server averages more than 30 requests per second. If one of these IPs blocks 20 connections for 300 seconds, that's nearly 10,000 requests that are unable to get through each time one of these IPs attacks us. And from what I've seen in the logs, these IPs keep doing it time and time again for hours. We really need to find a way to get rid of these pests. --NepheleTalk 13:06, 24 January 2008 (EST)
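The quick math can be written out as follows. The figures are the ones quoted in this discussion (30 requests per second, 20 held connections out of 100, 300 seconds); the two readings of "unable to get through" are assumptions made for illustration, not something the post specifies.

```shell
#!/bin/sh
# Requests affected while connections are held open, under two readings.
rate=30        # requests per second (the post says "more than 30")
held=20        # connections held open by one IP
total=100      # total connections available on the server
seconds=300    # time until the held connections time out

# Reading 1: the server's whole request stream for the duration.
echo "whole stream: $((rate * seconds))"
# Reading 2: only the share of capacity taken by the held connections.
echo "held share:   $((rate * held * seconds / total))"
```

The first reading gives 9,000 at exactly 30 requests per second, which is in line with "nearly 10,000" given that the average is more than 30. The second, more conservative reading gives 1,800 per incident. Either way the cost repeats each time the connections are reopened.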