3/01/2008

Search Engine Redirect Spammers Get a Free Ride on Google, Live.com and Yahoo

Since around 2006, spammers have been using a technique that gets specific pages ranked high in the search engines for certain search terms. Then, when the search engine serves those results to an unwitting user, the user gets redirected somewhere else, because the spammer has embedded a script, an HTTP refresh with a zero time delay, or a 302 redirect into the search result.
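To make the mechanism concrete, here is a minimal C# sketch that flags markup in a result that would redirect as soon as it renders: a <script> element, or a <meta http-equiv="refresh"> with a zero delay. It is not taken from any search engine or spam toolkit. The function name LooksLikeRedirect and the exact patterns are illustrative assumptions, and a regex check like this is a heuristic, not an HTML parser. A 302 redirect cannot be caught this way at all, because it lives in the HTTP response rather than in the markup. The sketch uses top-level statements, so it assumes C# 9 or later.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical check for markup that redirects the reader as soon as it renders.
// Only the script and zero-delay-refresh cases are visible in HTML; a 302 is an
// HTTP status code and has to be checked on the server response instead.
static bool LooksLikeRedirect(string html)
{
    // Any <script> element: the obfuscated redirect code lives inside it.
    bool hasScript = Regex.IsMatch(html, @"<script\b", RegexOptions.IgnoreCase);

    // <meta http-equiv="refresh" content="0; url=...">, with the two attributes in either order.
    bool hasZeroDelayRefresh = Regex.IsMatch(
        html,
        @"<meta\b(?=[^>]*\bhttp-equiv\s*=\s*[""']?refresh\b)(?=[^>]*\bcontent\s*=\s*[""']?\s*0\s*[;,])",
        RegexOptions.IgnoreCase);

    return hasScript || hasZeroDelayRefresh;
}

Console.WriteLine(LooksLikeRedirect("<p>Cheap widgets, great prices</p>")); // False
Console.WriteLine(LooksLikeRedirect(@"<meta http-equiv=""refresh"" content=""0; url=http://example.com/"">")); // True
```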

Here is just one relatively innocuous example (don't visit the URL unless you want to examine the script, which is obfuscated JavaScript): <script src="http://cappa.pl/sutra/js/random"></script>. If this is embedded in a search result link or description, it will redirect you to some sex site or whatever other site has actually PAID for this service. Yes, apparently the search engine hijackers actually make their clients pay for this stuff, in effect getting free money for no effort beyond a little JavaScript programming. This particular little scriptlet currently appears 11,800 times in a Google search, 111,000 times in Live.com (yikes, 111,000 times!) and 24,100 times in Yahoo! Unfortunately, it is just one of many. The search engines know this stuff is there, but they haven't managed to remove it completely, and even if they did, the spammers would just change the URL. It seems to me, though, that they aren't even trying!

These scripts may be sanitized to some degree by your browser when it shows search results, but if you consume any of these searches via RSS, the little boogers will be very much alive. The answer, of course, is to remove <script> tags and their contents from all your results, in both the link and the description fields.

The following snippet is one easy way to do this (C#):

 

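using System.Text.RegularExpressions;

// Remove whole <script>...</script> elements, including any inline code between the tags.
description = Regex.Replace(description, @"<script\b[^>]*>[\s\S]*?</script\s*>", "", RegexOptions.IgnoreCase);
// Strip the listed tags themselves; \b stops "a" and "b" from also matching <abbr>, <br>, <body>, etc.
description = Regex.Replace(description,
                            @"</?(?:a|script|img|style|h1|font|h2|h3|h4|b|input|blockquote|pre|div|table|tr|td|span)\b[^>]*>",
                            "", RegexOptions.IgnoreCase);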

You can put whatever HTML tags you want in the OR list of the regular expression.
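Because the problem bites hardest when results arrive via RSS, here is a hedged sketch that applies the same idea to every item in a feed, cleaning both the link and the description. It builds the tag alternation from a list (a hypothetical subset of the tags above), removes <script> elements together with their contents before stripping the remaining tags, and uses System.Xml.Linq to read and rewrite the feed. The helper names BuildTagRegex and Sanitize and the one-item feed are made up for illustration, and regex-based stripping is a pragmatic filter rather than a complete HTML sanitizer. Top-level statements assume C# 9 or later.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical subset of the tag list; add or remove names as needed.
string[] blockedTags = { "a", "script", "img", "style", "font", "b" };

// Builds "</?(?:a|script|...)\b[^>]*>" from the list. The \b stops "b" from
// also matching <br> or <body>.
static Regex BuildTagRegex(string[] tags) =>
    new Regex(@"</?(?:" + string.Join("|", tags.Select(Regex.Escape)) + @")\b[^>]*>",
              RegexOptions.IgnoreCase);

var tagRegex = BuildTagRegex(blockedTags);

// Matches a whole <script>...</script> element, including any inline code.
var scriptBlock = new Regex(@"<script\b[^>]*>[\s\S]*?</script\s*>", RegexOptions.IgnoreCase);

// Drop script blocks first, then strip whatever listed tags remain.
string Sanitize(string text) => tagRegex.Replace(scriptBlock.Replace(text, ""), "");

// A made-up one-item feed; RSS usually carries HTML entity-escaped inside <description>.
var feed = XDocument.Parse(
    @"<rss version=""2.0""><channel><item>
        <link>http://example.com/page</link>
        <description>&lt;script src=""http://example.com/x.js""&gt;&lt;/script&gt;Cheap widgets</description>
      </item></channel></rss>");

// ToList() so the tree is not walked lazily while it is being modified.
foreach (var item in feed.Descendants("item").ToList())
{
    // Clean both fields a spammer can poison: the link and the description.
    foreach (var field in new[] { item.Element("link"), item.Element("description") })
    {
        if (field != null)
            field.Value = Sanitize(field.Value); // XElement.Value returns the entity-decoded text
    }
}

Console.WriteLine(feed.Descendants("description").First().Value); // Cheap widgets
```

Removing script blocks before stripping tags matters: stripping the tags alone would leave any inline JavaScript behind as plain text in the result.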

I find it difficult to understand why, after several years, the major search engines are still permitting this crap to pollute their results. Personally, I think that SPAM, in any form, is simply the lowest of the low. I don't understand why the broader community of Internet users can't seem to band together and find a way to put these scumbags out of business.

I mean, if somebody is firing rockets into your backyard with the sole purpose of killing people, you have a right to fight back, don't you? That's basically what the spammers are doing, in my view. Frankly, I think we are a bunch of big Chicken-Shits, because we cannot seem to fight back. We deserve it, for being a bunch of shitless morons allowing lowlife criminals to take over OUR Internet!