Jacques Mattheij

Technology, Coding and Business

Microsoft's Bing versus Google, some observations

I’m going to try not to rehash any of the points that have already been made ad nauseam elsewhere in the tech press. Instead I’ll try to lift out two things that have stood out for me so far and that I think have not had any attention at all. Both of these have to do with the only piece of numerical data that we have about this whole thing.

The first centers around the ‘gameability’ of Bing and the success rate that Google claims to have had with their purposeful injections of gamed pages into the Bing index.

If a team of Google engineers, with all their knowledge of how search engines work, is not able to achieve more than a 9% injection rate on search terms for which there is absolutely no competition whatsoever, then I think that Microsoft’s engineers should pride themselves on having the best spam rejection system currently in play.

Think about the odds of that being a coincidence: 91% of the bogus pages that Google tried to inject into Bing’s index did not make it. (See further below for an alternative explanation of this percentage.)

That’s a false positive rate that is still annoying, but we’re not talking about your average SEO tactics here. Google tried to exploit a loophole in the Microsoft toolbar as their injection vector (a thing that Microsoft for sure didn’t see coming) and still their success rate was miserable. The rate of return for the amount of effort invested (20 of the best search engine engineers in the world for an extended period) is so low that garden variety SEO types would have given up crying long ago. If Bing really thought that much of Google as a signal then the percentage would be 100, not 9.

The next thing that strikes me as odd in this whole saga of 900-pound gorillas slugging it out by throwing binary bananas at each other is that nobody seems to have clued in to one pretty important aspect: Google does not always link directly to the pages in its search results. It used to always link directly, but it hasn’t been that way for quite a while now, presumably to make it easier for Google to track what the users click on (precisely what they accuse Bing of doing, only on their own site). Analytics gives them this kind of data for the rest of the web.

For example: If I search for Egypt one of the links on the first page points to: http://www.google.com/url?url=http://www.time.com/time/world/article/0,8599,2045882,00.html&rct=j&sa=X&ei=9a9KTe35DMjR4gbPz-ToCw&ved=0CGIQ-AsoADAF&q=egypt&usg=AFQjCNFVM8yOQvDnZcH-2M_o78ryZ4LjKg

This samples the click by having me go to Google first, which then redirects me to the target site.

After you’ve clicked that link you end up on:

http://www.time.com/time/world/article/0,8599,2045882,00.html

You get there by way of a JavaScript redirect.

So to ‘copy’ the Google result you’d have to decode that URL or wait for the user to click and catch the redirect. Both of those would require Google-specific code.
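As a rough illustration of the first option, here is a minimal Python sketch of what ‘decoding that URL’ could look like. The function name is made up for this example, and it assumes the target is carried in the `url` query parameter, as it is in the Egypt link above. It says nothing about what the Bing toolbar actually does.

```python
from urllib.parse import urlsplit, parse_qs

# Hosts treated as "Google" in this sketch (an assumption, not a complete list).
GOOGLE_HOSTS = {"google.com", "www.google.com"}


def decode_google_redirect(link):
    """Return the target of a google.com/url redirect link.

    Links that are not of that form are returned unchanged.
    Hypothetical helper: assumes the target sits in the `url` parameter.
    """
    parts = urlsplit(link)
    if parts.hostname in GOOGLE_HOSTS and parts.path == "/url":
        targets = parse_qs(parts.query).get("url")
        if targets:
            return targets[0]
    return link


# The redirect link from the Egypt example above.
example = (
    "http://www.google.com/url?url=http://www.time.com/time/world/article/"
    "0,8599,2045882,00.html&rct=j&sa=X&ei=9a9KTe35DMjR4gbPz-ToCw"
    "&ved=0CGIQ-AsoADAF&q=egypt&usg=AFQjCNFVM8yOQvDnZcH-2M_o78ryZ4LjKg"
)
decoded = decode_google_redirect(example)
print(decoded)
```

The point of the sketch is the hard-coded host and parameter name: code like this only works if it knows how Google builds its links.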

There is a lottery system here that determines whether you get to see the real link or the one that does the redirect: sometimes you get only Google links, sometimes only direct links to the target websites, sometimes a mix of both.

Of course it is possible for Bing to decode that URL, but that’s not the same as picking up what the user clicked on. What they clicked on is a URL that points to Google. So if the explanation that Microsoft has given to date is accurate then some of those links in Bing should point to Google, not to some other site. This, more than anything that I’ve read so far, is in my opinion proof positive that Microsoft really does have some Google-specific trickery in the toolbar. It is also possible that the 91% that didn’t ‘make it’ failed because they were pointing to Google rather than to the target. Of course Bing does not like to link to its competitor, and filtering out www.google.com/url can’t be that hard.

These two items need more scrutiny, I think. Is Bing really harder to game than Google? Does Bing ‘copy’ the result when the user clicks on it, but only when the link is not to Google? If it is as Google claims, that Bing ‘copies results’ (the term under which this was successfully pushed into the media), then you’d expect a much higher rate of success, and as long as Google can’t claim even a 10% success rate I think Bing is off the hook for now. After all, if it were ruthless copying then Bing would have 100% of those pages, not a meager 9%. If those result pages contained ‘sampling’ links, that would be a pretty good explanation of why they aren’t in the Bing index, and that would be even more damning for Bing.

If that were the case then Google should bring it up.

If the URLs that Google uses are being decoded in the Bing toolbar or on Bing’s servers then that’s proof positive that Bing really does copy Google’s results to some extent, or at a minimum that they ‘cleanse’ URLs found by the toolbar to spot redirects. If the evidence is in the toolbar then some clever hacker of the Daeken variety should be able to dig it up. If it was done on Bing’s servers then maybe some Microsoftie can do an upload to WikiLeaks with the offending source file. The final alternative, that it’s a non-Google-specific translation of 301-style redirects to the actual link, would be enough to get them off the hook on that particular aspect.
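To make that final alternative a bit more concrete, here is a minimal sketch of a redirect resolver that knows nothing about Google: it only follows HTTP 3xx responses. Everything in it is an assumption for illustration. `fetch` is a hypothetical stand-in for a real HTTP request, and the response table is invented, including the guess that a link which redirects via JavaScript (as described above) would look like a plain 200 response at the HTTP level.

```python
from urllib.parse import urljoin

# HTTP status codes that signal a redirect.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(url, fetch, max_hops=5):
    """Follow HTTP-level redirects generically, with no site-specific rules.

    `fetch(url)` must return (status_code, location_or_None).
    """
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url  # not a redirect: this is the final URL
        url = urljoin(url, location)  # Location may be relative
    return url  # give up after max_hops and keep the last URL seen


# Invented responses for the demo (hypothetical hosts, no network access).
FAKE_RESPONSES = {
    "http://short.example/a": (301, "http://mid.example/b"),
    "http://mid.example/b": (302, "/final"),
    "http://mid.example/final": (200, None),
}


def fake_fetch(url):
    # Anything not in the table is assumed to answer 200 with no Location,
    # which is how a JavaScript-based redirect page is modelled here.
    return FAKE_RESPONSES.get(url, (200, None))


resolved = resolve_redirects("http://short.example/a", fake_fetch)
google_link = "http://www.google.com/url?url=http://www.time.com/&q=egypt"
unresolved = resolve_redirects(google_link, fake_fetch)
print(resolved)
print(unresolved)
```

Under these assumptions the generic resolver follows the ordinary redirect chain but leaves the Google link untouched, which is why it matters whether the redirect is really done over HTTP or in JavaScript.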

I have no dog in this fight (I root for the UnderDuck) but I don’t remember being so interested in a fight between two of the giants. The tone of voice of this battle and the speed with which it develops is something that you don’t get to see often in the technology world.

Finally, I think that the largest copyright violation institution in the world accusing another party of copying a bunch of clicks is absolutely hilarious. Google Images, Google Books, Google News, YouTube, the Google cache: the list is literally endless and the amount of data is simply staggering. Google Images takes this to a new low by showing the original page framed on the google.com domain.

Google claims permission but hiding behind ‘you had no robots.txt file’ does not cut it for me. Google takes content left, right and center, and then passes that content off as its own by aggregating it in its own services and serving that content up from the Google.com domains.

We’re not talking about orphaned content here. We’re talking about content that lots of people and companies have created at considerable cost and hope to make a living off, and that Google services have crawled and scraped from all over the web.

For Google to get upset about all of 9 crummy pages that made it into Bing’s index after a lot of effort seems a serious case of the pot calling the kettle black. Before someone brands me a Microsoft fanboy: I do not use ‘Bing’, I have a Microsoft-free environment here, and I believe that companies should compete without snooping in each other’s kitchen. But I also believe strongly that those who have no sins should be the ones to cast the stones, and Google is certainly not without them. The web is built up out of links, and if there is info available to the Bing toolbar about links that can improve Bing’s search results then Microsoft would be crazy to ignore that data. If Microsoft/Bing and Google want to make a gentlemen’s agreement to never scrape each other’s indices directly or indirectly then I’m all for it, but currently such an agreement is not in place, and to suggest that Microsoft has done something dishonorable is simply ignoring that they do what they’ve always done: fight the fight with all means at their disposal. That’s why they are still around after all this time. If Google has a case here they should sue instead of holding this ‘trial by media’.

I’m wondering if the timing of Schmidt stepping down has anything to do with this. According to the timeline as presented by Google, you could make the case that Schmidt decided that he did not want to be at the helm when this particular tactic was brought to bear.

Whatever the real story is about all this I am quite convinced that there is a lot more here than meets the eye, and the truth is hidden somewhere in that 9:91 ratio.