Jacques Mattheij

Technology, Coding and Business

What you could do if you were Google and had their databases

I’ve been wondering for a long time what exactly the added value of Google+ is to Google, and why they are pushing it as hard as they do. After all, Google has tried ‘social’ before, and many voices have said, in effect, that Google should concentrate on its core business, search, and leave social to others. But what if they had no choice? What if they actually needed social, and needed it badly?

This article outlines a hypothetical way in which Google+ could be of major strategic value for Google in its core business, search.

Ingredients:

  • A search engine + associated index

  • An analytics package

  • A social network

  • A very large number of servers

  • A group of talented developers

Suppose that you have a search engine that is mostly algorithmic in nature, and that you have redefined the way people navigate the web. Sooner or later you’ll find that the very web you relied on to build your index in the first place starts to rot from the inside out. This is a classic case of the observer changing the thing being observed, and the effect is impossible to avoid for two reasons:

  • people who used to rely on links will now rely on your search engine instead, reducing the value of links between sites

  • people who realize that you value links will create lots of them, reducing the value of links between sites still further

Something simply has to give: at some point the value of links will be so low that they are no longer a significant input to the search algorithm, and there are only so many alternative inputs you can derive from the content found on the web. But what to do instead?

From here on this is pure speculation…

An analytics tool would give you a way to check who is viewing what, but it does not tell you much about the viewer. A social network doesn’t tell you much about what people are reading, but it does tell you something about their reputation: people with higher status in a social network who belong to certain professional circles could be assigned a degree of expertise in a variety of subjects. Finer divisions between skill sets and knowledge areas would give you a more fine-grained picture of this, and that maps nicely onto circles.
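To make the reputation part a bit more concrete, here is a minimal Python sketch of turning social network data into per-topic expertise scores. It is pure speculation in code form: the idea that each circle is labelled with a topic, the use of follower count as ‘status’, and the log scaling are all assumptions for illustration, and say nothing about what Google actually does.

```python
import math


def expertise_scores(
    followers: dict[str, int],
    topic_circles: dict[str, set[str]],
) -> dict[tuple[str, str], float]:
    """Assign each (user, topic) pair a score in [0, 1].

    Hypothetical model: a user's status is their follower count on a log
    scale, normalised by the most-followed user, and they only receive a
    score for topics whose (assumed topic-labelled) circles they belong to.
    """
    denom = math.log1p(max(followers.values(), default=0))
    scores: dict[tuple[str, str], float] = {}
    for topic, members in topic_circles.items():
        for user in members:
            # Users with no followers (or an empty follower table) get 0.
            status = math.log1p(followers.get(user, 0)) / denom if denom > 0 else 0.0
            scores[(user, topic)] = status
    return scores


# Made-up example data.
followers = {"alice": 999, "bob": 9, "carol": 0}
topic_circles = {"databases": {"alice", "bob"}, "cooking": {"carol"}}
print(expertise_scores(followers, topic_circles))
```

The log scale keeps a handful of hugely followed accounts from drowning out everyone else; a user only counts as knowledgeable in the topics of the circles they are actually in.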

Combining analytics with a social network tells you which people, in which social circles and with what standing in a given subject, are reading which web pages and articles.

This can then be used to compensate for the declining value of links; after all, it helps a lot to know who gets their information from where. The typical ‘spam’ page will be closed within a split second, but someone who works professionally in a certain field carefully studying an article in that same field is likely a good indicator that the article is at least worthy of some notice.
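A minimal sketch of what ‘who reads what, and for how long’ could look like as a score, assuming you already have per-user, per-topic expertise values from the social side. The `Visit` record, the 5-second bounce threshold and the 300-second dwell cap are made-up parameters for illustration, not anything Google is known to use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Visit:
    """One analytics record: who looked at which page, and for how long."""
    user: str
    url: str
    seconds_on_page: float


# Assumed thresholds, picked only for illustration.
BOUNCE_SECONDS = 5.0       # shorter visits count as "closed within a split second"
DWELL_CAP_SECONDS = 300.0  # longer visits count the same, so a forgotten open tab cannot dominate


def page_scores(
    visits: list[Visit],
    topic: str,
    expertise: dict[tuple[str, str], float],
) -> dict[str, float]:
    """Sum reputation-weighted, capped dwell time per URL for one topic."""
    scores: dict[str, float] = defaultdict(float)
    for v in visits:
        if v.seconds_on_page < BOUNCE_SECONDS:
            continue  # quick bounces contribute nothing
        weight = expertise.get((v.user, topic), 0.0)  # unknown users carry no weight
        dwell = min(v.seconds_on_page, DWELL_CAP_SECONDS) / DWELL_CAP_SECONDS
        scores[v.url] += weight * dwell
    return dict(scores)


visits = [
    Visit("alice", "https://example.org/paper", 240.0),
    Visit("bob", "https://example.org/paper", 60.0),
    Visit("mallory", "https://spam.example/page", 1.0),
]
expertise = {("alice", "databases"): 0.9, ("bob", "databases"): 0.2}
print(page_scores(visits, "databases", expertise))
```

Here one careful read by someone with standing in the topic outweighs a casual visit by someone without it, and the one-second visit to the spam page does not register at all.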

Exactly what weights to assign to the social network inputs, combined with analytics data about who visited which page and for how long, is difficult to work out. But I’m sure that with some kind of feedback-based system, where a number of guide users were followed closely and the search engine’s output was optimized using the data gathered, you could get this to the point where it gives a significant boost to the quality of the search results. This sort of data mining is exactly what Google excels at, and it is the kind of thing the technologists employed there would love to build. I know I would ;)
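One simple way to read ‘a feedback-based system with guide users’ is as a regression problem: compute a few input signals per result, have guide users rate those results, and fit the weights that best reproduce their ratings. The sketch below assumes three hypothetical signals (a link score, a social score and a dwell score) and fits them with ordinary least squares by batch gradient descent; a real system would have far more signals and a more robust learning method.

```python
def fit_weights(
    features: list[list[float]],
    ratings: list[float],
    lr: float = 0.1,
    epochs: int = 2000,
) -> list[float]:
    """Least-squares fit of a linear scoring model by batch gradient descent.

    Minimises the mean squared error between w . x and the guide users'
    ratings, and returns the weight vector w.
    """
    n, m = len(features[0]), len(features)
    w = [0.0] * n
    for _ in range(epochs):
        grad = [0.0] * n
        for x, y in zip(features, ratings):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i in range(n):
                grad[i] += 2.0 * err * x[i] / m  # d(MSE)/dw_i
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Synthetic data: each row is (link_score, social_score, dwell_score) for one
# result, and each rating is what a guide user thought of that result.
# The ratings were generated from the weights (0.2, 0.5, 0.3).
rows = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 1, 1], [1, 0, 1]]
ratings = [0.2, 0.5, 0.3, 0.7, 0.8, 0.5]
print(fit_weights(rows, ratings))
```

Because the synthetic ratings are an exact linear function of the signals, the fit recovers weights very close to 0.2, 0.5 and 0.3; with real guide users the ratings would be noisy and the weights would only be a best compromise.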

After all, no algorithm is as good at telling bad content from good content as the human mind. If harnessing people to solve computationally hard problems worked for image classification, then given a large enough user base you could do the same for search, provided you had a way to establish what people view and what their reputations are. There is still a feedback loop here, because people find a large part of their information by searching Google itself, but with access to the analytics data of a very large portion of all the websites out there, Google can simply subtract itself out for the visits it was directly responsible for. What remains are visits that came from professional communication between people (say, email or DM), from personal collections of links, and from links followed on sites that the user (and not some algorithm) deemed important.
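Subtracting the search engine’s own influence could be as simple as dropping every visit whose referrer is one of its own pages before computing any scores. A minimal sketch, assuming each analytics record carries the referrer URL, and using the placeholder domain `search.example` in place of a real search engine’s domains:

```python
from typing import NamedTuple, Optional
from urllib.parse import urlparse

# Placeholder for the search engine's own domains; an assumption, not a real list.
OWN_SEARCH_DOMAINS = {"search.example"}


class Visit(NamedTuple):
    url: str
    referrer: Optional[str]  # None when the browser sent no referrer


def is_self_referred(referrer: Optional[str]) -> bool:
    """True if the visit came from one of the search engine's own pages."""
    if not referrer:
        return False
    host = urlparse(referrer).hostname or ""
    # Match the domain itself or any subdomain of it, not arbitrary substrings.
    return any(host == d or host.endswith("." + d) for d in OWN_SEARCH_DOMAINS)


def organic_visits(visits: list[Visit]) -> list[Visit]:
    """Drop visits the search engine caused itself; keep everything else."""
    return [v for v in visits if not is_self_referred(v.referrer)]


visits = [
    Visit("https://a.example/x", "https://www.search.example/results?q=x"),
    Visit("https://a.example/x", "https://mail.example/inbox"),
    Visit("https://a.example/y", None),
]
print(organic_visits(visits))
```

Matching on the hostname rather than on any substring of the URL avoids throwing away traffic from a site such as `notsearch.example`. Visits with no referrer are kept; that is often what arrives when someone opens a bookmark or follows a link from a desktop email client, which is exactly the kind of human-chosen traffic the argument wants to keep.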

If my hypothesis is correct, then it is quite possible that every Google+ user is now effectively an unpaid Google employee whose actions are used to influence Google’s search results and stop their further erosion by spam. That would also explain why Google is promoting Google+ as hard as it is.

It would be nice to have a way to falsify this. One of the main reasons I believe this might be the case is that it dovetails nicely with some of the recent changes to Google’s privacy policy.