Algorithmic filtering and why you don’t see what everybody else sees

By Kittie Walker | 31 August 2015

A search engine is not unlike a black box; we ask it a question, it performs some magic behind the scenes and then spits out an answer for us. Part of this transaction is an assumption on our part, namely that the answer will be honest.

And for the most part, it is. After all, if the search engine doesn’t provide useful results, we won’t use it. However, because of algorithmic filtering, each of us may get a different answer because the search engine knows what we like. This phenomenon is called the “filter bubble” and it is important for marketers to understand when creating targeted campaigns.

What is an algorithm?

In the context of a search engine, an algorithm is a comparatively large amount of computer code whose task it is to decide (i.e. calculate) how relevant a piece of content is to your search terms. Aside from the search terms, it’s given some details about you and your search history, and in addition to finding what it thinks you’re looking for, it will also sort and filter those results based on what it thinks you want to see.
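To make that description a little more concrete, here is a minimal sketch in Python. It is not how any real search engine works; the scoring rule, the function names (`relevance`, `personalised_rank`) and the `boost` value are all assumptions made up for illustration. It scores each page by how many of your search terms it contains, drops pages that do not match at all, and then nudges pages up the list when they mention things you are known to be interested in.

```python
def relevance(query: str, document: str) -> float:
    """Toy relevance score: the fraction of query terms found in the document."""
    query_terms = query.lower().split()
    doc_terms = set(document.lower().split())
    if not query_terms:
        return 0.0
    matches = sum(1 for term in query_terms if term in doc_terms)
    return matches / len(query_terms)


def personalised_rank(query, documents, interests, boost=0.5):
    """Filter out non-matching documents, then sort the rest by relevance
    plus a bonus for each of the user's interests the document mentions.
    `interests` is a set of lowercase words; `boost` is an arbitrary weight."""
    scored = []
    for doc in documents:
        base = relevance(query, doc)
        if base == 0:
            continue  # filtering: pages with no matching term are never shown
        overlap = len(set(doc.lower().split()) & interests)
        scored.append((base + boost * overlap, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]


# Invented sample pages: the same query, two different users.
pages = [
    "jaguar car review",
    "jaguar habitat in the rainforest",
    "cooking pasta at home",
]
print(personalised_rank("jaguar", pages, {"car"}))
print(personalised_rank("jaguar", pages, {"rainforest"}))
```

Both users type the same word, but the car enthusiast and the wildlife enthusiast see the results in a different order, and neither sees the pasta page at all.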

Our online lives are now seamless with our offline lives, for all intents and purposes. We participate in leisure and work activities online without differentiating between the two. The term “in real life” is no longer as relevant as it used to be, if at all.

As new media has been adopted and the digital world absorbed into our private sphere, algorithmic filtering has been normalised, bringing some order to the daily deluge of data we’re exposed to online.

On the surface, algorithmic filtering would appear to be a good thing. After all, it filters out excess noise from the availability of too much information. It also enables people to connect, renew and form relationships and communities with like-minded people from across the globe.

These are all good and positive things. We have easy access to people, education and information we could not have dreamed of having thirty years ago, but there is also a downside to the way in which algorithms do the filtering.

Based on some recent conversations with clients and peers, I wanted to take a deeper look at some of the issues we face as algorithms become more deeply embedded in our interactions online. Chief among them are the lack of transparency in how they work, our obliviousness to how data is served to us and the danger that we are constantly being fed what we want to see and hear, with no other worldviews entering our filter bubbles.

The mass of data on the web means we need algorithms

With at least 4.96 billion pages indexed and 181,414,052 active websites on the World Wide Web, it is clear we need an efficient way to search for and retrieve data which is relevant to us. The main search engines in the Western world (Google, YouTube, Facebook, Bing, Yahoo, which is powered by Bing, and Ask, which is not strictly a search engine now) perform this function for us, using algorithms to interrogate the data.

A plain search with no personalisation towards relevance would return every page on the web indexed for the term we’re searching for. This is how the original search engines worked, but with more and more data held, both relevant and non-relevant, it is no longer an efficient method for retrieving information.

Search engine companies have had to devise algorithms to deal with the ever-increasing complexity to keep providing us with relevant information. Google, for instance, uses several algorithms culminating in a matrix which returns results personalised to you. The results are ordered so that the top ten displayed are as diverse as possible, and you can then delve deeper by being more specific with your search terms.

This semantic search algorithm works well and learns from the choices you make, noting when you click through to a particular site and the length of time you spend on it. The more you use that site and the longer you stay on it, the more likely you are to have this site returned to you in future search results.
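The feedback loop described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a description of Google's actual method: the `ClickHistory` class, the `rerank` function and the weights `per_visit` and `per_minute` are all hypothetical. The idea is simply that every click and every minute spent on a site adds a little to that site's score the next time it appears in your results.

```python
class ClickHistory:
    """Remembers how often a user visited each site and for how long."""

    def __init__(self):
        self.visits = {}   # site -> number of click-throughs
        self.seconds = {}  # site -> total time spent on the site

    def record(self, site, dwell_seconds):
        self.visits[site] = self.visits.get(site, 0) + 1
        self.seconds[site] = self.seconds.get(site, 0.0) + dwell_seconds

    def boost(self, site, per_visit=0.1, per_minute=0.05):
        """Bonus that grows with visits and dwell time (weights are arbitrary)."""
        minutes = self.seconds.get(site, 0.0) / 60
        return per_visit * self.visits.get(site, 0) + per_minute * minutes


def rerank(results, history):
    """results is a list of (site, base_score) pairs; returns the site names
    ordered by base score plus the user's personal boost for each site."""
    ordered = sorted(
        results,
        key=lambda item: item[1] + history.boost(item[0]),
        reverse=True,
    )
    return [site for site, _ in ordered]


# Invented example: b.example starts lower, then the user keeps visiting it.
results = [("a.example", 1.0), ("b.example", 0.9)]
history = ClickHistory()
print(rerank(results, history))
for _ in range(3):
    history.record("b.example", dwell_seconds=120)
print(rerank(results, history))
```

With an empty history the sites appear in their original order. After three two-minute visits, the site you keep returning to overtakes the one that scored higher to begin with.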

The same type of filtering happens when you pick films to watch on Netflix. You’re served up suggestions based on your own previous choices, but alongside this algorithm is another type, called a “collaborative filtering algorithm”. This filtering of the data is done by extrapolating the viewing or usage preference of other users to predict what you may like as well. Many recommender systems such as Amazon, eBay and, to an extent, the search engines, also use this method to serve you up alternative suggestions.
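For readers who want to see the idea of collaborative filtering in code, here is a minimal user-based sketch in Python. It is an assumption-laden toy, not Netflix's or Amazon's real system: the viewers and film labels are invented, and the `jaccard` similarity measure and `recommend` function are just one simple way to do it. Viewers whose history overlaps with yours count as "similar", and the films they watched that you have not are scored by how similar those viewers are to you.

```python
def jaccard(a, b):
    """Similarity of two sets: shared items divided by all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def recommend(target, watched, top_n=3):
    """Suggest titles the target user has not seen, weighted by how
    similar the viewers who watched them are to the target user."""
    mine = watched[target]
    scores = {}
    for user, titles in watched.items():
        if user == target:
            continue
        similarity = jaccard(mine, titles)
        if similarity == 0:
            continue  # viewers with nothing in common do not influence us
        for title in titles - mine:
            scores[title] = scores.get(title, 0.0) + similarity
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores, key=lambda title: (-scores[title], title))
    return ranked[:top_n]


# Invented viewing histories.
watched = {
    "ana": {"space thriller", "robot drama", "alien mystery"},
    "ben": {"space thriller", "robot drama", "desert epic"},
    "cat": {"romantic comedy", "paris romance"},
    "dev": {"space thriller", "desert epic", "paris romance"},
}
print(recommend("ana", watched))
```

Notice that the viewer with entirely different tastes contributes nothing to the suggestions, which is the filter bubble in miniature.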

Search engines are constantly evolving in order to stay relevant to their users. Most major search engines have recently added some significant algorithms to make search results even more personalised to us. These localise results based on your physical location at the time of searching (determined by the GPS chip in your device or your IP address) and take into account who you have in your social networks and what you’ve liked or shared within those networks.
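Location-based ranking can be sketched as follows. This is a hypothetical illustration: the `localise` function, the `penalty_per_km` weight and the sample listings are assumptions, and real engines will weigh location in ways we cannot see. The distance itself is calculated with the standard haversine formula for two points on a sphere.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two latitude/longitude points."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def localise(results, user_lat, user_lon, penalty_per_km=0.001):
    """results is a list of (name, base_score, lat, lon) tuples.
    Each result loses a little score for every kilometre between it
    and the searcher, so nearby results float to the top."""
    def adjusted(result):
        _, base_score, lat, lon = result
        distance = haversine_km(user_lat, user_lon, lat, lon)
        return base_score - penalty_per_km * distance

    return [r[0] for r in sorted(results, key=adjusted, reverse=True)]


# Invented listings with equal base scores, searched from central London.
listings = [
    ("cafe in Manchester", 1.0, 53.4808, -2.2426),
    ("cafe in London", 1.0, 51.5074, -0.1278),
]
print(localise(listings, 51.5, -0.12))
```

The two cafes are equally relevant to the search term, so where you happen to be standing decides which one you see first.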

The only way to carry out non-personalised searches today is to use specialist search engines, if you know where to look for them. But this type of search engine isn’t widely used by the general public and is in most cases less useful.

So, are they a good or a bad thing?

On the whole, algorithms have more benefits than downsides, because they allow search users to find the information they want in the most efficient way possible. All you need to know is the search term you are looking for and the answer to your question is just a search query away.

You will never be returned every result relevant to your search term because the search engines can only return the pages they have indexed. Not all the search engines index the same pages nor do any of them index every page they crawl. They use algorithms to decide whether a page is worth indexing at all.

So, the database you are searching is incomplete by design, because not everything on the web is relevant, or even unique, not to mention safe (in other words, free from viruses, malware and spam).

However, this is not as ideal as it might seem at first glance. There are several reasons to be concerned about the information served up via search results in the main search engines and across social media sites.

1. Algorithms are not transparent

How an algorithm works is a closely guarded secret and proprietary to the company that owns it. On top of this, we do not always seem to recognise that an algorithm is being used and that the information we are receiving is filtered in accordance with our interests, our friends’ interests, our physical location, what we own, whom we support and what we have liked and shared across the web in the past.

Every set of search results is unique, crafted by the algorithm specifically for us. We do not seem to be aware of this, or to question it, in the search results, yet we do question it when the newsfeeds in our social networks appear to be manipulated. It is a good thing to have the noise cut through for us, but it is a bad thing that we are not always conscious of the full extent to which the information we receive is manipulated.

2. The internet was created to be free and open

Some of the ideas behind the web came about from very idealistic notions, particularly in the case of individual freedom. It was hailed as the ultimate democratic tool, one that could be used beneficially to engage the populace and create a true democratic process.

With the advent of Web 2.0 and the rapidly increasing sophistication of the search algorithms, we are living in filter bubbles. Most of the information we find on the web via search reinforces our existing views, and we are unlikely to be exposed to unfamiliar ideologies. It follows that a truly democratic process cannot exist.

From another perspective, the algorithms also negatively impact the web’s ability to allow citizen journalism or internet forums to become the fourth estate in the public sphere. If our internet filter bubble is “autopropaganda, indoctrinating us with our own ideas, amplifying our desire for things that are familiar, leaving us oblivious to the dangers lurking in the dark territory of the unknown”, as Eli Pariser tells us, then each user will remain firmly in their own sphere online with no one holding each other to account.

3. Algorithms are designed for commercial purposes

They are not just here to be of assistance. Users do not pay for the privilege of using search engines, although the engines do have to provide exceptional results to the user so they keep coming back.

The organisations that own the search engines, like the newspapers before them, make their money by selling advertising space. Just as The Financial Times can put a premium on advertising in its pages, so can the search engines, by making sure the adverts they serve in the search results and place on third-party websites are relevant to the user carrying out the search.

This means that they need to collect as much information about the user as possible. They also rely upon social capital from the networks you build online. They can use algorithms to determine your relationship with someone and how that might influence your purchasing.

For instance, we often raise two types of social capital online: “bridging” and “bonding”. Bridging includes ties to people across disparate races, religions, genders, classes and so on, whereas bonding is the psychologically deeper link of like-minded people with a commonality of interests and hobbies. If a marketer can determine via an algorithm someone’s influence over you, they can use that information.

In conclusion

Algorithms in and of themselves cannot be either good or bad. Because of the information overload in the digital world, they are a necessity, as we could not possibly filter the information on our own. But like everything else, algorithms can be subverted, whether intentionally or unintentionally, to serve a specific agenda.

The main issue isn’t that we don’t know about algorithms (although awareness of them isn’t as complete as it ought to be) but rather that we don’t understand their reasoning.

We know that search engines do something to return results to us, but we don’t know what. We know our results won’t be exactly the same as those of the person next to us, but we don’t know the reasons behind either. It may be that, to us, the convenience of getting fast, relevant results outweighs the collection of personal data, and perhaps we don’t mind remaining in what can be thought of as ideological comfort zones.

As the web continues to expand, we will rely ever more heavily on algorithms to deal with a mounting avalanche of noise and to find content that is becoming less and less readily available.