Creating a Network-wide ad-blocker using Pi-hole - Part 1



I'm going to be honest: I really hate modern online advertising. The idea behind it is simple and harmless. Having a space on the internet costs money, sometimes a lot of money depending on how popular that space is. Placing some basic adverts on a site generates some income, which the owner/s can put towards the running costs of the website. However, in the wild west of the modern internet, advertisements are not so basic. Websites and online ads in this day and age track a lot of things, from clicks and mouse hovers to what you have recently purchased whilst doing your online shopping. This data is collated and can then be used by advertisers to create a unique profile of you, which lets them choose the particular adverts they think you are most likely to interact with. They will also most likely have quite a substantial psychological profile of you.

You may think that this is fine; the data is anonymous, right? Why would I care that some advertising company has information about my likes, dislikes, what websites I frequently visit etc. if they can't connect that information to my real-life identity? Well, I'm afraid the bad news is that they probably can. Not only do these websites and advertisers collect the aforementioned data, they are also able to (and usually do) collect your IP address. Whilst identification through an IP address doesn't usually hold up in court due to several complications, it can be enough to track someone across the internet and connect that activity to a real-world identity or location (this effect can be diminished by use of a VPN; there will be more information on that in a later post). If you would like to read more about these issues, a balanced article can be found on Wired here.

Whilst it can be very difficult to stop this from happening entirely, there are things you can do to reduce the amount of information being collected and sent to these companies. One such thing is to block the websites from which advertisements are loaded, and also block connections to the websites where the advertising data is sent. This series of blog posts will be about doing just that using software called Pi-hole.

What is Pi-hole?

Pi-hole is an ad blocker. If you are familiar with browser plugins such as Adblock Plus or uBlock Origin, then you can imagine it as a plugin like that, but one that works across your whole network. This means that it will be as though these plugins are installed on all the devices on your home network, be they smart TVs, mobile phones, tablets etc. If you do not know what these plugins are, then all you need to know is that they are able to block connections to and from websites based on their domain name. To see how this works, we should understand a bit about domain names and the Domain Name System.

What are Domain Names and the Domain Name System?

If you already know the answer to this question then you can skip over this section. If not, and you want to know a bit more about how Pi-hole works, then read on.

Just like in the human world, computers need addresses in order to send information back and forth to each other. Unlike addresses in the human world, which are made up of words, names and numbers, addresses in the computer world are made up entirely of numbers. Each website you visit is hosted on a computer, or maybe even a group of computers, so when you visit a website you are connecting to another computer using an address. To humans, this address is the domain name, made up of words (e.g. google.com). To computers, it is the IP address (numbers). A DNS server translates human-friendly domain names into the more computer-friendly IP addresses.

So in a world without DNS, if you wanted to visit a website like eff.org (the website of the Electronic Frontier Foundation, a non-profit defending your digital freedom), you couldn't just type the domain name (eff.org) into your browser; you would have to type the IP address of the computer from which the site was being served. In practice, typing the IP address for a website doesn't usually work for an assortment of reasons, but this is the simplified theory behind domain names and DNS.
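To make the translation step a little more concrete, here is a minimal sketch in Python of a DNS server imagined as a phone book. It is only an illustration: the table, the function name toy_resolve and the IP addresses are all made up for this example (the addresses come from ranges reserved for documentation, so they are not the real addresses of these sites).

```python
# A toy "phone book" standing in for a DNS server.
# The IP addresses are made-up placeholders from the documentation
# ranges 192.0.2.0/24 and 198.51.100.0/24, not the sites' real addresses.
TOY_DNS_RECORDS = {
    "eff.org": "192.0.2.10",
    "google.com": "198.51.100.20",
}


def toy_resolve(domain_name):
    """Return the IP address recorded for domain_name, or None if unknown."""
    # Domain names are case-insensitive, so normalise before looking up.
    return TOY_DNS_RECORDS.get(domain_name.lower())


print(toy_resolve("eff.org"))      # the placeholder address for eff.org
print(toy_resolve("EFF.org"))      # same answer: case does not matter
print(toy_resolve("example.net"))  # None: no record in our toy table
```

On a real machine you would not keep such a table yourself. Python's standard socket.gethostbyname("eff.org") hands the same question to the operating system's resolver, which in turn asks a real DNS server.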

What has this got to do with Pi-hole and how it works?

When you visit a web page, it is extremely rare that you are requesting data from a single domain (or computer). For instance, going to the BBC homepage not only connects you to a computer with the files for the bbc.co.uk website, it also makes your computer request mybbc-analytics.files.bbci.co.uk/reverb-client-js/reverb-0.3.1.js, a file served from the domain mybbc-analytics.files.bbci.co.uk. I am presuming, given the name of the domain, that this is the computer/site from which the good ol' BBC host a JavaScript file which helps them analyse the traffic going to their main site (it is very common for sites to use third-party JavaScript files for many things).
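As a rough illustration of the point above, the sketch below takes a short list of addresses a page might request and pulls out just the domain part of each, since the domain is the only part DNS is ever asked about. The list is hypothetical: the https:// prefix and the www.bbc.co.uk entry are my assumptions, and the function names are invented for this example.

```python
from urllib.parse import urlsplit

# Hypothetical list of resources requested while loading one page.
# The https:// scheme and the www. prefix are assumptions; the second
# entry is the JavaScript file mentioned in the text above.
REQUESTED_URLS = [
    "https://www.bbc.co.uk/",
    "https://mybbc-analytics.files.bbci.co.uk/reverb-client-js/reverb-0.3.1.js",
]


def domain_of(url):
    """Return only the host name part of a URL (what DNS is asked about)."""
    return urlsplit(url).hostname


def domains_contacted(urls):
    """Return the distinct domains in urls, in first-seen order."""
    seen = []
    for url in urls:
        domain = domain_of(url)
        if domain not in seen:
            seen.append(domain)
    return seen


for domain in domains_contacted(REQUESTED_URLS):
    print(domain)
```

Everything after the first slash (the folder and file name) never reaches the DNS server, which is why a DNS-based blocker works on whole domains rather than on individual files.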

So what happens if, instead of pointing to the correct computer which is serving that analysis JavaScript file, the DNS server on our network says that the computer with that domain name is a computer on our own network? The page would not be able to load whatever document it was trying to load from the 'proper' computer and would instead load nothing. This is the premise of Pi-hole: it looks through a list of 'blacklisted' domains which are not to be trusted or loaded, and if a web page tries to connect to one of them, it tells the browser 'it's me, and there's nothing here'.
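The idea can be sketched in a few lines of Python. This is not Pi-hole's actual code, just a toy model of the premise under some loud assumptions: the blocklist has a single entry, matching is exact, the 'real' DNS system is a small made-up table, and 192.168.1.2 is a hypothetical address for the blocker on the home network. The exact address a real Pi-hole answers with may differ.

```python
# Hypothetical address of the machine running the blocker on the home network.
SINKHOLE_IP = "192.168.1.2"

# Domains we refuse to resolve properly. A real blocklist is far longer.
BLOCKLIST = {"mybbc-analytics.files.bbci.co.uk"}

# Stand-in for the real DNS system; addresses are made-up placeholders
# from the documentation range 192.0.2.0/24.
UPSTREAM_RECORDS = {
    "bbc.co.uk": "192.0.2.30",
    "mybbc-analytics.files.bbci.co.uk": "192.0.2.31",
}


def filtering_resolve(domain_name):
    """Answer a DNS question, diverting blocklisted domains to the sinkhole."""
    name = domain_name.lower()
    if name in BLOCKLIST:
        # "It's me, and there's nothing here."
        return SINKHOLE_IP
    # Everything else gets the normal answer (None if unknown).
    return UPSTREAM_RECORDS.get(name)


print(filtering_resolve("bbc.co.uk"))                         # normal answer
print(filtering_resolve("mybbc-analytics.files.bbci.co.uk"))  # diverted
```

The main site still resolves as usual, while the analytics domain is pointed at the blocker itself, so the browser's request for the tracking script never leaves the home network.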

I hope this post has helped some people understand the intricacies of online advertising, and why you may want to minimise the connections from web pages to such advertising and tracking. Stay tuned for the next post in this series, where we will get into the nitty-gritty of installing Pi-hole on a Raspberry Pi single-board computer!

