URL filtering with Pi-hole
Into the Funnel
Complementing browser plug-ins, network-based DNS blockers like Pi-hole help protect you against online tracking and unwanted content.
One episode of the award-winning TV series Futurama depicts the Internet as a metaverse in which advertising banners attack users' avatars like birds of prey: "The Internet! My God! It's full of ads!" Even without a metaverse, today's Internet users are followed by trackers and cookies and flooded with unwanted advertising. But users can protect themselves against this flood. There are various methods of evading tracking by advertisers, confusing trackers, and keeping unwanted content out of web pages. Using the free Pi-hole [1], this article looks at a couple of effective approaches that protect you against unwanted content at the server, network, and client levels while also reducing the risk of phishing.
Proxy Filter with Problems
In the early 2000s, the proxy filter was the best way to protect yourself against unwanted content and against viruses and Trojan horses from the web. Clients do not request the content of a website directly from the web but pass the request to a central proxy server such as Squid. The server then retrieves the content, stores some of it in a local cache, and returns it to the browser. In times of limited bandwidth, proxies were popular mainly because of their caching function, which meant that less data had to be retrieved over slow Internet connections. Plug-ins such as squidGuard blocked unwanted content at the proxy level, while other extensions inspected the content of websites directly and checked it for malware.
The proxies' work was made more difficult by an important security feature: HTTPS. Encrypted connections pass through a proxy server unchanged, so their content cannot be filtered unless you break the encryption. This method, known as SSL Bump, is still used, especially by large companies: The proxy server terminates the SSL connection to the accessed website and inspects, filters, and caches the decrypted content. For communication with the client browser, the proxy then encrypts the data again, but with its own certificate. In a scenario such as this, the administrator needs to configure all the browsers on the LAN so that they accept the proxy's certificate for all URLs.
The biggest problem with any proxy filter is not technical but legal. It is a de facto man-in-the-middle attack that decrypts all Internet communication of all users. As soon as a user logs on to private services such as home banking, shopping, or social media on a company PC, the company proxy also decrypts and stores private data such as passwords or shopping cart contents. This intrusion into users' privacy is not permissible. For a filter of this kind, the employer needs an agreement that has been approved by the works council and signed by all employees and that, for example, generally excludes private use of company PCs. Alternatively, it would be possible to create filters that completely block access to services such as banking, shopping, and social media.
As an alternative, the company could provide a separate, unfiltered WLAN without a connection to the company network for users' private devices. In addition, modern filtering proxies such as Squid can apply rule-based handling (peek and splice) instead of simply "bumping" every SSL connection. A ruleset decides which connections the proxy breaks open and examines and which it tunnels through to the client without decryption. This, in turn, would allow users' private traffic to pass through untouched. But a setup like this renders the proxy ineffective as a security measure for all of the tunneled traffic. We will not be looking at a Squid setup with bump, peek, and splice in this article, but will instead investigate a solution that is legally far less problematic.
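In Squid itself, such rules are written as `ssl_bump` directives combined with ACLs in squid.conf. The Python sketch below only models the decision a ruleset makes after the peek step, based on the server name (SNI) the client sends in its TLS handshake. The domain list and function names are hypothetical and are not part of Squid.

```python
# Hypothetical rule list: domains whose TLS traffic is tunneled ("spliced")
# without decryption. In a real setup these rules live in squid.conf.
SPLICE_DOMAINS = ("bank.example", "shop.example", "social.example")


def matches_domain(host: str, domain: str) -> bool:
    """True if host equals domain or is one of its subdomains."""
    host = host.lower().rstrip(".")
    return host == domain or host.endswith("." + domain)


def ssl_bump_action(sni: str) -> str:
    """Decide, from the SNI seen while peeking, whether to splice or bump."""
    if any(matches_domain(sni, d) for d in SPLICE_DOMAINS):
        return "splice"  # private traffic passes through untouched
    return "bump"  # decrypt, inspect, and re-encrypt


for host in ("www.bank.example", "notbank.example", "news.example.org"):
    print(host, "->", ssl_bump_action(host))
```

The suffix check requires a leading dot, so `www.bank.example` is spliced while `notbank.example` is still bumped; a plain `endswith("bank.example")` would let look-alike domains slip through the tunnel.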
Not Ordered, Not Picked Up
Another method for keeping out unwanted content filters not the packets returned from the Internet but the outgoing requests. A look at the HTML code of a web page with advertising quickly reveals that the advertising banners do not originate from the addressed target server itself. Instead, the pages embed HTML that links to advertising services, as well as trackers that set cookies for advertising providers. These deep links do not point to IP addresses, but to the DNS names of the operators' servers or their subdomains.
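To see which foreign hostnames a page makes the browser contact, you can collect the hosts in its `src` and `<link href>` attributes. The following Python sketch uses only the standard library; the sample page and all hostnames in it are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit


class EmbeddedHostCollector(HTMLParser):
    """Collect hostnames the browser would contact automatically while
    rendering the page: src= on any tag, href= on <link> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value and (name == "src" or (tag == "link" and name == "href")):
                host = urlsplit(value).hostname  # None for relative URLs
                if host:
                    self.hosts.add(host)


def third_party_hosts(html: str, page_host: str) -> list:
    """Return the sorted hostnames that do not belong to the page's own domain."""
    collector = EmbeddedHostCollector()
    collector.feed(html)
    collector.close()
    return sorted(
        h for h in collector.hosts
        if h != page_host and not h.endswith("." + page_host)
    )


PAGE = """<html><head>
<link rel="stylesheet" href="https://news.example.org/style.css">
</head><body>
<img src="/logo.png">
<script src="https://ads.tracker.example/banner.js"></script>
<img src="//pixel.analytics.example/p.gif" width="1" height="1">
<a href="https://elsewhere.example.com/">a plain link, not fetched automatically</a>
</body></html>"""

print(third_party_hosts(PAGE, "news.example.org"))
# ['ads.tracker.example', 'pixel.analytics.example']
```

Each of the hostnames returned has to be resolved via DNS before the browser can fetch anything from it, which is exactly the point at which a DNS filter can intervene.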
This means that the user's browser itself actively requests these banners and trackers after resolving the hostname in the embedded link. This is where DNS filtering comes in: It uses blocklists of unwanted domains and refuses to deliver the IP addresses of these domains to the client. Instead of the real IP address, the filtering DNS server simply returns 0.0.0.0. As a result, the browser never even requests the embedded URL from the Internet. The space normally occupied by the advertising banner remains empty, and the trackers do not receive any feedback from the client. But be careful: For DNS filtering to work, clients and their browsers must use the network's default DNS resolution and not their own DNS servers or methods, such as a browser's built-in DNS over HTTPS.
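The sinkholing step described above can be sketched at the packet level. The Python code below is not Pi-hole's code; it is a minimal illustration, assuming standard DNS message layout, an exact-match blocklist with hypothetical entries, and an arbitrary TTL. For blocked A or AAAA queries it builds a reply containing 0.0.0.0 or `::`; for everything else it returns `None` to signal that the query should be forwarded upstream.

```python
import struct
from typing import Optional, Tuple

# Hypothetical blocklist entries (exact matches only in this sketch).
BLOCKLIST = {"ads.tracker.example", "pixel.analytics.example"}
QTYPE_A, QTYPE_AAAA, QCLASS_IN = 1, 28, 1
BLOCK_TTL = 60  # arbitrary value chosen for this sketch


def encode_name(name: str) -> bytes:
    """Encode a hostname as DNS labels: length byte + label, ending in 0."""
    out = b""
    for label in name.rstrip(".").split("."):
        out += bytes([len(label)]) + label.encode("ascii")
    return out + b"\x00"


def build_query(name: str, qid: int, qtype: int = QTYPE_A) -> bytes:
    """Build a minimal standard query with the RD flag set (used for testing)."""
    header = struct.pack("!6H", qid, 0x0100, 1, 0, 0, 0)
    return header + encode_name(name) + struct.pack("!2H", qtype, QCLASS_IN)


def read_name(packet: bytes, offset: int) -> Tuple[str, int]:
    """Decode an uncompressed name; return it and the offset just after it."""
    labels = []
    while packet[offset] != 0:
        length = packet[offset]
        labels.append(packet[offset + 1:offset + 1 + length].decode("ascii").lower())
        offset += 1 + length
    return ".".join(labels), offset + 1


def sinkhole_reply(query: bytes, blocklist=BLOCKLIST) -> Optional[bytes]:
    """Answer blocked A/AAAA queries with 0.0.0.0 or ::; return None to
    signal that the query should be forwarded upstream instead."""
    qid, flags, qdcount = struct.unpack("!3H", query[:6])
    if qdcount != 1:
        return None
    name, end = read_name(query, 12)
    qtype, qclass = struct.unpack("!2H", query[end:end + 4])
    if name not in blocklist or qclass != QCLASS_IN or qtype not in (QTYPE_A, QTYPE_AAAA):
        return None
    rdata = bytes(4) if qtype == QTYPE_A else bytes(16)  # 0.0.0.0 or ::
    # QR=1 (response), RA=1, RD copied from the query, RCODE=0, ANCOUNT=1
    header = struct.pack("!6H", qid, 0x8080 | (flags & 0x0100), 1, 1, 0, 0)
    question = query[12:end + 4]
    # 0xC00C is a compression pointer to the name at offset 12 (the question)
    answer = struct.pack("!HHHIH", 0xC00C, qtype, QCLASS_IN, BLOCK_TTL, len(rdata))
    return header + question + answer + rdata


reply = sinkhole_reply(build_query("ads.tracker.example", 0x1234))
print(reply[-4:])  # b'\x00\x00\x00\x00', i.e. 0.0.0.0
print(sinkhole_reply(build_query("www.example.org", 0x1235)))  # None: forward upstream
```

Answering with an unroutable address rather than dropping the query lets the browser give up immediately instead of waiting for a DNS timeout.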
Pi-hole as a DNS Filter
As an alternative to the options discussed so far, Pi-hole filters out DNS requests for unwanted domains on the fly, hiding advertising content and trackers (Figure 1). Pi-hole is one of the most popular DNS filters. As the name suggests, the tool started life as software for the Raspberry Pi, but Pi-hole also runs reliably and quickly on other platforms, even on larger networks.
Some IT managers are reluctant to use Pi-hole because they do not want to replace their existing DNS server and transfer its configuration to Pi-hole. This is especially true if the existing DNS server resolves local addresses and services such as Kerberos and is perhaps also integrated with the DHCP service. However, because DNS requests can easily be forwarded from one server to the next, a Pi-hole setup does not need to replace the existing service at all; instead, it can act as a kind of overlay, even on the same machine, as in my example.
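The overlay idea can be modeled as a chain of resolvers. This section does not spell out the exact forwarding order, so the Python sketch below assumes one common arrangement: Pi-hole sits in front and uses the existing dnsmasq as its upstream, which in turn answers local names itself and forwards everything else to a public resolver. All names, records, and function names are hypothetical, and the addresses come from documentation and private ranges.

```python
from typing import Callable, Dict, Optional, Set

# A resolver maps a hostname to an IPv4 address string, or None (no answer).
Resolver = Callable[[str], Optional[str]]


def make_dnsmasq(local_records: Dict[str, str], upstream: Resolver) -> Resolver:
    """The existing server: answers local names itself, forwards the rest."""
    def resolve(name: str) -> Optional[str]:
        if name in local_records:
            return local_records[name]
        return upstream(name)
    return resolve


def make_pihole(blocklist: Set[str], upstream: Resolver) -> Resolver:
    """The overlay: sinkholes blocked names, passes everything else on unchanged."""
    def resolve(name: str) -> Optional[str]:
        if name in blocklist:
            return "0.0.0.0"
        return upstream(name)
    return resolve


# Stand-in for the public upstream resolver; these are not real records.
public_dns = {"www.example.org": "203.0.113.10"}.get
dnsmasq = make_dnsmasq({"kdc.lan.example": "192.168.1.10"}, public_dns)
pihole = make_pihole({"ads.tracker.example"}, dnsmasq)

for name in ("kdc.lan.example", "ads.tracker.example", "www.example.org"):
    print(name, "->", pihole(name))
```

Because the existing server stays at the end of the chain, local names (for example, a Kerberos KDC) still resolve exactly as before; the filter only adds a decision in front of it.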
My setup uses an existing dnsmasq server on an application server running RHEL 8. The service provides the LAN with IP addresses via DHCP, lets physical and virtual systems boot over the network via PXE, and resolves local domain names. As its preferred upstream DNS, dnsmasq uses the public Quad9 service at 9.9.9.9. Unlike Google's public DNS service at 8.8.8.8, Quad9 does not log incoming DNS requests along with their source IP addresses.
Besides the dnsmasq service, the application server now also runs Pi-hole in a Podman container. In principle, there are two options for running two DNS servers on the same machine. If Pi-hole runs in a container without its own IP address, the existing dnsmasq service must switch to a port other than 53, because two services cannot listen on the same IP address and port. Alternatively, you can let the Pi-hole container operate on a bridge network and therefore with its own IP address. For this example, I chose the second approach, because my application server already runs numerous other Podman containers with their own IP addresses.
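The reason for choosing between a different port and a separate IP address is a basic socket rule: Two sockets cannot normally bind the same IP address and UDP port at the same time. The Python sketch below demonstrates that rule on the loopback interface. It does not touch Podman or dnsmasq, the function names are hypothetical, and an unprivileged ephemeral port stands in for port 53, which would require root privileges.

```python
import socket
from typing import Tuple


def can_bind_udp(host: str, port: int) -> bool:
    """Return True if a new UDP socket could bind host:port right now."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind((host, port))
        return True
    except OSError:  # typically EADDRINUSE
        return False
    finally:
        sock.close()


def port_conflict_demo(host: str = "127.0.0.1") -> Tuple[bool, bool]:
    """Occupy a port the way the existing dnsmasq occupies 53, then try a
    second bind while it is held and again after it has been released."""
    existing = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    existing.bind((host, 0))  # port 0: let the OS pick a free port
    port = existing.getsockname()[1]
    while_occupied = can_bind_udp(host, port)
    existing.close()
    after_release = can_bind_udp(host, port)
    return while_occupied, after_release


print(port_conflict_demo())  # (False, True)
```

A container with its own IP address on a bridge network sidesteps this conflict, because its port 53 belongs to a different address than the one dnsmasq listens on.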