Safer Internet Searches

YaCy as a Solution

One of the biggest differences between YaCy and Searx is that YaCy runs independently of other search engines. YaCy creates its own distributed index. Just like in torrent files that use distributed hash tables (DHTs), you keep your own part of the tables.

To run YaCy, you need to set the amount of space that you will allow YaCy to occupy on your system, although the installation script has a default. Like Searx, you can use a Docker image to run YaCy. YaCy offers three different Docker images: amd64, arm64v8, and arm32v7.

To install YaCy with Docker, use the standard values found on YaCy's web page:

docker run -d --name yacy -p 8090:8090 -p 8443:8443 -v yacy_data:/opt/yacy_search_server/DATA --log-opt max-size=200m--log-opt max-file=2 yacy/yacy_search_server:latest

These standard values help you manage resource usage. Once the server is running, you can also access a management interface from your browser. If you want to be able to use the management interface from another computer, you need to set an administrator password. If you lose the password, you will need to go back to the command line in the root of the YaCy directory and run:

bin/password.sh

This command will handle changing the password, whether your server is running or not.

You can also clone the GitHub repository and compile the binaries [11]. Confusingly, the GitHub repo does not mention at the top that you must compile before running the standard script (startYACY.sh).

YaCy needs Java. When you download the GitHub repo, you need ant to compile. You'll find the details further down in the GitHub document. If you need to install YaCy on multiple machines, you can create a Debian package directly with the compiler.

Configuring YaCy

Whichever method you choose for installation, you need to set up some values to get the most out of your system. First, you should specify how you want to use YaCy. For the most basic configuration, you set an interface language, name, and search use case (Figure 6).

Figure 6: Under Basic Configuration, you can setup your interface language and use case.

The search use case sets the type of search. An internal search will just find files on your network; more common is a search of the entire YaCy community.

In the YaCy Administration dialog, you can edit all your settings, including working memory, disk space, and more.

Clicking on RAM/Disk Usage & Updates lets you adjust the settings for working memory and disk space. The default memory for the Java Virtual Machine (JVM) is set to 600MB.

The other values in the RAM/Disk Usage & Updates dialog save you from running out of disk space. You can use the Steady-state minimum option to disable crawls when free disk space falls below a specified minimum megabytes. This will only be an issue when you have the ports open and you collaborate with the index or when you start your own crawl. HTCache configuration lets you control the size of the content retrieved via HTTP or FTP; the default size is 4GB.

Putting YaCy to Work

Once you've configured YaCy, you can start a crawl from any web address. From the Administration dialog, click on Load Web pages, Crawler and enter the web address. YaCy will look through all the documents on the server and index them for you. You can use this to index your own internal network or add your new web page to the common index.

In addition to private searching, YaCy lets you share your search engine with others. You can customize YaCy for your website. Click on Portal Configuration to set color, title text, and even the logo that appears above the search box. From here, you also can see what the search engine will look like with your customizations.

If you use YaCy seriously, you should consider contributing to the YaCy index. To do this, you need to open your port to other peers on the network. In particular, you'll need to open port 8090, which is usually blocked by default.

Buy this article as PDF

Express-Checkout as PDF
Price $2.95
(incl. VAT)

Buy Linux Magazine

SINGLE ISSUES
 
SUBSCRIPTIONS
 
TABLET & SMARTPHONE APPS
Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

  • Charly's Column – Searx

    It goes against the economic rationale to assume that commercial search engines have the best interests of users at heart when it comes to data protection and use. Sys admin Charly has found an alternative.

  • Introduction

    This month in Linux Voice.

  • Tech Tools
  • ddgr Search

    Since 2008, DuckDuckGo has been making waves as an efficient and much more private search engine alternative to Google. The unaffiliated command-line tool ddgr is designed to make running DuckDuckGo searches from the terminal a breeze.

comments powered by Disqus