Safer Internet Searches
YaCy as a Solution
One of the biggest differences between YaCy and Searx is that YaCy runs independently of other search engines. YaCy creates its own distributed index. Much like BitTorrent clients that use distributed hash tables (DHTs), each peer keeps its own part of the tables.
To run YaCy, you need to set the amount of space that you will allow YaCy to occupy on your system, although the installation script has a default. Like Searx, you can use a Docker image to run YaCy. YaCy offers three different Docker images: amd64, arm64v8, and arm32v7.
To install YaCy with Docker, use the standard values found on YaCy's web page:
docker run -d --name yacy -p 8090:8090 -p 8443:8443 -v yacy_data:/opt/yacy_search_server/DATA --log-opt max-size=200m --log-opt max-file=2 yacy/yacy_search_server:latest
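Once the container has started, it can take a little while before the web interface answers. The following is a minimal sketch of a readiness check, assuming curl is installed and that the interface is reachable on the published port 8090; wait_for_yacy is a hypothetical helper name, not something YaCy or Docker provides.

```shell
#!/bin/sh
# Sketch: poll the YaCy web interface until it answers or we give up.
# wait_for_yacy is a hypothetical helper, not part of YaCy or Docker.
wait_for_yacy() {
    url="${1:-http://localhost:8090/}"   # port published by -p 8090:8090
    tries="${2:-30}"                     # attempts before giving up
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -s: silent, -f: treat HTTP errors as failure, -o: discard the body
        if curl -sf -o /dev/null "$url"; then
            echo "YaCy is answering at $url"
            return 0
        fi
        i=$((i + 1))
        sleep 2
    done
    echo "No answer from $url after $tries attempts" >&2
    return 1
}

# Usage, after the docker run command:
#   docker ps --filter name=yacy    # confirm the container is listed
#   wait_for_yacy                   # then wait for the web interface
```

If the check fails, docker logs yacy is usually the first place to look.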
These standard values help you manage resource usage. Once the server is running, you can also access a management interface from your browser. If you want to be able to use the management interface from another computer, you need to set an administrator password. If you lose the password, you will need to go back to the command line in the root of the YaCy directory and run:
bin/passwd.sh <new_password>
This command will handle changing the password, whether your server is running or not.
You can also clone the GitHub repository and compile the binaries [11]. Confusingly, the GitHub repo does not mention at the top that you must compile before running the standard script (startYACY.sh).
YaCy needs Java. When you download the GitHub repo, you need ant to compile. You'll find the details further down in the GitHub document. If you need to install YaCy on multiple machines, you can create a Debian package directly with the compiler.
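Before you start a build, it helps to confirm that the required tools are present. The sketch below uses a hypothetical helper, missing_tools, that prints every named command it cannot find on the PATH. The build steps in the comments are assumed from the repository README, so check them against the current version before relying on them.

```shell
#!/bin/sh
# missing_tools is a hypothetical helper: it prints each named command
# that cannot be found on the PATH, one per line.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Usage before building from source:
#   missing_tools git java ant
# If nothing is printed, a build along these lines should work
# (commands assumed from the repository README; verify them first):
#   git clone https://github.com/yacy/yacy_search_server.git
#   cd yacy_search_server
#   ant clean all
#   ./startYACY.sh
```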
Configuring YaCy
Whichever method you choose for installation, you need to set up some values to get the most out of your system. First, you should specify how you want to use YaCy. For the most basic configuration, you set an interface language, name, and search use case (Figure 6).
The search use case sets the type of search. An internal search will just find files on your network; more common is a search of the entire YaCy community.
In the YaCy Administration dialog, you can edit all your settings, including working memory, disk space, and more.
Clicking on RAM/Disk Usage & Updates lets you adjust the settings for working memory and disk space. The default memory for the Java Virtual Machine (JVM) is set to 600MB.
The other values in the RAM/Disk Usage & Updates dialog save you from running out of disk space. You can use the Steady-state minimum option to disable crawls when free disk space falls below a minimum that you specify in megabytes. Disk space will only become an issue when you have the ports open and contribute to the shared index or when you start your own crawl. HTCache configuration lets you control the size of the cache for content retrieved via HTTP or FTP; the default size is 4GB.
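To get a feel for what a free-space minimum means on your own machine, you can run the same kind of comparison by hand. This is only an illustration of the idea, not YaCy's own logic; free_mb and below_minimum are hypothetical helper names, and the 3000MB threshold is an arbitrary example value.

```shell
#!/bin/sh
# free_mb prints the free space, in megabytes, of the filesystem holding a path.
free_mb() {
    # df -P prints one stable line per filesystem; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

# below_minimum succeeds (exit 0) when free space is under the given minimum in MB.
below_minimum() {
    [ "$(free_mb "$1")" -lt "$2" ]
}

# Example: check the filesystem of the current directory against 3000MB
# (an arbitrary example value, not a YaCy default).
if below_minimum . 3000; then
    echo "Free space is below the minimum; a crawl should pause here."
fi
```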
Putting YaCy to Work
Once you've configured YaCy, you can start a crawl from any web address. From the Administration dialog, click on Load Web pages, Crawler and enter the web address. YaCy will look through all the documents on the server and index them for you. You can use this to index your own internal network or add your new web page to the common index.
In addition to private searching, YaCy lets you share your search engine with others. You can customize YaCy for your website. Click on Portal Configuration to set color, title text, and even the logo that appears above the search box. From here, you also can see what the search engine will look like with your customizations.
If you use YaCy seriously, you should consider contributing to the YaCy index. To do this, you need to open your port to other peers on the network. In particular, you'll need to open port 8090, which is usually blocked by default.
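After you open the port, it is worth checking that it is actually reachable. The sketch below assumes a netcat (nc) that supports the -z and -w options; port_open is a hypothetical helper name, and the ufw command in the comments applies only if your host uses ufw as its firewall.

```shell
#!/bin/sh
# port_open is a hypothetical helper: it succeeds when a TCP connection
# to host:port can be established within three seconds.
port_open() {
    nc -z -w 3 "$1" "$2"
}

# Usage, ideally from a machine outside your own network:
#   port_open your.public.address 8090 && echo "peers can reach YaCy"
# On a host that uses ufw (an assumption; your firewall may differ),
# the port could be opened with:
#   sudo ufw allow 8090/tcp
```

If your machine sits behind a home router, you will probably also need a port forwarding rule there.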