Scientific computing with a crypto mining rig

The Test Candidate

We purchased a mining rig with a backplane and separate motherboard at auction for EUR750. The system did not run reliably at first. The power supply worked, but it was too loud and smelled unhealthy. The eight installed NVIDIA P106-090 mining cards from 2018, with a PCIe 1.1 x4 interface, were OK. We treated them to a new case, memory, motherboard, processor, and, to be on the safe side, a new power supply, for another EUR350.

We wanted to compare the performance of this used mining rig with a high-end professional system. The hardware we chose for comparison was a 2020 system with eight NVIDIA A100 cards and PCIe 4.0 x16. This professional system cost more than EUR75,000, roughly 100 times the price of the mining rig we bought at auction.

GPU-focused systems are optimized for computation-intensive operations, so we wanted to stay with that basic scenario in our tests. We tested two different use cases:

  • Scientific computing using the BOINC crowdsourced computing platform [4]
  • Machine learning with the PyTorch deep learning framework [5] and a well-known test dataset to teach the system to distinguish between images of dogs and cats

A cheap used mining rig that sells for one percent of the price of an advanced computer system would be a big advantage, but we were realistic: we had no illusions that a EUR750 mining rig would outperform the high-end commercial system in absolute terms. We were more curious about whether it was competitive in computing power per cost. In other words, if an option delivers one tenth of the computing power at only one hundredth of the cost, there are scenarios where it could be a viable alternative.

We were also aware that the different components of the design would affect performance in different ways. The two systems didn't just have two different GPUs. The difference between the PCIe 1.1 x4 bus and the PCIe 4.0 x16 bus also seemed significant, as well as the differences in the CPUs. For a few of the tests, we experimented with putting the GPUs from the mining rig into the newer system to isolate the GPU as a variable.

BOINC Benchmarks

We picked out three BOINC-based crowdsourced projects that support GPUs. Einstein@Home [6] uses data from the LIGO gravitational-wave detectors, the MeerKAT radio telescope, and the Fermi Gamma-ray Space Telescope, as well as archival data from the Arecibo radio telescope, to look for spinning neutron stars (often called pulsars). The professional system with eight A100 cards needed 300 seconds per work unit in this test; the mining rig took 2,000 seconds, which is more than six times as long, but again, the professional system was 100 times more expensive.

Was the superior performance of the professional computer due to the GPU or to the faster processor with its faster and wider PCIe bus? To find out, we installed the P106-090 cards from the mining rig in the professional system. Despite the faster processor and the x4 instead of x1 PCIe links, the P106-090 cards ran only one percent faster in the faster system. Einstein@Home allows multiple work units to share a GPU. We would have expected processing two work units at once to yield a performance advantage, but running two jobs on one card simply doubled the computing time, so it did not.
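
Whether the narrow bus is the limiting factor can also be estimated directly on a given machine. The snippet below is only a sketch of such a micro-benchmark, not part of the BOINC tests; it uses PyTorch (the framework we return to later) to time host-to-device copies and report the effective transfer bandwidth.

    import time
    import torch

    def h2d_bandwidth(size_mb=256, repeats=20, device="cuda:0"):
        """Time host-to-device copies and return the effective bandwidth in GB/s."""
        n = size_mb * 1024 * 1024 // 4                      # number of float32 elements
        host = torch.empty(n, dtype=torch.float32).pin_memory()
        dev = torch.empty(n, dtype=torch.float32, device=device)
        torch.cuda.synchronize(device)
        start = time.perf_counter()
        for _ in range(repeats):
            dev.copy_(host, non_blocking=True)
        torch.cuda.synchronize(device)
        elapsed = time.perf_counter() - start
        return size_mb * repeats / 1024 / elapsed

    print(f"Host-to-device: {h2d_bandwidth():.2f} GB/s")

A PCIe 1.1 x4 link tops out at roughly 1GB/s in theory, whereas PCIe 4.0 x16 allows around 32GB/s, so a result near the lower figure indicates that the card's own interface, not the slot, sets the limit.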

The prime number search with PrimeGrid [7] requires virtually no CPU interaction with the cards (less than two percent CPU load). The P106-090s of our test system required between 916 and 925 seconds (CudaPPSsieve) and about 4,500 seconds (OCL_cuda_AP27). The A100s in the professional rig completed the task in about one tenth of the time in each case.

For the third BOINC test, we selected a benchmark program for the Folding@home biomedical project [8] and launched it simultaneously on several GPUs. The benchmark measures how many nanoseconds of a natural process the computer can model within one day. With single precision, the mining rig's P106 GPUs managed 59 ns/d when placed in the professional system, whereas the A100 achieved 259 ns/d. With double precision (not supported in hardware on the P106), the A100 reached 159 ns/d, while the P106 achieved just 3 ns/d.
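
The gap between single and double precision that Folding@home exposes is easy to reproduce with a timed matrix multiplication. The following is just an illustrative sketch (written with PyTorch, the framework we turn to in the next section), not the Folding@home benchmark itself; running it with torch.float32 and then torch.float64 shows how the two precisions compare on a given card.

    import time
    import torch

    def gemm_tflops(dtype, n=4096, repeats=10, device="cuda:0"):
        """Time an n-by-n matrix multiplication and return the throughput in TFLOPS."""
        a = torch.randn(n, n, dtype=dtype, device=device)
        b = torch.randn(n, n, dtype=dtype, device=device)
        torch.cuda.synchronize(device)
        start = time.perf_counter()
        for _ in range(repeats):
            a @ b
        torch.cuda.synchronize(device)
        elapsed = time.perf_counter() - start
        return 2 * n**3 * repeats / elapsed / 1e12   # 2*n^3 floating-point operations per multiply

    print(f"float32: {gemm_tflops(torch.float32):.2f} TFLOPS")
    print(f"float64: {gemm_tflops(torch.float64):.2f} TFLOPS")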

PyTorch

PyTorch is an open source machine learning framework. We put together a manageable script that uses a neural network to classify images on a varying number of graphics cards (or just on the CPU). To do this, the images must be transported to the graphics cards, and, if the work is distributed over several cards, the results need to be merged again at the end. During training, the models also need to be updated on all cards.
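
The listing below is not our full script, but a minimal sketch of the approach: a small stand-in network is wrapped in torch.nn.DataParallel, which replicates the model on every visible GPU, splits each batch across the cards, and gathers the results again. The random input batch merely stands in for the cat and dog images.

    import torch
    import torch.nn as nn

    # Small stand-in network; the real test used an image classifier for cats and dogs.
    class TinyCNN(nn.Module):
        def __init__(self, num_classes=2):
            super().__init__()
            self.features = nn.Sequential(
                nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.AdaptiveAvgPool2d(1),
            )
            self.classifier = nn.Linear(32, num_classes)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = TinyCNN().to(device)
    if torch.cuda.device_count() > 1:
        # Replicate the model on all visible GPUs and split each batch across them.
        model = nn.DataParallel(model)

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = nn.CrossEntropyLoss()

    # One training step on a random stand-in batch (64 images, 3x224x224 pixels).
    images = torch.randn(64, 3, 224, 224, device=device)
    labels = torch.randint(0, 2, (64,), device=device)
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()

Restricting a run to a subset of the installed cards is simply a matter of setting the CUDA_VISIBLE_DEVICES environment variable before launching the script.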

The CPU is not something you can do without in machine learning projects with GPU support. On the contrary, it actually becomes more and more important as the number of compute cores increases. It first prepares the data for the GPUs and then summarizes their results. If you distribute the workload over many GPUs, the processor can definitely become the bottleneck that keeps the graphics units waiting. How much the communication between GPU and CPU can be reduced depends on the application. If the data can be represented as a matrix and the application is based on operations on or between matrices, GPUs are hard to beat.
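
Much of that CPU work happens in the input pipeline. The sketch below shows the usual PyTorch pattern (the directory path and the transforms are placeholders, not our actual setup): DataLoader worker processes decode and resize the images on the CPU before each batch is shipped to the cards.

    from torch.utils.data import DataLoader
    from torchvision import datasets, transforms

    # CPU-side preprocessing: decode, resize, and convert the images to tensors.
    transform = transforms.Compose([
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
    ])

    # Placeholder directory with one subfolder per class (e.g., cats/ and dogs/).
    dataset = datasets.ImageFolder("data/train", transform=transform)

    # num_workers sets how many CPU processes prepare batches in parallel;
    # pin_memory makes the later copy to the GPU over the PCIe bus faster.
    loader = DataLoader(dataset, batch_size=64, shuffle=True,
                        num_workers=8, pin_memory=True)

    for images, labels in loader:
        images = images.to("cuda", non_blocking=True)
        labels = labels.to("cuda", non_blocking=True)
        # ... forward and backward pass on the GPU(s) ...
        break

If the GPUs finish a batch faster than the workers can deliver the next one, raising num_workers helps only until the CPU cores are saturated, which is exactly the bottleneck described above.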

We assumed for our study that the number of cards used does not affect the quality of the predictions. We did not pay any further attention to prediction quality, as it can depend on a variety of factors, such as the quality of the training dataset or the size of the batches. We looked exclusively at the number of images per second that the given hardware could train on or classify (evaluate).
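
Measuring that rate is straightforward: run a fixed number of batches, count the images, and divide by the elapsed wall-clock time. The helper below is a sketch of the evaluation case (training throughput is measured the same way, with the optimizer step included).

    import time
    import torch

    def images_per_second(model, loader, device="cuda", max_batches=50):
        """Classify a number of batches and return the images processed per second."""
        model.eval()
        seen = 0
        torch.cuda.synchronize()
        start = time.perf_counter()
        with torch.no_grad():
            for i, (images, _) in enumerate(loader):
                if i >= max_batches:
                    break
                images = images.to(device, non_blocking=True)
                model(images)
                seen += images.size(0)
        torch.cuda.synchronize()
        return seen / (time.perf_counter() - start)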
