Setting up a data analytics environment in Linux with Python

Down in the Mine

Article from Issue 259/2022

Author(s): Emil J. Khatib

The Knowledge Discovery in Databases (KDD) method breaks the business of data analytics into easy-to-understand steps. We'll show you how to get started with KDD and Python.

Data analytics is a major force in the current zeitgeist. Analytics serve as eyes and ears in a wide variety of domains (society, climate, health, and more) and support an even wider variety of tasks, such as understanding commercial trends, tracking the spread of COVID-19, and finding exoplanets. In this article, I will discuss some fundamentals of data analytics and show how to get started with analytics in Python. Finally, I will show the whole process at work on a simple data analytics problem.

A Primer on Data Analytics

Data analytics uses tools from statistics and computer science (CS), such as artificial intelligence (AI) and machine learning (ML), to extract information from collected data. The collected data is usually voluminous and complex, and humans cannot interpret it easily, if at all, so raw data on its own is of little use. The information hidden within the data can take many forms: repeating patterns, trends, classifications, or even predictive models. You can use this information to uncover insights and build knowledge about the problem you are studying.

For example, suppose you want to measure the traffic in a parking lot that is monitored by a network of IoT sensors covering the whole city. Reading a single occupancy sensor says nothing about the traffic on its own. Neither do the readings of all the parking sensors in the city without further context. The timestamped percentage of occupied spaces within the monitored parking lot, however, does tell us something, and we can use this information to derive insights, such as the times of day with maximum traffic.
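To make the parking example concrete, here is a minimal sketch in plain Python (standard library only). The readings, sensor IDs, and lot names are invented for illustration, and `occupancy_by_time()` and `peak_hour()` are hypothetical helpers, not code from the article. The sketch goes from raw per-sensor readings, which mean little on their own, to a per-lot occupancy percentage, and from there to an insight: the busiest hour of the day.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical sample data: (ISO timestamp, sensor ID, parking-lot ID, occupied?).
# Assumes one sensor per parking space; the city-wide network spans several lots.
READINGS = [
    ("2022-05-02T08:00", "s1", "lot_a", True),
    ("2022-05-02T08:00", "s2", "lot_a", False),
    ("2022-05-02T08:00", "s3", "lot_b", True),
    ("2022-05-02T12:00", "s1", "lot_a", True),
    ("2022-05-02T12:00", "s2", "lot_a", True),
    ("2022-05-02T12:00", "s3", "lot_b", False),
    ("2022-05-02T18:00", "s1", "lot_a", False),
    ("2022-05-02T18:00", "s2", "lot_a", True),
    ("2022-05-02T18:00", "s3", "lot_b", True),
]


def occupancy_by_time(readings, lot_id):
    """Return {timestamp: percentage of occupied spaces} for one lot."""
    totals = defaultdict(int)    # sensors in this lot reporting at each timestamp
    occupied = defaultdict(int)  # how many of those report "occupied"
    for ts, _sensor, lot, is_occupied in readings:
        if lot != lot_id:
            continue  # add context: keep only the lot we are studying
        totals[ts] += 1
        occupied[ts] += int(is_occupied)
    return {ts: 100.0 * occupied[ts] / totals[ts] for ts in sorted(totals)}


def peak_hour(occupancy):
    """Return the hour of day with the highest mean occupancy percentage."""
    by_hour = defaultdict(list)
    for ts, pct in occupancy.items():
        by_hour[datetime.fromisoformat(ts).hour].append(pct)
    return max(by_hour, key=lambda h: sum(by_hour[h]) / len(by_hour[h]))


occ = occupancy_by_time(READINGS, "lot_a")
print(occ)
print(peak_hour(occ))  # → 12
```

On this toy data, lot_a is 50% full at 08:00 and 18:00 and 100% full at 12:00, so `peak_hour()` reports 12. In a real analysis, a library such as pandas would typically handle the filtering and grouping; the standard library version simply keeps each step visible.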

Learning the mathematical background and the analytics tools is only half the journey. Domain expertise (experience with the problem being studied) is equally important. Some data scientists come from a statistics background, others are computer scientists who pick up the statistics as they go, and many start out as experts in a particular field and need to learn both the statistics and the computing tools.

[...]
