Saving and evaluating network paths in Neo4j

A Relationship Thing

Article from Issue 164/2014

The Neo4j graph database is much better suited than relational databases for storing and quickly querying nodes and their mutual relationships. If your circle of friends is not wide enough to warrant a graph-based application, you might just want to inventory your LAN.

Modeling structures like the social graph of Facebook, connections to friends and their acquaintances, or your follower structure on Twitter is surprisingly difficult with traditional databases. Trying to map a network path – easily represented with squiggles and arrows on a whiteboard – with a relational model inevitably leads to performance-hungry join statements, the natural enemy of responsive websites.

The Neo4j [1] graph database natively stores graph models and offers fantastic performance, as long as you don't overcook the complexity of the queries. Its generic storage model consists of nodes and relationships, both of which can possess attributes. For example, a node that represents a person could contain a name field, and an is_friends_with relationship could carry the intensity of the friendship (best_friend, casual_friend).
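To make the storage model concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores data internally; the class names and the people Alice and Bob are invented for illustration. It only shows that nodes and relationships are separate objects and that each can carry its own attributes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A graph node carrying free-form attributes."""
    props: dict = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed link between two nodes, with attributes of its own."""
    start: Node
    rel_type: str
    end: Node
    props: dict = field(default_factory=dict)


# Hypothetical people; the names are made up for this example.
alice = Node({"name": "Alice"})
bob = Node({"name": "Bob"})

# The intensity is stored on the relationship, not on either node.
friendship = Relationship(alice, "is_friends_with", bob,
                          {"intensity": "best_friend"})

print(friendship.start.props["name"], friendship.rel_type,
      friendship.end.props["name"], friendship.props["intensity"])
```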

Cypher Query Language

The Neo4j query processor accepts queries in the SQL-like Cypher language, searches the data stored in the database, and quickly returns results, which Cypher can also filter and post-process in SQL style (sorting, grouping, and so on).

After you install the GPL-licensed Neo4j Community Server (there's also a commercial enterprise version), it listens on port 7474 for commands, which arrive either via the REST API or via the newer, simpler JSON interface. Clients can be written in several dozen languages; for Perl, CPAN offers the REST::Neo4p module.
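As a rough idea of what a client sends over HTTP, the following Python sketch wraps a Cypher statement in a JSON envelope. It assumes a Neo4j 2.x server on localhost that exposes the transactional Cypher endpoint at /db/data/transaction/commit; the helper name build_cypher_request is hypothetical, and the path and parameter syntax may differ in other releases. The request is only built here, not sent.

```python
import json
import urllib.request

# Assumption: a Neo4j 2.x server on localhost with the transactional
# Cypher endpoint at this path; adjust for your installation.
NEO4J_URL = "http://localhost:7474/db/data/transaction/commit"


def build_cypher_request(statement, parameters=None):
    """Wrap one Cypher statement in a JSON envelope and return a POST request."""
    payload = {"statements": [{"statement": statement,
                               "parameters": parameters or {}}]}
    return urllib.request.Request(
        NEO4J_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Accept": "application/json"},
        method="POST",
    )


# {name} is the 2.x-era parameter placeholder; newer releases use $name.
request = build_cypher_request(
    "MATCH (n) WHERE n.name = {name} RETURN n",
    {"name": "internal"},
)
# Sending is left to the caller, e.g. urllib.request.urlopen(request),
# which only works while a server is listening on port 7474.
```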

The Debian package offered on the Neo4j site [1] also includes a handy command shell, neo4j-shell. Much like the interactive MySQL client, it lets you type commands to insert new data into the model and extract stored information via Cypher queries.

Declaratively Powerful

Cypher is, like SQL, declarative: You specify the results you are looking for, but you don't need to write procedural statements that describe exactly how to find them. MATCH statements define which data are of interest (e.g., "find all data" or "find all relations of type is_friends_with"). WHERE clauses then reduce the number of matches; for example, the requesting user may only be interested in people who are 18 years or older.

Subsequent processing steps remodel, sort, or collate the data. Even running further match statements against the results list is permitted, as well as intermediate actions to generate new data on the fly.
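The match, filter, and post-process stages can be spelled out by hand in a few lines of Python. This is only an illustration with made-up sample data; in Cypher you state the conditions and the engine works out these steps on its own.

```python
# Hypothetical sample data: each node is a dict of attributes.
people = [
    {"name": "Carol", "age": 17},
    {"name": "Alice", "age": 34},
    {"name": "Bob", "age": 18},
]

# MATCH: every node is a candidate.
matched = list(people)

# WHERE: keep only people who are 18 or older.
adults = [p for p in matched if p["age"] >= 18]

# Post-processing: sort the surviving rows by name.
result = sorted(adults, key=lambda p: p["name"])

print([p["name"] for p in result])  # ['Alice', 'Bob']
```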

The home network graph in Figure 1 serves to illustrate some practical queries. Networks are, in fact, a popular use case for Neo4j's nodes and relations. To determine whether a router can reach the open Internet via other nodes, the database often needs to find an open path from A to B through craftily connected nodes. On relational systems, this can cause performance to collapse; graph databases can often tackle it with ease.
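The path question can be sketched outside the database with a breadth-first search. The topology below is invented for this example and is not the network from Figure 1; the function find_path is likewise a hypothetical helper, shown only to make "find an open path from A to B" tangible.

```python
from collections import deque

# Hypothetical topology: each key lists the nodes it can forward to.
links = {
    "laptop": ["router"],
    "router": ["gateway"],
    "gateway": ["internet"],
    "printer": [],
}


def find_path(graph, start, goal):
    """Breadth-first search; returns the shortest list of hops or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(path + [neighbor])
    return None


print(find_path(links, "laptop", "internet"))
print(find_path(links, "printer", "internet"))
```

The first call prints the hop list from laptop to internet; the second prints None, because the printer has no outgoing links in this made-up network.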

Figure 1: The components of a home network – this Perl column feeds them to a graph database for path analysis.

Hand-Reared

For example, to add the router named internal in Figure 1 to the database and assign it the LAN IP 192.168.2.1, you would just do this in the Neo4j shell:

neo4j-sh (?)$ CREATE (router {name:"internal", lan_ip:"192.168.2.1"});

After you create a second node named merger to act as the internal router's gateway, a Cypher query locates both nodes and defines the connection with Cypher's own ASCII art syntax:

neo4j-sh (?)$ MATCH (a), (b)
> WHERE a.name = "internal" AND b.name = "merger"
> CREATE (a)-[r:gateway]->(b);

The match operation finds two nodes, which it assigns the aliases a and b. Because no other search pattern exists in the match clause, this applies to all the nodes in the database. However, the following WHERE clause restricts the results to two precisely named nodes, and the CREATE statement uses the syntax -[...]-> to draw an arrow with a name between the identified nodes, thus creating a relation of the gateway type.
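What the query does can be mimicked in Python as a rough mental model; this is an illustration, not Neo4j's actual execution strategy. The list-of-dicts representation and the triple format for relationships are assumptions made for the sketch.

```python
from itertools import product

# Nodes as attribute dicts, mirroring the two nodes created in the text.
nodes = [
    {"name": "internal", "lan_ip": "192.168.2.1"},
    {"name": "merger"},
]
relationships = []  # (start_name, type, end_name) triples

# MATCH (a), (b): every ordered pair of nodes is a candidate row.
for a, b in product(nodes, repeat=2):
    # WHERE: narrow the pairs down to the two named nodes.
    if a["name"] == "internal" and b["name"] == "merger":
        # CREATE (a)-[r:gateway]->(b): add one directed, typed relation.
        relationships.append((a["name"], "gateway", b["name"]))

print(relationships)  # [('internal', 'gateway', 'merger')]
```

Because the pair is ordered, the arrow points from internal to merger only; no relation is created in the opposite direction.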
