Perl script as a sniffer with built-in statistics

What Am I?

Listing 1 implements the Top-style script shown in Figure 2. It relies on the flexible AnyEvent event framework and on the TopCapture and TopGUI modules explained later, which capture the network packets and present them in a dynamic Curses UI. Line 8 calls TopCapture's start() method to launch the capture process with tshark.

Each packet contains the IP address of the current host as either the sender or the target address. To keep the GUI from displaying it, line 15 determines this address by creating a socket aimed at Google's web server (without actually establishing a connection) and then calling sockhost() to retrieve the local IP address that the network stack has filled in as the sender.

This method works even if the host has multiple network interfaces, because the new socket automatically uses the interface, and thus the source address, that the routing table selects for reaching the target. If line 33 finds the host's own IP address in a packet, the output for that address is suppressed.
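The local-address trick can be sketched in plain Perl with the core IO::Socket::INET module. This is a minimal sketch, not the code from Listing 1: it assumes a UDP socket (whose connect() sends nothing over the wire), and the helper names local_ip() and foreign_addrs() are hypothetical.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return the local address the kernel picks as the source when talking to
# $target. connect() on a UDP socket only consults the routing table and
# fixes the source address; no packet leaves the host.
# Returns undef if the socket cannot be set up (e.g. no route to $target).
sub local_ip {
    my ($target, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $target,
        PeerPort => $port // 80,
        Proto    => 'udp',
    ) or return;
    my $ip = $sock->sockhost;
    close $sock;
    return $ip;
}

# Hypothetical helper: keep only the addresses of a src/dst pair that are
# not our own, so the host itself never shows up in the statistics.
sub foreign_addrs {
    my ($my_ip, @addrs) = @_;
    return grep { $_ ne $my_ip } @addrs;
}

# Typical use (needs a working network and DNS):
#   my $me = local_ip('www.google.com');
```

Aiming the socket at 127.0.0.1 instead makes the loopback interface the source, which is a handy way to try the function on a machine without network access.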

The timer defined in lines 20-42 periodically runs the callback set with the cb option. The callback calls stats() in TopCapture to retrieve the current packet count and a hash indicating how often each resolved DNS name occurred in the packets.
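To picture what the timer callback does with the values from stats(), here is a hypothetical formatter, top_lines(), that sorts hosts by packet count. The function name, the output format, and the one-second interval in the comment are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical formatter for the timer callback: turn the
# (packet count, host => count hash ref) pair from stats() into a
# header line plus at most $max list lines, busiest host first.
# In the real script this would run inside something like
#   AnyEvent->timer(after => 1, interval => 1, cb => sub { ... });
sub top_lines {
    my ($packet_count, $counts, $max) = @_;
    # Sort by count (descending), break ties alphabetically.
    my @hosts = sort { $counts->{$b} <=> $counts->{$a} or $a cmp $b }
                keys %$counts;
    splice @hosts, $max if @hosts > $max;    # keep only the top $max
    return ("Packets: $packet_count",
            map { sprintf('%6d %s', $counts->{$_}, $_) } @hosts);
}
```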

GUI in the Cuckoo's Nest

At the end of Listing 1, the GUI launches. It is implemented with the CPAN Curses::UI::POE module, which is based on POE, an event framework that competes with AnyEvent. However, the two run alongside each other just fine, because AnyEvent can sit on top of just about any event loop, POE included. POE's event loop, which starts when line 44 of Listing 1 calls the TopGUI module's start() method, thus also processes the events registered through AnyEvent along the way, such as the timer and the packet capture callbacks.

Listing 2 uses the AnyEvent::DNS and AnyEvent::Run modules from CPAN to resolve IP addresses to hostnames asynchronously and to fire up an external tshark process. To save space, line 6 pulls in Moo, which quietly adds a new() constructor. The three instance variables ip_counts, packet_count, and dns_cache store, respectively, how often each IP address has occurred, the total number of analyzed packets, and a cache that maps IP addresses to the hostnames the DNS server returned for reverse lookups. The reset() method resets ip_counts and packet_count.
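Without Moo, the same object layout could look roughly like the following bless-based sketch. The package name TopCaptureSketch and the count_packet() method are hypothetical; only the three instance variables and reset() come from the description of Listing 2.

```perl
use strict;
use warnings;

package TopCaptureSketch;   # hypothetical name; Listing 2 uses Moo instead

# Plain bless-based constructor standing in for the one Moo generates.
sub new {
    my ($class) = @_;
    return bless {
        ip_counts    => {},   # IP address => number of packets
        packet_count => 0,    # total packets analyzed
        dns_cache    => {},   # IP address => hostname (or the IP itself)
    }, $class;
}

# Count one packet for the given remote IP address.
sub count_packet {
    my ($self, $ip) = @_;
    $self->{packet_count}++;
    $self->{ip_counts}{$ip}++;
}

# Reset the counters but keep dns_cache, so no lookups are repeated.
sub reset {
    my ($self) = @_;
    $self->{ip_counts}    = {};
    $self->{packet_count} = 0;
}

package main;
```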

Listing 2

TopCapture.pm

 

Only Full Lines

To keep the running event loop ticking steadily, the start() method in line 18 of Listing 2 cannot simply call tshark and wait for the results to come back. Instead, it uses AnyEvent::Run, which in turn calls fork() to launch a child process and triggers the on_read callback whenever a chunk of text appears on the child's standard output.
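AnyEvent::Run takes care of forking and of reading the child's output without blocking. The following core-Perl sketch blocks and is therefore not suitable inside the event loop; it only illustrates the underlying mechanism: a piped open() forks a child, and sysread() returns whatever chunk of output happens to be available. read_chunks() is a hypothetical name, and the tshark options in the comment are an assumption about how the capture might be invoked.

```perl
use strict;
use warnings;

# Start a child process (the piped open forks under the hood) and read
# its standard output in raw chunks, similar to what an on_read callback
# sees: chunk boundaries need not coincide with line boundaries.
# Unlike AnyEvent::Run, this blocks until the child exits.
sub read_chunks {
    my (@cmd) = @_;
    open(my $fh, '-|', @cmd) or die "Cannot start @cmd: $!";
    my @chunks;
    while (sysread($fh, my $buf, 16)) {   # small size to show fragmentation
        push @chunks, $buf;
    }
    close $fh;
    return @chunks;
}

# Assumed tshark invocation (line-buffered, one field pair per packet):
#   read_chunks('tshark', '-l', '-T', 'fields', '-e', 'ip.src', '-e', 'ip.dst');
```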

The application is not interested in individual fragments, however, only in complete lines. push_read() in line 35 therefore leaves the fragments in the buffer and, via the line read type, tells AnyEvent not to invoke the associated callback until a complete line is ready for consumption.
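The line assembly that push_read() handles can be sketched in a few lines of plain Perl. make_line_reader() is a hypothetical stand-in, not AnyEvent's implementation: it keeps a buffer, appends every chunk, and passes only newline-terminated lines to the callback.

```perl
use strict;
use warnings;

# Minimal stand-in for what push_read(line => ...) provides: append each
# incoming chunk to a buffer and hand only complete lines to the callback.
# A partial tail stays in the buffer until the next chunk arrives.
sub make_line_reader {
    my ($on_line) = @_;
    my $buffer = '';
    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;
        while ($buffer =~ s/\A([^\n]*)\n//) {
            my $line = $1;          # copy before calling out
            $on_line->($line);
        }
    };
}
```

Feeding it the chunks "10.0.0.1\t93.1", "84.216.34\n10.0.0", and ".1\t8.8.8.8\n" yields exactly two callbacks, one per complete line, even though the first address pair straddles two chunks.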

As soon as this happens, TopCapture branches to the line_process() method, which begins in line 45. There, line 50 first checks whether the output actually contains two IP addresses (ip.src and ip.dst, as required for further processing) or merely the introductory comment that tshark also emits.
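The check in line_process() could look something like the following, assuming tshark runs with -T fields -e ip.src -e ip.dst and thus prints one tab-separated pair of addresses per packet. Both this output format and the name parse_line() are assumptions; the sketch only handles IPv4.

```perl
use strict;
use warnings;

# Assumed tshark output: "src<TAB>dst" per packet (from -T fields).
# Anything else, such as tshark's introductory message or packets
# without IP fields, yields an empty list and is skipped.
my $IPV4 = qr/\d{1,3}(?:\.\d{1,3}){3}/;

sub parse_line {
    my ($line) = @_;
    return $line =~ /\A($IPV4)\t($IPV4)\s*\z/ ? ($1, $2) : ();
}
```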

Hostnames resolved via DNS are far more informative than a list of bare IP addresses. However, the data often arrives faster than an external DNS resolver can answer. That is why Listing 2 first stores the IP address itself as a placeholder in the dns_cache hash, sends an asynchronous query to the DNS server with AnyEvent::DNS, and carries on processing further packets. When the DNS server responds some time later, Listing 2 jumps to the callback from line 62 and replaces the entry in dns_cache with the result, thus saving time-consuming lookups when the same IP address shows up again later on.

These reverse lookups, which ask the DNS resolver for the hostname belonging to an IP address, do not always succeed, because some IP addresses, such as those on your own network or at your local ISP, often have no PTR record. In this case, dns_cache simply keeps the IP address as is.
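The placeholder strategy for dns_cache can be simulated without a real resolver. In this sketch, resolve_async() and run_pending() are hypothetical stand-ins that queue queries and answer them later; in Listing 2, AnyEvent::DNS plays that role (its reverse_lookup() function, for instance, takes an IP address and a callback).

```perl
use strict;
use warnings;

# Simulated asynchronous resolver (hypothetical): queries are queued and
# answered later by run_pending(), standing in for AnyEvent::DNS.
my @pending;
sub resolve_async {
    my ($ip, $cb) = @_;
    push @pending, [$ip, $cb];
}
sub run_pending {
    my ($answers) = @_;   # ip => hostname; missing key = no PTR record
    while (my $job = shift @pending) {
        my ($ip, $cb) = @$job;
        $cb->($answers->{$ip});
    }
}

my %dns_cache;

# Return the best name known right now. On first sight, cache the bare
# IP as a placeholder and fire off exactly one asynchronous lookup.
sub name_for {
    my ($ip) = @_;
    unless (exists $dns_cache{$ip}) {
        $dns_cache{$ip} = $ip;
        resolve_async($ip, sub {
            my ($host) = @_;
            $dns_cache{$ip} = $host if defined $host;   # no name: keep IP
        });
    }
    return $dns_cache{$ip};
}
```

Because the placeholder goes into the cache before the query is sent, a burst of packets from the same address triggers only one lookup.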

If you use the Chrome browser, you will notice that it often communicates with IP addresses that the DNS server resolves to hostnames in the strange-looking domain *.1e100.net. A quick check on the Internet shows that this domain belongs to none other than Google: 1e100 is scientific notation for 10^100, a one followed by 100 zeros, which is called a googol. In other words, Chrome happily and frequently calls the mothership.

The stats() method (lines 74-82) returns the number of packets analyzed so far and a reference to a hash that maps resolved hostnames to packet counters.

Listing 3 implements the Top-style GUI with Curses and communicates with the main program via the event() method (sending messages) and the reg_cb() method (receiving messages in a callback), both inherited from Object::Event. The start() method builds the GUI from a header (top), a list box (lbox), and a footer (bottom).
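The stats() description implies an aggregation step from per-IP counters to per-hostname counters. The following sketch shows one way to do it; the name stats_from() is hypothetical, and summing the counts of several IP addresses that resolve to the same name is an assumption about how Listing 2 behaves.

```perl
use strict;
use warnings;

# Fold per-IP counters into per-hostname counters using the DNS cache,
# then return (packet count, hash ref), the shape stats() is described
# as returning. IPs without a cache entry are listed under the bare IP.
# Assumption: several IPs with the same hostname are summed.
sub stats_from {
    my ($packet_count, $ip_counts, $dns_cache) = @_;
    my %by_host;
    while (my ($ip, $n) = each %$ip_counts) {
        my $host = $dns_cache->{$ip} // $ip;
        $by_host{$host} += $n;
    }
    return ($packet_count, \%by_host);
}
```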

Listing 3

TopGUI.pm

 

In the header, Listing 3 updates the counter of all packets analyzed so far as the values trickle in. Pressing q quits the program; like all keyboard events, Curses::UI intercepts this keystroke via set_binding() in line 42. Pressing c resets all the counters: the list of top hosts disappears and is refilled the next time the timer fires. The DNS cache, however, is preserved to avoid costly repeat lookups of data that has not changed.
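The key handling boils down to a dispatch table. In Curses::UI, each entry would be registered with set_binding(); the sketch below replaces the GUI with a plain hash so the logic can run on its own. The names %bindings, %state, and handle_key() are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical application state, standing in for TopCapture's counters.
my %state = (
    packet_count => 42,
    ip_counts    => { '10.0.0.7' => 42 },
    dns_cache    => { '10.0.0.7' => 'nas.local' },
    running      => 1,
);

# "q" stops the program, "c" clears the counters but deliberately
# leaves the DNS cache alone.
my %bindings = (
    q => sub { $state{running} = 0 },
    c => sub { $state{packet_count} = 0; $state{ip_counts} = {} },
);

# Run the action bound to $key; return false for unbound keys.
sub handle_key {
    my ($key) = @_;
    my $action = $bindings{$key} or return 0;
    $action->();
    return 1;
}
```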
