Perl script as a sniffer with built-in statistics
What Am I?
Listing 1 implements the Top-style script in Figure 2 and uses the flexible AnyEvent event framework and the TopCapture and TopGUI modules explained later. These modules capture the network packets and present them in a dynamic Curses UI. Line 8 starts the capture process with tshark in TopCapture using its start() method.
To prevent the GUI from displaying the IP address of the current host, which is included in each packet as either a target or sender address, line 15 attempts to identify this address by opening a socket with Google's web server as a target (but without opening a connection) and then uses sockhost() to retrieve the IP address, which the network stack added as the sender.
This method works even if the host has multiple network interfaces, because the network stack binds a newly created socket to the interface that routes to the target. If line 33 finds the host's own address in a packet, the output is suppressed.
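The socket trick can be sketched as follows. This is a minimal illustration rather than the code from Listing 1: it assumes a UDP socket (which sends no packets on connect), and the function name local_ip_for() is made up for this example.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Return the local address the kernel would use to reach $peer_addr.
# Connecting a UDP socket sends no packets; it only selects a route.
sub local_ip_for {
    my ($peer_addr, $peer_port) = @_;

    my $sock = IO::Socket::INET->new(
        Proto    => 'udp',
        PeerAddr => $peer_addr,
        PeerPort => $peer_port,
    ) or return;    # nothing if there is no route to the peer

    my $ip = $sock->sockhost;    # address of our end of the socket
    close $sock;
    return $ip;
}

# Loopback keeps the demo runnable offline; pass a public address
# (the article targets a Google server) to find the outbound interface.
my $own_ip = local_ip_for('127.0.0.1', 53);
print defined $own_ip ? "local address: $own_ip\n" : "no address found\n";
```

With a loopback peer, the result is simply the loopback address; with a routable peer, it is the address of whichever interface the routing table selects.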
The timer defined in lines 20-42 runs the callback defined by the cb option and then uses stats() in TopCapture to pick up the current packet counter value and a hash indicating how often the acquired DNS names each occurred in the packets.
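A hash of names and counters like this lends itself to a Top-style ranking. The following sketch shows one way to turn it into display rows; top_hosts() and the sample data are hypothetical and not taken from the listings.

```perl
use strict;
use warnings;

# Return up to $limit [host, count] pairs, busiest host first.
# Ties are broken alphabetically so the order stays stable.
sub top_hosts {
    my ($counts, $limit) = @_;

    my @hosts = sort {
        $counts->{$b} <=> $counts->{$a} or $a cmp $b
    } keys %$counts;

    splice @hosts, $limit if @hosts > $limit;
    return map { [ $_, $counts->{$_} ] } @hosts;
}

# Hypothetical sample data in the shape the text describes
my $packet_count = 42;
my %host_counts  = (
    'example.org' => 17,
    'example.net' => 20,
    '192.0.2.7'   => 5,
);

print "Packets: $packet_count\n";
printf "%-20s %5d\n", @$_ for top_hosts(\%host_counts, 2);
```

The alphabetical tie-break is a design choice for this sketch: without it, hosts with equal counts could swap places between refreshes.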
GUI in the Cuckoo's Nest
At the end of Listing 1, the GUI launches; this is implemented by the CPAN Curses::UI::POE module, an event module that uses POE, which competes with AnyEvent. However, they run alongside each other just fine, because AnyEvent can coexist with just about any event code. POE's event loop, called in the TopGUI module's start() method as of line 44 in Listing 1, also processes, as a side effect, the entries hooked in by AnyEvent, such as the timer and the packet collection callbacks.
Listing 2 uses the AnyEvent::DNS and AnyEvent::Run modules from CPAN for asynchronous name resolution of IP addresses and to fire up an external TShark process. To save space, line 6 calls Moo, which implicitly adds a new() constructor. The three instance variables, ip_counts, packet_count, and dns_cache, store the frequency of occurrence for all IP addresses, the total number of analyzed packets, and a cache that maps IP addresses to the hostnames the DNS server returned for reverse lookups. The ip_counts and packet_count variables can be reset using the reset() method.
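To make the role of the three instance variables concrete, here is a plain-Perl sketch of an equivalent class. It deliberately avoids Moo so that it runs with core Perl only; the package name TopCaptureSketch is hypothetical, and the real module in Listing 2 may differ in detail.

```perl
use strict;
use warnings;

package TopCaptureSketch;    # hypothetical name, not the article's module

sub new {
    my ($class) = @_;
    return bless {
        ip_counts    => {},    # IP address => packets seen
        packet_count => 0,     # all packets analyzed so far
        dns_cache    => {},    # IP address => hostname (or the IP itself)
    }, $class;
}

# Clear the counters but leave the DNS cache untouched
sub reset {
    my ($self) = @_;
    $self->{ip_counts}    = {};
    $self->{packet_count} = 0;
    return $self;
}

package main;

my $capture = TopCaptureSketch->new;
$capture->{ip_counts}{'192.0.2.7'}++;
$capture->{packet_count}++;
$capture->{dns_cache}{'192.0.2.7'} = 'host.example.org';

$capture->reset;
print "packets after reset: $capture->{packet_count}\n";
print "cached names kept:   ", scalar keys %{ $capture->{dns_cache} }, "\n";
```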
Listing 2
TopCapture.pm
Only Full Rows
For the running event loop to continue ticking steadily, the start() method in line 18 of Listing 2 is not allowed simply to call tshark and wait for the results to come back; instead, it uses AnyEvent::Run, which in turn calls fork() to start a child process and triggers the on_read callback each time a snippet of text appears on the standard output.
The application is not interested in individual fragments, however, but only in complete lines: push_read() in line 35 thus pushes the fragments back into the buffer and tells AnyEvent not to call the callback until a complete line is ready for consumption.
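The underlying idea of buffering fragments until a full line is available can be shown without any event framework. The sketch below is a core-Perl illustration of that technique, not the AnyEvent mechanism used in Listing 2; make_line_feeder() is a name invented for this example.

```perl
use strict;
use warnings;

# Build a feeder that buffers partial input and calls $on_line
# once per complete line (without the trailing newline).
sub make_line_feeder {
    my ($on_line) = @_;
    my $buffer = '';

    return sub {
        my ($chunk) = @_;
        $buffer .= $chunk;

        # Hand over every finished line; leave any remainder buffered
        while ($buffer =~ s/\A([^\n]*)\n//) {
            $on_line->($1);
        }
    };
}

my @lines;
my $feed = make_line_feeder(sub { push @lines, $_[0] });

# Output arrives in arbitrary fragments, as it would from a pipe
$feed->("192.0.2.7\t198.51");
$feed->(".100.9\n203.0.113.5\t192.");
$feed->("0.2.7\n");

print "$_\n" for @lines;
```

Three fragments go in, but the callback only ever sees the two complete lines.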
As soon as this happens, TopCapture branches to the line_process() method, which begins in line 45. When it gets there, line 50 first checks whether the output actually contains two IP addresses (ip.src and ip.dst, as required for further processing) or the introductory comment, which tshark also emits.
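A check of this kind might look like the following sketch. It assumes that each packet line consists of two IPv4 addresses separated by whitespace; the exact tshark output format, the sample banner text, and the name parse_packet_line() are assumptions made for illustration.

```perl
use strict;
use warnings;

# Loose IPv4 pattern; it does not validate the 0-255 range
my $ipv4 = qr/\d{1,3}(?:\.\d{1,3}){3}/;

# Return (source, destination) for a packet line, or an empty list
# for anything else, such as a banner or a blank line.
sub parse_packet_line {
    my ($line) = @_;
    return $line =~ /\A\s*($ipv4)\s+($ipv4)\s*\z/ ? ($1, $2) : ();
}

for my $line ("Capturing on 'eth0'", "192.0.2.7\t198.51.100.9") {
    if (my ($src, $dst) = parse_packet_line($line)) {
        print "packet: $src -> $dst\n";
    }
    else {
        print "skipped: $line\n";
    }
}
```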
Hostnames resolved by DNS leave a better impression than a list of bare IP addresses. Having said this, the data often arrives faster than an external DNS resolver can process it. This explains why Listing 2 first uses the dns_cache hash to store the IP address while using AnyEvent::DNS to call the DNS server asynchronously and carry on processing further packets. When the DNS server responds, some time later, Listing 2 jumps to the callback from line 62 and adds the result to the entry in dns_cache, thus saving time-consuming lookups for IP addresses that arrive later on.
These reverse lookups, which query the DNS resolver for the hostname given its IP address, do not always work, because some IP addresses, such as those on your own network or at your local ISP, often do not have a name record. In this case, dns_cache simply leaves the IP address as is.
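The caching logic can be sketched independently of any particular resolver. In the example below, the resolver is a hypothetical callback-based function standing in for an asynchronous DNS client; make_name_cache() and the sample record are likewise invented for illustration.

```perl
use strict;
use warnings;

# $resolve is a hypothetical asynchronous resolver: it takes an IP
# address and a callback, and later calls the callback with the
# hostname, or with undef if there is no reverse record.
sub make_name_cache {
    my ($resolve) = @_;
    my %cache;

    return sub {
        my ($ip) = @_;
        unless (exists $cache{$ip}) {
            $cache{$ip} = $ip;    # placeholder until an answer arrives
            $resolve->($ip, sub {
                my ($name) = @_;
                $cache{$ip} = $name if defined $name && length $name;
            });
        }
        return $cache{$ip};
    };
}

# Fake resolver for the demo: queue requests, answer them later
my %records = ('192.0.2.7' => 'host.example.org');
my @pending;
my $name_for = make_name_cache(sub { push @pending, [@_] });

print $name_for->('192.0.2.7'), "\n";    # still the bare address
print $name_for->('192.0.2.7'), "\n";    # no second query is queued

$_->[1]->($records{ $_->[0] }) for splice @pending;
print $name_for->('192.0.2.7'), "\n";    # now the cached hostname
```

Storing the IP address as a placeholder serves two purposes: the display always has something to show, and the existence of the key prevents duplicate queries while the first one is still in flight.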
If you use the Chrome browser, you will see that it often communicates with IPs that the DNS server resolves to the strange-looking domain *.1e100.net. A quick check on the Internet shows that this domain is none other than Google; 1e100 is an allusion to the number written as a 1 followed by 100 zeros, which is called a googol. In other words, Chrome happily and frequently calls the mothership.
The stats() method (lines 74-82) returns the number of previously analyzed packets and a reference to a hash, which assigns resolved hostnames to packet counters.
Listing 3 implements the Top-style GUI with Curses and communicates with the main program via the event() (sending messages) and reg_cb() (receiving messages in a callback) methods inherited from Object::Event. The start() method builds the GUI, starting with a header top, a list box lbox, and a footer bottom.
Listing 3
TopGUI.pm
In the header, Listing 3 increments the counter of all previously analyzed packets as the values trickle in. Pressing q quits the program; like all keyboard events, it is handled through a binding that set_binding() registers in line 42. Pressing the c key resets all the counters; the list of top hosts disappears and is assigned new values the next time the timer expires. The DNS cache is, however, preserved to save on costly lookups of unchanged data.