Securing VoIP networks

Safe Call

Article from Issue 99/2009

Author(s): Christoph Egger, Michael Hirschbichler

Eavesdropping on conversations on a LAN is easier than ever thanks to insecure VoIP installations. You don't need to bug restaurant booths or tap phone lines – standard Linux tools are all a hacker needs.

Many small to mid-sized enterprises simply connect their new VoIP systems to existing LANs. Road warriors call in via the Internet, and remote branches use a standard connection to reach the mother ship. Unfortunately, this kind of installation doesn't provide anything in the line of VoIP infrastructure security.

In this article, we look at some of the special concerns affecting VoIP and describe some strategies and optional protocols for protecting voice communications.

A Typical VoIP Connection

Session Initiation Protocol (SIP, RFC 3261) [1] is the most popular open VoIP standard for initiating, negotiating, and managing VoIP connections. In combination with Session Description Protocol (SDP, RFC 4566) [2], which handles audio or video codec negotiation, SIP transmits information about the connection between the calling parties. Once the connection is established, the parties send and receive data using the Realtime Transport Protocol (RTP) [3].
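To make the codec negotiation more concrete, the following Python sketch picks the first payload type that appears both in an SDP offer and in the callee's list of supported codecs. The SDP text, the addresses, and the function names are assumptions made up for illustration; they do not come from a real capture or a specific SIP stack.

```python
# Hypothetical SDP offer; addresses and ports are invented for illustration.
OFFER = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 192.0.2.10",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8 3",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:3 GSM/8000",
]) + "\r\n"


def offered_payload_types(sdp):
    """Return the RTP payload types listed on the first audio m= line."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Layout: m=<media> <port> <proto> <fmt> <fmt> ...
            return [int(pt) for pt in line.split()[3:]]
    return []


def choose_codec(sdp, supported):
    """Pick the first offered payload type that the callee also supports."""
    for pt in offered_payload_types(sdp):
        if pt in supported:
            return pt
    return None


# The callee in this sketch only supports PCMA (8) and GSM (3).
chosen = choose_codec(OFFER, supported={8, 3})
```

The caller's preference order decides the result here; a real user agent would also compare clock rates and codec parameters from the rtpmap lines.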

Suppose user A in domain A wants to initiate a VoIP connection with user B in domain B. User A sends an Invite request (Figure 1) to its own provider's SIP proxy server. The proxy performs a DNS lookup to ascertain the SIP proxy for domain B, then sends the request to the domain B SIP proxy. The SIP proxy server in domain B checks its own location database for the IP address and port that user B registered with it and forwards the request to this host.

Any response from user agent B, which could be Ringing or an OK when the recipient accepts the call, is transported back along the same network path.
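The Invite request that starts this exchange is a plain-text message. The Python sketch below assembles a minimal Invite with an SDP body and parses it again, which is essentially what every proxy on the path can do. Host names, tags, the branch value, and the helper names are hypothetical.

```python
def build_invite(caller, callee, call_id, sdp):
    """Assemble a minimal SIP INVITE carrying an SDP body."""
    body = sdp.encode("utf-8")
    headers = [
        f"INVITE sip:{callee} SIP/2.0",
        # Hypothetical client host and branch value.
        "Via: SIP/2.0/UDP client.a.example;branch=z9hG4bK776asdhds",
        "Max-Forwards: 70",
        f"From: <sip:{caller}>;tag=1928301774",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        f"Contact: <sip:{caller}>",
        "Content-Type: application/sdp",
        f"Content-Length: {len(body)}",
    ]
    return ("\r\n".join(headers) + "\r\n\r\n").encode("utf-8") + body


def parse_message(message):
    """Split a SIP message into start line, header dict, and body."""
    head, _, body = message.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return lines[0], headers, body


invite = build_invite("a@a.example", "b@b.example",
                      "abc123@client.a.example", "v=0\r\n")
start_line, invite_headers, invite_body = parse_message(invite)
```

Nothing in this message is encrypted or signed, so any host that sees the packet can read the caller, the callee, and the media description.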

After setting up the call with SIP/SDP, the system establishes a direct RTP connection between user agent A and user agent B. In some VoIP scenarios, providers use an intermediate media proxy to avoid issues with Network Address Translation (NAT) (Figure 2). The advantages of a media proxy are that at least one of the partners does not reside behind a NAT gateway and both terminal devices have counterparts with public IP addresses.

To add the media proxies transparently, your own SIP proxies replace their counterpart's IP addresses in the SDP payload with the media proxy's IP address and port. This convenient technique simplifies the task of getting around NAT devices and firewalls. But at the same time, this approach seriously affects your options for encrypted transmissions. (The topic of NAT traversal is extremely complex, especially in the context of VoIP, SIP, and RTP. More detailed information is available on the web [4].)
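The rewrite a SIP proxy performs to insert a media proxy can be sketched in a few lines of Python. The function name, the addresses, and the port are assumptions for illustration; a production proxy would handle more SDP variants (IPv6, several media lines, media-level c= lines).

```python
def rewrite_sdp_for_media_proxy(sdp, proxy_ip, proxy_port):
    """Point the connection address and audio port at the media proxy."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            # Replace the endpoint's (possibly private) address.
            line = f"c=IN IP4 {proxy_ip}"
        elif line.startswith("m=audio "):
            # Layout: m=audio <port> <proto> <fmt> ...
            parts = line.split()
            parts[1] = str(proxy_port)
            line = " ".join(parts)
        out.append(line)
    return "\r\n".join(out) + "\r\n"


# Hypothetical offer from a client behind NAT.
original_sdp = "\r\n".join([
    "v=0",
    "s=-",
    "c=IN IP4 10.0.0.5",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0 8",
]) + "\r\n"

rewritten_sdp = rewrite_sdp_for_media_proxy(original_sdp, "203.0.113.7", 30000)
```

The sketch also shows why this technique collides with encryption: the proxy has to read and modify the SDP body, which is impossible if the body is encrypted and detectable if it is signed.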

What To Do About SIP

Plaintext signaling and its hop-by-hop architecture make SIP very difficult to harden. Because data packets traverse multiple hops, administrators find it difficult to achieve end-to-end security that guarantees availability, confidentiality, and integrity. Each hop en route can add, remove, or manipulate headers. This design makes it impossible to sign the headers.
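A short Python sketch illustrates why a signature over the complete header set cannot survive the trip. The proxy below does only what a well-behaved hop is expected to do (add its own Via entry and decrement Max-Forwards), yet the fingerprint of the headers changes. The SHA-256 fingerprint stands in for a real signature, and all host names and branch values are hypothetical.

```python
import hashlib


def header_fingerprint(headers):
    """Hash the header list; a stand-in for signing all headers."""
    joined = "\r\n".join(f"{name}: {value}" for name, value in headers)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def forward_through_proxy(headers, proxy_host):
    """Apply the header changes a forwarding hop makes to a request."""
    forwarded = []
    for name, value in headers:
        if name == "Max-Forwards":
            value = str(int(value) - 1)
        forwarded.append((name, value))
    # Each hop puts its own Via entry on top of the existing ones.
    via = ("Via", f"SIP/2.0/UDP {proxy_host};branch=z9hG4bK-hypothetical")
    return [via] + forwarded


sent_headers = [
    ("Via", "SIP/2.0/UDP client.a.example;branch=z9hG4bK776asdhds"),
    ("Max-Forwards", "70"),
    ("From", "<sip:a@a.example>;tag=1928301774"),
    ("To", "<sip:b@b.example>"),
]
received_headers = forward_through_proxy(sent_headers, "proxy.a.example")

fingerprint_sent = header_fingerprint(sent_headers)
fingerprint_received = header_fingerprint(received_headers)
```

Because legitimate and malicious modifications look the same to the recipient, a signature would have to be restricted to headers that no hop is allowed to touch.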

Signing the SDP payload as an alternative also turns out to be unsatisfactory because SIP includes confidential signaling information in the header. An attacker could, say, manipulate a packet's From header to alter the caller ID without affecting the validity of the payload signature.

The authors of RFC 3261, the current SIP standard, were very much aware of the security issues surrounding SIP, and they responded by adding the SIPS URI scheme (SIP Secure) [5]. A user agent that supports SIPS expects Transport Layer Security (TLS) [6] as the underlying protocol for signaling on every hop up to the last SIP proxy server.

TLS-based encryption brings an additional layer of security to SIP. SIPS, however, does not provide a complete solution. The problem with this approach is that user agent A has no way of knowing whether each hop along the path has transmitted the request via a separate TLS-secured connection. The standard only defines TLS security to the SIP proxy server in domain B. The connection between the proxy and user agent B is unencrypted.

As an alternative, user agent B can act as a TLS server, but few clients actually implement this technique. Currently, developers are working on a standard that resolves this issue by letting user agent B set up a TLS tunnel to proxy B and keep the tunnel open for incoming requests.

Another problem facing SIP is that, if UDP is used as the transport, it is easy for attackers to replay a request with manipulated headers to hijack the user's identity and make calls. To prevent this kind of attack, most SIP proxy systems support user authentication. Authenticating against your own SIP proxy server provides protection against identity theft and misappropriation of resources. When the user agent registers, or when a call is set up, systems typically use an HTTP digest-based handshake as the authentication algorithm.

Keep in mind, however, that authentication alone does not protect against man-in-the-middle attacks because the integrity of the request cannot be validated. An attacker could sniff authentication information and use different headers to manipulate a call.
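The digest calculation itself shows where the gap lies. The Python sketch below follows the MD5-based HTTP digest scheme (RFC 2617) that SIP borrows; the function names, the credentials, and the nonce are made up for illustration. Note which values go into the hash: user name, realm, password, nonce, method, and request URI. The From, To, and Contact headers and the SDP body are not covered.

```python
import hashlib


def md5_hex(text):
    """Return the hex MD5 digest of a string."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, method, uri, nonce,
                    nc=None, cnonce=None, qop=None):
    """Compute an HTTP digest response as used for SIP authentication."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")
    ha2 = md5_hex(f"{method}:{uri}")
    if qop:
        # Variant with quality of protection, nonce count, and client nonce.
        return md5_hex(f"{ha1}:{nonce}:{nc}:{cnonce}:{qop}:{ha2}")
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


# Hypothetical credentials and nonce for a call setup.
response = digest_response(
    username="alice", realm="a.example", password="secret",
    method="INVITE", uri="sip:b@b.example", nonce="f84f1cec41e6cbe5aea9c8e88d359",
)
```

An attacker who captures a request with a valid response can keep the method, URI, and authorization data, swap the remaining headers or the body, and the proxy has no means of noticing as long as it still accepts the nonce.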

Authenticating every single request is not an option in some scenarios; for example, ACK and CANCEL requests do not expect a response and thus do not support a handshake. The standards support other forms of SIP authentication, but overemphasis on authentication stresses the provider's SIP proxy, increasing the delay and affecting connection quality.

S/MIME

Secure/Multipurpose Internet Mail Extensions (S/MIME) [7] provide an option for securing the payload only. The client sends the Invite request along with an encrypted SDP part to ensure the confidentiality and integrity of the SDP data. Doing so also guarantees that sockets that receive and send the RTP data at the other end really do belong to the authenticated partner. But this only makes sense if the media data are encrypted through the use of secure RTP [8].

Because many SIP proxy servers attempt to rewrite the SDP body to insert media proxies, encrypted SDP can cause unexpected issues, and we are not aware of any working S/MIME-capable clients.

According to a technique defined in RFC 3893 [9], parts of the SIP header are signed along with SDP. The certificate used for the S/MIME signature can also sign the original From and To headers. The SIP request is divided into three sections for this:

part one is the SIP request itself;
part two is of the message/sipfrag type and contains a copy of one part of the SIP request (To, From, date/time information, and other details);
part three is the signature for part two.

If the recipient knows the sender's public key, it can validate the relevant data. Implementations of this approach, known as Authenticated Identity Body (AIB), are not widespread.
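Part two of this structure, the message/sipfrag body, is simply a copy of selected header lines. The Python sketch below extracts them from a request. The header selection in AIB_HEADERS is an assumption for illustration (RFC 3893 has the normative list), the sample request is invented, and the signing step for part three is left out because the Python standard library does not include S/MIME signing.

```python
# Assumed selection of headers to copy; see RFC 3893 for the normative list.
AIB_HEADERS = ("From", "To", "Date", "Call-ID", "CSeq", "Contact")


def build_sipfrag(request_text):
    """Copy the identity-relevant header lines into a sipfrag body."""
    head = request_text.split("\r\n\r\n", 1)[0]
    kept = []
    for line in head.split("\r\n")[1:]:  # skip the request line
        name = line.split(":", 1)[0].strip()
        if name in AIB_HEADERS:
            kept.append(line)
    return "\r\n".join(kept) + "\r\n"


# Hypothetical request for illustration.
REQUEST = "\r\n".join([
    "INVITE sip:b@b.example SIP/2.0",
    "Via: SIP/2.0/UDP client.a.example;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "From: <sip:a@a.example>;tag=1928301774",
    "To: <sip:b@b.example>",
    "Date: Thu, 21 Feb 2002 13:02:03 GMT",
    "Call-ID: abc123@client.a.example",
    "CSeq: 1 INVITE",
    "Contact: <sip:a@client.a.example>",
]) + "\r\n\r\n" + "v=0\r\n"

sipfrag = build_sipfrag(REQUEST)
```

Headers that proxies legitimately change, such as Via and Max-Forwards, stay out of the fragment, so the signature over it remains valid across hops.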

What To Do About RTP

RTP does not use encryption and relies on the connectionless UDP as its transmission protocol. This combination makes it simple for an attacker to reroute, sniff, and manipulate data. RFC 3550 [10], which defines RTP, takes this into consideration, providing an encryption option to guarantee data confidentiality.
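How little effort a sniffed RTP packet requires becomes clear from the fixed 12-byte header defined in RFC 3550. The Python sketch below decodes it from a UDP payload; the sample packet is synthetic and the function name is our own.

```python
import struct


def parse_rtp_header(packet):
    """Decode the fixed 12-byte RTP header at the start of a UDP payload."""
    if len(packet) < 12:
        raise ValueError("RTP header needs at least 12 bytes")
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": b0 >> 6,
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "csrc_count": b0 & 0x0F,
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }


# Synthetic packet: version 2, payload type 0 (PCMU), 160 bytes of audio.
sample_packet = struct.pack("!BBHII", 0x80, 0, 1234, 160, 0xDEADBEEF) + b"\xff" * 160
rtp_header = parse_rtp_header(sample_packet)
audio_payload = sample_packet[12 + 4 * rtp_header["csrc_count"]:]
```

Everything after the header is the codec data in the clear, so an eavesdropper only has to sort the packets by sequence number and feed the payload to a decoder.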

RFC 3550 points to Secure Real-Time Transport Protocol (SRTP; Figure 3) [8] as a preferred method for extending RTP's functionality through the transport of RTP packets as encrypted payload data. SRTP does not include mechanisms for creating and exchanging keys and thus relies on external methods such as Multimedia Internet Keying (MIKEY, RFC 3830) [11].

The key exchange takes place in the course of the Invite dialog. MIKEY offers various approaches, including one that uses preshared keys, in which user agent A encrypts the SRTP key to be exchanged with a shared secret (MIKEY-PSK). This key exchange only requires a single message; the two SRTP keys for this session can be transmitted with the Invite request. MIKEY also supports a key exchange based on a Public Key Infrastructure (MIKEY-RSA), in which the initiator uses user agent B's public key to encrypt the session key and sends it in the SDP packet to user B.
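The way a MIKEY message travels inside the Invite can be sketched as follows. The example assumes the a=key-mgmt SDP attribute from RFC 4567, which this article does not cover, and it treats the MIKEY message as opaque bytes; the placeholder below is not a valid MIKEY message, and the helper names are hypothetical.

```python
import base64


def attach_mikey(sdp, mikey_message):
    """Append a base64-encoded MIKEY message as an SDP key-mgmt attribute."""
    encoded = base64.b64encode(mikey_message).decode("ascii")
    return sdp + f"a=key-mgmt:mikey {encoded}\r\n"


def extract_mikey(sdp):
    """Return the decoded MIKEY message from an SDP body, or None."""
    prefix = "a=key-mgmt:mikey "
    for line in sdp.splitlines():
        if line.startswith(prefix):
            return base64.b64decode(line[len(prefix):])
    return None


# Placeholder bytes standing in for a real MIKEY-PSK message.
placeholder_mikey = b"opaque-mikey-message"

base_sdp = "\r\n".join([
    "v=0",
    "s=-",
    "c=IN IP4 192.0.2.10",
    "t=0 0",
    "m=audio 49170 RTP/SAVP 0",
]) + "\r\n"

sdp_with_key = attach_mikey(base_sdp, placeholder_mikey)
```

Because the key material rides in the SDP body, it is only as well protected as the MIKEY message itself; the signaling path sees the attribute in every case.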

Another MIKEY method (MIKEY-RSA-R) completely does without a prior exchange of keys or a public key infrastructure. User agent A transmits its own public key, and user agent B responds by generating the SRTP session key and sending it back to user A.

Phil Zimmermann, the inventor of PGP, developed Zimmermann Real-Time Protocol (ZRTP) [12], an alternative key exchange method for initiating an SRTP call. ZRTP does not replace SRTP but extends its functionality.

In contrast to MIKEY, ZRTP does not use the signaling path to transmit the SRTP key information but relies entirely on the media path. An RTP connection is first established and used to exchange the session keys. After voice authentication, the clients switch to encrypted SRTP.
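The idea behind the in-band key agreement and the spoken check can be sketched in Python. ZRTP's key agreement is based on Diffie-Hellman; everything else here is a deliberately simplified assumption: the prime is a toy value that is far too small for real use, and the four-character check string only imitates the short authentication string that the two callers read to each other.

```python
import hashlib
import secrets

# Toy parameters for illustration only; NOT secure and not ZRTP's real groups.
P = 2**127 - 1
G = 3


def keypair():
    """Create a private exponent and the matching public value."""
    private = secrets.randbelow(P - 3) + 2  # value in [2, P - 2]
    return private, pow(G, private, P)


def shared_secret(own_private, peer_public):
    """Combine our private exponent with the peer's public value."""
    return pow(peer_public, own_private, P)


def short_auth_string(secret):
    """Derive a short string both callers can compare by voice."""
    digest = hashlib.sha256(secret.to_bytes(16, "big")).hexdigest()
    return digest[:4]


# Both endpoints exchange only their public values over the media path.
a_private, a_public = keypair()
b_private, b_public = keypair()

secret_a = shared_secret(a_private, b_public)
secret_b = shared_secret(b_private, a_public)
sas_a = short_auth_string(secret_a)
sas_b = short_auth_string(secret_b)
```

If a man in the middle negotiates separate keys with each side, the two check strings differ, and the callers notice when they read them aloud.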

Although end-to-end encryption is possible in principle, a media proxy changes the picture: the data is protected en route between the user agent and the provider, but encrypted communications terminate at the next media proxy and not with the call recipient. This gives the provider network full access to the media content. Just as with proprietary technologies such as Skype, at issue is whether or not the manufacturer is trustworthy (see the "Skype and Security" box).

Skype and Security

The proprietary VoIP provider Skype confirmed when asked that it does not have an official security policy. An independent survey [13], whose author was given access to part of the source code, confirms that the security algorithms are state of the art and correctly implemented. Despite this, the Skype application remains a black box and is not open to critical expert appraisal. It is impossible to rule out the existence of backdoors or appliances that allow hackers, or authorities, to eavesdrop on and manipulate conversations.

TLS and IPSec

Transport Layer Security sits above the transport layer as an enhanced version of SSL (Figure 4). Participants are authenticated through the TLS Handshake Protocol by means of asymmetric cryptography using a public key approach. In the course of mutual authentication, the communication partners negotiate a specially generated, symmetric session key. To verify integrity, they add a SHA- or MD5-based Message Authentication Code (MAC) to the message. For more information on TLS, you can read RFC 4346 [14].
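The integrity check can be illustrated with Python's standard hmac module. This is a minimal sketch of a keyed MAC over a message, not a reproduction of the TLS record layer; the key, the message, and the function names are invented for the example.

```python
import hashlib
import hmac


def protect(key, message):
    """Compute a SHA-1-based HMAC tag over the message."""
    return hmac.new(key, message, hashlib.sha1).digest()


def verify(key, message, tag):
    """Check a tag in constant time."""
    return hmac.compare_digest(protect(key, message), tag)


# Hypothetical session key and signaling data.
session_key = b"negotiated-session-key"
message = b"INVITE sip:b@b.example SIP/2.0"

tag = protect(session_key, message)
ok = verify(session_key, message, tag)
tampered_ok = verify(session_key, b"INVITE sip:eve@evil.example SIP/2.0", tag)
```

Without the session key, an attacker who alters the message cannot produce a matching tag, so the receiver detects the manipulation.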

In contrast to TLS, Internet Protocol Security (IPSec) resides directly in the IP layer. Developed in the course of IPv6 standardization, IPSec adds security options to IPv4 and also works transparently for the application layer to provide additional protection to protocols without their own security mechanisms. Because it resides one layer below TLS, it does not need a reliable transport protocol and therefore also supports datagram-based protocols like UDP. IPSec supports two different modes: transport and tunnel.
