VoIP converts voice signals from a telephone into digital signals that can be transmitted over the Internet. VoIP is becoming more popular every day.
Nowadays it is common to find leased lines and VPN connections between company branches which are used to transport voice traffic in addition to data.
However, voice has strict real-time requirements in terms of delay, jitter, and bandwidth.
To carry voice effectively over a connectionless, unreliable protocol such as IP, certain mechanisms need to be in place. An upper-layer protocol needs to compensate for IP's lack of ordering and timing guarantees, and different compression algorithms need to exist to satisfy different bandwidth requirements.
All of these things are crucial in designing and operating a VoIP infrastructure, and this is exactly what we’ll talk about today.
Traditional voice networks are based on Time Division Multiplexing, a technique that requires both accurate and timely processing.
Voice traffic has strict requirements in terms of delay, jitter, and packet loss. Circuit-switched telephone exchanges were able to fulfill those requirements and provide reliable, guaranteed voice services.
The need to carry voice traffic over the data network has driven an evolution in voice architecture, in both signaling (call control) and media transport.
The big question is: how can connection-oriented voice traffic be integrated into a connectionless IP network while still providing a reliable service?
VoIP Protocols have been implemented to provide a solution to the above question.
Signaling protocols have been enhanced to address the call control requirements over IP networks. In addition, media transport protocols have been designed to deliver voice packets in order and on time, and to save bandwidth.

Voice media packets are always transported as RTP over UDP; this is a constant attribute. UDP itself offers no ordering or delivery guarantees. RTP adds sequence numbers and timestamps on top of it, which gives the receiver a mechanism for detecting loss, reordering media packets, and synchronizing playout. RTP does not retransmit lost packets.
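The sequencing role of RTP can be illustrated with a minimal sketch. It assumes the fixed 12-byte RTP header layout (version 2, no padding, extension, or CSRC entries) and payload type 0, which is the static type for G.711 mu-law. The helper names `pack_rtp`, `unpack_rtp`, and `reorder` are hypothetical, and 16-bit sequence number wraparound is deliberately ignored.

```python
import struct

# Fixed RTP header: V/P/X/CC byte, M/PT byte, 16-bit sequence number,
# 32-bit timestamp, 32-bit SSRC = 12 bytes in network byte order.
RTP_HEADER = struct.Struct("!BBHII")


def pack_rtp(seq, timestamp, ssrc, payload, payload_type=0):
    """Build an RTP packet: version 2, no padding, extension, or CSRCs."""
    byte0 = 2 << 6                  # version field in the top two bits
    byte1 = payload_type & 0x7F     # marker bit left clear
    header = RTP_HEADER.pack(byte0, byte1, seq & 0xFFFF,
                             timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload


def unpack_rtp(packet):
    """Parse the fixed header and return its fields plus the payload."""
    byte0, byte1, seq, timestamp, ssrc = RTP_HEADER.unpack_from(packet)
    return {
        "version": byte0 >> 6,
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[RTP_HEADER.size:],
    }


def reorder(packets):
    """Sort packets by sequence number and list the missing numbers.

    Simplification: sequence number wraparound is not handled.
    """
    parsed = sorted((unpack_rtp(p) for p in packets), key=lambda p: p["seq"])
    if not parsed:
        return [], []
    seen = {p["seq"] for p in parsed}
    first, last = parsed[0]["seq"], parsed[-1]["seq"]
    missing = [s for s in range(first, last + 1) if s not in seen]
    return parsed, missing


if __name__ == "__main__":
    # Three packets arrive out of order; packet 101 never arrives.
    arrived = [pack_rtp(seq, seq * 160, 0x1234, b"\x00" * 160)
               for seq in (102, 100, 103)]
    ordered, missing = reorder(arrived)
    print([p["seq"] for p in ordered], "missing:", missing)
```

The receiver can put packets back in order and notice the gap, but nothing in RTP asks the sender to resend packet 101.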
RTCP (RTP Control Protocol) runs alongside RTP and provides a mechanism for monitoring QoS parameters, such as packet loss and jitter, on running sessions.
The total bandwidth consumed depends on the compression codec. The codec can be negotiated per call session or preconfigured from the beginning.
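One of the QoS parameters reported through RTCP is interarrival jitter. The sketch below follows the running estimator described in RFC 3550 (each new sample moves the estimate by one sixteenth of the difference). The function name and the sample values are hypothetical, and arrival times are assumed to be expressed in the same units as the RTP timestamps.

```python
def interarrival_jitter(arrival_times, rtp_timestamps):
    """Running jitter estimate in the style of RFC 3550.

    Both sequences must use the same clock units (for example, 8 kHz
    sample ticks for narrowband audio) and have the same length.
    """
    jitter = 0.0
    for i in range(1, len(arrival_times)):
        # Difference between the spacing at the receiver and at the sender.
        d = ((arrival_times[i] - arrival_times[i - 1])
             - (rtp_timestamps[i] - rtp_timestamps[i - 1]))
        # Smooth the estimate with a gain of 1/16.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


if __name__ == "__main__":
    sent = [0, 160, 320]        # one packet every 160 ticks (20 ms at 8 kHz)
    received = [0, 160, 336]    # the third packet arrives 16 ticks late
    print(interarrival_jitter(received, sent))
```

A perfectly paced stream yields a jitter of zero; any variation in spacing raises the estimate gradually rather than all at once.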
A few codec types are presented below along with the perceived codec rate:
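To plan the bandwidth requirements of a WAN link that carries voice traffic, the following per-call formula needs to be considered:
Bandwidth (bps) = (Layer 2 header + IP header + UDP header + RTP header + payload, in bytes) x 8 x packets per second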
IP, UDP and RTP headers have more or less a constant size. IP has a 20-byte header, UDP consists of an 8-byte header and RTP has a 12-byte header.
The variable terms in the formula are the Layer 2 header and the payload size. The latter depends on the codec used, while the former depends on the link layer protocol, e.g. Ethernet, HDLC, Frame Relay, or PPP.
Let’s calculate the Bandwidth requirements for the HDLC data link protocol (approximately 6-byte header size) for the codec types presented above.
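As a rough sketch of this calculation, the snippet below computes per-call bandwidth as total bytes per packet, times 8, times packets per second. The codec figures are assumptions for illustration rather than values taken from this article: G.711 at 64 kbps and G.729 at 8 kbps, both with 20 ms of voice per packet, which gives 50 packets per second and payloads of 160 and 20 bytes. The 6-byte HDLC header is the approximation used above.

```python
IP_HEADER = 20
UDP_HEADER = 8
RTP_HEADER = 12
HDLC_HEADER = 6  # approximate Layer 2 overhead, as stated in the text


def call_bandwidth_bps(payload_bytes, packets_per_second,
                       l2_header_bytes=HDLC_HEADER):
    """Per-call, one-direction bandwidth in bits per second."""
    packet_bytes = (l2_header_bytes + IP_HEADER + UDP_HEADER
                    + RTP_HEADER + payload_bytes)
    return packet_bytes * 8 * packets_per_second


# Assumed codec parameters with 20 ms packetization (50 packets per second).
CODECS = {
    "G.711": {"payload_bytes": 160, "pps": 50},
    "G.729": {"payload_bytes": 20, "pps": 50},
}

if __name__ == "__main__":
    for name, params in CODECS.items():
        bps = call_bandwidth_bps(params["payload_bytes"], params["pps"])
        print(f"{name}: {bps / 1000:.1f} kbps")
```

Under these assumptions a G.711 call needs about 82.4 kbps and a G.729 call about 26.4 kbps in each direction, so the headers more than triple the 8 kbps produced by the G.729 codec itself.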
When codecs with voice activity detection (VAD) are used, extra bandwidth savings are achieved because silence is not transmitted. The approximate bandwidth per call is presented below.
Keep in mind that this is only an approximate estimate and not a fixed per-call value. It is a general observation derived from a sample of more than 20 simultaneous calls on a specific WAN link.
Real-time Transport Protocol (RTP) carries packetized audio and video traffic over an IP network. It was designed to serve the transport requirements of such real-time traffic.
RTP has a minimal 12-byte header which, combined with IP (20-byte header) and UDP (8-byte header), adds up to 40 bytes of headers per packet. That overhead is large relative to a compressed voice payload, so it is inefficient to transmit the full 40-byte header with every packet without compressing it.
By applying RTP header compression (CRTP), the IP/UDP/RTP header in an RTP data packet is reduced from 40 bytes to approximately 2 to 4 bytes (2 bytes when UDP checksums are not sent, 4 bytes when they are), as shown in Figure 2 below.
RTP header compression is a hop-by-hop scheme, so it must be supported and enabled at both ends of every link on which it is used. Details on CRTP can be found in RFC 2508.
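The effect of CRTP on a single link can be estimated with a short sketch. It assumes a G.729 call with a 20-byte payload at 50 packets per second, a 6-byte HDLC header, and compressed headers of 2 or 4 bytes as described in RFC 2508. These parameters are illustrative assumptions, not measurements.

```python
FULL_HEADER = 40  # IP (20) + UDP (8) + RTP (12)
HDLC_HEADER = 6   # approximate Layer 2 overhead


def link_bandwidth_bps(payload_bytes, packets_per_second,
                       l3_header_bytes, l2_header_bytes=HDLC_HEADER):
    """Per-call bandwidth on one link for a given IP/UDP/RTP header size."""
    packet_bytes = l2_header_bytes + l3_header_bytes + payload_bytes
    return packet_bytes * 8 * packets_per_second


if __name__ == "__main__":
    payload, pps = 20, 50  # assumed G.729, 20 ms per packet
    uncompressed = link_bandwidth_bps(payload, pps, FULL_HEADER)
    for compressed_header in (2, 4):
        compressed = link_bandwidth_bps(payload, pps, compressed_header)
        saving = 1 - compressed / uncompressed
        print(f"{compressed_header}-byte header: {compressed} bps "
              f"vs {uncompressed} bps ({saving:.0%} saved)")
```

With these numbers the call drops from 26.4 kbps to roughly 11 to 12 kbps on the compressed link, which is why CRTP pays off most with low-bit-rate codecs on slow WAN links.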

Choosing the right codec to use is not that simple.
If bandwidth is not an issue, the traditional G.711 codec is the best choice. It is the only codec here that achieves an excellent voice quality rating.
If bandwidth and processing resources are a concern, a compromise needs to be made between the CPU resources to spare, the voice quality, and the additional delay introduced by the codec.
In practice, G.729 and G.723 are the most popular compression schemes used.