WebRTC adoption has increased in the past few years and is expected to continue growing in the near future. Gartner expects that by 2019, WebRTC will be utilized for 15% of enterprise voice and video communication. By the end of 2015 there were more than 850 vendors and projects using it, a more than 100% growth in 2 years, which is a good indication that the technology is booming.
We’ve seen a wave of creative uses of WebRTC for communications and collaboration through websites, sales apps, contact centers and customer care and business applications—to name a few.
We believe one reason for this trend is that real-time video enables a live experience across many industries. It is easy for the user (there is no software to download), and for the company it costs a tiny fraction of what providing a live experience otherwise would (compared to running local branches, schools, clinics, and so on). It recaptures the human element that was lost through the “optimization” of customer support. Trying to cut costs by making everything online and asynchronous just made us (the consumers) unhappy. Now live customer experiences can be optimized to save the company money and to keep the customer engaged and happy.
We have been implementing WebRTC-related projects over the past 4 years for organizations of all sizes, from startup companies to large carriers. Among the challenges and lessons learned, there is one which stands out – interoperability.
The Interoperability Challenge
To give you some perspective, our very first project, more than 4 years ago, consisted of developing a gateway to allow WebRTC to interoperate with existing communications infrastructure. Exactly the same scope came up for a more recent project, which leads us to believe the repetitive need for interoperability is not a coincidence. And, it leads us to ask, why is WebRTC interoperability such a big deal?
Of course, those implementing new apps with no integration to legacy communications infrastructure need not worry. However, if you plan to provide a service where customers will eventually want to leverage the infrastructure they have already invested in, you should be aware that some complications may arise.
By implementing WebRTC-related projects for Telecom, Call Center and Enterprise Collaboration scenarios—and even enabling remote video calls to prison inmates—we identified some patterns, involving mainly five aspects of interoperability:
- Signaling
- Call control
- Transcoding
- Identity management
- Security
In communications systems, signaling is the process of coordinating the communication between the parties and through the system. It is separated from the media itself but is necessary to handle user registration, call setup and teardown as well as many other call features.
WebRTC does not specify signaling and leaves it to the system developer to choose. You can create your own simple signaling protocol, as many vendors have done, but the “legacy” world is dominated by SIP (Session Initiation Protocol) or, in some cases, H.323.
However, even when both the WebRTC endpoint and the legacy endpoint use the same signaling protocol, such as SIP, there may be caveats. One of them is that SIP “is an application layer protocol designed to be independent of the underlying transport layer”. In other words, it relies on underlying transport protocols, which may differ at each endpoint. In the case of WebRTC you would typically have SIP over WebSocket on the WebRTC side and SIP over TCP/UDP on the legacy side, creating the need for conversion.
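To make the transport mismatch concrete, here is a minimal Python sketch of what a SIP proxy or gateway does at this boundary: it reads the transport token from the topmost Via header of the request it received, then adds its own Via for the transport it will forward on. The function names, host name and branch values are hypothetical, and a real proxy does considerably more (routing, Record-Route, Max-Forwards, and so on).

```python
def top_via_transport(sip_message):
    """Return the transport token of the topmost Via header, e.g. 'WSS' or 'UDP'."""
    for line in sip_message.split("\r\n")[1:]:
        if line == "":
            break  # blank line marks the end of the headers
        name, _, value = line.partition(":")
        if name.strip().lower() in ("via", "v"):  # 'v' is the compact form
            # The value looks like: SIP/2.0/WSS host;branch=...
            sent_protocol = value.strip().split()[0]
            return sent_protocol.split("/")[2].upper()
    raise ValueError("no Via header found")


def add_gateway_via(sip_message, host, transport, branch):
    """Prepend the gateway's own Via so the request can leave on another transport."""
    start_line, rest = sip_message.split("\r\n", 1)
    via = f"Via: SIP/2.0/{transport} {host};branch={branch}"
    return "\r\n".join([start_line, via, rest])


# Illustrative request as a browser client might send it over a secure WebSocket.
browser_invite = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/WSS df7jal23ls0d.invalid;branch=z9hG4bKbrowser1",
    "From: <sip:alice@example.com>;tag=1",
    "To: <sip:bob@example.com>",
    "Call-ID: abc123",
    "CSeq: 1 INVITE",
    "Content-Length: 0",
    "",
    "",
])

# Hypothetical gateway identity; the legacy side is assumed to speak SIP over UDP.
forwarded = add_gateway_via(browser_invite, "gw.example.com", "UDP", "z9hG4bKgw1")
```

The original WebSocket Via stays in place below the new one, which is how the response later finds its way back to the browser.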
The solution for this and other interoperability challenges is the use of gateways and proxies that are able to translate protocols efficiently. We mention some of these tools below.
Legacy communication platforms usually support multiple call control features, such as hold, park and transfer. If you are going to interoperate with existing systems, you will need to decide which of these call controls your customers’ use cases require, and then how to implement each of them on the WebRTC side. This could be solved in gateway software, but it ultimately requires development that enables efficient management of voice and video, so that when the conversation is put on hold, for example, video content is not sent, which saves bandwidth.
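As an illustration of hold, one common approach in SIP-style offer/answer signaling is to change the direction attribute of each media stream in the SDP, which tells the other party to stop sending media. The sketch below assumes every media section carries an explicit direction attribute; the function name and the sample SDP are hypothetical, and whether the holding side keeps sending something (such as hold music) is an application decision.

```python
# Direction changes for placing streams on hold in an SDP offer.
HOLD_DIRECTION = {
    "a=sendrecv": "a=sendonly",  # we may still send, the peer stops sending
    "a=recvonly": "a=inactive",  # nothing flows in either direction
}


def hold_sdp(sdp):
    """Return a copy of the SDP with every stream direction switched to 'held'."""
    lines = sdp.split("\r\n")
    return "\r\n".join(HOLD_DIRECTION.get(line, line) for line in lines)


# Trimmed, illustrative SDP with one audio and one video section.
call_sdp = "\r\n".join([
    "v=0",
    "o=- 1 1 IN IP4 192.0.2.1",
    "s=-",
    "t=0 0",
    "m=audio 9 UDP/TLS/RTP/SAVPF 111",
    "a=sendrecv",
    "m=video 9 UDP/TLS/RTP/SAVPF 96",
    "a=sendrecv",
    "",
])

held_sdp = hold_sdp(call_sdp)
```

A gateway would send the modified SDP in a re-INVITE (or the equivalent message in a custom signaling protocol) and reverse the mapping to resume the call.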
WebRTC implementations use modern video codecs, such as VP8 and VP9, whereas legacy networks use several different, usually older ones, which may require conversion. A common example of a video codec requiring conversion is H.264, although in the future (very soon, we hope) WebRTC implementations in all browser clients will support it, given that it is now a specification requirement. There are also common audio codecs in legacy networks, such as AMR, AMR-WB and G.722.1, that are not currently supported by WebRTC and may require transcoding. There are discussions on whether some of them should be included in the WebRTC specification in the future, as a recent memo from the IETF (Internet Engineering Task Force) suggests.
Transcoding is also needed when the endpoints use variants of the same codec. For example, Microsoft created H.264UC, which does not work with the common H.264AVC.
Transcoding video is undesirable because it is CPU-intensive, but it is sometimes unavoidable if the objective is to achieve interoperability. One of our clients, for example, uses an array of 60+ core computers only for that task.
Although there are many discrete aspects to transcoding, the point here is that all of these challenges can be solved so that WebRTC interoperates with existing infrastructure.
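A rough way to tell whether a given call will need transcoding is to compare the codecs each side lists in its SDP. The Python sketch below is only an illustration: the function names and sample SDP fragments are hypothetical, it assumes every format has an `a=rtpmap` line, and a matching codec name does not guarantee compatibility (the H.264 variants mentioned above are an example).

```python
# Payload formats that ride alongside a codec but are not codecs themselves.
AUXILIARY_FORMATS = {"TELEPHONE-EVENT", "RTX", "RED", "ULPFEC", "CN"}


def codecs_by_media(sdp):
    """Map each media kind ('audio', 'video') to the set of codec names it lists."""
    codecs = {}
    current = None
    for line in sdp.splitlines():
        if line.startswith("m="):
            current = line[2:].split()[0]
            codecs.setdefault(current, set())
        elif line.startswith("a=rtpmap:") and current is not None:
            # a=rtpmap:<payload type> <encoding name>/<clock rate>[/<channels>]
            encoding = line.split(" ", 1)[1].split("/")[0].upper()
            if encoding not in AUXILIARY_FORMATS:
                codecs[current].add(encoding)
    return codecs


def media_needing_transcoding(webrtc_sdp, legacy_sdp):
    """Return the media kinds for which the two sides share no codec."""
    ours = codecs_by_media(webrtc_sdp)
    theirs = codecs_by_media(legacy_sdp)
    shared_kinds = ours.keys() & theirs.keys()
    return sorted(kind for kind in shared_kinds if not ours[kind] & theirs[kind])


# Illustrative fragments: both sides share PCMU audio but no video codec.
webrtc_sdp = "\r\n".join([
    "m=audio 9 UDP/TLS/RTP/SAVPF 111 0 126",
    "a=rtpmap:111 opus/48000/2",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:126 telephone-event/8000",
    "m=video 9 UDP/TLS/RTP/SAVPF 96",
    "a=rtpmap:96 VP8/90000",
])
legacy_sdp = "\r\n".join([
    "m=audio 49170 RTP/AVP 0 97 101",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:97 AMR/8000",
    "a=rtpmap:101 telephone-event/8000",
    "m=video 49172 RTP/AVP 99",
    "a=rtpmap:99 H264/90000",
])

to_transcode = media_needing_transcoding(webrtc_sdp, legacy_sdp)
```

In this example only the video stream would have to pass through a transcoder, while audio could be relayed as PCMU.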
Legacy communication platforms usually control subscriber identity, if not for billing, at least for security and feature enablement. There are use cases where WebRTC clients need to register with these legacy systems, which introduces a need for unified identification of users that must be handled by the applications or in a gateway device.
When it comes to security, WebRTC media is always encrypted, but legacy communication infrastructure unfortunately often is not. Sometimes the only feasible way for endpoints on both sides to interoperate is to encrypt or decrypt media at the boundary. For example, a WebRTC gateway may convert media from RTP (Real-time Transport Protocol), a network protocol for delivering audio and video over IP networks, to its secure counterpart SRTP, with keys negotiated via DTLS-SRTP.
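Whether a gateway has to bridge encryption can often be read from the transport profile on each side's SDP media line: plain RTP is typically advertised as `RTP/AVP`, while secure profiles contain `SAVP` (for example `UDP/TLS/RTP/SAVPF` on the WebRTC side). The following is a minimal sketch with hypothetical function names, not a complete SDP parser.

```python
def media_is_encrypted(m_line):
    """True if the m= line advertises a secure RTP profile."""
    # m=<media> <port> <proto> <fmt> ...
    proto = m_line.split()[2].upper()
    return "SAVP" in proto  # matches RTP/SAVP, RTP/SAVPF, UDP/TLS/RTP/SAVPF


def gateway_must_bridge_encryption(webrtc_m_line, legacy_m_line):
    """True if exactly one side is encrypted, so media must be (de)crypted in between."""
    return media_is_encrypted(webrtc_m_line) != media_is_encrypted(legacy_m_line)


# Illustrative media lines for a browser and a legacy SIP phone.
webrtc_audio = "m=audio 9 UDP/TLS/RTP/SAVPF 111 0"
legacy_audio = "m=audio 49170 RTP/AVP 0 8"

bridge_needed = gateway_must_bridge_encryption(webrtc_audio, legacy_audio)
```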
Other P2P Challenges
Because legacy networks are behind firewalls and other security mechanisms, a technique known as ICE (Interactive Connectivity Establishment) is usually used to find the best path to connect peers, so that communication stays peer-to-peer as much as possible. Although not a major challenge in WebRTC interop initiatives, ICE is important to consider because it leverages two other technologies that may be components of an interop architecture: STUN and TURN.
The basic steps in this mechanism are:
- ICE tries to obtain a direct connection between peers.
- If that fails (for example when peers are behind NATs), ICE obtains an external address using a STUN (Session Traversal Utilities for NAT) server.
- If that also fails, ICE falls back to a TURN (Traversal Using Relays around NAT) server, an intermediary server which improves call success rate, but also increases bandwidth consumption.
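The order of the steps above follows from how ICE ranks its candidate addresses. The ICE specification (RFC 8445) recommends a priority formula in which direct (host) candidates outrank STUN-derived (server reflexive) ones, which in turn outrank TURN relays. The Python sketch below applies that formula with the recommended type preferences; the function name and default arguments are our own choices for illustration.

```python
# Type preferences recommended by RFC 8445 (higher is preferred).
TYPE_PREFERENCE = {
    "host": 126,   # address taken directly from a local interface
    "prflx": 110,  # peer reflexive, discovered during connectivity checks
    "srflx": 100,  # server reflexive, the external address learned via STUN
    "relay": 0,    # address allocated on a TURN server
}


def candidate_priority(candidate_type, local_preference=65535, component_id=1):
    """priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)."""
    return (
        (TYPE_PREFERENCE[candidate_type] << 24)
        + (local_preference << 8)
        + (256 - component_id)
    )


# Sort candidate types from most to least preferred.
ranked = sorted(TYPE_PREFERENCE, key=candidate_priority, reverse=True)
```

With these defaults a host candidate gets priority 2130706431 and a relay candidate 16777215, so a relayed path is only chosen when the higher-priority pairs fail their connectivity checks.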
Legacy network endpoints usually do not take advantage of ICE, so gateways that leverage this technology may be used to allow external media to reach endpoints deep inside the legacy infrastructure.
Luckily, many open source tools are available to help overcome interoperability challenges, and most are a good fit for WebRTC and legacy infrastructures. Here are a few we have experience using, along with a short description from their websites:
- Kamailio: a high performance open source SIP server, able to handle thousands of call setups per second
- FreeSWITCH: An open-source telephony platform designed to facilitate the creation of voice and chat driven products
- reSIProcate: set of components including a SIP stack implementation and a few related protocols
- webrtc2sip: a gateway that allows a web browser to make and receive calls from/to any SIP-legacy network or PSTN
- libnice: an implementation of the Interactive Connectivity Establishment (ICE) standard and the Session Traversal Utilities for NAT (STUN) standard. It automates the process of traversing NATs and provides security against some attacks. It also allows applications to create reliable streams using a TCP over UDP layer.
- PJSIP: an open source SIP, media, and NAT traversal library implementing standard based protocols such as SIP, SDP, RTP, STUN, TURN, and ICE
Our experience shows that success requires choosing the right pieces for each scenario, and good engineering practices to build a scalable, reliable architecture out of the chosen options. Although the tools themselves are great, they will be part of a multi-component architecture, which will need to integrate with large, mission-critical communication infrastructures with challenging availability requirements. So, yes, it can be complicated, but solving interoperability challenges is very doable.
On the Horizon for WebRTC
Several emerging technologies, protocols and vendor solutions may change the WebRTC interoperability scenario in the short term and are important to watch. Among them we highlight:
- Next generation of audio and video codecs: VP9 and H.265 for video, and iSAC, iLBC for audio.
- New Enterprise SBC (Session Border Controller) features. E-SBCs are devices used between Enterprise networks and Session Initiation Protocol (SIP) trunking providers, as well as between different enterprise unified communications (UC) platforms, and between UC endpoints and the associated UC platforms.
- Some E-SBC vendors are starting to include gateways to enable WebRTC endpoints to connect to non-WebRTC devices, such as to a phone connected through the PSTN, which may facilitate interoperability.
- Microsoft has launched a variant of WebRTC called ORTC (Object Real-Time Communications), which is not currently interoperable with WebRTC out of the box. If you need to support it as well, it will introduce additional interop challenges in your environment.
WebRTC is gaining momentum in multiple industries, and some use cases still require interoperability with legacy systems. Challenges for WebRTC interoperability primarily revolve around signaling, call control, transcoding, identity management and security. There are a number of open source tools that help address these challenges. However, as these components are added to your infrastructure, new concerns will arise, such as how to scale, monitor and manage all of them, which may involve a learning curve and careful engineering. Additionally, as new codecs and protocols emerge, we believe interoperability will continue to be a challenge, although it is likely to become simpler as mainstream adoption grows and as infrastructure equipment vendors and web browsers add native support for the same emerging protocols.
In our next blog, we will cover the unique implementation challenges of three different communication infrastructure case studies: Enterprise Collaboration, Call Center, and Telecom.