Redundancy Solutions Maximize Uptime

By David Weiss

With the ongoing migration of voice communications toward server-based facilities and IP networks, today’s call centers are susceptible to all of the maladies of the network world, including hacker attacks, viruses, and Trojan horses. While many companies have embraced redundancy in the data center, few have recognized the need to incorporate redundant systems into their voice technologies. Because the corporate world hinges on a constant stream of data and voice communications, organizations – and especially call centers – must provide the highest degree of fault tolerance to maintain these vital links to clients, customers, and employees.

Historically, the resiliency of the voice network meant that disaster-planning budgets were focused on other areas of concern. Now, call centers are deploying server-based PBXs, VoIP solutions, conference bridges, and related systems. Providing new capabilities at a fraction of the cost, these services are increasingly critical to day-to-day business operations and, as such, they must deliver the levels of uptime that users expect. However, because these systems are increasingly assembled from the products of multiple hardware and software vendors, the likelihood that they will experience failures is rising.

Failures can occur at any level. Redundancy solutions are a key part of planning for the inevitable. Fail-safe options at every level, including failover servers, diverse phone lines, media storage, and hot sites, can all minimize unscheduled downtime and prevent real disasters.

Minimizing Failures on the Line Side: A highly available solution attempts to eliminate single points of failure in all aspects of a system’s design. To minimize the potential for failure, network planners need to evaluate both link redundancy and hardware redundancy. Carriers can also provide diversity and avoidance services to help minimize risks.

Diversity, which is available at both the local and long-distance levels, refers to redundant services; avoidance ensures that those redundant services do not share common facilities. Loop diversity provides two redundant circuits from a local point of presence (POP) to your call center. POP diversity, in which local links originate from multiple wire centers or POPs, is an ideal solution. Interoffice diversity provides the same level of service between wire centers.
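The idea of avoidance can be checked against a circuit inventory. The following is a minimal sketch, assuming each circuit's path is known as a list of facility names; the function name, the circuit names, and the facility names are all hypothetical and are not taken from any carrier's records or tools.

```python
from itertools import combinations


def shared_facilities(circuits):
    """Return {(circuit_a, circuit_b): shared_facilities} for every pair of
    circuits whose paths have at least one facility in common."""
    conflicts = {}
    for (name_a, path_a), (name_b, path_b) in combinations(sorted(circuits.items()), 2):
        common = set(path_a) & set(path_b)
        if common:
            conflicts[(name_a, name_b)] = common
    return conflicts


# Hypothetical inventory: each circuit lists the facilities it traverses.
circuits = {
    "primary": ["wire-center-A", "conduit-1", "pop-east"],
    "backup": ["wire-center-B", "conduit-1", "pop-west"],
}

# The two circuits are diverse at the wire center and the POP, but both
# run through the same conduit, so avoidance is not satisfied.
print(shared_facilities(circuits))
```

An empty result means no pair of circuits in the inventory shares a listed facility; it says nothing about facilities that are missing from the inventory.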

Diversity services may or may not include the customer premises equipment necessary to switch between redundant links. Protection switching for redundant T-1 or DS-3 circuits is either provided by the carrier or purchased and installed by the call center. Any service degradation or interruption is detected automatically, and traffic switches over to a spare circuit. Protection switching is either 1:1, with a standby circuit for each primary circuit, or 1:N, with one spare circuit shared among several primary circuits. Even with VoIP solutions, any gateway to the public switched network involves local loops and carrier services, and it is essential that this critical link to customers is not overlooked.
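The practical difference between 1:1 and 1:N protection appears when more than one primary circuit fails at once. The sketch below is a simplified model only, not a description of any real protection switch; the `ProtectionGroup` class and the circuit names are hypothetical.

```python
class ProtectionGroup:
    """One spare circuit protecting one or more primary circuits.

    A group with a single primary models 1:1 protection; a group with
    several primaries models 1:N protection.
    """

    def __init__(self, spare, primaries):
        self.spare = spare
        self.primaries = list(primaries)
        self.protected = None  # primary whose traffic is on the spare, if any

    def report_failure(self, primary):
        """Try to move a failed primary's traffic to the spare.

        Returns True if the traffic is (or already was) on the spare,
        False if the spare is busy carrying another primary's traffic.
        """
        if primary not in self.primaries:
            raise ValueError(f"{primary} is not protected by {self.spare}")
        if self.protected == primary:
            return True
        if self.protected is not None:
            return False
        self.protected = primary
        return True

    def restore(self, primary):
        """Return traffic to a repaired primary and free the spare."""
        if self.protected == primary:
            self.protected = None


# 1:N: three T-1 circuits share one spare.
one_to_n = ProtectionGroup("spare-1", ["t1-a", "t1-b", "t1-c"])
first = one_to_n.report_failure("t1-a")   # spare takes over for t1-a
second = one_to_n.report_failure("t1-b")  # spare is busy, t1-b stays down
print(first, second)

# 1:1: every primary has its own spare, so both failures are covered.
one_to_one = [ProtectionGroup("spare-a", ["t1-a"]), ProtectionGroup("spare-b", ["t1-b"])]
print([g.report_failure(g.primaries[0]) for g in one_to_one])
```

In this model, 1:N trades cost for coverage: it protects against any single circuit failure, while 1:1 also covers simultaneous failures on different primaries.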

Minimizing Failures on the Equipment Side: The migration of telephony services to a server-based platform has prompted the wide acceptance of IP-based systems. Advantages such as an open and flexible architecture, standardized components, multiple sources, and lower costs mean servers are now common for call center systems, voicemail, call recorders, and other voice technologies. Although processors, memory, storage, power supplies, telephony boards, operating systems, and application software are supplied by “best of breed” vendors, it is the end user who must ensure that all the pieces work together seamlessly.

A critical metric for each piece is its availability, or readiness to perform its stated function at any given time. To achieve the best availability possible, system engineers must look at maximizing reliability and minimizing both scheduled and unscheduled downtime. Redundancy is the key to providing both the maximum reliability and the minimum repair time. At the component level, redundancy is commonplace for the parts most likely to fail, such as disk drives, power supplies, and other electro-mechanical items. Redundant arrays of inexpensive disks (RAID) and redundant power supplies are standard with most systems for telephony services.
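Availability can be put into numbers. The sketch below uses the common steady-state formula, availability = MTBF / (MTBF + MTTR), where MTBF is mean time between failures and MTTR is mean time to repair. These terms, the function names, and the sample figures are illustrative assumptions and do not come from the article. The redundancy calculation further assumes that the redundant units fail independently and that switchover is instantaneous, which real systems only approximate.

```python
HOURS_PER_YEAR = 8760  # 365 days; an assumption for the yearly estimate


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability of a single unit."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def redundant_availability(single, copies=2):
    """Availability of `copies` identical units where any one suffices,
    assuming independent failures and instant switchover."""
    return 1 - (1 - single) ** copies


def downtime_hours_per_year(avail):
    """Expected unscheduled downtime per year for a given availability."""
    return (1 - avail) * HOURS_PER_YEAR


# Hypothetical server: fails on average every 999 hours, takes 1 hour to repair.
single = availability(mtbf_hours=999, mttr_hours=1)
pair = redundant_availability(single, copies=2)

print(f"single server: {single:.6f} availability, "
      f"{downtime_hours_per_year(single):.2f} h/year down")
print(f"redundant pair: {pair:.6f} availability, "
      f"{downtime_hours_per_year(pair) * 3600:.0f} s/year down")
```

Under these assumptions, the formula shows the two levers the text describes: raising MTBF improves reliability, and lowering MTTR shortens repair time.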

For more complex applications, system-level redundancy needs to be evaluated. With hot standby systems, two identical systems are installed. The standby system continuously monitors the health of the primary system and, upon detecting a failure, automatically switches itself into service. Deploying two systems also allows scheduled maintenance to be performed with minimal interruption, as one server is upgraded while the other is in service. And, if a planned upgrade or new software installation goes awry, the organization has an immediate, graceful fallback.
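The monitoring loop of a hot standby system can be sketched in a few lines. This is an illustration under stated assumptions, not a description of any particular product: the `StandbyMonitor` class and its callbacks are hypothetical, and the rule of failing over only after several consecutive missed health checks is an added assumption intended to avoid switching on a single transient glitch.

```python
class StandbyMonitor:
    """Standby-side logic: poll the primary and take over when it stops responding."""

    def __init__(self, check_primary, activate_standby, max_missed=3):
        self.check_primary = check_primary        # callable returning True if healthy
        self.activate_standby = activate_standby  # callable that puts the standby in service
        self.max_missed = max_missed              # consecutive failures before failover
        self.missed = 0
        self.standby_active = False

    def poll(self):
        """Run one health check; return True once the standby is in service."""
        if self.standby_active:
            return True
        if self.check_primary():
            self.missed = 0
        else:
            self.missed += 1
            if self.missed >= self.max_missed:
                self.activate_standby()
                self.standby_active = True
        return self.standby_active


# Simulated health checks: the primary answers once, then goes silent.
results = iter([True, False, False, False])
events = []
monitor = StandbyMonitor(
    check_primary=lambda: next(results),
    activate_standby=lambda: events.append("failover"),
    max_missed=3,
)
states = [monitor.poll() for _ in range(4)]
print(states, events)
```

In a real deployment, `check_primary` would be a heartbeat or application-level probe run on a timer, and `activate_standby` would trigger whatever switches the lines and services over to the standby.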

Redundancy switches essentially perform the same function as a patch panel, but they do so automatically and simultaneously for all circuits. They operate on the physical layer, moving the actual wires from phone lines and operator instruments to telephony boards in the system. With IP phone systems, switching is only required for the phone line side. As the central component of a fault-tolerant solution, the redundancy switch itself must not become a single point of failure; magnetic latching relays address this by providing a continuous mechanical connection in all circumstances.

Although redundant servers are optimal solutions, several issues need to be considered. Databases must be synchronized so that all configurations, security settings, and call logs remain identical. Licensing is also a factor. If redundant systems share the same licenses, they either need to have the dongles switched or require an add-on module to support an automatic redundancy switchover.

Conclusion: For call centers, voice technology is essential for effective corporate communications and customer service. To maintain these systems and ensure their optimal operation, organizations must consider the level of fault tolerance required and examine how redundancy solutions can help meet these objectives. Deploying diversity and avoidance services, as well as protection switching, will help minimize failures on the line side. In addition, standby systems and redundancy switches will mitigate the risk of failures on the equipment side. Unplanned downtime is clearly not acceptable, and redundancy solutions maximize uptime and allow voice communications to continue in the event of system failures.

David Weiss has nearly twenty years of experience in product management, business development, and sales and marketing. He is an expert in the remote site management technology industry. David serves as the president and CEO of Dataprobe, a remote site management and monitoring solutions provider.

[From the April/May 2008 issue of AnswerStat magazine]