|
|
SpeechBench is a tool designed to voice-enable web content for access over telephones as well as browsers. |
|
|
In the early days, voice enabling was achieved through complex TAPI programming. As things progressed, several markup languages appeared, and the standard that took hold for voice was VoiceXML (VXML), an XML-based markup language. |
|
A similarly conceived language that is now beginning to emerge as a competing standard is SALT. Both VXML and SALT are based on open standards: VXML is supported by a consortium of industry leaders headed by IBM, while SALT is the result of industry efforts championed by Microsoft. |
|
One key differentiator is that VXML is suited to creating telephony-only voice applications, whereas SALT is multimodal - meaning its applications can be accessed in voice mode via telephones, devices and browsers. IBM's recent remedy for this is the X+V approach: to create multimodal voice applications, developers need to write both XHTML and VXML. |
|
An interesting note on the X+V approach: if it were possible to get developers to write strict HTML (akin to XHTML) content, then our core SpeechFirst technology would be capable of voice-enabling ANY content for any output device on the fly! |
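Why strict markup matters here can be shown with a small sketch. It is not the SpeechFirst implementation, which this document does not describe; the function name `speakable_text` and the choice of "speakable" tags are assumptions made purely for illustration. The point is only that well-formed, XHTML-style content can be walked by a standard XML parser and turned into prompts, while loosely written HTML fails at the parsing step.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "{http://www.w3.org/1999/xhtml}"
# Assumption: only these elements are read aloud in this sketch.
SPEAKABLE = {"h1", "h2", "h3", "p", "li"}


def speakable_text(xhtml):
    """Return the text of speakable elements, in document order.

    Raises xml.etree.ElementTree.ParseError if the markup is not well-formed,
    which is why strict (XHTML-like) content is assumed.
    """
    root = ET.fromstring(xhtml)
    prompts = []
    for element in root.iter():
        tag = element.tag.replace(XHTML_NS, "")
        if tag in SPEAKABLE:
            # Flatten inline children (<b>, <a>, ...) and normalize whitespace.
            text = " ".join("".join(element.itertext()).split())
            if text:
                prompts.append(text)
    return prompts


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    "<h1>Store hours</h1>"
    "<p>Open <b>9 to 5</b>, Monday to Friday.</p>"
    "</body></html>"
)
for prompt in speakable_text(page):
    print(prompt)
```

Each returned string could then be handed to whatever speech output the target device uses; that hand-off is outside the scope of this sketch.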
|
The Given |
|
Before either of these had become a standard, we had begun work on our own markup language for voice access. But as VXML and SALT became the dominant standards, there was no room for a new one, and certainly not one from a small, privately held organization. |
|
We could choose to either incline towards one or work to leverage both. |
|
|
Voice application development requires knowledge of voice markup languages, voice interface design and an understanding of telephony. |
|
|
VXML & SALT programmers are in short supply, with direct annual wages of about $100,000 for a capable programmer. |
|
|
Deployment of voice applications could require investment in infrastructure upgrades in addition to "voice servers". |
|
|
Having evolved out of legacy approaches, voice systems from the leading pure-play voice vendors are expensive to deploy and even more expensive to maintain. This is evidenced by the SLA and support income reported in the financial statements of these major players. |
|
|
Due to mergers and acquisitions, many organizations have inherited dual systems, creating a maintenance nightmare and diluting management's cost-saving dreams. |
|
|
The Needs |
|
Although larger companies guard their spending vigorously, they can afford to invest a substantial amount in voice automation. Notwithstanding the ROI of voice applications, smaller organizations do not have the wherewithal to invest in such embellishments. We believe the slow uptake of voice automation is directly related to its high cost and the shortage of capable developers. This divide has to be bridged.
|
|
For the customer who wants a voice application built, the need is not for yet another markup language. The need is for a common approach that can provide a great application at the lowest possible cost, with the least possible infrastructure investment and minimal resources to manage. Our approach had to be intelligent scripting and automation of processes. In search of neutrality, we chose to adopt HTML as a base! |
|
Another key point to consider was that as the X+V approach or the VXML & SALT standards evolve, or as yet others appear, smaller companies that do not have dedicated resources would be forced to re-engineer their applications. |
|
We went to the drawing board after studying the domain and debating the issue for a full year! |
|
The Approach |
|
Based on our study of the domain and of the high-profile failures in the industry, we felt a wider approach would give the technology a fair shot. The result had to be a simple technology that yielded affordable voice applications - the dilemma was how to build in provisions for the future. |
|
We decided to create an efficient, invisible "middle layer". We leveraged our knowledge of markup languages and created SIVA (Speech Interactive Voice Attributes), a middle layer of 'attributes of markups'. SIVA would provide a core set of attributes (of no concern to the developer) to which specific VXML tags, SALT tags or any other tags could then be mapped! As far as developers are concerned, they are simply developing a voice application; in the end they just click "Compile VXML" or "Compile SALT" and the application is ready for hosting on the appropriate environment. |
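The middle-layer idea can be sketched as follows. The SIVA attribute set has not been published in this document, so `VoiceStep` and its fields are hypothetical stand-ins for it, and `compile_vxml`, `compile_salt` and `compile_app` are illustrative names rather than SpeechBench functions. The generated markup is deliberately simplified (a real SALT page, for instance, would be embedded in a host document that declares the `salt` namespace); the sketch only shows one neutral description being mapped onto two different tag sets.

```python
from dataclasses import dataclass
from typing import Optional
from xml.sax.saxutils import escape, quoteattr


@dataclass
class VoiceStep:
    """One markup-neutral dialog step (hypothetical stand-in for SIVA attributes)."""
    name: str
    prompt: str
    grammar: Optional[str] = None  # grammar file URI; None means "prompt only"


def compile_vxml(steps):
    """Map neutral steps onto simplified VoiceXML elements."""
    lines = ['<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">', "  <form>"]
    for step in steps:
        if step.grammar is None:
            lines.append(f"    <block><prompt>{escape(step.prompt)}</prompt></block>")
        else:
            lines.append(f"    <field name={quoteattr(step.name)}>")
            lines.append(f"      <prompt>{escape(step.prompt)}</prompt>")
            lines.append(f"      <grammar src={quoteattr(step.grammar)}/>")
            lines.append("    </field>")
    lines += ["  </form>", "</vxml>"]
    return "\n".join(lines)


def compile_salt(steps):
    """Map the same neutral steps onto simplified SALT prompt/listen elements."""
    lines = []
    for step in steps:
        prompt_id = quoteattr("say_" + step.name)
        lines.append(f"<salt:prompt id={prompt_id}>{escape(step.prompt)}</salt:prompt>")
        if step.grammar is not None:
            lines.append(f"<salt:listen id={quoteattr('hear_' + step.name)}>")
            lines.append(f"  <salt:grammar src={quoteattr(step.grammar)}/>")
            lines.append("</salt:listen>")
    return "\n".join(lines)


# The "Compile VXML" / "Compile SALT" choice becomes a lookup on the target name.
COMPILERS = {"vxml": compile_vxml, "salt": compile_salt}


def compile_app(steps, target):
    return COMPILERS[target](steps)


app = [
    VoiceStep("welcome", "Welcome to the store locator."),
    VoiceStep("city", "Which city are you in?", grammar="city.grxml"),
]
print(compile_app(app, "vxml"))
print(compile_app(app, "salt"))
```

Under this arrangement, supporting a further markup would mean adding one more entry to `COMPILERS` while the application description stays unchanged, which is the property the middle layer is meant to provide.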
|
The SIVA standards will be published, and the open source community will likely enhance them. The SIVA mapping is housed in SpeechBench, which is proprietary to us, giving us the leverage to keep pace with future standards and to create optimal voice solutions. |
|
As a voice authoring tool, SpeechBench sits above both voice standards: it is neutral to the specific markup the developer chooses to generate. This offers the industry leaders an opportunity to sell their speech hardware and platform software to a wider market. |
|
Conceptually, a SIVA-based, technology-neutral voice/telephony interpreter could be created in the future to run voice applications developed using SpeechBench, pure VXML or pure SALT. |
|