Analyze Text - Deepgram

Analyze text with Deepgram's text intelligence API.

{
  "metadata": {
    "request_id": "9a110df0-17bb-40ec-94c2-3cb7f862a045",
    "created": "2024-02-02T19:00:43.271Z",
    "language": "en",
    "summary_info": { "model_uuid": "67875a7f-c9c4-48a0-aa55-5bdb8a91c34a", "input_tokens": 1855, "output_tokens": 123 },
    "sentiment_info": { "model_uuid": "ba5b22e4-b39a-4550-a4bc-d8655f5092bc", "input_tokens": 2043, "output_tokens": 2047 },
    "topics_info": { "model_uuid": "ba5b22e4-b39a-4550-a4bc-d8655f5092bc", "input_tokens": 2043, "output_tokens": 225 },
    "intents_info": { "model_uuid": "ba5b22e4-b39a-4550-a4bc-d8655f5092bc", "input_tokens": 2043, "output_tokens": 65 }
  },
  "results": {
    "summary": {
      "text": "The potential for voice-based interfaces in conversational AI applications is discussed, with a focus on voice-premises and wearable devices. The success of voice-first experiences and tools, including DeepgramQuad, has led to rapid development of these technologies. The speakers emphasize the benefits of voice quality, including the ability to swap between voices, the naturalness of the flow of conversations, and the importance of tailoring voice to specific applications. They also discuss the potential for AI to be a panacea for speech recognition and text-to-speech capabilities, with a focus on speed, quality, and cost-efficiency."
    },
    "topics": {
      "segments": [
        { "text": "Meet Deepgram Aura: real-time text-to-speech for real-time AI agents ---------- It’s been a year since large language models (LLMs) seemingly went mainstream overnight (Happy Birthday, ChatGPT!!!), and the world has witnessed both rapid development of these technologies and immense interest in their potential.", "start_word": 1, "end_word": 43, "topics": [ { "topic": "Real-time text-to-speech", "confidence_score": 0.02132084 } ] },
        { "text": "We believe that we have reached an inflection point where voice-based interfaces will be the primary means to accessing LLMs and the experiences they unlock.", "start_word": 43, "end_word": 67, "topics": [ { "topic": "Llms experiences", "confidence_score": 0.017349197 }, { "topic": "Voice-based interfaces", "confidence_score": 0.79457426 } ] },
        { "text": "Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stack has advanced sufficiently to support productive (not frustrating) voice-powered AI assistants and agents that can interact with humans in a natural manner.", "start_word": 158, "end_word": 194, "topics": [ { "topic": "Voice-powered ai agents", "confidence_score": 0.0036265212 } ] },
        { "text": "That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.", "start_word": 270, "end_word": 289, "topics": [ { "topic": "Conversational ai", "confidence_score": 0.26679963 } ] },
        { "text": "We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately.", "start_word": 333, "end_word": 371, "topics": [ { "topic": "Aura integration", "confidence_score": 0.2293859 } ] },
        { "text": "High Production is all about crafting the perfect voice. It's used in projects where every tone and inflection matters, like in video games or audiobooks, to really bring a scene or story to life.", "start_word": 434, "end_word": 467, "topics": [ { "topic": "High throughput", "confidence_score": 0.4301796 } ] },
        { "text": "High Production is all about crafting the perfect voice. It's used in projects where every tone and inflection matters, like in video games or audiobooks, to really bring a scene or story to life. Here, voice quality is king, with creators investing hours to fine-tune every detail for a powerful emotional impact.", "start_word": 434, "end_word": 485, "topics": [ { "topic": "Voice ai technology", "confidence_score": 0.5011565 } ] },
        { "text": "These tasks are relevant to just about everyone on the planet, and they require fast, efficient text-to-speech conversion for an AI agent to fulfill them.", "start_word": 569, "end_word": 593, "topics": [ { "topic": "Text-to-speech conversion", "confidence_score": 0.56403285 }, { "topic": "Importance of text-to-speech", "confidence_score": 0.0008262385 } ] },
        { "text": "While voice quality is still important to keep users engaged, quality here is more about the naturalness of the flow of conversation and less about sounding like Morgan Freeman.", "start_word": 594, "end_word": 622, "topics": [ { "topic": "Conversation quality", "confidence_score": 0.0012676471 } ] },
        { "text": "While voice quality is still important to keep users engaged, quality here is more about the naturalness of the flow of conversation and less about sounding like Morgan Freeman. But the primary focus for most customers in this category is on improving customer outcomes, meaning speed and efficiency are must-haves for ensuring these everyday exchanges are smooth and reliable at high volume.", "start_word": 594, "end_word": 655, "topics": [ { "topic": "Quality assurance", "confidence_score": 0.08728356 }, { "topic": "User engagement", "confidence_score": 0.011892504 } ] },
        { "text": "And our customers would be more than satisfied with the conversation quality.\" Jordan Dearsley, Co-founder at Vapi Although high production use cases seem to be well-served with UI-centric production tools, high throughput, real-time use cases still mostly rely on APIs provided by the major cloud providers.", "start_word": 672, "end_word": 717, "topics": [ { "topic": "Conversation quality", "confidence_score": 0.9803023 } ] },
        { "text": "Furthermore, we are dedicated to tailoring these voices to their specific applications, ensuring they remain composed and articulate, particularly in enunciating account numbers and business names with precision.", "start_word": 818, "end_word": 845, "topics": [ { "topic": "Ai-based voice recognition", "confidence_score": 0.003596105 } ] },
        { "text": "The quality and overall performance will continue to improve with additional model training and refinement. We encourage you to give them a listen and note the naturalness of their cadence, rhythm, and tone in the flow of conversation with another human.", "start_word": 953, "end_word": 993, "topics": [ { "topic": "Aura performance", "confidence_score": 0.0024802522 } ] },
        { "text": "Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions.", "start_word": 1018, "end_word": 1030, "topics": [ { "topic": "Api-based transcriptions", "confidence_score": 0.004975504 } ] },
        { "text": "Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations.", "start_word": 1032, "end_word": 1071, "topics": [ { "topic": "Languages", "confidence_score": 0.0004280001 } ] },
        { "text": "So what can you expect from Aura? Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones.", "start_word": 1223, "end_word": 1266, "topics": [ { "topic": "Aura", "confidence_score": 0.3260436 }, { "topic": "Aura", "confidence_score": 0.032662213 } ] },
        { "text": "Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones.", "start_word": 1230, "end_word": 1266, "topics": [ { "topic": "Performance", "confidence_score": 0.0063725146 } ] },
        { "text": "\"Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost.", "start_word": 1284, "end_word": 1306, "topics": [ { "topic": "Speech-to-text", "confidence_score": 0.20689109 } ] }
      ]
    },
    "intents": {
      "segments": [
        { "text": "That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents.", "start_word": 270, "end_word": 289, "intents": [ { "intent": "Introduce deepgram ura", "confidence_score": 0.72176206 }, { "intent": "Provide voice-based agents", "confidence_score": 0.0034496784 } ] },
        { "text": "Here, voice quality is king, with creators investing hours to fine-tune every detail for a powerful emotional impact.", "start_word": 468, "end_word": 485, "intents": [ { "intent": "Demonstrate quality", "confidence_score": 0.000025880421 } ] },
        { "text": "The quality and overall performance will continue to improve with additional model training and refinement. We encourage you to give them a listen and note the naturalness of their cadence, rhythm, and tone in the flow of conversation with another human.", "start_word": 953, "end_word": 993, "intents": [ { "intent": "Enhance voice performance", "confidence_score": 0.0164178 } ] },
        { "text": "And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency.", "start_word": 1071, "end_word": 1088, "intents": [ { "intent": "Optimize audio performance", "confidence_score": 0.28505138 } ] }
      ]
    },
    "sentiments": {
      "segments": [
        { "text": "Meet Deepgram Aura: real-time text-to-speech for real-time AI agents ---------- It’s been a year since large language models (LLMs) seemingly went mainstream overnight (Happy Birthday, ChatGPT!!!), and the world has witnessed both rapid development of these technologies and immense interest in their potential.", "start_word": 0, "end_word": 42, "sentiment": "neutral", "sentiment_score": 0.18202751874923703 },
        { "text": "We believe that we have reached an inflection point where voice-based interfaces will be the primary means to accessing LLMs and the experiences they unlock. Here are a few recent signals in support of our thesis: - Good old fashioned voice notes are enjoying a healthy resurgence.", "start_word": 43, "end_word": 89, "sentiment": "positive", "sentiment_score": 0.38409921526908875 },
        { "text": "- According to a recent survey, a majority of respondents stated phone calls are still their preferred communication channel for resolving customer service issues. - An emerging boom in wearable devices equipped with continuous listening and speech AI technology is gaining steam. - OpenAI recently enabled voice interactions in ChatGPT. - A wave of interest in voice-first experiences and tools is sweeping across brands, investors, and tech companies.", "start_word": 90, "end_word": 157, "sentiment": "neutral", "sentiment_score": 0.2346823811531067 },
        { "text": "Thanks to ChatGPT and the advent of the LLM era, the conversational AI tech stack has advanced sufficiently to support productive (not frustrating) voice-powered AI assistants and agents that can interact with humans in a natural manner.", "start_word": 158, "end_word": 194, "sentiment": "positive", "sentiment_score": 0.4896208047866822 },
        { "text": "We have already observed this from our most innovative customers who are actively turning to these technologies to build a diverse range of AI agents for voice ordering systems, interview bots, personal AI assistants, automated drive-thru tellers, and autonomous sales and customer service agents.", "start_word": 195, "end_word": 238, "sentiment": "neutral", "sentiment_score": 0.26346486806869507 },
        { "text": "While these AI agents hold immense potential, many customers have expressed their dissatisfaction with the current crop of voice AI vendors, citing roadblocks related to speed, cost, reliability, and conversational quality.", "start_word": 239, "end_word": 269, "sentiment": "negative", "sentiment_score": -0.4057016372680664 },
        { "text": "That’s why we’re excited to introduce our own text-to-speech (TTS) API, Deepgram Aura, built for real-time, conversational voice AI agents. Whether used on its own or in conjunction with our industry-leading Nova-2 speech-to-text API, we’ll soon provide developers with a complete speech AI platform, giving them the essential building blocks they need to build high throughput, real-time AI agents of the future.", "start_word": 270, "end_word": 331, "sentiment": "positive", "sentiment_score": 0.4036688804626465 },
        { "text": "We are thrilled about the progress our initial group of developers has made using Aura, so much so that we are extending limited access to a select few partners who will be free to begin integrating with Aura immediately. With their feedback, we’ll continue to enhance our suite of voices and API features, as well as ensure a smooth launch of their production-grade applications.", "start_word": 332, "end_word": 395, "sentiment": "positive", "sentiment_score": 0.6666476130485535 },
        { "text": "What Customers Want ---------- I feel the need, the need for speed What we’ve heard from many of our customers and partners is that voice AI technology today caters to two main areas: high production or high throughput. High Production is all about crafting the perfect voice.", "start_word": 396, "end_word": 442, "sentiment": "neutral", "sentiment_score": 0.10989074409008026 },
        { "text": "It's used in projects where every tone and inflection matters, like in video games or audiobooks, to really bring a scene or story to life. Here, voice quality is king, with creators investing hours to fine-tune every detail for a powerful emotional impact. The primary benefit is the ability to swap out a high-paid voice actor with AI where you have more dynamic control over what’s being said while also achieving some cost savings. But these use cases are more specialized and represent just a sliver of the overall voice AI opportunity.", "start_word": 443, "end_word": 534, "sentiment": "positive", "sentiment_score": 0.4490419030189514 },
        { "text": "On the flip side, High Throughput is about handling many quick, one-off interactions for real-time conversations at scale. Think fast food ordering, booking appointments, or inquiring about the latest deals at a car dealership. These tasks are relevant to just about everyone on the planet, and they require fast, efficient text-to-speech conversion for an AI agent to fulfill them. While voice quality is still important to keep users engaged, quality here is more about the naturalness of the flow of conversation and less about sounding like Morgan Freeman.", "start_word": 535, "end_word": 622, "sentiment": "neutral", "sentiment_score": 0.2202893942594528 },
        { "text": "But the primary focus for most customers in this category is on improving customer outcomes, meaning speed and efficiency are must-haves for ensuring these everyday exchanges are smooth and reliable at high volume. \"Deepgram showed me less than 200ms latency today. That's the fastest text-to-speech I’ve ever seen.", "start_word": 623, "end_word": 670, "sentiment": "positive", "sentiment_score": 0.4590202569961548 },
        { "text": "And our customers would be more than satisfied with the conversation quality.\" Jordan Dearsley, Co-founder at Vapi Although high production use cases seem to be well-served with UI-centric production tools, high throughput, real-time use cases still mostly rely on APIs provided by the major cloud providers.", "start_word": 671, "end_word": 716, "sentiment": "neutral", "sentiment_score": 0.01252671144902706 },
        { "text": "And our customers have been telling us that they’ve been falling short, with insufficient quality for a good user experience, too much latency to make real-time use cases work, and costs too expensive to operate at scale.", "start_word": 717, "end_word": 753, "sentiment": "negative", "sentiment_score": -0.49942296743392944 },
        { "text": "More human than human ---------- With Aura, we’ll give realistic voices to AI agents. Our goal is to craft text-to-speech capabilities that mirror natural human conversations, including timely responses, the incorporation of natural speech fillers like 'um' and 'uh' during contemplation, and the modulation of tone and emotion according to the conversational context. We aim to incorporate laughter and other speech nuances as well. Furthermore, we are dedicated to tailoring these voices to their specific applications, ensuring they remain composed and articulate, particularly in enunciating account numbers and business names with precision. \"I don’t really consider Azure and the other guys anymore because the voices sound so robotic.\" Jordan Dearsley, Co-founder at Vapi In blind evaluation trials conducted for benchmarking, early versions of Aura have consistently been rated as sounding more human than prominent alternatives, even outranking human speakers for various audio clips more often than not on average.", "start_word": 754, "end_word": 902, "sentiment": "neutral", "sentiment_score": 0.10511736571788788 },
        { "text": "We were pleasantly surprised by these results (stay tuned for a future post containing comprehensive benchmarks for speed and quality soon!), so much so that we’re accelerating our development timeline and publicly announcing today’s waitlist expansion.", "start_word": 903, "end_word": 938, "sentiment": "positive", "sentiment_score": 0.4318973124027252 },
        { "text": "Here are some sample clips generated by one of the earliest iterations of Aura.", "start_word": 939, "end_word": 952, "sentiment": "neutral", "sentiment_score": 0.1747044026851654 },
        { "text": "The quality and overall performance will continue to improve with additional model training and refinement.", "start_word": 953, "end_word": 967, "sentiment": "positive", "sentiment_score": 0.3693663775920868 },
        { "text": "We encourage you to give them a listen and note the naturalness of their cadence, rhythm, and tone in the flow of conversation with another human. Our Approach ---------- For nearly a decade, we’ve worked tirelessly to advance the art of the possible in speech recognition and spoken language understanding. Along the way, we’ve transcribed trillions of spoken words into highly accurate transcriptions.", "start_word": 968, "end_word": 1030, "sentiment": "neutral", "sentiment_score": 0.2442323863506317 },
        { "text": "Our model research team has developed novel transformer architectures equipped to deal with the nuances of conversational audio–across different languages, accents, and dialects, while handling disfluencies and the changing rhythms, tones, cadences, and inflections that occur in natural, back-and-forth conversations. And all the while, we’ve purposefully built our models under limited constraints to optimize their speed and efficiency. With support for dozens of languages and custom model training, our technical team has trained and deployed thousands of speech AI models (more than anybody else) which we operate and manage for our customers each day using our own computing infrastructure. We also have our own in-house data labeling and data ops team with years of experience building bespoke workflows to record, store, and transfer vast amounts of audio in order to label it and continuously grow our bank of high-quality data (millions of hours and counting) used in our model training. These combined experiences have made us experts in processing and modeling speech audio, especially in support of streaming use cases with our real-time STT models. Our customers have been asking if we could apply the same approach for TTS, and we can. So what can you expect from Aura?", "start_word": 1031, "end_word": 1229, "sentiment": "neutral", "sentiment_score": 0.18155942857265472 },
        { "text": "Delivering the same market-leading value and performance as Nova-2 does for STT. Aura is built to be the panacea for speed, quality, and efficiency–the fastest of the high-quality options, and the best quality of the fast ones. And that’s really what end users need and what our customers have been asking us to build. \"Deepgram is a valued partner, providing our customers with high throughput speech-to-text that delivers unrivaled performance without tradeoffs between quality, speed, and cost.", "start_word": 1230, "end_word": 1306, "sentiment": "positive", "sentiment_score": 0.4960947036743164 },
        { "text": "We're excited to see Deepgram extend their speech AI platform and bring this approach to the text-to-speech market.\" - Richard Dumas, VP AI Product Strategy at Five9 What's Next ---------- As we’ve discussed, scaled voice agents are a high throughput use case, and we believe their success will ultimately depend on a unified approach to audio, one that strikes the right balance between natural voice quality, responsiveness, and cost-efficiency. And with Aura, we’re just getting started.", "start_word": 1307, "end_word": 1382, "sentiment": "neutral", "sentiment_score": 0.2990237772464752 },
        { "text": "We’re looking forward to continuing to work with customers like Asurion and partners like Five9 across speech-to-text AND text-to-speech as we help them define the future of AI agents, and we invite you to join us on this journey.", "start_word": 1383, "end_word": 1421, "sentiment": "positive", "sentiment_score": 0.5466783046722412 },
        { "text": "We expect to release generally early next year, but if you’re working on any real-time AI agent use cases, join our waitlist today to jumpstart your development in production as we continue to refine our model and API features with your direct feedback.", "start_word": 1422, "end_word": 1464, "sentiment": "neutral", "sentiment_score": 0.32348108291625977 }
      ],
      "average": { "sentiment": "neutral", "sentiment_score": 0.2622680365893686 }
    }
  }
}

Request Parameters

Query Parameters

language

string

Required
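A minimal sketch of assembling the request URL around the required language query parameter. This page does not give the endpoint, so https://api.deepgram.com/v1/read is an assumption, as are any feature flags passed in (such as summarize or topics, suggested by the *_info blocks in the example metadata). Check both against Deepgram's API reference before relying on them. build_read_url is a hypothetical helper.

```python
from urllib.parse import urlencode

# Assumption: endpoint URL; it is not stated on this page.
READ_ENDPOINT = "https://api.deepgram.com/v1/read"


def build_read_url(language: str, **features: bool) -> str:
    """Build the request URL; `language` is required, feature flags are optional.

    Feature names (e.g. summarize, topics) are assumptions and are passed
    through as lowercase "true"/"false" strings.
    """
    if not language:
        raise ValueError("the 'language' query parameter is required")
    params = {"language": language}
    params.update({name: str(enabled).lower() for name, enabled in features.items()})
    return f"{READ_ENDPOINT}?{urlencode(params)}"


print(build_read_url("en", summarize=True, topics=True))
# https://api.deepgram.com/v1/read?language=en&summarize=true&topics=true
```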

Header Parameters

Content-Type

string

Required

Example value:

application/json

Body Parameters (application/json)

text

string

Optional

Basic text request (when sending the text as a string or as text/plain)

url

string

Required if text is not provided

Basic URL request (when sending the text as a hosted URL)
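A minimal sketch of preparing the POST request with Python's standard library. It combines the required Content-Type header with a JSON body. Two assumptions are not stated on this page. First, text and url are treated as alternatives, as their descriptions suggest, so exactly one is sent. Second, authentication is assumed to use an "Authorization: Token <key>" header. build_read_body and build_read_request are hypothetical helper names.

```python
import json
import urllib.request
from typing import Optional


def build_read_body(text: Optional[str] = None, url: Optional[str] = None) -> bytes:
    """Serialize the JSON body; assumes exactly one of `text` or `url` is sent."""
    if (text is None) == (url is None):
        raise ValueError("provide exactly one of 'text' or 'url'")
    body = {"text": text} if text is not None else {"url": url}
    return json.dumps(body).encode("utf-8")


def build_read_request(request_url: str, api_key: str,
                       text: Optional[str] = None,
                       url: Optional[str] = None) -> urllib.request.Request:
    """Prepare (but do not send) a POST request with the required Content-Type header."""
    headers = {
        "Content-Type": "application/json",
        # Assumption: "Token <API key>" auth scheme; not documented on this page.
        "Authorization": f"Token {api_key}",
    }
    return urllib.request.Request(request_url, data=build_read_body(text, url),
                                  headers=headers, method="POST")


req = build_read_request("https://api.deepgram.com/v1/read?language=en",
                         "YOUR_API_KEY", text="Deepgram Aura is a text-to-speech API.")
print(req.get_method(), req.data)
```

Building the Request object separately from sending it keeps the body and header rules checkable without network access.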


Responses

200 Success

application/json

Body

metadata

object

Required

request_id

string

Required

created

string

Required

language

string

Required

summary_info

object

Required

sentiment_info

object

Required

topics_info

object

Required

intents_info

object

Required

results

object

Required

summary

object

Required

topics

object

Required

intents

object

Required

sentiments

object

Required
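The schema above lists every field as required, with request_id through intents_info nested under metadata and summary through sentiments nested under results. The hypothetical check below reports which of those fields a decoded response lacks, as dotted paths. It returns the list rather than raising, so the caller decides how strict to be.

```python
# Field names and nesting copied from the response schema above.
REQUIRED_FIELDS = {
    "metadata": ["request_id", "created", "language", "summary_info",
                 "sentiment_info", "topics_info", "intents_info"],
    "results": ["summary", "topics", "intents", "sentiments"],
}


def missing_fields(response: dict) -> list:
    """Return dotted paths of schema fields absent from `response`."""
    missing = []
    for section, keys in REQUIRED_FIELDS.items():
        block = response.get(section)
        if not isinstance(block, dict):
            missing.append(section)  # the whole object is absent or not an object
            continue
        missing.extend(f"{section}.{key}" for key in keys if key not in block)
    return missing


print(missing_fields({"metadata": {"request_id": "9a110df0-17bb-40ec-94c2-3cb7f862a045"}}))
```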

400 Bad Request
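This page does not document the body of a 400 response, so the sketch below treats it as opaque text. It separates status handling (parse_read_response, testable offline) from the network call (send_read_request). Both function names and the BadRequestError class are hypothetical. urllib.request.urlopen raises urllib.error.HTTPError for 4xx statuses rather than returning them, which is why the 400 case is handled in the except branch.

```python
import json
import urllib.error
import urllib.request


class BadRequestError(Exception):
    """Raised for HTTP 400; the message is the raw response body."""


def parse_read_response(status: int, body: bytes) -> dict:
    """Decode a 200 response as JSON; turn a 400 into BadRequestError."""
    if status == 200:
        return json.loads(body)
    if status == 400:
        raise BadRequestError(body.decode("utf-8", errors="replace"))
    raise RuntimeError(f"unexpected HTTP status {status}")


def send_read_request(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send a prepared request and route the status through parse_read_response."""
    try:
        with urllib.request.urlopen(request, timeout=timeout) as resp:
            return parse_read_response(resp.status, resp.read())
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses arrive here; err.read() returns the error body.
        return parse_read_response(err.code, err.read())
```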