Bhavesh

Technical blog - LiveKit TTS bot

Project link

A few weeks back I worked on a POC whose goal was a LiveKit agent that can do TTS while giving more fine-grained control over voice activity detection. An example use case: when the user stops talking, the agent can start or resume speaking. This project is a simple POC that converts text previously saved on the server into speech during a LiveKit session.

Libraries

Some important libraries I ended up choosing for this project after multiple trials:

  1. Text to speech: TTS
  2. Voice activity detection, to detect the user's pauses and let the agent start or resume speaking: Silero VAD
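
To make the "agent speaks when the user stops talking" idea concrete, here is a minimal sketch of the pause-detection logic that sits on top of a VAD. It assumes the VAD (Silero VAD in this project) yields one speech probability per audio frame; the class name, the 0.5 threshold, the 600 ms pause length and the 32 ms frame size are illustrative assumptions, not values taken from the project.

```python
from dataclasses import dataclass, field


@dataclass
class EndOfTurnDetector:
    """Turns per-frame speech probabilities into a 'user stopped talking' signal."""

    speech_threshold: float = 0.5  # assumed cut-off between speech and silence
    min_silence_ms: int = 600      # assumed pause length that ends the user's turn
    frame_ms: int = 32             # assumed frame duration (512 samples at 16 kHz)

    _user_has_spoken: bool = field(default=False, init=False, repr=False)
    _silence_ms: int = field(default=0, init=False, repr=False)

    def push(self, speech_prob: float) -> bool:
        """Feed one frame's probability; return True once when the turn ends."""
        if speech_prob >= self.speech_threshold:
            # Speech frame: remember that the user talked and restart the pause timer.
            self._user_has_spoken = True
            self._silence_ms = 0
            return False

        if not self._user_has_spoken:
            # Leading silence before anyone has spoken is not a pause.
            return False

        self._silence_ms += self.frame_ms
        if self._silence_ms >= self.min_silence_ms:
            # Pause is long enough: reset state so the next turn can be detected.
            self._user_has_spoken = False
            self._silence_ms = 0
            return True
        return False
```

In the agent loop, each probability coming out of the VAD would be passed to `push()`, and the agent would start or resume playback the first time it returns True. A shorter `min_silence_ms` makes the agent respond faster but also makes it more likely to cut in on a natural mid-sentence pause.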

Production-ready deployment planning

Other considerations for running this project in a production-ready setup:

  1. For static texts, we convert all the questions to wav files beforehand, and the agent then plays those recordings during the call session, which reduces memory usage significantly.
  2. For more real-time control, we can keep a single TTS service running all the time. Any of our LiveKit agents can send text to it during an interview and the service will process it. This increases latency, but if the texts are static we can process the next k questions beforehand.
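
Both options come down to rendering audio before it is needed. A rough sketch of that idea is below: a small on-disk cache that converts a text to a wav file once, plus a helper that renders the next k questions ahead of time. The class name, method names and the `synthesize` callback are hypothetical. The callback stands in for whatever actually produces the audio, either a local TTS call or a request to the shared TTS service.

```python
import hashlib
from pathlib import Path
from typing import Callable, List, Sequence

# Hypothetical callback type: takes the text and the target wav path, writes the file.
Synthesize = Callable[[str, Path], None]


class PrerenderedSpeech:
    """Caches synthesized wav files on disk, keyed by a hash of the text."""

    def __init__(self, cache_dir: Path, synthesize: Synthesize) -> None:
        self.cache_dir = Path(cache_dir)
        self.cache_dir.mkdir(parents=True, exist_ok=True)
        self.synthesize = synthesize

    def path_for(self, text: str) -> Path:
        # Identical text always maps to the same file name.
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
        return self.cache_dir / f"{digest}.wav"

    def get(self, text: str) -> Path:
        """Return the wav path for `text`, synthesizing only on a cache miss."""
        path = self.path_for(text)
        if not path.exists():
            self.synthesize(text, path)
        return path

    def prefetch(self, questions: Sequence[str], current: int, k: int) -> List[Path]:
        """Render the k questions that follow index `current`."""
        upcoming = questions[current + 1 : current + 1 + k]
        return [self.get(question) for question in upcoming]
```

For option 1, calling `get()` on every question in an offline batch job fills the cache before any call starts. For option 2, calling `prefetch()` while the user is answering the current question hides most of the service latency, since the next recordings are usually ready by the time the agent needs them.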