In the era of voice-enabled devices such as Google Assistant and Amazon Alexa, it is clear that voice-enabled services will soon reach almost every aspect of our daily lives.
Because voice offers better interactivity and easier accessibility, it will be a game-changer for the next generation. There are already smart homes where nearly every device can talk to you and respond to your commands. Voice-enabled devices need neither a GUI nor on-screen content; the main concern is speed, and a spoken request can often get a response faster than other kinds of interface.
There are many libraries and APIs you can use to get started with your voice bot.
We are going to use the Google Web Speech API through the SpeechRecognition library. It is easy to use because a default API key is hard-coded into the SpeechRecognition library, so you can get started without any configuration or authentication.
Like most free APIs, it comes with a quota: a daily limit of 50 requests, which cannot be raised. That makes it a good choice for experiments. For production or live scenarios, you'll have to purchase a paid speech recognition service.
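Every voice-enabled device follows a three-step process, which the three API routes in the script further down mirror: convert the user's speech to text, work out a response to that text, and convert the response back to speech.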
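As a rough illustration, the three steps can be pictured as a small pipeline: speech to text, a reply to that text, and text back to speech. The sketch below is not part of the tutorial's script; `run_voice_turn` and the three stand-in functions are hypothetical names, and the stand-ins only pass bytes and strings around so that the flow can be run without a microphone or network access.

```python
from typing import Callable


def run_voice_turn(
    audio: bytes,
    speech_to_text: Callable[[bytes], str],
    respond: Callable[[str], str],
    text_to_speech: Callable[[str], bytes],
) -> bytes:
    """Run one user request through the three steps and return the spoken reply."""
    text = speech_to_text(audio)   # step 1: audio in, text out
    reply = respond(text)          # step 2: decide what to say
    return text_to_speech(reply)   # step 3: text in, audio out


# Hypothetical stand-ins (assumptions for illustration only).
def fake_speech_to_text(audio: bytes) -> str:
    # Pretend the audio bytes are already the spoken words.
    return audio.decode("utf-8")


def echo_reply(text: str) -> str:
    # Same reply rule the tutorial's conversation route uses.
    return "Did you mean, " + text + " ?" if text else "Hello There"


def fake_text_to_speech(text: str) -> bytes:
    # Pretend the encoded text is the synthesized audio.
    return text.encode("utf-8")


if __name__ == "__main__":
    print(run_voice_turn(b"turn on the lights",
                         fake_speech_to_text, echo_reply, fake_text_to_speech))
```

In a real bot, each stand-in would be swapped for a call to a speech recognition service, your own dialogue logic, and a text-to-speech engine, while the pipeline itself stays the same.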
So, let's get started with developing your first voice-enabled bot. First, install the packages the script depends on:
pip install SpeechRecognition
pip install pyaudio
pip install Flask
pip install flask-socketio
pip install flask-cors
pip install gTTS
script.py

import io
import json

import speech_recognition as sr
from flask import Flask, Response, jsonify, redirect, request
from flask_cors import CORS
from flask_socketio import SocketIO
from gtts import gTTS

import ss

app = Flask(__name__)
socketio = SocketIO(app)
CORS(app)


# Redirect http to https on CloudFoundry
@app.before_request
def before_request():
    fwd = request.headers.get('x-forwarded-proto')
    if fwd == "http":
        url = request.url.replace('http://', 'https://', 1)
        return redirect(url, code=301)
    # No proxy header, or already https: continue with the request
    return None


@app.route('/')
def Welcome():
    return app.send_static_file('index.html')


@app.route('/api/conversation', methods=['POST', 'GET'])
def getConvResponse():
    convText = request.form.get('convText')
    convContext = request.form.get('context', "{}")
    # Parsed to validate the JSON; the context is not used further yet
    jsonContext = json.loads(convContext)
    if convText:
        response = "Did you mean, " + convText + " ?"
    else:
        response = "Hello There"
    responseDetails = {'responseText': response, 'context': response}
    return jsonify(results=responseDetails)


@app.route('/api/text-to-speech', methods=['POST'])
def getSpeechFromText():
    inputText = request.form.get('text')
    if not inputText:
        print("Empty response")
        # Speak a default sentence so the response body is always audio
        inputText = "I have no response to that."

    def generate():
        audioOut = gTTS(text=inputText, lang='en', slow=False)
        audioOut.save("welcome.mp3")
        # Close the file once its contents have been read
        with open("welcome.mp3", 'rb') as f:
            yield f.read()

    # gTTS writes MP3 data, so label the response accordingly
    return Response(response=generate(), mimetype="audio/mpeg")


@app.route('/api/speech-to-text', methods=['POST'])
def getTextFromSpeech():
    recognizer = sr.Recognizer()
    f = request.files['audio_data']
    print(f, type(f))
    # Copy the upload into memory so AudioFile can seek in it
    file_obj = io.BytesIO()
    file_obj.write(f.read())
    file_obj.seek(0)
    mic = sr.AudioFile(file_obj)
    response = ss.recognize_speech_from_mic(recognizer, mic)
    print('\nSuccess : {}\nError : {}\n\nText from Speech\n{}\n\n{}'
          .format(response['success'], response['error'],
                  '-' * 17, response['transcription']))
    # The transcription is None when nothing was recognized
    return Response(response=response['transcription'] or '',
                    mimetype='text/plain')


port = 5000
if __name__ == "__main__":
    socketio.run(app, host='0.0.0.0', port=int(port))

ss.py

import speech_recognition as sr


def recognize_speech_from_mic(recognizer, microphone):
    # Read the whole audio source into an AudioData object
    with microphone as source:
        audio = recognizer.record(source)

    response = {
        "success": True,
        "error": None,
        "transcription": None
    }

    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # API was unreachable or unresponsive
        response["success"] = False
        response["error"] = "API unavailable/unresponsive"
    except sr.UnknownValueError:
        # speech was unintelligible
        response["error"] = "Unable to recognize speech"

    return response
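The dictionary returned by `recognize_speech_from_mic` has three possible shapes: a transcription, an API failure (`success` is False), or unintelligible speech (`success` stays True but `error` is set). A caller may want to tell these apart. The helper below is a minimal sketch of one way to do that; `summarize_result` is a hypothetical name, and the HTTP status codes it picks are an assumption, not something the tutorial's script returns.

```python
def summarize_result(result: dict) -> tuple:
    """Map a recognize_speech_from_mic result to a (status, message) pair."""
    if not result["success"]:
        # RequestError branch: the speech API could not be reached.
        return 503, result["error"]
    if result["transcription"] is None:
        # UnknownValueError branch: the API answered but heard no words.
        return 422, result["error"] or "No transcription returned"
    # Normal case: the recognized text is available.
    return 200, result["transcription"]


if __name__ == "__main__":
    print(summarize_result(
        {"success": True, "error": None, "transcription": "hello there"}))
```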
Run script.py and the server will start on port 5000. You'll need to call the endpoints it defines from your front end, i.e. HTML and JavaScript.
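Before wiring up a front end, it can help to exercise an endpoint from a small client. The sketch below targets the `/api/conversation` route using only the Python standard library. It assumes the server is reachable at `http://localhost:5000`; the function names are hypothetical. The form fields (`convText`, `context`) and the `results.responseText` path in the reply are taken from the script.

```python
import json
from urllib import parse, request

BASE_URL = "http://localhost:5000"  # assumption: server running locally


def build_conversation_request(text: str, base_url: str = BASE_URL) -> request.Request:
    """Build a form-encoded POST for the /api/conversation route."""
    form = parse.urlencode({"convText": text, "context": "{}"}).encode("utf-8")
    return request.Request(base_url + "/api/conversation", data=form, method="POST")


def parse_conversation_reply(body: bytes) -> str:
    """Pull the reply text out of the JSON the route returns."""
    return json.loads(body)["results"]["responseText"]


def ask(text: str, base_url: str = BASE_URL) -> str:
    """Send the text to a running server and return its reply."""
    with request.urlopen(build_conversation_request(text, base_url)) as resp:
        return parse_conversation_reply(resp.read())
```

With the server running, `ask("turn on the lights")` should return the bot's reply string; a JavaScript front end would send the same form fields with `fetch`.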
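Let’s understand the code. script.py defines three endpoints. /api/speech-to-text copies the uploaded audio into memory, wraps it in sr.AudioFile, and passes it to recognize_speech_from_mic in ss.py, which sends the audio to the Google Web Speech API through recognize_google and returns a dictionary with the transcription or an error. /api/conversation builds a reply to the recognized text. /api/text-to-speech converts that reply to audio with gTTS.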
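One practical detail: the speech-to-text route hands the upload to `sr.AudioFile`, which reads PCM WAV, AIFF and FLAC data. If your front end records in another format, the audio would need converting before upload. The sketch below shows one way to check that a payload is a readable, uncompressed WAV file, using only the standard library. `is_pcm_wav` and `make_silent_wav` are hypothetical helpers, not part of the tutorial's script.

```python
import io
import wave


def is_pcm_wav(data: bytes) -> bool:
    """Return True if data parses as a non-empty, uncompressed WAV file."""
    try:
        with wave.open(io.BytesIO(data), "rb") as wav:
            return wav.getnframes() > 0 and wav.getcomptype() == "NONE"
    except (wave.Error, EOFError):
        # Not a RIFF/WAVE container, or the data is truncated.
        return False


def make_silent_wav(seconds: float = 0.1, rate: int = 16000) -> bytes:
    """Build a short mono 16-bit WAV of silence, handy for trying the check."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)   # mono
        wav.setsampwidth(2)   # 16-bit samples
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    print(is_pcm_wav(make_silent_wav()))
    print(is_pcm_wav(b"not a wav file at all"))
```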
In this tutorial, we built a simple voice bot to show how voice recognition works. You can use it in a live project by adding more functionality. Feel free to contact us with any questions or to learn more about the other services we provide in Voice Assistant App development.