
Speech Recognition & Response in Web or Mobile Directly, Without Alexa/Google Home Dependency

In the era of voice-enabled devices like Google Assistant and Amazon Alexa, it is clear that in the near future voice-enabled services will touch almost every aspect of our daily lives.

Because voice offers better interactivity and easier accessibility, it will be a game-changer for the next generation. There are already smart homes where every single thing in the house can talk to you and respond to your commands. Voice-enabled devices need neither a GUI nor visual content; the main concern is speed, and a well-built voice interface can respond faster than most other interfaces.

There are many libraries and APIs out there that you can use to get started with your voice bot, such as:

  1. Microsoft Bing Speech
  2. Google Web Speech API
  3. Google Cloud Speech
  4. IBM Speech to Text
  5. Wit.ai

We are going to use the Google Web Speech API through the SpeechRecognition library. It is easy to use because a default API key is hard-coded into the SpeechRecognition library, so you can get started without any configuration or authentication. Of course, like every other free tier, it has a daily limit (about 50 requests per day on the default key), and that limit cannot be raised. This makes it a good API for experimentation; for production or live scenarios, you will have to purchase a paid plan from one of the services mentioned above.
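
For example, a quick check from a Python shell (sample.wav here is just a hypothetical test recording) looks roughly like this:

import speech_recognition as sr

recognizer = sr.Recognizer()
with sr.AudioFile("sample.wav") as source:  # hypothetical test recording
    audio = recognizer.record(source)

# Uses the default, rate-limited key baked into the library
print(recognizer.recognize_google(audio))

# A registered key, if you have one, can be passed explicitly:
# print(recognizer.recognize_google(audio, key="YOUR_API_KEY"))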

There will be a three-step process for every voice-enabled device (a minimal end-to-end sketch follows this list):

  1. Speech to Text: In this phase, we let our bot understand what we are saying. We provide either an audio file or a direct stream from our mic, and the bot converts this sound signal into text using the Google speech recognition API.
  2. Processing: After converting your voice into text, the bot processes that text and responds just as a text-based bot would. The processing can be anything from searching for a song on the web to setting an alarm or reminder.
  3. Text to speech: Once the bot has completed its processing and has your output data ready, the last step is to give the user that processed response in voice form, which can be achieved using the Google Text-to-Speech (gTTS) library.
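
To make these phases concrete, here is a minimal end-to-end sketch (assuming the dependencies listed below are installed and a working microphone is available); the processing step is just an echo, which is where your own bot logic would go:

import speech_recognition as sr
from gtts import gTTS

recognizer = sr.Recognizer()

# 1. Speech to text: capture one utterance from the microphone
with sr.Microphone() as source:
    audio = recognizer.listen(source)
text = recognizer.recognize_google(audio)

# 2. Processing: replace this echo with your own bot logic
reply = "Did you mean, " + text + " ?"

# 3. Text to speech: synthesize the reply and save it as an mp3
gTTS(text=reply, lang='en').save("reply.mp3")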

So, let's get started with developing your first voice-enabled bot.

Dependencies:
  1. SpeechRecognition library
    pip install SpeechRecognition
  2. Pyaudio
    pip install pyaudio
  3. Flask
    pip install Flask
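
Note: on some systems pip install pyaudio fails until the PortAudio headers are installed; on Debian/Ubuntu, for example, you would typically run
    sudo apt-get install portaudio19-dev
first (or brew install portaudio on macOS) before installing PyAudio.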
script.py


import json
import os
import io

from flask import Flask, Response, jsonify, request, redirect
from flask_socketio import SocketIO
from flask_cors import CORS
from gtts import gTTS
import speech_recognition as sr

import ss  # local helper module, see ss.py below

app = Flask(__name__)
socketio = SocketIO(app)
CORS(app)


# Redirect http to https on CloudFoundry
@app.before_request
def before_request():
    fwd = request.headers.get('x-forwarded-proto')
    if fwd is None:
        return None
    elif fwd == "https":
        return None
    elif fwd == "http":
        url = request.url.replace('http://', 'https://', 1)
        code = 301
        return redirect(url, code=code)


@app.route('/')
def Welcome():
    return app.send_static_file('index.html')


@app.route('/api/conversation', methods=['POST', 'GET'])
def getConvResponse():
    convText = request.form.get('convText')
    convContext = request.form.get('context', "{}")
    jsonContext = json.loads(convContext)
    if convText:
        response = "Did you mean, " + convText + " ?"
    else:
        response = "Hello There"
    responseDetails = {'responseText': response,
                       'context': response}
    return jsonify(results=responseDetails)


@app.route('/api/text-to-speech', methods=['POST'])
def getSpeechFromText():
    inputText = request.form.get('text')

    def generate():
        if inputText:
            # Synthesize the text with gTTS and stream the saved mp3 back
            audioOut = gTTS(text=inputText, lang='en', slow=False)
            audioOut.save("welcome.mp3")
            with open("welcome.mp3", 'rb') as f:
                data = f.read()
        else:
            print("Empty response")
            data = b"I have no response to that."
        yield data

    return Response(response=generate(), mimetype="audio/mpeg")


@app.route('/api/speech-to-text', methods=['POST'])
def getTextFromSpeech():
    recognizer = sr.Recognizer()
    f = request.files['audio_data']
    print(f, type(f))
    # Copy the uploaded recording into an in-memory buffer so that
    # SpeechRecognition can treat it as an audio file
    file_obj = io.BytesIO()
    file_obj.write(f.read())
    file_obj.seek(0)
    mic = sr.AudioFile(file_obj)
    response = ss.recognize_speech_from_mic(recognizer, mic)
    print('\nSuccess : {}\nError : {}\n\nText from Speech\n{}\n\n{}'
          .format(response['success'],
                  response['error'],
                  '-' * 17,
                  response['transcription']))
    return Response(response=response['transcription'], mimetype='text/plain')


port = 5000
if __name__ == "__main__":
    socketio.run(app, host='0.0.0.0', port=int(port))

ss.py

import speech_recognition as sr


def recognize_speech_from_mic(recognizer, microphone):
    # Read the whole audio source and try to transcribe it, returning a
    # dict that reports success, any error, and the transcription itself
    with microphone as source:
        audio = recognizer.record(source)

    response = {
        "success": True,
        "error": None,
        "transcription": None
    }

    try:
        response["transcription"] = recognizer.recognize_google(audio)
    except sr.RequestError:
        # API was unreachable or unresponsive
        response["success"] = False
        response["error"] = "API unavailable/unresponsive"
    except sr.UnknownValueError:
        # speech was unintelligible
        response["error"] = "Unable to recognize speech"

    return response
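
Despite its name, this helper accepts any SpeechRecognition audio source, which is why script.py can hand it an in-memory WAV buffer. A quick standalone test with a local recording (sample.wav is a hypothetical file) would look like:

import speech_recognition as sr
from ss import recognize_speech_from_mic

recognizer = sr.Recognizer()
source = sr.AudioFile("sample.wav")  # hypothetical test recording
result = recognize_speech_from_mic(recognizer, source)
print(result["transcription"])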

Run the script.py file and it will start your server on port 5000. You'll need to call all the defined endpoints from your front end, i.e. HTML and JavaScript.
Let's understand the code first; a small test client follows the list below.

  1. getConvResponse: This function is responsible for storing the context of the conversation and returning the output to your HTML front end.
  2. getSpeechFromText: This function is responsible for converting your processed text output to voice output.
  3. getTextFromSpeech: This is the most important function, where we take voice input from the web recorder and convert it to text using the SpeechRecognition API. The resulting text is then passed to getConvResponse to save the context and process it.
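
The blog's front end calls these endpoints from HTML/JavaScript, but while developing you can also exercise them from a small Python client; a rough sketch using the requests package (pip install requests; sample.wav again stands in for whatever your recorder uploads) might look like:

import requests

BASE = "http://localhost:5000"

# 1. Speech to text: upload a WAV recording as the 'audio_data' field
with open("sample.wav", "rb") as f:
    text = requests.post(BASE + "/api/speech-to-text",
                         files={"audio_data": f}).text

# 2. Conversation: send the transcript and read the bot's reply
results = requests.post(BASE + "/api/conversation",
                        data={"convText": text, "context": "{}"}).json()
answer = results["results"]["responseText"]

# 3. Text to speech: fetch the spoken reply and save it locally
audio = requests.post(BASE + "/api/text-to-speech",
                      data={"text": answer}).content
with open("reply.mp3", "wb") as out:
    out.write(audio)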

In this tutorial, we developed a fairly simple example of a voice bot to help you understand how voice recognition works. You can use it in your live project by adding more functionality. Feel free to contact us with any queries and to learn more about the other services we provide in voice assistant app development.
