Difference between revisions of "Tech Reflect Voice"

From ESE205 Wiki
Jump to navigation Jump to search
(STT update)
Line 52: Line 52:
 
=== Software Overview ===
 
=== Software Overview ===
 
=== Speech to Text ===
 
=== Speech to Text ===
 +
INSERT STT RESEARCH HERE
 +
After sampling different speech to text libraries, it became clear that Google Cloudspeech was vastly superior than any other solution in terms of what we wanted to achieve. While it would have been nice to run something locally like CMU Sphinx, or a totally free solution, in the end Google was easiest to integrate with and had the most accurate speech to text conversion technology.
 +
 +
Additionally, the use of Google Cloudspeech allowed us to use the Google AIY Voice Kit, which was cheap and made integration with google's services incredibly easy with a pre-configured distro of raspbian available for use.
 +
 +
What we essentially have is a two-loop system for hotword detection and then speech recognition- see the following psuedocode indicates:
 +
 +
<nowiki>
 +
while shouldBeListening:
 +
    shouldBeListening = checkIfShouldListen()
 +
    text = listen()
 +
    if hotword in text:
 +
        commandText = listen()
 +
        sendToNodeServer(commandText)
 +
</nowiki>
 +
 
=== Command Parsing ===
 
=== Command Parsing ===
 
=== Facial Recognition ===
 
=== Facial Recognition ===

Revision as of 19:21, 13 April 2018

Project Overview

The best creations come not from reinventing the wheel, but from integrating existing technologies in new and interesting ways. This is why when we saw the original Tech Reflect project, we realized that there was so much opportunity to improve upon it. With the increasing prevalence of IOT devices, home assistants like Google Home and Amazon Alexa, open-source and free to use APIs, and the decreasing cost of display technology, it has become possible to cheaply and easily create a piece of physical hardware for the home which can utilize the strengths of home assistants and cheap display technology, while minimizing their obtrusiveness on your life.

Group Members

  • Ethan Shry
  • Tony Sancho-Spore
  • Baihao Xu (Kevin)
  • Ellen Dai (TA)

Project Proposal

https://docs.google.com/presentation/d/1UtUwZfxM7SI90nvJo0Fx1iLGG8HB7N2jdl1ApcL0bRY/edit?usp=sharing

Objectives

We hope to construct a proof-of-concept bathroom mirror which responds to user feedback. The mirror and GUI should be somewhat aesthetically pleasing. It should listen to the user and be able to convert what they ask for into a visual response displayed on-screen. We hope to show that there is some novelty or value in integrating technology into everyday items like mirrors.

Challenges

Due to the tools available to us, ensuring that the hardware is talking to the Python listener is talking to the GUI server will be somewhat logistically challenging, especially as we will be running code in several different programming languages.

The selection of which command to take when analyzing user speech will also be a nightmare should we decide to allow multiple different trigger commands. A simple solution would only look for exact string matches, but a more robust solution will require looking into.

Our current plan for the mirror is to 3D print the frame, which due to the lack of large printers available to us needs to be done in many pieces (>10), which will be potentially infeasible.

Additionally Kevin will need to become comfortable in Pug templating language and NodeJS.

Gantt Chart

GanttChartTRVoiceSpring2018.PNG Media:GanttChartTRVoiceSpring2018.PNG

Budget

21.5" Display: $64.99 @ Microcenter

Google AIY Voice Kit (Includes Speaker and Microphone): $9.99 @ Microcenter

Mirror: 49.99 @ Amazon (https://www.amazon.com/gp/product/B01G4MQ966/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1)

Raspberry Pi 3: $35

Raspberry Pi NoIR Camera: $25

Arduino Mini: $10

LED Strip: Free, we already had them

3 spools Standard Breadboard-y wire (22 gauge? is that a thing?): $15

Frame: $20ish for wood and screws

Other Supplies:???

Code

All the code for this project can be found on Github

Design And Solutions: Modules

Software Overview

Speech to Text

INSERT STT RESEARCH HERE After sampling different speech to text libraries, it became clear that Google Cloudspeech was vastly superior than any other solution in terms of what we wanted to achieve. While it would have been nice to run something locally like CMU Sphinx, or a totally free solution, in the end Google was easiest to integrate with and had the most accurate speech to text conversion technology.

Additionally, the use of Google Cloudspeech allowed us to use the Google AIY Voice Kit, which was cheap and made integration with google's services incredibly easy with a pre-configured distro of raspbian available for use.

What we essentially have is a two-loop system for hotword detection and then speech recognition- see the following psuedocode indicates:

while shouldBeListening:
    shouldBeListening = checkIfShouldListen()
    text = listen()
    if hotword in text:
        commandText = listen()
        sendToNodeServer(commandText)

Command Parsing

Facial Recognition

LED Ring

Physical Design

3D Printed Frame

Results

= SubPart: Future Improvements/ How would we improve on this Project

Cost Saving: Would have been great to cut costs on this project- one major cost saving measure would have been to use a piece of glass and a reflective coating instead of a actual two way mirror. This is more technically challenging but is about half as expensive per square foot based on brief research. Also looking harder for a cheaper display would have been a great way to cut costs.

Physical Design: Spending more time on the physical design of the mirror would have been nice. The 3D printed frame is a great idea for our context (testing out many differently technologies and seeing what we can make work), but it would have been much easier to simply spend more time designing a better, lighter frame/box for our mirror and components, as opposed to spending hours CADing and printing parts.

Also would love to get a better mic/speaker onto this mirror- we got ours as part of the AIY Voice HAT Kit, which was cheap and easy, but a better speaker would definitely improve the project's functionality. Software:


Tech Reflect Voice Log: https://classes.engineering.wustl.edu/ese205/core/index.php?title=Tech_Reflect_Voice_Log