Difference between revisions of "Tech Reflect Voice"

From ESE205 Wiki
m (Protected "Tech Reflect Voice" ([Edit=Allow only administrators] (indefinite) [Move=Allow only administrators] (indefinite)))
 
(34 intermediate revisions by 3 users not shown)
Line 28: Line 28:
 
21.5" Display: $65 @ Microcenter
 
21.5" Display: $65 @ Microcenter
  
Google AIY Voice Kit (Includes Speaker and Microphone): $10 @ Microcenter (DISCONTINUED?)
+
Google AIY Voice Kit (Includes Speaker and Microphone): $10 @ Microcenter (DISCONTINUED)
  
 
Mirror: $50 @ Amazon (https://www.amazon.com/gp/product/B01G4MQ966/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1)
 
Mirror: $50 @ Amazon (https://www.amazon.com/gp/product/B01G4MQ966/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1)
Line 40: Line 40:
 
LED Strip: About $25 for more than you need
 
LED Strip: About $25 for more than you need
  
3 spools Standard Breadboard-y wire (22 gauge? is that a thing?): $15
+
3 spools Standard Breadboard-y wire (22 gauge): $15
  
 
Frame: $20ish for wood and screws
 
Frame: $20ish for wood and screws
Line 56: Line 56:
 
Total Cost: $340 (ish)
 
Total Cost: $340 (ish)
  
=== Code ===
+
== Code and CAD Files ==
 
All the code for this project can be found on [https://github.com/ethanshry/smart-mirror-voice Github]
 
All the code for this project can be found on [https://github.com/ethanshry/smart-mirror-voice Github]
 +
 +
All the CAD Files for this project can be found on [https://grabcad.com/library/frame-mirror-1 Grabcad]
  
 
== Design And Solutions: Modules ==
 
== Design And Solutions: Modules ==
 
=== Product Design Process Overview ===
 
=== Product Design Process Overview ===
Step 1: Find target users
+
'''Step 1: Find and target users'''
We think our target users are people between the ages of 20 and 40, because these people need to use their time efficiently and would like to know the most up-to-date news around them every day. Our Twitter feature will keep them informed about the news of the day while they are using the mirror. They can also check the day's weather and the day's tasks at the same time. This is a perfect product for them.
 
  
 +
We think our target users are families or home use, and as such we can break down our information into what is most pertinent to them:
  
Step 2: Design our user case
+
1. Our users need access to the information on the internet: Twitter, Stock, Wiki Search
1. Our user need to view interesting news around them: Twitter, Stock
 
  
2. Our user need to know what they need to do during the day: Reminder
+
2. Our users need to know what they need to do during the day: Reminders
  
3. Our user need to know the environment around them: Weather
+
3. Our users need to know the environment around them: Weather
  
4.    Our user needs to have some personalized features: Timer
+
4.    Our users need to have some personalized features: Timer, personalized hotwords, etc.
  
  
Step 3: Designing the main page
+
'''Step 2: Designing the main page'''
When we design the main page, our goal is to give new users an idea of what the mirror does. For existing users, we want to keep them from getting bored when they see it every day, so we make our display as clear as possible.
 
  
 +
When we designed the main page, we wanted to give the consumers a brief overview of some of the features the mirror offers, to give them inspiration for possible uses of the mirror.
  
Step 4: Target customers
+
'''Step 3: Prioritize features based on client importance'''
1. Users (people between age 20 and 40)
 
  
2. Family( to install at their home)
+
1. Solo Users: Our users will care about utility first, which means weather, reminders, and the timer. The timer will help them control the time they spend getting ready every morning. The weather will help them decide what clothes to wear for the day. The reminder feature will remind them not to forget anything during the day. The Twitter and Stock features are optional, because users might not have time to look at them, but they will be very attractive to users who like them, and they make us competitive compared to potential competitors.
  
3.     School( for display purposes)
+
2. Families: Family users may need similar features to our solo users. However, they might need more powerful back-end support, because more information, like reminders, needs to be saved in the database, and they need the ability to have different information and different profiles for different users in the household.
  
 +
'''Step 4:  Prioritize our features'''
  
Step 5:  How our target buyers prioritize their need?
+
At this point we pared down our feature list to only the most important features: the ones that our users will rely on most heavily, and the ones most vital to client use.
  
1. Users: Our users will care about utility first, which means weather, reminders, and the timer. The timer will help them control the time they spend getting ready every morning. The weather will help them decide what clothes to wear for the day. The reminder will remind them not to forget anything during the day. The Twitter and Stock features are optional, because users might not have time to look at them, but they will be very attractive to users who like them, and they make us competitive compared to our competitors.
 
  
2. Family: Family users may need similar features to our self-buyer users. However, they might need more powerful back-end support, because more information, like reminders, needs to be saved in the database, and because of multi-task features; for example, the weather section may need to save the weather of many cities at the same time.
+
'''Step 5: Design Testing and Guidelines'''
  
3. School: School users may care more about the effectiveness of our display; compared to a traditional TV, we are more innovative. However, they want to make sure it can really attract students, who should not regard it as just a mirror. To attract school clients, we have to guarantee that people can easily see the information on our screen.
+
Due to the nature of a smart mirror, it is very important that UI designs have high contrast: a two-way mirror blocks part of the light from coming through, so low-contrast designs can be hard to see. Additionally, the less cluttered the design is, the more functional the mirror is as a mirror.
  
 +
=== Software Overview ===
 +
Below is the overview of the Software flow for this project.
  
Step 6: Prioritize our feature
+
[[File:smVoiceCodeFlow.png|800px]]
1. Reminder
+
[[Media:smVoiceCodeFlow.png]]
2. Weather
 
3. Timer
 
4. Stock
 
5.   Twitter
 
  
 +
=== HTML & CSS to Pug ===
  
Step 7: Strategy
+
Node.js gives us access to tools for writing HTML pages to be rendered on the server; we chose to use the Pug language for this purpose. Pug is lightweight, fast, and highly readable, making it easy to learn and understand. This allows us to quickly write what would otherwise be very tedious HTML pages and render them securely on the server with minimal impact on the client. The tutorial and syntax of Pug can be found at https://github.com/pugjs/pug.
  
1. Create a UI with black background and white text, so we can have best transparency experience for our user. Then, implement all of our 5 core features.
+
=== Speech to Text ===
 
 
2. Create a multi-color background. However, the transparency experience will not be so well. Also, for further development of our product in the future, there will be more requirement for our developers
 
 
 
3. Use white background and black text. Only implement the top 3 features, because this will save a large amount of our time.
 
 
 
 
 
Step 8: Based on our analysis, we chose the first strategy, which serves to attract all of our customers and maximize our features, and started to put effort into our work.
 
  
=== Our Designing Principle ===
+
[[File:SmVoiceSpeechTesting.png|1200px]]
 +
[[Media:SmVoiceSpeechTesting.png]]
  
Main Page: We gave our main page a black background, because our screen is behind our mirror, and we made all text and pictures white; we think this is the best way to guarantee users see our screen clearly. Also, as the main page, it needs to let users know what features we offer. On the page, we list all of our features, like Twitter, reminders, and so on. For our weather feature, we even show an icon for each type of weather where we think it is appropriate, like cloudy or raining.
 
 
 
Twitter: We try to make the user experience the same as Twitter's, but because our screen is generally black and white only, we still make the background black and the text white. We redesigned the layout of the page to better fit our screen. It is arguably better than real Twitter, because this is an ad-free version of Twitter.
 
 
 
Weather: In our weather section, we imported many icons, because we want to give users a real sense of what the weather will be like today alongside the text. To keep each day's UX consistent, we use the same icon set throughout, which makes our product seem higher quality. Also, if some users prefer other sets of weather icons, we can change them based on their preference.
 
 
 
Reminder: The reminder page reminds people what to do every day, so the most important thing in this part is to let users know what they are doing, where, and at which time. Our layout design includes these parameters to make a more user-friendly experience.
 
 
 
Timer: Our timer serves to help our users control the time they need every day when they use our mirror. For example, when users have just gotten up in the morning, their time might be valuable; we want them to not miss their important tasks (by using our reminder feature) and to enjoy the features offered by our mirror.
 
 
 
Stock: This is an optional feature for us because it only works for people who like stocks. They can view the most up-to-date information about stocks by using our mirror, which may give them the most exciting news of their day.
 
 
=== Software Overview ===
 
=== HTML&CSS to PUG===
 
 
When we design the layouts for our project, we follow a 2-step procedure. First, we write our ideas in HTML. After all members of the group agree with our design, we start to translate it to Pug. Pug is a high-performance, open-source HTML templating engine heavily influenced by Haml and implemented in JavaScript for Node.js and browsers. It provides the ability to write dynamic and reusable HTML documents. So, after we complete our front-end UX design, we use Pug to communicate with our back end.
 
Here is a sample translating process:
 
<!DOCTYPE html> => doctype html
 
<html lang="en"> => html(lang="en")
 
  <head> =>  head
 
    <title>Pug</title> => title= pageTitle
 
    <script type="text/javascript"> => script(type='text/javascript').
 
      if (foo) bar(1 + 5) =>  if (foo) bar(1 + 5)
 
    </script>
 
  </head>
 
  <body> =>  body
 
    <h1>Pug - node template engine</h1> => h1 Pug - node template engine
 
    <div id="container" class="col"> =>  #container.col
 
      <p>You are amazing</p>  =>  p You are amazing
 
      <p>Pug is a terse and simple templating language with a strong focus on performance and powerful features.</p> =>  p.
 
        Pug is a terse and simple templating language with a
 
        strong focus on performance and powerful features.
 
    </div>
 
  </body>
 
</html>
 
 
=== Speech to Text ===
 
INSERT STT RESEARCH HERE
 
 
After sampling different speech to text libraries, it became clear that Google Cloudspeech was vastly superior to any other solution in terms of what we wanted to achieve. While it would have been nice to run something locally like CMU Sphinx, or a totally free solution, in the end Google was easiest to integrate with and had the most accurate speech to text conversion technology.
  
 
Additionally, the use of Google Cloudspeech allowed us to use the Google AIY Voice Kit, which was cheap and made integration with Google's services incredibly easy with a pre-configured distro of Raspbian available for use.
  
What we essentially have is a two-loop system for hotword detection and then speech recognition- see the following psuedocode indicates:
+
What we essentially have is a two-loop system for hotword detection and then speech recognition, as the following pseudocode shows:
  
 
  <nowiki>
 
  <nowiki>
Line 176: Line 126:
  
 
=== Command Parsing ===
 
=== Command Parsing ===
 +
A command string will come down from the server as follows:
 +
 +
    "Show me the weather for Seattle"
 +
 +
Our custom command parser will try to match that to an array of commands. Any command takes the form of an object in a javascript list, as follows:
 +
 +
    {
        name: "descriptive name, only for coder use",
        cmdStrings: [
            // List of strings.
            // Can make use of %?% for a parameter.
            // Can make use of %$% for a continuous whitelist character sequence.
            // Max 1 of each per command.
            // Must have a separator between %?% and %$%,
            // i.e. "%$% timer for %?%" is valid, "timer %$% %?%" is not valid.
        ],
        keywords: [
            // Keyword list. Should be unique to only this command: if no cmdString
            // matches for any command, the parser loops back to look for a keyword
            // in the spoken command. No %?% or %$% allowed.
        ],
        trigger: (param, activeUser) => {
            // function to be called when the command is triggered
        },
        // name of the .pug file which should be displayed as a result of this command, i.e. stockView
        viewName: "stockView"
    }
 +
 +
Our Command parser then works as follows:
 +
 +
    for command in possibleCommandList:
        for cmdString in command.cmdStrings:
            if cmdString strictly matches spokenCommand (excluding %?% or %$% templates):
                // we have matched to a command, call the command trigger function with a passed in parameter (if one exists)
    // no command matched, try to match a keyword
    for command in possibleCommandList:
        for keyword in command.keywords:
            if spokenCommand contains keyword:
                // we have matched to a command, call the command trigger function
 +
 +
For instance, our sample command would match to the following:
 +
 +
    sample spokenCommand: "Show me the weather for Seattle"
 +
    would match to commands as follows:
 +
    commandString: "%$% weather for %?%" - would match with parameter: Seattle
 +
    commandString "Show me the weather %$%" - would match with no parameter
 +
    keyword - "weather" - would match with no parameter
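
The matching steps above can be sketched in JavaScript roughly as follows. This is a minimal sketch under stated assumptions, not the project's actual parser: the helper names (escapeRegExp, compileCmdString, parseCommand) are hypothetical, the whitelist for %$% is assumed to be letters, digits, spaces, and apostrophes, and matching is assumed to be case-insensitive. Only cmdStrings, keywords, and trigger come from the command object described above.

```javascript
// Escape regular-expression metacharacters in the literal parts of a cmdString.
function escapeRegExp(text) {
    return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Turn a cmdString such as "%$% weather for %?%" into an anchored RegExp.
// ASSUMPTION: %?% captures the parameter; %$% matches a run of whitelisted
// characters (letters, digits, spaces, apostrophes) and is not captured.
function compileCmdString(cmdString) {
    const pattern = cmdString
        .split(/(%\?%|%\$%)/)
        .map((part) => {
            if (part === "%?%") return "(.+)";
            if (part === "%$%") return "[a-z0-9' ]+";
            return escapeRegExp(part);
        })
        .join("");
    return new RegExp("^" + pattern + "$", "i");
}

// First pass: strict cmdString match. Second pass: keyword fallback.
// Returns the matched command object, or null if nothing matched.
function parseCommand(spokenCommand, commands, activeUser) {
    const spoken = spokenCommand.trim();
    for (const command of commands) {
        for (const cmdString of command.cmdStrings) {
            const match = compileCmdString(cmdString).exec(spoken);
            if (match) {
                // match[1] is undefined when the cmdString has no %?% template
                command.trigger(match[1], activeUser);
                return command;
            }
        }
    }
    const lowered = spoken.toLowerCase();
    for (const command of commands) {
        for (const keyword of command.keywords) {
            if (lowered.includes(keyword.toLowerCase())) {
                command.trigger(undefined, activeUser);
                return command;
            }
        }
    }
    return null;
}

// Hypothetical sample command, for illustration only.
const sampleCommands = [
    {
        name: "weather",
        cmdStrings: ["%$% weather for %?%", "Show me the weather %$%"],
        keywords: ["weather"],
        trigger: (param, activeUser) => console.log("weather requested:", param, activeUser),
        viewName: "weatherView"
    }
];
parseCommand("Show me the weather for Seattle", sampleCommands, "default");
```

Because the cmdStrings are tried in list order, the more specific template (the one with a %?% parameter) should come first; otherwise a looser template would match first and the parameter would be lost.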
 +
 
=== Facial Recognition ===
 
=== Facial Recognition ===
 
To recognize a person's face, we used Amazon's Rekognition AWS service. Rekognition is designed to work as a complete Computer Vision (CV) API, which includes facial recognition, object finding, image comparisons, etc. We also used Amazon's S3 AWS service, which is equivalent to Google Drive, except for all of the Amazon AWS APIs. The complete version of our facial recognition program works as follows:
Line 185: Line 178:
 
*If any of the pictures match the unknown face, then we know who is currently using the mirror
 
*If any of the pictures match the unknown face, then we know who is currently using the mirror
 
*Else, switch back to the default user (Future Update: be able to add users on the fly)
 
*Else, switch back to the default user (Future Update: be able to add users on the fly)
 
=== Front End ===
 
  
 
=== LED Ring ===
 
=== LED Ring ===
Line 205: Line 196:
 
*Not tying the signal pin of the LEDs to ground
 
*Not tying the signal pin of the LEDs to ground
 
*Not placing a small resistor on the signal pin of the Arduino connected to the LEDs
 
*Not placing a small resistor on the signal pin of the Arduino connected to the LEDs
 +
 +
For future iterations of this subsystem, the only change that we would make in this design is to change the way that the Arduino checks the serial port. In the current design, it is only able to check for data from the Raspberry Pi once every lighting cycle (i.e. one fade in-fade out cycle, one loop during the start up sequence, etc.), because the LEDs are sequentially updated from the data line, with very specific timing required to avoid problems. The WS2812B chipset that we are using for our LED strip reads data encoded as a PWM signal, with a duty cycle of approximately 33% representing a 0, and a duty cycle of approximately 67% representing a 1. Each bit cycle takes approximately 1.25 microseconds, with a maximum of a 0.6 microsecond leeway. The LED strip expects to receive 8-bit color for each of its three channels, which comes to 24 bits per LED. This totals 30 microseconds per LED, with a maximum of a 14.4 microsecond leeway. If the data line is held low for more than 50 microseconds, however, the strip will reset itself, meaning that in order for the Arduino to check the serial port in between LED write cycles, it only has a maximum of 34.4 microseconds to do a Serial.available() call. During testing, the fastest Serial.available() call took far more than the absolute maximum allotted time, so it became apparent that checking for data was only possible between full write cycles to the LED strip. A full write cycle takes approximately 4 milliseconds, given that there are 128 LEDs on the strip.
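
The timing figures in the paragraph above can be reproduced with a few lines of arithmetic. The sketch below only restates numbers from the text (1.25 microseconds per bit, 0.6 microseconds of leeway, 24 bits per LED, 128 LEDs); the function and constant names are hypothetical and it is not part of the project code.

```javascript
// Values taken from the paragraph above.
const BIT_TIME_US = 1.25;   // duration of one WS2812B bit cycle
const BIT_LEEWAY_US = 0.6;  // maximum timing leeway per bit
const BITS_PER_LED = 24;    // 8 bits for each of 3 color channels
const LED_COUNT = 128;      // LEDs on the strip

// Compute per-LED timing and the duration of one full write to the strip.
function ws2812bTiming(ledCount) {
    const perLedUs = BIT_TIME_US * BITS_PER_LED;          // about 30 microseconds
    const perLedLeewayUs = BIT_LEEWAY_US * BITS_PER_LED;  // about 14.4 microseconds
    const fullWriteMs = (perLedUs * ledCount) / 1000;     // about 3.84 ms for 128 LEDs
    return { perLedUs, perLedLeewayUs, fullWriteMs };
}

console.log(ws2812bTiming(LED_COUNT));
```

The full write time of roughly 3.84 milliseconds is where the "approximately 4 milliseconds" figure comes from.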
 +
 +
During the development process, having the Arduino control the LEDs while seamlessly taking serial input from the Raspberry Pi proved difficult. The Arduino's serial buffer does not have enough space to store a complete command from the Pi. To solve this, we used a one-byte handshake from the Pi which, when read by the Arduino on its next read cycle, instructs the Arduino to reply with its own handshake and prepare to receive the new command. Although the time from when the Arduino checks the serial port to when the new light sequence starts should be less than 100 ms, the actual times were between 1 and 3 seconds. This could be due to a number of things, including how often the process that communicates with the Arduino gets processor time, a potentially bad connection that results in a high data loss rate, or a coding error on either device. Because we were unable to pinpoint the cause of the extra delay, it was decided that the Arduino would check for a new command after every lighting cycle, so that the animations appeared smooth and not excessively choppy and sluggish.
 +
 +
In the next iteration of the LED subsystem, there are two possible solutions to fix the response times. The first is to implement a circuit that can store an entire command from the Raspberry Pi and output the new command to the Arduino either through its serial port or through its data pins. Of these two options, the data pins are more likely to be viable, since each digitalRead() cycle takes approximately 5 microseconds, and with a maximum of 128 bytes per command, this process takes at a minimum 5 milliseconds, and should take no more than 30 milliseconds in the worst case. Since most commands take significantly less than 128 bytes, the average time for this process to complete should be no more than a serial read of 128 bytes, which is what our data input algorithm assumes. The second possible solution is to completely rewrite the command structure to use fewer additional arguments, which means sacrificing some of the customization of each animation in exchange for better response times. This means that an entire command can fit within the Arduino's 64 byte serial buffer, which removes the need for a handshake before data transmission can begin. The Arduino code must also be completely rewritten, in such a way that it is able to check the serial port after a certain number of complete write cycles to the LED strip.
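
As a rough check of the 5 millisecond minimum quoted above, the sketch below assumes one digitalRead() call per bit on a single data pin, which is an assumption made here for illustration; the text does not specify how the bits would be clocked in. The names are hypothetical.

```javascript
// Values taken from the paragraph above.
const DIGITAL_READ_US = 5;      // approximate cost of one digitalRead() call
const MAX_COMMAND_BYTES = 128;  // maximum command size

// ASSUMPTION: one digitalRead() per bit, read serially over a single pin,
// with no extra per-bit overhead.
function bitBangReadTimeMs(byteCount) {
    return (byteCount * 8 * DIGITAL_READ_US) / 1000;
}

console.log(bitBangReadTimeMs(MAX_COMMAND_BYTES)); // roughly 5 ms for a full-size command
```

Any clocking or synchronization overhead would add to this figure, which is consistent with the text treating 5 milliseconds as a lower bound.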
  
 
=== Physical Design ===
 
=== Physical Design ===
 +
The mirror was designed such that the monitor would be central on the mirror, with the Raspberry Pi and Arduino mounted near the camera mount on top of the mirror, and all of the other electronics, except the speaker, mounted nearby on the top. The speaker is mounted on its mounting block on the bottom of the mirror, with extended wires going to the Google AIY Voice HAT's output screw terminals.
 +
 +
In order to accomplish this, we constructed a simple open wooden box large enough to fit the interior edge of the 3D printed frame inside. Exterior dimensions should be no larger than 26"x32", and at most 2" shorter in either dimension. The box we constructed was 6" deep due to the wood we bought, although a box half as deep would have sufficed.
 +
 +
There are 2 wooden crossbeams inside of the wooden box to hold the monitor in its appropriate position. These are held to the box with 2 screws on either side, with the lower beam approximately 8.5" above the bottom of the box, and the second beam approximately 10" above the first beam. These measurements should be adjusted to accommodate different size monitors.
 +
 +
Finally, as a part of cable management, the power supplies and power strip were held to the left edge of the box using screws. Any other cable management is accomplished using electrical tape as a temporary measure, as we intend to improve the physical box in the future, and any permanent cable management or other mounting mechanism would make upgrading the wooden box difficult.
 +
 
=== 3D Printed Frame ===
 
=== 3D Printed Frame ===
 +
The front facing exterior frame was constructed of 18 individual 3D printed parts, using 6 different CAD files.
 +
Parts list:
 +
* (4) Corner blocks
 +
* (1) Camera mount
 +
* (1) Speaker mount
 +
* (9) 6"x4" block
 +
* (2) 5.6"x4" block
 +
* (1) Cord hole block
 +
 +
The top is constructed by connecting a corner block, 6x4 block, camera mount, another 6x4 block, and finally another corner block.
 +
The bottom is constructed the same way, except the speaker mount replaces the camera mount.
 +
The left side is constructed with 2 6x4 blocks, a 5.6x4 block, and the cord hole block.
 +
The right side is constructed with 3 6x4 blocks, and a 5.6x4 block.
 +
 +
Before assembling the 4 sides of the frame, make sure that the mirror fits snugly within the ruts, and that the mirror has the protective layers removed before final insertion.
  
 
== Results ==
 
== Results ==
 +
 +
=== Safety/Privacy Concerns ===
 +
 +
Since this mirror has access to both a camera and an always-on microphone, there is a fair amount of concern over privacy violations to the user. This is especially pertinent since these files are uploaded through the internet and are analyzed by external servers, which is simply something to be aware of. In the future we would like the ability to take the mirror offline (using CMU Sphinx and OpenCV), which would limit functionality but increase security for concerned users.
 +
 +
Additionally, this mirror is an electronic device, so people should be careful not to pour water over it. The box and plastic frame are what we would call "splash resistant", but if you were to actively try to water-damage the components, it would be very possible.
 +
 +
=== Frame and Physical Design ===
 +
 +
The Frame was a bit of a nightmare- our connections weren't long enough and so the frame was highly unstable. With lots of epoxy and tape and rigid frame supports, we were able to get the frame into a sturdy-ish state, and it looked cool in the end. The physical box was a challenge to assemble and work with as our measurements for the wood weren't perfect, and so the frame was a bit taller than the box, the box was a bit deeper than we expected it to be, and we didn't have all the tools we needed for the job. In the end, however, the box came together well and we had plenty of room for all our components with minimal backlight issues.
 +
 +
=== LEDs ===
 +
 +
The LEDs integrated into the project very well, and for the most part worked as intended.
 +
 +
=== Speech to Text and Audio Output ===
 +
 +
Speech to text turned out well- it is challenging in a space where there are a large number of people talking close to the microphone at once, and no STT engine handles accents very well, but it is surprisingly sensitive and fantastic at tuning out background noises and music when listening. Audio out worked well as well, though pronunciation for Google Audio isn't perfect quite yet.
 +
 +
=== Facial Recognition ===
 +
 +
Facial recognition, when it works, works great. Ours broke because fraudulent charges to our account led to the account being suspended, and there was not much we could do about that.
 +
 +
=== General Software ===
 +
 +
The software in general works really well. It's a little finicky to ensure all the servers boot up in the correct order, but in general the websocket communication keeps everything running quickly and smoothly, and overall leads to a very satisfying result.
 +
 +
=== Commands ===
 +
 +
In the end we were able to implement even more commands than we planned- including commands that make use of Web Scraping techniques, APIs with secure keys, simple javascript functions, and local filesystem storage. Additionally, we designed the command engine to require about 40 lines of code in total between a command object and a UI to add additional commands to the mirror.
  
 
== SubPart: Future Improvements/ How would we improve on this Project ==
 
== SubPart: Future Improvements/ How would we improve on this Project ==
Cost Saving:
+
'''Cost Saving:'''
 +
 
 
It would have been great to cut costs on this project. One major cost-saving measure would have been to use a piece of glass and a reflective coating instead of an actual two-way mirror. This is more technically challenging but is about half as expensive per square foot, based on brief research.
 
Looking harder for a cheaper display would also have been a great way to cut costs.
  
Physical Design:
+
'''Physical Design:'''
 +
 
 
Spending more time on the physical design of the mirror would have been nice. The 3D printed frame is a great idea for our context (testing out many different technologies and seeing what we can make work), but it would have been much easier to simply spend more time designing a better, lighter frame/box for our mirror and components, as opposed to spending hours CADing and printing parts.
  
 
We would also love to get a better mic/speaker onto this mirror. We got ours as part of the AIY Voice HAT Kit, which was cheap and easy, but a better speaker would definitely improve the project's functionality.
  
Software:
+
'''Software:'''
  
 +
The main software improvements I would make are around the STT and Facial Recognition engines- I would love to move them offline to something like CMU Sphinx and OpenCV, simply for reliability without internet access and for the privacy of the user. I would also like to make commands each have their own folder to make it even easier to add new commands to the system without having to modify the core mirror files, and improve reliability of bootup.
 
----
 
----
 
Tech Reflect Voice Log:
 
Tech Reflect Voice Log:

Latest revision as of 12:14, 5 May 2018

Project Overview

The best creations come not from reinventing the wheel, but from integrating existing technologies in new and interesting ways. This is why, when we saw the original Tech Reflect project, we realized that there was so much opportunity to improve upon it. With the increasing prevalence of IoT devices, home assistants like Google Home and Amazon Alexa, open-source and free to use APIs, and the decreasing cost of display technology, it has become possible to cheaply and easily create a piece of physical hardware for the home which can utilize the strengths of home assistants and cheap display technology, while minimizing their obtrusiveness in your life.

Group Members

  • Ethan Shry
  • Tony Sancho-Spore
  • Baihao Xu (Kevin)
  • Ellen Dai (TA)

Project Proposal

https://docs.google.com/presentation/d/1UtUwZfxM7SI90nvJo0Fx1iLGG8HB7N2jdl1ApcL0bRY/edit?usp=sharing

Objectives

We hope to construct a proof-of-concept bathroom mirror which responds to user feedback. The mirror and GUI should be somewhat aesthetically pleasing. It should listen to the user and be able to convert what they ask for into a visual response displayed on-screen. We hope to show that there is some novelty or value in integrating technology into everyday items like mirrors.

Challenges

Due to the tools available to us, ensuring that the hardware is talking to the Python listener, which in turn is talking to the GUI server, will be somewhat logistically challenging, especially as we will be running code in several different programming languages.

The selection of which command to take when analyzing user speech will also be a nightmare should we decide to allow multiple different trigger commands. A simple solution would only look for exact string matches, but a more robust solution will require looking into.

Our current plan for the mirror is to 3D print the frame, which due to the lack of large printers available to us needs to be done in many pieces (>10), which will be potentially infeasible.

Additionally, Kevin will need to become comfortable with the Pug templating language and Node.js.

Gantt Chart

GanttChartTRVoiceSpring2018.PNG Media:GanttChartTRVoiceSpring2018.PNG

Budget

21.5" Display: $65 @ Microcenter

Google AIY Voice Kit (Includes Speaker and Microphone): $10 @ Microcenter (DISCONTINUED)

Mirror: $50 @ Amazon (https://www.amazon.com/gp/product/B01G4MQ966/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1)

Raspberry Pi 3: $35

Raspberry Pi NoIR Camera: $25

Arduino Mini: $10

LED Strip: About $25 for more than you need

3 spools Standard Breadboard-y wire (22 gauge): $15

Frame: $20ish for wood and screws

Paint: $5

Power Strip: $10

3D Printed Frame: $35 (approximately 1kg of PLA)

2 Power Supplies/MicroUSB Cables: $25

Epoxy: $10

Total Cost: $340 (ish)
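
As a quick sanity check of the total, the line items above can be summed directly. The prices are copied from the budget list; the array layout and variable names are only illustrative.

```javascript
// Budget line items (USD), copied from the list above.
const budgetItems = [
    ['21.5" Display', 65],
    ["Google AIY Voice Kit", 10],
    ["Mirror", 50],
    ["Raspberry Pi 3", 35],
    ["Raspberry Pi NoIR Camera", 25],
    ["Arduino Mini", 10],
    ["LED Strip", 25],
    ["3 spools of 22 gauge wire", 15],
    ["Frame (wood and screws)", 20],
    ["Paint", 5],
    ["Power Strip", 10],
    ["3D Printed Frame (about 1kg of PLA)", 35],
    ["2 Power Supplies/MicroUSB Cables", 25],
    ["Epoxy", 10]
];

// Sum the cost column.
const totalCost = budgetItems.reduce((sum, [, cost]) => sum + cost, 0);
console.log(`Total: $${totalCost}`);
```

Several of the prices are approximate ("about", "ish"), so the sum is an estimate rather than an exact figure.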

Code and CAD Files

All the code for this project can be found on Github

All the CAD Files for this project can be found on Grabcad

Design And Solutions: Modules

Product Design Process Overview

Step 1: Find and target users

We think our target users are families and other home users, and as such we can break down our information into what is most pertinent to them:

1. Our users need access to the information on the internet: Twitter, Stock, Wiki Search

2. Our users need to know what they need to do during the day: Reminders

3. Our users need to know the environment around them: Weather

4. Our users need to have some personalized features: Timer, personalized hotwords, etc.


Step 2: Designing the main page

When we designed the main page, we wanted to give consumers a brief overview of some of the features the mirror offers, to give them inspiration for possible uses of the mirror.

Step 3: Prioritize features based on client importance

1. Solo Users: Our users will care about utility first, which means the weather, reminder, and timer features. The timer will help them control the time they spend getting ready every morning. The weather will help them decide what clothes to wear for the day. The reminder feature will remind them not to forget anything during the day. The Twitter and Stock features are optional, because users might not have time to look at them, but they will be very attractive to users who like them, and make us competitive compared to potential competitors.

2. Families: Family users may need similar features to our solo users. However, they might need more powerful back-end support, because more information, such as reminders, needs to be saved in the database, and they need the ability to have different information and different profiles for different users in the household.

Step 4: Prioritize our features

At this point we pared down our feature list to only the most important features: the ones that our users will rely on most heavily, and the ones most vital to client use.


Step 5: Design Testing and Guidelines

Due to the nature of a smart mirror, it is very important that UI designs have high contrast: a two way mirror blocks part of the light from coming through, so low-contrast designs can be hard to see. Additionally, the less cluttered the design is, the more functional the mirror is as a mirror.

Software Overview

Below is the overview of the Software flow for this project.

Figure: SmVoiceCodeFlow.png

HTML & CSS to Pug

NodeJS gives us access to tools for writing HTML pages to be rendered on the server. We chose to use the Pug language for this purpose. Pug is lightweight, fast, and highly readable, making it easy to learn and understand. This allows us to easily and quickly write what would otherwise be very tedious HTML pages and render them securely on the server with minimal impact on the client. The tutorial and syntax of Pug can be found at https://github.com/pugjs/pug.

Speech to Text

Figure: SmVoiceSpeechTesting.png

After sampling different speech to text libraries, it became clear that Google Cloudspeech was vastly superior to any other solution in terms of what we wanted to achieve. While it would have been nice to run something locally like CMU Sphinx, or to use a totally free solution, in the end Google was easiest to integrate with and had the most accurate speech to text conversion technology.

Additionally, the use of Google Cloudspeech allowed us to use the Google AIY Voice Kit, which was cheap and made integration with Google's services incredibly easy with a pre-configured distro of Raspbian available for use.

What we essentially have is a two-loop system for hotword detection and then speech recognition. See the following pseudocode:

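shouldBeListening = True
while shouldBeListening:
    shouldBeListening = checkIfShouldListen()
    # stage 1: listen until the hotword is heard
    text = listen()
    if hotword in text:
        # stage 2: capture the command that follows the hotword
        commandText = listen()
        sendToNodeServer(commandText)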
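
The two-stage idea above can also be written as runnable Python. This is only a minimal sketch, not the project's actual listener: `listen`, `send`, and `should_listen` are hypothetical callables standing in for the speech to text call, the message to the Node server, and the shutdown check, and the hotword "mirror" is an assumption made for the example.

```python
def run_listener(listen, send, should_listen, hotword):
    """Two-stage loop: wait for the hotword, then forward the next utterance."""
    while should_listen():
        text = listen()                      # stage 1: hotword detection
        if text and hotword.lower() in text.lower():
            command_text = listen()          # stage 2: capture the command
            if command_text:
                send(command_text)


# Scripted stand-ins so the sketch runs without a microphone or server.
utterances = iter(["what time is it", "hey mirror", "show me the weather", "thanks"])
keep_going = iter([True, True, True, False])
sent = []

run_listener(
    listen=lambda: next(utterances),
    send=sent.append,
    should_listen=lambda: next(keep_going),
    hotword="mirror",
)
print(sent)  # ['show me the weather']
```

Passing the listen and send functions in as parameters keeps the loop itself independent of any particular speech service, which also makes it easy to test with scripted input as shown.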

Command Parsing

A command string will come down from the server as follows:

   "Show me the weather for Seattle"

Our custom command parser will try to match that to an array of commands. Any command takes the form of an object in a JavaScript list, as follows:

   {
           name: "descriptive name, only for coder use",
           // cmdStrings: list of strings
           //   Can make use of %?% for parameter
           //   Can make use of %$% for continuous whitelist character sequence
           //   Max 1 of each per command
           //   Must have separator for %?% and %$%,
           //   i.e. "%$% timer for %?%" is valid, "timer %$% %?%" is not valid
           cmdStrings: ["%$% timer for %?%"],
           keywords: ["keyword list. should be unique to only this command. idea is if no cmdString matches for any command, will loop back to look for keyword in command. no %?% or %$% allowed"],
           trigger: (param, activeUser) => {
               // function to be called when command is triggered
           },
           viewName: "name of .pug file which should be displayed as a result of this command, i.e. stockView"
       }

Our command parser then works as follows:

   for command in possibleCommandList:
       for cmdString in command.cmdStrings:
           if cmdString strictly matches spokenCommand (excluding %?% or %$% templates):
               // we have matched to a command, call the command trigger function with a passed in parameter (if one exists)
   // no command matched, try to match a keyword
   for command in possibleCommandList:
       for keyword in command.keywords:
           if spokenCommand contains keyword:
               // we have matched to a command, call the command trigger function

For instance, our sample command would match to the following:

   sample spokenCommand: "Show me the weather for Seattle"
   would match to commands as follows:
   cmdString: "%$% weather for %?%" - would match with parameter: Seattle
   cmdString: "Show me the weather %$%" - would match with no parameter
   keyword: "weather" - would match with no parameter
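
The matching rules above can be illustrated with a short Python sketch. It is not the project's JavaScript parser: it assumes that %?% captures the rest of the phrase as the parameter and that %$% matches any run of filler words that is then discarded, and the helper names `template_to_regex` and `match_command` and the two sample commands are made up for this example.

```python
import re


def template_to_regex(template):
    """Turn a cmdString such as '%$% weather for %?%' into a compiled regex."""
    pattern = ""
    for part in re.split(r"(%\?%|%\$%)", template):
        if part == "%?%":
            pattern += r"(?P<param>.+)"   # captured parameter
        elif part == "%$%":
            pattern += r".+"              # filler words, ignored (assumed semantics)
        else:
            pattern += re.escape(part)    # literal text must match exactly
    return re.compile("^" + pattern + "$", re.IGNORECASE)


def match_command(spoken, commands):
    """Return (command, param) for the first match, or (None, None)."""
    for command in commands:
        for template in command["cmdStrings"]:
            m = template_to_regex(template).match(spoken)
            if m:
                return command, m.groupdict().get("param")
    # No cmdString matched, so fall back to keyword search.
    lowered = spoken.lower()
    for command in commands:
        for keyword in command["keywords"]:
            if keyword.lower() in lowered:
                return command, None
    return None, None


# Hypothetical command list for illustration only.
COMMANDS = [
    {"name": "weather", "cmdStrings": ["%$% weather for %?%"], "keywords": ["weather"]},
    {"name": "timer", "cmdStrings": ["%$% timer for %?%"], "keywords": ["timer"]},
]

command, param = match_command("Show me the weather for Seattle", COMMANDS)
print(command["name"], param)  # weather Seattle
```

Trying every cmdString before any keyword means a phrase that carries a parameter is never swallowed by the looser keyword fallback.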

Facial Recognition

To recognize a person's face, we used Amazon's Rekognition AWS service. Rekognition is designed to work as a complete Computer Vision (CV) API, which includes facial recognition, object finding, image comparisons, etc. We also used Amazon's S3 AWS service, which is roughly equivalent to Google Drive, but for use with the rest of the Amazon AWS APIs. The complete version of our facial recognition program works as follows:

  • Wait for the "Switch User" command
  • Take a photo using the Raspi camera
  • Upload the picture to the S3 bucket as "UnknownFace.jpg"
  • Compare the picture to every other picture in the S3 bucket ("TonyFace.jpg", "EthanFace.jpg"...)
  • If any of the pictures match the unknown face, then we know who is currently using the mirror
  • Else, switch back to the default user (Future Update: be able to add users on the fly)
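
The comparison steps above might look roughly like the following in Python. This is a sketch under stated assumptions rather than the project's code: the function names, the "default" user name, and the 90% similarity threshold are hypothetical, and the Rekognition helper needs the boto3 package and AWS credentials, so the example at the bottom uses a stub comparison instead.

```python
def identify_user(unknown_key, known_faces, faces_match, default_user="default"):
    """Return the first known user whose stored photo matches the unknown photo.

    known_faces maps user name -> S3 object key, e.g. {"Tony": "TonyFace.jpg"}.
    faces_match(source_key, target_key) returns True when the faces match.
    """
    for user, key in known_faces.items():
        if faces_match(unknown_key, key):
            return user
    return default_user  # nobody matched, fall back to the default user


def rekognition_faces_match(bucket, threshold=90.0):
    """Build a faces_match function backed by Rekognition (assumed setup)."""
    import boto3  # imported here so the rest of the sketch runs without it

    client = boto3.client("rekognition")

    def faces_match(source_key, target_key):
        response = client.compare_faces(
            SourceImage={"S3Object": {"Bucket": bucket, "Name": source_key}},
            TargetImage={"S3Object": {"Bucket": bucket, "Name": target_key}},
            SimilarityThreshold=threshold,
        )
        return len(response["FaceMatches"]) > 0

    return faces_match


# Stub comparison so the logic can be tried without AWS access.
known = {"Tony": "TonyFace.jpg", "Ethan": "EthanFace.jpg"}
print(identify_user("UnknownFace.jpg", known, lambda src, tgt: tgt == "EthanFace.jpg"))  # Ethan
```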

LED Ring

The LED Ring is a ring of 128 RGB LEDs controlled by an Arduino Uno. The Uno is connected to the Raspi 3 via a UART serial connection, and to the LED strip via 1 signal pin. When the Arduino receives a mode number from the Raspi corresponding to some action/command from the user, it updates the patterns for the LEDs.

Some of the modes are:

  • Steady on 1 color
  • Fade in and out 1 color
  • "Chasing" start up sequence
  • Wrap up from bottom to top once when hotword is detected
  • All off

The particular LED strip we're using is finicky, requiring about 3 hours and 1 blown up Arduino to get working properly with the Raspi. So that you do not suffer the same fate we did, here are some of the issues we had, along with issues that we foresaw before they could happen:

  • Not sharing common ground. This was a tricky one to debug (Shoutout to Professor Feher for figuring this one out!!), as it wasn't obvious to us that the ground voltage of the Arduino and the LEDs could be different given that they were both plugged into the same power strip.
  • Not placing a suitably sized capacitor on the LED strip's power leads.
  • Not placing an isolator between the Arduino and the LED strip (blew up an Arduino Leonardo due to this one)
  • Not tying the signal pin of the LEDs to ground
  • Not placing a small resistor on the signal pin of the Arduino connected to the LEDs

For future iterations of this subsystem, the only change that we would make to this design is the way that the Arduino checks the serial port. In the current design, it is only able to check for data from the Raspberry Pi once every lighting cycle (i.e. one fade in-fade out cycle, one loop during the start up sequence, etc.), because the LEDs are sequentially updated from the data line, with very specific timing required to avoid problems.

The WS2812B chipset that we are using for our LED strip reads data encoded as a PWM signal, with a duty cycle of approximately 33% representing a 0, and a duty cycle of approximately 67% representing a 1. Each bit cycle takes approximately 1.25 microseconds, with a maximum of 0.6 microseconds of leeway. The LED strip expects 8 bits for each of the three color channels, which totals 24 bits per LED. This comes to 30 microseconds per LED, with a maximum of 14.4 microseconds of leeway. If the data line is held low for more than 50 microseconds, however, the strip will reset itself, meaning that in order for the Arduino to check the serial port in between LED write cycles, it has a maximum of only 34.4 microseconds to do a Serial.available() call. During testing, the fastest Serial.available() call took far more than the absolute maximum allotted time, so it became apparent that checking for data was not possible except between full write cycles to the LED strip. On average, a full write cycle takes approximately 4 milliseconds, given that there are 128 LEDs on the strip.
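
The per-LED and full-strip figures quoted above follow directly from the bit timing. The short Python calculation below reproduces them; the constant names are our own, and the times are kept in whole nanoseconds only to avoid rounding noise.

```python
BIT_NS = 1250          # approximate time per bit (1.25 microseconds)
BIT_LEEWAY_NS = 600    # maximum leeway per bit (0.6 microseconds)
BITS_PER_LED = 24      # 8 bits for each of 3 color channels
NUM_LEDS = 128         # LEDs on the ring

led_ns = BIT_NS * BITS_PER_LED                 # time to write one LED
led_leeway_ns = BIT_LEEWAY_NS * BITS_PER_LED   # total leeway per LED
frame_ns = led_ns * NUM_LEDS                   # time to write the whole strip

print(led_ns / 1000)          # 30.0 microseconds per LED
print(led_leeway_ns / 1000)   # 14.4 microseconds of leeway per LED
print(frame_ns / 1_000_000)   # 3.84 milliseconds per full write cycle
```

The 3.84 millisecond result is where the figure of approximately 4 milliseconds per full write cycle comes from.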

During the development process, having the Arduino control the LEDs while seamlessly taking serial input from the Raspberry Pi proved difficult. The Arduino's serial buffer does not have enough space to store a complete command from the Pi. To solve this, we used a one byte handshake from the Pi which, when read by the Arduino on its next read cycle, instructs the Arduino to reply with its own handshake and prepare to receive the new command. Although the time from when the Arduino checks the serial port to when the new light sequence starts should be less than 100ms, the actual times were between 1 and 3 seconds. This could be due to a number of things, including how often the process that communicates with the Arduino gets processor time, a potentially bad connection that results in a high data loss rate, or a coding error on either device. Because we were unable to pinpoint the cause of the extra delay, it was decided that the Arduino would check for a new command after every lighting cycle, so that the animations appeared smooth and not excessively choppy and sluggish.
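
The Pi side of the handshake described above could be sketched as follows. This is an illustration only: the handshake and acknowledgement byte values and the command bytes are made up, and `port` is assumed to be any object with `write(bytes)` and `read(size)` methods (for example a pyserial `Serial` object opened with a timeout). A fake port is used here so the sketch runs without hardware.

```python
def send_command(port, command, handshake=b"\x01", ack=b"\x02"):
    """Send `command` only after the Arduino acknowledges the handshake byte."""
    port.write(handshake)
    reply = port.read(1)   # waits for the Arduino's next read cycle, or a timeout
    if reply != ack:
        return False       # no acknowledgement, so the command is not sent
    port.write(command)
    return True


class FakeArduino:
    """Stand-in for the serial port that always answers with `ack`."""

    def __init__(self, ack=b"\x02"):
        self.ack = ack
        self.received = []

    def write(self, data):
        self.received.append(data)
        return len(data)

    def read(self, size=1):
        return self.ack[:size]


port = FakeArduino()
print(send_command(port, b"MODE 3"))  # True
print(port.received)                  # [b'\x01', b'MODE 3']
```

Returning False when no acknowledgement arrives lets the caller decide whether to retry, which matters if the Arduino only checks the port once per lighting cycle.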

In the next iteration of the LED subsystem, there are two possible solutions to fix the response times. The first is to implement a circuit that can store an entire command from the Raspberry Pi and output the new command to the Arduino either through its serial port or through its data pins. Of these two outputs, the data pins are more likely to be viable, since each digitalRead() cycle takes approximately 5 microseconds, and with a maximum of 128 bytes per command, this process takes at a minimum 5 milliseconds, and should take no more than 30 milliseconds worst case. Since most commands take significantly less than 128 bytes, the average time for this process to complete should be no more than a serial read of 128 bytes, which is what our data input algorithm assumes. The second possible solution is to completely rewrite the command structure to use fewer additional arguments, which means sacrificing some of how much each animation can be customized in exchange for faster response times. This means that an entire command can fit within the Arduino's 64 byte serial buffer, which removes the need for a handshake before data transmission can begin. The Arduino code must also be completely rewritten, in such a way that it is able to check the serial port after a certain number of complete write cycles to the LED strip.
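
As a rough check on the 5 millisecond minimum quoted above, the estimate can be reproduced in a few lines of Python. The calculation assumes that one bit is read per digitalRead() call on a single data pin, which the design notes do not state explicitly; the function name is our own.

```python
def pin_read_time_ms(num_bytes, read_us=5):
    """Estimated time to clock in num_bytes one bit at a time (assumed scheme)."""
    bits = num_bytes * 8
    return bits * read_us / 1000   # microseconds -> milliseconds


print(pin_read_time_ms(128))  # 5.12, for a maximum-length 128 byte command
print(pin_read_time_ms(64))   # 2.56, for a command that fits a 64 byte buffer
```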

Physical Design

The mirror was designed such that the monitor would be central on the mirror, with the Raspberry Pi and Arduino mounted near the camera mount on top of the mirror, and all of the other electronics, except the speaker, mounted nearby on the top. The speaker is mounted on its mounting block on the bottom of the mirror, with extended wires going to the Google AIY Voice HAT's output screw terminals.

In order to accomplish this, we constructed a simple open wooden box large enough to fit the interior edge of the 3D printed frame inside. Exterior dimensions should be no larger than 26"x32", and no more than 2" smaller in either dimension. The box we constructed was 6" deep due to the wood we bought, although a box half as deep would have sufficed.

There are 2 wooden crossbeams inside of the wooden box to hold the monitor in its appropriate position. These are held to the box with 2 screws on either side, with the lower beam approximately 8.5" above the bottom of the box, and the second beam approximately 10" above the first beam. These measurements should be adjusted to accommodate different size monitors.

Finally, as a part of cable management, the power supplies and power strip were held to the left edge of the box using screws. Any other cable management is accomplished using electrical tape as a temporary measure, as we intend to improve the physical box in the future, and any permanent cable management or other mounting mechanism would make upgrading the wooden box difficult.

3D Printed Frame

The front facing exterior frame was constructed of 18 individual 3D printed parts, using 6 different CAD files. Parts list:

  • (4) Corner blocks
  • (1) Camera mount
  • (1) Speaker mount
  • (9) 6"x4" block
  • (2) 5.6"x4" block
  • (1) Cord hole block

The top is constructed by connecting a corner block, 6x4 block, camera mount, another 6x4 block, and finally another corner block. The bottom is constructed the same way, except the speaker mount replaces the camera mount. The left side is constructed with 2 6x4 blocks, a 5.6x4 block, and the cord hole block. The right side is constructed with 3 6x4 blocks, and a 5.6x4 block.

Before assembling the 4 sides of the frame, make sure that the mirror fits snugly within the ruts, and that the mirror has the protective layers removed before final insertion.

Results

Safety/Privacy Concerns

Since this mirror has access to both a camera and an always-on microphone, there is a fair amount of concern over privacy violations to the user. This is especially pertinent since these files are uploaded through the internet and are analyzed by third-party servers, which is simply something to be aware of. In the future we would like the ability to take the mirror offline (using CMU Sphinx and OpenCV), which would limit functionality but increase security for concerned users.

Additionally, this mirror is an electronic device, so people should be careful not to pour water over it. The box and plastic frame are what we would call "splash resistant", but if you were to actively try to water damage components it would be very possible.

Frame and Physical Design

The frame was a bit of a nightmare: our connections weren't long enough, so the frame was highly unstable. With lots of epoxy and tape and rigid frame supports, we were able to get the frame into a sturdy-ish state, and it looked cool in the end. The physical box was a challenge to assemble and work with because our measurements for the wood weren't perfect, so the frame was a bit taller than the box, the box was a bit deeper than we expected it to be, and we didn't have all the tools we needed for the job. In the end, however, the box came together well and we had plenty of room for all our components with minimal backlight issues.

LEDs

The LEDs integrated into the project very well, and for the most part worked as intended.

Speech to Text and Audio Output

Speech to text turned out well. It is challenging in a space where a large number of people are talking close to the microphone at once, and no STT engine we tried handles accents very well, but it is surprisingly sensitive and fantastic at tuning out background noise and music when listening. Audio out worked well too, though pronunciation for Google Audio isn't perfect quite yet.

Facial Recognition

Facial recognition, when it works, works great. Ours broke because fraudulent charges to our account led to the account being suspended, and there was not much we could do about that.

General Software

The software in general works really well. It's a little finicky to ensure all the servers boot up in the correct order, but in general the websocket communication keeps everything running quickly and smoothly, and overall leads to a very satisfying result.

Commands

In the end we were able to implement even more commands than we planned, including commands that make use of web scraping techniques, APIs with secure keys, simple JavaScript functions, and local filesystem storage. Additionally, we designed the command engine so that adding a new command to the mirror requires only about 40 lines of code in total between a command object and a UI.

SubPart: Future Improvements/ How would we improve on this Project

Cost Saving:

It would have been great to cut costs on this project. One major cost saving measure would have been to use a piece of glass and a reflective coating instead of an actual two way mirror. This is more technically challenging but is about half as expensive per square foot based on brief research. Looking harder for a cheaper display would also have been a great way to cut costs.

Physical Design:

Spending more time on the physical design of the mirror would have been nice. The 3D printed frame is a great idea for our context (testing out many different technologies and seeing what we can make work), but it would have been much easier to simply spend more time designing a better, lighter frame/box for our mirror and components, as opposed to spending hours CADing and printing parts.

Also would love to get a better mic/speaker onto this mirror- we got ours as part of the AIY Voice HAT Kit, which was cheap and easy, but a better speaker would definitely improve the project's functionality.

Software:

The main software improvements I would make are around the STT and Facial Recognition engines. I would love to move them offline to something like CMU Sphinx and OpenCV, simply for reliability without internet access and for the privacy of the user. I would also like to give each command its own folder, to make it even easier to add new commands to the system without having to modify the core mirror files, and to improve the reliability of bootup.


Tech Reflect Voice Log: https://classes.engineering.wustl.edu/ese205/core/index.php?title=Tech_Reflect_Voice_Log