FingerSpark Weekly Log
From ESE205 Wiki
Week of 2/8
- Chose name for Project
- Completed Project Proposal
- Identified essential components of product and incorporated these components into the budget
- Finalized Gantt Chart
- Created Wiki Page for Project
Week of 2/15
- Updated Gantt Chart
- Researched prices of items on budget: The results are uploaded in the budget section of this project.
- Discussed the basis of what algorithms we wanted to implement: There may be premade algorithms to find the centroid of color blobs, but if we can't implement one, we could instead test each pixel in the image for whether it falls within our thresholds of color, then take the average of the coordinates of all of these points (which should put us somewhere close to the center of the spot of color).
- Researched camera specs: We discovered that the PiCamera allows you to take video at 1080p at 30FPS and 720p at 30FPS, as well as still images, and has a native resolution of 5 megapixels. We do not know if this is enough resolution yet, however.
- We researched common methods of color-finding, and learned that most algorithms that attempt to identify colors in a human-like way use HSV encoding rather than RGB encoding. Accordingly, we now plan to do all of our analysis and thresholding in HSV.
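The fallback centroid idea from this week (test every pixel against the color thresholds, then average the coordinates of the matches) can be sketched in a few lines of NumPy. This is only an illustration under assumptions: the function name `color_centroid`, the synthetic image, and the HSV bounds are invented for the example and are not values from the project.

```python
import numpy as np

def color_centroid(hsv, lower, upper):
    """Return the mean (row, col) of pixels whose HSV values fall within
    [lower, upper] on every channel, or None if no pixel matches."""
    lower = np.asarray(lower)
    upper = np.asarray(upper)
    # True where all three channels are inside the thresholds.
    in_range = np.all((hsv >= lower) & (hsv <= upper), axis=2)
    rows, cols = np.nonzero(in_range)
    if rows.size == 0:
        return None
    return float(rows.mean()), float(cols.mean())

# Synthetic 10x10 HSV image with a 3x3 patch of one color (made-up values).
hsv = np.zeros((10, 10, 3), dtype=np.uint8)
hsv[2:5, 6:9] = (120, 200, 200)

centroid = color_centroid(hsv, (110, 100, 100), (130, 255, 255))
missing = color_centroid(hsv, (30, 100, 100), (40, 255, 255))
print(centroid, missing)
```

The result is a (row, column) pair, so it would need to be swapped to get (x, y) screen coordinates.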
Week of 2/22
- Updated Project Wiki
- Updated Project Budget
- Researched OpenCV: It turns out that OpenCV has many libraries that contain operations we may need, but 1) the documentation is often incomplete or above our level and 2) many functions are implemented tens of times with minor differences in semantics and nuance. We settled on a sublibrary called ImgProc, which seems to contain the blob-detection and color-scanning algorithms we were planning to use.
- As of last week, we did not know whether the camera had high enough resolution to achieve the tasks we had laid out as critical for this project. This week, we tested a laptop camera with similar specs to the PiCamera to determine at what distance the latter will be able to clearly distinguish different fingers. The result was that at three meters away from the camera, the fingers of a hand are clearly distinguishable to the human eye.
Week of 2/29
- Continued research into possible OpenCV sublibraries to use for color point detection (especially further testing with imgproc)
- Loaded OpenCV onto Raspberry Pi (using Python 2, rather than Python 3 as planned; this took most of our time this week unfortunately)
- Configured OpenCV to be accessible to Python programs on Raspberry Pi (it took a while to realize that it imports as "cv2", not "opencv")
Week of 3/7
- This week was midterms, so not very much happened on this project, unfortunately. However, we discussed algorithms further and found several ways to save processing time (for example, checking only every 5th pixel for thresholding, then zooming in on areas with the correct colors to scan further) to avoid having to check millions of points.
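A rough sketch of the "check every 5th pixel, then zoom in" idea is shown below. It assumes the image is already an HSV NumPy array; the function names, the window size (one step in each direction around a coarse hit), and the test values are assumptions made for illustration, not the project's code.

```python
import numpy as np

def in_range(pixels, lower, upper):
    """True where every channel of a pixel lies within [lower, upper]."""
    return np.all((pixels >= lower) & (pixels <= upper), axis=-1)

def coarse_to_fine_mask(hsv, lower, upper, step=5):
    """Threshold every `step`-th pixel, then fully scan a small window
    around each coarse hit. Pixels far from any hit are never examined."""
    lower, upper = np.asarray(lower), np.asarray(upper)
    height, width = hsv.shape[:2]
    mask = np.zeros((height, width), dtype=bool)
    coarse_hits = in_range(hsv[::step, ::step], lower, upper)
    for r, c in zip(*np.nonzero(coarse_hits)):
        r0, c0 = int(r) * step, int(c) * step
        top, bottom = max(r0 - step, 0), min(r0 + step + 1, height)
        left, right = max(c0 - step, 0), min(c0 + step + 1, width)
        mask[top:bottom, left:right] = in_range(
            hsv[top:bottom, left:right], lower, upper)
    return mask

# Synthetic 50x50 image containing one colored patch (made-up values).
hsv = np.zeros((50, 50, 3), dtype=np.uint8)
hsv[18:27, 30:38] = (120, 200, 200)
lower, upper = (110, 100, 100), (130, 255, 255)

fast_mask = coarse_to_fine_mask(hsv, lower, upper, step=5)
full_mask = in_range(hsv, np.asarray(lower), np.asarray(upper))
print(int(fast_mask.sum()), int(full_mask.sum()))
```

The trade-off is that a spot of color smaller than the step could fall between the sampled pixels and be missed entirely, so the step would have to be chosen with the expected size of the fingertip in mind.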
Week of 3/14
- Since we didn't work very much last week due to midterm exams, David took home the Raspberry Pi over spring break and continued to work on getting useful images from the camera. We are able to take video and photos and save them to the desktop or load them into our program; however, the still images have significant blurring even with slow hand motion speeds (not a problem we had anticipated, but one that might be solved by reducing the image's exposure time). The videos are also in the .h264 format, which unfortunately cannot be read by the RaspberryPi or OpenCV. David downloaded a shell script (runnable from python) to convert the movies to .mp4 format, but it still takes too long to be practical at the moment. We have code to send the live video feed from the camera into OpenCV as a "stream" object, but we do not yet understand this code well enough to implement or modify it. That's one of our goals for next week.
Week of 3/21
- We started implementing the information we'd researched a few weeks ago in our explorations of OpenCV. In particular, we have for the moment abandoned blob-recognition (too complex and slow for what we need) and are instead adapting a different approach to color-finding. A common method of highlighting just a single color range in an image is to generate a "mask" image (1 where a pixel is in the color range, 0 otherwise) and use a bitwise-AND to overlay that onto the original, leaving only the colored pixels with a non-zero value. However, David realized that we could instead use a bitwise-OR to overlay the masks from successive frames onto each other, and thus get a black-and-white "smear" image showing everywhere that the colored dot (the glove finger) has been throughout the gesture. This image could be directly compared to a set of templates we can create in the Pi's memory, since image-comparison is a well-explored problem in computer science (and we would only have to do it once, at the end of the gesture, as opposed to the blob-finding algorithm that would have to run for each frame).
- Neither of us knows much about how to implement image-shape comparison, however, so we spent time looking up research papers on the subject (which Kjartan also helped us with). We now have several possible solutions: Mean Squared Error (pixel-by-pixel comparison; most internet resources consider this to be inaccurate but fast), the Structural Similarity Index (developed by these researchers), a machine learning-based approach suggested by Alden (ideal for this purpose, but significantly beyond our level of knowledge to implement), Keypoint Matching (suggested by Kjartan), and another paper found by Kjartan (whose authors I don't remember off the top of my head). We have not implemented any of these five yet; doing so is our goal for next week. Notably, this approach also somewhat helps with our gesture recognition and scaling difficulties (the other two unsolved problems Prof. Gonzalez gave us in our meeting), since that's incorporated (in theory) within the similarity indices, and we don't have to know the real-life size of the gestures to do this.
- Connor was also able to make significant breakthroughs on streaming video from the camera module live onto the monitor. However, the stream was in a format that has proved difficult to process or manipulate, especially when it comes to extracting and analyzing individual frames. Connor accessed the livestream via a Python script called as a command line argument from a Python program he wrote, which made the stream challenging to modify as it wasn't technically recognized as an object in Python. The livestream obscured the entire screen, and could be aborted. After trying multiple methods of pulling frames from the livestream without success, Connor moved on to working on pulling frames from saved video files.
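The "smear" idea and the simplest of the comparison candidates (Mean Squared Error) can be sketched as follows. The per-frame masks here are tiny hand-made arrays standing in for real thresholded frames, and the function names are illustrative assumptions.

```python
import numpy as np

def composite_mask(masks):
    """OR together per-frame masks so the result marks every pixel the
    colored dot occupied at any point in the gesture."""
    composite = np.zeros_like(masks[0], dtype=bool)
    for mask in masks:
        composite |= mask.astype(bool)
    return composite

def mean_squared_error(a, b):
    """Pixel-by-pixel mean squared difference between two same-size images."""
    diff = a.astype(float) - b.astype(float)
    return float(np.mean(diff ** 2))

# Three 5x5 "frames" in which a single-pixel dot moves down a diagonal.
frames = []
for i in (1, 2, 3):
    frame = np.zeros((5, 5), dtype=np.uint8)
    frame[i, i] = 1
    frames.append(frame)

smear = composite_mask(frames)

# A hypothetical template: the full diagonal, two pixels longer than the smear.
template = np.eye(5, dtype=bool)
error = mean_squared_error(smear, template)
print(int(smear.sum()), error)
```

A lower error means a closer match, so the gesture would be assigned to whichever template gives the smallest value. Note that MSE in this form assumes the smear and the template are already aligned and the same size.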
Week of 3/28
- Both of us made major amounts of progress this week. Connor researched a number of different methods of processing frames from the video files, including using methods from picamera (a library specifically optimized to work with the camera module for the Raspberry Pi). He spent time attempting to solve this issue by screenshotting the video preview, but with no success. He also experimented with circular streams using picamera, but wasn't able to get the camera to successfully livestream to the circular stream. He eventually came to the conclusion that our best demonstration option for the upcoming project evaluation would be to record a video using the VideoCapture method from OpenCV, access each of the video's individual frames using the VideoCapture object's read method, and process the frames after the recording had been completed. Connor began researching a program that would iterate through a .h264 or .mp4 video file frame by frame, thereby allowing us to identify and collect data on the points in each image within our HSV color range for each color. David wrote an algorithm to implement the masking approach that was identified last week, which works with a great degree of accuracy on saved images (both color tests downloaded off of the internet and images captured through the camera). However, at the moment the algorithm can only open .png files (which is acceptable, because the camera captures images in that format, but rather confusing).
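The frame-by-frame loop built on a VideoCapture object's read method might look roughly like the sketch below. OpenCV's `read()` returns a success flag together with the frame, and the loop stops when the flag is false. So that the sketch runs without a camera or video file, a small `FakeCapture` class (an invented stand-in, not part of OpenCV) is used here in place of `cv2.VideoCapture`.

```python
def iter_frames(capture):
    """Yield frames from a cv2.VideoCapture-like object until read() fails,
    then release the capture."""
    try:
        while True:
            ok, frame = capture.read()
            if not ok:
                break
            yield frame
    finally:
        capture.release()


class FakeCapture:
    """Hypothetical stand-in exposing the same read()/release() methods
    as cv2.VideoCapture, for demonstration only."""

    def __init__(self, frames):
        self._frames = list(frames)
        self.released = False

    def read(self):
        if self._frames:
            return True, self._frames.pop(0)
        return False, None

    def release(self):
        self.released = True


capture = FakeCapture(["frame0", "frame1", "frame2"])
frames = list(iter_frames(capture))
print(len(frames), capture.released)
```

With OpenCV installed, the same generator could presumably be driven by `iter_frames(cv2.VideoCapture("gesture.mp4"))`, where the file name is a placeholder.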
Week of 4/4
- Connor updated the Gantt Chart, Project Objective (modified desired final product to include gesture recognition), and Project Overview (modified description of demonstration from painting program to gesture recognition) to reflect the current direction of the project.
- David significantly refined his algorithm from last week: in particular, it is now able to access individual frames from a video file. He also spent time refining the color bounds we had previously determined. David also successfully converted the images into HSV encoding without significant loss in processing speed, and is now using bounds in that color space (much easier to work with for our application).
- David and Connor completed a program that has the capacity to either record a new video or analyze a video file, and then iterate through that video file frame by frame applying David's masking algorithm. The results of applying the mask to each image are visible on the screen as the program iterates through the recording and generates a composite image of the masked frames. David was able to successfully test our program with 1) a red dot on a piece of paper and 2) blue masking tape on his finger to a high degree of accuracy. We plan to test our program more extensively in the coming days. During our weekly meeting, David, Connor, and Kjartan also discussed the possibility of using a least-squares approach to solving the gesture matching problem. We also learned how to access the color of specific pixels in the image (we had not previously recognized that images were simply saved as numpy arrays). We will begin working to implement this proposed technique in the near future, and hope to discuss this further during our meeting with Professor Gonzalez.
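Because an image is just a NumPy array of shape (height, width, 3), a single pixel can be read by indexing with its row and column. The sketch below does that and converts the pixel to HSV by hand using Python's standard `colorsys` module. It assumes OpenCV's usual conventions (channels stored in BGR order, and 8-bit HSV with hue in 0-179 and saturation/value in 0-255); the helper name is invented, and its rounding may differ slightly from OpenCV's own conversion.

```python
import colorsys
import numpy as np

def bgr_pixel_to_hsv(pixel):
    """Convert one 8-bit BGR pixel to approximately OpenCV-style 8-bit HSV
    (H in 0-179, S and V in 0-255)."""
    b, g, r = (int(c) / 255.0 for c in pixel)
    h, s, v = colorsys.rgb_to_hsv(r, g, b)
    return int(round(h * 180)) % 180, int(round(s * 255)), int(round(v * 255))

# A tiny synthetic image; index order is image[row, column].
image = np.zeros((4, 6, 3), dtype=np.uint8)
image[1, 2] = (255, 0, 0)  # pure blue when channels are in BGR order
image[3, 5] = (0, 0, 255)  # pure red when channels are in BGR order

blue_hsv = bgr_pixel_to_hsv(image[1, 2])
red_hsv = bgr_pixel_to_hsv(image[3, 5])
print(blue_hsv, red_hsv)
```

Working in HSV means a color range is mostly a band of hue values, which is easier to tune than three interacting RGB bounds.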
Week of 4/11
- Following our meeting with Professor Gonzalez, Connor used this week to focus on researching the Hooke-Jeeves algorithm. After learning the logic and steps of the algorithm, he decided that Professor Gonzalez's idea was correct: we could apply the (relatively simple) logic of the Hooke-Jeeves method in our program to significantly reduce the time our program took to execute. Connor discovered someone's implementation of Hooke-Jeeves in Python on GitHub. He downloaded the program text, compiled it, and ran tests. The program executed successfully, and was initially built to take the Rosenbrock function as a parameter.
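For reference, a minimal Hooke-Jeeves pattern search is sketched below. It is not the GitHub implementation mentioned above; the function names, default step, shrink factor, and tolerance are all assumptions chosen for illustration. The method alternates an exploratory move (nudging each coordinate by plus or minus the step) with a pattern move (jumping further along the direction that just helped), and halves the step when nothing improves.

```python
def rosenbrock(p):
    """Classic test function with its minimum of 0 at (1, 1)."""
    x, y = p
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

def explore(f, point, step):
    """Exploratory move: try +step then -step on each coordinate in turn,
    keeping any change that lowers f."""
    best = list(point)
    best_val = f(best)
    for i in range(len(best)):
        for delta in (step, -step):
            trial = list(best)
            trial[i] += delta
            val = f(trial)
            if val < best_val:
                best, best_val = trial, val
                break
    return best, best_val

def hooke_jeeves(f, start, step=0.5, shrink=0.5, tol=1e-6, max_iter=10000):
    base = [float(x) for x in start]
    base_val = f(base)
    centre, at_base = base, True  # point the next exploration starts from
    for _ in range(max_iter):
        if step < tol:
            break
        candidate, cand_val = explore(f, centre, step)
        if cand_val < base_val:
            # Success: pattern move extrapolates from old base through candidate.
            centre = [2 * c - b for c, b in zip(candidate, base)]
            base, base_val, at_base = candidate, cand_val, False
        elif not at_base:
            # The pattern move did not help; go back and explore around base.
            centre, at_base = base, True
        else:
            # Nothing near the base improves: refine the step size.
            step *= shrink
    return base, base_val

def shifted_bowl(p):
    """Simple quadratic with its minimum of 0 at (3, -1)."""
    return (p[0] - 3) ** 2 + (p[1] + 1) ** 2

bowl_point, bowl_value = hooke_jeeves(shifted_bowl, [0, 0])
rosen_start = [-1.2, 1.0]
rosen_point, rosen_value = hooke_jeeves(rosenbrock, rosen_start)
print(bowl_point, bowl_value)
print(rosen_point, rosen_value)
```

The appeal for this project is that the search needs only function values, no derivatives, so any cost function (including an image-comparison score) can be plugged in where `rosenbrock` is.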
Week of 4/18
- Connor continued to work on implementing Hooke-Jeeves and adapting the algorithm to image comparison. After plugging in our image comparison function in place of the Rosenbrock function and attempting to adjust our program to fit the variable standards of hooke-jeeves.py, he concluded that it would be more efficient to adapt the methods of the Hooke-Jeeves Python program to suit the purposes of our program (instead of vice-versa). David and Connor wrote our own version of the Hooke-Jeeves algorithm which took an image parameter, a template parameter, and a starting point parameter. We tested this function, and eventually moved the logic of our algorithm to the main body of our program. At first it worked only to a limited degree, because we were unsure which starting point and delta value to pass, but through trial and error (mostly array out-of-bounds errors) we ended up selecting a start point and delta value that worked well.
- We also attempted to paint the gloves we ordered on Amazon, but soon realized that the fabric spray paint wasn't sticking to the rubber/mesh fingertips of the gloves. We bought and painted new white fabric gloves, and these tested incredibly well.
- We continued to work on comparing the composite mask to the gesture templates. Hooke-Jeeves reduced the time that our program took to execute down from nearly half an hour to a few minutes. By modifying our delta and omega (number of total iterations per image-template comparison), we were able to reduce our total runtime to 53 seconds while retaining accurate results.
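One way a search taking an image, a template, a starting point, and a delta could be arranged is sketched below. This is a guess at the general shape, not the program described above: it uses only the exploratory half of Hooke-Jeeves (no pattern moves), works on integer pixel offsets, and scores each placement with a mean squared error. Placements that would run off the edge of the image are given an infinite cost, which is one way to avoid the array out-of-bounds problems mentioned. All names and test values are illustrative.

```python
import numpy as np

def template_cost(image, template, top, left):
    """Mean squared error between the template and the image window whose
    top-left corner is (top, left); infinite if the window leaves the image."""
    h, w = template.shape
    if top < 0 or left < 0 or top + h > image.shape[0] or left + w > image.shape[1]:
        return float("inf")
    window = image[top:top + h, left:left + w].astype(float)
    return float(np.mean((window - template.astype(float)) ** 2))

def align_template(image, template, start, delta):
    """Coordinate-wise exploratory search over pixel offsets: move by delta
    while that lowers the cost, and halve delta when no move helps."""
    best = tuple(start)
    best_cost = template_cost(image, template, *best)
    while delta >= 1:
        improved = False
        for d_top, d_left in ((delta, 0), (-delta, 0), (0, delta), (0, -delta)):
            trial = (best[0] + d_top, best[1] + d_left)
            cost = template_cost(image, template, *trial)
            if cost < best_cost:
                best, best_cost, improved = trial, cost, True
        if not improved:
            delta //= 2
    return best, best_cost

# Synthetic composite mask: an 8x8 filled square at row 20, column 12.
image = np.zeros((40, 40), dtype=np.uint8)
image[20:28, 12:20] = 1
template = np.ones((8, 8), dtype=np.uint8)

position, cost = align_template(image, template, start=(16, 16), delta=4)
print(position, cost)
```

Only a handful of placements are scored instead of every possible offset, which is the kind of saving that makes a large drop in runtime plausible. The catch is that the search can stall if the starting point has no overlap with the shape at all, so the start point and delta matter.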
Week of 4/25
- Using the resources of the machine shop, we built a stand for the Raspberry Pi B+ that could attach to the tripod. We also painted the final version of our glove.
- We consolidated general information about our project, the objective of our project, the challenges we faced, potential future applications of our project, and our methodology in processing and detecting user gestures into a poster.
- We further adjusted the parameters of our Hooke-Jeeves algorithm, testing the program with a wide variety of different delta and range values. We found that the optimal set of values for the initial offset value, the maximum offset, and the delta was [15,75,15] (all in pixels) starting from a point in the center of the frame.
- We tested our product in Lopata Gallery. Immediately, we found that the HSV bounds that we had successfully tested for multiple colors in the CAD Lab were not effective under the greenish lighting of Lopata Gallery. We first tested our product in front of the green wall, which tinted our video feed green and made detecting blue and red in the video nearly impossible. Recognizing this, we began testing our product in front of the blue wall with our original HSV bounds for the color red. This strategy proved to be effective, and we then began testing blue HSV ranges with the user's gesture against our white backdrop. After a few hours of testing and making adjustments, we were able to successfully mask red and blue color values in Lopata Gallery.