FingerSpark Weekly Log
From ESE205 Wiki
Week of 2/8
- Chose name for Project
- Completed Project Proposal
- Identified essential components of product and incorporated these components into the budget
- Finalized Gantt Chart
- Created Wiki Page for Project
Week of 2/15
- Updated Gantt Chart
- Researched prices of items on budget: The results are uploaded in the budget section of this project.
- Discussed the basis of what algorithms we wanted to implement: There may be premade algorithms to find the centroid of color blobs, but if we can't implement one, we could instead test each pixel in the image for whether it falls within our thresholds of color, then take the average of the coordinates of all of these points (which should put us somewhere close to the center of the spot of color).
- Researched camera specs: We discovered that the PiCamera can record video at 1080p or 720p, both at 30FPS, as well as capture still images, and has a native resolution of 5 megapixels. We do not yet know whether this resolution is sufficient.
- We researched common methods of color-finding, and learned that most algorithms that attempt to identify colors in a human-like way use HSV encoding rather than RGB encoding. Accordingly, we now plan to do all of our analysis and thresholding in HSV.
Week of 2/22
- Updated Project Wiki
- Updated Project Budget
- Researched OpenCV: It turns out that OpenCV has many libraries that contain operations we may need, but 1) the documentation is often incomplete or above our level and 2) many functions are implemented tens of times with minor differences in semantics and nuance. We settled on a sublibrary called ImgProc, which seems to contain the blob-detection and color-scanning algorithms we were planning to use.
- As of last week, we did not know whether the camera had high enough resolution to achieve the tasks we had laid out as critical for this project. This week, we tested a laptop camera with similar specs to the PiCamera to determine at what distance the latter will be able to clearly distinguish different fingers. The result was that at three meters away from the camera, the fingers of a hand are clearly distinguishable to the human eye.
Week of 2/29
- Continued research into possible OpenCV sublibraries to use for color-point detection (especially further testing with ImgProc)
- Loaded OpenCV onto Raspberry Pi (using Python 2, rather than Python 3 as planned; this took most of our time this week unfortunately)
- Configured OpenCV to be accessible to Python programs on Raspberry Pi (it took a while to realize that it imports as "cv2", not "opencv")
Week of 3/7
- This week was midterms, so not very much happened on this project, unfortunately. However, we discussed algorithms further and found several ways to cut processing time (for example, checking only every 5th pixel against the color thresholds, then zooming in on areas with hits to scan more finely) and so avoid having to check millions of points.
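A sketch of that coarse-to-fine idea, assuming a boolean threshold mask is already available (the function names are hypothetical, and the real program would work on mask arrays from OpenCV):

```python
import numpy as np

def coarse_hits(mask, step=5):
    """Scan only every `step`-th pixel; return coordinates of coarse hits."""
    coarse = mask[::step, ::step]
    ys, xs = np.nonzero(coarse)
    return list(zip(ys * step, xs * step))

def refine(mask, hits, step=5):
    """Zoom in around each coarse hit and collect every in-range pixel nearby."""
    found = set()
    h, w = mask.shape
    for y, x in hits:
        y0, y1 = max(0, y - step), min(h, y + step + 1)
        x0, x1 = max(0, x - step), min(w, x + step + 1)
        ys, xs = np.nonzero(mask[y0:y1, x0:x1])
        found.update(zip(ys + y0, xs + x0))
    return found

# Example: a 10x10 in-range patch inside a 50x50 mask.
mask = np.zeros((50, 50), dtype=bool)
mask[10:20, 10:20] = True
hits = coarse_hits(mask)   # only 1/25th of the pixels are examined here
pts = refine(mask, hits)   # full-resolution scan, but only near the hits
```

The coarse pass touches only 1/25th of the image, and the fine pass runs only where something was found, which is the point of the optimization.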
Week of 3/14
- Since we didn't work very much last week due to midterm exams, David took the Raspberry Pi home over spring break and continued working on getting useful images from the camera. We are able to take video and photos and save them to the desktop or load them into our program; however, the still images show significant blurring even at slow hand speeds (not a problem we had anticipated, but one that might be solved by reducing the exposure time). The videos are also saved in the .h264 format, which unfortunately cannot be read directly by the Raspberry Pi or OpenCV. David downloaded a shell script (runnable from Python) to convert the movies to .mp4, but the conversion still takes too long to be practical at the moment. We have code to send the live video feed from the camera into OpenCV as a "stream" object, but we do not yet understand it well enough to implement or modify it. That's one of our goals for next week.
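The log doesn't say which tool the downloaded shell script wrapped; ffmpeg is one common choice, and because the .h264 file is already an H.264 elementary stream, it can be repackaged into an .mp4 container with a stream copy (no re-encoding), which is far faster on a Pi than a transcode. A hypothetical sketch:

```python
import subprocess

def h264_to_mp4_cmd(src, dst, fps=30):
    """Build an ffmpeg command that wraps a raw .h264 stream in an .mp4
    container. "-c copy" repackages without re-encoding, so it is much
    faster than a full transcode. Hypothetical helper, not the script
    the project actually downloaded."""
    return ["ffmpeg", "-y", "-framerate", str(fps),
            "-i", src, "-c", "copy", dst]

# To actually run it (requires ffmpeg on the PATH):
# subprocess.run(h264_to_mp4_cmd("gesture.h264", "gesture.mp4"), check=True)
```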
Week of 3/21
- We started implementing the information we'd researched a few weeks ago in our explorations of OpenCV. In particular, we have for now abandoned blob recognition (too complex and slow for what we need) and are instead adapting an approach based on color-finding. A common method of highlighting a single color range in an image is to generate a "mask" image (1 where a pixel is in the color range, 0 otherwise) and use a bitwise AND to overlay it onto the original, leaving only the in-range pixels with a non-zero value. However, David realized that we could instead use a bitwise OR to overlay the masks onto each other, producing a black-and-white "smear" image showing everywhere the colored dot (the glove finger) has been throughout the gesture. This image could be compared directly against a set of templates stored in the Pi's memory, since image comparison is a well-explored problem in computer science (and the comparison only has to run once, at the end of the gesture, unlike a blob-finding algorithm that would have to run on every frame). Neither of us knows much about implementing image-shape comparison, however, so we spent time looking up research papers on the subject (which Kjartan also helped us with). We now have several candidate solutions: Mean Squared Error (pixel-by-pixel comparison; most internet resources consider it fast but inaccurate), the Structural Similarity Index, a machine-learning approach suggested by Alden (ideal for this purpose, but significantly beyond our level of knowledge to implement), Keypoint Matching (suggested by Kjartan), and another paper found by Kjartan (whose authors I don't remember off the top of my head). We have not implemented any of these five yet; that is our goal for next week. Notably, this approach also partially addresses our gesture-recognition and scaling difficulties (the other two unsolved problems Prof. Gonzalez gave us in our meeting), since scaling is (in theory) handled within the similarity indices and we don't need to know the real-life size of the gestures.
- Connor was also able to make significant breakthroughs on streaming video from the camera module live onto the monitor. However, the stream was in a format that proved difficult to process or manipulate, especially when it came to extracting and analyzing individual frames. Connor accessed the livestream via a Python script called as a command-line argument from a Python program he wrote, which made the stream challenging to modify since it wasn't technically recognized as an object in Python. The livestream covered the entire screen, though it could be aborted. After trying multiple methods of pulling frames from the livestream without success, Connor moved on to pulling frames from saved video files.
Week of 3/28
- Both of us made major progress this week. Connor researched a number of different methods of processing frames from the video files, including methods from picamera (a library specifically built to work with the camera module for the Raspberry Pi). He spent time attempting to solve the frame-extraction issue by screenshotting the video preview, but with no success. He also experimented with circular streams using picamera, but wasn't able to get the camera to livestream to the circular stream successfully. He eventually concluded that our best option for the upcoming project evaluation would be to record a video, access each of the video's frames using an OpenCV VideoCapture object's read method, and process the frames after the recording has completed. Connor began researching a program that would iterate through a .h264 or .mp4 video file frame by frame, allowing us to identify and collect data on the points in each image that fall within our HSV range for each color. David wrote an algorithm implementing the masking approach identified last week, which works with a great degree of accuracy on saved images (both color tests downloaded from the internet and images captured through the camera). At the moment, however, the algorithm can only open .png files (acceptable, since the camera captures stills in that format, but rather confusing).
Week of 4/4
- Connor updated the Gantt Chart, Project Objective (modified desired final product to include gesture recognition), and Project Overview (modified description of demonstration from painting program to gesture recognition) to reflect the current direction of the project. He also found code online that allows a program to step through each frame in a saved video file, though David and Connor are both still struggling to understand how it functions.
- David significantly refined his algorithm from last week: in particular, it can now access individual frames from a video file, and he spent time refining the color bounds we had previously determined. David also successfully converted the images into HSV encoding without significant loss in processing speed, and is now using bounds in that color space (much easier to work with for our application).
- David completed a program that can either record a new video or load an existing video file, and then iterate through it frame by frame, applying David's masking algorithm. The results of applying the mask to each frame are visible on screen as the program iterates through the recording, and the program generates a composite image of the masked frames. David successfully tested the program with 1) a red dot on a piece of paper and 2) blue masking tape on his finger, both to a high degree of accuracy. We plan to test the program more extensively in the coming days. During our weekly meeting, David, Connor, and Kjartan also discussed the possibility of using a least-squares approach to the gesture-matching problem. We also learned how to access the color of specific pixels in an image (we had not previously recognized that images are simply stored as numpy arrays). We will begin implementing the proposed technique soon, and hope to discuss it further during our meeting with Professor Gonzalez.
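Since OpenCV images are plain numpy arrays, reading a pixel's color really is just array indexing, as noted above; a tiny illustration:

```python
import numpy as np

# OpenCV represents an image as a numpy array indexed [row, column],
# with channels in BGR order (not RGB).
img = np.zeros((8, 8, 3), dtype=np.uint8)
img[2, 5] = (0, 0, 255)   # set one pixel to red: B=0, G=0, R=255
b, g, r = img[2, 5]       # read it back: row 2, column 5
```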
Week of 4/11
- Following our meeting with Professor Gonzalez, Connor used this week to research the Hooke-Jeeves algorithm. After learning its logic and steps, he decided that we could indeed apply the (relatively simple) Hooke-Jeeves method in our program to significantly reduce its execution time. Connor found another coder's Python implementation of Hooke-Jeeves on GitHub, downloaded the source, and ran tests. The program executed successfully, and was initially built to take the Rosenbrock function as its objective. However, it operated in R2 rather than in R5 as our application requires. Connor began modifying this version of the Hooke-Jeeves algorithm to be compatible with the latest version of our program. He experimented with moving specific functions of the Hooke-Jeeves Python program into our program file and calling them from our main method. He also created a temporary version of our overlap function, modified to take two parameters instead of five, and worked in some of the logic from the Hooke-Jeeves Python program.
- David focused on brainstorming, researching, and creating the necessary image transforms we discussed with Professor Gonzalez. First, he researched the methods to rotate the image. He was successfully able to rotate test images; however, he decided that this would be generally unnecessary because we did not want to recognize rotated versions of our gestures - for example, a horizontal and vertical line are identical when rotated by 90 degrees. This also reduces the problem's solution space to R4, making our algorithm significantly faster. Next, the methods for stretching an image and translating it were both simple to implement. However, our fitness function (initially, comparing the raw number of pixels that overlap between the image and the transformed template) requires the two images to be identical in size. Both stretching and translating images causes changes in size, necessitating another function to force them into the same size (eventually, we just cropped them both to only include their area of overlap, then compared those). By the end of the week, we had a working set of image transforms and a more complex fitness function (detailed on the project page). The only major tasks that remain are for Connor to finish the optimization algorithm and for David to refine the color bounds used in the masks.
Week of 4/18
- Connor continued working on implementing Hooke-Jeeves and adapting the algorithm to image comparison. After plugging our image-comparison function in place of the Rosenbrock function and attempting to adjust our program to fit the variable conventions of hooke-jeeves.py, we concluded that it would be more efficient to adapt the methods of the Hooke-Jeeves Python program to suit our program (rather than vice versa). David and Connor wrote our own version of the Hooke-Jeeves algorithm, which takes an image parameter, a template parameter, and a starting point (in R4). We tested this function extensively until it worked to a reasonable degree of accuracy. Initially, our algorithm worked only to a limited degree because we were unsure which starting point and delta value to pass, but through trial and error we settled on values that worked well.
- David spent the week revising Connor's Hooke-Jeeves method and implementing all of the individual methods we'd constructed into a single cohesive program.
- We also attempted to paint the gloves we ordered on Amazon, but soon realized that the fabric spray paint wasn't sticking to the rubber/mesh fingertips of the gloves. Connor bought and painted new white fabric gloves, and these allowed extremely accurate masking (including the same analysis shown on the wiki page).
- We continued to work on comparing the composite mask to the gesture templates. Hooke-Jeeves reduced the time that our program took to execute down from nearly half an hour to a few minutes. By modifying our delta (step size) and omega (total iterations per comparison), we were able to reduce our total runtime to 53 seconds while retaining accurate results.
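For reference, the core pattern-search loop of Hooke-Jeeves can be written generically in a few lines. This is a textbook-style sketch that maximizes an arbitrary objective, not the project's version (which took the image, template, and R4 starting point as parameters); `shrink` and `tol` are illustrative names:

```python
def hooke_jeeves(f, x0, delta, shrink=0.5, tol=1.0, max_iter=200):
    """Maximize f by pattern search: probe +/-delta along each axis,
    keep any improvement, and shrink delta when no probe helps."""
    x, fx = list(x0), f(x0)
    for _ in range(max_iter):
        improved = False
        for i in range(len(x)):
            for step in (delta, -delta):
                trial = list(x)
                trial[i] += step
                ft = f(trial)
                if ft > fx:
                    x, fx, improved = trial, ft, True
        if not improved:
            if delta <= tol:
                break          # converged at the current scale
            delta *= shrink    # zoom in and search more finely
    return x, fx

# Example objective with its maximum (value 0) at (3, -2):
best, val = hooke_jeeves(lambda p: -((p[0] - 3) ** 2 + (p[1] + 2) ** 2),
                         [0, 0], delta=1)
```

Because each sweep costs only a handful of fitness evaluations, this is what let the runtime drop from a brute-force half hour to under a minute once delta and the iteration budget were tuned.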
Week of 4/25
- Using the resources of the machine shop, David built a stand for the Raspberry Pi B+ that could attach to the tripod. Connor painted the final version of our glove.
- We consolidated general information about our project, the objective of our project, the challenges we faced, potential future applications of our project, and our methodology in processing and detecting user gestures into a poster.
- David continued to adjust the optimization algorithm we had constructed, because it wasn't finding global maxima reliably. (More precisely, it found local maxima fine, but the fitness function was non-ideal, with too many low-value local maxima relative to the global maximum.) He ended up calculating the area of the image that contained 95% of the white pixels in each direction, and choosing initial values that fit the transformed template into this area exactly.
- The delta used by Hooke-Jeeves was varied by a FOR loop in order to test different "scales" of change in the four parameters. Connor tested different values for this loop and settled on a range of 15 to 75 with a step size of 15. He found that this combination ran approximately 33% faster than our previous one while retaining the accuracy of our results. Together, we found that Connor's parameters led the optimization algorithm to excellent results: in our testing, the code correctly identified the gesture's shape in 39 of 40 trials.
- We tested our product in Lopata Gallery. Immediately, we found that the HSV bounds that we had successfully tested for multiple colors in the CAD Lab were not effective under the greenish lighting of Lopata Gallery. We first tested our product in front of the green wall, which tinted our video feed green and made detecting blue and red in the video nearly impossible. Recognizing this, we began testing our product in front of the blue wall with our original HSV bounds for the color red. This strategy proved to be effective, and we then began testing blue HSV ranges with the user's gesture against our white backdrop. After a few hours of testing and making adjustments, we were able to successfully mask red and blue color values in Lopata Gallery.
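The 95%-of-white-pixels initialization described in this week's notes can be sketched with numpy percentiles: take the 2.5th and 97.5th percentile of the white-pixel coordinates in each axis, which ignores stray noise pixels at the edges. The function name is hypothetical:

```python
import numpy as np

def core_box(mask, frac=0.95):
    """Smallest axis-aligned box containing the central `frac` of the
    white pixels in each direction. Used to choose initial transform
    values that fit the template over the bulk of the gesture while
    ignoring isolated noise pixels."""
    ys, xs = np.nonzero(mask)
    lo = (1.0 - frac) / 2.0 * 100          # 2.5 for frac=0.95
    hi = 100.0 - lo                        # 97.5 for frac=0.95
    x0, x1 = np.percentile(xs, [lo, hi])
    y0, y1 = np.percentile(ys, [lo, hi])
    return int(x0), int(x1), int(y0), int(y1)

# Example: a solid white block plus one stray noise pixel at the origin.
mask = np.zeros((100, 100), dtype=np.uint8)
mask[40:60, 30:70] = 1
mask[0, 0] = 1                             # outlier that should be ignored
box = core_box(mask)
```

The stray pixel at (0, 0) barely moves the box, which is exactly why this made a better starting point for the optimizer than the raw bounding box.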