I am pleased to present the latest results from our road lane detection project, and I am excited to see where it can go.
The RoadAR system performs road lane detection and recognition. RoadAR assists the driver by providing aggregated traffic information using inexpensive hardware. Our current approach is based on image processing using a monocular camera system. Below is a demonstration of the lane detector.
This list includes a selection of papers from this year's Computer Vision and Pattern Recognition conference. The conference features a large number of papers, many of them excellent.
- Face Alignment at 3000 FPS via Regressing Local Binary Features
- Socially-aware Large-scale Crowd Forecasting
- Human Action Recognition by Representing 3D Skeletons as Points in a Lie Group
- Real-time Hand Tracking on Depth Images
- Make3D: Learning 3D Scene Structure from a Single Still Image
- Deep Learning with Hierarchical Convolutional Factor Analysis
Here is a link to the whole programme: http://pamitc.org/cvpr14/program.php
This past month I've looked at object recognition. Initially I was interested in people detection, which led me to a particular feature extraction method called Histogram of Oriented Gradients (HOG), but it then became clear that the HOG descriptor can find general shapes and is robust to varying lighting and object orientation.
I use HOG descriptors and Support Vector Machines (SVMs) to detect people in standard datasets. The image below shows the result of processing one sample frame.
Here is an idea from a friend: use an accelerometer to rotate the desktop automatically.
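The core of the idea can be sketched as a small mapping from the accelerometer's gravity vector to a screen orientation (the device-reading API is left out and hypothetical; only the mapping is shown): pick the screen axis that gravity is most aligned with.

```python
# Gravity-vector to screen-rotation sketch (device API omitted; this only
# shows the orientation decision, not how readings are obtained).
def orientation_from_gravity(ax, ay):
    """Map an accelerometer reading (device at rest) to a desktop rotation.

    ax, ay: acceleration along the screen's x and y axes, in g.
    """
    if abs(ax) > abs(ay):
        return "right" if ax > 0 else "left"
    return "normal" if ay > 0 else "inverted"

orientation_from_gravity(0.0, 1.0)   # device upright
orientation_from_gravity(-1.0, 0.1)  # tilted onto its left side
```

On a Linux desktop the chosen value could then be applied with something like `xrandr -o left`; a real implementation would also debounce readings so the screen does not flip on every small tilt.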
After concluding my work with face detection (see below), I naturally started to research speech recognition, with the goal of creating a single-word recognition system for the home.
For a long time I struggled with the Hidden Markov Models (HMMs) used in many recognition systems today. However, I recently realized that HMMs are probably overkill for the single-word recognition system I was aiming for, so I instead implemented a technique called Dynamic Time Warping (DTW), which so far works well.
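DTW aligns two sequences of different lengths by finding the minimum-cost warping path between them, which is what makes it suitable for comparing spoken words uttered at different speeds. A compact sketch (NumPy; the Euclidean distance and the scalar test sequences stand in for the post's actual feature vectors):

```python
# Dynamic Time Warping sketch: dynamic-programming alignment cost
# between two sequences of feature vectors.
import numpy as np

def dtw_distance(a, b):
    """Minimum-cost alignment between two sequences of feature vectors."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(np.asarray(a[i - 1]) - np.asarray(b[j - 1]))
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]

# Identical sequences align with zero cost, and so does a time-stretched
# copy, since DTW may repeat an element of one sequence.
dtw_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])       # 0.0
dtw_distance([1.0, 2.0, 3.0], [1.0, 1.0, 2.0, 3.0])  # 0.0
```

For word recognition, an input utterance would be compared against one stored template per vocabulary word, and the template with the smallest DTW distance wins.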
Below is a plot of the various stages in converting the speech input (top left) to feature vectors (bottom right). Feature extraction is performed using mel-cepstral coefficients, which capture vocal characteristics from the log-power spectrum of the speech signal.
The figure shows the recorded speech (top left), which is filtered to extract the interesting samples (bottom left). The filtered data is used in the spectral analysis to compute the mel-cepstral coefficients (top right), which are used as weights in the inverse transformation back to the time domain (bottom right).
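The spectral-analysis stage can be sketched as follows: take the log-power spectrum of a windowed frame, pool it through a triangular mel filterbank, and apply a DCT to obtain the cepstral coefficients. The filterbank size, frame length, and sample rate below are illustrative assumptions, not the post's actual parameters.

```python
# Mel-cepstral feature extraction sketch (NumPy only):
# log-power spectrum -> mel filterbank -> DCT (cepstrum).
import numpy as np

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters spaced evenly on the mel scale."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    inv_mel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    points = inv_mel(np.linspace(mel(0.0), mel(sample_rate / 2.0),
                                 n_filters + 2))
    bins = np.floor((n_fft + 1) * points / sample_rate).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fb

def mel_cepstrum(frame, sample_rate=8000, n_filters=20, n_ceps=12):
    """Mel-cepstral coefficients of one windowed speech frame."""
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    energies = mel_filterbank(n_filters, n_fft, sample_rate) @ spectrum
    log_e = np.log(energies + 1e-10)
    # DCT-II of the log filterbank energies gives the cepstrum.
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * n + 1) / (2 * n_filters)))
    return dct @ log_e

frame = np.sin(2 * np.pi * 440 * np.arange(256) / 8000)  # 440 Hz test tone
coeffs = mel_cepstrum(frame)
```

One such coefficient vector per frame, stacked over time, forms the feature sequence that DTW compares.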
If you have never tried OpenCV, I would recommend it for any type of image processing. Earlier this year I modified the Haar detection sample script to produce some cool effects. This summer I have implemented my own face recognition software based on the eigenfaces technique. This software is part of a larger project that I will put some time into this fall. The picture below shows a sample application I built around the recognition software. It extracts portraits from a camera feed and lets the user select and save a portrait, which is immediately loaded into the recognition software.
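The eigenfaces technique can be sketched in a few lines of NumPy: run PCA on the flattened training images, project every face into the resulting subspace, and recognize a new face by nearest neighbour among the projections. The image size and training data below are synthetic stand-ins for an actual portrait database.

```python
# Eigenfaces sketch: PCA on flattened face images, recognition by
# nearest neighbour in the projected (eigenface) subspace.
import numpy as np

def train_eigenfaces(faces, n_components=4):
    """faces: (n_samples, n_pixels) array of flattened, equally sized images."""
    mean = faces.mean(axis=0)
    centered = faces - mean
    # SVD of the centered data yields the principal components (eigenfaces).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    projections = centered @ components.T
    return mean, components, projections

def recognise(face, mean, components, projections):
    """Return the index of the closest training face in eigenface space."""
    p = (face - mean) @ components.T
    return int(np.argmin(np.linalg.norm(projections - p, axis=1)))

# Tiny synthetic example: two "identities" with small per-image noise.
rng = np.random.default_rng(0)
base = rng.random((2, 64))  # two 8x8 "faces", flattened
train = np.vstack([base + 0.01 * rng.standard_normal((2, 64))
                   for _ in range(3)])  # identity 0 at even row indices
mean, comps, projs = train_eigenfaces(train, n_components=3)
probe = base[0] + 0.01 * rng.standard_normal(64)
match = recognise(probe, mean, comps, projs)
```

A production version would also threshold the reconstruction error so unknown faces are rejected instead of being forced onto the nearest training image.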
The library works well, but there are some immediate improvements I will have to make before I continue to build on this code.