CSCI-B657 Research Projects -- Spring 2016

This page summarizes the 26 final projects conducted in CSCI B657, Introduction to Computer Vision, in Spring 2016 at Indiana University. Each team consisted of 1 to 3 graduate students. The course was supervised by Professor David Crandall and Associate Instructors Zhenhua Chen, Xuan Dong, Sumit Gupta, and Kai Zhen.

Traffic Sign Detection System

Adithya Tirumale, Akash Gopal and Sumit Dey

Abstract:
The main objective of our project is to design and build a computer-based system that automatically detects road signs, providing assistance to a user or machine so that appropriate action can be taken. The proposed approach extracts candidate traffic signs from an image using color information and classifies them with a convolutional neural network.
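
As a rough illustration of the color-based extraction step (our reading of the approach, not the team's code), the sketch below thresholds an image in HSV space to isolate the red hues common to many road signs and crops candidate regions that a CNN would then classify; the threshold values and minimum area are illustrative assumptions.

    import cv2
    import numpy as np

    def extract_sign_candidates(image_bgr, min_area=500):
        """Crop red-hued regions as candidate traffic signs (OpenCV 4.x)."""
        hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
        # Red wraps around the hue axis, so combine two hue ranges.
        mask = cv2.inRange(hsv, (0, 70, 50), (10, 255, 255)) | \
               cv2.inRange(hsv, (170, 70, 50), (180, 255, 255))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        crops = []
        for c in contours:
            if cv2.contourArea(c) >= min_area:
                x, y, w, h = cv2.boundingRect(c)
                crops.append(image_bgr[y:y + h, x:x + w])
        return crops  # each crop would be resized and fed to the CNN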

For more information, see: poster or final paper.

Finding Motion Patterns in Complex Videos

Nihar Khetan, Anand Sharma and Puneet Loya

Abstract:
We find motion patterns in a given video. The implemented algorithm finds not only the direction of motion in the video but also the areas of dominant motion. Our work is based on the paper by [MinHuSaad2008]. We work on instantaneous streams of video instead of long-duration videos, and we focus on complex videos in which many objects move in different directions simultaneously, assuming the videos are taken from a stationary camera. A global motion flow field is computed to detect motion across frames, and the results are used to find sinks, points that are in motion; these points are further clustered to generate super tracks. We tested the algorithm on a few real-world videos as well as a few videos we shot ourselves.
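
A minimal sketch of the global motion flow field computation, assuming OpenCV's dense Farneback optical flow as a stand-in for whatever flow estimator the team used:

    import cv2

    def global_flow_field(frame_a, frame_b):
        """Dense optical flow between two consecutive video frames."""
        ga = cv2.cvtColor(frame_a, cv2.COLOR_BGR2GRAY)
        gb = cv2.cvtColor(frame_b, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(ga, gb, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
        return flow, mag, ang  # per-pixel vectors, magnitude, direction

Pixels with large, persistent flow magnitude are candidate sink points; clustering their positions and directions (e.g., with k-means) would group them into super tracks.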

For more information, see: poster or final paper.

Speed Detection of Moving Vehicles in a Video

Bipra De, Ghanshyam Malu and Suhas Gulur Ramakrishna

Abstract:
In recent years, the number of traffic accidents has increased sharply. Speed detection is currently done in most areas using RADAR technology, which relies on the Doppler effect, but monitoring speeding vehicles manually is not feasible around the clock everywhere, so automating speed detection is the need of the hour. Given a traffic surveillance video, we aim to detect the speed of the moving vehicles.
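
To make the estimation step concrete, here is a minimal sketch assuming a fixed camera with a known ground-plane scale; the meters-per-pixel value is purely illustrative and would come from measuring a known real-world distance, such as lane-marking spacing, in the image.

    def speed_kmh(centroid_t0, centroid_t1, frames_elapsed,
                  fps=30.0, meters_per_pixel=0.05):
        """Estimate vehicle speed from tracked-centroid displacement.

        meters_per_pixel is a calibration constant obtained by measuring
        a known real-world distance in the image (illustrative here).
        """
        dx = centroid_t1[0] - centroid_t0[0]
        dy = centroid_t1[1] - centroid_t0[1]
        pixels = (dx * dx + dy * dy) ** 0.5
        meters_per_second = pixels * meters_per_pixel * fps / frames_elapsed
        return meters_per_second * 3.6

    # A car whose centroid moves 120 px in 10 frames at 30 fps:
    # speed_kmh((0, 0), (120, 0), 10)  ->  64.8 km/h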

For more information, see: poster or final paper.

Currency Reader for People with Visual Impairments

Sagar Bhandare and Sanket Mhaiskar

Abstract:
A large number of people are unfortunately affected by visual impairments; according to Huffington Post statistics, India alone is home to around 30% of the world's visually impaired population. Visual impairments drastically reduce quality of life, cause many hardships, and limit daily activities. One such activity is the management of currency. Although currencies are designed with features that help the visually impaired recognize denominations, handling cash remains somewhat difficult for them.

For more information, see: poster or final paper.

Chess Board Recognition

Raghuveer Kanchibail, Supreeth Suryaprakash and Suhas Jagadish

Abstract:
Chess board recognition locates the squares of a chess board and detects the chess pieces in an input image using image processing techniques. The board is segmented from the input image, edges are detected using Canny's edge detector, and the crossing lines are detected using a Hough transform; this yields the required 64 squares. Each square, with some vicinity around it, is extracted and examined to see whether it contains a chess piece. If it does, the test piece is scaled and oriented to match the pre-defined training set. An area score is computed as the difference between each training piece and the test piece, and the piece with the lowest score is chosen as the best match.
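
A minimal sketch of the line-detection stage using OpenCV's Canny and Hough implementations; the thresholds and angular tolerance are illustrative:

    import cv2
    import numpy as np

    def board_lines(image_bgr, tol=np.pi / 36):
        """Detect near-horizontal and near-vertical board lines."""
        gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)
        lines = cv2.HoughLines(edges, 1, np.pi / 180, threshold=200)
        horizontal, vertical = [], []
        if lines is None:
            return horizontal, vertical
        for rho, theta in lines[:, 0]:
            if theta < tol or theta > np.pi - tol:      # vertical line
                vertical.append((rho, theta))
            elif abs(theta - np.pi / 2) < tol:          # horizontal line
                horizontal.append((rho, theta))
        return horizontal, vertical
    # Intersecting the 9 strongest lines of each orientation yields the
    # corner grid that bounds the 64 squares.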

For more information, see: poster or final paper.

Logo Recognition Using Bundle Min-Hashing

Abhishek Mehra, Marshal Patel and Tejas Shah

Abstract:
The objective is to identify brand logos in a given set of images. The dataset consists of various images of objects bearing logos. We tried two approaches to this problem, one based on bag-of-words and one on bundle min-hashing. A potential application is in the marketing industry, where companies could track the visibility of their logos in images or video.
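
As a sketch of the min-hashing idea (pure Python, with illustrative hash parameters): two regions whose sets of quantized visual words overlap heavily will agree in many positions of their min-hash signatures, which makes near-duplicate logo patches cheap to find. Bundle min-hashing applies this to small spatial bundles of neighboring features rather than to whole images.

    import random

    def make_minhash(num_hashes=64, prime=2**31 - 1, seed=0):
        rng = random.Random(seed)
        coeffs = [(rng.randrange(1, prime), rng.randrange(prime))
                  for _ in range(num_hashes)]

        def signature(visual_words):
            """Min-hash signature of a set of quantized visual-word IDs."""
            return [min((a * w + b) % prime for w in visual_words)
                    for a, b in coeffs]

        return signature

    sig = make_minhash()
    # The expected fraction of agreeing slots equals the Jaccard
    # similarity of the two sets (here 3/7):
    matches = sum(x == y for x, y in zip(sig({1, 5, 9, 42, 7}),
                                         sig({1, 5, 9, 77, 88})))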

For more information, see: poster or final paper.

Classification of Cross-Sectional Scans of the Retina

Ayoub Lassoued

Abstract:
The aim of this project is to find a computer vision method for classifying scans acquired from a subject as either healthy or unhealthy. This fits into the general goal of providing valid support for human diagnosis in healthcare.

For more information, see: poster or final paper.

Storyline Reconstruction for Unordered Images

Sameedha Bairagi, Venkatesh Raizaday and Arpit Khandelwal

Abstract:
Storyline reconstruction is a relatively new topic and has not been researched extensively. The main objective is to take a stream of images as input and re-shuffle them into chronological order. The recent growth of online multimedia has generated vast amounts of unstructured data on the web: image streams are generated daily on websites like Flickr and Instagram, and almost 400 hours of video are uploaded to YouTube every day. In this paper, we implement an algorithm that exploits the temporal coherence of video to sort a stream of unordered images.
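
One simple way to exploit that temporal coherence (our illustration, not necessarily the team's algorithm) is to chain each image to its most similar unused neighbor:

    import numpy as np

    def greedy_order(features, start=0):
        """Order images so each is followed by its most similar unused one.

        features: (n, d) array of per-image descriptors (e.g., CNN
        features or color histograms); visually similar images are
        assumed to be temporally adjacent.
        """
        remaining = set(range(len(features))) - {start}
        order = [start]
        while remaining:
            last = features[order[-1]]
            nxt = min(remaining,
                      key=lambda i: np.linalg.norm(features[i] - last))
            order.append(nxt)
            remaining.remove(nxt)
        return order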

For more information, see: poster or final paper.

Localized Object Detection with Convolutional Neural Networks

Bardia Doosti and Vijay Hareesh Avula

Abstract:
Multi-object detection is an active field of computer vision whose goal is to detect all the objects in a given image. Since convolutional neural networks can classify objects with more than 90% accuracy, we can use a fine-tuned network to recognize an object. The problem with CNNs, as with other classifiers, is that they can only assign a label to the whole image; they cannot localize an object within it. In this project we implement techniques that detect the border of each object in the image, then send each region to a well-known CNN to be labeled.
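
A minimal sketch of the localize-then-classify idea with a sliding-window proposal generator; classify_region is a hypothetical stand-in for the fine-tuned CNN:

    def sliding_windows(width, height, win=224, stride=112):
        """Yield square candidate regions covering the image."""
        for y in range(0, height - win + 1, stride):
            for x in range(0, width - win + 1, stride):
                yield x, y, win, win

    def localize(image, classify_region, threshold=0.9):
        """classify_region(crop) -> (label, confidence); a placeholder
        for the fine-tuned CNN."""
        h, w = image.shape[:2]
        detections = []
        for x, y, ww, wh in sliding_windows(w, h):
            label, conf = classify_region(image[y:y + wh, x:x + ww])
            if conf >= threshold:
                detections.append((x, y, ww, wh, label, conf))
        return detections  # overlapping boxes would then be merged (NMS)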

For more information, see: poster or final paper.

Classification of Images into Specific Scene Categories

Pradeep Kumar Ravilla, Srikanth Srinivas Holavanahal and Manikandan Murugesan

Abstract:
Object recognition has been a broad area of research in computer vision, and scene classification, the problem of automatically labeling an image with one of a set of semantic categories, falls under this domain. Scene classification has a wide range of applications; for example, it can be used to check whether pictures uploaded to sites such as Yelp, Foursquare, or TripAdvisor were uploaded to the correct category. An image can be classified as a scenic lookout, hotel, restaurant, or sports facility, and this classification can be used to update the attributes of those places automatically rather than relying on manual updates by users.

For more information, see: poster or final paper.

Classification of Dogs and Cats

Rajeev Reddy and Karthik Sreenivas

Abstract:
Our final project involves classification of cats and dogs. This is a classical image classification problem in computer vision, which mainly involves identifying features that clearly distinguish the two species. We used the following methods to classify the images: bag of words, region growing, and convolutional neural networks.
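
For reference, a compact sketch of the bag-of-words pipeline, assuming grayscale images, OpenCV's SIFT, and scikit-learn; the vocabulary size is illustrative:

    import cv2
    import numpy as np
    from sklearn.cluster import KMeans

    def bow_histograms(gray_images, k=200):
        """SIFT descriptors -> k-means vocabulary -> word histograms."""
        sift = cv2.SIFT_create()
        per_image = [sift.detectAndCompute(img, None)[1]
                     for img in gray_images]
        vocab = KMeans(n_clusters=k).fit(
            np.vstack([d for d in per_image if d is not None]))
        hists = []
        for desc in per_image:
            words = (vocab.predict(desc) if desc is not None
                     else np.empty(0, dtype=int))
            hists.append(np.bincount(words, minlength=k).astype(float))
        return hists, vocab
    # The histograms of labeled cat/dog images would then train a
    # classifier, e.g. sklearn.svm.LinearSVC().fit(hists, labels).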

For more information, see: poster or final paper.

Optical Music Recognition with Neural Networks

Scott McCaulay

Abstract:
This project is an evaluation of a neural network as a method for performing Optical Music Recognition. The goal is to use a neural network alone to parse images of musical scores, with as little help as possible from other computer vision techniques. A big part of the challenge in this field is the heterogeneity of the data: there is interest in retrieving data from scanned musical scores from different historical eras, many originally handwritten, in various conditions and sizes and using inconsistent notations. This project makes no attempt to address those challenges. Since all the data used here comes from the same source, it is as though the image pre-processing stage were already completed, and done flawlessly. Given this sanitized and consistent data, it may be possible for a machine learning algorithm with no understanding of the domain to produce results at least better than random guessing; that is what this project intends to test.

For more information, see: poster or final paper.

Text Detection and Recognition from Natural Scenes using Stroke Width Transform and Deep Feature Classification

Ishtiak Zaman

Abstract:
In a natural scene, there may be multiple instances of text that an agent wants to read. In this project, we detect and recognize text characters in an image. The project has two major parts. The first is to detect and localize text characters in an image, filtering the characters out of the whole image for use in the second part; we use the Stroke Width Transform [1] to detect and localize text. In the second part, given the set of localized characters, we extract deep features of each character and classify it with a trained SVM.
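
A sketch of the second stage, assuming the deep features of each localized character crop have already been extracted (e.g., activations from a pretrained CNN's penultimate layer):

    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC

    # X: (n, d) deep features of the localized character crops;
    # y: character labels.
    def train_character_svm(X, y):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                                  random_state=0)
        clf = SVC(kernel="linear").fit(X_tr, y_tr)
        print("held-out accuracy:", clf.score(X_te, y_te))
        return clf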

For more information, see: poster or final paper.

Estimating Distance of a Person from Camera

Tousif Ahmed and Rakibul Hasan

Abstract:
Nowadays, visually impaired people can live more independently with the help of assistive devices and technology. Although computing devices like OrCam, Victor Reader Stream, CTC Scanner, SignatureGuide, and other accessible devices address many of the accessibility and mobility concerns of people with visual impairments in the physical world, their physical privacy and security concerns remain largely unaddressed. A recent study by our group reported the privacy and security needs of visually impaired people and discussed several of their privacy concerns in the physical world, including eavesdropping and shoulder surfing. In a subsequent study, our group reported several safety concerns caused by their inability to judge the surrounding environment. In this project, we explore this problem using computer vision approaches. Our research seeks to answer two questions: 1. Counting problem: can we count the number of people in an image? 2. Distance problem: can we estimate the distance of a detected person from the camera?
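
For the distance problem, a standard pinhole-camera estimate can be derived from the height of a detected person's bounding box; the focal length and average person height below are illustrative assumptions.

    def estimate_distance_m(bbox_height_px, focal_length_px=800.0,
                            person_height_m=1.7):
        """Pinhole model: distance = f * real_height / image_height.

        focal_length_px would come from camera calibration and
        person_height_m is an assumed average height; both values here
        are illustrative.
        """
        return focal_length_px * person_height_m / bbox_height_px

    # A person whose bounding box is 170 px tall with f = 800 px:
    # estimate_distance_m(170)  ->  8.0 m
    # The counting problem is then just the number of person detections.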

For more information, see: poster or final paper.

Analyzing Vibrating Objects from Video

Kotak Dhvani Deven, Tejashree Elli and Maria Soledad

Abstract:
The main goal of our project is to estimate the frequency at which an object oscillates in a silent video. Extracting such a property can lead to applications in non-invasive respiratory rate estimation, human blood flow detection, and sound extraction from vibrating objects. By modeling the movement of tracked features (e.g., corners) in a video, we can infer important behavior of a rhythmic object, e.g., the frequency of a tuning fork or the respiratory rate of a sleeping person. We started with frequency estimation of a tuning fork from a mute video, along with retrieval of its audio, and then extended the approach to respiratory rate and heart rate estimation. For this project we had to record most of the videos in our own dataset, taking advantage of the slow-motion and time-lapse features of our cellphone cameras.
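
The frequency-estimation step reduces to a Fourier analysis of a tracked feature's displacement over time; a minimal numpy sketch:

    import numpy as np

    def dominant_frequency_hz(displacement, fps):
        """Peak frequency of a tracked feature's 1-D displacement signal."""
        signal = displacement - np.mean(displacement)   # remove DC offset
        spectrum = np.abs(np.fft.rfft(signal))
        freqs = np.fft.rfftfreq(len(signal), d=1.0 / fps)
        return freqs[np.argmax(spectrum[1:]) + 1]       # skip the 0 Hz bin

    # A synthetic 4 Hz oscillation sampled at 240 fps (slow motion):
    t = np.arange(0, 2, 1.0 / 240)
    print(dominant_frequency_hz(np.sin(2 * np.pi * 4 * t), 240))   # 4.0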

For more information, see: poster or final paper.

Offline Mathematical Expression Recognizer

Yisu Peng and Yang Zhang

Abstract:
LaTeX is widely used in academia, but writing the code for formulas and mathematical expressions takes a great deal of time, and people often do not know how to write special mathematical symbols and must search for them. It would therefore be very useful to have a tool that translates an image of a mathematical formula into LaTeX code. We decompose the problem into several subproblems. The first step is to preprocess the raw input image so that it can be handled better in later steps. The second is to segment the image so that all pixels belonging to one symbol are grouped into a set. The next is to determine the location and size of each symbol from the segmentation. Then a classifier recognizes each symbol. Finally, a parser should incorporate both the location and classification information to produce the final result; we have not started on the parser because our classification accuracy is not yet good enough.
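
A minimal sketch of the segmentation step using connected components in OpenCV; the minimum area is an illustrative noise filter:

    import cv2

    def segment_symbols(image_gray, min_area=20):
        """Binarize a formula image and split it into per-symbol boxes."""
        _, binary = cv2.threshold(image_gray, 0, 255,
                                  cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
        n, _, stats, _ = cv2.connectedComponentsWithStats(binary)
        boxes = []
        for i in range(1, n):                   # label 0 is the background
            x, y, w, h, area = stats[i]
            if area >= min_area:
                boxes.append((x, y, w, h))
        return sorted(boxes)                    # left to right
    # Multi-part symbols ("=", "i") need a merging heuristic, which is
    # one reason the later parsing stage is nontrivial.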

For more information, see: poster or final paper.

Liveness Detection Using Face Detection

Prakash Rajagopal and Vishesh Tanksale

Abstract:
Our project is concerned with face recognition systems. There are multiple ways to attack a face recognition system; an attacker can obtain a picture of a victim and use it to gain wrongful access to a system protected by face recognition biometrics. We have tried to develop a system that detects such photograph-based spoofing attacks using liveness detection, implemented with two different methods that are detailed in later sections. Among the various approaches to liveness detection for face recognition, ours is based on facial variation: specifically, we capture motion in the eye region to determine liveness. The eye region is used because it exhibits a lot of variation in shape and size over short periods, attributable to blinking, continuous eye movement, and changes in pupil size.
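
A minimal sketch of the idea, assuming the eye region has already been cropped by a face/eye detector; the threshold is illustrative:

    import cv2
    import numpy as np

    def eye_region_variation(prev_eye, curr_eye, blink_threshold=15.0):
        """Mean absolute intensity change in the eye region between frames.

        A static photograph produces near-zero variation; live eyes blink
        and move, producing periodic spikes above the threshold.
        """
        diff = cv2.absdiff(cv2.cvtColor(prev_eye, cv2.COLOR_BGR2GRAY),
                           cv2.cvtColor(curr_eye, cv2.COLOR_BGR2GRAY))
        score = float(np.mean(diff))
        return score, score > blink_threshold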

For more information, see: poster or final paper.

FEBEI - Face Expression Based Emoticon Identification

Nethra Chandrasekaran Sashikar and Prashanth Kumar Murali

Abstract:
Face Expression Based Emoticon Identification (FEBEI) is an open-source extension to the tracking.js framework that converts a human facial expression to the best-matching emoticon. The contribution of this project is a robust classifier that identifies facial expressions in real time without any reliance on an external server or computation node. An entirely client-side JavaScript implementation has clear privacy benefits and avoids the lag inherent in uploading and downloading images. We accomplished this with several computationally efficient methods: tracking.js provides a Viola-Jones face detector, which we use to pass facial images to our own eigenemotion detection system, trained to distinguish happy from angry faces. We also implemented a similar eigenface classifier in Python and trained a convolutional neural network (CNN) to classify emotions, to provide a comparative view of its advantages. We aim to make FEBEI easily extensible so that the developer community can add classifiers for more emoticons.
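
A sketch of the eigenemotion idea in Python, with scikit-learn's PCA standing in for a hand-rolled eigendecomposition; the component count is illustrative:

    import numpy as np
    from sklearn.decomposition import PCA

    def train_eigen_emotions(faces, labels, n_components=20):
        """faces: (n, h*w) flattened grayscale crops; labels: e.g.
        0 = happy, 1 = angry."""
        labels = np.asarray(labels)
        pca = PCA(n_components=n_components).fit(faces)
        proj = pca.transform(faces)
        centroids = {c: proj[labels == c].mean(axis=0)
                     for c in np.unique(labels)}

        def predict(face):
            """Nearest class centroid in eigenface space."""
            p = pca.transform(face.reshape(1, -1))[0]
            return min(centroids,
                       key=lambda c: np.linalg.norm(p - centroids[c]))
        return predict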

For more information, see: poster or final paper.

Holistically-Nested Edge Detection

Mingze Xu and Hanfei Mei

Abstract:
Holistically-Nested Edge Detection (HED) is a novel edge detection method based on fully convolutional neural networks that performs very well on natural scenes. The method makes two important improvements: (1) image-to-image training and prediction, and (2) the use of multi-scale and multi-level deep learning architectures. Inspired by this, we re-implement HED and try to solve an Optical Character Recognition (OCR) problem by matching edge maps between templates and input images.
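
A sketch of the matching step, with OpenCV's Canny standing in for the learned edge maps (in the actual project these would come from the re-implemented HED network); the score threshold is illustrative:

    import cv2

    def match_edge_template(image_gray, template_gray, threshold=0.5):
        """Slide a character template's edge map over an image's edge map."""
        img_edges = cv2.Canny(image_gray, 50, 150)
        tpl_edges = cv2.Canny(template_gray, 50, 150)
        scores = cv2.matchTemplate(img_edges, tpl_edges,
                                   cv2.TM_CCOEFF_NORMED)
        _, best, _, best_loc = cv2.minMaxLoc(scores)
        return (best_loc, best) if best >= threshold else (None, best)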

For more information, see: poster or final paper.

Obstacle Detection through Forward Movement

Qiuwei Shou and Alan Wu

Abstract:
Our project is inspired by applications of autonomous robotics on embedded platforms. We envision an unmanned aerial vehicle navigating through a dense environment, such as a forest, for a search-and-rescue operation or an area survey; the ability to detect and avoid obstacles is paramount to a successful mission. Our approach is motivated by Mori and Scherer's paper published at the 2013 ICCV conference, which uses the relative size of SURF features to detect approaching objects. Their algorithm uses a monocular camera, which offers a smaller and more affordable payload than a stereo pair of cameras. Relative size is one of several monocular approaches that the paper's authors outline in their related work section, and it is briefly summarized here.
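
The core relative-size cue can be sketched by matching keypoints across consecutive frames and comparing their scales; SIFT stands in for SURF here, since SURF requires opencv-contrib (frames assumed grayscale):

    import cv2

    def expansion_ratios(frame_prev, frame_curr):
        """Scale ratios of keypoints matched across consecutive frames."""
        sift = cv2.SIFT_create()
        kp1, des1 = sift.detectAndCompute(frame_prev, None)
        kp2, des2 = sift.detectAndCompute(frame_curr, None)
        matcher = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)
        return [kp2[m.trainIdx].size / kp1[m.queryIdx].size
                for m in matcher.match(des1, des2)
                if kp1[m.queryIdx].size > 0]
    # Ratios consistently above 1 across frames indicate an object that
    # is growing in the image, i.e., approaching the camera.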

For more information, see: poster or final paper.

Generating Chinese Captions for Flickr30K Images

Hao Peng and Nianhen Li

Abstract:
We trained a multimodal recurrent neural network on the Flickr30K dataset with Chinese sentences. The RNN model is from Karpathy and Fei-Fei, 2015 [6]. Because Chinese sentences have no spaces between words, we applied the model to the Flickr30K dataset in two settings. In the first, we tokenized each Chinese sentence into a list of words and fed them to the RNN; in the second, we split each sentence into a list of characters and fed them to the same model. We compared the BLEU scores achieved by the two methods with those reported in [6], and found that the model trained at the character level outperforms the word-level one and performs very close to the model trained on English captions in [6]. This suggests that the RNN model works comparably well for image captioning across different languages.
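
The two settings differ only in how a caption is split before being fed to the RNN; character-level tokenization needs no segmenter, while word-level tokenization does (jieba below is our assumption, as the paper does not name a tokenizer):

    # Character-level tokenization needs no segmenter:
    sentence = "一只狗在草地上奔跑"          # "a dog runs on the grass"
    chars = list(sentence)
    # ['一', '只', '狗', '在', '草', '地', '上', '奔', '跑']

    # Word-level tokenization needs a Chinese word segmenter; jieba is
    # our assumption here:
    import jieba
    words = list(jieba.cut(sentence))
    # e.g. ['一只', '狗', '在', '草地', '上', '奔跑']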

For more information, see: poster or final paper.

Age Estimation from Frontal Images of Faces

Rohit Nair, Srivatsan Iyer and Shashant Devadiga

Abstract:
It is fairly easy for a human to look at a face and approximately estimate the person's age; for a computer to do so automatically is quite a challenge, requiring techniques from areas such as feature detection, machine learning, and anthropometrics. This project report summarizes the results of using anthropometric models with varying parameters.
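
Anthropometric models work from ratios of distances between facial landmarks, which change systematically during craniofacial growth; a minimal sketch with hypothetical landmark coordinates:

    import math

    def ratio(p1, p2, p3, p4):
        """Distance ratio |p1p2| / |p3p4| between facial landmarks."""
        d = lambda a, b: math.hypot(a[0] - b[0], a[1] - b[1])
        return d(p1, p2) / d(p3, p4)

    # Hypothetical (x, y) positions from a facial landmark detector;
    # ratios such as eyes-to-nose over eyes-to-mouth vary with age:
    eye_mid, nose, mouth = (150, 100), (150, 140), (150, 175)
    print(ratio(eye_mid, nose, eye_mid, mouth))   # 40/75 ~ 0.53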

For more information, see: poster or final paper.

Seeing the Unseen – Lie Detection using Human Emotions

Achyut Sarma Boggaram, Debasis Dwivedy and Furu Zhang

Abstract:
The goal is to offer an alternative to conventional contact-based lie detection by using contactless video analysis. This approach offers several advantages: (i) no court order is required; (ii) it can be applied to real-time as well as pre-recorded videos; and (iii) it can be applied without the knowledge of the observed person. The setup incorporates the "Eulerian Video Magnification" approach published by MIT CSAIL, which can be used for motion and color amplification of video data. We visualize amplified temporal color variations and low-amplitude motion with this technique, without the need for image segmentation or optical flow computation.
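
At the core of Eulerian Video Magnification is a temporal bandpass filter applied per pixel and then amplified; a simplified single-channel sketch with scipy (the full method also uses a spatial pyramid, omitted here, and the band edges below target typical pulse rates):

    import numpy as np
    from scipy.signal import butter, filtfilt

    def magnify_color(frames, fps, low=0.8, high=3.0, alpha=50.0):
        """Temporal bandpass + amplification, the core idea of EVM.

        frames: (t, h, w) float array holding one channel of a
        spatially downsampled video; 0.8-3.0 Hz spans typical human
        pulse rates.
        """
        b, a = butter(2, [low / (fps / 2), high / (fps / 2)], btype="band")
        filtered = filtfilt(b, a, frames, axis=0)   # per-pixel time filter
        return frames + alpha * filtered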

For more information, see: poster or final paper.

Pharmaceutical Pill Recognition Using Computer Vision Techniques

Charlene Tay and Mridul Birla

Abstract:
The National Library of Medicine (NLM) recently put out a "Pill Image Recognition Challenge," seeking algorithms and software to match images of prescription oral solid-dose pharmaceutical medications (pills, including capsules and tablets). For our final project in B657 Computer Vision, we decided to take up this challenge. Such algorithms and software can be applied in many important ways, especially in helping medical care professionals and their patients deal with unidentified or mislabeled prescription pills. Senior citizens are especially affected: nine out of ten US citizens over the age of 65 who take more than one prescription pill are prone to misidentifying their pills, and taking the wrong pill can result in adverse drug reactions that harm health or even cause death. By making it easy to identify and verify prescription pills, such errors can be greatly reduced. Another useful application of pill recognition is aiding law enforcement in identifying counterfeit or illicit drug pills.

For more information, see: poster or final paper.

Parking Lot Classification

Aniket Gaikwad, Bhavik Shah and Cyril Shelke

Abstract:
Finding a vacant space in the parking lots of large metropolitan areas can frequently become exhausting. Apart from being stressful, this task usually consumes considerable time and money, and it contributes to environmental pollution through CO2 emissions. Though there are a number of solutions based on different technologies, we advocate the use of image/video processing. We use a subset of the PKLot dataset [1], which contains 100,000 images of parking spaces segmented out of 12,000 images of parking lots under different weather conditions. Using random sampling, we divided the dataset into 70 percent for training and 30 percent for testing and accuracy calculations.
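
The evaluation protocol is straightforward to sketch, assuming per-space feature vectors (e.g., HOG) have been computed; the classifier choice below is our illustration, not necessarily the team's:

    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    # X: (n, d) feature vectors of segmented parking-space patches;
    # y: 1 = occupied, 0 = empty.
    def evaluate_split(X, y):
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, train_size=0.7, test_size=0.3, random_state=0)
        clf = LinearSVC().fit(X_tr, y_tr)
        print("test accuracy:", clf.score(X_te, y_te))
        return clf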

For more information, see: poster or final paper.

Scene Graph Generation from Images

Satoshi Tsutsui and Manish Kumar

Abstract:
Image understanding by computers is advancing rapidly these days due to the phenomenal success of deep learning, but much work remains before computers reach human-level perception. Image classification (sometimes with localization) is one of the standard tasks, but it is far from full image understanding. Other tasks such as image caption generation and visual question answering have also reached a practical level of quality, but they too fall short of complete image understanding: caption generation, which produces a summary sentence for an image, cannot fully describe rich scenery, and visual question answering handles simple questions but cannot answer questions that require complex reasoning. To fully understand an image, we need to know what objects are in it, what the characteristics of those objects are, and how the objects interact with each other. In this project, we try two approaches to answering these questions. In the first, we use automatically generated captions as an intermediate structure. In the second, we assume that objects with regions are given (or already detected by previous work) and focus on two questions: what are the characteristics (attributes) of the objects, and what are the relations between these objects?
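
Concretely, the target output is a scene graph: objects with attributes plus relation triples. A minimal illustration of that structure (our example, not the team's data format):

    # Scene graph for "a brown dog chases a ball on the green grass":
    scene_graph = {
        "objects": {
            "dog":   {"attributes": ["brown"]},
            "ball":  {"attributes": []},
            "grass": {"attributes": ["green"]},
        },
        "relations": [
            ("dog", "chases", "ball"),    # (subject, predicate, object)
            ("dog", "on", "grass"),
        ],
    }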

For more information, see: poster or final paper.