CSCI-B657 Research Projects -- Spring 2019

This page summarizes the final projects conducted in CSCI B657, Introduction to Computer Vision, in Spring 2019 at Indiana University. Each team consisted of 1 to 4 graduate students. The course was supervised by Professor David Crandall and Associate Instructors Eriya Terada, Violet Xiang, and Shyam Narasimhan.

Image super-resolution using generative adversarial network

Adithya Chowdary Boppana, Satyaraja Dasara, Siva Charan Mangavalli

Abstract:
Deep learning based approaches such as SR-CNN and SR-ResNet have shown good results for super-resolution. However, the commonly used evaluation metric, PSNR, is not a very reliable estimate of perceptual quality, and those architectures may be suboptimal for the super-resolution task because they are better suited to classification and object detection. This project explores a deep learning approach based on the Generative Adversarial Network architecture that uses a perceptual loss function for single-image super-resolution. It is an implementation of the research paper "Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network".
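The core of the SRGAN formulation is a perceptual loss that combines a VGG-based content term with an adversarial term. Below is a minimal sketch of such a loss under assumed layer choice and weighting; it is not the team's actual code.

```python
# Hedged sketch of an SRGAN-style perceptual loss (content + adversarial terms);
# the VGG layer cutoff and the adversarial weight are assumptions.
import torch
import torch.nn as nn
from torchvision.models import vgg19

class PerceptualLoss(nn.Module):
    def __init__(self, adv_weight=1e-3):
        super().__init__()
        # Frozen VGG19 feature extractor up to a deep conv layer
        # (ImageNet normalization of inputs is assumed/omitted here)
        self.vgg = vgg19(pretrained=True).features[:36].eval()
        for p in self.vgg.parameters():
            p.requires_grad = False
        self.mse = nn.MSELoss()
        self.bce = nn.BCEWithLogitsLoss()
        self.adv_weight = adv_weight

    def forward(self, sr, hr, disc_logits_on_sr):
        # Content loss: MSE between VGG feature maps of super-resolved and real HR images
        content = self.mse(self.vgg(sr), self.vgg(hr))
        # Adversarial loss: the generator tries to make the discriminator output "real"
        adversarial = self.bce(disc_logits_on_sr, torch.ones_like(disc_logits_on_sr))
        return content + self.adv_weight * adversarial
```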

For more information, see: poster or final paper.

Optical Music Recognition

Michael Guo, Daozhen Lu, Anirudh Jhina

Abstract:
Optical character recognition is a popular subject area whose applications appear in many places, such as language translation apps that let you take a picture of some text and translate it for you. With the wide array of applications that use optical character recognition, the area seems mostly solved. Beyond recognizing text, however, optical music recognition still poses quite a challenge, as it has received far less attention due to a seemingly smaller demand for such applications, and the complexity of how music is written and structured makes the problem difficult. In this project we attempt to recognize notes from an image of sheet music and transform the recognized notes into a MIDI file that can play back the music. We take this opportunity to apply the computer vision knowledge we have obtained so far to explore the intricacies of sheet music and to discover limitations of, as well as solutions to, optical music recognition.
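As a rough illustration of the playback step, recognized notes could be written to a MIDI file with a library such as mido; the (pitch, duration) encoding below is an assumption, not the team's actual pipeline.

```python
# Hypothetical final step: write recognized notes to a MIDI file with mido.
# `notes` is an assumed list of (MIDI pitch, duration in ticks) pairs.
import mido

def notes_to_midi(notes, path="output.mid"):
    mid = mido.MidiFile()
    track = mido.MidiTrack()
    mid.tracks.append(track)
    for pitch, duration in notes:
        track.append(mido.Message("note_on", note=pitch, velocity=64, time=0))
        track.append(mido.Message("note_off", note=pitch, velocity=64, time=duration))
    mid.save(path)

# Example: middle C, D, E as quarter notes (480 ticks at mido's default resolution)
notes_to_midi([(60, 480), (62, 480), (64, 480)])
```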

For more information, see: poster or final paper.

Semantic and Instance Segmentation

Abhilash Kuhikar, Raunak Vijan, Saurabh Mathur, and Shivam Rastogi

Abstract:
Robots are being used in a variety of environments, from outer space to the deep sea. Vision-based navigation controls a robot's movement by analyzing image frames from the robot's camera, so precise image understanding is necessary for vision-based autonomous robot navigation. Our work deals with semantic segmentation of images, labeling each pixel in the image as one of many classes. We trained and tested the popular SegNet architecture for semantic segmentation using various loss functions on the CamVid dataset (11 classes). Through our work we show that, in the same setting and environment, the soft Dice loss function gives better results for the semantic segmentation task. We also show that small changes in architecture, such as Bayesian SegNet, together with some post-processing methods, achieve better accuracy. We implemented Bayesian SegNet and CRF smoothing to further improve the segmentation masks predicted by the network, and we also experimented on video by applying segmentation to the image frames extracted from it.
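For reference, the soft Dice loss mentioned above directly optimizes the overlap between the predicted and ground-truth masks. A minimal PyTorch-style sketch follows; the team's exact formulation may differ.

```python
# Minimal soft Dice loss sketch for multi-class segmentation (assumed form).
import torch

def soft_dice_loss(probs, targets_onehot, eps=1e-6):
    # probs: (N, C, H, W) softmax outputs; targets_onehot: one-hot labels, same shape
    dims = (0, 2, 3)
    intersection = (probs * targets_onehot).sum(dims)
    cardinality = probs.sum(dims) + targets_onehot.sum(dims)
    dice_per_class = (2.0 * intersection + eps) / (cardinality + eps)
    # Loss is 1 minus the mean Dice coefficient over classes
    return 1.0 - dice_per_class.mean()
```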

For more information, see: poster or final paper.

Pebble detection from 2D image

Ashok Kuppuraj (akuppuraj)

Abstract:
Motivated by the ability of modern algorithms to detect objects and patterns, I wanted to work on a less-researched topic: detecting the type of stones in a given set of images. Detecting human faces and handwritten characters is conventional, and much research continues on those topics, yet very little work has been done in the wide area of stone-type detection. The objective is to embed intelligence into our day-to-day activities. The idea of detecting stone type came about when I ordered a specific type of gravel and was delivered a similar one with a different pattern, so I adopted this issue as my problem statement.

For more information, see: poster or final paper.

Eye in the Sky: Remote Sensing of Solar Panels

Anthony Duer, Johny Rufus, Santosh Kangane, and Subhojit Som

Abstract:
Over the past 5 years, solar generation at residential and commercial sites has nearly tripled, from 11 Terawatt-Hours (TWh) to 30 TWh (EIA, 2019). This surge in self-generation has caused utilities to face situations they had not previously planned for, such as areas with negative load, increased evening ramps as the sun sets, and transmission reliability issues (Roberts, 2016). In order to combat these issues, utilities must be able to track where solar installations are located and their capacity. Knowing these values allows the utility to plan transmission and distribution upgrades accordingly and to know where load may be shifting on their system. Our study attempts to tackle this issue directly by using satellite data. This data has many advantages, including low acquisition costs compared to site visits or surveys, the potential to be updated quickly, and the ability to cover large geographical areas. We developed two models to detect and then segment solar panels using labeled data from Duke University's Energy Data Analytics Lab (Bradbury, 2018). Our results confirm the conclusion of other studies that detection and segmentation are feasible at a large scale, and we extend their conclusions by comparing the performance of multiple pretrained models on our dataset.

For more information, see: poster or final paper.

Day2Night Image-to-Image Translation

R. Dharia, N. Raichura, A. Sagar, A. Tai

Abstract:
We investigate three computer vision techniques for converting scenes from daytime to nighttime. The first approach automates a Photoshop method, adjusting the brightness, contrast, and image curve layer of the daytime image and then adding noise and modifying the sky to produce the final nighttime image. The other two approaches use existing deep learning models (Cycle-GAN and Conditional Adversarial Networks) to map input images to output images. Both models follow the principles of the Generative Adversarial Network architecture, with augmented loss functions, to solve the daytime-to-nighttime translation problem. We evaluate and compare the performance of the three techniques through visual, perceptual comparison. Our experiments show that although the Photoshop technique generates good customized images, the deep learning approaches produce better results on general images drawn from contexts similar to the training dataset. Furthermore, we observe that Cycle-GAN is slower to train but generalizes better to unpaired data, while the Conditional Adversarial Network performs better on paired image instances.
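For illustration, the "Photoshop-style" pipeline can be approximated programmatically. The sketch below applies a brightness/contrast reduction, a blue tint, and additive noise with purely illustrative parameters; it omits the curve-layer and sky modifications described above.

```python
# Rough sketch of a day-to-night photometric adjustment (parameters are illustrative).
import cv2
import numpy as np

def day_to_night(img_bgr, brightness=-80, contrast=0.6, noise_sigma=5.0):
    out = img_bgr.astype(np.float32)
    out = out * contrast + brightness                      # darken and flatten contrast
    out[..., 0] *= 1.15                                    # boost blue channel for a night tint
    out[..., 2] *= 0.85                                    # suppress red channel
    out += np.random.normal(0, noise_sigma, out.shape)     # add sensor-like noise
    return np.clip(out, 0, 255).astype(np.uint8)

# night = day_to_night(cv2.imread("day_scene.jpg"))
```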

For more information, see: poster or final paper.

One Shot Object Segmentation

Jagpreet Singh Chawla, Amit Kumar Yadav, Bivas Maiti

Abstract:
Semantic segmentation is a classic vision problem with many applications, including but not limited to robotics, autonomous driving, and virtual reality. Unlike detection, segmentation provides a more precise object boundary, which is especially useful in robotics. Semantic segmentation is a highly researched topic, and most proposed methods require a large amount of data to train accurate models. While obtaining image data is not very difficult thanks to the internet, obtaining a large amount of annotated data requires a lot of resources. Therefore, in this project we focus on the problem of one-shot object segmentation, i.e., segmenting an object using only one annotated example per class. We implemented approaches from two papers, compared and evaluated them on the RGB-D Object Dataset, and improved the results by incorporating depth information using a Markov random field (MRF). We also discuss the limitations of these models as seen on this dataset.

For more information, see: poster or final paper.

CASP: Classifying Art Styles in Painting

Chris Farris

Abstract:
Paintings are classified into groups based on visual traits as well as historical context and semantic information. This paper applies convolutional neural networks to art data in an attempt to create a computer vision system that can classify images of art into 21 different styles. We expand the work of previous authors to include new network architectures. Limited to basic processing resources, we were able to replicate the results of previous authors with a less complex architecture, and we propose a new system which we will continue to explore.

For more information, see: poster or final paper.

Identification of Online Chess Board Configurations

Can Kockan

Abstract:
One of the standard ways to represent the current state of a chess board is the Forsyth-Edwards Notation (FEN), which communicates the position and type of each chess piece that is still in the game. In this project, my aim was to explore different computer vision techniques to automatically generate the FEN from an online image of a chess board for some ongoing game. However, due to time constraints, I limited the scope of this project to identifying individual pieces on a single given chess board image, rather than generating the entire FEN for a full dataset. Correctly classifying chess pieces turned out to be more difficult than expected due to a number of factors: how the pieces are represented by their 2D graphical models, similarly shaped pieces such as the Pawn, Bishop, and Queen generating similar features, and the variety of textures available in online chess systems. With the advent of Deep Convolutional Neural Networks (DCNNs), achieving very high accuracy is possible nowadays despite all of these issues, but in this work I explore whether other vision techniques can be applied with some degree of success to the same problem, and whether the results shed light on why DCNNs fail in the few cases where they do. I argue that using image contours as input to a Hausdorff distance calculation, along with a random sampling strategy, can yield decent results and help explain those failure cases.
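For reference, the FEN string for the standard starting position is rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR w KQkq - 0 1. The contour-plus-Hausdorff idea can be sketched as follows; the preprocessing and matching details below are assumptions, not the project's exact implementation.

```python
# Illustrative sketch of contour-based piece matching with the Hausdorff distance.
import cv2
import numpy as np
from scipy.spatial.distance import directed_hausdorff

def contour_points(gray):
    # Extract the largest external contour of a piece image as an (N, 2) point set
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    return max(contours, key=cv2.contourArea).reshape(-1, 2)

def hausdorff(a, b):
    # Symmetric Hausdorff distance between two contour point sets
    return max(directed_hausdorff(a, b)[0], directed_hausdorff(b, a)[0])

# A square crop would be classified as the template piece with the smallest distance, e.g.:
# label = min(templates, key=lambda t: hausdorff(contour_points(crop), contour_points(templates[t])))
```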

For more information, see: poster or final paper.

Application of Deep Learning Based Hybrid Approach to Image Bounding Box Object Detection and Classification

Joe Kelly (kellyjo), Prince Frank Butler (pfbutler), Cheng Wang (cw17), and Yafei Wang (wangyafe)

Abstract:
Object detection has long been a challenge in computer vision. This paper pits a Histogram of Oriented Gradients (HOG) detection approach against a Mask Region-based Convolutional Neural Network (Mask R-CNN) approach. The Common Objects in Context (COCO) challenge provides a large set of annotated images that serve as the basis for this project and help determine which approach is most appropriate for object detection. Our results indicate that Mask R-CNN, although not perfect, performs better than our implementation of HOG-based object detection.
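As an illustration of the HOG side of the comparison, each detection window can be described by a HOG feature vector and scored by a linear classifier; the cell and block parameters below are assumptions, not the team's exact settings.

```python
# Minimal sketch of HOG features feeding a linear classifier for window scoring.
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_features(window_gray):
    # window_gray: a fixed-size grayscale detection window (e.g., 128x128)
    return hog(window_gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm="L2-Hys")

# clf = LinearSVC().fit([hog_features(w) for w in train_windows], train_labels)
# At test time, sliding windows over the image are scored with clf.decision_function.
```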

For more information, see: poster or final paper.

DCUIIF: Detection and Classification of User-Implemented Image Filters

Derrick Eckardt

Abstract:
Image sharing has grown to previously unimaginable proportions, with photo-sharing applications such as Instagram and Facebook as the largest drivers. These apps let users easily apply photo filters, now an industry standard, to enhance their images. With this volume of images and the ubiquitous use of filters comes the challenge of determining whether an image has been altered. Filters are the simplest form of image alteration. While detecting whether someone's #nofilter image was truly filter-free would be a useful parlor trick, a method to detect filter usage is also important for detecting images modified to spread disinformation or for other nefarious uses. Early efforts indicate that basic blurring, sharpening, and tinting filters can be detected with standard machine learning techniques with surprisingly high accuracy. Future efforts should focus on increasingly difficult datasets, feature engineering, and more complex filters.
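One way to obtain labeled training data for such a classifier is to apply the filters synthetically; the sketch below shows assumed blur, sharpen, and tint operations, not necessarily the exact filters used in this project.

```python
# Sketch of generating labeled (image, filter) training pairs with PIL.
from PIL import Image, ImageFilter, ImageEnhance

def apply_filter(img, kind):
    # img: an RGB PIL image; `kind` is the label for the resulting training example
    if kind == "blur":
        return img.filter(ImageFilter.GaussianBlur(radius=2))
    if kind == "sharpen":
        return img.filter(ImageFilter.SHARPEN)
    if kind == "tint":
        # Warm tint: boost saturation and blend with an orange overlay
        warm = Image.new("RGB", img.size, (255, 160, 60))
        return Image.blend(ImageEnhance.Color(img).enhance(1.3), warm, alpha=0.2)
    return img  # "none": unfiltered image

# Each (apply_filter(img, kind), kind) pair becomes a labeled example for a standard classifier.
```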

For more information, see: poster or final paper.

Automated Detection and Counting of Pharmaceutical Tablets using Faster R-CNN

David Antoszyk

Abstract:
Pharmaceutical tablets are ubiquitous in our society, but not every individual is able to successfully interact with them. This project explores the use of R-CNNs to count pills in an image. To accomplish this goal, we re-train TensorFlow's object detection model (faster_rcnn_resnet101_coco) on two publicly available pill image datasets, test the counts using photos of grouped pills, and record the percentage of correct counts under a variety of conditions. Our results suggest that this implementation and our training datasets do not represent a reliable solution to automated tablet counting.
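Conceptually, the counting step reduces to thresholding the detector's confidence scores; a hypothetical sketch, in which the threshold value and output format are assumptions:

```python
# Hypothetical counting step: the tablet count is the number of detections whose
# confidence score clears a chosen threshold.
import numpy as np

def count_tablets(detection_scores, threshold=0.5):
    # detection_scores: per-detection confidence values returned by the detector
    return int(np.sum(np.asarray(detection_scores) >= threshold))

# count = count_tablets(output_dict["detection_scores"])  # assumed output format
```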

For more information, see: poster or final paper.

Image Super Resolution using Generative Adversarial Network

Darshan Shinde, Virendra Wali, Pei-Yi Cheng

Abstract:
Super-resolution is one of the hot research topics in computer vision. We reimplemented the technique first proposed by Adrian Bulat et al. (2018), which comprises two GAN (Generative Adversarial Network) models, named High-to-Low and Low-to-High. The High-to-Low model generates a paired dataset, whereas the Low-to-High model super-resolves the image. The results show that the GAN model successfully super-resolves face images, with some exceptions. We aim to validate the authors' claims by testing the model on another dataset. In addition, we explore the feasibility of using only the Low-to-High GAN model, with the loss function proposed by the authors, on an artificially generated paired dataset produced by bilinear downsampling.
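The artificial paired dataset mentioned at the end can be produced by simple bilinear downsampling of the high-resolution images; a minimal sketch, with an assumed scale factor of 4:

```python
# Sketch of generating LR/HR training pairs via bilinear downsampling, as an
# alternative to the learned High-to-Low degradation (scale factor assumed).
import torch.nn.functional as F

def make_pair(hr, scale=4):
    # hr: (N, C, H, W) batch of high-resolution face crops in [0, 1]
    lr = F.interpolate(hr, scale_factor=1.0 / scale, mode="bilinear", align_corners=False)
    return lr, hr
```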

For more information, see: poster or final paper.

Automatic cone detection with a convolutional neural network-based method

Hae Won Jung and Rui Li

Abstract:
Binary identification of whether or not a retinal neuron is present in high-resolution retinal imaging has relied on manual marking by expert graders, who identify each visible neuron. Convolutional neural networks could be used to automate this labor- and time-intensive task. In this project we re-implement convolutional neural network-based retinal neuron detection on confocal images, following previously published work by Cunefare et al.

For more information, see: poster or final paper.

Locate Handwritten Text Block from Scratch Paper

Kaiyuan Liu

Abstract:
Image recognition has achieved impressive success in recent years; however, existing note-recording programs rarely detect a complete text block and typically return only fragments. We therefore propose a novel application, based on EAST, that locates text blocks in photos of scratch paper. Our application does not perform OCR, but we make it robust at locating text blocks with different angles, text sizes, and orientations.
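For context, a pretrained EAST model can be run through OpenCV's DNN module as in the widely used OpenCV example; the model file name and input size below follow that example and are not necessarily this project's setup.

```python
# Hedged sketch of running a pretrained EAST text detector with OpenCV.
import cv2

net = cv2.dnn.readNet("frozen_east_text_detection.pb")   # assumed local model file
image = cv2.imread("scratch_paper.jpg")
# Input dimensions must be multiples of 32 for EAST
blob = cv2.dnn.blobFromImage(image, 1.0, (640, 640),
                             (123.68, 116.78, 103.94), swapRB=True, crop=False)
net.setInput(blob)
# EAST outputs a score map and a geometry map, which are then decoded into rotated boxes
scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                "feature_fusion/concat_3"])
```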

For more information, see: poster or final paper.

AutopilotNet: An Ensemble of ConvNets for adding perception capabilities in self-driving cars

Piyush Vyas, Shivam Thakur, Sanket Patole, Manjeet Pandey

Abstract:
In order for self-driving cars to maneuver safely on the roads, they should not only be able to perceive what surrounds them but also make important driving decisions based on what they perceive. This is a difficult robotics and vision challenge for self-driving cars that humans solve well. However, with recent advancements in deep learning such as Convolutional Neural Networks (CNNs), vision systems are now, in some tasks, able to achieve human-level capabilities. Hence, using the power of CNNs, we try to build a unified system that will add perception capabilities to self-driving cars to help them maneuver safely on the roads.

For more information, see: poster or final paper.

Who am I: face recognition mobile application

Shashank Lalit Khedikar

Abstract:
Who Am I? is a face recognition cell-phone app (Android/iPhone). The app calls a pre-trained model through a web service, or uses a model saved locally for offline mode. The pre-trained model essentially contains every SICE faculty and staff member's name, some simple attributes such as job title, and one or more pre-annotated photos of each person. When the user launches the app and pushes the button, the application retrieves the details of the person in front of the camera, such as name, job title, and gender. If the app does not recognize the person's face from the list, it shows "Other" and provides a mechanism to annotate the unrecognized person; this information can later be used to re-train the model.

For more information, see: poster or final paper.

Binary Segmentation of Sexually Explicit Images

Vincent Malic

Abstract:
Though digital pornographic imagery is widely available, easily accessed, and intrinsically visual, little computer vision research has been conducted on sexually explicit material beyond not-safe-for-work classifiers. This is due both to the taboo nature of the content and to the lack of labeled datasets. Because of the difficulty of obtaining annotations for sexually explicit imagery, computer vision research on pornography must rely on unsupervised methods to obtain useful embeddings for downstream tasks. However, such methods are often unsuited to the specificity of the concept of sexual explicitness, which is present only in parts of sexually explicit images. To remedy this issue, we present a binary segmenter that is capable of isolating the sexually explicit portions of images for downstream tasks. To achieve this, we create the first manually annotated dataset of sexually explicit images.

For more information, see: poster or final paper.

Fine-Grained Image Classification using Small Datasets

William Ollo

Abstract:
Image classification has been a popular field of study in computer vision for several years. Much research has focused on building deep, specialized networks trained on a large number of images. This paper focuses instead on building small networks that can be trained with as few images as possible, in order to reduce training time as much as possible and widen the range of uses for convolutional neural networks. After training a CNN on only low-information pixel data I achieved decent results; with more fine-tuning and better image data, better results are achievable.
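The kind of small network described above might look like the following Keras sketch; the layer sizes, 32x32 input, and class count are assumptions rather than the author's actual configuration.

```python
# Minimal small-CNN sketch for training on low-resolution pixel data.
import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),  # assumed number of classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(x_small, y_small, epochs=20) on a handful of images per class
```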

For more information, see: poster or final paper.

CatCentric Activity Recognition

Ziwei Zhao

Abstract:
Cats are cute, fluffy, and cuddly animals. However, their behavior still remains a mystery to humans. In this project, we propose a deep learning-based network structure to recognize cat activities in video, especially from the cat's own (egocentric) viewpoint.

For more information, see: poster or final paper.