Iris

Iris is an object detection and image recognition Android application. The app was built as a month-long, self-assigned class project for the Intro to Mobile Application Development course. The goal of the project was to apply the mobile application development concepts learned over the course, along with a machine learning component as a personal goal. The application's basic UI elements and use of the camera covered the mobile application development portion of the project. The application's use of two machine learning models, one for object detection and one for image recognition, covered the machine learning aspect of the project. The models were pre-trained TensorFlow Lite convolutional neural networks optimized to run on a mobile device.

  • Time period: May 2018
  • Project Type: Class Project
  • GitHub: Project Link

Project Overview

Used TensorFlow Lite's pre-trained convolutional neural networks for on-device machine learning

The TensorFlow Lite model used in the project is a Single Shot MultiBox Detector (SSD) with a MobileNet feature extractor. The feature extraction part of the network is a collection of convolution blocks borrowed from a classification network (the original SSD used blocks from VGG; the mobile variant substitutes MobileNet). The outputs from the feature extraction section are fed into the detection section, which is made up of a sequence of detection blocks, each with its own output that contributes to the final output of the network. Feature maps are reduced in size after each block in order to capture objects at different scales. Each detection block has three branches: box generation, classification, and localization correction. The first is responsible for cropping rectangles (boxes) of various aspect ratios centered on regular grid nodes over the feature map. The second predicts a confidence score for each generated box. The third makes fine adjustments to the positions of all generated boxes. The outputs of every detection block are collected in the last layer, and all predictions undergo smart filtering (non-maximum suppression).
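
To make the final filtering step concrete, below is a minimal sketch of greedy non-maximum suppression, assuming axis-aligned boxes that each carry a single confidence score. The Detection class, method names, and threshold are illustrative, not the TensorFlow Lite model's internal implementation.

    import java.util.ArrayList;
    import java.util.Comparator;
    import java.util.List;

    // Minimal sketch of greedy non-maximum suppression (NMS).
    // The Detection class and threshold values are illustrative only.
    public class Nms {

        public static class Detection {
            final float left, top, right, bottom; // box corners
            final float score;                    // confidence score

            Detection(float left, float top, float right, float bottom, float score) {
                this.left = left; this.top = top;
                this.right = right; this.bottom = bottom;
                this.score = score;
            }
        }

        // Intersection-over-union of two boxes.
        static float iou(Detection a, Detection b) {
            float ix = Math.max(0, Math.min(a.right, b.right) - Math.max(a.left, b.left));
            float iy = Math.max(0, Math.min(a.bottom, b.bottom) - Math.max(a.top, b.top));
            float inter = ix * iy;
            float areaA = (a.right - a.left) * (a.bottom - a.top);
            float areaB = (b.right - b.left) * (b.bottom - b.top);
            return inter / (areaA + areaB - inter);
        }

        // Keep the highest-scoring box, drop any box that overlaps it too much, repeat.
        public static List<Detection> suppress(List<Detection> boxes, float iouThreshold) {
            List<Detection> sorted = new ArrayList<>(boxes);
            sorted.sort(Comparator.comparingDouble((Detection d) -> d.score).reversed());
            List<Detection> kept = new ArrayList<>();
            for (Detection candidate : sorted) {
                boolean overlapsKept = false;
                for (Detection k : kept) {
                    if (iou(candidate, k) > iouThreshold) { overlapsKept = true; break; }
                }
                if (!overlapsKept) kept.add(candidate);
            }
            return kept;
        }
    }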

Implemented multiple object detection in a live video feed.

The TensorFlow Lite neural network was trained on a collection of images of twenty objects. The file containing the model, along with the bounding box file, was used to detect objects in a live video stream. One of the major problems faced was lag in the video feed caused by passing every incoming frame through the neural network, which significantly delayed rendering the video on the mobile device. To compensate, only every Nth frame was passed through the neural network. The number N thus became a hyperparameter that could be tuned to trade off delay in the video stream against delay in object detection.
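
A minimal sketch of this frame-skipping strategy is below, assuming a camera API that delivers frames one at a time through a callback. The class name and the runObjectDetection and renderFrame hooks are hypothetical stand-ins for the app's actual camera pipeline.

    // Minimal sketch of the every-Nth-frame strategy. The detector hooks and
    // callback shape are hypothetical; any camera API that delivers frames
    // one at a time (e.g. a preview callback) would work the same way.
    public class FrameSkippingDetector {

        private final int n;        // hyperparameter: run inference every n-th frame
        private long frameCounter = 0;

        public FrameSkippingDetector(int n) {
            this.n = n;
        }

        // Called once per incoming camera frame.
        public void onFrame(byte[] frameData) {
            frameCounter++;
            if (frameCounter % n == 0) {
                // Only every n-th frame pays the cost of inference; the rest
                // are rendered immediately, keeping the preview smooth.
                runObjectDetection(frameData);
            }
            renderFrame(frameData);
        }

        private void runObjectDetection(byte[] frameData) {
            // Hypothetical: hand the frame to the TensorFlow Lite model here.
        }

        private void renderFrame(byte[] frameData) {
            // Hypothetical: draw the frame (plus the latest boxes) on screen.
        }
    }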

Implemented single image recognition in a live video feed.

The neural network used for single image recognition was trained on the much larger COCO dataset. Unlike the object detection implementation, the image recognition model was built without a bounding box file. The larger dataset allowed the neural network to recognize over 1,000 distinct objects. Predictions were displayed in the upper corner of the video feed, and only predictions above a certain confidence score were shown on the GUI.
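
A minimal sketch of that confidence filtering is below, assuming the classifier returns one probability per label in a flat array. The labels array, output layout, and threshold parameter are assumptions for illustration.

    import java.util.ArrayList;
    import java.util.List;

    // Minimal sketch of filtering classifier output by confidence before display.
    // The labels array and probability layout (one score per label) are assumptions.
    public class PredictionFilter {

        // Keep only "label: score" strings whose score clears the threshold,
        // ready to be shown in the corner of the video feed.
        public static List<String> topPredictions(float[] probabilities,
                                                  String[] labels,
                                                  float minConfidence) {
            List<String> display = new ArrayList<>();
            for (int i = 0; i < probabilities.length && i < labels.length; i++) {
                if (probabilities[i] >= minConfidence) {
                    display.add(String.format("%s: %.2f", labels[i], probabilities[i]));
                }
            }
            return display;
        }
    }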

Development was done in Android Studio using Java. TensorFlow was used to build the machine learning framework.
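
For reference, here is a minimal sketch of loading a .tflite model from the APK's assets and running one inference with the TensorFlow Lite Java Interpreter. The asset file name, tensor shapes, and class name are assumptions rather than the project's actual values.

    import android.app.Activity;
    import android.content.res.AssetFileDescriptor;
    import java.io.FileInputStream;
    import java.io.IOException;
    import java.nio.ByteBuffer;
    import java.nio.MappedByteBuffer;
    import java.nio.channels.FileChannel;
    import org.tensorflow.lite.Interpreter;

    // Minimal sketch of loading a TensorFlow Lite model shipped in assets/
    // and running one forward pass. Names and shapes are assumptions.
    public class TfLiteRunner {

        private final Interpreter interpreter;

        public TfLiteRunner(Activity activity) throws IOException {
            interpreter = new Interpreter(loadModelFile(activity, "model.tflite"));
        }

        // Memory-map the .tflite file from the APK's assets.
        private static MappedByteBuffer loadModelFile(Activity activity, String name)
                throws IOException {
            AssetFileDescriptor fd = activity.getAssets().openFd(name);
            try (FileInputStream in = new FileInputStream(fd.getFileDescriptor())) {
                FileChannel channel = in.getChannel();
                return channel.map(FileChannel.MapMode.READ_ONLY,
                        fd.getStartOffset(), fd.getDeclaredLength());
            }
        }

        // Run one forward pass: input is a preprocessed image buffer, output is
        // one confidence score per class (shapes assumed for illustration).
        public float[] classify(ByteBuffer inputImage, int numClasses) {
            float[][] output = new float[1][numClasses];
            interpreter.run(inputImage, output);
            return output[0];
        }
    }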