Blog

Facelyzr Deep Learning Project

Back 09/07/17
B.

Introduction

    Deep learning is a phrase that follows us everywhere. Even the non-technical people are aware how artificial intelligence can change our world. I totally agree with Andrew Ng that “AI is the new electricity” and that it can make our world better. Especially that part where evil robots kill us all. Just kidding, but for those “robots” we must start from somewhere. If a robot wants to kill humans, they must have eyes to see humans. And what if a robot wants to kill men, and doesn’t wanna hurt women? In that case, our robot must recognize whether the human is male or female. Next, should our robot spare young or old people? It must know who is old and who is young, etc. Enough of jokes, let’s start with this.

    In this blog we will present our deep learning project that can locate facial attributes on an image and predict age and gender (exactly what our robot needs). The name of this project is Facelyzr, just like in my last blog where we presented Twitalyzr. I guess we are into some weird …”lyzr” stuff… What is next, a foodlyzr? Who knows…

   The idea behind Facelyzr is to take a picture with a face as input, and giving information about that face as output. For example:

Facelyzr example

One Approach

    For educational and research purposes, we used two data sets CelebA and IMDB. This project was a great opportunity for me to learn TensorFlow and to dig deeper into deep learning. One solution for this task was to use an already pre-trained model for similar tasks and just train classifiers for our task. But, I wanted to make something on my own from scratch using the idea of unsupervised pre-training. I decided to combine both unsupervised and supervised learning. I trained stacked denoising the convolutional autoencoder on images of faces, and then I attached classifiers on top of it.

Convolution Stacked Denoising Autoencoder

    With autoencoder, our network can learn what is important by itself. It is like, please don’t tell me how to do my job, just give me those images, and I’ll see what can I do. So what autoencoder does is taking an image (or any other input data), finding some internal structure and using this internal structure to reconstruct the original input.

    The building block of SDAE is autoencoder. Each layer has its own autoencoder, and during the training time we train layer after layer without propagating throughout the entire network. Basically every layer will learn to reconstruct its input. To make our network more robust, we add noise to each layer input, so our network can learn to reconstruct uncorrupted input. The first convolution layer can learn to detect edges, colors etc, while the following convolution layers will use that feature to combine them into something more complex such as some patterns, textures, etc. Deeper layers can learn to detect noses, eyes, ears, depending on what data neural network is trained for. One layer of stacked denoising autoencoder is presented in the image below.

denoising autoencoder

    After training the autoencoder, we can throw away our reconstruction part, because we don’t need it for making predictions. We only care about the first part of the network, that is the encoder. Now, the role of the encoder is feature extractor, since it creates features for every input image. On the basis of those features, we can train any classifier, regressor that will learn to estimate age for example, or gender, or both.

facelyzr architecture

Results

    Our pre-trained model worked very well on gender prediction. We achieved 98% accuracy on test set. We also trained our model to estimate age of person and that model achieved root mean square error of 8 on test set. It still need some tuning... Given these results for gender recognition, our robot killer would “accidentally” kill women in some cases, sorry ladies...

bender

Summary

    In this blog I presented the idea of a project named Facelyzr, still under development, that uses both unsupervised and supervised deep learning to make predictions of facial features on an image. This model assumes that there is a face centered on the image. I plan to extend this model with face detection and face localization on images, without using fancy libraries and already pre-trained state of the art models. After all, I plan to open source all my work on this project, so stay tuned. Many thanks to Andrew Ng and Andrej Karpathy. They are really great instructors, and because of them, I stepped into the awesome world of neural networks.

Stanko Kuveljic

Data Scientist

Software engineer fallen to the dark side of Data Science with special interest in machine learning, deep learning and big data techonologies. Always seeking to grow my knowledge and explore new approaches to problem solving