Age and Gender Classification using Deep Convolutional Neural Networks

In the last few posts we mostly talked about binary image descriptors and the previous post in this line of works described our very own LATCH descriptor [1] and presented an evaluation of various binary and floating point image descriptors. In the current post we will shift our attention to the field of Deep Learning and present our work on Age and Gender classification from face image using Deep Convolutional Neural Networks [2].

Example images from the AdienceFaces benchmark

Example images from the AdienceFaces benchmark

Our method was presented in the following paper:

Gil Levi and Tal Hassner, Age and Gender Classification using Convolutional Neural Networks, IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG), at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015.

For code, models and examples, please see our project page.

New! Tensor-Flow implementation of our method .

Acknowledgements

The presented work was developed and co-authored with my thesis supervisor, Prof. Tal Hassner.

 

Introduction

Though age and gender classification plays a key role in social interactions, performance of automatic facial age and gender classification systems is far from satisfactory. This is in contrast to the super-human performance in the related task of face recognition reported in recent works [3,4].

Previous approaches for age and gender classification were based on measuring differences and relations between facial dimensions [5] or on hand-crafted facial descriptors[6,7,8]. Most have designed classification schemes tailored specifically for age or gender estimation, for example [9] and others. Few of the past methods have considered challenging in-the-wild images [6] and most did not leverage the recent rise in availability and scale of image datasets in order to improve classification performance.

Motivated by the tremendous progress made in face recognition research by the use of deep learning techniques[10] , we propose a similar approach for age and gender classification. To this end, we train deep convolutional neural networks[11] with a rather simple architecture due to the limited amount of training data available for those tasks.

We test our method on the challenging recently proposed AdienceFaces benchmark[6] and show it to outperform previous methods by a substantial margin. The AdienceFaces benchmarks depicts in-the-wild setting. Example images from this collection are presented in the figure above.

 

Method

Currently, databases of in-the-wild face images which contain age and gender labels are relatively small in size compared to other popular image classification datasets (for example, the Imagenet dataset[12] and the CASIA WebFace dataset [13]). Overfitting is a common problem when training complex learning models on a limited dataset, therefore we take special care in preventing overfitting in our method. This is done by choosing a relatively “modest” architecture, incorporating two drop-out layers and augmenting the images with random crops and flips in the training phase.

 

Network Architecture

The same network architecture is used for both age and gender classification. The proposed network comprises of only three convolutional layers and two fully-connected layers with a small number of neurons. This architecture is relatively shallow, compared to the much larger architectures applied, for example, in [14] and [15]. A schematic illustration of the network is below:

Illustration of our CNN architecture

Illustration of our CNN architecture

The network contains three convolutional layers, each followed by a ReLU operation and a pooling layer. The first two layers also follow an LRN layer [14]. The first Convolutional Layer contains 96 filters of 7×7 pixels, the second Convolutional Layer contains 256 filters of 5×5 pixels, The third and final Convolutional Layer contains 384 filters of 3 × 3 pixels. Finally, two fully-connected layers are added, each containing 512 neurons and each followed by a ReLU operation and a dropout layer.

 

Experiments

We tested our method on the recently proposed AdienceFaces [6] benchmark for age and gender classification.  The AdienceFaces benchmark contains automatically uploaded Flickr images. As the images were automatically uploaded without prior filtering, they depict challenging in-the-wild settings and vary in facial expression, head pose, occlusions, lighting conditions, image quality etc.  Moreover, some of the images are of very low quality or contain extreme motion blur. The figure above (first figure in the post) illustrates example images from the AdienceFaces collection. Below is a breakdown of the dataset into the different age and gender classes.

0-2 4-6 8-13 15-20 25-32 38-43 48-53 60+ Total
Male 745 928 934 734 2308 1294 392 442 8192
Female 682 1234 1360 919 2589 1056 433 427 9411
Both 1427 2162 2294 1653 4897 2350 825 869 19487

     

Results

We experimented with two methods of classification:

  • Center Crop: Feeding the network with the face image cropped to 227 × 227 around the face center.
  • Over-sampling: We extract five 227 × 227 pixel crop regions, four from the corners of the 256 × 256 face image and one from the center of the face along with their horizontal flips. All 10 crops are fed to the network and the final classification is the average of the predictions of the 10 crops.

The tables below summarizes our results compared to previously proposed methods.  We measure mean accuracy + standard variation, 1-off in age classification means the age prediction was either correct or 1-off from the correct age class:

Gender:

Method Accuracy
Best from [6] 77.8 ± 1.3
Best from [16] 79.3 ± 0.0
Proposed using single crop 85.9 ± 1.4
Proposed using over-sampling 86.8 ± 1.4

 

Age:

Method Exact 1-off
Best from [6] 45.1 ± 2.6 79.5 ±1.4
Proposed using single crop 49.5 ± 4.4 84.6 ± 1.7
Proposed using over-sampling 50.7 ± 5.1 84.7 ± 2.2

 

Evidently, the proposed network, though it’s simplicity, outperforms previous methods by a substantial margin. We further present misclassification results for our method, both for age and gender classification.
Gender misclassifications: Top row: Female subjects mistakenly classified as males. Bottom row: Male subjects mistakenly classified as females:

Gender misclassifications

Gender misclassifications

Age misclassifications: Top row: Older subjects mistakenly classified as younger. Bottom row: Younger subjects mistakenly classified as older.

Age misclassifications

Age misclassifications

As can be seen from the misclassification examples, most mistakes are due to blur, low image resolution or occlusions. Furthermore, in gender, most of the misclassifications are in babies or in young children where facial gender attributes are not clearly visible.

 

Microsoft how-old.net tool

A few months ago, there was a bit hype about Microsoft’s new how-old.net webpage that allow users to upload their images and then it tries to automatically determined their age and gender.

We thought it would be interesting to try and compare MS’s methods with ours and measure their accuracy. To this end, we automatically uploaded all of the AdienceFaces images to the how-old.net page and listed the results. We only got their age estimation result and only in case where MS’s page managed to detect a face in the image (if it the image was too hard for face detection, it would probably fail completely on the much more challenging task of age classification).

MS’s how-old.net site reached an accuracy of about 40%. As listed in the tables above, our network reached 50.7% with over-sampling and 49.5% using single-crop. Below are some examples of images which the MS tool misclassified, but our method classified correctly.  

 

 

Conclusion

We have presented a novel method for age and gender classification in the wild based on deep convolutional neural networks. Taking into account the relatively small amount of training data, we devised a relatively shallow network and took special care to avoid over-fitting (using data augmentation and dropout layers).

We measured our performance on the AdienceFaces benchmark[6] and showed that the proposed approach outperforms previous methods by a large margin. Moreover, we compared our method against Microsoft’s how-old.net webpage.

For paper, code and more details, please see our project page

 

 

References

[1] Gil Levi and Tal Hassner, LATCH: Learned Arrangements of Three Patch Codes, IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Placid, NY, USA, March, 2016

[2] Gil Levi and Tal Hassner, Age and Gender Classification using Convolutional Neural Networks, IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG), at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015.

[3] Sun, Yi, Xiaogang Wang, and Xiaoou Tang. “Deep learning face representation from predicting 10,000 classes.” Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.‏

[4] Schroff, Florian, Dmitry Kalenichenko, and James Philbin. “Facenet: A unified embedding for face recognition and clustering.” arXiv preprint arXiv:1503.03832 (2015).‏

[5] Kwon, Young Ho, and Niels Da Vitoria Lobo. “Age classification from facial images.” Computer Vision and Pattern Recognition, 1994. Proceedings CVPR’94., 1994 IEEE Computer Society Conference on. IEEE, 1994.‏

[6] Eidinger, Eran, Roee Enbar, and Tal Hassner. “Age and gender estimation of unfiltered faces.” Information Forensics and Security, IEEE Transactions on 9.12 (2014): 2170-2179.‏

[7] Gao, Feng, and Haizhou Ai. “Face age classification on consumer images with gabor feature and fuzzy lda method.” Advances in biometrics. Springer Berlin Heidelberg, 2009. 132-141.‏

[8] Liu, Chengjun, and Harry Wechsler. “Gabor feature based classification using the enhanced fisher linear discriminant model for face recognition.” Image processing, IEEE Transactions on 11.4 (2002): 467-476.‏

[9] Chao, Wei-Lun, Jun-Zuo Liu, and Jian-Jiun Ding. “Facial age estimation based on label-sensitive learning and age-oriented regression.” Pattern Recognition 46.3 (2013): 628-641.‏

[10] Taigman, Yaniv, et al. “Deepface: Closing the gap to human-level performance in face verification.” Computer Vision and Pattern Recognition (CVPR), 2014 IEEE Conference on. IEEE, 2014.‏

[11] LeCun, Yann, et al. “Backpropagation applied to handwritten zip code recognition.” Neural computation 1.4 (1989): 541-551.‏

[12] Russakovsky, Olga, et al. “Imagenet large scale visual recognition challenge.” International Journal of Computer Vision (2014): 1-42.‏

[13] Yi, Dong, et al. “Learning face representation from scratch.” arXiv preprint arXiv:1411.7923 (2014).‏

[14] Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. “Imagenet classification with deep convolutional neural networks.” Advances in neural information processing systems. 2012.‏

[15] Chatfield, Ken, et al. “Return of the devil in the details: Delving deep into convolutional nets.” arXiv preprint arXiv:1405.3531 (2014).‏
[16] Hassner, Tal, et al. “Effective face frontalization in unconstrained images.” arXiv preprint arXiv:1411.7964 (2014).‏

17 thoughts on “Age and Gender Classification using Deep Convolutional Neural Networks

  1. rogernazir

    Hey i am working on my Final year project and trying to make a application which can tell the Gender,Age,Mood by Face. I trained Haar Cascade for Gender Classification and giving me 69% result on it by image.Now I want to implement on Neural Network(Deep Learning) using Caffe find Gender,Age,Mood can you guide me how i can train my Face Data or about implementation or so this is the best approach for implementing this problem.

    Reply
    1. gillevicv Post author

      Hi,

      Thank you for your interest in my blog.

      Answering what is the “best” approach would be difficult and it depends on the size of the dataset, it’s variability and the amount of effort you are willing to spend.

      Having said that, you might want to take a look at the repository I created which contains the scripts used for training our age and gender models:

      https://github.com/GilLevi/AgeGenderDeepLearning

      I hope it can help you get started (probably after reading some Caffe tutorial).

      Best,
      Gil

      Reply
  2. Valery

    Hi, I am trying to use your trained model on face images that I collected. How to modify your python code in order to process images with several faces?

    Reply
    1. gillevicv Post author

      Hi Valery,

      Thank you for your interest in my project.

      You would need to run a face detection algorithm on the images and feed every detected face to the network.

      Best,
      Gil

      Reply
      1. Valery Golender

        Thanks, it is what I am doing. Will be happy to report you our results if you are interested.

  3. sid5000

    Any suggestion on making age/gender classification run with opencv’s new dnn module (contrib) / caffe wrapper ?

    Reply
  4. Ashim

    I was trying to train your model using more data. But I am not able to extract faces from the images. x, y, dx, dy in fold_*_data.txt files doesn’t actually give face bounding box? Can you please help me understand what is x, y, dx and dy in these files

    Reply
  5. Pingback: Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns | Gil's CV blog

  6. jmargieh

    Hi Gil,
    I trained the model using a subset of the training data.
    I also ran the eval.py in order to evaluate the model on the created run-id.

    can you please tell what does precision @1 / precision @2 mean?
    And where the gender_test.txt test data is being used.

    Thanks in advance,
    Jawad

    Reply
  7. Boriana Milenova

    Hi Gil,
    Very interesting work! I saw the models available at the Caffe Zoo. Is there a license associated with them? If not would you consider releasing them under Creative Common or MIT license?
    Thanks in advance,
    Boriana

    Reply

Leave a reply to gillevicv Cancel reply