Tag Archives: Computer Vision

Title to image search for improved thumbnail selection

Introduction

One of the key creative aspects of an advertisement is choosing the image that will appear alongside the advertisement text. The advertiser’s aim is to select an image that will draw the attention of users and get them to click on it, while remaining relevant to the advertisement text (naturally, an image of a cute puppy next to an advertisement about insurance doesn’t make much sense). Below are some examples of advertisements appearing in Taboola’s “Promoted Links” box. Notice that each advertisement contains both a title and an appealing image.

Examples of advertisements placed in Taboola’s “Promoted Links” box. Notice that each advertisement contains both a title and an appealing image.

Given an advertisement title (for example “15 healthy dishes you must try”), the advertiser has endless possibilities when choosing the image thumbnail to accompany it, some clearly more clickable than others. One can apply best practices in choosing the thumbnail, but manually searching for the best image (out of possibly thousands that fit a given title) is time consuming and impractical. Moreover, there is no clear way of quantifying how much a given image is related to a title and, more importantly, how clickable the image is compared to other options.

To alleviate this problem, we developed a text-to-image search algorithm that, given a proposed title, scans an image gallery to find the most suitable images and estimates their expected click through rate (a common marketing metric measuring the number of user clicks per fixed number of advertisement displays).
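To make the idea concrete, here is a minimal sketch of the ranking step. It assumes (this is not spelled out in the excerpt) that titles and images have already been embedded into a shared vector space and that some model predicting CTR from an image embedding is available; both are illustrative assumptions, not the post’s actual method:

```python
import numpy as np

def rank_gallery(title_vec, image_vecs, ctr_model):
    """Rank gallery images for a title by cosine similarity in a
    shared text/image embedding space, then attach a predicted CTR.

    title_vec (d,), image_vecs (n, d) and the ctr_model callable are
    all assumptions for the sake of illustration.
    """
    t = title_vec / np.linalg.norm(title_vec)
    v = image_vecs / np.linalg.norm(image_vecs, axis=1, keepdims=True)
    sims = v @ t                   # cosine similarity per gallery image
    order = np.argsort(-sims)      # most similar images first
    return [(int(i), float(sims[i]), float(ctr_model(image_vecs[i])))
            for i in order]
```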

For example, here are the images returned by our algorithm for the query title “15 healthy dishes you must try” along with their predicted Click Through Rate (CTR):

Examples of images returned by our algorithm for the title query “15 healthy dishes you must try” along with their predicted Click Through Rate (CTR). Notice that the images nicely fit the semantics of the title.

Continue reading


Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns

In the last post we talked about age and gender classification from face images using deep convolutional neural networks. In this post we will show a similar approach for emotion recognition from face images, which also makes use of a novel image representation based on mapping Local Binary Patterns to a 3D space suitable for fine-tuning deep convolutional neural networks [8]:


Local Binary Patterns (LBP) Mapping
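For readers unfamiliar with LBP, here is a minimal sketch of computing the basic 8-neighbor LBP codes that the mapped representation starts from; the mapping to a 3D metric space itself is described in the paper and is not reproduced here:

```python
import numpy as np

def lbp_codes(gray):
    """Compute basic 8-neighbor Local Binary Pattern codes.

    gray: 2D uint8 image array. Returns the LBP code of every
    interior pixel (the one-pixel border is skipped for simplicity).
    """
    g = gray.astype(np.int32)
    center = g[1:-1, 1:-1]
    # The 8 neighbors, ordered clockwise starting at the top-left.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for bit, (dy, dx) in enumerate(offsets):
        neighbor = g[1 + dy : g.shape[0] - 1 + dy,
                     1 + dx : g.shape[1] - 1 + dx]
        # Set bit `bit` wherever the neighbor is at least as bright
        # as the center pixel.
        codes |= (neighbor >= center).astype(np.int32) << bit
    return codes
```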

Our method was presented in the following paper:

Gil Levi and Tal Hassner, Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns, Proc. ACM International Conference on Multimodal Interaction (ICMI), Seattle, Nov. 2015

For code, models and examples, please see our project page.

Continue reading

Deep Learning 101 talk at DevCon 2016

At the recent DevCon conference I had the pleasure of giving an introductory talk on Deep Learning. A short theoretical overview is given, followed by a technical deep dive on how to train deep networks, with a few demos, practical examples and tips.


The notebook used in the demo is available here and the various deep networks and definition files used to run the demo are available here.

Age and Gender Classification using Deep Convolutional Neural Networks

In the last few posts we mostly talked about binary image descriptors; the previous post in this line of work described our very own LATCH descriptor [1] and presented an evaluation of various binary and floating point image descriptors. In the current post we will shift our attention to the field of Deep Learning and present our work on age and gender classification from face images using deep convolutional neural networks [2].

Example images from the AdienceFaces benchmark
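As a rough illustration of the kind of network involved, here is a small PyTorch sketch in the spirit of the paper’s design (three convolutional layers followed by two fully connected layers). Treat it as an assumption-laden approximation; the exact layers, normalization and training details are in the paper and on the project page:

```python
import torch
import torch.nn as nn

class AgeGenderNet(nn.Module):
    """A small CNN loosely following the paper's three-conv,
    two-FC design. Illustrative only, not the exact network."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=7, stride=4), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, 512), nn.ReLU(), nn.Dropout(0.5),
            nn.Linear(512, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Gender is a two-way classification; age is treated as
# classification into age groups rather than regression.
net = AgeGenderNet(num_classes=2)
logits = net(torch.randn(1, 3, 227, 227))  # one 227x227 RGB face crop
```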

Our method was presented in the following paper:

Gil Levi and Tal Hassner, Age and Gender Classification using Convolutional Neural Networks, IEEE Workshop on Analysis and Modeling of Faces and Gestures (AMFG), at the IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), Boston, June 2015.

For code, models and examples, please see our project page.

New! TensorFlow implementation of our method.

Continue reading

Adding rotation invariance to the BRIEF descriptor

In this post I will explain how to add a simple rotation invariance mechanism to the BRIEF[1] descriptor. I will present evaluation results showing that the rotation invariant BRIEF significantly outperforms regular BRIEF when geometric changes are present, and finally I will post a C++ implementation integrated into OpenCV3.

Just as a reminder, we had a general post on local image descriptors, an introductory post to binary descriptors and a post presenting the BRIEF descriptor. We also had posts on other binary descriptors: ORB[2], BRISK[3] and FREAK[4].

We’ll start with a visual example, displaying the correct matches between a pair of images of the same scene taken from different angles: once with the original version of BRIEF (first image pair) and once with the proposed rotation invariant version of BRIEF (second image pair):

Correct matches when using the BRIEF descriptor

Correct matches when using the rotation invariant BRIEF descriptor

It can be seen that there are many more correct matches when using the proposed rotation invariant version of the BRIEF descriptor.
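Before the details, the gist of such a mechanism can be sketched: estimate an orientation for each patch and rotate BRIEF’s sampling-point pairs by it before performing the intensity comparisons. The sketch below uses the intensity-centroid orientation measure popularized by ORB; the orientation measure in the post itself may differ:

```python
import numpy as np

def orientation(patch):
    """Estimate patch orientation (degrees) via the intensity
    centroid, as done in ORB."""
    h, w = patch.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m00 = float(patch.sum())
    cx = (xs * patch).sum() / m00 - (w - 1) / 2.0
    cy = (ys * patch).sum() / m00 - (h - 1) / 2.0
    return np.rad2deg(np.arctan2(cy, cx))

def steer_pairs(pairs, angle_deg):
    """Rotate BRIEF sampling-point pairs (offsets from the patch
    center, shape (N, 2, 2)) by the estimated orientation, so the
    same pixel pairs are compared regardless of patch rotation."""
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return pairs @ rot.T
```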

Continue reading

An Easy and Practical Guide to 3D Reconstruction

I recently came across a simple and easy package that can be used to create 3D reconstructions of objects. I wanted to share it and give an easy and practical explanation of how one can create visually appealing 3D models by running a few simple commands, no coding needed. I must emphasize that, to keep it simple, this post will not focus on theory as the last few posts on binary descriptors did, but will instead give an easy and practical guide to 3D reconstruction. Just to give you a taste of what can be done with the package, here’s an example of a 3D reconstruction I made (yes, that’s me in there):

Here you can see an original image vs. a screenshot of the 3D model:

3D reconstruction example

Continue reading

A tutorial on binary descriptors – part 5 – The FREAK descriptor

This is our fifth post in the series about binary descriptors, and here we will talk about the FREAK[4] descriptor. This is the last descriptor that we’ll cover, as the next and final post in the series will give a performance evaluation of the different binary descriptors. Just a reminder: we had an introduction to binary descriptors and posts about BRIEF[5], ORB[3] and BRISK[2].

FREAK descriptor

Continue reading

A tutorial on binary descriptors – part 4 – The BRISK descriptor

This is the fourth post in our series about binary descriptors, and it will talk about the BRISK descriptor [1]. We had an introduction to patch descriptors, an introduction to binary descriptors and posts about the BRIEF [2] and ORB [3] descriptors.

We’ll start with the following figure, which shows an example of using BRISK to match real world images with a viewpoint change. Green lines are valid matches, red circles are detected keypoints.

BRISK descriptor – example of matching points using BRISK
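For readers who want to reproduce this kind of figure, here is a minimal OpenCV sketch of matching BRISK descriptors between two images using the Hamming distance (the file names are hypothetical):

```python
import cv2

# Hypothetical file names for two views of the same scene.
img1 = cv2.imread("view_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view_b.jpg", cv2.IMREAD_GRAYSCALE)

brisk = cv2.BRISK_create()
kp1, des1 = brisk.detectAndCompute(img1, None)
kp2, des2 = brisk.detectAndCompute(img2, None)

# BRISK descriptors are binary, so they are compared with the
# Hamming distance; cross-checking keeps only mutual best matches.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Draw the 50 best matches, similar in spirit to the figure above.
vis = cv2.drawMatches(img1, kp1, img2, kp2, matches[:50], None)
cv2.imwrite("brisk_matches.jpg", vis)
```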

Continue reading

Bag of Words Models for visual categorization

Lately, I’ve been reading a lot about BOW (Bag of Words) models [1] and I thought it would be nice to write a short post on the subject. The post is based on the slides by Li Fei-Fei from the ICCV 2005 course on object detection.

As the name implies, the concept of BOW is actually taken from text analysis. The idea is to represent a document as a “bag” of its important keywords, without any ordering of the words (that’s why it’s called a “bag” of words, rather than, say, a list).
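As a tiny illustration of the text-side idea, a document reduces to nothing more than word counts:

```python
from collections import Counter

doc = "a bag of words keeps word counts but forgets word order"
bag = Counter(doc.split())
# Counter({'word': 2, 'a': 1, 'bag': 1, ...}) -- all ordering is
# gone; only the multiset of words remains.
```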

Illustration of Bag of words model in documents

In computer vision, the idea is similar. We represent an object as a bag of visual words, i.e. patches that are described by a certain descriptor:

Illustration of Bag of words model in images
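Putting the pipeline together, here is a minimal sketch of building a visual vocabulary and per-image histograms. ORB descriptors and plain k-means are simplifying assumptions (k-means with Euclidean distance is a common but crude choice for binary descriptors), and the image paths are hypothetical:

```python
import cv2
import numpy as np
from sklearn.cluster import KMeans

paths = ["img0.jpg", "img1.jpg", "img2.jpg"]  # hypothetical images

# 1. Extract local descriptors from every image (ORB, for simplicity;
#    assumes each image yields at least a few keypoints).
orb = cv2.ORB_create()
per_image = []
for p in paths:
    img = cv2.imread(p, cv2.IMREAD_GRAYSCALE)
    _, desc = orb.detectAndCompute(img, None)
    per_image.append(desc.astype(np.float32))

# 2. Build the visual vocabulary by clustering all descriptors;
#    each cluster center is one "visual word".
k = 50
kmeans = KMeans(n_clusters=k, n_init=10).fit(np.vstack(per_image))

# 3. Represent each image as a normalized histogram over the k words.
def bow_histogram(desc):
    words = kmeans.predict(desc)
    return np.bincount(words, minlength=k) / len(words)

histograms = [bow_histogram(d) for d in per_image]
```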

Continue reading

A Short introduction to descriptors

Since the next few posts will talk about binary descriptors, I thought it would be a good idea to post a short introduction to the subject of patch descriptors. The following post will discuss the motivation for patch descriptors and their common usage, and will highlight descriptors based on the Histogram of Oriented Gradients (HOG).

I think the best way to start is to consider one application of patch descriptors and to explain the common pipeline in their usage. Consider, for example, the application of image alignment: we would like to align two images of the same scene taken from slightly different viewpoints. One way of doing so is by applying the following steps (a code sketch follows below):

  1. Compute distinctive keypoints in both images (for example, corners).

  2. Compare the keypoints between the two images to find matches.

  3. Use the matches to find a general mapping between the images (for example, a homography).

  4. Apply the mapping on the first image to align it to the second image.

Using descriptors to compare patches
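Here is the sketch promised above: a minimal OpenCV implementation of the four steps, using ORB keypoints and descriptors, brute-force Hamming matching, RANSAC homography estimation and a perspective warp (the file names are hypothetical):

```python
import cv2
import numpy as np

# Hypothetical file names for two views of the same scene.
img1 = cv2.imread("scene_a.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene_b.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Detect keypoints and compute descriptors (ORB here).
orb = cv2.ORB_create(1000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# 2. Match descriptors between the two images.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# 3. Estimate a homography from the matched point coordinates.
src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)

# 4. Warp the first image onto the second image's frame.
aligned = cv2.warpPerspective(img1, H, (img2.shape[1], img2.shape[0]))
```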

Continue reading