This is our fifth post in the series about binary descriptors and here we will talk about the FREAK descriptor. This is the last descriptor that we’ll talk about as the next and final post in the series will give a performance evaluation of the different binary descriptors. Just a remainder – we had an introduction to binary descriptors and posts about BRIEF, ORB and BRISK.
As you may recall from the previous posts, a binary descriptor is composed out of three parts:
- A sampling pattern: where to sample points in the region around the descriptor.
- Orientation compensation: some mechanism to measure the orientation of the keypoint and rotate it to compensate for rotation changes.
- Sampling pairs: which pairs to compare when building the final descriptor.
Recall that to build the binary string representing a region around a keypoint we need to go over all the pairs and for each pair (p1, p2) – if the intensity at point p1 is greater than the intensity at point p2, we write 1 in the binary string and 0 otherwise.
In a nutshell, FREAK is similar to BRISK by having a handcrafter sampling pattern and also similar to ORB by using machine learning techniques to learn the optimal set of sampling pairs. FREAK also has an orientation mechanism that is similar to that of BRISK.
Retinal sampling pattern
Many sampling patterns are possible to compare pixel intensities. As we’ve seen in the previous posts, BRIEF uses random pairs, ORB uses learned pairs and BRISK uses a circular pattern where points are equally spaced on circles concentric, similar to DAISY.
FREAK suggests to use the retinal sampling grid which is also circular with the difference of having higher density of points near the center. The density of points drops exponentially as can be seen in the following figure:
Each sampling point is smoothed with a Gaussian kernel where the radius of the circle illustrates the size of the standard deviation of the kernel.
As can be seen in the following figure, the suggested sampling grid corresponds with the distribution of receptive fields over the retina:
Learning the sampling pairs
With few dozen sampling points, thousands of sampling pairs can be considered. However, many of the pairs might not be useful efficiently describe a patch. A possible strategy can be to follow BRISK’s approach and select pairs according to their spatial distance. However, the selected pairs can be highly correlated and not discriminant. Consequently, FREAKS follows ORB’s approach and tries to learn the pairs by maximizing variance of the pairs and taking pairs that are not correlated. ORB’s approach was explained in length in our post about ORB, so we won’t explain it again here.
Interestingly, there is a structure in the resulting pairs – a coarse-to-fine approach which matches our understanding of the model of the human retina. The first pairs that are selected mainly compare sampling points in the outer rings of the pattern where the last pairs compare mainly points in the inner rings of the pattern. This is similar to the way the human eye operates, as it first use the perifoveal receptive fields to estimate the location of an object of interest. Then, the validation is performed with the more densely distributed receptive fields in the fovea area.
The sampling pairs are illustrated in the following figure, where each figure contains 128 pairs (from left to right, top to bottom):
FREAKS takes advantage of this coarse-to-fine structure to further speed up the matching using a cascade approach: when matching two descriptors, we first compare only the first 128 bits. If the distance is smaller than a threshold, we further continue the comparison to the next 128 bits. As a result, a cascade of comparisons is performed accelerating even further the matching as more than 90% of the candidates are discarded with the first 128 bits of the descriptor. The following figure illustrates the cascade approach:
To somewhat compensate for rotation changes, FREAK measures the orientation of the keypoint and rotates the sampling pairs my measure angle. FREAK’s mechanism for measuring the orientation is similar to that of BRISK only that instead of using long distance pairs, FREAK uses a predefined set of 45 symmetric sampling pairs:
For more details on orientation assignment, you can refer to the post on BRISK.
Though we’ll have a whole post that will compare the performance of the binary descriptors, we should say a few words now.
- When there are changes due to blur, FREAK performs worse of all of the all the binary descriptors.
- When there are view-point changes, FREAK slightly outperforms ORB and BRISK, but performs worse than BRIEF.
- When there are zoom + rotation changes, FREAK slightly outperforms BRIEF and BRISK, but performs worse than ORB.
- When illumination changes or JPEG compression is present, FREAK slightly outperforms BRISK and ORB, but performs much worse than BRIEF.
The next post will be our last in the series about binary descriptors, which will give a performance evaluation of binary descriptors.
 Tola, Engin, Vincent Lepetit, and Pascal Fua. “A fast local descriptor for dense matching.” Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008.
 Leutenegger, Stefan, Margarita Chli, and Roland Y. Siegwart. “BRISK: Binary robust invariant scalable keypoints.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
 Rublee, Ethan, et al. “ORB: an efficient alternative to SIFT or SURF.” Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011.
 Alahi, Alexandre, Raphael Ortiz, and Pierre Vandergheynst. “Freak: Fast retina keypoint.” Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012.
 Calonder, Michael, et al. “Brief: Binary robust independent elementary features.” Computer Vision–ECCV 2010. Springer Berlin Heidelberg, 2010. 778-792.