Lately, I’ve been reading a lot about BOW (Bag of Words) models [1], and I thought it would be nice to write a short post on the subject. The post is based on Li Fei-Fei’s slides from the ICCV 2005 course on object detection.
As the name implies, the concept of BOW is borrowed from text analysis. The idea is to represent a document as a “bag” of important keywords, ignoring the order in which they appear (hence the name “bag of words” rather than, say, a list of words).
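To make the text version concrete, here is a minimal sketch that turns a sentence into a vector of keyword counts. The keyword list and the lowercase-letters-only tokenization are simplifying assumptions chosen for illustration, and `bag_of_words` is a hypothetical helper. The point is that two sentences with the same words in a different order map to the same bag.

```python
import re
from collections import Counter

def bag_of_words(text, vocabulary):
    """Count how often each vocabulary keyword occurs in `text`, ignoring word order."""
    # Simplified tokenization (an assumption): lowercase runs of letters only
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    # One count per keyword, in the fixed order of the vocabulary;
    # Counter returns 0 for keywords that never appear
    return [counts[word] for word in vocabulary]

vocabulary = ["cat", "dog", "sat", "mat"]  # hypothetical keyword list
print(bag_of_words("The cat sat on the mat", vocabulary))  # [1, 0, 1, 1]
print(bag_of_words("On the mat the cat sat", vocabulary))  # [1, 0, 1, 1]
```

Words outside the vocabulary (“the”, “on”) are simply dropped, which is how only the “important keywords” end up in the bag.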
In computer vision, the idea is similar. We represent an object as a bag of visual words: image patches, each described by a feature descriptor.
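The visual counterpart can be sketched the same way, assuming the patch descriptors have already been extracted and that a codebook of visual words is given (in practice such a codebook is commonly learned by clustering training descriptors, e.g. with k-means). Each descriptor is assigned to its nearest visual word, and the image becomes a histogram of word counts. The tiny 2-D vectors and the names `codebook`, `nearest_word` and `bag_of_visual_words` are hypothetical; real descriptors such as SIFT are 128-dimensional.

```python
import math

def nearest_word(descriptor, codebook):
    """Index of the visual word (codebook entry) closest to `descriptor` in Euclidean distance."""
    return min(range(len(codebook)), key=lambda k: math.dist(descriptor, codebook[k]))

def bag_of_visual_words(descriptors, codebook):
    """Histogram of visual-word occurrences; where each patch came from is discarded."""
    histogram = [0] * len(codebook)
    for d in descriptors:
        histogram[nearest_word(d, codebook)] += 1
    return histogram

# Hypothetical codebook of three visual words and four patch descriptors
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
patches = [(0.1, -0.2), (0.9, 1.2), (4.8, 5.1), (0.2, 0.1)]
print(bag_of_visual_words(patches, codebook))  # [2, 1, 1]
```

As with the text version, shuffling the patches leaves the histogram unchanged: the spatial layout of the patches is ignored just as word order is. `math.dist` requires Python 3.8 or later.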