Abstract:
To describe the semantic characteristics of scene images efficiently, this paper proposes a scene image classification framework based on the context information of image patches. First, image patches are sampled on a regular grid and their SIFT (scale-invariant feature transform) features are extracted. The SIFT features of the training images are then clustered with the K-means algorithm to form a codebook of patches. The patches of each image are quantized against this codebook to obtain the visual word representation of the image, which forms a visual word map. On this map, two kinds of visual word models are built: the visual word pair, made up of two different words, and the visual word group, made up of identical, consecutive words. Finally, spatial pyramid matching is applied to obtain the context pyramid features of the visual words, which are classified with an SVM. Experiments on frequently used scene image databases show that the proposed method outperforms existing typical methods in scene image classification.
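
The quantization and context-counting steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: toy 2-D descriptors stand in for 128-D SIFT features, the codebook is given rather than learned with K-means, a word pair is taken to be two different words in horizontally or vertically neighbouring grid cells, and a word group is taken to be a horizontal run of at least two identical words. All function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def quantize(descriptor, codebook):
    """Return the index of the codeword nearest to the descriptor (squared Euclidean distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))


def build_word_map(descriptor_grid, codebook):
    """Map each grid cell's descriptor to a visual word index, keeping the grid layout."""
    return [[quantize(d, codebook) for d in row] for row in descriptor_grid]


def count_word_pairs(word_map):
    """Count unordered pairs of DIFFERENT words in neighbouring cells.

    Assumption: 'neighbouring' means the cell to the right or the cell below.
    """
    pairs = Counter()
    rows, cols = len(word_map), len(word_map[0])
    for r in range(rows):
        for c in range(cols):
            w = word_map[r][c]
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols and word_map[rr][cc] != w:
                    v = word_map[rr][cc]
                    pairs[(min(w, v), max(w, v))] += 1
    return pairs


def count_word_groups(word_map, min_len=2):
    """Count runs of identical consecutive words, keyed by (word, run length).

    Assumption: runs are measured along rows only, and a group needs at least min_len cells.
    """
    groups = Counter()
    for row in word_map:
        for word, run in groupby(row):
            n = len(list(run))
            if n >= min_len:
                groups[(word, n)] += 1
    return groups


# Toy example: a 2 x 3 grid of 2-D descriptors and a 3-word codebook.
codebook = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0)]
descriptor_grid = [
    [(0.1, 0.0), (0.0, 0.2), (0.9, 1.1)],
    [(1.2, 0.8), (4.8, 5.1), (5.2, 4.9)],
]
word_map = build_word_map(descriptor_grid, codebook)
pairs = count_word_pairs(word_map)
groups = count_word_groups(word_map)
```

For this toy grid the word map is `[[0, 0, 1], [1, 2, 2]]`. Under the assumptions above, the pair (0, 1) occurs twice, (0, 2) once and (1, 2) twice, and there is one group of length 2 for word 0 and one for word 2. In a full pipeline these counts would be accumulated per spatial pyramid cell and concatenated into the feature vector passed to the SVM.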