On Improvement of CNN’s Scale Invariance
|Keywords:||visual search; convolutional networks; scale invariance|
Deep-learning-based convolutional neural networks (CNNs) have recently been applied widely to various image recognition tasks, owing to their superior ability to extract high-level features, such as objects or object parts, from an image. Their performance, however, has been found to be susceptible to image transformations, including translation, scaling, and rotation. To improve scale invariance, this thesis takes a three-pronged approach, addressing the structure of the CNN, the training process, and the testing process. Specifically, inspired by the design of SIFT, we introduce filters of different sizes into the CNN pipeline, aiming to capture meaningful features that may vary in size. During training, we augment the training data with images of different scales, so that the weight parameters of the CNN can adapt to variable-size features. During testing, we pass multiple replicas of the query image, transformed by cropping and/or scaling, through the CNN and pool their outputs for a more accurate prediction. Extensive experiments analyze the benefit of each of these enhancements, as well as their combined effect, in terms of recognition accuracy.
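The testing-stage enhancement described above can be sketched in a few lines. The snippet below is a minimal, illustrative sketch of test-time pooling over scaled-and-cropped replicas, not the thesis's actual implementation: the function name `predict_with_tta`, the scale set, the nearest-neighbour rescaling, and the averaging pool are all assumptions made for clarity, and `model` stands in for any trained CNN that maps a fixed-size grayscale image to a class-probability vector.

```python
import numpy as np

def predict_with_tta(model, image, scales=(0.9, 1.0, 1.1), crop=24):
    """Pool a model's predictions over scaled/cropped replicas of one image.

    `model`: any callable mapping a (crop, crop) array to a probability
    vector. All parameter choices here are illustrative assumptions.
    """
    probs = []
    h, w = image.shape
    for s in scales:
        # Nearest-neighbour rescale of the grayscale image by factor s.
        nh = max(crop, int(round(h * s)))
        nw = max(crop, int(round(w * s)))
        rows = np.arange(nh) * h // nh
        cols = np.arange(nw) * w // nw
        scaled = image[rows[:, None], cols[None, :]]
        # Central crop back to the network's fixed input size.
        top, left = (nh - crop) // 2, (nw - crop) // 2
        replica = scaled[top:top + crop, left:left + crop]
        probs.append(model(replica))
    # Pool the per-replica outputs by simple averaging.
    return np.mean(probs, axis=0)
```

Averaging is only one possible pooling rule; max-pooling the per-replica probabilities is an equally simple alternative, and which works better is exactly the kind of question the thesis's experiments address.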
|Appears in Collections:||Thesis|