Research on data augmentation for people detection in aerial images
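|Keywords:||drone; deep learning; aerial image; data augmentation|
This thesis takes people as the main detection target and combines aerial images with deep learning. Without training on aerial images, we apply data augmentation to general images to improve the detection results. Because common augmentation methods are not suited to aerial images, we first observe the differences between aerial images and general images and then propose three augmentation methods: border padding, image rotation, and perspective transformation. These make the people in general images resemble the people in aerial images. In addition, splitting an aerial image into parts before running detection also improves the detection results.|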
Owing to recent advances in technology, drones have come into wide use. Unlike traditional surveillance cameras, which are usually installed at fixed locations, a drone can be actively steered to wherever photographs need to be taken. This mobility makes drones promising for tasks such as tracking and detection. Meanwhile, research on deep learning has grown increasingly popular thanks to great progress in high-performance graphics processing units (GPUs). With good models and sufficient training data, deep learning has achieved breakthroughs on problems in many fields. In this thesis, we apply deep learning to people detection in aerial images, using the state-of-the-art detection model YOLO. However, the detection results are not good enough because the model is trained on general images, which differ considerably from aerial images. Many datasets are available for model training and testing, such as ImageNet, Pascal VOC, and MS COCO, but they consist of general images rather than aerial images, and datasets of aerial images remain scarce. Because aerial images are insufficient, this thesis uses data augmentation to make general images resemble aerial images so that they can be used for model training and improve the detection results. We first observe the differences between general images and aerial images and then propose three augmentation methods: border padding, image rotation, and perspective transformation. The results show that the proposed methods improve both recall and precision. In addition, splitting the aerial images before detection also improves the results.
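The three augmentation methods and the image splitting step named in the abstract can be illustrated with a minimal sketch. It is not the thesis implementation: the abstract gives no parameters, so the function names (`pad_border`, `warp`, `rotation_inverse`, `tilt_inverse`, `split_tiles`), the nested-list grayscale image representation, the nearest-neighbour sampling, and the single-parameter tilt matrix are all assumptions made for illustration. A real pipeline would more likely rely on an image library and would also have to transform the bounding-box labels.

```python
import math

def pad_border(img, pad, fill=0):
    """Border padding: surround the image with `pad` pixels of `fill`,
    so the person occupies a smaller share of the frame."""
    width = len(img[0]) + 2 * pad
    top = [[fill] * width for _ in range(pad)]
    body = [[fill] * pad + list(row) + [fill] * pad for row in img]
    bottom = [[fill] * width for _ in range(pad)]
    return top + body + bottom

def warp(img, inv, fill=0):
    """Apply a projective warp by inverse mapping: `inv` is a 3x3 matrix
    taking each OUTPUT pixel (x, y) to the INPUT pixel it samples
    (nearest neighbour). Output has the same size as the input."""
    h, w = len(img), len(img[0])
    out = [[fill] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            d = inv[2][0] * x + inv[2][1] * y + inv[2][2]
            if abs(d) < 1e-12:
                continue  # point maps to infinity; leave the fill value
            sx = round((inv[0][0] * x + inv[0][1] * y + inv[0][2]) / d)
            sy = round((inv[1][0] * x + inv[1][1] * y + inv[1][2]) / d)
            if 0 <= sx < w and 0 <= sy < h:
                out[y][x] = img[sy][sx]
    return out

def rotation_inverse(h, w, angle_deg):
    """Image rotation: output-to-input matrix for a rotation about the
    image centre, to be passed to `warp`."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    cx, cy = (w - 1) / 2, (h - 1) / 2
    return [[c, s, cx - c * cx - s * cy],
            [-s, c, cy + s * cx - c * cy],
            [0.0, 0.0, 1.0]]

def tilt_inverse(k):
    """Perspective transformation: a hypothetical one-parameter
    output-to-input homography whose scale varies with the row index.
    k = 0 gives the identity."""
    return [[1.0, 0.0, 0.0],
            [0.0, 1.0, 0.0],
            [0.0, k, 1.0]]

def split_tiles(img, rows, cols):
    """Image splitting: cut the image into rows x cols equal tiles,
    listed row by row. Remainder pixels at the right and bottom edges
    are dropped in this sketch."""
    h, w = len(img), len(img[0])
    th, tw = h // rows, w // cols
    return [[r[j * tw:(j + 1) * tw] for r in img[i * th:(i + 1) * th]]
            for i in range(rows) for j in range(cols)]
```

In this sketch, rotation and perspective share the same `warp` routine because both are projective mappings of pixel coordinates. For example, `warp(img, rotation_inverse(len(img), len(img[0]), 30))` rotates an image by 30 degrees about its centre, and `split_tiles(img, 2, 2)` yields four sub-images that a detector could process separately.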
|Appears in Collections:||Thesis|