Few-shot learning for image classification and object detection

Zimudzi, Edward

URI: https://hdl.handle.net/10500/29677

Date: 2021-11

Type: Thesis

Abstract:

Deep learning has been applied successfully in computer vision, including image classification, object recognition and detection, and image segmentation, in applications such as remote sensing, scene understanding, autonomous driving, medical image analysis, robotics and video surveillance. The drawback of most current approaches is that they require large quantities of annotated training data and expensive computing resources. Data annotation is usually an expensive and tedious task. In addition, data can be rare or difficult to gather for reasons that include safety and ethical concerns. Moreover, a deep learning model trained successfully for a specific task cannot be directly deployed for another task in another domain. It is therefore essential to develop models that can learn from few annotated training samples, as humans do. Few-shot learning aims to close the gap between deep learning models that learn from huge annotated datasets and humans in the challenging task of learning from few examples. The aim of this thesis is to propose novel deep learning methods for image processing that optimise a model's ability to detect and recognise new object instances from few labelled examples. We present several novel methods that tackle the problems of image classification, object detection, self-supervised knowledge distillation, and panoptic segmentation in few-shot learning settings. Even though multiple computer vision themes can be identified throughout this work, the most important is the limited data regime taken into account. We consider the few-shot learning setting where tasks, together with their support data and query test data, are received and trained in episodes.
First, we introduce a novel few-shot meta-learning classification model that consists of multiple learners supervised by a central controller, which controls feature extraction and meta-learning for integrated inference and generalisation. Secondly, we introduce an approach for few-shot object detection that meta-learns object localisation and classification by eliminating region-wise prediction and encoding support images and query images simultaneously into class-specific feature representations, which automatically enter a class-agnostic decoder to generate output predictions for the categories known beforehand. We also introduce a fully convolutional model for panoptic segmentation in few-shot settings that encodes each instance into a specific kernel and generates a prediction directly by convolution, thereby predicting both instance objects and background stuff together. In this way, the instance-aware property of object instances and the semantically consistent property of their background can both be satisfied in a unified workflow. Finally, we introduce a two-stage knowledge distillation model that maximises the entropy of the feature embeddings of images using a self-supervised auxiliary loss. Experiments on public few-shot learning benchmark datasets such as miniImageNet, Omniglot, COCO-20i and Mapillary Vistas demonstrate the effectiveness of the proposed methods for few-shot learning in computer vision.
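The episodic setting mentioned in the abstract, where each task arrives with its own support and query data, can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name sample_episode, its parameters (n_way, k_shot, q_queries) and the toy label list are assumptions introduced here, following the common N-way K-shot convention.

```python
import random
from collections import defaultdict


def sample_episode(labels, n_way, k_shot, q_queries, rng=None):
    """Sample one N-way K-shot episode from a list of class labels.

    Hypothetical helper for illustration only. Returns two lists of
    (sample_index, episode_label) pairs: the support set and the query set.
    """
    rng = rng or random.Random()

    # Group sample indices by their original class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Only classes with enough samples for support plus query can be used.
    needed = k_shot + q_queries
    eligible = [c for c, idxs in by_class.items() if len(idxs) >= needed]
    if len(eligible) < n_way:
        raise ValueError("not enough classes with enough samples per class")

    support, query = [], []
    # Classes are relabelled 0..n_way-1 inside the episode.
    for episode_label, cls in enumerate(rng.sample(eligible, n_way)):
        picked = rng.sample(by_class[cls], needed)  # no repeats within a class
        support += [(i, episode_label) for i in picked[:k_shot]]
        query += [(i, episode_label) for i in picked[k_shot:]]
    return support, query


# Toy data (assumed): 10 classes with 20 samples each.
labels = [c for c in range(10) for _ in range(20)]
support, query = sample_episode(labels, n_way=5, k_shot=1, q_queries=15,
                                rng=random.Random(0))
print(len(support), len(query))  # 5 75
```

A model is then adapted on the support pairs and evaluated on the query pairs of the same episode, so that training mirrors the few-example conditions faced at test time.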

This thesis introduces and develops several novel models that tackle the computer vision problems of image classification, object detection, self-supervised knowledge distillation and panoptic segmentation in few-shot learning settings. Few-shot learning aims to close the gap between deep learning models that learn from large annotated datasets and humans in the challenging task of learning from few annotated examples. Even though multiple computer vision themes can be identified throughout this work, the most important is the limited data regime taken into account. We consider the few-shot learning setting where computer vision tasks with limited data, together with their support data and query test data, are received and trained in episodes. First, we introduce a novel few-shot meta-learning classification model that consists of multiple learners supervised by a central controller to control feature extraction and meta-learning for integrated inference and generalisation. Secondly, we introduce a few-shot object detection approach that detects and recognises new object instances by meta-learning object localisation and classification in a unified manner, eliminating region-wise prediction and encoding both support images and query images into class-specific features that then enter a class-agnostic decoder to generate predictions for the specific categories.
We also introduce a fully convolutional model for panoptic segmentation in few-shot learning settings that encodes each instance into a specific kernel and generates a prediction directly by convolution, thereby predicting both instance objects and background stuff together. In this way, the instance-aware and semantically consistent properties of object instances and their background can each be satisfied in a unified workflow. Finally, we introduce a two-stage knowledge distillation model that maximises the entropy of the feature embeddings of images using a self-supervised auxiliary loss. Experiments on public few-shot learning benchmark datasets such as miniImageNet, Omniglot, CIFAR-FS and Oxford Flowers102 for image classification, Pascal 5i and COCO-20i for object detection, and Mapillary Vistas for panoptic segmentation show the performance bounds and the effectiveness of the proposed few-shot learning methods. This thesis aims to close the gap between conventional deep learning and human learning by creating computer vision systems that learn from few examples of image data.

This thesis presents and applies several new methods that deal with the computer vision problems of image classification, object detection, self-supervised knowledge distillation and panoptic segmentation in few-shot learning settings. The aim of few-shot learning is to close the gap between deep learning models that acquire knowledge from annotated datasets and humans in the difficult task of learning from few annotated examples. Although many computer vision themes can be seen throughout this work, the most important is the limited data regime that is taken into account. We consider few-shot learning settings where computer vision tasks, with their support data and query test data, are received in episodes. We begin by introducing a few-shot meta-learning classification model with multiple learners overseen by a central controller to control feature extraction and meta-learning for integrated inference and generalisation. Secondly, we introduce a few-shot object detection method that detects and recognises new object instances through meta-learning and classification in a unified manner, by eliminating region-wise prediction and encoding support images and query images into class-specific features for a class-agnostic decoder that produces predictions for specific categories. We also introduce a fully convolutional method for panoptic segmentation using few-shot learning that encodes each instance into a specific kernel and generates predictions directly, predicting both kinds of content together. In this way, the instance-aware and semantically consistent properties of object instances and their background can be satisfied in a unified workflow.
Finally, we introduce a two-stage knowledge distillation model that maximises the entropy of the feature embeddings of images using a self-supervised auxiliary loss. Experiments on public few-shot learning benchmark datasets such as miniImageNet, Omniglot, CIFAR-FS and Oxford Flowers102, Pascal 5i and COCO-20i for object detection, and Mapillary Vistas for panoptic segmentation show the performance bounds and the effectiveness of the proposed few-shot learning methods. The aim of this thesis is to close the gap between conventional deep learning and human learning by creating computer vision programs that learn from few examples of image data.

Files in this item

Name: thesis_zimudzi_e.pdf

Size: 21.70Mb

Format: PDF

Description: Thesis

Copyright Statement

Items in UNISA Institutional Repository are protected by copyright, with all rights reserved, unless otherwise indicated. Items may only be viewed and downloaded for private research and study purposes. Please acknowledge publications according to acceptable standards and norms.

This item appears in the following Collection(s)

Theses and Dissertations (School of Computing) [254]
Unisa ETD [12835]
Electronic versions of theses and dissertations submitted to Unisa since 2003
