The goal of the project, carried out in partnership with Baseline, was to detect and classify a small object, specifically a tennis ball, in side-view video footage. A dataset of tagged videos was provided for training. The videos were short and varied in camera angle (left and right points of view), lighting conditions, and court color. The dataset was too small to train a deep neural network from scratch, an approach that could have produced a task-specific and efficient model. We therefore used transfer learning, on the assumption that a pre-trained neural network, specifically one already trained to recognize different types of balls, could be trained further to reach the goal with a small dataset.
We chose a neural network with the YOLOv5 architecture because it has a low inference time and was pre-trained on the COCO dataset, which includes sports balls as a class category. The network processes each frame independently and is still capable of real-time inference. The data comprised 6600 training images (84.6%) and 1200 test images (15.4%), and the resulting network reached mAP@0.5 = 0.725 on the test set.
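
To make the reported numbers concrete, the following minimal sketch recomputes the train/test split percentages and shows what the 0.5 threshold in mAP@0.5 means for a small object. It is an illustration only, not the project's evaluation code: the function names `iou` and `split_percentages` and the example boxes (a 20 by 20 pixel ball and a prediction shifted by 4 pixels) are assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are clamped at zero for disjoint boxes.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def split_percentages(n_train, n_test):
    """Return the (train, test) shares of the dataset in percent, to one decimal."""
    total = n_train + n_test
    return round(100 * n_train / total, 1), round(100 * n_test / total, 1)


# Image counts taken from the text above.
train_pct, test_pct = split_percentages(6600, 1200)
print(f"train: {train_pct}%  test: {test_pct}%")

# Hypothetical annotation and prediction for a 20x20 px ball,
# with the prediction shifted by 4 px in both x and y.
truth = (100, 100, 120, 120)
pred = (104, 104, 124, 124)
score = iou(truth, pred)
print(f"IoU = {score:.3f}, counted as a match at 0.5: {score >= 0.5}")
```

For a box this small, a shift of only a few pixels is enough to push the IoU below 0.5, which is one reason small-object detection is demanding under this metric.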