Monocular Depth Estimation

To understand football scenes, we aim to estimate depth maps from monocular images through a new task named Monocular Depth Estimation (MDE).

Our task.

We introduce a Monocular Depth Estimation (MDE) task focused on football and basketball videos. The objective is to assign a depth value to each pixel of each frame of a team sports video sequence. The predicted depth maps will then be compared to the ground truth obtained from two video games.

Our data.

The dataset contains 12,398 frames in total, split following an approximately 60/20/20 distribution, with each game appearing in only one set. For football, there are 7,073 frames: 4,071 for training, 1,423 for testing, and 1,579 for validation. For basketball, there are 5,325 frames: 3,270 for training, 1,064 for testing, and 991 for validation.
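The frame counts above can be checked with a few lines of Python. This is only a sketch for reading the numbers: the dictionary layout and the helper name `split_shares` are illustrative and are not part of the development kit.

```python
# Frame counts copied from the paragraph above.
splits = {
    "football": {"train": 4071, "test": 1423, "validation": 1579},
    "basketball": {"train": 3270, "test": 1064, "validation": 991},
}


def split_shares(counts):
    """Return the fraction of frames that falls in each split."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


# Per-sport totals and the overall number of frames.
totals = {sport: sum(counts.values()) for sport, counts in splits.items()}
grand_total = sum(totals.values())

for sport, counts in splits.items():
    shares = {name: f"{share:.1%}" for name, share in split_shares(counts).items()}
    print(sport, totals[sport], shares)
print("all frames:", grand_total)
```

Because each game is kept within a single set, the per-sport shares only approximate the nominal 60/20/20 distribution.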

For the challenge, a new challenge set will be extracted in the same way as the existing dataset.

Our metrics.

To evaluate the proposed methods, five metrics will be used:

Absolute relative error (Abs Rel)
Squared relative error (Sq Rel)
Root-mean-square error (RMSE)
Root-mean-square error on the logarithm (RMSE log)
Scale-invariant logarithmic error (SILog)

The evaluation code computes the average of each metric between the depth maps predicted by a method and the ground truths from our dataset.
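As an illustration of the five metrics, here is a minimal NumPy sketch using their commonly used definitions from the depth estimation literature. It is not the development kit's code: the function names `depth_metrics` and `average_metrics`, the `eps` threshold, the masking of non-positive ground truth, the per-frame averaging, and the factor of 100 applied to SILog are all assumptions, and the official evaluation may differ on any of them.

```python
import numpy as np


def depth_metrics(pred, gt, eps=1e-6):
    """Compute five depth metrics for one frame (hypothetical helper)."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)

    # Assumption: pixels without positive ground-truth depth are ignored.
    valid = gt > eps
    gt = gt[valid]
    # Clip predictions so the logarithm below is always defined.
    pred = np.clip(pred[valid], eps, None)

    diff = pred - gt
    log_diff = np.log(pred) - np.log(gt)
    # Variance of the log difference; clamped at 0 against rounding error.
    log_var = max(np.mean(log_diff ** 2) - np.mean(log_diff) ** 2, 0.0)

    return {
        "abs_rel": float(np.mean(np.abs(diff) / gt)),
        "sq_rel": float(np.mean(diff ** 2 / gt)),
        "rmse": float(np.sqrt(np.mean(diff ** 2))),
        "rmse_log": float(np.sqrt(np.mean(log_diff ** 2))),
        # Assumption: scaled by 100, as in several public benchmarks.
        "silog": float(np.sqrt(log_var) * 100.0),
    }


def average_metrics(pairs):
    """Average each metric over an iterable of (prediction, ground truth)."""
    per_frame = [depth_metrics(pred, gt) for pred, gt in pairs]
    return {
        name: float(np.mean([m[name] for m in per_frame]))
        for name in per_frame[0]
    }
```

For example, a prediction that is exactly twice the ground truth everywhere gives an Abs Rel of 1 and an RMSE log of ln 2, while SILog stays at (numerically) zero because a global scale factor does not affect it.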

For more details, check out our development kit on GitHub.

Development Kit
