Sources: this paper

Mission: Multiple Object Tracking (MOT)

According to [1], only detections from the previous and the current frame are presented to the tracker. Appearance features beyond the detection component are ignored in tracking; only the bounding box position and size are used for both motion estimation and data association. Short-term and long-term occlusion are also ignored, as they occur very rarely and their explicit treatment introduces undesirable complexity into the tracking framework.
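To make "only the bounding box position and size are used" concrete, here is a minimal sketch of a geometry-only similarity between two boxes. It assumes intersection-over-union (IoU) as the overlap measure and an `(x1, y1, x2, y2)` corner format; neither is stated in the notes above, and the function name `iou` is chosen for illustration.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the intersection rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp to zero so disjoint boxes give zero intersection area.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter

    return inter / union if union > 0 else 0.0


# A predicted track box and a new detection that partly overlap.
overlap = iou((0, 0, 2, 2), (1, 1, 3, 3))
```

A score like this uses no appearance information at all, which is the point of the design: association cost depends only on where the boxes are and how large they are.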

The authors argue that incorporating complexity in the form of object re-identification adds significant overhead to the tracking framework, potentially limiting its use in realtime applications.

Mentioned methods

Convolutional Neural Network (CNN) based detector.

The Kalman filter [14] and the Hungarian method [15] are employed to handle the motion prediction and data association components of the tracking problem, respectively.
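The two components can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the paper's implementation: the filter tracks a single coordinate with a constant-velocity model, the noise values `q` and `r` are arbitrary, and the assignment step uses brute force over permutations as a stand-in for the Hungarian method (it reaches the same minimum total cost, but in factorial time). All class and function names are hypothetical.

```python
from itertools import permutations


class ConstantVelocityKalman1D:
    """Kalman filter for one coordinate; state is (position, velocity)."""

    def __init__(self, position, pos_var=10.0, vel_var=10.0, q=0.01, r=1.0):
        self.p, self.v = position, 0.0              # state estimate
        self.P = [[pos_var, 0.0], [0.0, vel_var]]   # state covariance
        self.q, self.r = q, r                       # assumed noise levels

    def predict(self, dt=1.0):
        # x <- F x with F = [[1, dt], [0, 1]]
        self.p += dt * self.v
        (a, b), (c, d) = self.P
        # P <- F P F^T + q I, expanded by hand for the 2x2 case
        self.P = [[a + dt * (b + c) + dt * dt * d + self.q, b + dt * d],
                  [c + dt * d, d + self.q]]
        return self.p

    def update(self, z):
        # Only position is observed: H = [1, 0]
        (a, b), (c, d) = self.P
        s = a + self.r              # innovation variance
        k0, k1 = a / s, c / s       # Kalman gain
        y = z - self.p              # innovation
        self.p += k0 * y
        self.v += k1 * y
        # P <- (I - K H) P
        self.P = [[(1 - k0) * a, (1 - k0) * b],
                  [c - k1 * a, d - k1 * b]]


def min_cost_assignment(cost):
    """Minimum-total-cost one-to-one assignment for a small square matrix.

    Brute-force stand-in for the Hungarian method; illustration only.
    """
    n = len(cost)
    best = min(permutations(range(n)),
               key=lambda perm: sum(cost[i][perm[i]] for i in range(n)))
    return list(enumerate(best))  # (track index, detection index) pairs


# Motion prediction: an object moving one unit per frame.
kf = ConstantVelocityKalman1D(position=0.0)
for z in range(1, 21):
    kf.predict()
    kf.update(float(z))

# Data association: rows are tracks, columns are detections.
pairs = min_cost_assignment([[0.1, 0.9],
                             [0.8, 0.2]])
```

For realistic numbers of tracks and detections, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` would replace the brute-force search.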