Invited talk: "Multi-People Tracking through Global Optimization"

Given two or more synchronized videos taken at eye level and from different angles, we show that we can effectively detect and track people, even when the only available data is the binary output of a simple blob detector and the number of individuals present is not known a priori.

We start from occupancy probability estimates in a top view and rely on a generative model to yield probability images that can be compared with the actual input images. We then refine the estimates so that the probability images match the binary input images as closely as possible. Finally, having performed this computation independently at each time step, we use a linear programming technique to accurately follow individuals across thousands of frames. Our algorithm yields metrically accurate trajectories for each individual in real time, despite very significant occlusions.
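
To illustrate the per-frame step, here is a minimal sketch of how a generative model might turn top-view occupancy probabilities into a probability image for one camera. It rests on assumptions that are not stated in this abstract: each ground location projects to a rectangular silhouette in the image, occupancies are treated as independent, and the match with the binary input is scored by a mean absolute difference. The names `foreground_probability`, `mismatch`, `occupancy`, and `silhouettes` are hypothetical and chosen only for this example.

```python
from math import prod


def foreground_probability(occupancy, silhouettes, width, height):
    """Return a height x width grid of per-pixel foreground probabilities.

    occupancy[k]   : assumed probability that ground location k is occupied
    silhouettes[k] : (x0, y0, x1, y1) rectangle, end-exclusive, that a person
                     standing at location k would cover in this camera view
                     (a simplifying assumption for this sketch)
    """
    image = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Under the independence assumption, a pixel is background only if
            # every location whose silhouette covers it is empty.
            all_empty = prod(
                1.0 - q
                for q, (x0, y0, x1, y1) in zip(occupancy, silhouettes)
                if x0 <= x < x1 and y0 <= y < y1
            )
            image[y][x] = 1.0 - all_empty
    return image


def mismatch(prob_image, binary_image):
    """Mean absolute difference between a probability image and a blob mask."""
    total = sum(
        abs(p - b)
        for prob_row, mask_row in zip(prob_image, binary_image)
        for p, b in zip(prob_row, mask_row)
    )
    return total / (len(prob_image) * len(prob_image[0]))


# Two overlapping candidate locations seen in a tiny 4 x 2 image.
occupancy = [0.5, 0.5]
silhouettes = [(0, 0, 2, 2), (1, 0, 3, 2)]
prob_image = foreground_probability(occupancy, silhouettes, width=4, height=2)

# Hypothetical blob-detector output for the same view.
blob_mask = [[1, 1, 1, 0], [1, 1, 1, 0]]
score = mismatch(prob_image, blob_mask)
```

In this sketch, refining the estimates would mean adjusting `occupancy` so that `score` decreases across all camera views; the actual refinement procedure and the linear programming step that links frames are not reproduced here.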

In short, we combine a mathematically well-founded generative model that works in each frame individually with a simple approach to global optimization. This yields excellent performance using very simple models that could be further improved.