Learning Object-Centric Video Models by Contrasting Sets