I can't believe supervision for latent variable models is not better: The Case for Prediction-Constrained Training