In many real-world problems we have to make predictions from feature vectors with missing values. However, we may also be able to observe some of the missing values in the feature vector at a cost. Given the currently observed values, how can we decide which missing values to observe next so that prediction accuracy increases as fast as possible as a function of the observation cost? This problem appears in many different application areas, including medical diagnosis, surveys, recommender systems, insurance, etc. In this talk, I will describe how to solve the problem using an information theoretic approach and novel variational autoencoder models that can effectively deal with missing data.