Towards Multimodal Simultaneous Neural Machine Translation