Towards Accurate and Reliable Energy Measurement of NLP Models