Using PRMSE to evaluate automated scoring systems in the presence of label noise