Mastering Data Splits for Accurate Model Evaluation