Would you consider adding VMAF (Video Multi-Method Assessment Fusion) to your selection of tests, perhaps as a synthetic or rendering test?
PowerShell could measure how long FFmpeg takes to perform the task on a video file.
For example, I generated color bars at 4K and exported a lossless x265 600...
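The timing idea above (in PowerShell, something like `Measure-Command` around the FFmpeg call) can be sketched in Python as well. This is a minimal, hypothetical sketch that assumes an FFmpeg build compiled with libvmaf on the PATH. The helper names `build_vmaf_command` and `time_command` and the file names are illustrative only. It follows the FFmpeg `libvmaf` filter convention of passing the distorted clip first and the reference second, and it discards the decoded output with `-f null -`.

```python
import subprocess
import time


def build_vmaf_command(distorted, reference, ffmpeg="ffmpeg"):
    # Hypothetical helper: assumes FFmpeg was built with --enable-libvmaf.
    # The libvmaf filter takes the distorted clip as the first input and
    # the reference as the second; "-f null -" throws the frames away so
    # only the VMAF computation is being timed.
    return [
        ffmpeg, "-hide_banner",
        "-i", distorted,
        "-i", reference,
        "-lavfi", "libvmaf",
        "-f", "null", "-",
    ]


def time_command(cmd, runner=subprocess.run):
    # Wall-clock time for one run; raises CalledProcessError on failure
    # when the default subprocess.run runner is used.
    start = time.perf_counter()
    runner(cmd, check=True, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start
```

With real files, `time_command(build_vmaf_command("distorted.mkv", "reference.mkv"))` would return the elapsed seconds for one VMAF pass. Averaging several runs would give a steadier benchmark number.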