To match the lip movements with speech, they designed a "learning pipeline" to collect visual data from lip movements. An AI model uses this data for training, then generates reference points for ...