Yes, it's perfectly possible to do it; just look at all the scarily clever stuff being done with 'deepfake' technology. But whether it's worth the time and money for the production company is another matter.
This photo was in the Daily Mail's article about the programme*. I don't know if it's just a publicity still or whether it actually appeared in the programme.
It does look as though there was someone else there, just from the way they're positioned. The repeating patterns of the floor and the hedge would make it fairly easy to photoshop someone out. It's a lot more difficult with moving images, of course.
*Don't bother; it's several hundred words to say 'I don't know'.
Most recent visual effects post-production software now includes content-aware fill tools that let you remove elements from moving sequences, and some of them use AI techniques. (I think After Effects added it a year or two ago, and other products have had it for a while.) It may need a bit of help in the form of some reference frames, but it's entirely doable.
https://www.youtube.com/watch?v=4NSVDbuwpyQ
That is true, but as anybody who has used it will attest, it's far from perfect. It's amazing technology, but it works much better with stationary shots and smaller details. Removing elements from video can be tricky because you not only have to match the surrounding footage within each frame, you also have to match it through time, allowing for subtle variations in light, colour temperature and so on. The videos from Adobe et al. show it at its best and, as I say, it's amazing technology but not always the right solution. It can be much easier to go for a 'manual' solution like the cropping mentioned above.
It can also be incredibly resource-intensive. For the walking MasterChef shot, which is only a few seconds long, it would certainly be very doable, but for longer shots most editors would explore other options before tying up processing time.
I appreciate that the point you were making was different, but for the benefit of other members who are less familiar with video editing technology, I thought it was worth pointing out that content-aware fill is much harder on moving footage than on stills.