He actually handholds the camera, and only the subject(s) move. Because the lens rotates within the camera, giving a wide field of view, there is no need to swing the camera. Both locations will be in the field of view, which is wider than what an Xpan produces.
Also, because the film is exposed sequentially in vertical strips, the exposure time is the same at each point, and the image remains sharp despite the rotation of the lens. So handholding is the norm, and no tripod is required for sharp images.
Only the subject has to move quickly, reaching the second spot before the rotating lens gets there! You can try this by having your subject sit in one chair or at one end of a bench, then move over when you say go, right after you press the shutter. Your subject will show up twice in the image. Identical twins, if the subject doesn't put on a mask when he or she changes position!
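If you want a rough idea of how much time your subject has to change seats, here is a minimal sketch of the arithmetic. It assumes the lens sweeps at a constant angular speed, which is my simplification, and the function names, the 120-degree sweep, and the 2-second sweep time are all made-up illustration values, not the specs of any particular camera.

```python
def slit_arrival_time(angle_deg, sweep_angle_deg, sweep_time_s):
    """Seconds after the sweep starts at which the lens points at angle_deg.

    Assumes a constant angular speed (a simplification).
    angle_deg is measured from the edge of the frame where the sweep starts.
    """
    if not 0 <= angle_deg <= sweep_angle_deg:
        raise ValueError("position is outside the swept field of view")
    return sweep_time_s * angle_deg / sweep_angle_deg


def move_window(first_deg, second_deg, sweep_angle_deg, sweep_time_s):
    """Seconds between the lens passing the first spot and reaching the second."""
    if second_deg <= first_deg:
        raise ValueError("the second spot must lie further along the sweep")
    t_first = slit_arrival_time(first_deg, sweep_angle_deg, sweep_time_s)
    t_second = slit_arrival_time(second_deg, sweep_angle_deg, sweep_time_s)
    return t_second - t_first


if __name__ == "__main__":
    # Hypothetical numbers: 120-degree sweep taking 2 seconds,
    # chairs at 30 and 90 degrees from the starting edge.
    print(move_window(30, 90, 120, 2.0))
```

The further apart the two spots are along the sweep, the more time the subject has, so put the first chair near the side where the lens starts its travel.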