CondGANCameraman: Teaching a CondGAN about Camera Transformations and Variable-Length Sequences using Synthesized Training Data.

Class: None (pre-PhD Research Project)
Advisor: Karen Liu
Date: Spring 2017 (Personal Project)
Language: C++/Python
TA'ed: No
Code: GitHub Repo - Conditional GAN Cameraman


(Spring 2017) I worked on this project mostly independently, although I initially began investigating GANs at Professor Liu's suggestion. The hypothesis was that, given a properly configured classification component, the generator could be trained to produce desired sequences of any duration, exhibiting active camera transformations it had never seen during training.

Example of synthesized training data used with the CondGAN.

I synthesized millions of training samples, each a variable-length sequence of images of a biped moving its right arm along the trajectory of a lowercase English letter (a-z). I varied the camera location and orientation for each sample trajectory, along with the colors and shapes of the head and hand, but all of these components remained fixed within any single training trajectory. Each of these descriptive components was encoded in the class vector component of the GAN: the letter of the trajectory as a 1-hot vector; the camera position and orientation as a matrix and a quaternion, respectively; and the colors and the location of each frame within the sequence each as a "floating-point" 1-hot of length 10, where the fractional progress (or color) value was split between two adjacent entries of the vector so that the entries sum to 1.0 - similar to a progress bar.
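
To make the encoding concrete, below is a minimal Python/NumPy sketch of the "floating-point" 1-hot scheme described above (the function name and signature are my own illustration, not the project's actual C++/Python code):

    import numpy as np

    def float_one_hot(value, length=10):
        # Encode a scalar in [0, 1] as a length-10 vector whose mass is
        # split between the two adjacent slots the value falls between,
        # so the entries always sum to 1.0 - like a progress bar.
        value = float(np.clip(value, 0.0, 1.0))
        pos = value * (length - 1)   # fractional index into the vector
        lo = int(pos)                # lower of the two adjacent slots
        hi = min(lo + 1, length - 1)
        frac = pos - lo              # how far the value leans toward hi
        vec = np.zeros(length)
        vec[lo] = 1.0 - frac
        vec[hi] += frac              # += handles the lo == hi edge case
        return vec

    # 35% progress: mass is split between slots 3 and 4
    print(float_one_hot(0.35))      # [0. 0. 0. 0.85 0.15 0. 0. 0. 0. 0.]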

Once the CondGAN was trained, I was able to synthesize sequences that not only varied in duration but also varied the camera orientation and distance during the trajectory, as can be seen in the examples below. Across each row the letters a-z are drawn - some of the trajectories are clipped, but most are surprisingly coherent. The duration of each trajectory is determined by how quickly the progress-bar/floating-point 1-hot evolves.

Generator Output. These trajectories vary in length. Across each row, a-z is demonstrated (except the last row, of course, which was truncated due to the limitations of my video card), while each successive row is progressively slower, i.e. longer in duration.
Generator Output. These trajectories also include varying camera translation.
Generator Output. These trajectories demonstrate camera rotation during the trajectory evolution.

An interesting discovery was that, to synthesize a coherent trajectory, I needed to keep the noise vector fixed throughout the trajectory and evolve it by varying only the conditional class tags. One consequence of this is that many more unique trajectories are possible than were seen in training - each fixed noise vector yields a particular trajectory exhibiting its own characteristics.
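
As a sketch of that synthesis loop (hypothetical names - "generator" stands in for the trained CondGAN generator, and float_one_hot is the encoder sketched earlier), holding the noise vector fixed and stepping only the progress portion of the class vector yields one coherent trajectory, with the frame count controlling its duration:

    import numpy as np

    def synthesize_trajectory(generator, letter_one_hot, camera_code,
                              n_frames=30, z_dim=100):
        # One noise vector, sampled once and held fixed for every frame.
        z = np.random.randn(z_dim)
        frames = []
        for t in range(n_frames):
            # Only the progress encoding changes from frame to frame;
            # more frames means a finer step, i.e. a longer trajectory.
            progress = float_one_hot(t / (n_frames - 1))
            cond = np.concatenate([letter_one_hot, camera_code, progress])
            frames.append(generator(z, cond))
        return frames

Resampling z under the same class tags would yield a different rendition of the same letter, and to reproduce the moving-camera examples above, camera_code would likewise be stepped per frame.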

I believe that synthesizing reasonable training data would be helpful for many ML tasks, and the GAN's ability to produce plausible camera transformations it never saw during training was surprising, at least to me.