Training GPT-2

I am using gpt-2-simple to do the training on Google Colab, mainly because the 355M model in particular would require more than 8GB of VRAM to train.
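
As a rough sketch of what such a run might look like in a Colab cell, the snippet below uses gpt-2-simple's `download_gpt2`, `start_tf_sess`, `finetune` and `generate` calls. The dataset file name `input.txt`, the run name `run1`, and the step and logging intervals are placeholder assumptions, not the settings actually used here.

```python
# Minimal sketch of a gpt-2-simple fine-tuning run on Colab.
# The names and numbers below are illustrative assumptions.
MODEL_NAME = "355M"      # the model size discussed above
DATASET = "input.txt"    # hypothetical training text file
RUN_NAME = "run1"        # hypothetical checkpoint folder name


def finetune_settings(steps=1000):
    """Collect the keyword arguments passed to gpt2.finetune."""
    return {
        "dataset": DATASET,
        "model_name": MODEL_NAME,
        "steps": steps,              # assumed number of training steps
        "restore_from": "fresh",     # start from the base model weights
        "run_name": RUN_NAME,
        "print_every": 10,           # log the loss every 10 steps
        "sample_every": 200,         # print a generated sample every 200 steps
        "save_every": 500,           # write a checkpoint every 500 steps
    }


def train():
    # Imported here so the settings can be inspected without TensorFlow installed.
    import gpt_2_simple as gpt2

    gpt2.download_gpt2(model_name=MODEL_NAME)  # downloads into ./models/355M
    sess = gpt2.start_tf_sess()
    gpt2.finetune(sess, **finetune_settings())
    gpt2.generate(sess, run_name=RUN_NAME)     # print a sample from the tuned model
```

Calling `train()` in a cell with a GPU runtime would start the download and the training; on a runtime without enough memory, the smaller 124M model can be substituted by changing `MODEL_NAME`.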