GPT-Code-Clippy (GPT-CC)
An open-source version of GitHub Copilot, built on an AI-driven language model
What is GPT-Code-Clippy (GPT-CC)?
GPT-Code-Clippy (GPT-CC) is an open-source version of GitHub Copilot. Copilot is powered by GPT-Codex, a deep learning model based on GPT-3 that is trained specifically on publicly available code from GitHub.
The dataset used to train GPT-CC was collected from SEART GitHub Search, keeping only repositories that meet all of the following criteria:
- 10+ GitHub stars
- 2+ commits
- Has a license
- Is not a fork
- Size < 70708 bytes
These repositories are then combined with the GitHub repositories contained in The Pile.
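To make the selection rules concrete, here is a minimal Python sketch of how the criteria above could be applied to repository metadata and then merged with The Pile's repositories. The `RepoMeta` record, its field names, and the `passes_criteria` / `build_repo_list` helpers are hypothetical illustrations; SEART GitHub Search's actual export format may differ, and The Pile's repositories are modeled simply as a list of `owner/repo` names.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class RepoMeta:
    """Hypothetical repository metadata record (field names are assumptions,
    not the SEART GitHub Search schema)."""
    full_name: str             # e.g. "owner/repo"
    stars: int
    commits: int
    license_id: Optional[str]  # None when the repository has no license
    is_fork: bool
    size_bytes: int


# Thresholds taken from the selection criteria listed above.
MIN_STARS = 10
MIN_COMMITS = 2
MAX_SIZE_BYTES = 70708


def passes_criteria(repo: RepoMeta) -> bool:
    """Return True if the repository satisfies every selection criterion."""
    return (
        repo.stars >= MIN_STARS          # 10+ GitHub stars
        and repo.commits >= MIN_COMMITS  # 2+ commits
        and repo.license_id is not None  # has a license
        and not repo.is_fork             # forks are excluded
        and repo.size_bytes < MAX_SIZE_BYTES
    )


def build_repo_list(candidates: Iterable[RepoMeta], pile_repos: Iterable[str]) -> List[str]:
    """Keep candidates that pass the filter, add The Pile's repositories,
    and return the de-duplicated names in sorted order."""
    selected = {r.full_name for r in candidates if passes_criteria(r)}
    selected.update(pile_repos)
    return sorted(selected)


if __name__ == "__main__":
    repos = [
        RepoMeta("a/kept", 25, 40, "MIT", False, 5_000),
        RepoMeta("b/fork", 50, 10, "MIT", True, 5_000),         # rejected: fork
        RepoMeta("c/no-license", 12, 3, None, False, 1_000),    # rejected: no license
    ]
    print(build_repo_list(repos, ["d/from-pile"]))  # ['a/kept', 'd/from-pile']
```

Using a set for the merge means a repository that both passes the filter and appears in The Pile is only counted once.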
More details about the project are available in the Hugging Face forum thread: https://discuss.huggingface.co/t/pretrain-gpt-neo-for-open-source-github-copilot-model/7678?u=ncoop57