
StarCoder
StarCoderBase and StarCoder are large language models for code (Code LLMs), trained on permissively licensed data from GitHub. The training data covers more than 80 programming languages, Git commits, GitHub issues, and Jupyter notebooks.
Similar to LLaMA, we trained a 15B-parameter model on 1 trillion tokens. This model is StarCoderBase.
We then fine-tuned StarCoderBase on 35 billion Python tokens. The result is a new model that we call StarCoder.
StarCoderBase outperforms other open Code LLMs on popular programming benchmarks. It also matches or exceeds closed models such as OpenAI's code-cushman-001, the original Codex model that powered early versions of GitHub Copilot. With a context length of more than 8,000 tokens, the StarCoder models can process more input than any other open LLM, which allows for a variety of interesting applications. For example, by prompting the StarCoder models with a series of dialogues, we enabled them to act as a technical assistant.
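To make the dialogue-prompting idea concrete, here is a minimal sketch of how such a prompt could be assembled as plain text. The function name `build_assistant_prompt`, the `Human:` / `Assistant:` labels, and the introductory sentence are illustrative assumptions, not the prompt format the StarCoder authors actually used. The sketch only builds the string; sending it to a model is left out.

```python
def build_assistant_prompt(
    turns,
    question,
    system="Below is a dialogue between a human and a technical assistant.",
):
    """Assemble a dialogue-style prompt for a code model.

    turns:    list of (human_message, assistant_reply) example exchanges.
    question: the new question the model should answer.
    system:   introductory sentence (hypothetical wording, an assumption).
    """
    lines = [system, ""]
    for human, assistant in turns:
        # Each example exchange shows the model the expected answer style.
        lines.append(f"Human: {human}")
        lines.append(f"Assistant: {assistant}")
        lines.append("")
    # End with an open assistant turn so the model continues from here.
    lines.append(f"Human: {question}")
    lines.append("Assistant:")
    return "\n".join(lines)


example_turns = [
    ("How do I reverse a list in Python?", "Use slicing: my_list[::-1]."),
]
prompt = build_assistant_prompt(example_turns, "And how do I reverse a string?")
print(prompt)
```

Because the models accept more than 8,000 tokens of context, a prompt like this can in principle hold many example exchanges before the final question.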








