Nvidia has teamed up with ServiceNow and Hugging Face to introduce a new family of open-access large language models (LLMs) for code generation.
The StarCoder2 platform was developed by the BigCode community with performance, transparency and cost-effectiveness in mind.
StarCoder2's wide scope comes from training on 619 programming languages. The AI code generator comes in three sizes: 3 billion, 7 billion and 15 billion parameters.
StarCoder2 brings code generation to everyone
According to the announcement, the smaller variants were designed to deliver solid performance while keeping computing costs in check. The 3-billion-parameter model, built with ServiceNow, promises to match the performance of the original StarCoder's 15-billion-parameter model, while the mid-sized 7-billion-parameter option is backed by Hugging Face.
StarCoder2's 15 billion parameter option was trained on Nvidia's accelerated infrastructure.
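Since the checkpoints are open-access, a model like this is typically pulled down and prompted through the Hugging Face `transformers` library. The sketch below is an illustration only: the model ID and the fill-in-the-middle (FIM) token names follow the BigCode convention from the original StarCoder release and are assumptions, not details confirmed in this announcement.

```python
# Hypothetical sketch of consuming a StarCoder2 checkpoint.
# Assumptions: the Hub model ID "bigcode/starcoder2-3b" and the
# FIM special tokens below, carried over from the original StarCoder.

def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Compose a fill-in-the-middle prompt: code before and after the gap."""
    return f"<fim_prefix>{prefix}<fim_suffix>{suffix}<fim_middle>"

def generate(prompt: str, model_id: str = "bigcode/starcoder2-3b") -> str:
    # The heavy, network-dependent part is kept inside the function so the
    # module can be imported without downloading gigabytes of weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id)
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=64)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

if __name__ == "__main__":
    # Ask the model to fill in the body of a function.
    prompt = build_fim_prompt("def add(a, b):\n    return ", "\n")
    print(generate(prompt))
```

Swapping the `model_id` for a larger variant trades download size and memory for quality, which is the cost/performance dial the three model sizes are meant to offer.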
These improvements mean that while the Nvidia-accelerated option unlocks the greatest performance, even the smallest variant is a considerable step up from previous generations and requires less sophisticated infrastructure.
“Nvidia's collaboration with ServiceNow and Hugging Face introduces safe, responsibly developed models, and supports broader access to responsible generative AI that we hope will benefit the global community,” said Jonathan Cohen, vice president of Applied Research at Nvidia.
Additionally, StarCoder2 was trained on a new code dataset called The Stack v2, paired with new training techniques that help it understand low-resource programming languages, mathematics, and discussions of program source code.
In addition to performance and efficiency improvements, the organizations say StarCoder2 adheres to ethical AI practices, such as using responsibly licensed data from the Software Heritage digital commons. Developers can also opt out of having their data used for training.