Opening Space for Open Generative AI Models with BigCode Project

Category Science

tldr #

A Northeastern professor introduces the BigCode project and explains why open-sourcing generative AI matters. The project developed two models, StarCoder and SantaCoder, which can be licensed and adapted for uses such as gaming and industrial automation. Its open process gives outsiders insight into, and a role in, the models' development, along with an evaluation of their strengths and weaknesses.


content #

Believing in open scientific collaboration on AI technology, a Northeastern professor joined others in creating a state-of-the-art open generative model for programmers that can be licensed and adapted for different uses such as gaming and industrial automation. Generative artificial intelligence and large language models have taken the world by storm in the last few years, says Arjun Guha, associate professor of computer science at Khoury College of Computer Sciences at Northeastern University. They are having a particularly significant impact on programming.

The project is a collaboration between two private companies, Hugging Face and ServiceNow, and Northeastern professor Arjun Guha.

Computer scientists, programmers and smaller-market players, however, have very limited insight into the development process of these models, and that prevents them from developing a deeper understanding of the technology. It also excludes them from meaningful participation in its further expansion.

That is why Guha and his research group got heavily involved in the BigCode project, launched by two private companies, Hugging Face and ServiceNow. Hugging Face, a company that hosts a large open-source machine learning community, and ServiceNow, which helps businesses optimize technology solutions, teamed up to support individuals with a professional AI research background in the responsible development and use of open large language models for coding. They committed significant personnel and hardware resources to the project. As a result, StarCoder, a state-of-the-art open generative model for programmers, can now be licensed and adapted by others for different uses.

The BigCode project's primary focus was to create two open-source models, StarCoder and SantaCoder, suitable for different uses such as gaming and industrial automation.

"You can spend an enormous amount of money building one of these things and not actually know if it's any good," Guha says. The few multi-billion-dollar companies that have the resources to build such models, and that "drop" them every now and then to stun the world, are completely closed to the idea of sharing with the community what this technology is capable of, Guha says.

"If you ask the people who make them, 'What can I do with it?' I think the answer they will always give you disingenuously is 'anything,' which is misleading," he says.

Building an LLM starts with training it on data; the BigCode models were trained on Hugging Face's supercomputer, Dwarf Fortress.

Guha believes that academic research has a role to play in shaping generative AI technology. "An academic can come in and rigorously evaluate these things and say that here are its strengths and weaknesses. Yes, use it to do this, but please don't use it to do these other things without some serious guardrails," Guha says.

A pressing issue is people using this technology to make decisions that affect other people, for example, decisions about a loan application or a job opening.

Models must be properly evaluated to establish what they are capable of and where their weaknesses lie.

"We should talk about when it is not appropriate to use these models, when they are doing more harm than good," he says.

Guha says he dedicated a lot of energy to BigCode, which launched in September 2022, leading a working group that focused on evaluating the open models created by the project, StarCoder and SantaCoder.

Building an LLM first requires identifying the data that will be fed into the model to train it. When the model has been trained, Guha says, it should be evaluated on what it can and cannot actually do.

The project opens up a space for people with little insight into these generative AI models to understand them better and to participate in the technology's further expansion.

The models created by the BigCode project are publicly available. They were trained on the Hugging Face supercomputer, Dwarf Fortress, which provides the largest collaborative infrastructure in the domain of large language models.

