Official Definition of Open Source AI

The Open Source Initiative (OSI), an organization dedicated to open source issues, has released version 1.0 of its Open Source AI Definition (OSAID). The document is the result of years of collaboration with academia and industry and aims to set a standard for determining whether an AI system qualifies as open source.

It may not be obvious why a clear definition of open source AI matters. According to Stefano Maffulli, OSI’s executive vice president, one of the main goals is to get developers and policymakers working from the same understanding.

“Regulators are already starting to pay attention to this area,” Maffulli said in an interview with TechCrunch. He emphasized the need for consistency among different stakeholders.

To be considered open source under OSAID, an AI system must provide enough information about its architecture for other developers to "substantially" recreate the model. It must also disclose details about its training data, including the data's source, how it was processed, and how it is licensed.

“Open source AI allows users to fully understand the process of its creation,” Maffulli added. This means all components, such as the code used for training and data processing, must be accessible.

OSAID also outlines the rights that developers can expect from open models, such as the freedom to use and modify them without prior permission. “It’s essential that you can build on existing models,” Maffulli pointed out.

OSI has no enforcement mechanism to compel developers to adhere to OSAID. It does, however, plan to call out models that are labeled open source without meeting the definition.

“We hope the community will respond when someone tries to misuse the term ‘open source’,” Maffulli stated. Historically, such pressure has produced mixed results, though it has not been entirely ineffective.

Some startups and large companies, like Meta, frequently use the term "open source" for their AI models, but few meet OSAID’s criteria. For instance, Meta requires platforms with over 700 million monthly active users to request a special license to use its Llama models.

Maffulli has criticized Meta’s practice of labeling its models as "open." After discussions with OSI, Google and Microsoft agreed to stop using the term for models that are not fully open, but Meta did not follow suit.

Stability AI has also promoted its models as “open,” yet it requires companies with more than $1 million in revenue to obtain an enterprise license. The French startup Mistral restricts the use of its models for commercial purposes.

Research conducted by the Signal Foundation and other organizations found that many models claimed to be open are open in name only. The training data often remains confidential, and the computational resources required are out of reach for most developers, which further concentrates power in this domain.

Some experts argue that OSAID does not encompass all aspects related to training data licensing. For example, can a model be considered open if it requires payment to access training data?

“An open source AI definition must provide assurance about licensing,” noted Luca Antiga from Lightning AI. He added that neglecting data licensing can undermine the effectiveness of the definition.

Version 1.0 of OSAID does not address copyright issues concerning AI models, which could also become a problem. If courts decide that models can be protected by copyright, new legal tools may be needed to open such models properly.

Maffulli agreed that the definition will need updates. To that end, OSI has formed a committee to monitor how it is applied and to propose revisions.

“This is a collective effort conducted openly with various stakeholders,” he concluded.