The use of open-source software code and databases has become unavoidable when developing software and database products. Yet open-source licenses carry an inherent risk: some of them might not allow commercialisation, AI training or other purposes. To avoid creating a legally flawed product, it is of utmost importance to understand the impact of the wording of the license terms. When selecting input materials, make sure they are governed by terms that are commercially viable, suitable for the intended purpose of the final software product, and compatible with other open-source licensing requirements. In the rush to bring new software and database products to market, open-source licenses are often insufficiently considered, and this can lead to significant legal risks, including a possible devaluation of the company during investments or buyouts, or even lawsuits.
To interpret a software license, you need to understand the relevant copyright law and the contractual terms of the licence agreement. You may also need to be aware of the legal principles of patents, designs, databases and trade secrets.
Contracts related to software products are essentially based on applicable copyright law. The developer of the software code is the author, and therefore the copyright holder. Under the laws of most jurisdictions, the copyright ownership of software products created by employees automatically vests in the employer, provided the work is done within the scope of employment or in the course of their duties. Copyright protection grants the copyright owner a set of exclusive rights that generally prohibit others from using the protected work without permission, and these rights are not limited solely to use “in trade” or in commercial contexts. The copyright holder can grant permission for the use of the software by others in a contract, namely a software license agreement. The software license agreement outlines the terms and conditions governing the permitted uses of the software. Those terms may include, for example, the geographical area, intended purpose, or the number of users allowed.
If the software or any part of it is protected by a patent, design right or trade secret (know-how), the owner of these rights (assuming that all the right holders are the same) may permit the use of the software under multiple licenses. If the software is accompanied by a database, the terms of use of the database must also be taken into account. In the case of open-source software, it is also possible that several parallel licenses must be complied with simultaneously.
Should any third party breach the licence terms and conditions associated with software and/or its databases, this constitutes a breach of contract and an infringement of copyright and/or other intellectual property (IP) rights. If there was no contractual relationship between the parties, then the violation of the license terms and conditions “only” constitutes an infringement of copyright and/or other IP rights. Such infringements may result in various civil law consequences, including the right to reclaim the enrichment obtained through the infringement or the right to compensation.
Although open-source means that the code is publicly available, different open-source licenses have different terms and conditions that can create challenges for general commercial use or the use of AI-based software code. In this sense, code or database components subject to particularly restrictive licenses may be “toxic,” as their terms can impose significant legal and business risks.
Take this simple example: when developing software for commercial purposes, it is essential that any code integrated into the software under development can be used under its given type of license. This code can be used legally if the terms and conditions of the license are fully adhered to when using the software product. The more licensed code is incorporated in the software, the more license terms must be complied with. Further, open-source software may include both a main license and additional component licenses that only cover certain files or lines of code. In particular, when it comes to commercial use, a failure to fully comply with specific licence terms and conditions, even for a single integrated piece of code, can lead to licence-related disputes.
Or this example: when developing AI-based software, developers normally compile a training dataset (database), which may contain data subject to a special license. If an open-source licensed dataset is incorporated into the training dataset and the open-source license imposes restrictions, such as prohibiting AI training or commercial exploitation, these restrictions will impact the final software product. Once the algorithm, recently most commonly a neural network, is created using a training dataset containing toxic data, the neural network itself (considered a database from a legal standpoint) becomes “contaminated” by those data. Since the AI-based software uses the toxic neural network every time it is run, the software itself becomes toxic and generates toxic output. If this toxic neural network (or, in the case of further training, the first toxic one) cannot be identified through version control, then the software product can never be legally pure.
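The version-control point can be illustrated with a minimal sketch. It assumes that each trained model version records its parent version and the datasets added at that training step; the class `ModelVersion`, the function names and the dataset labels are all hypothetical and chosen only for illustration. Walking the lineage from the first version onwards finds the earliest version trained on restricted data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class ModelVersion:
    name: str
    parent: Optional[str]        # version this one was further trained from
    datasets: Tuple[str, ...]    # datasets added at this training step


def lineage(name: str, versions: Dict[str, ModelVersion]) -> List[ModelVersion]:
    """Return the chain of versions from the first ancestor down to `name`."""
    chain = []
    current: Optional[str] = name
    while current is not None:
        version = versions[current]
        chain.append(version)
        current = version.parent
    return list(reversed(chain))


def first_contaminated(name: str,
                       versions: Dict[str, ModelVersion],
                       restricted: Set[str]) -> Optional[str]:
    """Return the earliest version in the lineage trained on restricted data."""
    for version in lineage(name, versions):
        if any(dataset in restricted for dataset in version.datasets):
            return version.name
    return None


# Hypothetical training history: v2 added a dataset whose license
# prohibits AI training, and v3 was further trained from v2.
versions = {
    "v1": ModelVersion("v1", None, ("public_corpus",)),
    "v2": ModelVersion("v2", "v1", ("no_ai_training_set",)),
    "v3": ModelVersion("v3", "v2", ("in_house_data",)),
}
restricted = {"no_ai_training_set"}

print(first_contaminated("v3", versions, restricted))  # v2
print(first_contaminated("v1", versions, restricted))  # None
```

In this sketch, v1 would be the last clean version to fall back on. Without such records, that point cannot be located.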
Infringing software can pose several business risks, including a possible devaluation of the company during investments or buyouts, and lawsuits.
In principle, the later a toxic component in software is discovered, the more expensive it is to solve the problem.
When developing software, it is important to understand how different types of licenses work, i.e., what conditions have to be met when using them. It’s also essential to have appropriate tools and protocols in place for identifying and managing all the licenses used in the software. Today, there are many online tools that can track and identify licenses in software code.
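As a rough illustration of what such tooling does, the sketch below looks for `SPDX-License-Identifier:` tags, a widely used convention for labelling the license of a source file. It is a simplified assumption of how a scanner might work, not a substitute for a dedicated tool; the function names and the default list of file suffixes are hypothetical.

```python
import re
from pathlib import Path
from typing import Dict, Optional, Tuple

# Matches e.g. "# SPDX-License-Identifier: MIT" or
# "/* SPDX-License-Identifier: GPL-2.0-only */".
SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:\s*([^\r\n*]+)")


def find_license(text: str) -> Optional[str]:
    """Return the SPDX expression found in the text, or None if unlabelled."""
    match = SPDX_PATTERN.search(text)
    return match.group(1).strip() if match else None


def scan_tree(root: str,
              suffixes: Tuple[str, ...] = (".py", ".c", ".js")) -> Dict[str, Optional[str]]:
    """Map each source file under `root` to its declared license (or None)."""
    results: Dict[str, Optional[str]] = {}
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.suffix in suffixes:
            # License tags conventionally sit near the top of the file.
            head = path.read_text(encoding="utf-8", errors="ignore")[:2000]
            results[str(path)] = find_license(head)
    return results
```

Files mapped to `None` carry no tag and need manual review. A tag only records what the file claims, so it does not replace checking the actual license text.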
Open-source software compliance is achieved when all the components of the software meet the requirements of the license. Below we summarise how these requirements can be met and validated - before, during and after development.
Before and during development, it is essential that all developers on the team are thoroughly informed and educated about the specifics of each licence, including open-source licences, to understand their precise implications. The best way to do this is to establish a policy for managing open-source licenses. The policy should clearly and transparently summarise: how to identify and label the respective licenses in the software code, which licenses are the most frequently occurring, which ones are preferred, and which to avoid.
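One way to make such a policy checkable is to encode it as data. The sketch below is purely illustrative: the license identifiers are real SPDX identifiers, but their assignment to "preferred" or "avoid" is a hypothetical company choice, not legal advice, and the component names are invented.

```python
from typing import Dict, List

# Hypothetical policy: which licenses the team prefers and which to avoid.
# Anything not listed is reported as "unknown" and needs a decision.
POLICY: Dict[str, str] = {
    "MIT": "preferred",
    "Apache-2.0": "preferred",
    "CC-BY-NC-4.0": "avoid",
}


def check_components(components: Dict[str, str],
                     policy: Dict[str, str] = POLICY) -> Dict[str, List[str]]:
    """Group component names by the policy category of their license."""
    report: Dict[str, List[str]] = {}
    for name, license_id in components.items():
        category = policy.get(license_id, "unknown")
        report.setdefault(category, []).append(name)
    return report


# Hypothetical component inventory (name -> declared license).
components = {
    "http_client": "MIT",
    "image_dataset": "CC-BY-NC-4.0",
    "parser_lib": "LGPL-2.1-only",
}
report = check_components(components)
print(report)
```

Reporting unlisted licenses as "unknown" rather than silently accepting them keeps the decision with whoever maintains the policy.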
When auditing software code or database elements for open-source elements, it’s not enough to identify the license itself. It’s also crucial to grasp how the open-source code or database is utilised within the product, and what the interplay is between this and the license terms. It may be that the licensed component is only needed for product validation and does not appear in the final product, in which case it must be handled differently. In many cases, you need an expert to interpret the software product and license terms.
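The distinction between what reaches the final product and what is only used around it can be recorded alongside the license itself. The following is a minimal sketch under assumed conventions: the `usage` labels and the component names are hypothetical, and which usages count as reaching the product is a judgment that still needs expert review.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Component:
    name: str
    license_id: str
    usage: str  # hypothetical labels: "shipped", "training_data", "validation_only"


# Assumption: these usages end up in, or shape, the final product.
REACHES_PRODUCT = {"shipped", "training_data"}


def split_by_usage(components: List[Component]) -> Tuple[List[Component], List[Component]]:
    """Separate components that reach the product from those that do not."""
    in_product = [c for c in components if c.usage in REACHES_PRODUCT]
    not_in_product = [c for c in components if c.usage not in REACHES_PRODUCT]
    return in_product, not_in_product


audit = [
    Component("core_parser", "Apache-2.0", "shipped"),
    Component("benchmark_set", "CC-BY-NC-4.0", "validation_only"),
]
in_product, not_in_product = split_by_usage(audit)
print([c.name for c in in_product])      # ['core_parser']
print([c.name for c in not_in_product])  # ['benchmark_set']
```

Components in the second group are not exempt from their license terms. They are simply assessed against a different use.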
If, after development, you discover that a software product doesn’t comply with all the licence conditions, there are various ways to rectify the problem. To properly handle the problem, it is also necessary to assess which solution is the safest, most common, fastest and most cost-effective for the company. The obvious option is trying to replace the toxic software code or database, but if the affected component provides the core of the software, this may be impossible or would take too much time and resources. Another option is to obtain an alternative license to use the infringing component in the software product. This option can be problematic, for the costs associated with commercial use may be excessive. In extreme cases, for example, if the toxic component cannot be used under alternative license terms due to limitations on a grant or other funding, this is not a plausible solution, meaning the component must be removed from the product. Remember, if none of these solutions is implemented, the infringing software product poses a significant business risk, as set out above.
Software in bioinformatics and life sciences often links to databases, such as pre-clinical and clinical trial databases, which can only be used under a special license. If the software product is associated with a medical device, the distribution of the device itself may also be affected by compliance with the license.
In the medical field, patent applications are common. When technical know-how related to software or database elements is disclosed in a patent application or associated articles, but ultimately the patent is not granted, this information enters the public domain. Consequently, while it cannot be patented again and becomes available for general use, copyright and database rights may continue to protect it. Which specific rights apply can be determined by examining the related technical publications, or through a dedicated background investigation.
The above shows that establishing a set of rules at the outset of software development is crucial to ensuring the creation of legally compliant software. A failure to do so can lead to serious, potentially irreparable issues, which could even lead to the downfall of the company.