Report from Data Business Congress – Part 1

By Toby Bond, Nick Aries

02-2020

Held over two days in San Jose, California, the inaugural Data Business Congress provided a fascinating insight into the opportunities and challenges of turning data into a key business asset.

Co-hosted by Intellectual Asset Management and Global Data Review, the Data Business Congress is one of the first conferences to go beyond data privacy issues and take a holistic look at the issues associated with generating value from data assets. With a focus on the role of data in digital transformation and the digital economy, the well-crafted program covered data creation and capture, safeguarding data, competition and privacy law, data transactions and data ethics, along with specific insights into the role of data in the automotive and healthcare industries. The need for a holistic approach to data commercialisation was reflected in the diverse nature of the attendees, who came from a mix of commercial, data science and legal backgrounds. The pervasive nature of the subject matter was also reflected in the wide range of industries represented.

Bird & Bird's Toby Bond and Nick Aries attended both days of the Data Business Congress. In Part 1 of their report, they summarise key themes relating to open vs closed data and the role of regulators in data commercialisation. Part 2 will cover organisational challenges for data businesses and practical advice on how to overcome them.

Closed vs open data

Should data be managed as a closed proprietary asset protected at all costs, or made as open as possible to allow others to use the data to generate value? This core tension is at the heart of data commercialisation and was a recurring theme throughout the conference. On the "closed data" side of the debate, data is described as the new oil, which reflects proprietary, asset-based thinking and encourages amassing as much data as possible while preventing competitors from accessing it. Under this world view, the winners are those who hold more data than their competitors. On the "open data" side of the debate are those who suggest that sharing data as freely as possible is the most effective way to unlock the value of data and encourage data-based innovation.

Motivations for open data

Advocates of open data point to the history of open source software. There, the mind-set shifted from viewing source code as a proprietary asset which can only generate value if it is kept secret to the philosophy underlying the open source movement, which sees the value of code as residing in its implementation in a specific environment and treats openly sharing source code as a highly efficient model for value creation.

A potential AI "opportunity gap" is an important motivation for advocates of open data. Developing AI systems often requires access to large datasets, and limited datasets are often cited as the source of unwanted bias in AI systems. Individually held datasets can also suffer from integrity and freshness issues and, when viewed at a macro level, their collection often involves a significant duplication of effort between organisations. One speaker gave the striking example of multiple autonomous vehicle companies driving test vehicles on the same stretches of road in Silicon Valley when a key challenge to general deployment is the huge diversity of road conditions around the world.

Open data could also go some way to addressing concerns about the market power of businesses which already have exclusive access to large volumes of data, giving them a competitive advantage over rivals without this data. It was, however, recognised that this needs to be approached with caution; opening certain datasets but not others could potentially add to the data available to the large players, further entrenching their market position.

Drivers for closed data

The motivations for open data are clear but, aside from the "asset-based" thinking discussed above, there are other factors which motivate organisations to keep their data closed. Data privacy is obviously a key consideration as it sets limits on the dissemination and use of personal information with reference to the rights of the data subject. Restoring public trust in the use of data was also a recurring theme throughout the conference. The reasons for public distrust in corporate use of data are well known and often stem from uses of personal data which are not expected by the data subject and are felt to disadvantage them individually or society as a whole. Where datasets contain personally identifiable information, pushing for more data sharing without adequate safeguards could cut across existing data protection legislation and do nothing to restore trust in corporate use of data. How would an individual's right (or desire) to be forgotten operate where data about them formed part of an open dataset which had been widely accessed?

A separate (but often related) motivation to keep data closed is data security. Data forms a cornerstone of many automated decision-making processes and, following the old adage of garbage-in-garbage-out, these decisions will only be as good as the data they are based on. Data integrity and its susceptibility to malicious manipulation are already business-critical issues and will increasingly become safety-critical issues as automated decision making becomes embodied in more physical systems. Information security and integrity are intimately related to the overall cybersecurity of systems, applications and communication networks. As supply chains become more complex and cyber threats evolve, the trend has been to keep data locked down in secure systems with restricted access. A drive for open data requires careful consideration of data security issues. For example, a project to share medical imaging data between researchers at different hospitals would require careful consideration of how the data could be shared in a secure manner.

The current lack of legal certainty around rights in data can also create a barrier to sharing data. Software source code has a clear form of legal protection under national copyright laws, which have a reasonable degree of harmonisation as the result of international agreements. This protection provides a clear basis for open source licences and for enforcement action where the terms are breached. In contrast to software, rights in data are often perceived as opaque (even amongst lawyers) and are not harmonised at an international level. While it is quite common to speak of data being "owned", there are very few (if any) jurisdictions with legal systems which treat data as property. Instead, data and databases are subject to a patchwork of legal rights. These include rights in trade secrets/confidential information, contractual rights, rights based on unfair competition laws, copyright in certain data and databases and, within the EU, a sui generis database right. Establishing which rights apply to a particular set of data is often complex and the rights often vary between jurisdictions. Managing these rights contractually can be complex, and open source software agreements are not generally suited to licensing data.

The role of regulation

The two areas of regulation which received most attention were data privacy and competition policy. Given the discussion around trust in data, it was perhaps not surprising that compliance with GDPR received significant attention as a core part of a data monetisation strategy, as it places strict controls on the use and sharing of personal data. Some criticised the cost of compliance with GDPR and others questioned its effectiveness where transactions in personal data can be brokered through jurisdictions with less restrictive regimes. However, there did appear to be general acceptance that data privacy legislation plays an important part in public trust around the use of data. "Be fair, ethical and benefit the customer" and "don’t be creepy" were two particularly succinct proposals for how organisations should behave. Having recently come into force, the CCPA also featured regularly in discussions, and it was suggested that addressing data privacy at a US federal level may soon become necessary to avoid the complexity which would be caused by a proliferation of state-level legislation.

Legal obligations to store data in certain locations (often known as data localisation) also featured in the discussions. Russia and China, for example, both require personal data about their nationals to be stored inside their jurisdiction (although China's legislation also extends to data concerning Chinese critical information infrastructure). Legislation of this nature clearly has the potential to inhibit data sharing and open data.

Competition policy also forms an important dimension of data commercialisation. European regulators were said to have been ahead of the US in their thinking about competition law and data thus far, but data and competition are now receiving significant attention on both sides of the Atlantic. The FTC held 13 hearings over 9 months relating to big data and big tech, and both the FTC and DoJ have appointed big tech-related taskforces. A number of State Attorneys General are also investigating big tech. From a European perspective, the main focus thus far has been on data as part of merger control, with the assumption that compliance with data privacy legislation will prevent data being aggregated in a way which could potentially harm competition. That thinking now appears to be changing, and European regulators are more focused on the possible uses of data post-merger and whether these could have an anticompetitive effect. An example of this new thinking is the Netherlands’ Authority for Consumers and Markets which, last August, approved publisher Sanoma Learning’s acquisition of Iddink Group but made the deal subject to an access to data remedy (licences based on FRAND terms) and an equal access to a digital platform remedy. The UK Competition and Markets Authority's 2016 ruling requiring nine UK banks to allow access to their transaction-level data, and the subsequent Open Banking initiative which led to a growth in Fintech companies offering services based on this data, can also be seen as examples of competition policy fostering data innovation.

Discussion is also ongoing about the extent to which competition law should require those in a dominant position to provide access to their data as a potential remedy in conduct-based cases. The threshold under European law (following the ECJ's 2004 decision in IMS Health) for requiring mandatory access to intangible assets has been very high, requiring a dominant position in relation to a specific input which is preventing others from servicing a demand for a product or service which isn’t currently being met. The mood music coming from the European Commission appears to suggest a change of tack in the near future. Under the US approach to essential facilities, a dominant undertaking has likewise been required to provide access to a facility it controls, and which is seen as necessary for effective competition, only in very limited circumstances. Whether this will change in the future is currently unclear. The DoJ appears to hold the view that forcing access undermines incentives to innovate, while the FTC has suggested that it prefers structural remedies to behavioural ones.

Finally, intellectual property policy in the context of data received less attention during the conference but is also a topic currently on the agenda for a number of regulators. In 2017, the European Commission considered whether a new data producers' right would help facilitate transactions in data, although the proposal was heavily criticised by legal commentators and consumer advocates at the time. Another possible area for development is rights which protect the investment in databases. Currently the EU is the only jurisdiction which has adopted a sui generis database right, and proposals to adopt a similar right in the US were dropped in the late 1990s. The increased focus on the value of data in the context of AI systems may, however, revive interest in IP rights in data and databases in Europe, the US and beyond.

For further reading on the subject of rights in AI training data, see here.