Establishing trust in data is an essential requirement for businesses and entities for whom credible, reliable information is the lifeblood. As enterprises seek to manage data as an asset, it becomes increasingly vital that data sources are trusted and verifiable.
I wrote a few weeks ago about the MIT initiative to establish a framework for trusted data, and the resulting position paper, "Towards an Internet of Trusted Data: A New Framework for Identity and Data Sharing". The authors highlight the critical need for "trustworthy, auditable data provenance" where "systems must automatically track every change that is made to data, so it is auditable and completely trustworthy". One of the key recommendations of the study was to improve the process and quality of data sharing. One suggestion was to move the algorithm to the data. As the authors explain, "The concept here is to perform the algorithm (i.e. query) execution at the location of data (referred to as the data-repository). This implies that raw-data should never leave its repository, and access to it is controlled by the repository/data owner".
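To make the idea of moving the algorithm to the data concrete, here is a minimal Python sketch. It is an illustration only, not the design described in the MIT paper: the `DataRepository` class, its `approve` and `run` methods, and the sample records are all hypothetical names invented for this example. The point it shows is that a caller submits the name of an owner-approved query and receives only the result, never the raw records.

```python
from statistics import mean
from typing import Callable, Dict, List

Query = Callable[[List[dict]], float]


class DataRepository:
    """Hypothetical repository: raw records stay inside, only results leave."""

    def __init__(self, records: List[dict]):
        self._records = list(records)       # raw data held privately
        self._approved: Dict[str, Query] = {}

    def approve(self, name: str, query: Query) -> None:
        """The data owner registers a query that outsiders may run."""
        self._approved[name] = query

    def run(self, name: str) -> float:
        """Execute an approved query where the data lives; return only the result."""
        if name not in self._approved:
            raise PermissionError(f"query {name!r} is not approved by the data owner")
        return self._approved[name](self._records)


# The owner approves a single aggregate query over made-up records.
repo = DataRepository([{"balance": 100}, {"balance": 300}])
repo.approve("avg_balance", lambda rows: mean(r["balance"] for r in rows))
```

A caller would then ask for `repo.run("avg_balance")`, while a request for any query the owner has not approved raises `PermissionError`. A real deployment would also need to vet what an approved query can return, which this sketch does not attempt.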
Tom Dunlap has been at the center of issues of data trust, standardization, and normalization for well over a decade. Dunlap most recently served as a managing director at Goldman Sachs, where he was global head of enterprise data strategy and reference data operations during his seventeen-year tenure with the firm. Among other responsibilities, Dunlap served on the Goldman Sachs operations data digitization council and financial reform steering group. He also serves as a member of the Financial Research Advisory Committee at the US Treasury Department's Office of Financial Research.
From his catbird seat at the heart of the action in financial services, Dunlap developed some informed perspectives on issues of data trust and data reliability. He sees the financial services industry progressing on a path to enriched data quality and reliability. Dunlap notes, "From the top on down, financial services firms are viewing data as a corporate asset, where data is seen as being foundational to achieving not only compulsory needs with regulatory reporting, but also as improving the client experience and enabling commercial initiatives". Dunlap cites as an example the introduction of the Legal Entity Identifier (LEI), which is being employed by financial services firms to manage systemic risk. In addition, financial services firms are tracking data lineage and definitions of data, with the result that data can be traced from production through consumption, to accurately understand the points at which data is being used and how that data is being transformed during its lifecycle. The result, notes Dunlap, is that "data can now be trusted, and verified, from the source, with fewer data quality problems being experienced". The benefit is that higher levels of data quality translate into faster time-to-market for activities including product profiling and pricing, and faster trade executions. The net result is that client experience has improved.
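One small, concrete piece of verifying data "from the source" is checking that an LEI is well formed. Under ISO 17442 an LEI is 20 characters long and ends in two check digits computed with the ISO 7064 MOD 97-10 scheme. The Python sketch below illustrates that check. The function names are my own, and the 18-character prefix used in the example is made up, not a registered identifier. Passing the check shows only that the code is internally consistent, not that the entity exists in the global LEI registry.

```python
import re

_LEI_PATTERN = re.compile(r"[A-Z0-9]{18}[0-9]{2}")
_PREFIX_PATTERN = re.compile(r"[A-Z0-9]{18}")


def _to_number(text: str) -> int:
    # Digits map to themselves; letters map to 10..35 (A=10, ..., Z=35).
    return int("".join(str(int(ch, 36)) for ch in text))


def check_digits(prefix: str) -> str:
    """Compute the two MOD 97-10 check digits for an 18-character prefix."""
    if not _PREFIX_PATTERN.fullmatch(prefix):
        raise ValueError("prefix must be 18 uppercase letters or digits")
    remainder = _to_number(prefix + "00") % 97
    return f"{98 - remainder:02d}"


def is_valid_lei(lei: str) -> bool:
    """Return True if the string has LEI format and passes MOD 97-10."""
    if not _LEI_PATTERN.fullmatch(lei):
        return False
    return _to_number(lei) % 97 == 1


# Build a syntactically valid code from a made-up prefix (not a real entity).
example_prefix = "ABCD00EXAMPLE12345"
example_lei = example_prefix + check_digits(example_prefix)
```

Here `is_valid_lei(example_lei)` returns `True`, while changing a single character, such as the leading `A` to `B`, makes the check fail. That is how a mistyped identifier can be caught before it reaches downstream matching.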
As data has proliferated, so has the variety of new data types under review, including what are known as "unstructured" data sources. Examples would include documents, pictures, texts, and other free-form content. It is in addressing the challenges of managing unstructured data that Artificial Intelligence (AI) and machine learning are enabling breakthroughs. Dunlap cites the example of "derivative contracts", where formats may differ across financial institutions. AI and machine learning capabilities can be used to look within documents to automatically detect key data elements, such as legal entity names and economic terms. Firms are applying AI and machine learning to search for these data points, perform language translations as needed, match Legal Entity Identifiers, and load the resulting output into categories that have been assigned predictive levels of completeness and accuracy, which are usually quite high. Over time, AI and machine learning algorithms become very good at knowing which key data attributes to look for and where to bucket these attributes across workflows, and at delivering recommendations on data enrichment. The result is that data capture and matching processes which had taken a full day to complete have now been reduced to a matter of minutes, even seconds in some instances.
Blockchain offers an alternative model to access data and a different way to imbue trust in data quality. David Shrier has been a trailblazer in the movement to establish trusted data. In addition to serving on the MIT commission which produced the policy paper on trusted data, Shrier is a lecturer and futurist with MIT Media Lab, an advisory member to the Financial Industry Regulatory Authority (FINRA), and an associate fellow at Oxford University, where he is engaged in the delivery of global online Fintech and Blockchain initiatives through Oxford Fintech and Oxford Blockchain Strategy. Shrier observes, "Blockchain is a completely different kind of database, one with the potential for greater transparency into the data for multi-stakeholder environments, and greater cyber-resilience if certain types of Blockchain and other technology are combined". He continues, "The old-school concepts of data lake, data warehouse, and data mart still rely on the concept of having a centralized database which provides for a single point of failure and an attractive attack surface for hackers".
Shrier goes on to note, "We are just beginning to explore the potential of Blockchain to help transform society. Blockchain has given birth to a new model of funding, of distributed capital formation, for businesses called ICOs (initial coin offerings). This is particularly important in Europe, for example, where today 70% of the funding for businesses relies on banks. In the US, most innovation funding is concentrated in Silicon Valley, and ICOs hold the potential to democratize innovation funding if the regulators don't shut it down". He continues, "Consumers can have better digital identity, lower cost financial services, new employment and community models, better control over their assets, and more, through Blockchain systems". Shrier concludes, "It's still very early in the development of applications for consumers. In 1994 internet tech, we had no conception of Airbnb or Uber, and I think we're in a similar stage with Blockchain technology".
The biggest issues surrounding the use of personal data today come from not knowing where this data is stored, who is looking at it, or what is being done with this information. While the new European data protection law, the General Data Protection Regulation (GDPR), begins to address these issues, there is still a need to provide technology infrastructure that will enable trusted data sharing. Blockchain approaches, as described in the MIT Trust Data initiative, provide a path to a trusted data framework which can ensure:
more secure personal information
better access to data through a personal data store
an unchangeable audit trail of who's done what with personal information.
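The last item in that list, an audit trail that cannot be quietly altered, rests on a simple building block: each entry stores a hash of the entry before it, so editing any past record breaks every link that follows. The Python sketch below illustrates that hash-chaining idea under my own assumptions. The function names, field names, and sample events are hypothetical, and this is not the MIT framework's implementation. It also makes tampering detectable rather than impossible, since a real Blockchain adds replication and consensus across many parties on top of this.

```python
import hashlib
import json
from typing import Dict, List

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def _digest(body: Dict[str, str]) -> str:
    # Serialize with sorted keys so the same content always hashes the same way.
    payload = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_event(trail: List[Dict[str, str]], actor: str, action: str) -> None:
    """Add an entry that is linked to the hash of the previous entry."""
    prev = trail[-1]["hash"] if trail else GENESIS
    body = {"actor": actor, "action": action, "prev": prev}
    trail.append({**body, "hash": _digest(body)})


def verify(trail: List[Dict[str, str]]) -> bool:
    """Recompute every hash and link; any edited entry makes this False."""
    prev = GENESIS
    for entry in trail:
        body = {"actor": entry["actor"], "action": entry["action"], "prev": entry["prev"]}
        if entry["prev"] != prev or entry["hash"] != _digest(body):
            return False
        prev = entry["hash"]
    return True


# Made-up events recording who did what with a person's data.
trail: List[Dict[str, str]] = []
append_event(trail, "clinic-a", "read address")
append_event(trail, "insurer-b", "read claims history")
```

With the two sample events in place, `verify(trail)` returns `True`. If someone later rewrites the first entry's `action`, the recomputed hash no longer matches and `verify(trail)` returns `False`.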
Shrier reflects in conclusion, "Society as a whole can benefit from more reliable, distributed data and information. In this era of fake news and state actor interference in elections, creating technology-driven trust offers the potential to restore faith in our shared institutions".