Model governance and explainability are essential in building ethical and auditable blockchain and AI technology.
Even though Bitcoin is the most famous instantiation of blockchain technology, we are just beginning to discover the true potential of this system, which records transactions of any kind and maintains the record across a peer-to-peer network.
In 2018, I turned my thoughts on blockchain inward, producing a patent application (16/128,359 USA) around using blockchain to ensure that all of the decisions made about a machine learning (ML) model are being recorded and are auditable.
These decisions include the model’s variables, the model design, the training and test data used, and the selection of features. The record also preserves the ability to view the model’s raw latent features, and it writes to the blockchain every scientist who built portions of the variable sets or participated in model weight creation and model testing.
As enabled by blockchain technology, the sum total of these recorded decisions provides the visibility required to govern models effectively within the organization and to satisfy regulators.
Model governance and explainability are essential in building ethical AI technology which is auditable; as a data scientist and member of the global analytics community, creating ethical analytic technology is very important to me, particularly in my role of serving financial and enterprise customers.
Before blockchain: Analytic models adrift
Before blockchain became a buzzword, I began implementing a similar approach in my data science organization. In 2010 I instituted a development process centered on an analytic tracking document (ATD). This approach detailed model design, variable sets, scientists assigned, training and test data, and success criteria, breaking down the entire development process into three or more agile sprints.
I recognized that a structured approach with ATDs was required because I’d seen far too many negative outcomes from what had become the norm across much of the banking industry: a lack of validation and accountability. A decade ago, the typical lifespan of an analytic model looked like this:
- A data scientist builds a model, self-selecting the variables it contains. This leads to scientists creating redundant variables, not using validated variable designs, and creating new errors in model code. In the worst cases, a data scientist might make decisions with variables that could introduce bias, model sensitivity, or target leaks.
- When the same data scientist leaves the organization, his or her directories are typically deleted. Often there are a number of different directories, and it is unclear which of them were responsible for the final model. The company doesn’t have the source code for the model, or might have just pieces of it. No one definitively understands how the model was built, the data on which it was built, or the assumptions that factored into the model build.
- Ultimately, the bank can be put in a high-risk situation by assuming the model was built properly and will behave well, without really knowing either. The bank is unable to validate the model or understand the conditions under which it must be very careful in using it. These realities result in unnecessary risk, or in a large number of models being discarded and rebuilt, often repeating the journey above.
A blockchain to codify accountability
My patent describes how to codify analytic and machine learning model development using blockchain technology to associate a chain of entities, work tasks and requirements with a model, including testing and validation checks. It replicates the approach I use to build models in my organization—the ATD remains essentially a contract between my scientists, managers and me that describes:
- What the model is
- The model’s objectives
- How we’ll build that model
- Areas that the model must improve upon, for example, a 30% improvement in card-not-present (CNP) fraud at a transaction level
- The degrees of freedom the scientists have to solve the problem, and those which they don’t
- Re-use of trusted and validated variable and model code snippets
- Training and test data requirements
- Specific model testing and model validation checklists
- Specific analytic scientists assigned to build the variables and to build and train the models, and those who will validate code, confirm results, and test the model variables and model output
- Specific success criteria for the model and specific customer segments
- Specific analytic sprints, tasks and scientists assigned, and formal sprint reviews and approvals of requirements met
As you can see, the analytic tracking document informs a set of requirements that is very specific. The team includes me as owner of the agile model development process, and additionally consists of the direct modeling manager and the group of data scientists assigned to the project. Everyone on the team signs the ATD as a contract once we’ve all negotiated our roles, responsibilities, timelines, and the requirements of the build. The ATD becomes the document by which we define the entire agile model development process. It then gets broken into a set of requirements, roles, and tasks which are put on the blockchain to be formally assigned, worked, validated, and completed.
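To make the idea of an ATD broken into assignable work more concrete, here is a minimal sketch using plain Python dataclasses. It is an illustration only and is not taken from the patent: the class names, fields, task descriptions, and the rule that a builder cannot validate their own work are all assumptions. The four statuses mirror the "assigned, worked, validated, and completed" lifecycle described above.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class Status(Enum):
    # Lifecycle of a task once it has been put on the ledger.
    ASSIGNED = "assigned"
    WORKED = "worked"
    VALIDATED = "validated"
    COMPLETED = "completed"


@dataclass
class Task:
    requirement: str   # the ATD requirement this task satisfies
    sprint: int        # the agile sprint the task belongs to
    builder: str       # scientist assigned to do the work
    validator: str     # scientist assigned to check the work
    status: Status = Status.ASSIGNED


@dataclass
class AnalyticTrackingDocument:
    model_name: str
    objective: str
    success_criteria: List[str]
    signatories: List[str]          # everyone who signed the ATD
    tasks: List[Task] = field(default_factory=list)

    def add_task(self, task: Task) -> None:
        # Assumed rule: only people who signed the ATD can be assigned.
        for person in (task.builder, task.validator):
            if person not in self.signatories:
                raise ValueError(f"{person} has not signed the ATD")
        # Assumed rule: nobody validates their own work.
        if task.builder == task.validator:
            raise ValueError("builder and validator must be different people")
        self.tasks.append(task)

    def open_tasks(self, sprint: int) -> List[Task]:
        # Tasks in the given sprint that are not yet completed.
        return [t for t in self.tasks
                if t.sprint == sprint and t.status is not Status.COMPLETED]


# Hypothetical example values; only the 30% CNP goal comes from the text.
atd = AnalyticTrackingDocument(
    model_name="cnp_fraud_model",
    objective="Improve card-not-present fraud results at a transaction level",
    success_criteria=["30% improvement in CNP fraud"],
    signatories=["owner", "modeling_manager", "scientist_a", "scientist_b"],
)
atd.add_task(Task("Build transaction velocity variables", 1,
                  "scientist_a", "scientist_b"))
atd.add_task(Task("Validate training and test data", 1,
                  "scientist_b", "scientist_a"))
```

Keeping the builder and the validator as separate fields on every task is what later lets an auditor ask who did the work and who checked it.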
With individuals tracked against each of the requirements, the team then assesses the existing collateral, which typically consists of previously validated variable code and models. Some variables have been approved in the past, others will be adjusted, and still others will be new. The blockchain then records each time a variable is used in this model: whether its code was adopted from code stores or written new, what changes were made, who made them, which tests were done, which modeling manager approved it, and my sign-off.
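The kind of record described here can be sketched as a simple hash-chained, append-only log. This is a single-process, in-memory illustration, not the patented system and not a real distributed blockchain (there is no peer-to-peer network or consensus). The `ModelLedger` class, the field names, and the example variable `txn_velocity_1h` are hypothetical.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import List

GENESIS_HASH = "0" * 64  # placeholder "previous hash" for the first block


def _digest(payload: dict, prev_hash: str) -> str:
    # Hash the record together with the previous block's hash, so that
    # changing any earlier record invalidates every later block.
    body = json.dumps(payload, sort_keys=True) + prev_hash
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


@dataclass
class Block:
    payload: dict
    prev_hash: str
    hash: str


@dataclass
class ModelLedger:
    blocks: List[Block] = field(default_factory=list)

    def record(self, **payload) -> Block:
        # Append one decision (who did what to which variable).
        prev_hash = self.blocks[-1].hash if self.blocks else GENESIS_HASH
        block = Block(payload, prev_hash, _digest(payload, prev_hash))
        self.blocks.append(block)
        return block

    def is_intact(self) -> bool:
        # Recompute every hash and check that the chain links up.
        prev_hash = GENESIS_HASH
        for block in self.blocks:
            if block.prev_hash != prev_hash:
                return False
            if block.hash != _digest(block.payload, prev_hash):
                return False
            prev_hash = block.hash
        return True


# Hypothetical trail for a single variable in one model.
ledger = ModelLedger()
ledger.record(variable="txn_velocity_1h", action="adopted_from_code_store",
              by="scientist_a")
ledger.record(variable="txn_velocity_1h", action="tested",
              by="scientist_b", tests=["range_check", "leak_check"])
ledger.record(variable="txn_velocity_1h", action="approved",
              by="modeling_manager")
ledger.record(variable="txn_velocity_1h", action="signed_off", by="owner")
```

Because each block's hash covers the previous block's hash, quietly editing an earlier entry (say, changing who ran the tests) causes `is_intact()` to return `False`.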
Importantly, the blockchain instantiates a trail of decision-making. It shows if a variable is acceptable, if it introduces bias into the model, and if the variable is utilized properly. We can see at a very granular level:
- The pieces of the model
- The way the model functions
- The way it responds to expected data, rejects bad data or responds to a simulated changing environment.
All of these items are codified in the context of who worked on the model and who approved each action. At the end of the project we can see, for example, that each of the variables contained in this critical model has been reviewed, put on the blockchain and approved.
This approach provides a high level of confidence that no one has added a variable to the model that performs poorly or introduces some form of bias into the model. It ensures that no one used an incorrect field in their data specification or changed validated variables without permission and validation. Without the critical review process afforded by the ATD, and now the blockchain to keep it auditable, my data science organization could inadvertently introduce a model with errors, particularly as these models and associated algorithms become more and more complex.
Models with more explainability and less bias
In sum, overlaying the model development process on the blockchain gives the analytic model its own entity, life, structure and description. Model development becomes a structured process, at the end of which detailed documentation can be produced to ensure that all elements have gone through the proper review. These records can also be revisited at any time in the future, and they are essential assets for model governance.
This utilization of blockchain to orchestrate the ATD and agile model development process can be used by parties outside the development organization, such as the bank’s governance team and its regulatory units. If a regulatory agency wanted to audit the way a critical model was built, its review could produce a statement such as, “all variables have been reviewed and were approved by …” Likewise, if a revision or change in the regulatory environment made us want to understand all production uses of variable #51817 from our large asset inventory of variables, we could easily query the blockchain to find every usage in production models.
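The two audit questions above can be sketched as simple filters over records read back from the ledger. The record layout, the model names, and the second variable ID are invented for illustration; only variable #51817 comes from the text, and a real deployment would query its own blockchain store rather than an in-memory list.

```python
from typing import List

# Hypothetical records, as they might look once read back from the ledger.
RECORDS: List[dict] = [
    {"model": "cnp_fraud_v3", "variable_id": 51817, "action": "used",
     "by": "scientist_a", "stage": "production"},
    {"model": "cnp_fraud_v3", "variable_id": 51817, "action": "approved",
     "by": "modeling_manager", "stage": "production"},
    {"model": "cnp_fraud_v3", "variable_id": 40012, "action": "used",
     "by": "scientist_b", "stage": "production"},
    {"model": "cnp_fraud_v3", "variable_id": 40012, "action": "approved",
     "by": "modeling_manager", "stage": "production"},
    {"model": "credit_risk_v1", "variable_id": 51817, "action": "used",
     "by": "scientist_a", "stage": "development"},
    {"model": "atm_fraud_v2", "variable_id": 51817, "action": "used",
     "by": "scientist_b", "stage": "production"},
]


def production_models_using(records: List[dict], variable_id: int) -> List[str]:
    # Every production model that recorded a use of the given variable.
    return sorted({
        r["model"] for r in records
        if r["variable_id"] == variable_id
        and r["action"] == "used"
        and r["stage"] == "production"
    })


def unapproved_variables(records: List[dict], model: str) -> List[int]:
    # Variables a model uses for which no approval was recorded.
    used = {r["variable_id"] for r in records
            if r["model"] == model and r["action"] == "used"}
    approved = {r["variable_id"] for r in records
                if r["model"] == model and r["action"] == "approved"}
    return sorted(used - approved)


print(production_models_using(RECORDS, 51817))
print(unapproved_variables(RECORDS, "cnp_fraud_v3"))
print(unapproved_variables(RECORDS, "atm_fraud_v2"))
```

An empty result from `unapproved_variables` is the machine-checkable counterpart of the "all variables have been reviewed and were approved" statement, while a non-empty one points the reviewer at exactly which variables still need attention.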
In this way, analytic model development becomes highly explainable and decisions auditable, a critical factor in delivering Ethical and Explainable AI technology. Explainability is essential in eradicating bias from the analytic models used to make decisions that affect individuals’ financial lives.