Creating the best algorithms can’t be magic; it needs to be a replicable process for your team. What are the critical needs for this to happen?
According to Gartner, “Companies will be valued not just on their big data, but on the algorithms that turn that data into actions and impact customers.” This emphasizes the need for organizations to embrace algorithms as the foundation of their business logic.
An effective algorithm is not a hit-or-miss proposition, but a creation that depends on a complete, end-to-end and repeatable process: from data gathering and testing to retraining and revision.
Here are four ways to put your organization in an optimal position to benefit from a continuous flow of ever-improving algorithms – and positive business outcomes.
#1: Streamline Data Gathering
Thanks to the masses of data now coming from devices and applications, data scientists have more raw material than ever to work with. This flow of data creates a technical challenge because, for any given project, you need to separate the meaningful data from the noise.
It is not always immediately apparent which data is meaningful, since the design and development process can require many iterations, each of which benefits from a continuing influx of new data.
The solution: don’t throw anything away. Keep all data in its original form, or as close as possible, so that it is easily accessible throughout the design project. This can put a strain on computing resources, but today’s cloud-based analytics platforms are well-equipped to handle such workloads.
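As a rough illustration of keeping data in its original form, the sketch below writes each incoming payload to disk byte-for-byte, alongside a small metadata file. The function name `archive_raw`, the directory layout, and the metadata fields are assumptions made for this example, not part of any particular platform; a cloud object store would play the role of the local directory in practice.

```python
import hashlib
import json
import time
from pathlib import Path


def archive_raw(payload: bytes, source: str, root: Path) -> Path:
    """Store one raw payload unchanged and return the path it lives at.

    Hypothetical helper: the layout (root/source/<sha256>.raw) is an
    assumption chosen for this sketch.
    """
    digest = hashlib.sha256(payload).hexdigest()
    target_dir = root / source
    target_dir.mkdir(parents=True, exist_ok=True)

    data_path = target_dir / f"{digest}.raw"
    if not data_path.exists():
        # Write the bytes exactly as received; nothing is parsed or cleaned.
        data_path.write_bytes(payload)
        meta = {
            "source": source,
            "sha256": digest,
            "bytes": len(payload),
            "received_at": time.time(),
        }
        (target_dir / f"{digest}.meta.json").write_text(json.dumps(meta))
    return data_path
```

Because files are named by content hash, archiving the same payload twice stores it once, and any later cleaning or filtering can happen on copies while the original stays untouched.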
#2: Facilitate Teamwork
At a high level, the algorithm-design workflow is relatively straightforward: define questions; gather the data; explore and validate the data; build a model; perform an analysis, which may require adding more data; generate outputs, whether a model, a visualization, new KPIs or a new process; deploy and evaluate.
Typical team makeup is also straightforward. As the project owner, the data scientist interprets the problem, decides what inputs to use, designs the model, and sequences test and training iterations. The business analyst represents the customer and helps form the questions or KPIs to be considered, evaluate tests, and validate outputs along the way.
The software engineer’s main job is to turn the model into an operational asset that will be reliable, repeatable, and as efficient as possible.
The design workflow can be challenging for two reasons. It’s largely a creative process, for one, and it doesn’t fit neatly into a project-management worksheet, since there can be unexpected delays in collecting the new data needed to inform the model.
For this reason, the data scientist, as the team lead, must work to balance and expedite the workloads of the team members. This can be difficult because the people filling the roles of business analyst and software engineer may have other jobs to do in the organization. Good planning and clear, regular communication are necessary to make the best use of their time.
It’s also valuable for the data scientist to be as self-sufficient as possible so he or she doesn’t have to rely on team members to do jobs outside of their main responsibilities. For instance, if there’s suddenly a need for some historical data locked in an obscure database, the data scientist might need the software engineer’s help to gain access, which slows the process down.
This is one of several places where technology can help. If the data scientist has access to self-service analytics, he or she can run spur-of-the-moment data queries without waiting on anyone else. This saves time and protects the focus that the creative process demands.
A robust data management platform that supports common data formats and can maintain large volumes of “hot” data simplifies the process of collecting historical data. This reduces the need for help from IT and saves time while also bolstering the creative process.
#3: Track/Adjust Performance Over Time
Getting the algorithm into production is an important step, but the job is not yet finished. It’s now up to the team to determine how well the algorithm performs and when retraining will be necessary. This is critical to the process because business conditions are constantly changing. For instance, the business might change the price of some products or add new versions. When that happens, the algorithm will need to be retrained to learn the impact of these changes.
To streamline this process, it is valuable to instrument the algorithm appropriately and to commit the time and resources needed to track its operational performance.
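One way to picture this kind of instrumentation is a small monitor that compares recent prediction error against the error measured when the algorithm was deployed. This is a minimal sketch under stated assumptions: the class name `PerformanceMonitor`, the use of mean absolute error, the window size, and the 25% tolerance are all illustrative choices, not recommendations from any specific tool.

```python
from collections import deque


class PerformanceMonitor:
    """Track rolling prediction error and flag when retraining looks due.

    Hypothetical sketch: the metric (mean absolute error) and the
    thresholds are assumptions chosen for illustration.
    """

    def __init__(self, baseline_error: float, window: int = 100,
                 tolerance: float = 0.25):
        self.baseline_error = baseline_error  # error measured at deployment
        self.tolerance = tolerance            # allowed relative degradation
        self.errors = deque(maxlen=window)    # most recent absolute errors

    def record(self, predicted: float, actual: float) -> None:
        """Log one prediction once the true outcome is known."""
        self.errors.append(abs(predicted - actual))

    def rolling_error(self) -> float:
        """Mean absolute error over the current window (0.0 if empty)."""
        if not self.errors:
            return 0.0
        return sum(self.errors) / len(self.errors)

    def needs_retraining(self) -> bool:
        """True once a full window of errors exceeds the tolerated level."""
        window_full = len(self.errors) == self.errors.maxlen
        limit = self.baseline_error * (1 + self.tolerance)
        return window_full and self.rolling_error() > limit
```

In use, the team would call `record` each time an actual outcome arrives and check `needs_retraining` on a schedule. A price change like the one described above would show up as a rising rolling error well before anyone has to notice it by hand.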
#4: Maintain Commitment
It is also vitally important for the project, and the team, to have the organization’s full, long-term commitment to its data science efforts in general.
In fact, for the success of the organization’s algorithms – and for the business itself – it’s essential that the data science function be tied as closely as possible to top-tier strategic planning. In a well-run organization, data will have first-class citizenship and the data scientist will have regular C-level access. The reason: the work of designing and producing algorithms spans the entire business – from operations to marketing and sales – and therefore should not be buried in the organization.
Top-tier companies know this. In such companies, the data scientist will likely report to the head of data science, who in turn reports to the CEO or COO. That’s the best way to streamline the design, testing, and production of algorithms, and the best way to position the business in the brave new world of digital transformation.