Industry-led contests have long been a reliable way to gather interest in a topic, but can they improve an ML algorithm?
Think you know the price of your home better than Zillow does? There might be a $1 million prize for those who can prove it.
The Zestimate score was launched in 2006 to help people estimate the price of a home, whether they own it and want to see if their asset is gaining in value, or they are shopping for a new home and want to see if their offer is fair. Prior to the Zestimate becoming available, only appraisers, mortgage lenders, and real estate agents had access to computer-driven valuations of homes, and the score has been a fundamental component of the company’s growth over the last decade.
But, according to Zillow, the Zestimate algorithm is in need of an overhaul, and they’re hoping that a team of data scientists and engineers can push their code to a new level of refinement.
That said, the algorithm is already surprisingly expansive and effective: more than 110 million homes across the country are scored with 7.5 million machine learning (ML) models that examine “hundreds of data points on each individual home.” Those data points include county and tax assessor records, plus facts uploaded directly by homeowners. All of these calculations result in a median absolute percent error of 5 percent, much improved from 14 percent when the algorithm was first put into action back in 2006.
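For readers unfamiliar with the metric, here is a minimal sketch of how a median absolute percent error can be computed. The function name and the five home prices are hypothetical, made up purely for illustration; this is not Zillow's code or data.

```python
from statistics import median


def median_absolute_percent_error(estimates, sale_prices):
    """Median of |estimate - sale price| / sale price, as a percentage."""
    if len(estimates) != len(sale_prices) or not estimates:
        raise ValueError("need two non-empty sequences of equal length")
    errors = [
        abs(est - sale) / sale * 100.0
        for est, sale in zip(estimates, sale_prices)
    ]
    return median(errors)


# Hypothetical figures, invented for illustration only.
estimates = [310_000, 198_000, 455_000, 520_000, 150_000]
sale_prices = [300_000, 200_000, 500_000, 500_000, 160_000]

mdape = median_absolute_percent_error(estimates, sale_prices)
print(f"{mdape:.1f}%")  # prints 4.0%
```

Because the figure is a median, half of the estimates land closer to the sale price than the reported error and half land further away, so a 5 percent score still allows for some large misses on individual homes.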
Stan Humphries, creator of the Zestimate home valuation and Zillow Group chief analytics officer, said in a statement, “While that error rate is incredibly low, we know the next round of innovation will come from imaginative solutions involving everything from deep learning to hyperlocal data sets—the type of work perfect for crowdsourcing within a competitive environment.”
When the contest is completed, and the new algorithm deployed, it will be the biggest change in the algorithm since 2011. And perhaps the change can’t come fast enough, considering that Zillow was just sued by a disgruntled homeowner claiming that the Zestimate repeatedly undervalued her home, making it impossible to sell.
This contest, managed by Kaggle, represents the first time that anyone outside Zillow will be able to see how the Zestimate algorithm works. Between now and October 2017, individuals or teams can download a competition dataset for analysis, and work to develop a model that improves the Zestimate’s residual error. The most successful 100 teams will be invited for a final round, where their predicted values will be compared against actual home sales between August and October 2018.
The winner will be the team whose model most successfully beats both Zillow’s own algorithm and the models submitted by the other finalists.
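To make the idea of improving a residual error concrete, here is a rough sketch. The article does not say how the residual is defined or what the competition dataset looks like, so everything below is an assumption: the residual is taken to be the log of the sale price minus the log of the estimate, the price pairs are invented, and a single constant correction stands in for a real model.

```python
import math
from statistics import mean, median


def log_residual(estimate, sale_price):
    """Assumed residual: log(sale price) - log(estimate)."""
    return math.log(sale_price) - math.log(estimate)


def median_abs_log_error(pairs, predict):
    """Median absolute log residual of predict(estimate) vs. sale price."""
    return median(abs(log_residual(predict(est), sale)) for est, sale in pairs)


# Hypothetical (estimate, sale price) pairs, invented for illustration.
train = [(300_000, 315_000), (200_000, 208_000), (500_000, 530_000)]
holdout = [(400_000, 420_000), (250_000, 262_000)]

# Stand-in "model": learn the average residual on the training pairs.
bias = mean(log_residual(est, sale) for est, sale in train)


def corrected(estimate):
    """Shift the original estimate by the learned average residual."""
    return estimate * math.exp(bias)


baseline = median_abs_log_error(holdout, lambda est: est)
improved = median_abs_log_error(holdout, corrected)
print(f"baseline {baseline:.4f}, corrected {improved:.4f}")
```

In this toy data the original estimates run consistently low, so even a constant shift shrinks the error on the held-out homes. A real entry would presumably predict the residual from features of each home rather than apply one number to all of them.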
Can contests really generate ML change?
At first glance, the Zillow Prize might seem like an odd method of enabling innovation among the company’s data science and algorithmic technology. A more obvious solution would be to simply hire more data scientists and engineers who can throw more ideas at the problem.
One reason might be that public, industry-led contests have long been a reliable way to gather interest in a topic that might have otherwise gone unnoticed, or convince people to start searching for better ways of performing old tasks. There is also something to be said for having a set of fresh eyes on a given problem.
The XPRIZE organization is famous for its wide-ranging public competitions for everything from suborbital spaceflight to oil cleanups and much more. In fact, much earlier, both early aviators and adventurous sailors were pushed to innovate because of big cash prizes.
More recently, prizes have been used to create better AI-driven cancer screening solutions, which have resulted in pathology screenings that combine human and artificial intelligence to achieve a 99.5 percent success rate, far better than the 96 percent that humans could achieve alone.
These prizes end up generating a certain degree of buzz that simply isn’t possible with internal developments, and generally speaking, the results tend to be made at least partially public, which results in even further development.
It’s not dissimilar to the difference between closed and open source technology, the latter of which is used widely in AI and ML implementations. The leading ML platforms, such as TensorFlow, Caffe, the Microsoft Cognitive Toolkit (CNTK), and Apache’s PredictionIO, are all freely available for researchers and developers who want to leverage their capabilities.
One can only hope that, in the spirit of opening this effort up to the public, Zillow will follow through by releasing some new information that others can download, learn from, and build on. There is no word from Zillow on this at this time, so we’ll have to wait until sometime in 2019 to hear more.