March 12th, 2018
The stats. The uniforms. Sheer wild guesses. Everyone has a strategy for making their picks for the NCAA’s March Madness tournament. But this year there’s a new play in the book: machine learning.
Google Cloud has teamed up with the NCAA to host a competition on Kaggle, the world's largest online community of data scientists, challenging participants to build and train machine learning models to forecast the games’ outcomes. Kaggle has hosted contests for the tournament in the past, but this year’s competition is taking things to the next round with a new data set that contains every play-by-play moment in men’s and women’s NCAA Division I basketball since 2009—more than 40 million plays.
The submission deadline for the competition is this Thursday, prior to the start of the tournament. Submissions will be scored by log loss, a common metric for probabilistic predictions that rewards well-calibrated forecasts and heavily penalizes confident picks that turn out wrong. A total of $100,000 will be awarded across both the men’s and women’s competitions for the best-performing applications of machine learning (which probably outdoes whatever happens in your office pool). And because the competition is based on ML models, not basketball know-how, it’s anyone’s game to win. Talk about a Cinderella story.
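For a feel of how log loss treats a bracket, here is a minimal sketch of the standard binary log loss in Python. It assumes each prediction is the probability that a given team wins and each outcome is 1 (win) or 0 (loss). The function name, the clipping constant and the sample games are illustrative assumptions, not the competition's official scoring code.

```python
import math

def log_loss(outcomes, probabilities, eps=1e-15):
    """Average binary log loss; lower is better.

    outcomes: 1 if the team won, 0 if it lost.
    probabilities: predicted chance of a win for each game.
    eps: hypothetical clipping constant so log(0) never occurs.
    """
    total = 0.0
    for y, p in zip(outcomes, probabilities):
        p = min(max(p, eps), 1 - eps)  # keep p strictly inside (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(outcomes)

# Three made-up games: win, loss, win.
outcomes = [1, 0, 1]

coin_flip = log_loss(outcomes, [0.5, 0.5, 0.5])        # no opinion at all
confident_right = log_loss(outcomes, [0.9, 0.1, 0.9])  # bold and correct
one_bad_miss = log_loss(outcomes, [0.9, 0.1, 0.1])     # bold, wrong on game 3

print(coin_flip, confident_right, one_bad_miss)
```

A pure coin-flip forecast scores about 0.693 (the natural log of 2). Confident, correct picks score lower, which is better, while a single confident miss pushes the score above the coin-flip baseline. That is why hedging on toss-up games tends to pay off under this metric.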
The massive data set being used in this competition represents just one area where Google Cloud is teaming up with the NCAA as their official public cloud provider. The NCAA is also in the process of migrating 80+ years of data across 24 sports to Google Cloud Platform (GCP), using Google tools to power analyses of teams and players. But for the data scientists and machine learning enthusiasts participating in the Kaggle competition, the fun is already underway. May the best model survive, advance and take home the championship!