My thoughts on the Pittsburgh Hockey Analytics Workshop

Last Saturday, I, along with 200 others, showed up at Carnegie Mellon University to watch six hours of insightful presentations at the Pittsburgh Hockey Analytics Workshop. It was a historic moment for the hockey analytics community, as it was the first event of its kind in the United States. The event was hosted by Andrew Thomas and Sam Ventura, the makers of the hockey stats website war-on-ice.com. Thomas is a faculty member at the university, while Ventura is about to finish his PhD in statistics.

It was not too long ago that hockey media and fans alike were losing their minds when the first wave of hockey analytics writers and website builders got hired by NHL teams. Among them were Eric Tulsky, Corey Sznajder, Cam Charron, Tim Barnes (aka Vic Ferrari) and Tyler Dellow. Some wondered whether teams snatching up such visionaries would leave the public side with a talent drain it would struggle to recover from. The biggest blow was losing Darryl Metcalf, the creator of extraskater.com. Metcalf’s website was the first to generate live game logs of team and individual player shot attempts and to present that data under different score situations (score close, trailing, leading by more than two goals, etc.). Along with that, Metcalf generated his own player usage charts like the ones Rob Vollman and Robb Tufts first created on hockeyabstract.com, with one difference: Quality of Competition was measured by opponents’ Time On Ice instead of relative opposing puck possession. Metcalf also created leaderboards for any analytical stat, and you could narrow the list of qualified players by games played and by team via a drop-down list. Additional data from the 2014 Winter Olympics and the Canadian junior leagues (the Western, Ontario and Quebec Major Junior Hockey Leagues) was also displayed. Sites like behindthenet.ca and stats.hockeyanalysis.com have been great for showing season-long player and team data, but never has the breakdown of the numbers been as simple and accessible as it was on extraskater.com.

Some have vowed to create a replacement for the website, and this NHL season has brought us many new ones, with another that Russian Machine Never Breaks’ Peter Hassett hopes to finish in the near future. Along with war-on-ice.com, sites like hockeystats.ca, progressivehockey.com, naturalstattrick.com and puckalytics.com have arrived and brought a new wave of data collection to the hockey analytics community. What we learned is that all it takes is some top-tier programming and web-building skills to pull NHL.com’s RTSS (Real Time Scoring Statistics) data reports and turn them into beautiful tables like the ones Metcalf and others make. Even if the new faces of the analytics community get snatched up by NHL teams, some have vowed to keep their websites public and not have their life’s work shut down like Metcalf was forced to do.

Back at the workshop, there were plenty of fascinating presentations. Jen Lute Costella’s piece on shot suppression was about as well-articulated as you can get and showed why she is among the best in the business and why she was recently hired to write for Yahoo Sports’ Puck Daddy blog. Matt Cane’s presentation on how well players shoot from their off-hand side versus their on-hand side, along with his introduction of side bias and how well defenders did at limiting shots from their side, was truly innovative and thought-provoking.

Ventura introduced zone transition time to see whether teams and players that get out of their defensive zone the fastest and maintain constant pressure in the offensive zone go on to have future success. Many argue that the measure can be flawed for a few reasons beyond small sample size. There is a chance of poor interpretation of the RTSS play-by-play data, because it only records events that occur in the game and not the moments when teams or players clear and enter a zone (here is an example of one). Also, teams and players that enter the offensive zone quickly to attempt a shot without a cycle or long forecheck could come out poorly in the data because they don’t stay in the zone for a long period of time.

This is not so much to criticize a presentation such as Ventura’s (he is significantly smarter than me, and we don’t need to do any measurements on that) but more to emphasize that even if we as hockey statisticians like to think we have figured everything out, we are still experimenting, developing, learning and growing. We are still looking for as many measurements as we can to learn more about what teams and players do, how well they perform, and why calculating any given number matters.

Joe Walsh had a unique piece about measuring which fighters are the most likely to win based on results from hockeyfights.com polls, and Nilesh Shah presented his calculations for predicting which teams will advance to each round of the playoffs based on a fixed set of categories from the entire regular season. Would you call the findings in their presentations perfectly accurate? We’re not a hundred percent sure, but the fact that they made strong cases with solid data to support their work should not go unnoticed when thinking about the standards for presentations at future hockey analytics conferences.

In my opinion, the two most fascinating topics were YinzCam’s presentation on data visualization and James Santelli’s piece on how baseball analytics seeped into Major League front offices and the public consciousness. Both presentations reminded us that good communication between those who hold the data and know hockey analytics and those who don’t is critical if the message is to be spread correctly. Santelli provided the example of the Pittsburgh Pirates hiring a proverbial go-between to communicate the front office’s analytical findings to the coaching staff, which resulted in the team implementing more defensive shifts than anyone else in Major League Baseball. As for YinzCam, they proposed implementing better graphs and charts that can display a player’s puck possession and deployment better than the usage charts we use today.

I also want to talk about Stephen Burtch. Burtch is a math teacher by day and a Rogers Sportsnet writer by night who presented his findings on delta-Corsi, a stat whose purpose is to eliminate the flaws of Corsi relative (blocked and unblocked shot attempts relative to a player’s team) by subtracting expected Corsi from observed Corsi. His belief is that over time, delta-Corsi for most players should stay the same, because expected Corsi is based on coaches’ deployment, and deployment is usually expected to change in line with a player’s observed output. When the conference was over and a good portion of the participants went to Hough’s to watch the Pittsburgh Penguins’ 6-2 win over the Buffalo Sabres, Burtch owned the room by talking about many subjects, including the trials and tribulations of getting his analytical points across to the higher-ups at Sportsnet. Remember, this is the company that employs people like Glenn Healy and gives him a very strong voice in the hockey world. I left that night having learned so much from him, and I could not be more thankful to have met someone who could teach me and many others more about this beautiful sport and the life of a hockey stats writer.
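To make the delta-Corsi arithmetic concrete, here is a minimal sketch of the subtraction as I understood it. This is not Burtch’s actual model: the expected value would come from a model of deployment that I am not reproducing, so here it is just a number passed in. The names `OnIceTotals`, `corsi_differential` and `delta_corsi` are my own, and measuring Corsi as a raw differential (attempts for minus attempts against) is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass
class OnIceTotals:
    """Shot attempts (blocked and unblocked) while a player is on the ice."""
    corsi_for: int
    corsi_against: int


def corsi_differential(totals: OnIceTotals) -> int:
    """Observed Corsi as a simple differential: attempts for minus against."""
    return totals.corsi_for - totals.corsi_against


def delta_corsi(observed: OnIceTotals, expected_differential: float) -> float:
    """Observed Corsi minus expected Corsi.

    expected_differential is a stand-in for whatever a deployment-based
    model would predict for this player; it is an input here, not computed.
    """
    return corsi_differential(observed) - expected_differential


# Hypothetical player: 550 attempts for, 450 against, expected to be +40.
player = OnIceTotals(corsi_for=550, corsi_against=450)
print(delta_corsi(player, expected_differential=40.0))
```

A positive result would mean the player is out-performing what his usage suggests, and a negative one the opposite.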

In short, I learned two things last weekend. The first is that any possession-based stat (Corsi, Fenwick, shots, etc.) always has to treat for and against as two separate numbers, the same way on-ice even-strength shooting percentage and on-ice even-strength save percentage should be treated in PDO. The second is that anyone who thinks hockey analytics has hit a plateau is not seeing what is still going on. Some have made valid points, but personally, I don’t think we have measured everything yet. While baseball has WAR and basketball has PER, hockey still does not have its own version of a single stat that measures how valuable a player is to an NHL team. The RTSS data from NHL.com can only give us some information, not enough for us to measure everything. We are still wondering if SportVU will ever be implemented in NHL buildings and if the glowing puck is really coming back to the game, this time to benefit hockey statisticians the way PITCHf/x does for baseball. Once we grab all this new-fangled information, another proverbial Bill James will appear and a new way of thinking about hockey will be presented.
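Here is a small sketch of that first lesson, with made-up numbers. PDO is the sum of on-ice shooting percentage and on-ice save percentage, and the same for-and-against split can be kept for shot attempts instead of collapsing them into one differential. The function names and the sample totals are my own, for illustration only.

```python
def pct(part: float, whole: float) -> float:
    """Percentage of part out of whole, or 0.0 when whole is zero."""
    return 100.0 * part / whole if whole else 0.0


def pdo(goals_for: int, shots_for: int,
        goals_against: int, shots_against: int) -> float:
    """On-ice shooting percentage plus on-ice save percentage."""
    shooting_pct = pct(goals_for, shots_for)
    save_pct = 100.0 - pct(goals_against, shots_against)
    return shooting_pct + save_pct


def per_60(events: int, minutes: float) -> float:
    """Rate of events per 60 minutes of ice time."""
    return 60.0 * events / minutes if minutes else 0.0


# Two hypothetical players with the same +50 Corsi differential over
# 1000 minutes, but very different for and against rates.
players = {
    "high-event": (1050, 1000),  # (attempts for, attempts against)
    "low-event": (750, 700),
}
for name, (cf, ca) in players.items():
    print(name, cf - ca, per_60(cf, 1000.0), per_60(ca, 1000.0))

print(pdo(goals_for=10, shots_for=100, goals_against=8, shots_against=100))
```

Both players show the same differential, yet one gives up far more attempts against than the other, which is exactly the detail a single combined number hides.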

Lastly, this will certainly not be the last hockey analytics conference. Even though I could have watched the entire conference at home via a live YouTube video feed, or watched it in sections later on and grabbed all the slides from war-on-ice.com’s blog post, I have no regrets about making the four-hour trek to Pittsburgh and being able to talk to so many new people who want to learn and connect as much as I do. A very nice woman named Carrie wanted to start a monthly hockey analytics get-together, and Robb Tufts has proposed setting up an analytics workshop in Washington, D.C. Things are still in the works, but plans are moving ahead, and with enough effort, as we have seen before, progress will be made and the hockey analytics community will once again prove that it is not dead.
