## Joe Thornton and a strange and troubling statistic

San Jose Sharks center Joe Thornton has just finished a Hart Trophy-worthy season. One statistic that has been used quite a bit to justify his Hart candidacy is the number of consecutive San Jose wins in which Thornton has scored at least one point. He scored at least one point in each of the Sharks' last 33 wins.

I did not use this statistic in support of Thornton in our Facing Off discussion over the Hart Trophy because I really was not sure what to make of it. So I decided to dig a bit deeper. My conclusion is that it isn't what it seems. A streak is a combination of elements, some more obvious than others.

Before I go any further, let me define the term 'streak' for this particular context. It is the number of consecutive team wins in which one specific player has at least one point. That definition seems simple enough, but it is more nuanced than it might appear.
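One way to pin the definition down is a short Python sketch. The `Game` record and the `win_point_streak` function are hypothetical names introduced here for illustration, and the sample game log is made up rather than taken from the Sharks' season. The point to notice is that losses are skipped rather than treated as streak-breakers; only a win without a point from the player ends the streak.

```python
from typing import NamedTuple, Sequence


class Game(NamedTuple):
    """One game from the team's log (hypothetical structure)."""
    team_won: bool
    player_points: int


def win_point_streak(games: Sequence[Game]) -> int:
    """Count the most recent consecutive team wins in which the player
    had at least one point. Games are expected in chronological order."""
    streak = 0
    for game in reversed(games):
        if not game.team_won:
            continue  # losses are ignored by the definition
        if game.player_points == 0:
            break  # a win without a point ends the streak
        streak += 1
    return streak


# Made-up log: a pointless win, then three wins with points and two losses.
sample_log = [
    Game(team_won=True, player_points=0),
    Game(team_won=True, player_points=1),
    Game(team_won=False, player_points=0),
    Game(team_won=True, player_points=2),
    Game(team_won=False, player_points=1),
    Game(team_won=True, player_points=1),
]
print(win_point_streak(sample_log))  # 3
```

In the sample, the two losses sit inside the streak without shortening it, which is the nuance the rest of the article turns on.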

I'll forewarn the readers. I've tried to make it easy to read for those without statistical backgrounds. You'll let me know if I succeeded.

### The Challenges of Interpreting a Streak

Statistics that come in the form of streaks can be difficult to interpret. They often represent an atypical distribution coupled with personal excellence. Perhaps the most famous streak in sports history is Joe DiMaggio's 56-game hitting streak in 1941. His streak is a delicious combination of superb play, human factors and statistical flukiness. Ironically, it is not even clear that DiMaggio was the best hitter in baseball during that streak. I've put several interesting details about DiMaggio's streak in my notes following the article.

Embedded in the structure of this statistic is another quirk. It carries a strong 'predictive bias' towards the outcome of a game. Because the designated player must score a point in the game, his team must have scored at least one goal. This leads to a corresponding requirement: the other team must score a minimum of two goals to win. That relationship is important, because in a substantial portion of NHL games at least one team scores one goal or fewer. This is a cousin, statistically speaking, of oft-used statements like "when this team scores first, it has an 82 percent chance of winning …".

Also embedded in the statistic are games that do not count. By definition, not a single loss is part of the streak. Thornton's streak is 33 games long, but it is a subset of the 61 games the Sharks have played since their November 22, 2015 game. That day, the Sharks played their 21st game of the season. They beat Columbus 5-3 while Thornton was held without a point. It marked the last game during the regular season the Sharks would win without a Thornton point.

There were 28 losses in that 61-game span. The Sharks were 0-12-1 when Thornton did not record a point. There were 15 games where Thornton had a point but the team lost: ten regulation losses and five in either overtime or the shootout. Over the 61 games, the Sharks were 33-22-6.

### Predictive Bias

As mentioned before, the predictive bias (the requirement that the other team score at least two goals to stop a streak, which is derived from the definition of the stat) plays a role. In 18 of the 33 wins (55%) during the streak, the Sharks' opponents scored one goal or fewer. In 27 of the 33 wins (82%), they scored two goals or fewer. Because the Sharks are known to have scored at least one goal in every game where Thornton had a point, they had a better chance of winning those games than an arbitrarily chosen game.
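The figures quoted for the 61-game span can be cross-checked with a few lines of Python. This is only an arithmetic check on numbers already stated in the article; the variable and function names are illustrative.

```python
# Figures quoted in the article for the 61-game span.
wins_with_point = 33
wins_without_point = 0
losses_with_point = {"regulation": 10, "overtime_or_shootout": 5}
losses_without_point = {"regulation": 12, "overtime_or_shootout": 1}

total_losses = sum(losses_with_point.values()) + sum(losses_without_point.values())
total_games = wins_with_point + wins_without_point + total_losses


def share_of_streak_wins(count: int, wins: int = wins_with_point) -> int:
    """Share of the streak's wins, rounded to the nearest whole percent."""
    return round(100 * count / wins)


print(total_games, total_losses)   # 61 28
print(share_of_streak_wins(18))    # 55 (opponent held to one goal or fewer)
print(share_of_streak_wins(27))    # 82 (opponent held to two goals or fewer)
```

The split records add up to the 33-22-6 total, and the two percentages match the rounded values in the text.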

To examine this predictive bias, I took an empirical approach: I looked at what other players did. I took three groups of players and applied similar criteria. I started with a group of Thornton and his peers. These are high-scoring players who played all 82 games (or very close to it) for teams that had a good season. The players in this peer group are Patrick Kane, Sidney Crosby, Jamie Benn and Joe Thornton.

In the second group, I selected a teammate of each player in the first group. I selected modest-scoring players who played almost every game. These players had between 21 and 32 points for the season. They are Justin Braun, Johnny Oduya, Matt Cullen and Niklas Hjalmarsson. I looked at the team's record in games where these mid-level players had a point.

The third group consisted of three high-scoring players from weaker teams: Blake Wheeler of Winnipeg, Taylor Hall of Edmonton and Erik Karlsson of Ottawa.

I looked at each team's record when the given player scored a point and when he did not. I added up the team points and divided by the number of games played to get 'team points per game'. If the bias is real, it should show up in these numbers as higher team points per game in games where the specified player has a point. A phrase used to describe statistics with a large inherent bias is "paint the bulls-eye around the arrow": if you shoot the arrow first and then declare the spot where it landed to be the target, you get great results.
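As a rough illustration of the 'team points per game' calculation, here is a minimal Python sketch. It assumes the standard NHL standings scoring (two points for a win, one for an overtime or shootout loss, none for a regulation loss). The function name is hypothetical, and the inputs are only Thornton's splits over the 61-game span quoted earlier, not the full-season figures used in the player comparison.

```python
def team_points_per_game(wins: int, regulation_losses: int, overtime_losses: int) -> float:
    """Standings points earned per game for a W-L-OTL record.

    Assumes 2 points per win and 1 per overtime/shootout loss.
    """
    games = wins + regulation_losses + overtime_losses
    if games == 0:
        raise ValueError("record contains no games")
    points = 2 * wins + overtime_losses
    return points / games


# Sharks over the 61-game span, split by whether Thornton had a point.
with_point = team_points_per_game(33, 10, 5)     # 48 games
without_point = team_points_per_game(0, 12, 1)   # 13 games
overall = team_points_per_game(33, 22, 6)        # all 61 games

print(f"{with_point:.2f} {without_point:.2f} {overall:.2f}")  # 1.48 0.08 1.18
```

The gap between the two splits is exactly the pattern the bias predicts, which is why the same calculation needs to be run for other players before reading anything into it.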