Wednesday, April 18, 2007

slow starts and run differential

the internet is one of the greatest tools for community research yet invented, dear reader, and when it works it can work very well.

as part of a dialogue at another cubs blog, this writer revisited some earlier commentary made here.

and then there is the situation of wins and losses itself. as was noted by contributor john dooley in the comments, no cub club has started 4-7 or worse and reached the playoffs -- and since 1972, a sample of eight teams, no cub club has started 4-7 or worse and broken .500. as jd looked further, now at the entirety of the major leagues since the advent of postseason play, only 27 of 326 playoff teams have started 4-7 or worse -- and since 1995, a period in which 25% of clubs get to the postseason, only 9 of 96 playoff clubs (9%) started 4-7 or worse.

subsequently i added:

i looked back at both leagues on april 15 for the last seven seasons (2000-2006). these teams were playing .400 baseball or worse: ...

58 teams, 9 playoff appearances (including two world champions).

there are 210 club-seasons and 56 playoff-seasons in this seven year sample.

that means those who were over .400 as of april 15 (152 teams) took the other 47 playoff slots.

to reiterate:

under .400 - 9/58 - 15.5%
over .400 - 47/152 - 30.9%

i’d say that the cubs’ chances of appearing in the postseason have been cut in about half. a lot of things *can* happen, of course, but that’s the probability.

also, 38 of the 58 (66%) did not break .500 — meaning that the cubs are in a pool that sees two out of three fail to break 81 wins.
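the arithmetic behind that comparison can be rechecked with a short sketch. the counts are the ones quoted above; the variable and function names are this writer's own invention, for illustration only.

```python
# Counts quoted in the post for 2000-2006, standings as of april 15.
SLOW_CLUBS, SLOW_PLAYOFF = 58, 9      # clubs at .400 or worse
ALL_CLUBS, ALL_PLAYOFF = 210, 56      # every club-season in the sample


def playoff_rate(playoff_clubs, clubs):
    """Fraction of a group of club-seasons that reached the postseason."""
    return playoff_clubs / clubs


# Everyone who was not a slow starter is found by subtraction.
other_clubs = ALL_CLUBS - SLOW_CLUBS
other_playoff = ALL_PLAYOFF - SLOW_PLAYOFF

slow_rate = playoff_rate(SLOW_PLAYOFF, SLOW_CLUBS)
other_rate = playoff_rate(other_playoff, other_clubs)
relative = slow_rate / other_rate  # how the slow starters compare to the rest

print(f"slow starters: {slow_rate:.1%}")
print(f"everyone else: {other_rate:.1%}")
print(f"relative chance: {relative:.2f}")
```

the relative figure comes out very near one half, which is where "cut in about half" comes from.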

it was rightly noted there that the cubs have thus far this season scored more runs than they have allowed, and that this rarity (for a 4-7 ballclub, now 5-8) may offer some reprieve from that harsh judgment of probability.

the results of further examination find some difference, but unfortunately not very much.

just four of the 58 had positive run differentials — but only one of those four broke .500, none making the playoffs. four isn’t anything like a significant sample, however, so this writer also looked back to 1973 for sub-.400 clubs through the first eleven games with at least as many runs scored as allowed, arriving at the following list.

year   team   wins   losses   rs   ra
1999   ari    100    62       50   48
1988   mil    87     75       46   41
1984   mil    67     94       53   49
1982   mil    95     67       53   52
24 clubs      80.1   81.7

as we can see, of the 24 clubs that qualify for the sample since 1973, only nine (highlighted in yellow) finished over .500 -- or 37.5%. the average win total for the sample was 80.1.

in the earlier 58-team sample, 20 clubs broke .500 (34.5%), with the average win total at just 74.8.

clearly, in taking the narrower qualifier that applies to the 2007 cubs, we have improved our view of their chances -- but probably not by a large enough margin to see them as a playoff club. and that should make some sense, for at this early stage runs scored and allowed is a measure of future performance with more noise than signal.

this plot is of the 2006 cubs' 13-game cumulative run differential -- that is, it plots the difference between runs scored and runs allowed over every 13-game stretch of last year (13 being how many games they've played in 2007 thus far). the 2006 cubs were a bad team that finished the year with a 118-run deficit, and their average 13-game differential was a crushing (-12). yet what they did in any given 13-game stretch varied between (-54) and +19, with a standard deviation of 16 runs. that is to say, the 2006 cubs in any 13-game stretch had a run differential of (-12) +/- 16, which means they were between +4 and (-28) about 70% of the time. that kind of very wide spread is characteristic of any 13-game measure of run differential, for any club of any record or quality.
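for anyone wishing to reproduce that kind of measure, here is a minimal sketch of the rolling calculation. it assumes a list of per-game run margins (runs scored minus runs allowed) in schedule order; the function names are hypothetical, and the sample margins are made up rather than taken from the 2006 season.

```python
from statistics import mean, pstdev


def rolling_differentials(margins, window=13):
    """Run differential summed over every consecutive `window`-game stretch."""
    if window <= 0 or window > len(margins):
        return []
    total = sum(margins[:window])
    sums = [total]
    # Slide the window one game at a time: add the new game, drop the oldest.
    for i in range(window, len(margins)):
        total += margins[i] - margins[i - window]
        sums.append(total)
    return sums


def one_sigma_band(values):
    """Mean minus and plus one (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return m - s, m + s


# Made-up margins for illustration only, not real game results.
sample_margins = [1, -2, 3, 0, -1]
stretches = rolling_differentials(sample_margins, window=3)
low, high = one_sigma_band(stretches)
print(stretches, low, high)
```

a 162-game season yields 150 overlapping 13-game stretches, and the band returned by `one_sigma_band` is the "average +/- one standard deviation" range described above.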

so, when we look at today's +8 for the 2007 cubs, how should we interpret it? it implies a .567 pythagorean winning percentage, but does it lie toward the top of the variance? the bottom? the middle? one can suppose many things, but there is really no way of knowing at this point -- there is not enough data, and so we may safely presume to learn very little from run differential so early in the year. and because any 13 games of run differential should therefore not be closely correlated with future performance, sorting for it in our sample of sub-.400 clubs makes only a slight difference in the outcome.
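for readers unfamiliar with the pythagorean figure, a small sketch follows. it assumes the classic form of the estimate, with an exponent of 2; other exponents are in common use, and the post does not say which was applied. the run totals of 63 and 55 are hypothetical, chosen only because they differ by 8.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed.

    Uses the classic form rs^x / (rs^x + ra^x); exponent=2 is an assumption.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


# Hypothetical totals with a +8 differential, for illustration only.
print(round(pythagorean_pct(63, 55), 3))
```

with those assumed totals the estimate lands at about .567. a handful of runs either way over so few games moves the number a great deal, which is rather the point.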

in truth, dear reader, runs scored and allowed simply doesn't have much to say this early in the year -- the randomness of scoring overwhelms the underlying signal.

we might also note, however, that the standard deviation of wins in our narrowed 24-team sample is 10.6 around the average of 80.1 -- meaning that, based only upon what we know about slow-starting ballclubs with positive run differentials, there is significant room for the strangeness and uniqueness of the 2007 cubs to take hold, and that some 15% or so of ballclubs should manage to win 90 games from even this lowly start.
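that last estimate can be roughed out as follows. the sketch assumes season win totals in the sample are approximately normally distributed, which is this writer's simplifying assumption and not something 24 clubs can establish; the mean and standard deviation are the figures given above.

```python
from statistics import NormalDist

# Mean and standard deviation of wins in the 24-club sample, from the post.
SAMPLE_MEAN_WINS = 80.1
SAMPLE_SD_WINS = 10.6

# Assumption: win totals are roughly normal around the sample mean.
wins = NormalDist(mu=SAMPLE_MEAN_WINS, sigma=SAMPLE_SD_WINS)


def chance_of_at_least(target_wins):
    """Share of clubs expected to reach `target_wins` under the normal model."""
    return 1.0 - wins.cdf(target_wins)


p90 = chance_of_at_least(90)
print(f"{p90:.1%}")
```

under that assumption the share comes out in the high teens, the same neighborhood as the rough figure quoted above.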
