1060west: slow starts and run differential

the internet is one of the greatest tools for community research yet invented, dear reader, and when it works it can work very well.

as part of a dialogue at another cubs blog, this writer revisited some earlier commentary made here.

and then there is the matter of wins and losses itself. as contributor john dooley noted in the comments, no cub club has started 4-7 or worse and reached the playoffs, and since 1972 (a sample of eight teams), no cub club that started 4-7 or worse has broken .500. when jd looked further, at the entirety of the major leagues since the advent of postseason play, he found that only 27 of 326 playoff teams had started 4-7 or worse; and since 1995, a period in which roughly a quarter of clubs reach the postseason, only 9 of 96 playoff clubs (9%) started 4-7 or worse.

subsequently i added:

i looked back at both leagues on april 15 for the last seven seasons (2000-2006). these teams were playing .400 baseball or worse: ...

58 teams, 9 playoff appearances (including two world champions).

there are 210 club-seasons and 56 playoff-seasons in this seven-year sample.

that means those who were over .400 as of april 15 (152 teams) took the other 47 playoff slots.

to reiterate:

under .400: 9/58 = 15.5%
over .400: 47/152 = 30.9%

i’d say that the cubs’ chances of appearing in the postseason have been cut in about half. a lot of things *can* happen, of course, but that’s the probability.

also, 38 of the 58 (66%) did not break .500, meaning that the cubs are in a pool in which two out of three clubs failed to win more than 81 games.
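the split above is simple arithmetic on the seven-year sample. a minimal python sketch of it, using only the counts given in the text (210 club-seasons, 56 playoff slots, 58 clubs at .400 or worse on april 15, 9 of them reaching the playoffs, 38 of them not breaking .500); the constant names and the `rate` helper are ours, chosen for illustration:

```python
# Counts taken from the 2000-2006 april 15 sample described above.
CLUB_SEASONS = 210      # 30 clubs x 7 seasons
PLAYOFF_SPOTS = 56      # 8 postseason clubs x 7 seasons
SLOW_CLUBS = 58         # clubs at .400 or worse on april 15
SLOW_PLAYOFF = 9        # of those, clubs that reached the postseason
SLOW_NOT_OVER_500 = 38  # of those, clubs that did not break .500


def rate(hits: int, total: int) -> float:
    """Return hits / total as a percentage."""
    return 100.0 * hits / total


# Everyone over .400 took the remaining club-seasons and playoff slots.
other_clubs = CLUB_SEASONS - SLOW_CLUBS
other_playoff = PLAYOFF_SPOTS - SLOW_PLAYOFF

slow_rate = rate(SLOW_PLAYOFF, SLOW_CLUBS)
other_rate = rate(other_playoff, other_clubs)

print(f"under .400: {SLOW_PLAYOFF}/{SLOW_CLUBS} = {slow_rate:.1f}%")
print(f"over .400:  {other_playoff}/{other_clubs} = {other_rate:.1f}%")
print(f"ratio of playoff odds: {slow_rate / other_rate:.2f}")
print(f"did not break .500: {rate(SLOW_NOT_OVER_500, SLOW_CLUBS):.1f}%")
```

the ratio of the two playoff rates comes out almost exactly 0.50, which is where "cut in about half" comes from.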

it was rightly noted there that the cubs have thus far this season scored more runs than they have allowed, and that this rarity (for a 4-7 ballclub, now 5-8) may offer some reprieve from that harsh judgment of probability.

further examination finds some difference, but unfortunately not very much.

just four of the 58 had positive run differentials, and only one of those four broke .500; none made the playoffs. four isn’t anything like a significant sample, however, so this writer also looked back to 1973 for clubs that were under .400 through their first eleven games with at least as many runs scored as allowed, arriving at the following list.

year	team	wins	losses	rs (first 11)	ra (first 11)
2005	det	71	91	57	53
2005	chc	79	83	52	50
2003	lad	85	77	47	45
2001	fla	76	86	57	51
1999	mon	68	94	52	52
1999	ari	100	62	50	48
1999	col	72	90	33	33
1997	tor	76	86	39	31
1997	nyy	96	66	70	59
1991	sfg	75	87	49	46
1990	sfg	85	77	43	42
1988	mil	87	75	46	41
1987	chc	76	85	41	41
1986	atl	72	89	33	32
1984	mil	67	94	53	49
1984	hou	80	82	44	42
1983	tor	89	73	45	43
1982	nyy	79	83	38	35
1982	mil	95	67	53	52
1980	nyy	103	59	41	33
1979	nym	63	99	40	38
1978	cle	69	90	33	31
1975	nym	82	80	29	27
1974	kcr	77	85	45	39

24 clubs	(avg)	80.1	81.7

as we can see, of the 24 clubs that qualify for the sample since 1973, only nine (2003 lad, 1999 ari, 1997 nyy, 1990 sfg, 1988 mil, 1983 tor, 1982 mil, 1980 nyy and 1975 nym) finished over .500, or 37.5%. the average win total for the sample was 80.1.

in the earlier 58-team sample, 20 clubs finished over .500 (34.5%), with the average win total at just 74.8.
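the summary figures for the 24-club list can be recomputed straight from the table. a minimal python sketch, with the rows transcribed from the table (final wins and losses, plus runs scored and allowed through the first eleven games); the 58-club comparison figures are quoted from the text, not recomputed, and the variable names are ours:

```python
# Rows transcribed from the table: year, team, final wins, final losses,
# and runs scored / allowed through the first eleven games.
ROWS = [
    (2005, "det", 71, 91, 57, 53), (2005, "chc", 79, 83, 52, 50),
    (2003, "lad", 85, 77, 47, 45), (2001, "fla", 76, 86, 57, 51),
    (1999, "mon", 68, 94, 52, 52), (1999, "ari", 100, 62, 50, 48),
    (1999, "col", 72, 90, 33, 33), (1997, "tor", 76, 86, 39, 31),
    (1997, "nyy", 96, 66, 70, 59), (1991, "sfg", 75, 87, 49, 46),
    (1990, "sfg", 85, 77, 43, 42), (1988, "mil", 87, 75, 46, 41),
    (1987, "chc", 76, 85, 41, 41), (1986, "atl", 72, 89, 33, 32),
    (1984, "mil", 67, 94, 53, 49), (1984, "hou", 80, 82, 44, 42),
    (1983, "tor", 89, 73, 45, 43), (1982, "nyy", 79, 83, 38, 35),
    (1982, "mil", 95, 67, 53, 52), (1980, "nyy", 103, 59, 41, 33),
    (1979, "nym", 63, 99, 40, 38), (1978, "cle", 69, 90, 33, 31),
    (1975, "nym", 82, 80, 29, 27), (1974, "kcr", 77, 85, 45, 39),
]

# Second qualifier from the text: at least as many runs scored as allowed.
assert all(rs >= ra for *_, rs, ra in ROWS)

over_500 = [f"{year} {team}" for year, team, w, l, _, _ in ROWS if w > l]
avg_wins = sum(row[2] for row in ROWS) / len(ROWS)
avg_losses = sum(row[3] for row in ROWS) / len(ROWS)

share = 100 * len(over_500) / len(ROWS)
print(f"{len(over_500)} of {len(ROWS)} finished over .500 ({share:.1f}%): {', '.join(over_500)}")
print(f"average final record: {avg_wins:.1f}-{avg_losses:.1f}")
# Comparison figures quoted from the text for the 58-club april 15 sample.
print(f"58-club sample: 20 of 58 over .500 ({100 * 20 / 58:.1f}%), 74.8 average wins")
```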

clearly, in taking the narrower qualifier that applies to the 2007 cubs, we have improved our view of their chances, but probably not by a large enough margin to see them as a playoff club. that should make some sense, for at this early stage runs scored and allowed is a measure of future performance with more noise than signal.

this plot shows the 2006 cubs' 13-game rolling run differential; that is, it plots the difference between runs scored and runs allowed over every 13-game stretch of last year (13 games being how many they've played in 2007 thus far). as one can see, though the 2006 cubs were a bad team that finished the year with a 118-run deficit and an average 13-game differential of a crushing (-12), what they did in any given 13-game stretch varied between (-54) and +19, with a standard deviation of 16 runs. that is to say, a typical 13-game stretch for the 2006 cubs came in at (-12) +/- 16, which means they were between +4 and (-28) about 70% of the time. that kind of very wide spread is characteristic of any 13-game measure of run differential, for any club of any record or quality.
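the rolling measure described above is easy to compute from a game log. a minimal python sketch, assuming a hypothetical list of per-game run differentials (runs scored minus runs allowed, one entry per game); the 2006 game log is not reproduced here, so the example input is made up purely to show the mechanics, and the function names are ours:

```python
from statistics import mean, stdev


def rolling_differential(game_diffs: list[int], window: int = 13) -> list[int]:
    """Sum of run differential over every consecutive `window`-game stretch."""
    if len(game_diffs) < window:
        return []
    total = sum(game_diffs[:window])
    sums = [total]
    for i in range(window, len(game_diffs)):
        # Slide the window one game: add the newest game, drop the oldest.
        total += game_diffs[i] - game_diffs[i - window]
        sums.append(total)
    return sums


def share_within_one_sd(values: list[int]) -> float:
    """Fraction of values within one (sample) standard deviation of the mean."""
    m, s = mean(values), stdev(values)
    return sum(1 for v in values if m - s <= v <= m + s) / len(values)


# Hypothetical per-game differentials, NOT the real 2006 cubs log.
example = [3, -2, -5, 1, 0, 4, -7, 2, -1, -3, 6, -4, 2, -6, 1, 5, -2, -1, 3, -8]
stretches = rolling_differential(example)
print(f"{len(stretches)} stretches, mean {mean(stretches):.1f}, "
      f"sd {stdev(stretches):.1f}, within 1 sd: {share_within_one_sd(stretches):.0%}")
```

a full 162-game season yields 150 overlapping 13-game stretches; the 2006 figures quoted above come from the writer's own data, which this sketch does not attempt to reproduce.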

so, when we look at today's +8 for the 2007 cubs, how should we interpret it? it implies a .567 pythagorean winning percentage, but does it lie toward the top of that range? the bottom? the middle? one can suppose many things, but there is really no way of knowing at this point: there is not enough data, and so we may safely presume to learn very little from run differential so early in the year. and because run differential over any 13 games should therefore not be closely correlated with future performance, sorting for it in our sample of sub-.400 clubs makes only a slight difference in the outcome.
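for reference, the pythagorean expectation is usually written as rs^2 / (rs^2 + ra^2). a minimal python sketch follows; the exponent of 2 is the classic form (some variants use roughly 1.83), and since the 2007 cubs' actual run totals aren't given here, the sketch only backs out what ratio of runs scored to runs allowed a .567 mark implies. the function names are ours:

```python
def pythag(rs: float, ra: float, exponent: float = 2.0) -> float:
    """Pythagorean winning percentage: rs^x / (rs^x + ra^x)."""
    return rs**exponent / (rs**exponent + ra**exponent)


def implied_ratio(win_pct: float, exponent: float = 2.0) -> float:
    """Invert pythag(): the rs/ra ratio that produces a given winning percentage."""
    return (win_pct / (1.0 - win_pct)) ** (1.0 / exponent)


ratio = implied_ratio(0.567)
print(f"a .567 pythagorean mark implies about {ratio:.2f} runs scored per run allowed")
print(f"round trip: {pythag(ratio, 1.0):.3f}")
```

note that the formula depends only on the ratio of runs scored to runs allowed, so the same +8 differential implies a stronger mark in a low-scoring stretch than in a high-scoring one.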

in truth, dear reader, runs scored and allowed simply doesn't have much to say this early in the year -- the randomness of scoring overwhelms the underlying signal.

we might also note, however, that the standard deviation of wins in our narrowed 24-team sample is 10.6 around the average of 80.1. that means, based only upon what we know about slow-starting ballclubs with even or positive run differentials, there is significant room for the strangeness and uniqueness of the 2007 cubs to take hold, and some 15% or so of such ballclubs should manage to win 90 games even from this lowly start.
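the 90-win estimate can be sketched two ways: by counting the clubs in the 24-team list that actually reached 90 wins, and by treating final wins as roughly normal with the sample's mean and standard deviation. the normal model is our simplifying assumption, not something established above; a minimal python sketch, using python's standard `statistics` module:

```python
from statistics import NormalDist, mean, stdev

# Final win totals of the 24 qualifying clubs, in table order.
WINS = [71, 79, 85, 76, 68, 100, 72, 76, 96, 75, 85, 87,
        76, 72, 67, 80, 89, 79, 95, 103, 63, 69, 82, 77]

m, s = mean(WINS), stdev(WINS)  # sample standard deviation (n - 1 divisor)
empirical = sum(1 for w in WINS if w >= 90) / len(WINS)
# Assumed normal model of final wins: chance of landing at 90 or more.
modeled = 1 - NormalDist(m, s).cdf(90)

print(f"mean {m:.1f}, sd {s:.2f}")
print(f"90+ wins: {empirical:.0%} observed, {modeled:.0%} under a normal model")
```

both routes land near one club in six (4 of the 24 reached 90 wins), in keeping with the rough figure above.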

Wednesday, April 18, 2007