Wednesday, May 02, 2007

the five back rule revisited

this page has taken a fairly firm stance that, at least as far as the playoffs go, this year is already over. falling so far back of the division leader so early in the season, as has been documented here and here, is highly indicative of clubs that are going to fail not only to make the playoffs but to breach .500 at year end. as to the disparity of runs scored to runs allowed, one need only observe the weakness of the correlation between april run differential and final fates to understand that what has thus far transpired in these terms means little.

the cubs' pythagorean projection at this time is .593 -- implying a 96-win pace. that is the third-highest such figure in the lot, and the peer group is a somewhat happier one. of the top 15, fully 8 broke .500 even though only two continued to post run differentials at as high a pace as they did in april. six of this sample even managed to break a .525 winning percentage (corresponding to 85 wins). though the mean winning percentage of these fifteen is .491, the standard deviation is .064 -- implying a probable range (one standard deviation either side) of .427 to .555.
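for anyone wanting to check the figure, here is a minimal sketch of the pythagorean calculation, fed with this page's through-4/30 totals of 112 runs scored and 91 allowed. the exponent behind the .593 isn't stated anywhere on this page, so the 1.83 below is an assumption (a commonly used value); it lands near .594, while the classic exponent of 2 would give about .602. the function names are made up for illustration.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.83):
    """Expected winning percentage from runs scored and runs allowed.

    The exponent is an assumption: 1.83 is a commonly used value,
    2 is the original form of the formula.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def win_pace(pct, games=162):
    """Convert a winning percentage into wins over a full season."""
    return round(pct * games)


# cubs through 4/30: 112 runs scored, 91 runs allowed
cubs_pct = pythagorean_pct(112, 91)
cubs_pace = win_pace(cubs_pct)
print(f"pythag {cubs_pct:.3f}, pace {cubs_pace} wins")
```

passing `exponent=2` shows how sensitive the projection is to that one choice, which is worth keeping in mind given how few games are behind it.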

indeed, as much can be confirmed by a presentation of previous data regarding the five back rule, updated, expanded and newly tabular.

                                 |          at year end           |     through 4/30
year  5 back game  10 back game  | playoff  final gb  w  l  win%  | w   l   rs   ra  pythag
2007  19           -             | -        -         -  -  -     | 10  14  112  91  .593

from this presentation, it doesn't take a mathematician to see that the 2007 cubs are in very deep trouble. if they go on to win their 82nd game later this season, they will be only the second cub club in the last 25 to do so in spite of falling five games back of the division leader before their 70th game was played. (a third, the 1995 club, might have done so had that season not been shortened by a labor dispute.)

looking at the plight of early laggards from the other side, espn's jayson stark offered this analysis.

We looked at every full season since 1982. Here's what we found:

• Of the 144 teams that made it to the postseason in that span, only eight (or 5.6 percent) came out of April more than three games under .500. Clubs that need to worry most about that history lesson: the Yankees (9-14), Astros (10-14), Cardinals (10-14), Cubs (10-14) and Rangers (10-15).

• Just six of those 144 playoff teams (or 4.2 percent) found themselves more than 4½ games out of a playoff spot after April. Clubs that ought to get nervous about that trend: the Cubs, Cardinals and Astros (all five games out).

• And you wouldn't think the standings would mean much this time of year. But more than half of the 120 teams that found themselves in first place after April (66 of 120) wound up finishing first. And 98 of the 120 (81.7 percent) of the teams that finished the season in first place either led their division or were within 2½ games of the lead at the end of April.

going into today's games, the cubs had fallen further still, to six games back of the milwaukee brewers. again, it must be said -- the greatest of a handful of barriers now is the inertia of the record itself. with the brewers at 17-9, should they play just .500 ball henceforth -- something they're likely to do, or better -- they'll finish at 85-77. matching that record means the cubs going 75-63 the rest of the way -- 12 over .500, or .543 baseball. this cubs team simply isn't going to be that good barring a miracle.
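the arithmetic above can be sketched in a few lines, assuming a full 162-game schedule with no rainouts left unplayed. the helper names are invented here purely for illustration.

```python
SEASON_GAMES = 162  # assumption: every scheduled game gets played


def final_record(wins, losses, rest_pct, games=SEASON_GAMES):
    """Final (wins, losses) if a club plays rest_pct ball from here on."""
    remaining = games - wins - losses
    extra_wins = round(remaining * rest_pct)
    return wins + extra_wins, losses + remaining - extra_wins


def needed_to_match(wins, losses, target_wins, games=SEASON_GAMES):
    """Record and winning percentage required to reach target_wins."""
    remaining = games - wins - losses
    need_w = target_wins - wins
    need_l = remaining - need_w
    return need_w, need_l, need_w / remaining


# brewers at 17-9 playing .500 the rest of the way
brewers_final = final_record(17, 9, 0.5)

# what the 10-14 cubs must do just to tie that win total
cubs_need = needed_to_match(10, 14, brewers_final[0])

print(brewers_final)
print(cubs_need[:2], f"{cubs_need[2]:.3f}")
```

raising `rest_pct` for the brewers even slightly pushes the cubs' required pace up quickly, which is the inertia being described.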

this page would not dispute that there are worse ways to be 10-14 and six back. the cubs are chasing only one good club -- the brewers -- and not three or four, which would all but nail the coffin shut. and having scored more than they've allowed is certainly a more promising indicator than its opposite -- after all, the 1984 and 2003 playoff sides both posted positive run differentials through april that only modestly exceed this one's.

however, so did the 1985 and 1975 clubs -- teams which finished with 77 and 75 wins, respectively. furthermore -- indicative of the tremendous variance of just a month's worth of play -- the 1989 club that finished with 93 wins had allowed ten more (95) than they had scored (85) through the same date. as can be seen in the following chart, the correlation between pythagorean figures and actual ones over large sets of data is good -- the ratio of theoretical to actual records should approach one, as it does here -- but the quality of the correlation in individual data points is poor.

not as tightly scattered as one would like
