Wednesday, March 28, 2007

the fallacy of the weak division

in edified circles, many an optimistic analysis of how the chicago cubs might fare in 2007 has for some months hinged on the weakness of the division in which the cubs play. it is a point this page has made with reference not only to this year but to 2006 as well -- more's the pity.

to the credit of analytical optimists commenting on the 2007 cubs, few have been trying to say the cubs are really any good -- just that they're better than their selected lot. indeed, a recent back-of-the-napkin figuring for the pitching staff runs as follows:

taking an average factor for injury:

zambrano - 220 ip - 3.75 ra ... hill - 200 ip - 4.30 ra ... lilly - 120 ip - 4.60 ra ... marquis - 200 ip - 5.20 ra ... this is what to expect out of the starting five, imo -- 740 innings in which they allow about 365 runs -- ~4.43 runs a game. the team is going to have to pitch about 1460 innings.

that means the rest of the staff -- wade miller, angel guzman, sean marshall and whoever else ends up with a start, plus the extended bullpen that includes wood, novoa, cherry and whoever else -- pitches about 720.

with wood out, the bullpen is not radically changed this year from last -- nor is the starting staff probably going to eat vastly more innings. last year the cubs pen tossed 562 innings allowing 282 runs. let's say they go 540 allowing 280 (which is better than i think they'll actually do, fwiw). totals so far -- 1280 ip -- 645 runs.

now, the remaining 180 innings are usually a disaster for most clubs, which highlights their lack of rotation depth. it's not unusual for this to be 6+ r/g spread over several spot starters -- in the cubs' case, miller, guzman, cotts, marshall, etc.

if we figure (conservatively) 6.00 for these last 180, we get about 120 runs. our new totals -- 1460 ip -- 765 runs.

that's 4.71 runs a game, and a 69-run improvement on 2006's 834. in last year's nl, that would've been good for 6th among all staffs. and this is the conservative estimate, mind you. i'll personally be surprised if the staff ra gets under 4.8 runs/game.
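for anyone who wants to rerun the napkin math, here is a minimal python sketch of the tally above. the innings and run rates are the ones quoted in the comment; the variable and function names are this page's own invention, and "ra" is assumed to mean runs allowed per nine innings. because the comment rounds the starters up to 365 runs, the unrounded totals here come out about a run lower.

```python
# Starter lines as quoted above: (name, innings pitched, runs allowed per nine).
starters = [
    ("zambrano", 220, 3.75),
    ("hill", 200, 4.30),
    ("lilly", 120, 4.60),
    ("marquis", 200, 5.20),
]


def runs_allowed(innings, ra9):
    """Convert innings and a per-nine-innings rate into total runs."""
    return innings * ra9 / 9


starter_ip = sum(ip for _, ip, _ in starters)
starter_runs = sum(runs_allowed(ip, ra9) for _, ip, ra9 in starters)

# Bullpen and spot-start guesses, also taken from the quoted comment.
bullpen_ip, bullpen_runs = 540, 280
spot_ip, spot_ra9 = 180, 6.00
spot_runs = runs_allowed(spot_ip, spot_ra9)

total_ip = starter_ip + bullpen_ip + spot_ip
total_runs = starter_runs + bullpen_runs + spot_runs
staff_ra9 = total_runs * 9 / total_ip

runs_2006 = 834  # 2006 runs allowed, as cited in the comment

print(f"starters: {starter_ip} ip, {starter_runs:.0f} runs")
print(f"staff:    {total_ip} ip, {total_runs:.0f} runs, {staff_ra9:.2f} per nine")
print(f"change from 2006: {runs_2006 - total_runs:.0f} fewer runs")
```

swapping in different innings or rates for any one pitcher shows how little the staff total moves on a single guess.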


this is only one of a few different casual approximations that put the cub staff in this area. offensive projections of run scoring have already settled around 4.8 runs per game -- meaning the cubs figure to score about as much as they allow, the very definition of a .500 baseball club.

this all, however, is a framework constructed to establish the cubs in a historical context contrasted to the national league in general. should the proper context in fact allow for the projected division instead?

there's merit to that, but only to a point.

the league weights the schedule to intensify divisional play, and central teams indeed play about half their games in division -- a greater proportion than either east or west thanks to the extra team in the central, but still just half. for half the schedule, the team is playing the divisions on either coast. so while half clearly are played against what is probably the weakest division in baseball, the equal remainder are not. but what is that worth?

first, it must be recognized what is actually at stake. if, say, the nl west were the weakest division instead, the cubs would play about a quarter of their games against it rather than half. the difference between that situation and this one is thus one-quarter of the schedule, or 40 games or so.

second, how much weaker is weak -- that is, what advantage in winning percentage can be expected? in a sport in which the best clubs beat the worst clubs perhaps seven times in ten while evenly matched clubs split their games, it should be seen that -- even if the cubs were the metaphorical 1927 yankees and the weak division consisted entirely of 1962 mets -- the difference over 40 games would amount to about eight contests (a .200 edge in winning percentage over 40 games). that is of course very significant -- but the cubs are not the 1927 yankees. they're an average ballclub. in that context, this page would estimate that the advantage to the cubs of playing in the weakest division in baseball, as opposed to a historically average one, probably amounts to no more than two wins.
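the arithmetic behind those two steps can be sketched in a few lines of python. the 162-game season is a known quantity; the .700 and .500 figures are the rough rates named above, and the function name is a made-up one used only for illustration.

```python
SEASON_GAMES = 162  # length of a major-league regular season


def extra_wins(pct_vs_weak, pct_baseline, games):
    """Wins gained by playing `games` at one winning rate instead of another."""
    return (pct_vs_weak - pct_baseline) * games


# Half the schedule against the weak division instead of a quarter of it.
shifted_games = SEASON_GAMES * (0.50 - 0.25)

# Extreme case from the text: seven wins in ten versus an even split.
extreme_edge = extra_wins(0.70, 0.50, 40)

# Working backward: a two-win edge over 40 games is a .050 bump in winning pct.
implied_bump = 2 / 40

print(f"games shifted into the weak division: {shifted_games}")
print(f"extreme-case edge over 40 games: {extreme_edge:.1f} wins")
print(f"pct bump implied by two wins: {implied_bump:.3f}")
```

put that way, the two-win guess amounts to assuming an average club plays about .050 better against the weak division than it would against an average one.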

that is a fair enough basis in reason. but can more be said empirically?

2006              Intradivisional        Extradivisional
                  W     L     Pct        W     L     Pct

NY               45    29   0.608       46    27   0.630
Philadelphia     41    34   0.547       39    30   0.565
Atlanta          35    38   0.479       39    35   0.527
Florida          33    42   0.440       36    33   0.522
Washington       31    42   0.425       33    38   0.465
East total      185   185   0.500      193   163   0.542

St. Louis        39    42   0.481       39    26   0.600
Houston          45    32   0.584       30    37   0.448
Cincinnati       46    38   0.548       28    35   0.444
Milwaukee        37    45   0.451       32    33   0.492
Pittsburgh       34    44   0.436       30    39   0.435
Chicago          42    42   0.500       20    43   0.317
Central total   243   243   0.500      179   213   0.457

SD               39    36   0.520       42    30   0.583
LA               43    31   0.581       40    33   0.548
SF               37    38   0.493       31    40   0.437
Colorado         31    44   0.413       34    38   0.472
Arizona          37    38   0.493       35    37   0.486
West total      187   187   0.500      182   178   0.506


in 2006, the central was also the weakest division in the game as measured by extradivisional wins and losses (a record of 179-213, or .457). did the leaders indeed fatten up on the sorry clubs of the bottom rung with the advantage of playing them more often, offsetting the beatings they took from teams like the mets and dodgers?

as it turns out, insignificantly. the two leading clubs -- saint louis and houston, the only two winning records in the division -- went an aggregate 84-74 (.531) within the central and 69-63 (.522) outside of it (the remainder of the games having been played against the american league). the difference in winning percentages implies, over a 40-game sample, a net gain to each club of 0.4 wins.
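the method in that paragraph is simple enough to sketch in python: pool the clubs' records, take the gap between the in-division and out-of-division winning percentages, and scale it to 40 games. the records come from the 2006 table; the function names are hypothetical.

```python
def win_pct(wins, losses):
    """Winning percentage from a won-lost record."""
    return wins / (wins + losses)


def implied_gain(inside, outside, games=40):
    """Extra wins over `games` from playing at the in-division rate
    rather than the out-of-division rate."""
    return (win_pct(*inside) - win_pct(*outside)) * games


# 2006 records as (wins, losses): in division, then against the rest of the nl.
st_louis = {"in": (39, 42), "out": (39, 26)}
houston = {"in": (45, 32), "out": (30, 37)}

# Pool the two leaders into one aggregate record for each split.
inside = tuple(a + b for a, b in zip(st_louis["in"], houston["in"]))
outside = tuple(a + b for a, b in zip(st_louis["out"], houston["out"]))

gain = implied_gain(inside, outside)
print(f"in division {inside}, outside {outside}")
print(f"implied gain over 40 games: {gain:.1f} wins")
```

pooling the clubs before dividing, rather than averaging their individual percentages, weights each club by the games it actually played.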

this experiment can of course be repeated for past seasons as well -- so this page did, for 2005 and 2004.

2005              Intradivisional        Extradivisional
                  W     L     Pct        W     L     Pct

Atlanta          42    33   0.560       41    31   0.569
Philadelphia     38    37   0.507       43    29   0.597
Florida          34    39   0.466       39    35   0.527
NY Mets          38    36   0.514       40    33   0.548
Washington       34    41   0.453       35    34   0.507
East total      186   186   0.500      198   162   0.550

St. Louis        51    29   0.638       39    28   0.582
Houston          43    36   0.544       39    29   0.574
Milwaukee        38    41   0.481       35    33   0.515
Chicago Cubs     43    36   0.544       30    38   0.441
Cincinnati       33    46   0.418       33    35   0.485
Pittsburgh       30    50   0.375       32    38   0.457
Central total   238   238   0.500      208   201   0.509

San Diego        39    34   0.534       36    35   0.507
Arizona          41    32   0.562       28    43   0.394
San Francisco    38    35   0.521       31    40   0.437
LA Dodgers       33    41   0.446       33    37   0.471
Colorado         32    41   0.438       29    45   0.392
West total      183   183   0.500      157   200   0.440


in 2005, the weakest division in the league was the west, with an aggregate extradivisional winning percentage of just .440; its lone winning club, san diego at 82-80, went 39-34 (.534) in division and 36-35 (.507) outside of it. the implied difference in wins over a 40-game lot is 1.1.
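picking out the weakest division is just a matter of comparing the aggregate extradivisional records. a small python sketch, using the 2005 totals from the table above (the dictionary and function names are illustrative only):

```python
# 2005 aggregate extradivisional records, as (wins, losses).
extradivisional_2005 = {
    "east": (198, 162),
    "central": (208, 201),
    "west": (157, 200),
}


def win_pct(record):
    """Winning percentage from a (wins, losses) pair."""
    wins, losses = record
    return wins / (wins + losses)


# The weakest division is the one with the lowest out-of-division winning pct.
weakest = min(extradivisional_2005, key=lambda d: win_pct(extradivisional_2005[d]))

for division, record in extradivisional_2005.items():
    print(f"{division:8s} {win_pct(record):.3f}")
print("weakest:", weakest)
```

extradivisional records are the useful yardstick here because in-division records always net out to .500 by construction.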


2004              Intradivisional        Extradivisional
                  W     L     Pct        W     L     Pct

Atlanta          51    25   0.671       37    31   0.544
Philadelphia     39    37   0.513       38    30   0.559
Florida          43    33   0.566       33    35   0.485
NY Mets          29    47   0.382       32    36   0.471
Montreal         28    48   0.368       32    36   0.471
East total      190   190   0.500      172   168   0.506

St. Louis        54    36   0.600       40    20   0.667
Houston          55    35   0.611       30    30   0.500
Chicago Cubs     50    40   0.556       31    29   0.517
Cincinnati       38    52   0.422       33    27   0.550
Pittsburgh       37    52   0.416       33    27   0.550
Milwaukee        35    54   0.393       24    36   0.400
Central total   269   269   0.500      191   169   0.531

LA Dodgers       47    29   0.618       36    32   0.529
San Francisco    41    35   0.539       39    29   0.574
San Diego        42    34   0.553       37    31   0.544
Colorado         39    37   0.513       21    47   0.309
Arizona          21    55   0.276       24    44   0.353
West total      190   190   0.500      157   183   0.462


in 2004 the weakling was again the west at .462, weighed down not only by 68-win colorado but by an epically bad 51-win arizona club. surely here the division leaders must have feasted -- but the data show that the dodgers, giants and padres, all of whom won 87 or more, went an aggregate 130-98 in division (.570) and 112-92 (.549) outside it. here the implied difference in wins over a 40-game lot is 0.8.
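for a side-by-side view, the three seasons can be run through the same calculation at once. this python sketch uses only the aggregate records quoted in the text; the simple average at the end is this page's own addition and, with just three seasons behind it, is a rough summary rather than a real estimate.

```python
def win_pct(wins, losses):
    """Winning percentage from a won-lost record."""
    return wins / (wins + losses)


# (season, clubs, in-division record, out-of-division record), as quoted above.
leaders = [
    (2006, "st. louis + houston", (84, 74), (69, 63)),
    (2005, "san diego", (39, 34), (36, 35)),
    (2004, "dodgers + giants + padres", (130, 98), (112, 92)),
]

GAMES = 40  # the extra helping of weak-division games estimated earlier

gains = []
for season, clubs, inside, outside in leaders:
    gain = (win_pct(*inside) - win_pct(*outside)) * GAMES
    gains.append(gain)
    print(f"{season} {clubs}: {gain:+.1f} wins")

print(f"average: {sum(gains) / len(gains):+.1f} wins")
```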

as should now be clear, playing in the weak sister of divisions is not the panacea that many a cub fan (including yours truly) would like it to be -- and to that we must add the uncertainty about whether or not the central will really be as weak as so many expect. it seems to this page that many have overestimated the likely benefit, and in so doing committed a common-enough error of selection bias without thorough statistical vetting. if playing in the central adds a win to the cub total in the end, so much the better -- but this cub club is going to have to stand on its own merits, and not the weaknesses (real or perceived) of its geographic rivals.
