Thursday, July 12, 2007

case studies in team babip extremes

i pointed out in a long, rambling diatribe the other day this little nugget:

as difficult as this may be for many to comprehend (as it is certainly not what anyone wants to believe), i think these cubs have actually been not at all unlucky to get to 44-43. it's true that their runs-based pythagorean estimate is (at 46-41) a couple games better, but that misses the point about how these runs were scored and allowed. this cubs offense posted a .306 team babip in the first half; that is the highest such figure since the first half of 1984 (.308). on the pitching side, the team benefitted from a first-half .279 babip against; that is the lowest such figure since the first half of 1992 (.269). both these clubs -- and virtually all who see such disparities in any half -- saw reversions back toward the league mean follow. moreover, they've hit quite well with men on and with runners in scoring position and pitched even better -- both being trends equally unlikely as babip to last.


hidden as it was well into a loosely "organized" rant, i thought it might be a good idea to highlight just how weirdly lucky the 2007 cubs have been through the 87 games leading up to the break, and what a break in that luck might mean.

most people who would bother to read this site know most of the basics about batting average on balls in play (babip), i suspect. there are a lot of resources on the web about it, and we don't need to repeat all that here. suffice it to hit a few high points:

exhaustive study available on the web has demonstrated that pitchers generally have little control over whether balls put in play are line drives, popups, groundballs or flyballs. hitters seem to have more control, but still not a great deal -- those who end a greater percentage of their at-bats with line drives tend to have a higher natural babip, but the human ceiling for that sustainable skill appears to be a percentage in the high-twenties. the major league average is around 18%.

in any very large typical sample encompassing the major leagues, a ball in play will fall for a hit about 29.5% of the time -- a babip of .295 -- give or take a couple tenths of a percent. the typical variation from that mean depends on how many at-bats are in the sample. as can be seen by looking at the past 20 years of cub teams by halves (simplified as pre- and post-all-star-break), extremes can lie from the 310s to the 270s over any single half-season of baseball.

in smaller samples -- such as any individual half-season for any player -- variation can be much larger, on the order of 30% of the value. this is for two reasons. first, any player will often, for short stretches, hit more or fewer line drives than he will over longer timeframes; second, batted balls of all types can either fall in or be converted into outs at unusual rates. both of these factors, having no cause apparent to me and being random and transient, are what i would term "luck".
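to put a rough size on that sampling noise, here is a minimal sketch that treats every ball in play as an independent trial with the same .295 chance of falling for a hit. that binomial model is a simplification and not something taken from the studies mentioned above, and the two sample sizes (about 2400 balls in play for a team half-season, about 200 for a player half-season) are assumed round numbers for illustration only.

```python
import math

LEAGUE_BABIP = 0.295  # long-run league average cited in the post


def babip_standard_error(true_rate, balls_in_play):
    """Standard deviation of observed babip under a simple binomial model."""
    return math.sqrt(true_rate * (1.0 - true_rate) / balls_in_play)


# Hypothetical sample sizes, assumed for illustration only.
TEAM_HALF_BIP = 2400
PLAYER_HALF_BIP = 200

for label, n in [("team half", TEAM_HALF_BIP), ("player half", PLAYER_HALF_BIP)]:
    se = babip_standard_error(LEAGUE_BABIP, n)
    low, high = LEAGUE_BABIP - 2 * se, LEAGUE_BABIP + 2 * se
    print(f"{label}: n={n}, se={se:.4f}, two-se band {low:.3f} to {high:.3f}")
```

under those assumptions the two-standard-error band for a team half-season runs from roughly the .270s to the .310s, while the band for a single player is several times wider, on the order of a fifth of the value in either direction.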

approximating the power of this randomness is what i'm trying to address here. i don't think anyone who has watched the game for a long time would deny that randomness plays a huge, often deciding role in a sport where the worst teams are all but guaranteed to win a third of their games and the best to lose a third -- indeed, where the 2006 saint louis cardinals can win the world series. so if we acknowledge the importance of random variation, perhaps we can work to quantify aspects of it and maybe even make some generalizations about it. babip is one way we might do that, i think, though certainly not the only way.

first, a marker of how rare this situation is.

team  year  half   g    r  avg  obp  slg  babip   r/g
OPP   2002     2  76  356  255  330  413    314  4.68
OPP   1999     1  85  483  294  352  485    313  5.68
OPP   1998     1  87  413  267  336  410    312  4.75
CHC   2007     1  87  396  269  327  412    306  4.55
CHC   1998     1  87  439  268  339  437    306  5.05
CHC   1997     2  75  314  267  319  400    305  4.19
CHC   1994     2  27  131  270  325  423    305  4.85
OPP   2004     2  75  320  251  325  398    305  4.27
OPP   2000     2  76  415  272  355  448    305  5.46
OPP   1999     2  77  437  277  340  447    305  5.68
OPP   1990     1  85  437  277  349  410    305  5.14
OPP   1988     2  78  356  274  328  400    305  4.56
CHC   2003     1  94  417  262  330  419    304  4.44
CHC   1995     2  75  372  272  336  445    304  4.96
OPP   1995     2  75  370  273  348  436    303  4.93
OPP   1994     2  27  123  274  333  432    303  4.56
CHC   1989     2  76  360  270  331  411    302  4.74
OPP   1997     1  87  392  267  340  433    301  4.51
OPP   1993     1  87  388  273  333  420    301  4.46
OPP   1998     2  76  379  266  336  441    300  4.99
CHC   2004     1  87  400  269  328  449    299  4.60
CHC   1993     1  87  374  274  319  411    299  4.30
OPP   2001     1  86  351  242  318  380    299  4.08
OPP   2001     2  76  350  256  324  420    299  4.61
OPP   1993     2  76  351  273  330  421    299  4.62
OPP   2003     2  68  264  239  321  370    298  3.88
OPP   1994     1  86  426  267  336  416    298  4.95
CHC   2006     2  74  359  271  322  443    297  4.85
CHC   2002     2  76  350  252  324  422    297  4.61
CHC   1999     1  85  432  266  339  445    297  5.08
CHC   1992     2  75  293  271  319  386    297  3.91
OPP   2006     2  74  386  255  344  423    297  5.22
OPP   1990     2  77  337  264  329  393    297  4.38
CHC   1998     2  76  392  261  335  430    296  5.16
OPP   2003     1  94  419  242  327  374    296  4.46
OPP   1989     2  76  301  261  327  389    296  3.96
CHC   2001     2  76  376  264  339  435    295  4.95
CHC   2005     1  87  394  270  325  447    294  4.53
CHC   1997     1  87  373  259  323  392    294  4.29
CHC   1993     2  76  364  267  332  417    294  4.79
CHC   1988     1  85  350  266  314  396    293  4.12
OPP   2002     1  86  403  251  332  401    293  4.69
OPP   1997     2  75  367  265  338  437    293  4.89
OPP   1996     2  75  374  268  339  437    293  4.99
OPP   1991     2  78  350  260  323  387    292  4.49
CHC   2006     1  88  357  265  317  404    291  4.06
CHC   1995     1  69  321  258  317  413    291  4.65
OPP   2005     1  87  394  252  328  419    291  4.53
CHC   1994     1  86  369  256  324  398    290  4.29
CHC   1990     1  85  378  269  322  411    290  4.45
CHC   1988     2  78  310  255  306  369    290  3.97
OPP   2005     2  75  320  247  321  392    290  4.27
CHC   2003     2  68  307  255  313  413    289  4.51
CHC   1999     2  77  315  248  318  392    289  4.09
OPP   2000     1  86  488  264  342  455    289  5.67
CHC   2005     2  75  309  269  323  432    288  4.12
CHC   2000     1  86  426  259  338  435    288  4.95
CHC   2000     2  76  338  252  331  383    288  4.45
CHC   2001     1  86  401  258  334  425    286  4.66
OPP   2006     1  88  448  254  340  439    286  5.09
OPP   2004     1  87  345  244  318  391    286  3.97
OPP   1988     1  85  338  256  322  367    286  3.98
CHC   1990     2  77  312  257  304  371    285  4.05
CHC   2004     2  75  389  266  328  469    283  5.19
CHC   1996     2  75  379  257  325  410    283  5.05
OPP   1996     1  87  397  253  322  410    283  4.56
OPP   1991     1  82  384  255  325  381    283  4.68
CHC   1989     1  86  342  252  309  365    281  3.98
OPP   1992     2  75  314  256  330  388    280  4.19
CHC   1996     1  87  393  246  315  393    279  4.52
OPP   2007     1  87  368  241  314  394    279  4.23
CHC   1991     1  82  363  259  315  404    277  4.43
OPP   1995     1  69  301  250  317  385    277  4.36
CHC   2002     1  86  356  240  319  404    271  4.14
OPP   1992     1  87  310  236  311  346    269  3.56
CHC   1991     2  78  332  245  303  376    267  4.26
OPP   1989     1  86  322  239  307  342    267  3.74
CHC   1992     1  87  300  239  297  344    264  3.45



as you can see, the cubs are experiencing their best offensive half in over 20 years in terms of babip while, at the same time, experiencing their fourth-best defensive performance. this is a very favorable coincidence, and i've been harping for some time that it's bound to mean revert and performance will probably suffer.

what do i mean by that exactly?

on the batting side, i'm actually not so worried. not only can i imagine this club getting some better slugging going forward, but the recent case history of babip mean reversion is not very frightening. here are the five most outrageous (by babip) offensive halves of the 19 years preceding 2007, each compared to the other half of that same year. of course teams don't remain exactly the same from half to half, but they do remain very similar and so we'll compare on that basis.

team  year  half   g   rs  avg  obp  slg  babip  rs/g   gpa

CHC   2003     1  94  417  262  330  419    304  4.44  1013
CHC   2003     2  68  307  255  313  413    289  4.51   976

CHC   1998     1  87  439  268  339  437    306  5.05  1047
CHC   1998     2  76  392  261  335  430    296  5.16  1033

CHC   1997     2  75  314  267  319  400    305  4.19   974
CHC   1997     1  87  373  259  323  392    294  4.29   973

CHC   1995     2  75  372  272  336  445    304  4.96  1050
CHC   1995     1  69  321  258  317  413    291  4.65   984

CHC   1994     2  27  131  270  325  423    305  4.85  1008
CHC   1994     1  86  369  256  324  398    290  4.29   981



while in all five cases gross production average declined, it was by an average of 2.8% as compared to a 4.2% decline in babip. in three of the five cases, that was a small enough difference for other sources of random variation to actually slightly increase runs scored per game -- though, on average, run production fell 2.2%, or 0.12 runs/game.
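for anyone who wants to check the arithmetic, here is a small sketch that recomputes the average babip and gpa changes from the five cub cases. the figures are copied from the table (rate stats in thousandths); the tuple layout and function names are my own for illustration. note that the gpa column appears to be 1.8 x obp + slg without the usual division by four, which the last check confirms against every row.

```python
# Each case: (year, high-babip half, comparison half).
# Each half: (obp, slg, babip, runs_per_game, gpa), rate stats in thousandths.
CASES = [
    (2003, (330, 419, 304, 4.44, 1013), (313, 413, 289, 4.51, 976)),
    (1998, (339, 437, 306, 5.05, 1047), (335, 430, 296, 5.16, 1033)),
    (1997, (319, 400, 305, 4.19, 974), (323, 392, 294, 4.29, 973)),
    (1995, (336, 445, 304, 4.96, 1050), (317, 413, 291, 4.65, 984)),
    (1994, (325, 423, 305, 4.85, 1008), (324, 398, 290, 4.29, 981)),
]
OBP, SLG, BABIP, RUNS, GPA = range(5)


def pct_change(before, after):
    """Percentage change from the high-babip half to the comparison half."""
    return 100.0 * (after - before) / before


def mean_pct_change(cases, column):
    """Simple average of the per-case percentage changes in one column."""
    changes = [pct_change(high[column], other[column]) for _, high, other in cases]
    return sum(changes) / len(changes)


babip_change = mean_pct_change(CASES, BABIP)
gpa_change = mean_pct_change(CASES, GPA)

# Number of cases where runs per game rose despite the babip decline.
runs_up = sum(1 for _, high, other in CASES if other[RUNS] > high[RUNS])

# Check that the table's gpa column equals 1.8 * obp + slg, rounded.
gpa_matches = all(
    round(1.8 * half[OBP] + half[SLG]) == half[GPA]
    for _, high, other in CASES
    for half in (high, other)
)

print(f"mean babip change: {babip_change:.1f}%")
print(f"mean gpa change:   {gpa_change:.1f}%")
print(f"cases with runs/game up: {runs_up} of {len(CASES)}")
print(f"gpa column matches 1.8*obp + slg: {gpa_matches}")
```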

the pitching side is somewhat more ominous.

team  year  half   g   ra  avg  obp  slg  babip  ra/g   gpa

OPP   1989     1  86  322  239  307  342    267  3.74   895
OPP   1989     2  76  301  261  327  389    296  3.96   978

OPP   1992     1  87  310  236  311  346    269  3.56   906
OPP   1992     2  75  314  256  330  388    280  4.19   982

OPP   1995     1  69  301  250  317  385    277  4.36   956
OPP   1995     2  75  370  273  348  436    303  4.93  1062

OPP   1991     1  82  384  255  325  381    283  4.68   966
OPP   1991     2  78  350  260  323  387    292  4.49   968



here, with an average rise in babip of 6.9%, opposition gross production jumped an average of 7.2% from one half to the next, with runs allowed per game rising over 8% or 0.30 runs/game -- and this in spite of the fact that, in 1992, the second half falloff was still one of the best five babip halves!

one can imagine reasons why the offensive falloff might be less than the defensive -- it focuses on a narrower group of players (the cubs, as opposed to the rest of the nl), for one, and they may be idiosyncratic. i'll leave that speculation to the reader -- for me it's enough to note the fact of it. but i will say that a small number of trials may have opened a window of distortion.

in any case, these would be disconcerting reductions if only one side, pitching or offense, had been the beneficiary of strange luck. but that both sides, pitching and offense, have been leaves me cold with regard to this club's prospects in the second half. the cubs in the first half plated 4.55 runs a game and allowed 4.23. if this club were to experience the simple average of these cases of the post-inflated-babip blues, those figures would look in the second half more like 4.45 and 4.53 -- a pythagorean estimate of .492 baseball, or 37-38 from here to october.
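the projection can be sketched in a few lines. the adjustments are the averages quoted above (runs scored down 2.2%, runs allowed up 0.30 per game). the post does not say which pythagorean exponent it uses; 1.83 is assumed here because it reproduces both the 46-41 first-half estimate and the .492 figure, and the function and variable names are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Pythagorean expectation; the 1.83 exponent is an assumption."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


GAMES_PLAYED = 87
GAMES_LEFT = 162 - GAMES_PLAYED  # 75 games from the break to october

first_half_rs, first_half_ra = 4.55, 4.23

# Apply the average shifts from the case studies above.
projected_rs = first_half_rs * (1 - 0.022)  # runs scored fall 2.2%
projected_ra = first_half_ra + 0.30         # runs allowed rise 0.30 per game

first_half_pct = pythagorean_win_pct(first_half_rs, first_half_ra)
first_half_wins = round(first_half_pct * GAMES_PLAYED)

projected_pct = pythagorean_win_pct(projected_rs, projected_ra)
wins = round(projected_pct * GAMES_LEFT)
losses = GAMES_LEFT - wins

print(f"first half estimate: {first_half_wins}-{GAMES_PLAYED - first_half_wins}")
print(f"projected: {projected_rs:.2f} scored, {projected_ra:.2f} allowed")
print(f"second half estimate: {projected_pct:.3f}, or {wins}-{losses}")
```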

full disclosure -- i've no idea at all what is going to happen in the second half. it could well be that the lucky keep on getting lucky. it could well be that other factors, random or non-random, intervene to override whatever babip mean reversion does come. this is intended to be only part of a rational basis of expectation, and no kind of crystal ball. but it's certainly worth noting on the evidence.
