1060west: case studies in team babip extremes

i pointed out in a long, rambling diatribe the other day this little nugget:

as difficult as this may be for many to comprehend (as it is certainly not what anyone wants to believe), i think these cubs have actually been not at all unlucky to get to 44-43. it's true that their runs-based pythagorean estimate is (at 46-41) a couple games better, but that misses the point about how these runs were scored and allowed. this cubs offense posted a .306 team babip in the first half; that is the highest such figure since the first half of 1984 (.308). on the pitching side, the team benefitted from a first-half .279 babip against; that is the lowest such figure since the first half of 1992 (.269). both these clubs -- and virtually all who see such disparities in any half -- saw reversions back toward the league mean follow. moreover, they've hit quite well with men on and with runners in scoring position and pitched even better -- both being trends equally unlikely as babip to last.

hidden as it was well into a loosely "organized" rant, i thought it might be a good idea to highlight just how weirdly lucky the 2007 cubs have been through the 87 games leading up to the break, and what a break in that luck might mean.

most people who would bother to read this site know most of the basics about batting average on balls in play (babip), i suspect. there are a lot of resources on the web about it, and we don't need to repeat all that here. suffice it to hit a few high points:

exhaustive study available on the web has demonstrated that pitchers generally have little control over whether balls put in play are line drives, popups, groundballs or flyballs. hitters seem to have more control, but still not a great deal -- those who end a greater percentage of their at-bats with line drives tend to have a higher natural babip, but the human ceiling for that sustainable skill appears to be a percentage in the high-twenties. the major league average is around 18%.

in any very large typical sample encompassing the major leagues, a ball in play will fall for a hit about 29.5% of the time -- a babip of .295 -- give or take a couple tenths of a percent. the typical variation from that mean depends on how many balls in play are in the sample. as can be seen by looking at the past 20 years of cub teams by halves (simplified as pre- and post-all-star-break), extremes can lie anywhere from the .310s down to the .260s over any single half-season of baseball.

in smaller samples -- such as any individual half-season for any player -- variation can be much larger, on the order of 30% of the value. this is for two reasons. first, any player will often, for short times, hit more or fewer line drives than he will over longer timeframes; second, batted balls of all types can either fall in or be converted into outs at unusual rates. both of these factors, having no causation apparent to me and being random and transient, are what i would term "luck".
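
to put a rough number on how sample size drives that variation, here is a minimal sketch that treats every ball in play as an independent coin flip at the .295 league rate. that independence, and the sample sizes of roughly 2,400 balls in play for a team half and 200 for a player half, are assumptions for illustration and not figures from this post.

```python
import math

LEAGUE_BABIP = 0.295  # long-run league average quoted in the text


def babip_std_error(balls_in_play: int, p: float = LEAGUE_BABIP) -> float:
    """Binomial standard deviation of an observed babip over n balls in play."""
    return math.sqrt(p * (1.0 - p) / balls_in_play)


# assumed sample sizes (hypothetical round numbers, not taken from the post)
TEAM_HALF_BIP = 2400
PLAYER_HALF_BIP = 200

for label, n in (("team half-season", TEAM_HALF_BIP),
                 ("player half-season", PLAYER_HALF_BIP)):
    se = babip_std_error(n)
    print(f"{label}: n={n}, one sd = {se:.3f} "
          f"({se / LEAGUE_BABIP:.0%} of the league mean)")
```

under those assumptions one standard deviation is roughly nine points of babip for a team half and a little over thirty points for a player half. this is only the pure-chance floor; real swings in line-drive rate would add to it.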

approximating the power of this randomness is what i'm trying to address here. i don't think anyone who has watched the game for a long time would deny that randomness plays a huge, often deciding role in a sport where the worst teams are all but guaranteed to win a third of their games and the best to lose a third -- indeed, where the 2006 saint louis cardinals can win the world series. so if we acknowledge the importance of random variation, perhaps we can work to quantify aspects of it and maybe even make some generalizations about it. babip is one way we might do that, i think, though certainly not the only way.

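first, a marker of how rare this situation is. in the table below, chc rows are cub batting lines and opp rows are what cub pitching allowed; avg, obp, slg and babip are in thousandths (306 means .306), and rows are sorted from highest babip to lowest.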

team	year	half	g	r	avg	obp	slg	babip	r/g
OPP	2002	2	76	356	255	330	413	314	4.68
OPP	1999	1	85	483	294	352	485	313	5.68
OPP	1998	1	87	413	267	336	410	312	4.75
CHC	2007	1	87	396	269	327	412	306	4.55
CHC	1998	1	87	439	268	339	437	306	5.05
CHC	1997	2	75	314	267	319	400	305	4.19
CHC	1994	2	27	131	270	325	423	305	4.85
OPP	2004	2	75	320	251	325	398	305	4.27
OPP	2000	2	76	415	272	355	448	305	5.46
OPP	1999	2	77	437	277	340	447	305	5.68
OPP	1990	1	85	437	277	349	410	305	5.14
OPP	1988	2	78	356	274	328	400	305	4.56
CHC	2003	1	94	417	262	330	419	304	4.44
CHC	1995	2	75	372	272	336	445	304	4.96
OPP	1995	2	75	370	273	348	436	303	4.93
OPP	1994	2	27	123	274	333	432	303	4.56
CHC	1989	2	76	360	270	331	411	302	4.74
OPP	1997	1	87	392	267	340	433	301	4.51
OPP	1993	1	87	388	273	333	420	301	4.46
OPP	1998	2	76	379	266	336	441	300	4.99
CHC	2004	1	87	400	269	328	449	299	4.60
CHC	1993	1	87	374	274	319	411	299	4.30
OPP	2001	1	86	351	242	318	380	299	4.08
OPP	2001	2	76	350	256	324	420	299	4.61
OPP	1993	2	76	351	273	330	421	299	4.62
OPP	2003	2	68	264	239	321	370	298	3.88
OPP	1994	1	86	426	267	336	416	298	4.95
CHC	2006	2	74	359	271	322	443	297	4.85
CHC	2002	2	76	350	252	324	422	297	4.61
CHC	1999	1	85	432	266	339	445	297	5.08
CHC	1992	2	75	293	271	319	386	297	3.91
OPP	2006	2	74	386	255	344	423	297	5.22
OPP	1990	2	77	337	264	329	393	297	4.38
CHC	1998	2	76	392	261	335	430	296	5.16
OPP	2003	1	94	419	242	327	374	296	4.46
OPP	1989	2	76	301	261	327	389	296	3.96
CHC	2001	2	76	376	264	339	435	295	4.95
CHC	2005	1	87	394	270	325	447	294	4.53
CHC	1997	1	87	373	259	323	392	294	4.29
CHC	1993	2	76	364	267	332	417	294	4.79
CHC	1988	1	85	350	266	314	396	293	4.12
OPP	2002	1	86	403	251	332	401	293	4.69
OPP	1997	2	75	367	265	338	437	293	4.89
OPP	1996	2	75	374	268	339	437	293	4.99
OPP	1991	2	78	350	260	323	387	292	4.49
CHC	2006	1	88	357	265	317	404	291	4.06
CHC	1995	1	69	321	258	317	413	291	4.65
OPP	2005	1	87	394	252	328	419	291	4.53
CHC	1994	1	86	369	256	324	398	290	4.29
CHC	1990	1	85	378	269	322	411	290	4.45
CHC	1988	2	78	310	255	306	369	290	3.97
OPP	2005	2	75	320	247	321	392	290	4.27
CHC	2003	2	68	307	255	313	413	289	4.51
CHC	1999	2	77	315	248	318	392	289	4.09
OPP	2000	1	86	488	264	342	455	289	5.67
CHC	2005	2	75	309	269	323	432	288	4.12
CHC	2000	1	86	426	259	338	435	288	4.95
CHC	2000	2	76	338	252	331	383	288	4.45
CHC	2001	1	86	401	258	334	425	286	4.66
OPP	2006	1	88	448	254	340	439	286	5.09
OPP	2004	1	87	345	244	318	391	286	3.97
OPP	1988	1	85	338	256	322	367	286	3.98
CHC	1990	2	77	312	257	304	371	285	4.05
CHC	2004	2	75	389	266	328	469	283	5.19
CHC	1996	2	75	379	257	325	410	283	5.05
OPP	1996	1	87	397	253	322	410	283	4.56
OPP	1991	1	82	384	255	325	381	283	4.68
CHC	1989	1	86	342	252	309	365	281	3.98
OPP	1992	2	75	314	256	330	388	280	4.19
CHC	1996	1	87	393	246	315	393	279	4.52
OPP	2007	1	87	368	241	314	394	279	4.23
CHC	1991	1	82	363	259	315	404	277	4.43
OPP	1995	1	69	301	250	317	385	277	4.36
CHC	2002	1	86	356	240	319	404	271	4.14
OPP	1992	1	87	310	236	311	346	269	3.56
CHC	1991	2	78	332	245	303	376	267	4.26
OPP	1989	1	86	322	239	307	342	267	3.74
CHC	1992	1	87	300	239	297	344	264	3.45

as you can see, the cubs are simultaneously experiencing their best offensive half in over 20 years in terms of babip and their fourth-best defensive performance. this is a very favorable coincidence, and i've been harping for some time that it's bound to mean revert and performance will probably suffer.

what do i mean by that exactly?

on the batting side, i'm actually not so worried. not only can i imagine this club getting some better slugging going forward, but the recent case history of babip mean reversion is not very fearsome. here are the five most outrageous (by babip) offensive halves of the 19 years preceding 2007, compared to the other half of that same year. of course teams don't remain exactly the same from half to half, but they do remain very similar and so we'll compare on that basis.

team	year	half	g	rs	avg	obp	slg	babip	rs/g	gpa

CHC	2003	1	94	417	262	330	419	304	4.44	1013
CHC	2003	2	68	307	255	313	413	289	4.51	976

CHC	1998	1	87	439	268	339	437	306	5.05	1047
CHC	1998	2	76	392	261	335	430	296	5.16	1033

CHC	1997	2	75	314	267	319	400	305	4.19	974
CHC	1997	1	87	373	259	323	392	294	4.29	973

CHC	1995	2	75	372	272	336	445	304	4.96	1050
CHC	1995	1	69	321	258	317	413	291	4.65	984

CHC	1994	2	27	131	270	325	423	305	4.85	1008
CHC	1994	1	86	369	256	324	398	290	4.29	981

while in all five cases gross production average declined, it was by an average of 2.8% as compared to a 4.2% decline in babip. in three of the five cases, that was a small enough difference for other sources of random variation to actually slightly increase runs scored per game -- though, on average, run production fell 2.2%, or 0.12 runs/game.
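
the averages above can be rechecked from the five pairs of rows. this is only a sketch: it averages the five per-case percentage changes (my reading of how the figures were built, not something the post states), and the helper `gpa_column` assumes the gpa column is 1.8 times obp plus slg in thousandths, which matches every row shown.

```python
# each case: (extreme-babip half, other half of the same year), from the table above
# fields: obp, slg, babip (all in thousandths), runs scored per game
OFFENSE_CASES = {
    2003: ((330, 419, 304, 4.44), (313, 413, 289, 4.51)),
    1998: ((339, 437, 306, 5.05), (335, 430, 296, 5.16)),
    1997: ((319, 400, 305, 4.19), (323, 392, 294, 4.29)),
    1995: ((336, 445, 304, 4.96), (317, 413, 291, 4.65)),
    1994: ((325, 423, 305, 4.85), (324, 398, 290, 4.29)),
}


def gpa_column(obp: int, slg: int) -> int:
    """Assumed definition of the table's gpa column: 1.8 * obp + slg, rounded."""
    return round((18 * obp + 10 * slg) / 10)


def pct_change(before: float, after: float) -> float:
    return 100.0 * (after - before) / before


def mean(values) -> float:
    values = list(values)
    return sum(values) / len(values)


pairs = list(OFFENSE_CASES.values())
babip_change = mean(pct_change(hot[2], other[2]) for hot, other in pairs)
gpa_change = mean(pct_change(gpa_column(hot[0], hot[1]),
                             gpa_column(other[0], other[1]))
                  for hot, other in pairs)
runs_change = mean(pct_change(hot[3], other[3]) for hot, other in pairs)

print(f"babip {babip_change:+.1f}%  gpa {gpa_change:+.1f}%  runs/game {runs_change:+.1f}%")
```

computed this way from the rounded table values, babip and gpa land on the quoted 4.2% and 2.8% declines, while runs per game comes out a shade larger than the quoted 2.2%.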

the pitching side is somewhat more ominous.

team	year	half	g	ra	avg	obp	slg	babip	ra/g	gpa

OPP	1989	1	86	322	239	307	342	267	3.74	895
OPP	1989	2	76	301	261	327	389	296	3.96	978

OPP	1992	1	87	310	236	311	346	269	3.56	906
OPP	1992	2	75	314	256	330	388	280	4.19	982

OPP	1995	1	69	301	250	317	385	277	4.36	956
OPP	1995	2	75	370	273	348	436	303	4.93	1062

OPP	1991	1	82	384	255	325	381	283	4.68	966
OPP	1991	2	78	350	260	323	387	292	4.49	968

here, with an average rise in babip of 6.9%, opposition gross production jumped an average of 7.2% from one half to the next, with runs allowed per game rising over 8% or 0.30 runs/game -- and this in spite of the fact that, in 1992, the second half falloff was still one of the best five babip halves!
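
the same kind of check works for the four pitching cases. again a sketch under the assumption that the quoted figures are simple averages of the per-case changes; the values are copied from the rows above.

```python
# each case: (low-babip half, other half of the same year), from the table above
# fields: babip against (thousandths), gpa column, runs allowed per game
PITCHING_CASES = {
    1989: ((267, 895, 3.74), (296, 978, 3.96)),
    1992: ((269, 906, 3.56), (280, 982, 4.19)),
    1995: ((277, 956, 4.36), (303, 1062, 4.93)),
    1991: ((283, 966, 4.68), (292, 968, 4.49)),
}


def mean_pct_change(cases: dict, field: int) -> float:
    """Average of the per-case percentage changes for one field."""
    changes = [100.0 * (other[field] - lucky[field]) / lucky[field]
               for lucky, other in cases.values()]
    return sum(changes) / len(changes)


def mean_abs_change(cases: dict, field: int) -> float:
    """Average of the per-case raw differences for one field."""
    diffs = [other[field] - lucky[field] for lucky, other in cases.values()]
    return sum(diffs) / len(diffs)


babip_rise = mean_pct_change(PITCHING_CASES, 0)
gpa_rise = mean_pct_change(PITCHING_CASES, 1)
runs_rise_pct = mean_pct_change(PITCHING_CASES, 2)
runs_rise_abs = mean_abs_change(PITCHING_CASES, 2)

print(f"babip {babip_rise:+.1f}%  gpa {gpa_rise:+.1f}%  "
      f"runs allowed/game {runs_rise_pct:+.1f}% ({runs_rise_abs:+.2f})")
```

note that 1991 is the one case where runs allowed per game actually fell in the other half, which pulls the averages down from what the other three cases alone would show.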

one can imagine reasons why the offensive falloff might be less than the defensive -- it focuses on a narrower group of players (the cubs, as opposed to the rest of the nl), for one, and they may be idiosyncratic. i'll leave that speculation to the reader -- for me it's enough to note the fact of it. but i will say that a small number of trials may have opened a window of distortion.

in any case, these would be disconcerting reductions if only one side, pitching or offense, had been the beneficiary of strange luck. but the fact that both sides have been leaves me cold with regard to this club's prospects in the second half. the cubs in the first half plated 4.55 runs a game and allowed 4.23. if this club were to experience the simple average of these cases of the post-inflated-babip blues, those figures would look in the second half more like 4.45 and 4.53 -- a pythagorean estimate of .492 baseball, or 37-38 from here to october.
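
the arithmetic behind that projection can be sketched as follows. two things here are assumptions rather than anything the post states: the pythagorean exponent of 1.83 (a common choice that happens to reproduce both the 46-41 first-half estimate and the .492 figure; an exponent of 2 gives about .491), and the pairing of a 2.2% cut in runs scored with a flat 0.30 runs/game rise in runs allowed, which is what reproduces 4.45 and 4.53.

```python
SEASON_GAMES = 162
GAMES_PLAYED = 87


def pythagorean_pct(runs_for: float, runs_against: float,
                    exponent: float = 1.83) -> float:
    """Expected winning percentage from runs scored and allowed.

    The 1.83 exponent is an assumed value, not taken from the post.
    """
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


# first half: 396 runs scored, 368 allowed in 87 games (from the table)
first_half_wins = round(pythagorean_pct(396, 368) * GAMES_PLAYED)

# second-half projection using the average changes quoted in the text
proj_runs_for = 4.55 * (1 - 0.022)   # 2.2% decline in runs scored per game
proj_runs_against = 4.23 + 0.30      # 0.30 runs/game rise in runs allowed
proj_pct = pythagorean_pct(proj_runs_for, proj_runs_against)

games_left = SEASON_GAMES - GAMES_PLAYED
proj_wins = round(proj_pct * games_left)

print(f"first-half estimate: {first_half_wins}-{GAMES_PLAYED - first_half_wins}")
print(f"projection: {proj_runs_for:.2f} scored, {proj_runs_against:.2f} allowed, "
      f"pct {proj_pct:.3f}, record {proj_wins}-{games_left - proj_wins}")
```

the projected record is not very sensitive to the exponent: the gap between runs scored and allowed is small enough that either choice rounds to the same 75-game record.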

full disclosure -- i've no idea at all what is going to happen in the second half. it could well be that the lucky keep on getting lucky. it could well be that other factors, random or non-random, intervene to override whatever babip mean reversion does come. this is intended to be only part of a rational basis of expectation, and no kind of crystal ball. but it's certainly worth noting on the evidence.

Thursday, July 12, 2007
