Thursday, August 17, 2006

from the inbox: dusty's pitcher abuse -- a numerical study

the following is an article from our email inbox written by frequent commenter shawndgoldman -- it has also since been offered at bcb. the analysis certainly deserves maximum exposure, and this page is happy to post it here in sympathy with our long-running critique of dusty baker's pitching-use practices, as well as for your edification, comments and enjoyment. thanks to shawndgoldman for sending it our way.

After Monday night's senseless 121-pitch outing by Carlos Zambrano, many of us were enraged at Baker's pointless overworking of his "caballo". Furthermore, there is ample evidence that Baker thinks this sort of work is good for Zambrano's career. For example:

"I talked to him about being himself and doing what he's capable of doing," Baker said. "To be a horse, to be a caballo, like an iron man under adverse conditions.

"If you're going to be truly great eventually, you're going to have to be that kind of person."


To me, this stinks of Dusty throwing his horse under the bus in order to save his ass...

However, despite Dusty's usage of Zambrano this year, despite his frightening comments from Monday, and despite his horrible track record of pitcher abuse, i was still skeptical of Dusty's real effect on pitch counts, as i recalled a recent Hardball Times article that claimed Dusty's "abuse" only added up to ~3-4 pitches per start. That article, written by David Gassko, attempted to determine how many pitches per start Dusty's starters were expected to throw:

"Let's take every pitcher season beginning in 2000, 394 in total, with six seasons worth of data, and try to predict pitch counts while controlling for everything in the universe that needs to be controlled for. In this case, "everything" means hits, walks, strikeouts, league, and year. Essentially what we're asking is this: 'Given that a pitcher allowed this many hits and this many walks, struck out this many batters, played in this season, and in this league, how many pitches per start would we expect him to throw?'"


More rigorously, he used the following methodology:

"I used an ordinary least-squares regression with Pitches/Start as my dependent variable, and Year, Hits/BFP, BB/BFP, K/BFP, NL, and 'Baker' as my independent variables. The Years were there essentially as constants. Hits had a negative relationship with Pitches/Start, as you might expect, so the more hits a pitcher allows per plate appearance, the more likely he is to be pulled early. Walks and Strikeouts both had a positive relationship, though the coefficient for walks was somewhat unexpected. The most likely explanation is a combination of the following three things: (1) Walks have a positive correlation with Ks, and high-K pitchers will generally be the ones who stay in for longest, (2) It takes a lot of pitches to walk a batter, and (3) A walk are not as costly as a hit, so a high-BB pitcher can still be good. The results of my regression are listed below. All estimates were significant at the 1% level."

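For readers who want to see the shape of that regression, here is a minimal sketch in Python with NumPy. It is not Gassko's code or data: the pitcher seasons are randomly generated, the planted coefficients (including a 4-pitch "Baker" effect) are arbitrary, the year dummies are left out for brevity, and the name fit_pitches_per_start is made up for illustration.

```python
import numpy as np


def fit_pitches_per_start(hits, walks, strikeouts, is_nl, is_baker, pitches_per_start):
    """Ordinary least squares with an intercept; returns {variable name: coefficient}."""
    n = len(pitches_per_start)
    # Design matrix: a column of ones (intercept) plus one column per variable.
    X = np.column_stack([np.ones(n), hits, walks, strikeouts, is_nl, is_baker])
    coef, *_ = np.linalg.lstsq(X, pitches_per_start, rcond=None)
    names = ["intercept", "H/BFP", "BB/BFP", "K/BFP", "NL", "Baker"]
    return dict(zip(names, coef))


# Synthetic pitcher seasons (NOT real data), generated with a fixed seed.
rng = np.random.default_rng(0)
n = 394
hits = rng.uniform(0.20, 0.28, n)          # hits per batter faced
walks = rng.uniform(0.05, 0.12, n)         # walks per batter faced
strikeouts = rng.uniform(0.10, 0.28, n)    # strikeouts per batter faced
is_nl = rng.integers(0, 2, n).astype(float)       # 1.0 if National League
is_baker = (rng.random(n) < 0.10).astype(float)   # 1.0 if "managed by Baker"

# Arbitrary planted relationship, so we can check the fit recovers it.
PLANTED_BAKER_EFFECT = 4.0
pitches_per_start = (
    95.0
    - 60.0 * hits
    + 20.0 * walks
    + 30.0 * strikeouts
    + 1.0 * is_nl
    + PLANTED_BAKER_EFFECT * is_baker
    + rng.normal(0.0, 0.5, n)              # random noise
)

coefs = fit_pitches_per_start(hits, walks, strikeouts, is_nl, is_baker, pitches_per_start)
for name, value in coefs.items():
    print(f"{name:>9}: {value:8.3f}")
```

The coefficient on the "Baker" column is the regression's estimate of how many extra pitches per start go with having that manager, holding the other variables fixed. In this toy data it should land close to the planted value.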

The results? Gassko's model predicted the "Dusty" variable was worth 3.67 pitches/start. That's not a horrible number, but it is significant. However, i had some concerns about his analysis. Specifically, it seems to be answering the wrong question. The question Gassko answered was basically "On average, how many extra pitches per outing are the result of Dusty being the manager?" I think a more appropriate question would be "For Dusty's stable of 'caballos,' how many extra pitches per outing are the result of Dusty being the manager?"

Fortunately, it is fairly simple to reproduce Gassko's method. In this case, we'll fit a least-squares regression model for every season of Dusty's tenure with the Cubs (2003-2006), using the same variables Gassko used, except that we will exclude Dusty as a variable. We can then compare the expected number of pitches per start for Dusty-managed starting pitchers to the actual number of pitches per start they threw.

Here are the results. (GS = games started, P = total pitches, P/GS = pitches per start, Exp. P/GS = model prediction for P/GS, "Dusty effect" = P/GS - Exp. P/GS.) The important number is the "Dusty effect," which is a measure of the pitches Dusty "added" to an average start by a particular pitcher over the course of a given season. Positive numbers mean the pitcher threw more than would be expected given his performance that season; negative numbers mean the opposite.


Year       Last Name  First Name   GS      P    P/GS  Exp. P/GS  "Dusty effect"
2006       Zambrano   Carlos       26   2913  112.04    100.059          11.979
2006       Marmol     Carlos       12   1174   97.83     91.621           6.212
2006       Hill       Rich          8    736   92.00     89.794           2.206
2006       Prior      Mark          9    835   92.78     93.398          -0.620
2006       Mateo      Juan          1     88   88.00     85.082           2.918
2006       Maddux     Greg         25   2104   84.16     94.259         -10.099
2006       Marshall   Sean         19   1665   87.63     92.944          -5.313
2006       Guzman     Angel         5    446   89.20     92.243          -3.043
2006       Rusch      Glendon       9    752   83.56     93.083          -9.527
2006       Wood       Kerry         4    317   79.25     94.405         -15.155
2006       Williams   Jerome        2    147   73.50     81.295          -7.795
2006       Ryu        Jae Kuk       1     28   28.00     73.591         -45.591
2005       Zambrano   Carlos       33   3558  107.82    101.136           6.682
2005       Rusch      Glendon      19   1873   98.58     92.776           5.803
2005       Prior      Mark         27   2827  104.70    103.621           1.083
2005       Mitre      Sergio        7    680   97.14     93.077           4.066
2005       Williams   Jerome       20   1890   94.50     92.750           1.750
2005       Koronka    John          3    284   94.67    103.445          -8.778
2005       Dempster   Ryan          6    593   98.83     97.470           1.363
2005       Maddux     Greg         35   3099   88.54     95.416          -6.873
2005       Wood       Kerry        10    880   88.00    104.668         -16.668
2005       Hill       Rich          4    323   80.75     92.168         -11.418
2005       Leicester  Jon           1     59   59.00     84.859         -25.859
2004       Zambrano   Carlos       31   3471  111.97     99.292          12.676
2004       Rusch      Glendon      16   1609  100.56     95.902           4.661
2004       Wood       Kerry        22   2221  100.95    100.357           0.598
2004       Clement    Matt         30   2992   99.73    100.511          -0.778
2004       Prior      Mark         21   2061   98.14    101.200          -3.057
2004       Maddux     Greg         33   2925   88.64     96.904          -8.268
2004       Mitre      Sergio        9    764   84.89     89.005          -4.116
2003       Prior      Mark         30   3391  113.03    104.561           8.473
2003       Zambrano   Carlos       32   3396  106.13     95.033          11.092
2003       Wood       Kerry        32   3540  110.63    104.392           6.233
2003       Clement    Matt         32   3142   98.19     97.536           0.651
2003       Estes      Shawn        28   2591   92.54     87.428           5.108
2003       Cruz       Juan          6    571   95.17     96.335          -1.169
2003       Mitre      Sergio        2    136   68.00     77.782          -9.782
2003-2006  Total                   610  60081   98.49     97.312           1.182
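
The "Dusty effect" column is simple arithmetic on the other columns, and the bottom-line total is an average weighted by games started. Here is a small Python sketch of both calculations, using three of the 2006 rows from the table above. The Season class and the weighted_effect helper are names invented for this illustration, and the expected pitches per start are copied from the table rather than recomputed from a regression.

```python
from dataclasses import dataclass


@dataclass
class Season:
    year: int
    pitcher: str
    games_started: int
    pitches: int
    expected_per_start: float  # model prediction, copied from the table

    @property
    def pitches_per_start(self) -> float:
        return self.pitches / self.games_started

    @property
    def dusty_effect(self) -> float:
        # Actual minus expected: positive means more pitches than predicted.
        return self.pitches_per_start - self.expected_per_start


def weighted_effect(seasons) -> float:
    """Average effect per start, weighting each pitcher by games started."""
    total_starts = sum(s.games_started for s in seasons)
    return sum(s.dusty_effect * s.games_started for s in seasons) / total_starts


rows = [
    Season(2006, "Zambrano", 26, 2913, 100.059),
    Season(2006, "Maddux", 25, 2104, 94.259),
    Season(2006, "Marshall", 19, 1665, 92.944),
]

for s in rows:
    print(f"{s.year} {s.pitcher:<9} {s.pitches_per_start:7.2f} {s.dusty_effect:+8.3f}")
print(f"weighted average effect: {weighted_effect(rows):+.3f}")
```

For these three rows the weighted average comes out slightly negative even though Zambrano alone is close to +12, which shows how a staff-wide average can bury a large effect on one pitcher.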


What can we glean from this data? Well, there does seem to be a fairly consistent pattern: every year, Dusty picks a horse or two (cough, Z, cough), whom he overworks significantly, while letting the other members of the staff take on a lighter load. This keeps the average "Dusty effect" under 2 pitches per start even though Dusty is really abusing his best pitchers. Remember, this data is already corrected for performance. In other words, although one would expect Mark Prior, Kerry Wood, and Carlos Zambrano to throw more pitches due to their effectiveness, one would not expect their load to increase nearly as much as it does under Baker. Conversely, the pitchers at the back of Baker's rotations have pitched less than one would expect, even when their sub-standard performances are taken into account.

The numbers for Zambrano are particularly disconcerting. It seems that, with the lone exception of a "gentle" 2005, Dusty has left Zambrano in for 11-13 pitches more per start than one would predict given Zambrano's performance in those seasons. That's a lot. Keep in mind that those extra pitches at the end of an outing are the ones that will do the most damage to Zambrano's arm. (For those of you wondering how pitcher abuse points are calculated, it's the number of pitches beyond 100 in an outing, cubed, and then summed over every start.) If Zambrano were to throw 10 fewer pitches per outing, his pitcher abuse points would be way down and i for one would be far less concerned about his future health.

It may be that Zambrano's arm truly is indestructible, and that it can take year after year of overuse. However, a lost season in which Zambrano's team has an atomically-small chance of making the playoffs would be a good time to rest his arm for the future, not to test the hypothesis that it is made of adamantium.
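
To see how much ten pitches matter under that formula, here is a quick Python sketch of pitcher abuse points as described above: pitches beyond 100 in a start, cubed, then summed over starts. The function name is made up, and the pitch counts are hypothetical except for the 121, which is Monday's outing.

```python
def pitcher_abuse_points(pitch_counts, threshold=100):
    """Sum of (pitches beyond the threshold) cubed over all starts."""
    return sum(max(0, n - threshold) ** 3 for n in pitch_counts)


# Hypothetical run of four starts (only the 121 comes from the article).
actual = [121, 112, 98, 115]
# The same starts with the pitcher pulled ten pitches earlier each time.
rested = [n - 10 for n in actual]

print(pitcher_abuse_points(actual))  # 21**3 + 12**3 + 0 + 15**3 = 14364
print(pitcher_abuse_points(rested))  # 11**3 + 2**3 + 0 + 5**3 = 1464
```

Because the overage is cubed, trimming ten pitches from each start cuts the total by roughly a factor of ten in this example, and the single 121-pitch outing drops from 9261 points to 1331.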


for what it's worth, this writer couldn't agree more.
