Friday, May 11, 2007

trend spotting

in a recent post, this page laid out the case for measurable trends in run differential. today this writer would like to present some preliminary work regarding measuring and spotting changes in trend.

quantitative analytics for forecasting fractal patterns into the future is still a very primitive art. as is so often the case in humankind, the bulk of the work has been directed to profit, and virtually everything to be applied in this post can be found in any primer on financial quantitative methods. at this stage particularly, the intention is to keep it simple.

it should go without saying that forecasting the future is an error-prone business. the average person in this society is hopelessly manichaean, totally incapable of assessing odds as opposed to thinking in dire absolutes, and faced constantly with the prospect of reducing beyond utility or merely discarding many of life's important (and not-so-important) complexities. many will see what is about to be presented as so much sound and fury signifying nothing. nor would one hope to teach such people much about how to trade in markets, or quantitatively manage most any significant human enterprise. so be it. but for those with a capacity and willingness to assess not in black and white but a multitude of grays, what follows will hopefully be engaging if not enlightening.

these four charts represent the last four years' pythagorean records over three timeframes -- year-to-date in dark gray, prior 26 games in orange, prior 12 games in yellow. also recorded is the actual winning percentage of the club in a lighter gray.

overlaid on a second axis is a moving average convergence-divergence line (macd), which quantifies the difference between the 12-game and 26-game net run differential.
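for readers who would like to reproduce the line, here is a minimal sketch of one way to compute it. the post does not specify the averaging method, so this assumes simple (unweighted) trailing averages of per-game net run differential; the function names and the list-of-run-differentials input are hypothetical, not taken from the charts' underlying spreadsheet.

```python
def rolling_mean(values, window):
    """Trailing mean over `window` games; None until enough games have been played."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)
        else:
            chunk = values[i + 1 - window : i + 1]
            out.append(sum(chunk) / window)
    return out


def macd_line(run_diffs, fast=12, slow=26):
    """Difference between the fast and slow trailing averages of run differential.

    Assumes simple moving averages and fast <= slow; financial macd usually
    uses exponential averages, which may or may not be what the charts use.
    """
    fast_avg = rolling_mean(run_diffs, fast)
    slow_avg = rolling_mean(run_diffs, slow)
    return [None if s is None else f - s for f, s in zip(fast_avg, slow_avg)]


# illustrative input only: 14 even games followed by 12 games won by two runs each
sample = [0] * 14 + [2] * 12
line = macd_line(sample)
```

under these assumptions the line is undefined until game 26, which is why nothing can be read from it in the first month of a season.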

what can we learn from these charts?

first, in comparing actual winning percentage against the year-to-date pythagorean estimate across all four charts, one can begin to see that the two first start to materially converge -- in 2003, in 2005, in 2006 -- at a point beyond 60 games played. the apparent exception would at first seem to be 2004, where a gap persisted between the two records for the duration of the season -- but a finer interpretation notes that the gap in fact largely stabilizes from the 60-game area onward. indeed, in 2004, the team -- much as it has to date this year -- significantly underperformed its pythagorean record early in the year. on june 1 of that year the cubs stood 27-24 in spite of having scored 247 and allowed 206, yielding a pythagorean estimate of .582 or 30-21. the club never regained those three games against its estimate, and in fact finished the year five under.
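the june 1, 2004 arithmetic can be sketched as follows. the exponent is an assumption: the post does not state which one it uses, and 1.83 is chosen here only because it reproduces the .582 figure from 247 scored and 206 allowed (the classic exponent of 2 would give roughly .590). the function names are hypothetical.

```python
def pythagorean_pct(runs_scored, runs_allowed, exponent=1.83):
    """Estimated winning percentage from runs scored and allowed.

    exponent=1.83 is an assumption, not something stated in the post.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games, exponent=1.83):
    """Pythagorean estimate expressed as whole wins over `games` played."""
    return round(pythagorean_pct(runs_scored, runs_allowed, exponent) * games)


# cubs through june 1, 2004: 27-24 actual, 247 scored, 206 allowed
pct_2004 = pythagorean_pct(247, 206)
wins_2004 = expected_wins(247, 206, games=51)
shortfall = wins_2004 - 27  # games under the estimate
```

with these inputs the estimate works out to 30 wins in 51 games, three more than the 27 the club actually had.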

this confirms two things for us, neither particularly hopeful for the 2007 cubs. the first is that holding a pythagorean estimate of .607 at this juncture (150 scored and 118 allowed) is not as indicative of future success as might be imagined. the variance of a sample of this size is still too great -- though it is certainly a great contribution to the sample of the first 60 games, which may tell us something far more significant. the second lesson -- taken from the 2004 club -- is that lost opportunities are lost. that the cubs have put on a brilliant run of outscoring the opposition must be tempered by the knowledge that they've come out of it at a disappointing 16-16. they won't be handed back the three games by which they've underperformed their estimate, except through the unlikely action of some future spurt of good luck to cancel the bad they've suffered.

but what more particularly might be said about the trends in the data?

it should be noted that crossover points are the points where the 12-game and 26-game run differentials exchange positions, superior to inferior -- at the moment the two are equal, the macd line is zero. what is the subsequent behavior of the estimated records following such crossover points?

in general, and with exceptions, it can be seen that when the macd line crosses into positive territory and the 12-game record exceeds the 26-game, the subsequent behavior of the 26-game record is to improve. it can also be noted that the 12-game record, although it naturally anticipates the 26-game, also tends to continue trending higher after the fact -- usually for a period of some 10-20 games. this supports the earlier assertion that, in general, run differential behaves with a self-affirming trend -- if it has increased in the previous interval, its tendency is to continue increasing in the next. the opposite behavior is also generally observed.

so one might use these crossover points as a means of anticipating positive performance for the club over the subsequent set of games, somewhere between 10 and 20 games in duration.
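spotting those crossover points mechanically is simple enough. the following is a minimal sketch, assuming the macd line is available as a list of per-game readings with None where it is not yet defined; the function name and the "up"/"down" labels are hypothetical conveniences.

```python
def find_crossovers(macd):
    """Return (game_index, direction) for each change of sign in the macd line.

    Readings of exactly zero or None are skipped, so a line that merely
    touches zero and turns back is not counted as a crossover.
    """
    events = []
    last_sign = 0
    for i, value in enumerate(macd):
        if value is None or value == 0:
            continue
        sign = 1 if value > 0 else -1
        if last_sign != 0 and sign != last_sign:
            events.append((i, "up" if sign > 0 else "down"))
        last_sign = sign
    return events


# illustrative readings only, not taken from the charts
readings = [None, -0.5, -0.1, 0.3, 0.6, -0.2]
crossings = find_crossovers(readings)
```

an "up" event marks the game at which the 12-game differential moves above the 26-game; by the observation above, the 10 to 20 games after it would be the window of interest.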

another observed feature of the behavior of the macd line over the four years is its tendency to reverse from extremes -- that is, when the gap between the 12-game and 26-game run differential grows very large, it often indicates a reversal in trend to be imminent.

this is perhaps most clearly seen in 2006, when the macd line was twice driven down to or below (-2.0), both times presaging a reversal in play which eventually saw the club up to a high-water mark 12-game estimate of .600+. the opposite again is observed as well -- from maximum values in 2006 over +1.5, play reverted for the worse, driving down both the pythagorean estimates and the actual winning percentage. similar reversals from extrema are seen in all four years -- generally from an absolute macd reading in excess of 1.5.
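a rough screen for such extremes might look like the sketch below. the 1.5 threshold comes from the observation above; treating it as a strict cutoff, and the function name itself, are assumptions made only for illustration.

```python
def find_extremes(macd, threshold=1.5):
    """Return (game_index, reading) wherever the absolute macd reading exceeds threshold.

    None entries (games before the line is defined) are ignored.
    """
    return [
        (i, value)
        for i, value in enumerate(macd)
        if value is not None and abs(value) > threshold
    ]


# illustrative readings only, not taken from the charts
readings = [None, 0.4, 1.6, -2.0, 1.5]
flagged = find_extremes(readings)
```

a flagged reading is not a signal in itself -- it only marks a stretch where, on the evidence of these four seasons, a reversal has tended to follow.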

this introductory work leaves much still to be desired, and it would, one hopes, go without saying (at least in more thoughtful circles) that forecasting is not prophecy. but it is at least a basis for future endeavor, and perhaps also a rudimentary tool to be applied to the current season.

the youth of the year is no longer a hindrance in observing 26-game samples, and so one is now able to see that the macd line is both positive and not at an extreme, and that the 12-game estimated record is superior to the 26-game. while not a prophecy of dominance headed into a difficult seven-game east coast swing, these must be considered positive indications going forward, suggesting that the improving trend of pythagorean record has potential distance left to run.

also of interest may be the chart of another nearby ballclub.

this chart suggests that the brewers' amazing 24-10 start, which has put the cubs a season-high (or -low, really) seven games back headed into today's games, may be coming to a head. it may be viewed as hopeful that the macd line here has hit a local maximum of 1.49 -- a typically dangerous level -- and a decline from this point is the kind of event that commonly initiates a decline to and through a crossover point.

the future remains unknowable, of course, but the conclusion of this view must be that the cubs may be on the cusp of an opportunity to close at least some of the very wide gap that sits between them and the division-leading brewer ballclub.

