As pointed out in Practicing Sabermetrics, one of the major characteristics of baseball is that the dimensions of a field may vary from one ballpark to the next. Furthermore, these dimensions (including the size of the foul territory, the height of the outfield walls, the distance from the infield to the fences and others) may affect whether a park is more conducive to offensive or defensive play.
Park Factors (PF’s) measure the effect the dimensions of a field have on the performance of a particular team by comparing the stats of the team at home vs. the stats of the team on the road.
While there are multiple formulas out there for computing PF’s (some of them involving several complex metrics), here I’ll only be showing you how to code what I believe is the most basic formula for this metric. Please note that a PF higher than 1 favors the hitter, a PF below 1 favors the pitcher and a PF equal to 1 means the park is neutral.
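For reference, a basic form of this metric that is commonly cited divides the runs per game at home (scored plus allowed) by the same rate on the road. The sketch below assumes that form, which may differ in detail from the formula used in this post; the function name park_factor and its argument names are hypothetical.

```python
def park_factor(home_rs, home_ra, home_games, road_rs, road_ra, road_games):
    """Basic park factor (assumed form): home run rate over road run rate.

    home_rs / home_ra: runs scored / allowed by the team at home
    road_rs / road_ra: runs scored / allowed by the team on the road
    """
    home_rate = (home_rs + home_ra) / home_games  # total runs per home game
    road_rate = (road_rs + road_ra) / road_games  # total runs per road game
    return home_rate / road_rate


# Illustrative numbers only, not real team totals.
pf = park_factor(400, 410, 81, 350, 379, 81)
print(round(pf, 3))
```

A value above 1 means more runs were scored in the team's home games than in its road games, which is the hitter-friendly case described above.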
The first step in calculating Park Factors with the above formula is to collect the runs scored by the MLB teams. That said, the f_runs method loads the Retrosheet game files for 1974-2014 and gets the runs scored by every team as the home team or as the visitor.
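To make the collection step concrete, here is a minimal sketch of the kind of aggregation described above. It is not the f_runs code itself: it assumes each game has already been reduced to a (year, visitor, home, visitor_runs, home_runs) tuple, and the name collect_runs and the tuple layout are hypothetical.

```python
from collections import defaultdict


def collect_runs(games):
    """Sum runs scored/allowed and games played per (year, team, venue).

    games: iterable of (year, visitor, home, visitor_runs, home_runs).
    venue is "home" or "road".
    """
    totals = defaultdict(lambda: {"scored": 0, "allowed": 0, "games": 0})
    for year, visitor, home, visitor_runs, home_runs in games:
        # Credit the game to the home team's "home" split.
        h = totals[(year, home, "home")]
        h["scored"] += home_runs
        h["allowed"] += visitor_runs
        h["games"] += 1
        # Credit the same game to the visitor's "road" split.
        v = totals[(year, visitor, "road")]
        v["scored"] += visitor_runs
        v["allowed"] += home_runs
        v["games"] += 1
    return dict(totals)


# Two made-up games, for illustration only.
sample = [(2014, "SFN", "COL", 3, 7), (2014, "COL", "SFN", 2, 4)]
for key, value in sorted(collect_runs(sample).items()):
    print(key, value)
```

Keeping the home and road splits under separate keys means the later park factor step only has to look up two entries per team and season.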
The f_park_factor method, in turn, gets the park factors for every field using n historical seasons. For example, if p_hist is equal to 5 and p_year is equal to 2014, the f_park_factor function will aggregate the runs scored in a field during the 2014, 2013, 2012, 2011, and 2010 seasons before computing the PF’s.
Moreover, this piece of code computes a year column to let the user know how many years were used to compute a specific PF, because not every field has complete historical data for a given p_hist value.
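The windowing and the year count can be sketched as follows. This is an illustration under stated assumptions, not the f_park_factor implementation: it keys parks by home team, assumes the basic home-rate over road-rate formula, and takes per-season totals as (year, team, home_runs, home_games, road_runs, road_games) tuples where the run totals are runs scored plus runs allowed. Only p_year and p_hist come from the text; every other name is hypothetical.

```python
from collections import defaultdict


def park_factors_window(season_totals, p_year, p_hist):
    """Aggregate the last p_hist seasons ending at p_year, then compute PF's.

    Returns {team: {"pf": float, "years": int}}, where "years" is the number
    of seasons that actually contributed data for that team.
    """
    first_year = p_year - p_hist + 1  # e.g. 2014 and 5 -> 2010
    agg = defaultdict(lambda: {"home_runs": 0, "home_games": 0,
                               "road_runs": 0, "road_games": 0,
                               "seasons": set()})
    for year, team, home_runs, home_games, road_runs, road_games in season_totals:
        if first_year <= year <= p_year:
            a = agg[team]
            a["home_runs"] += home_runs
            a["home_games"] += home_games
            a["road_runs"] += road_runs
            a["road_games"] += road_games
            a["seasons"].add(year)

    result = {}
    for team, a in agg.items():
        # Skip teams whose rates cannot be computed.
        if a["home_games"] == 0 or a["road_games"] == 0 or a["road_runs"] == 0:
            continue
        home_rate = a["home_runs"] / a["home_games"]
        road_rate = a["road_runs"] / a["road_games"]
        result[team] = {"pf": home_rate / road_rate, "years": len(a["seasons"])}
    return result


# Made-up totals, for illustration only.
totals = [
    (2008, "COL", 880, 81, 720, 81),  # outside the 2010-2014 window
    (2013, "COL", 900, 81, 700, 81),
    (2014, "COL", 950, 81, 650, 81),
    (2014, "MIA", 600, 81, 660, 81),  # only one season available
]
for team, row in sorted(park_factors_window(totals, p_year=2014, p_hist=5).items()):
    print(team, round(row["pf"], 3), row["years"])
```

Reporting the number of contributing seasons next to each PF makes it easy to spot values that rest on fewer seasons than p_hist asked for.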