# Baseball SeRies: Getting Games Played from Retrosheet with R.

A few days ago I was skimming the Win Shares book by Bill James and Jim Henzler when step 11, on page 14, gave me pause:

11. For fielders,  give the player one Win Share for every 25 games at catcher, one for every 76 games at first base, one for every 28 games at second base, one for every 38 games at third base, one for 25 games at shortstop and one for every 48 games in the outfield.

You may be asking yourself, "What is so disturbing about this?" Well, think about Hall of Famer Johnny Bench for a minute. It is well known that Johnny was an illustrious catcher who played for the Cincinnati Reds from 1967 to 1983, but few remember that during his years with the Big Red Machine he also played first base, third base and even left field. So the million-dollar question is: how do you figure out the number of games that Johnny played at each of these positions?

Assuming that a standard game lasts 27 defensive outs per team, we can estimate the number of games that Bench played at a given position by dividing the total number of outs recorded while he was at that position by 27 and keeping the integer quotient:

G = Quotient( O / 27 )

From this formula we can deduce that the Little General played 136 games as catcher in the course of the 1970 season.
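
As a quick illustration of the formula in R, here is a minimal sketch. The function name `games_from_outs` and the outs total are made up for the example; they are not taken from Retrosheet.

```r
# Hypothetical helper: full games from a total of defensive outs,
# assuming a standard game of 27 outs per team.
games_from_outs <- function(outs) {
  outs %/% 27  # integer quotient
}

# Made-up total, for illustration only
games_from_outs(1019)  # 37
```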

## Innings & Outs

It is worth mentioning that when you calculate Games Played you will usually be left with some spare outs, namely Remainder( O / 27 ). You can turn these spare outs into Innings and Remaining Outs with the formulas below, where O is still the total number of outs:

I = Quotient( Remainder( O / 27 ) / 3 )

Remaining Outs = Remainder( Remainder( O / 27 ) / 3 )

Note that all the formulas in this post can be simplified further; I have left them in this form because they are easy to follow.
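
The split into games, innings and remaining outs can be sketched in R as follows. The helper name `split_outs` and the outs total are hypothetical, chosen only for illustration. The last line checks one possible simplification: because 27 is a multiple of 3, the remaining outs can be taken straight from the total.

```r
# Hypothetical helper: split total outs into games, innings and leftover outs
split_outs <- function(outs) {
  spare <- outs %% 27   # outs left after removing full 27-out games
  c( G = outs %/% 27    # full games
   , I = spare %/% 3    # full innings within the spare outs
   , O = spare %% 3 )   # outs left after the full innings
}

# Made-up total, for illustration only
split_outs(1019)  # G = 37, I = 6, O = 2

# Simplification check: (outs %% 27) %% 3 equals outs %% 3
all( ((0:500) %% 27) %% 3 == (0:500) %% 3 )  # TRUE
```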

## The Code

For this exercise, I'll be using Retrosheet's 1970 event and game files, which you can download from the Retrosheet website. Assuming that the game and event files are located in the current R working directory, we are ready to dive into the code.

```r
library( package = 'dplyr' )
library( package = 'data.table' )
library( package = 'reshape2' )

# Column names
l_e_names <- c( 'GAME_ID', 'CATCHER', 'FIRST_BASEMAN', 'SECOND_BASEMAN'
              , 'THIRD_BASEMAN', 'SHORTSTOP', 'LEFT_FIELDER', 'CENTER_FIELDER'
              , 'RIGHT_FIELDER', 'EVENT_OUTS_CT'
              , 'TEAM_ID' )

# Column types ('NULL' tells fread to skip that column)
l_e_cols <- c( 'character'
             , rep( x = 'NULL', times = 17 )
             , rep( x = 'character', times = 8 )
             , rep( x = 'NULL', times = 14 )
             , 'numeric'
             , rep( x = 'NULL', times = 58 )
             , 'character'
             , rep( x = 'NULL', times = 58 ) )

# Push the 1970 season into the environment.
d_e_1970 <- fread( input = 'all1970.csv'
                 , sep = ','
                 , colClasses = l_e_cols
                 , col.names = l_e_names )

# Transform the dataset:
# IDs: GAME_ID, OUTS & FIELDING TEAM
# Measures: All fielding positions, except for pitcher
d_r_1970 <- melt( data = d_e_1970
                , id.vars = l_e_names[ c( 1, 10, 11 ) ]
                , measure.vars = l_e_names[ -c( 1, 10, 11 ) ]
                , variable.name = 'POSITION'
                , value.name = 'PLAYER_ID'
                , variable.factor = FALSE )

# Get Games, Innings and Outs per player & team
d_p_games <- group_by( .data = d_r_1970, TEAM_ID, PLAYER_ID, POSITION ) %>%
  summarise( T_O = sum( x = EVENT_OUTS_CT, na.rm = TRUE ) ) %>%
  mutate( G = T_O %/% 27
        , I = ( T_O %% 27 ) %/% 3
        , O = ( T_O %% 27 ) %% 3
        , TEXT = paste( G, 'game(s),'
                      , I, 'inning(s) and'
                      , O, 'out(s)'
                      , sep = ' ' ) )
```
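
To see what the reshaping step does, here is a small self-contained sketch on made-up data. The player IDs, team code and out counts are all hypothetical, and only two fielding columns are used. To keep it to a single package, the totals are computed with data.table's own grouping instead of the dplyr pipeline used in the post's code.

```r
library( data.table )

# Made-up events from a single game; IDs are hypothetical
toy <- data.table( GAME_ID       = 'G1'
                 , CATCHER       = c( 'playA', 'playA', 'playB' )
                 , FIRST_BASEMAN = c( 'playC', 'playC', 'playA' )
                 , EVENT_OUTS_CT = c( 1, 2, 1 )
                 , TEAM_ID       = 'XXX' )

# Wide to long: one row per event and fielding position
toy_long <- data.table::melt( data = toy
                            , id.vars = c( 'GAME_ID', 'EVENT_OUTS_CT', 'TEAM_ID' )
                            , measure.vars = c( 'CATCHER', 'FIRST_BASEMAN' )
                            , variable.name = 'POSITION'
                            , value.name = 'PLAYER_ID'
                            , variable.factor = FALSE )

# Total outs per team, player and position
toy_outs <- toy_long[ , .( T_O = sum( EVENT_OUTS_CT ) )
                    , by = .( TEAM_ID, PLAYER_ID, POSITION ) ]
print( toy_outs )
```

Each of the three events becomes two rows after melting, one per position, so a player who moves from catcher to first base within the same game accumulates outs separately at each position.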