Baseball SeRies: Getting Innings Pitched from Retrosheet with R.

Innings Pitched (IP) is one of the most important statistics used in baseball to measure a pitcher's performance. As its name suggests, IP represents the total number of innings a pitcher has pitched, whether in a single game, over a season, during his time with a team or throughout his career. The stat can also be aggregated at the team, league or season level.

Because a half-inning consists of three outs, each out represents 1/3 of a half-inning. The IP metric can therefore be calculated for any pitcher by converting the total number of batters and baserunners who were put out while he was on the mound into innings. With that in mind, IP can be written mathematically as:

IP = Quotient( O / 3 ) + Remainder( O / 3 ) / 10

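Where O is the total number of outs recorded, Quotient is the number of complete innings pitched and Remainder is the number of spare outs (0, 1 or 2), which is written after the decimal point. Confusing? I'm sure it is, so let's dive into some examples so that you can understand this better: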

  • On June 29th, 1990, Fernando «El Toro» Valenzuela (LAD) and Dave Stewart (OAK) each pitched a no-hitter. Each of them recorded 27 outs that day; in other words, both pitched 9 innings.
  • In that same year, the Cincinnati Reds managed to put out 4369 offensive players. They became World Champions after playing only 1456.1 defensive innings.
  • 112691 baserunners and batters were sent back to the dugout during the 1990 season. In total, 37563.2 innings were pitched during that year.
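
The conversion used in these examples can be sketched in a few lines of base R. This is only an illustration of the arithmetic, and the function name outs_to_ip is a hypothetical helper introduced here, not part of any package:

```r
# Convert a count of outs into baseball's innings-pitched notation.
# The digit after the decimal point is the number of spare outs (0, 1 or 2),
# not a decimal fraction of an inning.
outs_to_ip <- function(outs) {
  full_innings <- outs %/% 3   # Quotient( O / 3 ): complete innings
  spare_outs   <- outs %% 3    # Remainder( O / 3 ): leftover outs
  full_innings + spare_outs / 10
}

# The three examples above: a 27-out game, the 1990 Reds, the 1990 season
ip_examples <- outs_to_ip(c(27, 4369, 112691))
print(ip_examples)
```

Called with 27, 4369 and 112691 outs, the function should give back the 9, 1456.1 and 37563.2 innings quoted in the examples.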

The Code

Throughout this series I'll be mining several baseball data banks (e.g., Retrosheet and the Lahman Database), but whatever the data source is, I will always use the data.table and dplyr packages for the sake of simplicity, readability and execution performance.

For this exercise, I'll be using Retrosheet's 1990 event and game files, which you can download from here. Assuming that the game and event files are located in the current R working directory and that the data.table and dplyr packages are already installed, we are ready to look into the code:
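
As a minimal sketch of the core aggregation step only, assume the event file has already been parsed into a table with one row per play. The column names pit_id and event_outs, as well as the toy rows below, are hypothetical stand-ins used for illustration and not Retrosheet's actual layout or data:

```r
library(data.table)

# Toy stand-in for parsed event data: one row per play.
# pit_id     = pitcher on the mound for that play (hypothetical name)
# event_outs = outs recorded on that play (hypothetical name)
events <- data.table(
  pit_id     = c("pitcherA", "pitcherA", "pitcherB",
                 "pitcherB", "pitcherA", "pitcherA"),
  event_outs = c(1L, 2L, 0L, 1L, 1L, 1L)
)

# Sum the outs recorded while each pitcher was on the mound
ip_by_pitcher <- events[, .(outs = sum(event_outs)), by = pit_id]

# Convert outs to innings-pitched notation: full innings plus spare outs / 10
ip_by_pitcher[, ip := outs %/% 3 + (outs %% 3) / 10]

print(ip_by_pitcher)
```

Grouping by a team or season column instead of pit_id would give the team-level or season-level totals in the same way.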
