Baseball SeRies: Using maps to plot Baseball Data in R.

A few days ago I was wondering how I could plot some data on a map using R. Luckily, I found some good information on Google and Stack Overflow. I combined what I learned there with some baseball data, and now I’m showing you the results!

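First of all, make sure you have the required packages to plot the map. Besides the three loaded here, the code in this post calls stringi and dplyr through the :: operator, and coord_map() relies on mapproj, so those need to be installed too: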

library( package = "ggplot2" )
library( package = "rgeos" )
library( package = "maptools" )
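
If you are not sure which of these packages are already on your machine, a quick check along these lines can help. This is only a sketch: the package list is gathered from the code in this post, and the names required, installed and missing_pkgs are just illustrative.

```r
# Packages used in this post: three via library(), the rest via `::`
# or indirectly (coord_map() needs mapproj).
required <- c( "ggplot2", "rgeos", "maptools", "stringi", "dplyr", "mapproj" )

# requireNamespace() returns FALSE instead of raising an error
# when a package is not installed.
installed <- vapply( required, requireNamespace, logical( 1 ), quietly = TRUE )

missing_pkgs <- required[ !installed ]
if ( length( missing_pkgs ) > 0 ) {
  message( "Not installed: ", paste( missing_pkgs, collapse = ", " ) )
} else {
  message( "All packages are available." )
}
```

Anything reported as not installed needs to be installed before running the rest of the code.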

After that, go to the Global Administrative Areas (GADM) webpage and download a map from the GADM spatial database. Since I’m doing some MLB stuff in R, I downloaded the level 1 USA map in R SpatialPolygonsDataFrame format. If you need a more (or less) detailed map, check the different map levels offered on the website.

Once you have downloaded the file, load it into the R environment using readRDS and convert it with ggplot2::fortify, which turns the spatial object into a data.frame that ggplot2 can understand. Also, take a close look at the statement that creates map$state. There I’m creating a new column in the map dataset that contains the state names with accents and other special characters converted to plain ASCII. This is helpful since eventually you will want to join data by state (or by country, city, province, etc.), and the join can fail to match names that contain non-ASCII characters when the two sources encode them differently. The last statement just removes Alaska and Hawaii from my map.

# Read GADM file.
map <- readRDS( file = "USA_adm1.rds" )
# Create Data Frame from spatial object.
map <- fortify( map, region = "NAME_1")
# Create new state column without any special characters.
map$state <- stringi::stri_trans_general( str = map$id
                                        , id = "Latin-ASCII"
                                        )
# Remove Alaska and Hawaii
map <- map[ ! map$state %in% c( "Alaska", "Hawaii" ), ]
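
To see what the "Latin-ASCII" conversion actually does, here is a small sketch on a few made-up region names (they are not taken from the GADM file; US state names rarely need it, but other countries' regions often do):

```r
# Hypothetical region names, written with Unicode escapes so the
# example does not depend on the encoding of the script file.
regions <- c( "Qu\u00e9bec"        # Québec
            , "Nuevo Le\u00f3n"    # Nuevo León
            , "Texas"
            )

# "Latin-ASCII" replaces accented letters with their plain ASCII form.
clean <- stringi::stri_trans_general( str = regions
                                    , id = "Latin-ASCII"
                                    )
clean
# [1] "Quebec"     "Nuevo Leon" "Texas"
```

Names that are already plain ASCII, like "Texas", come back unchanged, so it is safe to apply the conversion to every row.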

Once your map data is ready, it’s time to load the other data you would like to show in your plot. In this case I got the MLB players’ place of birth data from Baseball-Reference and loaded it into R. Note that the second statement applies the same string conversion I used before for the map data.

# Read players file
players <- read.csv( file = "players.csv"
                   , sep = ","
                   , stringsAsFactors = FALSE
                   , na.strings = ""
                   )
# Remove accents, symbols, etc
players$state <- stringi::stri_trans_general( str = players$state
                                            , id = "Latin-ASCII"
                                            )
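
The join and the plot further down expect one row per state, with the number of players in a column called players. If your raw data has one row per player instead, a sketch like the following could build that summary. The raw data frame and its name and state columns are made up for illustration; they are not the layout of my players.csv.

```r
# Hypothetical raw data: one row per player with his birth state.
raw <- data.frame( name  = c( "Player A", "Player B", "Player C", "Player D" )
                 , state = c( "California", "Texas", "California", "Florida" )
                 , stringsAsFactors = FALSE
                 )

# table() tallies each distinct state; as.data.frame() turns the
# tally into a two-column data frame.
counts <- as.data.frame( table( state = raw$state )
                       , stringsAsFactors = FALSE
                       )
names( counts ) <- c( "state", "players" )
counts
```

The result has one row per state, which is the shape the join below needs.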

Now that all of our data is ready, we can join it:

# Join map and players
data <- dplyr::left_join( x = map
                        , y = players
                        , by = "state"
                        )
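
A left join keeps every row of the map, so a state with no match in the players data simply gets NA and will show up without a fill colour. Before plotting, it can be worth checking which names do not line up. Here is a minimal sketch using toy vectors in place of map$state and players$state (the values are made up):

```r
# Toy stand-ins for unique( map$state ) and players$state.
map_states    <- c( "California", "Texas", "New York" )
player_states <- c( "California", "Texas", "Puerto Rico" )

# States drawn on the map with no player data: NA after the left join.
missing_in_players <- setdiff( map_states, player_states )

# Places in the player data with no polygon: not shown on the map.
missing_in_map <- setdiff( player_states, map_states )

missing_in_players
# [1] "New York"
missing_in_map
# [1] "Puerto Rico"
```

With the real data you would pass unique( map$state ) and players$state to setdiff instead.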

And plot it:

# Create map
( ggplot()
  + geom_polygon( data = data
                , mapping = aes( x = long
                               , y = lat
                               , group = group
                               , fill = players
                               )
                , color = "white"
                )
  + labs( fill = "Players born" )
  + coord_map()
  + theme( panel.grid.minor = element_blank()
         , panel.grid.major = element_blank()
         , panel.background = element_blank()
         , panel.border = element_blank()
         , axis.ticks = element_blank()
         , axis.text.x = element_blank()
         , axis.text.y = element_blank()
         , axis.title = element_blank()
         )
)
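
If you want to control the colours yourself, including the one used for states that ended up with NA after the join, you could add a fill scale. The sketch below uses two toy squares instead of the real map so it runs on its own; the data, the colours and the name toy are assumptions for illustration, not what I used for my map.

```r
library( package = "ggplot2" )

# Hypothetical toy data: two unit squares standing in for state polygons.
toy <- data.frame( long    = c( 0, 1, 1, 0, 1, 2, 2, 1 )
                 , lat     = c( 0, 0, 1, 1, 0, 0, 1, 1 )
                 , group   = rep( c( "A.1", "B.1" ), each = 4 )
                 , players = rep( c( 10, NA ), each = 4 )  # NA mimics a state with no match
                 )

p <- ( ggplot()
       + geom_polygon( data = toy
                     , mapping = aes( x = long
                                    , y = lat
                                    , group = group
                                    , fill = players
                                    )
                     , color = "white"
                     )
       # na.value sets the fill used for polygons whose value is NA.
       + scale_fill_gradient( low = "lightblue"
                            , high = "darkblue"
                            , na.value = "grey80"
                            )
       + labs( fill = "Players born" )
     )
p
```

The same scale_fill_gradient call can be added to the real map code above, right after geom_polygon.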

[Figure: Map]

You can access the full code from here.

 
