# Data Viz The Glossy

**Analysis of AFL**, and kindly contributed to R-bloggers]. (You can report issue about the content on this page here)

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.

One of the fun things when doing graphs, is that moment when you identify something that sticks out. A great example of this is done by ethan_meldrum

In it we can see how much Paul Seedsman 2018 has stood out so far. Lets give it the ggplot/fitzRoy treatment.

# Step One – Get the Data

library(fitzRoy) library(tidyverse) df<-fitzRoy::player_stats df1<-fitzRoy::get_footywire_stats(9514:9593) df<-df%>%filter(Season != 2018) df2<-rbind(df, df1) df2%>%select(Season,TO, MG, Player) %>% filter(Season==2018) %>% group_by(Player)%>% summarise(Turnovers=sum(TO), Meters_gained=sum(MG))

`df<-fitzRoy::player_stats`

this gets footywire data from 2010-2018 some round I say some round because this requires a weekly update from James or myself which sadly doesn’t always happen.`df1<-fitzRoy::get_footywire_stats(9514:9593)`

this gets the footywire data for 2018, 9514 is the first game of 2018 and 9593 was the most recent game as of writing this post (23/05/2018)- We want to stack these dataframes on top of each other but we are away that some of the
`df`

might contain 2018 data so we just get rid of it using`filter(Season!=2018)`

`df2<-rbind(df, df1)`

stacks our datasets together and calls it`df2`

- From
`df2`

we then select the columns we want`Season`

,`TO`

(Turnovers),`MG`

(Meters gained) and`Player`

- Then we filter out just this years data
`filter(Season==2018)`

- then we
`group_by`

player*note if we wanted season on season comparisions we would group_by (Season, Player) and not use the filter(Season==2018)* We then

`summarise`

the data, get the Season total for each players Turnovers and meters gained.`summarise(Turnovers=sum(TO), Meters_gained=sum(MG))`

# Step 2

This is an assumption we are making here that we know the players we want to highlight. But lets say we do and the players are Rory Laird, Bryce Gibbs, Paul Seedsman and Tom Mitchell. We need to create a flag for them so we can subset them out for highlighting down the track.

We do this using the `mutate`

.

`mutate(flag = ifelse(Player %in% c("Rory Laird","Bryce Gibbs",Paul Seedsman","Tom Mitchell"), T, F))`

Lets break this down

- step a create a variable called flag
`flag=`

- step b
`ifelse(Player %in% c("P1", "P2"))`

if Player is in this list (c(“P1”, P2)) - step c , Give value T otherwise give value F
`,T,F))`

df2%>%select(Season,TO, MG, Player) %>% filter(Season==2018) %>% group_by(Player)%>% summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) %>% mutate(flag = ifelse(Player %in% c("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), T, F))

# Step 3 - Give it the ggplot treatment

ggplot(aes(x=Meters_gained, y=Turnovers)) + geom_point(aes(colour=flag), show.legend = FALSE) + geom_smooth(method='lm',formula=y~x) + geom_text(aes(label=ifelse(Player %in% c ("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), Player, ""),vjust=-1))

- 1
`ggplot(aes(x=Meters_gained, y=Turnovers))`

creates a blank canvas with our x variable being`Meters_gained`

and y being`Turnovers`

- 2
`geom_point(aes(colour=flag), show.legend=FALSE)`

this creates our graph dots we are taking the xy from ggplot and using`aes(colour=flag)`

to give our points different colours depending on the value of`flag`

- 3
`geom_smooth(method=lm, formula=y~x)`

this gives us our line of best fit (linear regression`method=lm`

) - 4
`geom_text`

this gives us our labels, but we don’t want all the players so we subset the labels. We do this using another`ifelse`

5 you can think of the ifelse as

*if the Player is in this list , give them label of Player, otherwise blank “”*- 5a where
`Player`

is the list of all Players, checking if any player is in this list`c ("Rory Laird","Bryce Gibbs", "Paul Seedsman", "Tom Mitchell")`

and if they are give them their original label`Player`

otherwise a blank label`" "`

5b

`vjust`

moves the labels so not directly over the dot

# Put it all together

library(fitzRoy) library(tidyverse) ## -- Attaching packages --------------------------------------------------------------- tidyverse 1.2.1 -- ## v ggplot2 2.2.1 v purrr 0.2.4 ## v tibble 1.4.2 v dplyr 0.7.4 ## v tidyr 0.8.0 v stringr 1.3.0 ## v readr 1.1.1 v forcats 0.3.0 ## -- Conflicts ------------------------------------------------------------------ tidyverse_conflicts() -- ## x dplyr::filter() masks stats::filter() ## x dplyr::lag() masks stats::lag() df<-fitzRoy::player_stats df1<-fitzRoy::get_footywire_stats(9514:9593) ## Getting data from footywire.com ## Finished getting data df<-df%>%filter(Season != 2018) df2<-rbind(df, df1) df2%>%select(Season,TO, MG, Player) %>% filter(Season==2018) %>% group_by(Player)%>% summarise(Turnovers=sum(TO), Meters_gained=sum(MG)) %>% mutate(flag = ifelse(Player %in% c("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), T, F))%>% ggplot(aes(x=Meters_gained, y=Turnovers)) + geom_point(aes(colour=flag), show.legend = FALSE) + geom_smooth(method='lm',formula=y~x) + geom_text(aes(label=ifelse(Player %in% c ("Rory Laird", "Bryce Gibbs", "Paul Seedsman", "Tom Mitchell"), Player, ""),vjust=-1))

**leave a comment**for the author, please follow the link and comment on their blog:

**Analysis of AFL**.

R-bloggers.com offers

**daily e-mail updates**about R news and tutorials about learning R and many other topics. Click here if you're looking to post or find an R/data-science job.

Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.