Data: Read

We’ll need to read the data and transform it from a wide table (with many columns) to a long one (with only a few columns) so that each row contains a unique value, ie an answer, and then we can easily query for any question or combination. With this data structure we can create and use generic plotting functions that accept any question or combination of questions to generate visualizations.

For the long data format from the polling data, we want a table with these columns:

heading question answer survey_id value_num value_chr

To arrive at this more generic data structure for querying, you can use the function tidy_poll().

For more background on the how to read and manipulate data, you can check out the following cheatsheets found in RStudio’s Help menu or at Cheatsheets - RStudio:

# load libaries
library(calcoastpoll) # devtools::load_all() # devtools::install()
library(tidyverse)    # see tidyverse.org for packages loaded 
library(plotly)       # use ggplotly() to make plot interactive
library(DT)           # for rendering interactive datatable()

# paths and parameters of poll data
data_xlsx           <- "CoastalOpinionPoll_thru2017.xlsx"
headers_xlsx        <- "CoastalOpinionPoll_thru2017_headers.xlsx"
row_end             <- 12891
cols_chr            <- c(2,4:7,10:13,46,167,256,263,434,437,438,447,455,460,487)
dir_diagnostic_csvs <- "."
data_rds            <- "data.rds"

# tidy up data and save as csv for reading next time
if (file.exists(data_rds)){
  #d <- read_csv("data.csv", col_types = cols(value_num = col_double()))
  d <- read_rds(data_rds)
} else {
  d <- tidy_poll(data_xlsx, headers_xlsx, row_end, cols_chr, dir_diagnostic_csvs)  
  #write_csv(d, "data.csv")               # csv too big (155 MB) for Github
  write_rds(d, data_rds, compress = "xz") # only 696 KB compressed
}

Here are direct downloads to files:

Data: Questions

Now we can easily look at the questions and how many answers are associated and display with some interactivity using the DT::datatable() function.

qs <- d %>% 
  group_by(heading, question) %>% 
  summarise(n_answers = n())
datatable(qs)

Bar: CA ocean health better?, by year

Let’s use another custom function plot_pctbar_qyn_year() to look at how yes/no/unsure answers to a question (qyn) vary over years.

p <- plot_pctbar_qyn_year(d, "CA ocean health better?")
p

Bar: CA ocean health better?, by year, interactive

To make a graph interactive, we simply feed the plot object to the plotly::ggplotly() function.

Bar: Climate change problem?, by Education

Let’s use another custom function plot_pctbar_qyn_qc() to look at how yes/no/unsure answers to a question (qyn) vary by another categorical question (qc).

plot_pctbar_qyn_qc(d, "Climate change problem?", "Education")

Bar: Climate change problem?, by Env Issues

Or plot_pctbar_qyn_qc() by a different categorical question (qc).

plot_pctbar_qyn_qc(d, "Climate change problem?", "Env Issues")

Treemap: Recreational Activities

A good way for evaluating composition is with a treemap, here by categorical question (qc) using custom plot_treemap_qc():

plot_treemap_qc(d, "Recreational Activities")

Treemap: Recreational Activities, animated

Finally, we can generate an animated gif using animate_treemap_qc_year() to look at composition over time.

library(gganimate)

q   <- "Recreational Activities"
gif <- paste(q, "animated_treemap.gif")

# animate to gif
if (!file.exists(gif)){
  animate_treemap_qc_year(d, q, gif)
}

Now include the gif in the document with the following markdown:

![](`r gif`)