We’ll need to read the data and transform it from a wide table (many columns) into a long one (only a few columns), so that each row holds a single value, i.e. one answer. We can then easily query for any question or combination of questions. With this data structure we can write generic plotting functions that accept any question, or combination of questions, to generate visualizations.
For the long format of the polling data, we want a table with these columns:

| heading | question | answer | survey_id | value_num | value_chr |
|---|---|---|---|---|---|
| … | | | | | |
To arrive at this more generic data structure for querying, use the `tidy_poll()` function.
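The internals of `tidy_poll()` are not shown here (it also deals with the headings and with numeric versus character values). As a minimal sketch of just the core wide-to-long step, assuming a hypothetical three-respondent table with one column per question, `tidyr::pivot_longer()` turns each question column into rows so that every row holds exactly one answer:

```r
library(tibble)
library(tidyr)
library(dplyr)

# Hypothetical wide poll: one row per respondent, one column per question
wide <- tibble(
  survey_id = 1:3,
  `CA ocean health better?` = c("Yes", "No", "Unsure"),
  `Climate change problem?` = c("Yes", "Yes", "No")
)

# Pivot every question column into (question, answer) pairs
long <- wide %>%
  pivot_longer(
    cols      = -survey_id,  # all columns except the respondent id
    names_to  = "question",
    values_to = "answer"
  )

long
```

Filtering `long` on `question` then pulls out the answers to any single question, which is what allows the plotting functions to stay generic.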
For more background on how to read and manipulate data, see the cheatsheets available in RStudio’s Help menu or at Cheatsheets - RStudio.
```r
# load libraries
library(calcoastpoll) # devtools::load_all() # devtools::install()
library(tidyverse)    # see tidyverse.org for packages loaded
library(plotly)       # use ggplotly() to make plot interactive
library(DT)           # for rendering interactive datatable()
```
```r
# paths and parameters of poll data
data_xlsx           <- "CoastalOpinionPoll_thru2017.xlsx"
headers_xlsx        <- "CoastalOpinionPoll_thru2017_headers.xlsx"
row_end             <- 12891
cols_chr            <- c(2,4:7,10:13,46,167,256,263,434,437,438,447,455,460,487)
dir_diagnostic_csvs <- "."
data_rds            <- "data.rds"

# tidy up data once and cache it as compressed RDS for reading next time
if (file.exists(data_rds)) {
  #d <- read_csv("data.csv", col_types = cols(value_num = col_double()))
  d <- read_rds(data_rds)
} else {
  d <- tidy_poll(data_xlsx, headers_xlsx, row_end, cols_chr, dir_diagnostic_csvs)
  #write_csv(d, "data.csv") # csv too big (155 MB) for GitHub
  write_rds(d, data_rds, compress = "xz") # only 696 KB compressed
}
```
Here are direct downloads of the files:
Now we can easily list the questions along with the number of answers associated with each, and display them as an interactive table using the `DT::datatable()` function.
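The chunk that builds this table isn’t reproduced here. As a hedged sketch, assuming the long table has `survey_id`, `question` and `answer` columns (a subset of those listed above), counting rows per question with `dplyr::count()` and passing the result to `DT::datatable()` gives a searchable, sortable table. The data below are hypothetical:

```r
library(tibble)
library(dplyr)
library(DT)

# Hypothetical long-format poll data: one row per answer
d_toy <- tibble(
  survey_id = c(1, 1, 2, 2, 3),
  question  = c("CA ocean health better?", "Climate change problem?",
                "CA ocean health better?", "Climate change problem?",
                "CA ocean health better?"),
  answer    = c("Yes", "Yes", "No", "Yes", "Unsure")
)

# Number of answers recorded per question, most-answered first
q_counts <- d_toy %>%
  count(question, name = "n_answers", sort = TRUE)

# Interactive HTML table (shown in the RStudio viewer or a knitted document)
datatable(q_counts)
```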
Let’s use another custom function, `plot_pctbar_qyn_year()`, to look at how yes/no/unsure answers to a question (`qyn`) vary over years.
```r
p <- plot_pctbar_qyn_year(d, "CA ocean health better?")
p
```
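`plot_pctbar_qyn_year()` is part of calcoastpoll and its code is not shown here. As a sketch of the kind of computation a percent bar chart by year involves, assuming a hypothetical `year` column next to each answer, the answers can be counted per year, converted to within-year percentages, and drawn as stacked bars with ggplot2:

```r
library(tibble)
library(dplyr)
library(ggplot2)

# Hypothetical answers to one yes/no/unsure question, with a survey year
answers <- tibble(
  year   = c(2015, 2015, 2015, 2016, 2016, 2016, 2016),
  answer = c("Yes", "No", "Yes", "Yes", "Unsure", "No", "No")
)

# Share of each answer within each year (percentages sum to 100 per year)
pct <- answers %>%
  count(year, answer) %>%
  group_by(year) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

# Stacked percentage bars, one bar per year
p_sketch <- ggplot(pct, aes(x = factor(year), y = pct, fill = answer)) +
  geom_col() +
  labs(x = "Year", y = "Percent of answers", fill = "Answer")
p_sketch
```

Normalizing within each year keeps years with different numbers of respondents comparable.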
To make a graph interactive, we simply feed the plot object to the `plotly::ggplotly()` function.
```r
ggplotly(p)
```
Let’s use another custom function, `plot_pctbar_qyn_qc()`, to look at how yes/no/unsure answers to a question (`qyn`) vary by another categorical question (`qc`).
```r
plot_pctbar_qyn_qc(d, "Climate change problem?", "Education")
```
Or use the same `plot_pctbar_qyn_qc()` function to break the answers down by a different categorical question (`qc`):
```r
plot_pctbar_qyn_qc(d, "Climate change problem?", "Env Issues")
```
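Breaking one question down by another relies on the long format: the answers to each question are pulled out by `question` and paired up by `survey_id`. The following is a minimal sketch of that pairing, not the actual code of `plot_pctbar_qyn_qc()`; the data and the education categories are hypothetical:

```r
library(tibble)
library(dplyr)

# Hypothetical long-format answers to two questions, keyed by respondent
d_toy <- tibble(
  survey_id = c(1, 1, 2, 2, 3, 3, 4, 4),
  question  = rep(c("Climate change problem?", "Education"), times = 4),
  answer    = c("Yes", "College", "No", "High school",
                "Yes", "College", "Unsure", "High school")
)

# Answers to the yes/no/unsure question (qyn)
qyn <- d_toy %>%
  filter(question == "Climate change problem?") %>%
  select(survey_id, answer_qyn = answer)

# Answers to the categorical question (qc)
qc <- d_toy %>%
  filter(question == "Education") %>%
  select(survey_id, answer_qc = answer)

# Pair both answers per respondent, then take percentages within each category
by_qc <- inner_join(qyn, qc, by = "survey_id") %>%
  count(answer_qc, answer_qyn) %>%
  group_by(answer_qc) %>%
  mutate(pct = 100 * n / sum(n)) %>%
  ungroup()

by_qc
```

The inner join drops respondents who skipped either question, so percentages are computed only over people who answered both.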
A treemap is a good way to evaluate composition, here of the answers to a categorical question (`qc`), using the custom `plot_treemap_qc()` function:
```r
plot_treemap_qc(d, "Recreational Activities")
```
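How `plot_treemap_qc()` draws its treemap is not shown here. As a hedged sketch, assuming the treemapify package (which may not be what calcoastpoll uses) and hypothetical activity answers, each answer becomes a tile whose area is proportional to how often it was given:

```r
library(tibble)
library(dplyr)
library(ggplot2)
library(treemapify)  # provides geom_treemap() and geom_treemap_text() for ggplot2

# Hypothetical answers to a multi-choice categorical question
answers <- tibble(
  answer = c("Beach walking", "Swimming", "Beach walking", "Fishing",
             "Swimming", "Beach walking", "Surfing")
)

# One row per distinct answer with its count n
counts <- count(answers, answer)

# Tile area proportional to n; labels drawn inside the tiles
p_tree <- ggplot(counts, aes(area = n, fill = answer, label = answer)) +
  geom_treemap() +
  geom_treemap_text(place = "centre", grow = TRUE, colour = "white") +
  theme(legend.position = "none")
p_tree
```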
Finally, we can generate an animated gif with the custom `animate_treemap_qc_year()` function to look at composition over time.
```r
library(gganimate)
q   <- "Recreational Activities"
gif <- paste(q, "animated_treemap.gif")

# animate to gif (skipped if the file already exists)
if (!file.exists(gif)) {
  animate_treemap_qc_year(d, q, gif)
}
```
Now include the gif in the document with the following markdown:

```
![](`r gif`)
```