data in science

Shiny Homework

Having a homework assignment for shiny that builds through multiple steps is…kinda nuts. What I’d like you to do is save an R file for each step (prepend 1_, 2_, 3_, etc. so we know which number it is) and then zip them all up to submit. When you post the app, give us the URL!

There are a LOT of extra credit options here. You don’t have to do them in order, or do them all. But, take a gander. This is all to help you prep for your final project!

We know this data so well

We’ve been using the coronavirus package to this point, but, while it rejiggers itself for automated data updates, let’s tap it from the source!

library(readr)

coronavirus <- read_csv("https://github.com/RamiKrispin/coronavirus-csv/raw/master/coronavirus_dataset.csv",
                        col_types = "ccddDdc")
## Province.State Country.Region     Lat     Long       date cases      type
##       Zhejiang          China 29.1832 120.0934 2020-05-09     0 recovered
##       Zhejiang          China 29.1832 120.0934 2020-05-10     0 recovered
##       Zhejiang          China 29.1832 120.0934 2020-05-11     0 recovered
##       Zhejiang          China 29.1832 120.0934 2020-05-12     0 recovered
##       Zhejiang          China 29.1832 120.0934 2020-05-13     0 recovered
##       Zhejiang          China 29.1832 120.0934 2020-05-14     0 recovered

Today, we’ll build a shiny app that works with this data.

1. Make a shiny template that runs

You know how to do this. Use RStudio or hand-code your own template. Save it as 1_app.R. Have it load the data above (straight from the source!). Make sure that it runs.
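A minimal sketch of what 1_app.R might look like (the title is a placeholder; the data-loading line is the one from above):

```r
# 1_app.R -- minimal template that loads the data and runs
library(shiny)
library(readr)

coronavirus <- read_csv(
  "https://github.com/RamiKrispin/coronavirus-csv/raw/master/coronavirus_dataset.csv",
  col_types = "ccddDdc"
)

ui <- fluidPage(
  titlePanel("Coronavirus Explorer")  # placeholder -- yours may differ
)

server <- function(input, output) {
  # empty for now -- outputs come in later steps
}

shinyApp(ui = ui, server = server)
```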

2. Add a Layout

Choose a layout. It can be a sidebar layout, as we’ve been doing, or, try setting up something different from here. Save this as 2_app.R after making sure your app runs!

3. Got a layout? Cool. Add a theme.

There are a LOT of themes for shiny. The ones that come pre-installed are here along with a tutorial. Make your app stylin! Make sure your app runs.

3a. Extra credit (4 points) A novel theme and layout!

Scour the web for shiny packages (seriously, just google shiny packages or other terms) to use your own unique layout and theme that is not pre-packaged. 2 points for theme, 2 points for layout. I’m not saying how many points we’ll give for shiny lcars….

4. Show us the data.

OK, you are all set up. Using tableOutput() in the proper place in your UI and renderTable() in your server, show us the tail of the coronavirus table. Make sure your app runs.
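If it helps, here is a hedged sketch of how the pair fits together (the output ID "corona_table" is just an example name; put the tableOutput() wherever your layout calls for it):

```r
# In the UI, inside your chosen layout:
#   tableOutput("corona_table")

# In the server:
server <- function(input, output) {
  output$corona_table <- renderTable({
    tail(coronavirus)  # last six rows of the data
  })
}
```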

5. Trends

OK, now, using plotOutput() and renderPlot(), create a ggplot that shows us the worldwide trend in confirmed cases of coronavirus. Note – you’ll have to do some data manipulation here! But you’ve done this before. Have the title tell us what date the data is current to. Make sure your app runs. Note – check yourself against a dashboard like this to make sure your results are more or less correct.
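One possible shape for that data manipulation, sketched with dplyr (this assumes you want cumulative confirmed cases; a daily-cases plot is equally valid):

```r
library(dplyr)
library(ggplot2)

# Collapse to one row per day: total confirmed cases worldwide
trend <- coronavirus %>%
  filter(type == "confirmed") %>%
  group_by(date) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  mutate(cumulative_cases = cumsum(cases))

# Then, in your server:
# output$trend_plot <- renderPlot({
#   ggplot(trend, aes(x = date, y = cumulative_cases)) +
#     geom_line() +
#     labs(title = paste("Worldwide confirmed cases as of", max(trend$date)))
# })
```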

6. Make it reactive

Now, add two inputs. One allows users to choose a country. The second allows them to select the type of cases to display (confirmed, death, recovered). Use whatever *Input functions you would like. Now, where you have outputs for the global trend and the tail of the data table, replace them with outputs that show the trend for the selected country and type, and the table for the selected country and type. Make sure your app runs.
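A hedged sketch of the reactive plumbing (the input IDs "country" and "case_type" are made-up names; a shared reactive() keeps you from filtering twice):

```r
# UI (inside your layout):
#   selectInput("country", "Country:", choices = unique(coronavirus$Country.Region))
#   selectInput("case_type", "Type:", choices = c("confirmed", "death", "recovered"))

# Server: one reactive that both outputs can share
# filtered_data <- reactive({
#   coronavirus %>%
#     filter(Country.Region == input$country,
#            type == input$case_type)
# })
#
# output$corona_table <- renderTable({ tail(filtered_data()) })
# (use filtered_data() inside renderPlot() the same way)
```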

7. Deploy your app!

Using http://shiny.umb.edu, upload your code, make sure the app works (in case you have to install any libraries), and then tell us the link! If you’ve forgotten how, go back and watch the last 10-15 minutes of our Shiny Lab!

8. Extra Credit (for full credit of another question)

Use tabsets so that all of the outputs are not returned all at once. Deploy this!

9. Map Extra Credit (for full credit of another question)

Using tmap, leafletOutput(), and renderLeaflet() (don’t worry, shiny takes care of making it interactive), add a world map somewhere (your choice how to do this layout – it will be easier with the tabs!) that shows the final # of whatever type you have selected across the world (hey, you know how to merge this data with a world map – maybe you have some code for it that, for additional extra credit, you can source in!)

10. MOAR Map Extra Credit (for full credit of another question)

Set up the app so that, when you select a country, it zooms in on that country (replot, but centered on that country).

11. MOAR MOAR MAP Extra Credit (for 2x full credit of another question)

Can you use information at smaller spatial scales than countries? You might have to use another spatial data source – at least a different function within rnaturalearth, or something from somewhere else. Either have this smaller-scale information represented on the global map OR, when you select a country, plot JUST that country with smaller province/state areas as the internal borders. Good luck!
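One place to start looking: rnaturalearth has ne_states(), the within-country analogue of ne_countries(). A quick sketch:

```r
library(rnaturalearth)

# Province/state-level polygons for a single country
us_provinces <- ne_states(country = "united states of america", returnclass = "sf")
plot(us_provinces["name"])
```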

12. Even More Extra Credit (for full credit of another question)

Go wild. Do something different with this! Use other output types, use other input types, filter differently, map differently – Have fun! And tell us what you did. Deploy it. We can’t wait to see!


data in science

Here is a set of RDS files that contain sf objects of state county boundaries. We are going to work with these using iteration and functions for some of this week’s work.

  1. Let’s warm up with some sf practice. The function readRDS() reads in RDS files. The dplyr function bind_rows() can take rows of a data frame, tibble, or sf object, and bind them together properly. Using the purrr library, read in all of the counties files and then combine them into a single data frame. Plot the result.
  2. This is great. Now, I’m curious – is there a link between the number of counties in a state and the ratio of area of the largest county in the state to the total state area? Let’s find out!

A. Write a function that, given a state name, will use readRDS to read in a single data file and fix up the CRS (these are all in lat/long – you want a Mollweide projection, in which distance is in meters). Plot Massachusetts to make sure everything works.

B. Write a function that, given an sf object of a single state and its counties, will return a one-row data frame with the number of counties, the area of the largest county, the average county area, the state’s area, and the ratio of the largest county to the total area. st_area() will help you calculate area – but you will need as.numeric(), and note that if you take an sf object and use summarize() on it, it will merge all of the polygons into one.
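A hedged sketch of the shape such a function might take (assuming the input is one state’s counties as an sf object, already in Mollweide so areas come back in square meters):

```r
library(dplyr)
library(sf)

summarize_state <- function(state_sf) {
  # one area per county, converted from units objects to plain numbers
  county_areas <- as.numeric(st_area(state_sf))
  state_area   <- sum(county_areas)  # or st_area() after summarize()

  tibble(
    n_counties     = nrow(state_sf),
    largest_county = max(county_areas),
    mean_county    = mean(county_areas),
    state_area     = state_area,
    largest_ratio  = max(county_areas) / state_area
  )
}
```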

C. Using iteration, make a data frame that has all of the above information for all of the states. +1 EXTRA CREDIT – have a column named state with the state name. (hint: ?setNames)

D. Plot that largest county ratio to number of counties! What do you learn? +1 extra credit for each exploration beyond this.

  1. Install and load up the package repurrrsive. It has an object in it, got_chars, with information about the characters from the Game of Thrones series. Notice it is a list of lists. To explore it, check out listviewer::jsonedit(got_chars, mode = "view").

Now, using purrr functions make a tibble with the following columns:

  • name
  • aliases (a list column)
  • gender
  • culture
  • allegiances (a list column)
  1. Who has more aliases on average? Men or women? Visualize however you see fit.
  2. One thing that is cool about list columns is that we can filter on them. We can remove rows whose list column has a length of 0 with filter(lengths(x) > 0), where x is some column name. Note we are using lengths() and not length().

Another cool thing is that we can always tidyr::unnest() columns to expand them out, repeating, say, names or other elements of a data frame.
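A toy example of both ideas, with a made-up two-row tibble:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  name    = c("Arya", "Bran"),
  aliases = list(c("Arry", "No One"), character(0))
)

df %>%
  filter(lengths(aliases) > 0) %>%  # drop rows with empty list columns
  unnest(aliases)                   # one row per (name, alias) pair
```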

A. Select just name and aliases. Filter the resulting data down to something usable, and then unnest aliases. Use the resulting data to determine who had the most aliases!

B. Great! Now let’s use this idea of unnesting to build and then visualize a dataset that shows, within each allegiance, whether there are more aliases for men or women. What does this visualization teach you about the different allegiances?

E.C. +8 Write a function that takes a state name and plots the state, but with the height of each county proportional to its % of state area, using deckgl or mapdeck.



data in science


Function lab exercises

Function template:

func_name <- function(arg1, arg2, ...) {
  
  func_code_here

  return(object_to_return)

}

In this template:

  • func_name is what you decide to call your function. This is usually a verb that describes what the function does; e.g., ‘get_max_diff’, ‘get_first_year’, …
  • arg1 is the name of an argument (again, you decide what the name is). This is what you will call the input when you are within the body of the function code
  • func_code_here is where you write the code. This is where you transform your inputs into the output

Remember that a function takes input (which could be multiple things), does something to that input, and then returns some kind of output.


Exercises

  1. This may be a type of function you are more familiar with. It is an equation that converts Fahrenheit to Celsius. A previous student of mine was basically Fahrenheit-illiterate; she never knew what the weather was going to be like. Given this equation, can you write a function that converts a temperature value in Fahrenheit to Celsius for her?
  • C = (F – 32) x 5/9

Take your function for a spin – does it return the correct values?

  • 32 F = 0 C
  • 50 F = 10 C
  • 61 F = 16.11 C
  • 212 F = 100 C
  • -40 F = -40 C
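A minimal implementation following the template above:

```r
f_to_c <- function(temp_f) {
  temp_c <- (temp_f - 32) * 5 / 9
  return(temp_c)
}

f_to_c(32)   # 0
f_to_c(-40)  # -40
```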

2a. Given the following code chunk for reading buoy data files in for each year, describe the following:

  • What parts of your code are consistent across every line/code chunk?
  • What parts are different?
  • What is the output that you want your function to return?
buoy_1987 <- read_csv('./data/buoydata/44013_1987.csv', na = c("99", "999"))
buoy_1988 <- read_csv('./data/buoydata/44013_1988.csv', na = c("99", "999"))
buoy_1989 <- read_csv('./data/buoydata/44013_1989.csv', na = c("99", "999"))
buoy_1990 <- read_csv('./data/buoydata/44013_1990.csv', na = c("99", "999"))

2b. Use the str_c() function to write a function that creates the filename for each year. I’ve given you an example below of using str_c() for just 1986. Consider this your starting point to build out a function.

str_c("./data/buoydata/44013_", 1986, ".csv", sep = "")
## [1] "./data/buoydata/44013_1986.csv"

Extra credit (2 points): Check out the glue package and do the same thing with glue().
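If you go down this road: glue() interpolates anything inside {} straight into the string, e.g.:

```r
library(glue)

year <- 1986
glue("./data/buoydata/44013_{year}.csv")
## ./data/buoydata/44013_1986.csv
```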

2c. Complete the skeleton of this function based on the work that you have done up to now. Describe, in words, what is happening in every step.

read_buoy <- function(_________){
  
  filename <- ___________________________
  
  a_buoy <- read_csv(________________, ____________________)
  
  return(___________)

}

2d. Amend the read_buoy function to allow for a variable buoy number (currently we are using data from buoy 44013, but there are many other numbers/names that could be used!), directory location of the file, and year.

2e. Apply the workflow that you used in 2a – 2c to create a function to clean up the data using a dplyr workflow that will work for 1987, 2000, and 2007. Have it generate daily averaged wave heights and temperatures, as well as renaming all of the columns to something understandable. Begin by writing a dplyr workflow for one data frame at a time. Then generalize it. Remember to ask yourself the following questions:

  • What parts of your code are consistent across every line/code chunk?
  • What parts are different?
  • What is the output that you want your function to return?

If you are not sure of some of these things, remember to run the code chunks bit by bit, putting in test values (e.g., one year of data) to ensure that you know what you are working with, what each line is doing, and what the final returned value is. Your answer might look similar to what we did in class, or very different, depending on how you write the function.


Modular Programming


3A-C. Using all that we previously created in the functions week and/or this homework, create a set of functions that, once a buoy is read in, return a two-facet ggplot2 object: a histogram of the difference between wind speed (WSPD) and gust speed (GST), and of the difference between the air temperature (ATMP) and water temperature (WTMP), so that you can later format and style it as you’d like. E.C. (+1 per question) Break the templates below into smaller modular functions.

gust_increase_hist <- function(a_year){
  #get the cleaned buoy data
  
  #create a long data frame with each row as a data point, measuring either 
  #difference between air and water OR wind speed and gust speed - one row per measurement
  #with a column that says WHAT that measurement is 
  
  #create a plot
  
}

buoy_measured_diff_long <- function(a_buoy){
  #with one buoy
  
  #calculate differences between ATMP and WTMP as well as WSPD and GST
  
  #pivot to make it long
  
  #return the modified data
}

plot_dual_hist <- function(summarized_buoy){
  #create a ggplot with a single variable as the x
  
  #make a histogram
  
  #facet by the measurement type
}

#test it out!

Final Project Prep


  1. Based on the data set you’re planning to use for your final, do you need to write any functions to clean the data as you bring it in? If so, describe it, and take a stab at writing it. If not, show us that the data loads cleanly.
  2. With the data you just loaded, make one visualization. But, before you do, articulate a question you want to answer with said visualization. What do you think you will see? Now make the plot. Did you see what you expected? What did the data tell you?


Data in science


Intro

For this week’s homework, let’s work on mapping the covid-19 data. You have two choices of data source. The first is the coronavirus data we have already loaded.

library(coronavirus)
head(coronavirus)
##   Province.State Country.Region      Lat     Long       date cases      type
## 1                         Japan 35.67620 139.6503 2020-01-22     2 confirmed
## 2                   South Korea 37.56650 126.9780 2020-01-22     1 confirmed
## 3                      Thailand 13.75630 100.5018 2020-01-22     2 confirmed
## 4          Anhui Mainland China 31.82571 117.2264 2020-01-22     1 confirmed
## 5        Beijing Mainland China 40.18238 116.4142 2020-01-22    14 confirmed
## 6      Chongqing Mainland China 30.05718 107.8740 2020-01-22     6 confirmed

The second is a newer dataset. It harvests data from the New York Times and is focused solely on the US. To install it, you’ll need to do the following:

#if you don't have it already
install.packages("devtools")

#install the library from github
devtools::install_github("covid19R/covid19nytimes")
library(covid19nytimes)
covid_states <- refresh_covid19nytimes_states()

head(covid_states)
## # A tibble: 6 x 7
##   date       location location_type location_standa… location_standa… data_type
##   <date>     <chr>    <chr>         <chr>            <chr>            <chr>    
## 1 2020-01-21 Washing… state         53               fips_code        cases_to…
## 2 2020-01-21 Washing… state         53               fips_code        deaths_t…
## 3 2020-01-22 Washing… state         53               fips_code        cases_to…
## 4 2020-01-22 Washing… state         53               fips_code        deaths_t…
## 5 2020-01-23 Washing… state         53               fips_code        cases_to…
## 6 2020-01-23 Washing… state         53               fips_code        deaths_t…
## # … with 1 more variable: value <dbl>
covid_counties <- refresh_covid19nytimes_counties()

head(covid_counties)
## # A tibble: 6 x 7
##   date       location location_type location_standa… location_standa… data_type
##   <date>     <chr>    <chr>         <chr>            <chr>            <chr>    
## 1 2020-01-21 Snohomi… county_state  53061            fips_code        cases_to…
## 2 2020-01-21 Snohomi… county_state  53061            fips_code        deaths_t…
## 3 2020-01-22 Snohomi… county_state  53061            fips_code        cases_to…
## 4 2020-01-22 Snohomi… county_state  53061            fips_code        deaths_t…
## 5 2020-01-23 Snohomi… county_state  53061            fips_code        cases_to…
## 6 2020-01-23 Snohomi… county_state  53061            fips_code        deaths_t…
## # … with 1 more variable: value <dbl>

Now you have three data sets to choose from: countries, states, or counties. Remember, with the coronavirus data, you have to do some dplyr summarizing to get it down to countries, though!
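For the coronavirus data, that summarizing might look something like this sketch (collapsing province-level rows to one row per country, date, and type):

```r
library(dplyr)

coronavirus_by_country <- coronavirus %>%
  group_by(Country.Region, date, type) %>%
  summarize(cases = sum(cases, na.rm = TRUE)) %>%
  ungroup()
```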

Maps to use for this assignment

OK, so we need world, US state, and US county maps – depending on which of the three datasets you chose.

library(sf)
## Linking to GEOS 3.7.2, GDAL 2.4.2, PROJ 5.2.0
#The world
library(rnaturalearth)
world_map <- ne_countries()

#US States
library(USAboundaries)
us_states <- us_states()

#US Counties
us_counties <- us_counties()

Armed with this, let’s make some maps!

Questions

  1. Which data set – or aspect of a single data set – are you most interested in? Sort through the datasets. What is there? Is it the world? A single country? Multiple countries? All states? Counties in one state?

Filter or summarize your data to just what you are interested in, in terms of space.

For example

library(dplyr)
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
florida_covid <- covid_counties %>%
  filter(stringr::str_detect(location, "[Ff]lorida"))

florida_map <- us_counties %>%
  filter(state_name == "Florida")

  2. What type or types of data from that dataset are you interested in? Why? Filter the dataset to that data type only.
  3. What do you want to learn from this slice of the data? Formulate a question and write it out here.
  4. Filter and manipulate the data so that it is in a format to be used to answer the question.
  5. Join the covid data with spatial data to build a map.
  6. Create a map from this data! Make it awesome!
  7. What do you learn from the map you made?
  8. This static map is, I’m sure, great. Load up tmap and make it dynamic! Is there anything different you can learn from this form of visualization?
  9. Last… it’s time to start thinking about your final project. Name the one or more data sources you are interested in exploring. Tell us a bit about what is in them.
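For the join step, here is a hedged sketch building on the Florida example above. The location column in the NYT county data appears to be a combined "County,State" string, so we assume a matching key has to be built on the map side (inspect your data with head() to confirm the exact format; the us_counties() column names here are assumptions too):

```r
library(dplyr)

florida_joined <- florida_map %>%
  mutate(location = paste(name, "Florida", sep = ",")) %>%
  left_join(florida_covid, by = "location")
```

Since florida_map is the sf object, joining the covid data onto it (rather than the other way around) keeps the geometry column intact.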


data in science

1. Warmup with some Faded Examples!

Please provide one example per chunk in your RMD file.

Grab Sale_Prices_City.csv and bring it into R – link to data here.

First, convert it from wide to long, with a column for year/month called time_point.

sales_long <- sales %>%
    ______(___ = time_point, 
           _____ = sale_price, 
           cols = -c(_________:_________))

Drop the NAs

sales_long <- sales_long %>%
    ______(______(sale_price))

Split up year and month into two columns

library(dplyr)
library(stringr)

sales_long <- sales_long %>%
    ____(year = str_split(time_point, "____", simplify=TRUE)[,____],
           month = ____(time_point, "____", ____=____)[,____],
           )

Make the following string:

my_string <- "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems."

Make it all uppercase

__________(my_string)

Remove all instances of the letter e

______________(my_string, ______ = "e", replacement = ___)

Split this string into a vector of individual words

split_string <- __________(my_string, pattern = ___, simplify = ______)

Find the words that start with consonants.

str_____(split_string, "___________")

2. Tidying the HadCrut Data

Load up the raw hadcrut data, link here. We’ve been using this in a long format, but it actually is supplied as wide data. Use your skills with tidyr to make it look like the long data we’ve been using in class!

Make sure in this exercise you submit: the code required for loading the data, converting it to long format, and then using head() to display the first five lines. Be sure to include steps where you check your work with str() and explain the relevant parts of what you see! Feel free to do this in comments. An example of this could be:

#First I will create a vector of 100 random numbers between 0 and 1 using runif
x <- runif(100)
#I will then use mean to find the average of this vector
mean(x)
## [1] 0.5082634

I want to know that you know what your code is doing!

3. Coronavirus

Finish your lab from https://biol355.github.io/Labs/coronavirus.html

4. Your package!

A. What are the most interesting things your package does? Provide examples of each.
B. What are the most essential things your package does – that everyone will want to use it for. Provide examples of each.
C. How does your package complement/enhance/make use of tools already in the R ecosystem? Give examples!

5. Your cheat sheet!

A. In what order would you provide information about this package in a cheat sheet?
B. Based on all of the above, make a sketch of your cheatsheet. This can be on paper, powerpoint, using the template, not using it, whatever. Make a rough sketch showing what you’d show off, and in what format. Scan it, and attach the image to your homework.


data in science

Analysing a data set, start to finish

For this assignment we’re going to look at birthweights of babies in California from 2000 to 2013. The data has year, groups of birthweights, and counts of number of babies in that group of birth weights. There’s also information on county, zip code, lat/long, etc. These may or may not be useful, but are interesting grouping factors.

1. Warmup: Load up the data! Use skimr to show that you’ve done so properly, and everything is as it should be.

2. What is the number of children in each birthweight category in Sacramento, CA across the entire dataset?

3. Are there trends in the birthweights of children in Truckee over time? Visualize this by looking at the number of individuals in each birthweight category in each year. Make the plot as easy to understand as possible. Extra credit for making it fancy and not just a default plot. Note, Truckee has multiple zip codes, so, you’re going to want to make sure to sum over all zip codes in the city!

4. Are there any trends by Latitude (e.g., North-South)? To answer this…

4a. Create a new column, lat_group where you use cut_interval to cut the data into 10 groups.
4b. Calculate the birthweight count for each birthweight group and latitude group – but also calculate the mean latitude in that group.
4c. Plot! Remember, make your axes, labels, and titles informative! That mean latitude will help you make a good plot. Trust me!

4d. Well, that was unsatisfying, given the different number of births at different latitudes. Can you redo this, correcting for the number of individuals in each latitude bin to make it more useful? So, percent in each bin? Make the axes on the plot tell us what is going on! Hint: to do this, you’ll need to group not just once, but twice. The first time, by latitude group and birthweight group; the second, just by latitude group – and instead of summarize, use a mutate to calculate the percent in each birthweight category. Don’t forget to ungroup at the end!
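A hedged sketch of that double grouping (births, latitude, and the other column names here are placeholders for whatever your data calls them):

```r
library(dplyr)
library(ggplot2)  # cut_interval() lives in ggplot2

births_by_lat <- births %>%
  mutate(lat_group = cut_interval(latitude, n = 10)) %>%
  group_by(lat_group, birthweight_group) %>%                   # first grouping
  summarize(count = sum(births), mean_lat = mean(latitude)) %>%
  group_by(lat_group) %>%                                      # second grouping
  mutate(percent = 100 * count / sum(count)) %>%
  ungroup()
```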

4e. What did visualization by raw numbers versus percent tell you? How are they different? Why might they be different? What does this tell you about visualization and analysis of population trends in general?

5. Extra credit (variable, depending on awesomeness of data viz)

Note that there is latitude and longitude information in this data. Can you use that in some way to plot out anything interesting in the data in terms of geographic distribution. Note, log(x+1) transformations may be your friend for some things. Or correcting by population size. Have fun with this! Feel free to look into geospatial visualization with ggplot2 or other packages – although we’ll do this more formally in a few weeks. Might not be necessary, but, you never know.

