First install the latest version of R from here
Then install the latest version of RStudio from here
Launch RStudio and check that it shows
R version 3.5.2 (2018-12-20) -- "Eggshell Igloo"
Copyright (C) 2018 The R Foundation for Statistical Computing
You can customize the panes via Tools -> Global Options...
Panes can be detached. This is very helpful when you want another application next to the pane or behind it, or when you are using multiple monitors, since you can then execute commands on one monitor and watch the output on another.
You also have a spellcheck; use it to catch typos.
Now we install some packages via Tools -> Install Packages...
The initial list of packages to be installed is shown below. Other packages will be installed as needed.
devtools, reshape2, lubridate, car, Hmisc, gapminder, leaflet, DT, data.table, htmltools, scales, ggridges, here, knitr, kableExtra, haven, readr, readxl, ggplot2, vembedr
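The same installation can also be scripted from the console instead of the menu. The following is a minimal sketch rather than the course's own code: the names pkgs and missing_pkgs are made up for illustration, and the install line is commented out because it needs an internet connection and a CRAN mirror.

```r
# Packages from the list above (names copied from the text)
pkgs <- c("devtools", "reshape2", "lubridate", "car", "Hmisc", "gapminder",
          "leaflet", "DT", "data.table", "htmltools", "scales", "ggridges",
          "here", "knitr", "kableExtra", "haven", "readr", "readxl",
          "ggplot2", "vembedr")

# Keep only the packages that are not installed yet
missing_pkgs <- setdiff(pkgs, rownames(installed.packages()))
missing_pkgs

# Uncomment to install whatever is missing
# install.packages(missing_pkgs)
```

Checking against installed.packages() first avoids reinstalling packages that are already present.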
If we need to, we could update packages via Tools -> Check for Package Updates...
It is a good idea to update packages regularly. Every now and then an update might break something, but the developer usually fixes it sooner rather than later.
Create a folder called ohio2019 and, inside it, a subfolder called data. The folder structure will now be as shown below
ohio2019/
├── my-rmarkdown-file-01.Rmd
├── my-rmarkdown-file-02.Rmd
└── data/
    ├── some data file
    └── another data file
All data you download or create go into the data folder. All R code files reside in the ohio2019 folder. Open the Rmd file I sent you, Module01_forClass.Rmd, and save it in the ohio2019 folder. Save the data I sent you to the data folder.
Now create a project via File -> New Project and choose Existing Directory. Browse to the ohio2019 folder and click Create Project. RStudio will restart, and when it does you will be in the project folder and will see a file called ohio2019.Rproj
From now until you leave, whenever you start an RStudio session, do so by clicking the ohio2019.Rproj file. If you do this, life will be fairly easy when working with the files I send you.
Create a new R Markdown file via File -> New File -> R Markdown ..., enter My First Rmd File as the title and your name as the author, and click OK. Then use File -> Save As.. to save it as testing_rmd in the ohio2019 folder and click the Knit button.
You may see a message that says some packages need to be installed/updated. Allow these to be installed/updated.
If all goes well, and the document knits, you should see an html file that has some code, a plot and other results. As the document knits, watch for error messages and copy these verbatim, since we can hunt for solutions if we know the error message word for word.
Golden Rule: Give every code chunk a unique name, which can be an alphanumeric string with no whitespace. If you forget, use the namer package to assign names to every code chunk that lacks one. This can be done via
library(namer)
name_chunks("myfilename.Rmd")
You will see that code chunks have several options that can be invoked. Here are some of the more common ones we will use.
eval = If FALSE, knitr will not run the code in the code chunk.
include = If FALSE, knitr will run the chunk but not include the chunk in the final document.
echo = If FALSE, knitr will not display the code in the code chunk above its results in the final document.
error = If FALSE, knitr will not display any error messages generated by the code; knitting stops at the first error instead.
message = If FALSE, knitr will not display any messages generated by the code.
warning = If FALSE, knitr will not display any warning messages generated by the code.
cache = If TRUE, knitr will cache the results to reuse in future knits. Knitr will reuse the results until the code chunk is altered.
dev = The R function name that will be used as a graphical device to record plots, e.g. dev = 'CairoPDF'.
dpi = A number for knitr to use as the dots per inch (dpi) in graphics (when applicable).
fig.align = 'center', 'left', or 'right' alignment in the knit document
fig.height = height of the figure (in inches, for example)
fig.width = width of the figure (in inches, for example)
out.height and out.width = The height and width to scale plots to in the final output.
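Options that should apply to every chunk can be set once near the top of the Rmd file rather than repeated chunk by chunk. Here is a minimal sketch using knitr's opts_chunk$set(); the particular values are illustrative assumptions, not settings prescribed for this course.

```r
library(knitr)

# Set defaults for all chunks that follow; individual chunks can still override them
opts_chunk$set(
  echo = FALSE,          # hide the code, show only the results
  message = FALSE,       # suppress messages
  warning = FALSE,       # suppress warnings
  fig.align = "center",  # center figures in the output
  fig.width = 6,         # figure width in inches
  fig.height = 4,        # figure height in inches
  dpi = 300              # resolution for raster graphics
)

# Inspect one of the defaults just set
opts_chunk$get("echo")
```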
Other options can be found in the cheatsheet available here. There is an excellent R Markdown in RStudio tutorial on Vimeo. If the video does not show up below (because of privacy restrictions), click on it to view it on Vimeo. You may need to sign up (for free) with an email address.
Make sure you have the data-sets sent to you via Slack in your data folder. If you don’t, then the commands that follow will not work. We start by reading a simple comma-separated variable format file and then a tab-delimited variable format file.
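Reading the two kinds of file might look like the sketch below. The names of the course files are not given here, so the example first writes two tiny stand-in files to a temporary folder (the file names example.csv and example.txt and their contents are made up) and then reads them into df.csv and df.tab with base R's read.csv() and read.delim().

```r
# Create two small stand-in files in a temporary folder (hypothetical data)
tmp_dir <- tempdir()
csv_path <- file.path(tmp_dir, "example.csv")
tab_path <- file.path(tmp_dir, "example.txt")
writeLines(c("x,y,z", "1,2,3", "4,5,6"), csv_path)
writeLines(c("x\ty\tz", "1\t2\t3", "4\t5\t6"), tab_path)

# Comma-separated file: the first row holds the variable names
df.csv <- read.csv(csv_path, header = TRUE)

# Tab-delimited file: read.delim() uses sep = "\t" by default
df.tab <- read.delim(tab_path, header = TRUE)

str(df.csv)
str(df.tab)
```

With the course files in place, you would replace the temporary paths with paths inside the data folder.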
If both files were read then Environment should show objects called df.csv and df.tab. If you don’t see these, check the following:
Excel files can be read via the readxl package
library(readxl)
df.xls <- read_excel("data/ImportDataXLS.xls")
df.xlsx <- read_excel("data/ImportDataXLSX.xlsx")
SPSS, Stata, and SAS files can be read via the haven package
library(haven)
df.stata <- read_stata("data/ImportDataStata.dta")
df.sas <- read_sas("data/ImportDataSAS.sas7bdat")
df.spss <- read_sav("data/ImportDataSPSS.sav")
It is also common to encounter fixed-width files where the raw data are stored without any gaps between successive variables. However, these files will come with documentation that will tell you where each variable starts and ends, along with other details about each variable.
Notice we need widths = c() to indicate how many slots each variable takes and then col.names = c() to label the columns since the data file does not have variable names.
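As an illustration of widths and col.names, here is a minimal sketch with base R's read.fwf(). The file, its layout (4 characters for name, 2 for age, 3 for score), and the object name df.fw are all made up for this example; a real file's layout would come from its documentation.

```r
# Write a tiny fixed-width file: 4 characters for name, 2 for age, 3 for score
fwf_path <- file.path(tempdir(), "example_fwf.txt")
writeLines(c("Anna25089", "Bob 31075", "Cara19100"), fwf_path)

# widths gives the number of characters per variable, in order;
# col.names supplies the variable names the file itself lacks
df.fw <- read.fwf(
  fwf_path,
  widths = c(4, 2, 3),
  col.names = c("name", "age", "score"),
  strip.white = TRUE  # drop the padding spaces around short values
)
df.fw
```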
It is possible to specify the full web-path for a file and read it in, rather than storing a local copy. This is often useful when the file is updated periodically by the source (Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, etc.)
library(readr)  # provides read_csv()
library(haven)  # provides read_spss()
fpe <- read.table("http://data.princeton.edu/wws509/datasets/effort.dat")
test <- read.table("https://stats.idre.ucla.edu/stat/data/test.txt",
                   header = TRUE)
test.csv <- read_csv("https://stats.idre.ucla.edu/stat/data/test.csv")
hsb2.spss <- read_spss("https://stats.idre.ucla.edu/stat/data/hsb2.sav")
There are other packages as well. For example, the foreign package will also read Stata, SAS, SPSS, and other file formats. In addition, there are some specialist packages for reading SAS, SPSS, etc. data files: sas7bdat, rio, data.table, xlsx, XLConnect, gdata, etc.
Large files may sit in compressed archives on the web, and R has a neat way of allowing you to download the file, unzip it, and read it. Why is this useful? Because if these files tend to be updated periodically, this ability lets you use the same piece of R code to download/unzip/read the updated file. The tedious way would be to manually download, unzip, place in the appropriate data folder, and then read it.
temp <- tempfile()
download.file("ftp://ftp.cdc.gov/pub/Health_Statistics/NCHS/Datasets/NVSS/bridgepop/2016/pcen_v2016_y1016.sas7bdat.zip", temp)
oursasdata <- read_sas(unz(temp, "pcen_v2016_y1016.sas7bdat"))
unlink(temp)
You can save your data in a format that R will recognize, giving it the RData or rdata extension
Check your data directory to confirm that both files are present
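A minimal sketch of saving under both extensions and then confirming the files exist. The data frame mydata, the file names, and the temporary output folder are stand-ins chosen for illustration; in the course you would save into the data folder instead.

```r
# A small stand-in data frame (hypothetical data)
mydata <- data.frame(id = 1:3, score = c(90, 85, 77))

# Save the same object once with each extension
out_dir <- tempdir()
path_upper <- file.path(out_dir, "mydata_v1.RData")
path_lower <- file.path(out_dir, "mydata_v2.rdata")
save(mydata, file = path_upper)
save(mydata, file = path_lower)

# Confirm that both files are present
file.exists(c(path_upper, path_lower))
```

The two files use different base names because some file systems treat RData and rdata as the same extension.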
Working with the hsb2 data: 200 students from the High School and Beyond study
hsb2 <- read.table('https://stats.idre.ucla.edu/stat/data/hsb2.csv',
                   header = TRUE,
                   sep = ",")
female = (0/1)
race = (1=hispanic 2=asian 3=african-amer 4=white)
ses = socioeconomic status (1=low 2=middle 3=high)
schtyp = type of school (1=public 2=private)
prog = type of program (1=general 2=academic 3=vocational)
read = standardized reading score
write = standardized writing score
math = standardized math score
science = standardized science score
socst = standardized social studies score
There are no value labels for the various qualitative/categorical variables (female, race, ses, schtyp, and prog) so we next create these.
hsb2$female <- factor(hsb2$female, levels = c(0, 1),
                      labels = c("Male", "Female"))
hsb2$race <- factor(hsb2$race, levels = c(1:4),
                    labels = c("Hispanic", "Asian", "African American", "White"))
hsb2$ses <- factor(hsb2$ses, levels = c(1:3),
                   labels = c("Low", "Middle", "High"))
hsb2$schtyp <- factor(hsb2$schtyp, levels = c(1:2),
                      labels = c("Public", "Private"))
hsb2$prog <- factor(hsb2$prog, levels = c(1:3),
                    labels = c("General", "Academic", "Vocational"))
I am overwriting each variable, indicating to R that a variable such as female shows up as numeric with values 0 and 1, and that a 0 should be treated as male and a 1 as female, and so on. There are four values for race, 3 for ses, 2 for schtyp, and 3 for prog, so the mapping has to reflect this. Note that this is just a quick run-through of creating value labels; we will cover this in greater detail in a later module.
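One way to convince yourself that the mapping went the right way is to cross-tabulate the original codes against the new labels. A minimal sketch on a toy vector; the names codes and sex and the values are made up for illustration.

```r
# Toy vector of 0/1 codes (hypothetical data)
codes <- c(0, 1, 1, 0, 1)

# Map 0 -> "Male" and 1 -> "Female", the same pattern used for the female variable
sex <- factor(codes, levels = c(0, 1), labels = c("Male", "Female"))

# Each code should line up with exactly one label
table(codes, sex)
```

If a level were missing from levels, the corresponding values would turn into NA, which this kind of table makes easy to spot.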
save your work!!
Having added labels to the factors in hsb2 we can now save the data for later use.
Let us test if this R Markdown file will knit to html. If all is good then we can Close Project, and when we do so, RStudio will close the project and reopen in a vanilla session.
Almost all R packages come bundled with data-sets, too many of them to walk you through here.
To load data from a package, if you know the data-set’s name, run
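A minimal sketch using the built-in datasets package, which ships with every R installation; the choice of mtcars and the object name available are just for illustration.

```r
# List the data-sets that ship with the datasets package
available <- data(package = "datasets")
head(available$results[, c("Item", "Title")])

# Load one of them by name and look at the first few rows
data("mtcars", package = "datasets")
head(mtcars)
```

The same pattern works for any installed package: swap in its name for "datasets" and the data-set's name for "mtcars".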
You can save your data via
save(dataname, file = "filepath/filename.RData")
or
save(dataname, file = "filepath/filename.rdata")
You can also save multiple data files as follows:
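A minimal sketch of saving several objects into one file with save(); the objects df1 and df2 and the file name are made up, and the file goes to a temporary folder so the example does not touch your data folder.

```r
# Two small stand-in objects (hypothetical data)
df1 <- data.frame(a = 1:3)
df2 <- data.frame(b = c("x", "y"))

# List every object to be stored before the file argument
multi_path <- file.path(tempdir(), "two_objects.RData")
save(df1, df2, file = multi_path)

# Remove them, then restore both with a single load()
rm(df1, df2)
loaded <- load(multi_path)
loaded  # names of the objects that were restored
```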
If you want to save just a single object from the environment and then load it in a later session, maybe with a different name, then you should use saveRDS() and readRDS()
If instead you use save() and load(), the object is restored under the name it had when saved
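The difference between the two approaches can be sketched as follows. The object scores, the new name my_scores, and the file names are made up for illustration, and the files are written to a temporary folder.

```r
# A small stand-in object (hypothetical data)
scores <- data.frame(id = 1:3, math = c(41, 53, 54))

# saveRDS()/readRDS(): the file holds one object and you choose the name on reading
rds_path <- file.path(tempdir(), "scores.RDS")
saveRDS(scores, file = rds_path)
my_scores <- readRDS(rds_path)

# save()/load(): the name is stored in the file and comes back unchanged
rdata_path <- file.path(tempdir(), "scores.RData")
save(scores, file = rdata_path)
rm(scores)
load(rdata_path)  # recreates an object called scores

identical(scores, my_scores)
```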
If you want to save everything you have done in the work session you can do so via save.image()
save.image(file = "mywork_aug142019.RData")
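R automatically loads an image at startup only if it is a file named .RData in the working directory; an image saved under another name, as here, can be restored in a later session with load(). This is useful if you have written a lot of R code and generated various objects and do not want to start from scratch the next time around.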
If you are not in a project and you try to close RStudio after some code has been run, you will be prompted to save (or not) the workspace, and you should say “no” by default unless you want to save the workspace.
There are several packages that allow us to build simple versus complicated maps in R. Of late I have been really fascinated by leaflet, an easy-to-learn JavaScript library that generates interactive maps, so let us see that package in action. Later on, when we move to more advanced visualizations, we will look at a variety of mapping options. For the moment we keep it simple and fun.
library(leaflet)
library(leaflet.extras)
library(widgetframe)
m1 <- leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 14) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '275')
saveWidget(m1, 'leaflet1.html')
m1
Notice how this was built: we use setView() to center the map at a given latitude and longitude and then pick a reasonable zoom factor with zoom =. If you set the zoom factor too low you will be seeing the place from outer space, and if too high then you might be standing on a street corner, so experiment with it.
Now, since I ended up picking the general area around Richland Avenue, I could drop a marker on Building 21 on The Ridges. This is done with addMarkers, and the popup specifies what should be displayed when someone clicks on this marker.
m2 <- leaflet() %>%
  setView(lat = 39.322577, lng = -82.106336, zoom = 15) %>%
  addMarkers(lat = 39.319984, lng = -82.107084,
             popup = c("The Ridges, Building 21")) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '275')
saveWidget(m2, 'leaflet2.html')
m2
Let us build one for Egypt, shall we?
m3 <- leaflet() %>%
  setView(lat = 30.049677, lng = 31.236318, zoom = 8) %>%
  addMarkers(lat = 30.049677, lng = 31.236318,
             popup = c("Cairo, Egypt")) %>%
  addTiles() %>%
  setMapWidgetStyle() %>%
  frameWidget(height = '500')
saveWidget(m3, 'leaflet3.html')
m3