Begin this practical by setting the maximum line length in R-Studio
to 80 characters. Go to RStudio
’s Preferences
(or Global Options
under Tools
) –> Code
–> Display
, and tick the show margin
box. Make sure that the margin column
is set to 80
For those of you who are accustomed to running analysis models in Mplus, I have included two additional exercises (12 and 13) on how to call Mplus from within R
. If you do not use Mplus, I suggest that you skip these two exercises.
mice
. Go to Tools
> Install Packages
in RStudio
. If you are connected to the internet, select Repository
under Install From
and type mice
under Packages
. Leave the Install to Library
at default and make sure that Install Dependencies
is selected. Click install. If you are not connected to the internet, select Package Archive File
under “Install from” and navigate to the respective file on your drive.
Some packages depend on other packages, meaning that their functionality may be limited if their dependencies are not installed. Installing dependencies is therefor recommended, but internet connectivity is required.
If all is right, you will receive a message in the console that the package has been installed (as well as its dependencies.
ALternatively, if you know the name of the package you would like to install - in this case mice
- you can also call install.packages("mice")
in the console window.
mice
. Loading packages can be done through functions library()
and require()
.library(mice)
## Loading required package: lattice
##
## Attaching package: 'mice'
## The following objects are masked from 'package:base':
##
## cbind, rbind
If you use require()
within a function, and the required package is not available, require()
will yield a warning and the remainder of the function is still executed, whereas library()
will yield an error and terminate all executions. The use of library()
when not doing too complicated things is preferred - require()
would result in more computational overhead because it calls library()
itself.
mammalsleep
dataset from package mice
by typing mammalsleep
in the console, and subsequently by using the function View()
. Using View()
is preferred for inspecting datasets that are large. View()
opens the dataset in a spreadsheet-like window (conform MS Excel, or SPSS). If you View()
your own datasets, you can even edit the datasets’ contents.
mice
to the work directory as a tab-delimited text file with .
as a decimal seperator. Name the file mammalsleep.txt
library(mice)
write.table(mammalsleep, "mammalsleep.txt", sep = "\t", dec = ".", row.names = FALSE)
The command sep = "\t"
indicates that the file is tabulated and the command dec = "."
indicates that a point is used as the decimal seperator (instead of a comma). row.names = FALSE
tells R
that row names are not to be included in exported file.
mammalsleep.txt
file. sleepdata <- read.table("mammalsleep.txt", sep = "\t", dec = ".", header = TRUE)
The command sep = "\t"
indicates that the file is tabulated and the command dec = "."
indicates that a point is used as the decimal seperator (instead of a comma). header = TRUE
tells R
that variable names are included in the header.
All files that are presented in the work directory of the current R
project, can essentially be imported into the workspace (the space that contains all environments) directly. All other locations require you to specify the specific path from the root of your machine. To find out what the current work directory is, you can type getwd()
and to change the work directory you can use setwd()
. The beauty of using projects in RStudio is that you would never have to change the work directory, as the work directory is automatically set, relative to your projects’ R
-scripts.
There are many packages that facilitate importing datasets from other statistical software packages, such as SPSS (e.g. function read_spss
from package haven
), Mplus (package MplusAutomation
), Stata (read.dta()
in foreign
), SAS (sasxport.get()
from package Hmisc
) and from spreadsheet software, such as MS Excel (function read.xlsx()
from package xlsx
). For a short guideline to import multiple formats into R
, see e.g. http://www.statmethods.net/input/importingdata.html.
If you would like to know more about this dataset, you can open the help for the mammalsleep
dataset in package mice
through ?mammalsleep
. Don’t forget to load package mice
first.
Inspecting the sleepdata could be done by
str(sleepdata) #the data structure
## 'data.frame': 62 obs. of 11 variables:
## $ species: Factor w/ 62 levels "African elephant",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ bw : num 6654 1 3.38 0.92 2547 ...
## $ brw : num 5712 6.6 44.5 5.7 4603 ...
## $ sws : num NA 6.3 NA NA 2.1 9.1 15.8 5.2 10.9 8.3 ...
## $ ps : num NA 2 NA NA 1.8 0.7 3.9 1 3.6 1.4 ...
## $ ts : num 3.3 8.3 12.5 16.5 3.9 9.8 19.7 6.2 14.5 9.7 ...
## $ mls : num 38.6 4.5 14 NA 69 27 19 30.4 28 50 ...
## $ gt : num 645 42 60 25 624 180 35 392 63 230 ...
## $ pi : int 3 3 1 5 3 4 1 4 1 1 ...
## $ sei : int 5 1 1 2 5 4 1 5 2 1 ...
## $ odi : int 3 3 1 3 4 4 1 4 1 1 ...
summary(sleepdata) #distributional summaries
## species bw brw
## African elephant : 1 Min. : 0.005 Min. : 0.14
## African giant pouched rat: 1 1st Qu.: 0.600 1st Qu.: 4.25
## Arctic Fox : 1 Median : 3.342 Median : 17.25
## Arctic ground squirrel : 1 Mean : 198.790 Mean : 283.13
## Asian elephant : 1 3rd Qu.: 48.203 3rd Qu.: 166.00
## Baboon : 1 Max. :6654.000 Max. :5712.00
## (Other) :56
## sws ps ts mls
## Min. : 2.100 Min. :0.000 Min. : 2.60 Min. : 2.000
## 1st Qu.: 6.250 1st Qu.:0.900 1st Qu.: 8.05 1st Qu.: 6.625
## Median : 8.350 Median :1.800 Median :10.45 Median : 15.100
## Mean : 8.673 Mean :1.972 Mean :10.53 Mean : 19.878
## 3rd Qu.:11.000 3rd Qu.:2.550 3rd Qu.:13.20 3rd Qu.: 27.750
## Max. :17.900 Max. :6.600 Max. :19.90 Max. :100.000
## NA's :14 NA's :12 NA's :4 NA's :4
## gt pi sei odi
## Min. : 12.00 Min. :1.000 Min. :1.000 Min. :1.000
## 1st Qu.: 35.75 1st Qu.:2.000 1st Qu.:1.000 1st Qu.:1.000
## Median : 79.00 Median :3.000 Median :2.000 Median :2.000
## Mean :142.35 Mean :2.871 Mean :2.419 Mean :2.613
## 3rd Qu.:207.50 3rd Qu.:4.000 3rd Qu.:4.000 3rd Qu.:4.000
## Max. :645.00 Max. :5.000 Max. :5.000 Max. :5.000
## NA's :4
round(cor(sleepdata[, -1], use = "pairwise.complete.obs"), 2) #bivariate correlations, variable 1 excluded.
## bw brw sws ps ts mls gt pi sei odi
## bw 1.00 0.93 -0.38 -0.11 -0.31 0.30 0.65 0.06 0.34 0.13
## brw 0.93 1.00 -0.37 -0.11 -0.36 0.51 0.75 0.03 0.37 0.15
## sws -0.38 -0.37 1.00 0.51 0.96 -0.38 -0.59 -0.32 -0.54 -0.48
## ps -0.11 -0.11 0.51 1.00 0.73 -0.30 -0.45 -0.45 -0.54 -0.58
## ts -0.31 -0.36 0.96 0.73 1.00 -0.41 -0.63 -0.40 -0.64 -0.59
## mls 0.30 0.51 -0.38 -0.30 -0.41 1.00 0.61 -0.10 0.36 0.06
## gt 0.65 0.75 -0.59 -0.45 -0.63 0.61 1.00 0.20 0.64 0.38
## pi 0.06 0.03 -0.32 -0.45 -0.40 -0.10 0.20 1.00 0.62 0.92
## sei 0.34 0.37 -0.54 -0.54 -0.64 0.36 0.64 0.62 1.00 0.79
## odi 0.13 0.15 -0.48 -0.58 -0.59 0.06 0.38 0.92 0.79 1.00
head(mammalsleep) #first six rows
## species bw brw sws ps ts mls gt pi sei
## 1 African elephant 6654.000 5712.0 NA NA 3.3 38.6 645 3 5
## 2 African giant pouched rat 1.000 6.6 6.3 2.0 8.3 4.5 42 3 1
## 3 Arctic Fox 3.385 44.5 NA NA 12.5 14.0 60 1 1
## 4 Arctic ground squirrel 0.920 5.7 NA NA 16.5 NA 25 5 2
## 5 Asian elephant 2547.000 4603.0 2.1 1.8 3.9 69.0 624 3 5
## 6 Baboon 10.550 179.5 9.1 0.7 9.8 27.0 180 4 4
## odi
## 1 3
## 2 3
## 3 1
## 4 3
## 5 4
## 6 4
tail(mammalsleep) #last six rows
## species bw brw sws ps ts mls gt pi sei odi
## 57 Tenrec 0.900 2.6 11.0 2.3 13.3 4.5 60 2 1 2
## 58 Tree hyrax 2.000 12.3 4.9 0.5 5.4 7.5 200 3 1 3
## 59 Tree shrew 0.104 2.5 13.2 2.6 15.8 2.3 46 3 2 2
## 60 Vervet 4.190 58.0 9.7 0.6 10.3 24.0 210 4 3 4
## 61 Water opossum 3.500 3.9 12.8 6.6 19.4 3.0 14 2 1 1
## 62 Yellow-bellied marmot 4.050 17.0 NA NA NA 13.0 38 3 1 1
?mammalsleep # the help
## starting httpd help server ... done
Note that the sleepdata dataset is automatically recognized as a dataframe. After all, there is one factor (categorical variable) containing the animal names.
The functions head()
and tail()
are very useful functions. As is function str
as it gives you a quick overview of the measurement levels in mammalsleep
.
Since mammalsleep
is an R
-dataset, there should be a help file. Taking a look at ?mammalsleep
may yield valuable insight about the measurements and origin of the variables.
One thing that may have caught your attention is the relation between ts
, ps
and sws
. This is a deterministic relation where total sleep (ts
) is the sum of paradoxical sleep (ps
) and short-wave sleep (sws
). In the event that you would model the data, you need to take such relations into account.
Practical_C.RData
. Also, save the sleepdata file as a seperate workspace called Sleepdata.RData
. Now that we have imported our data, it may be wise to save the current workspace, i.e. the current state of affairs. Saving the workspace will leave everything as is, so that we can continue from this exact state at a later time, by simply opening the workspace file. To save everything in the current workspace, type
save.image("Practical_C.RData")
To save just the dataset sleepdata
, and nothing else, type
save(sleepdata, file = "Sleepdata.RData")
With the save functions, any object in the workspace can be saved.
There are two ways to exclude the three animals from the dataset. The first approach uses the names:
exclude <- c("Echidna", "Lesser short-tailed shrew", "Musk shrew")
which <- sleepdata$species %in% exclude #Indicate the species that match the names in exclude
which
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [12] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [23] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [34] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [45] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [56] FALSE FALSE FALSE FALSE FALSE FALSE FALSE
sleepdata2 <- sleepdata[!which, ]
while the second approach uses the row numbers directly (you would need to inquire about, or calculate the rownumbers)
sleepdata2 <- sleepdata[-c(16, 32, 38), ]
Note that the numbered option requires less code, but the named option has a much lower probability for error. As the dataset might change, or might get sorted differently, the second option may not be valid anymore.
plot(brw ~ species, data = sleepdata2)
To find out which animals have a brain weight larger than 1 standard deviation above the mean brain weight:
sd.brw <- sd(sleepdata2$brw) #standard deviation
mean.brw <- mean(sleepdata2$brw) #mean
which <- sleepdata2$brw > (mean.brw + (1 * sd.brw)) #which are larger?
as.character(sleepdata2$species[which]) #names of the animals with brw > 1000
## [1] "African elephant" "Asian elephant" "Man"
To plot these animals:
plot(brw ~ species, data = sleepdata2[which, ])
The downside is that it still prints all the animals on the x-axis. This is due to the factor labels for species
being copied to the smaller subset of the data. Plot automatically takes over the labels. For example,
sleepdata2$species[which]
## [1] African elephant Asian elephant Man
## 62 Levels: African elephant African giant pouched rat ... Yellow-bellied marmot
returns only 3 mammals, but still has 62 factor levels. To get rid of the unused factor levels, we can use function factor()
:
sleepdata3 <- sleepdata2[which, ]
sleepdata3$species <- factor(sleepdata3$species)
sleepdata3$species
## [1] African elephant Asian elephant Man
## Levels: African elephant Asian elephant Man
To plot the graph that we wanted:
plot(brw ~ species, data = sleepdata3)
log()
] of brain weight as a function of short-wave sleep. Use dark red triangles as plotting characters. Name the x-axis Short-wave sleep
and name the y-axis Brain weight
. Finally, Give the overall title the name How sleep type relates to brain weight
. Hint: have a look at par
in the help to find information about the plotting parameters.plot(log(brw) ~ sws, data = sleepdata2, type = "p", pch = 2, col = "dark red",
xlab = "Short-wave sleep",
ylab = "Brain weight",
main = "How sleep type relates to brain weight")
If your current software-analysis platform is different from R
, chances are that you prepare your data in the software of your choice. In R
there are fantastic facilities for importing and exporting data and I would specifically like to pinpoint you to package haven
by Hadley Wickham. It provides wonderful functions to import and export many data types from software such as Stata, SAS, SPSS, and so on. For integrating Mplus into R
, package MplusAutomation
is essential.