Lab 4

Basic Graphics in R

R offers a variety of options for creating visual displays in statistics. While base R itself provides sufficient quality for many plots, additional packages like \(\texttt{\color{brown}{ggplot2}}\) or \(\texttt{\color{brown}{lattice}}\) can be used to produce “publishable” quality graphics. In our lab sessions, we will mainly rely on the basic R graphics.

Beginning Semester Survey Data

For illustration, let us use beginning-of-semester survey data from one of the previous MATH 012 classes. It contains 136 observations and 16 variables as follows:

01. Sports:    What is your favorite professional sport? 
02. States:    How many states in the US have you traveled to? 
03. Cat_Dog:   Are you more of a cat person or a dog person? 
04. Pets:      What is the number of pets you and your family currently have? 
05. Gender:    What is your gender? 
06. Browser:   What is your preferred internet browser? 
08. Shoes:     How many pairs of shoes do you have? 
09. Height:    How tall are you (inch)? 
10. MO_Height: What is your mother's height (inch)? 
11. FA_Height: What is your father's height (inch)? 
12. Arrival:   How many minutes does it take for you to reach the recitation classroom from your residence? 
13. Sleep:     How many HOURS did you sleep last night, to the nearest half-hour? 
14. Number:    What is your favorite whole number between 1 and 10? 
15. Hand:      Are you right- or left-handed? 
16. Credit:    How many credits are you taking this semester? 

Let us check the variable names and the first six observations by using the function \(\texttt{\color{brown}{head()}}\):
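As a minimal sketch of this step, assume the survey has been read into a data frame named \(\texttt{\color{brown}{survey}}\) (a hypothetical name). The rows below are made-up stand-ins with only four of the variables, so that the code runs on its own; they are not the real class data.

```r
# Toy stand-in for the survey data (values are made up for illustration).
survey <- data.frame(
  Sports = c("NFL", "NBA", "MLB", "NFL", "NBA", "NHL", "MLB", "NFL"),
  Gender = c("Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"),
  Hand   = c("Right", "Right", "Left", "Right", "Right", "Right", "Left", "Right"),
  Height = c(64, 70, 72, 62, 68, 65, 74, 66)
)

# Show the variable names and the first six rows.
head(survey)
```

By default \(\texttt{\color{brown}{head()}}\) returns six rows; a different number can be requested with its \(\texttt{\color{brown}{n}}\) argument, for example \(\texttt{\color{brown}{head(survey, n = 3)}}\).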

Frequency Tables

As we discussed in class, a frequency table summarizes the information from the corresponding data set. It also serves as an important intermediate step for creating visual displays such as pie charts, bar plots, histograms, and more.

Let us create a frequency table of a categorical variable \(\texttt{\color{brown}{Hand}}\) using the function \(\texttt{\color{brown}{table()}}\).
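A sketch of what this might look like, using a small made-up \(\texttt{\color{brown}{Hand}}\) column in a hypothetical data frame \(\texttt{\color{brown}{survey}}\) (the real data has 136 rows):

```r
# Toy stand-in for the Hand variable (values are made up for illustration).
survey <- data.frame(
  Hand = c("Right", "Right", "Left", "Right", "Right", "Right", "Left", "Right")
)

# Count how many observations fall into each category.
hand_table <- table(survey$Hand)
hand_table

# Relative frequencies: each count divided by the total.
prop.table(hand_table)
```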

We can also create a two-way (relative) frequency table (or contingency table) with two categorical variables.
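One way to sketch this is to pass two variables to \(\texttt{\color{brown}{table()}}\) and then apply \(\texttt{\color{brown}{prop.table()}}\). The choice of \(\texttt{\color{brown}{Gender}}\) and \(\texttt{\color{brown}{Sports}}\), the data frame name \(\texttt{\color{brown}{survey}}\), and all values below are assumptions for illustration only.

```r
# Toy stand-in for two categorical variables (values are made up).
survey <- data.frame(
  Gender = c("Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"),
  Sports = c("NFL", "NBA", "MLB", "NFL", "NBA", "NHL", "MLB", "NFL")
)

# Two-way frequency table: rows are Gender, columns are Sports.
two_way <- table(survey$Gender, survey$Sports)
two_way

# Relative frequencies: each cell divided by the grand total.
prop.table(two_way)
```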

Now, we can finally create a conditional frequency table.
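A possible sketch uses the \(\texttt{\color{brown}{margin}}\) argument of \(\texttt{\color{brown}{prop.table()}}\). Here we condition on the row variable, so each row sums to 1. The data frame \(\texttt{\color{brown}{survey}}\) and its values are made up for illustration, and conditioning on \(\texttt{\color{brown}{Gender}}\) is an assumption about the intended table.

```r
# Toy stand-in for two categorical variables (values are made up).
survey <- data.frame(
  Gender = c("Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"),
  Sports = c("NFL", "NBA", "MLB", "NFL", "NBA", "NHL", "MLB", "NFL")
)

two_way <- table(survey$Gender, survey$Sports)

# margin = 1 divides each cell by its row total,
# giving the distribution of Sports within each Gender.
cond_table <- prop.table(two_way, margin = 1)
cond_table
```

Using \(\texttt{\color{brown}{margin = 2}}\) instead would divide by column totals, giving the distribution of \(\texttt{\color{brown}{Gender}}\) within each sport. The proportions from this toy data say nothing about the actual survey.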

Based on the results of the conditional frequency table, we can answer questions such as: Which gender prefers MLB? Which gender prefers NBA?

Bar Plot

A bar plot is often more effective than a pie chart for displaying categorical data, especially when dealing with a large number of categories. To create a bar plot, we can use the \(\texttt{\color{brown}{barplot()}}\) function. Note that the function accepts the \(\texttt{\color{brown}{table}}\) object that we created earlier.

For most base R plotting functions, the \(\texttt{\color{brown}{main}}\) argument can be used to add a “title” to the plot.
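As a sketch, a bar plot of \(\texttt{\color{brown}{Hand}}\) with a title could be drawn as follows. The data frame \(\texttt{\color{brown}{survey}}\), its values, and the title text are made up for illustration.

```r
# Toy stand-in for the Hand variable (values are made up for illustration).
survey <- data.frame(
  Hand = c("Right", "Right", "Left", "Right", "Right", "Right", "Left", "Right")
)

# barplot() takes the table object, not the raw variable.
hand_table <- table(survey$Hand)
barplot(hand_table, main = "Handedness of Students")
```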

Example

Select a single categorical variable from the data set, and create a bar plot with an appropriate title. (We will address bivariate categorical variables in a lab question.)

Histogram

A histogram is a versatile graphical display for a single numerical variable. The \(\texttt{\color{brown}{hist()}}\) function directly accepts a numerical vector as its first argument. Let us create a histogram for the \(\texttt{\color{brown}{Height}}\) variable.
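A minimal sketch of the call, assuming a data frame named \(\texttt{\color{brown}{survey}}\) with a numerical \(\texttt{\color{brown}{Height}}\) column; the eight values, the title, and the axis label are made up for illustration.

```r
# Toy stand-in for the Height variable (values are made up for illustration).
survey <- data.frame(Height = c(64, 70, 72, 62, 68, 65, 74, 66))

# hist() takes the numerical vector directly; no table is needed.
hist(survey$Height, main = "Histogram of Height", xlab = "Height (inch)")
```

With only a handful of made-up values, the shape of this toy histogram does not reflect the shape of the real survey data.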

The histogram reveals that the distribution of the \(\texttt{\color{brown}{Height}}\) variable is right-skewed.

Example

Select a single numerical variable from the data set, and create a histogram with an appropriate title. What is the shape of the distribution?

Box Plot

Although the histogram is typically our go-to choice for a single numerical variable, there are situations where a box plot is preferable. For instance, if we want to display \(\texttt{\color{brown}{Height}}\) by \(\texttt{\color{brown}{Gender}}\), a box plot would be more appropriate.
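One way to sketch this is with the formula interface of \(\texttt{\color{brown}{boxplot()}}\), where \(\texttt{\color{brown}{Height ~ Gender}}\) reads as "Height by Gender". The data frame \(\texttt{\color{brown}{survey}}\), its values, and the title are made up for illustration.

```r
# Toy stand-in for one numerical and one categorical variable (values are made up).
survey <- data.frame(
  Gender = c("Female", "Male", "Male", "Female", "Male", "Female", "Male", "Female"),
  Height = c(64, 70, 72, 62, 68, 65, 74, 66)
)

# One box per Gender category, drawn side by side on a common Height axis.
boxplot(Height ~ Gender, data = survey, main = "Height by Gender")
```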

This graphical representation makes it easy to compare the differences in \(\texttt{\color{brown}{Height}}\) between genders. As demonstrated, a box plot is a clear and intuitive choice when we have one numerical variable and one categorical variable.

Lab Questions

  1. Let us consider the \(\texttt{\color{brown}{Browser}}\) variable in relation to \(\texttt{\color{brown}{Cat\_Dog}}\).
     (i) Create a frequency table for the \(\texttt{\color{brown}{Browser}}\) variable.
     (ii) Create a bar plot using the table object in (i).
     (iii) Create a conditional frequency table for \(\texttt{\color{brown}{Browser}}\) with respect to \(\texttt{\color{brown}{Cat\_Dog}}\).
     (iv) Create a conditional frequency bar plot for \(\texttt{\color{brown}{Browser}}\) with respect to \(\texttt{\color{brown}{Cat\_Dog}}\). Discuss the relationship between the two categorical variables.
  2. Let us consider the \(\texttt{\color{brown}{Shoes}}\) variable in relation to \(\texttt{\color{brown}{Gender}}\). Note that we have one numerical and one categorical variable.
     (i) Choose a relevant graphical display to summarize the information of \(\texttt{\color{brown}{Shoes}}\), and provide the plot.
     (ii) Choose a relevant graphical display to summarize the information of \(\texttt{\color{brown}{Shoes}}\) in terms of \(\texttt{\color{brown}{Gender}}\), and provide the plot.
     (iii) Which gender has a higher central value for \(\texttt{\color{brown}{Shoes}}\)? Which gender shows greater variability? What can we infer about the number of pairs of shoes with respect to gender?
