We live in a digital age in which data underpins almost everything. With people spending hours on digital platforms every day, the volume of data is growing rapidly, and without visualization it is almost impossible to build a narrative around it. Data visualization is the art of turning numbers into usable knowledge. Several tools can help with this, and R is one of them. R offers a set of built-in functions and libraries for building visualizations and presenting data, and it can provide a distinct form of visualization for nearly every scenario imaginable.
Below is a selection of seven essential data visualizations and how to recreate them using a mix of base R functions and a few common packages. The examples all use datasets included in a default R installation.
1. Bar Chart
You're probably already familiar with the basic bar chart from school and college. The concept of the bar chart in R is the same: it shows a comparison across two or more categories. However, there are several different types of bar chart to know and understand.
Horizontal and vertical bar charts are common and familiar - they are standard formats in most academic or professional presentations. R also provides a stacked bar chart, which splits each bar by a second categorical variable.
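Numbers<-table(mtcars$cyl,mtcars$gear) # assumed definition: counts of cars by cylinders (rows) and gears (columns)
barplot(Numbers,main='Automobile cylinder number grouped by number of gears',
col=c('red','orange', 'steelblue'),legend=rownames(Numbers),xlab='Number of Gears',
ylab='Frequency') # assumed closing argument: the original call is cut off after xlab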
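The stacked layout is only one of the bar chart types mentioned above. As a minimal sketch, assuming the same cylinder-by-gear counts from mtcars (the object name counts and the titles are introduced here purely for illustration), the beside and horiz arguments of barplot() give grouped and horizontal versions:

```r
# Count cars by number of cylinders (rows) and number of gears (columns)
counts <- table(mtcars$cyl, mtcars$gear)

# Grouped bars: one bar per cylinder count within each gear group
barplot(counts, beside = TRUE,
        col = c('red', 'orange', 'steelblue'),
        legend.text = rownames(counts),
        main = 'Cylinders by gears (grouped)',
        xlab = 'Number of Gears', ylab = 'Number of Cars')

# Horizontal stacked bars: same data, bars run left to right
barplot(counts, horiz = TRUE,
        col = c('red', 'orange', 'steelblue'),
        legend.text = rownames(counts),
        main = 'Cylinders by gears (horizontal)',
        xlab = 'Number of Cars', ylab = 'Number of Gears')
```

Grouped bars make it easier to compare cylinder counts within one gear group, while stacked bars make the group totals easier to read.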
2. Histogram
Histograms are standard in some academic fields, though they tend to appear in more advanced work. They work best with continuous numeric data, which they group into intervals (bins) and count.
A histogram shows how the values of a variable are distributed and so approximates its probability distribution - the time remaining before a project's completion, for example. R provides a simple function for this as well.
# histogram of frequency of ozone values in 'airquality' dataset
hist(airquality$Ozone) # assumed call: the original plotting line is missing from the text
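To read a histogram as a probability estimate rather than a count, the bars can be rescaled so that their total area is 1. The following is a minimal sketch, assuming the Ozone column of airquality; the names ozone and h, the colour, and the labels are illustrative choices rather than part of the original example.

```r
# Ozone has missing values; drop them before estimating a density
ozone <- na.omit(airquality$Ozone)

# freq = FALSE rescales the bars so that their total area is 1
h <- hist(ozone, freq = FALSE, col = 'steelblue',
          main = 'Distribution of ozone readings',
          xlab = 'Ozone (ppb)')

# Overlay a smoothed kernel density estimate for comparison
lines(density(ozone), lwd = 2)
```

With freq = FALSE the height of each bar is a density, so the area of a bar is the estimated probability of a reading falling in that interval.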
3. Heat Map
One of the most innovative data visualizations in R, the heat map uses colour intensity to show relationships between multiple variables.
The result is an attractive 2D image that is easy to interpret. As a basic example, a heat map can show the popularity of competing items ordered by their market launch date, with colour encoding each item's sales figures over time.
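# simulate a dataset of 10 points
dataFrame<-data.frame(x=rnorm(10),y=rnorm(10)) # assumed stand-in: the original simulation code is missing here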
dataMatrix<-as.matrix(dataFrame)[sample(1:10),] # convert to class 'matrix', then shuffle the rows of the matrix
heatmap(dataMatrix) # visualize hierarchical clustering via a heatmap
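The same function works on real data. As a rough sketch, assuming the mtcars dataset and column-wise scaling (the names cars_matrix and hm and the title are illustrative), heatmap() clusters both the cars and the variables and reorders them so that similar rows and columns sit next to each other:

```r
# heatmap() needs a numeric matrix; build one from a fresh copy of mtcars
cars_matrix <- as.matrix(datasets::mtcars)

# scale = 'column' standardises each variable so that large-valued columns
# such as 'disp' do not dominate the colour scale
hm <- heatmap(cars_matrix, scale = 'column',
              main = 'mtcars variables, clustered')

# The returned list records the row and column order chosen by the clustering
rownames(cars_matrix)[hm$rowInd]
```

Without scaling, columns measured in large units would take up most of the colour range and the remaining variables would look nearly uniform.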
4. Scatter Plot
Plotting is a popular alternative to charting or graphing. It represents each observation as a dot. The most standard iteration - the scatter plot - shows the relationship between two continuous variables. A basic application of the scatter plot is comparing the height and weight of children. Scatter plots are useful when you want to avoid misleading summaries, because every observation is shown. Only use a plot if you're sure the audience is familiar with that type of chart, and always use it sparingly. When in doubt, go with one of your other options.
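# Plot Wind and Temperature measurements for only the month of September
with(subset(airquality,Month==9),plot(Wind,Temp)) # assumed call: the original plotting line is missing from the text
title('Wind and Temperature in NYC in September of 1973')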
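A scatter plot is often easier to read when the overall trend is drawn over the points. The following is a minimal sketch, assuming the September readings from airquality; the names sept and fit, the styling, and the choice of a straight-line fit are assumptions made for illustration.

```r
# Keep only the September readings
sept <- subset(airquality, Month == 9)

plot(sept$Wind, sept$Temp, pch = 20, col = 'steelblue',
     xlab = 'Wind (mph)', ylab = 'Temperature (degrees F)',
     main = 'Wind and temperature, September 1973')

# Fit a straight line to the points and draw it on top
fit <- lm(Temp ~ Wind, data = sept)
abline(fit, lwd = 2)

# The slope shows how temperature changes per unit of wind speed
coef(fit)
```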
5. Box Plot
The box plot resembles a bar chart in some respects. Instead of showing a single value per category, it summarizes the distribution of a continuous variable within each category: the median, the quartiles, and any outliers.
In the real world, box plots can give detailed information on weather patterns, such as how temperatures vary from month to month.
mtcars<-transform(mtcars,cyl=factor(cyl)) # convert 'cyl' column from class 'numeric' to class 'factor'
class(mtcars$cyl) # 'cyl' is now a categorical variable
boxplot(mpg~cyl,mtcars,xlab='Number of Cylinders',ylab='miles per gallon',
main='miles per gallon for varied cylinders in automobiles',cex.main=1.2)
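The numbers behind each box can be inspected directly. As a small sketch, assuming the same mpg-by-cylinder grouping (the names cars, summaries and stats are illustrative), fivenum() gives the five-number summary per group and boxplot() returns the values it would draw when called with plot = FALSE:

```r
# Use a fresh copy so the result does not depend on earlier changes to mtcars
cars <- datasets::mtcars

# Five-number summary (min, lower hinge, median, upper hinge, max) of mpg
# for each cylinder count
summaries <- tapply(cars$mpg, cars$cyl, fivenum)
print(summaries)

# With plot = FALSE, boxplot() draws nothing and returns what it would draw:
# one column per group holding the lower whisker, lower hinge, median,
# upper hinge and upper whisker
stats <- boxplot(mpg ~ cyl, data = cars, plot = FALSE)$stats
print(stats)
```

The whisker ends can differ from the minimum and maximum, because points that lie far from the box are drawn separately as outliers.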
6. Correlogram
Correlated data is best visualized through a correlogram, which the corrplot package draws. The 2D format is similar to a heat map, but it shows the correlation between each pair of variables.
Most correlograms highlight the strength of correlation between pairs of variables. Comparing sales data between different months or years is a basic example.
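library(corrplot) # assumes the corrplot package is installed
data("mtcars") # reload mtcars so that 'cyl' is numeric again
corr_matrix <- cor(mtcars)
# with circles
corrplot(corr_matrix)
# with numbers and lower
corrplot(corr_matrix,
method = 'number',
type = "lower")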
Correlogram with circles
Correlogram with numbers
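The correlation matrix that a correlogram displays can also be queried without any extra package. The following is a minimal base R sketch, assuming a fresh copy of mtcars; the names pairs_only and strongest are illustrative.

```r
# Fresh copy of mtcars so that every column is numeric
corr_matrix <- cor(datasets::mtcars)

# Blank out the diagonal and upper triangle so each pair appears only once
pairs_only <- corr_matrix
pairs_only[upper.tri(pairs_only, diag = TRUE)] <- NA

# Locate the pair of variables with the largest absolute correlation
strongest <- which(abs(pairs_only) == max(abs(pairs_only), na.rm = TRUE),
                   arr.ind = TRUE)
rownames(pairs_only)[strongest[1, 'row']]
colnames(pairs_only)[strongest[1, 'col']]
```

Keeping only the lower triangle mirrors the type = "lower" layout, where each pair of variables is shown a single time.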
7. Area Chart
Area charts express continuity between different variables or data sets. They are akin to the traditional line chart you know from grade school and are used in a similar fashion.
Most area charts highlight trends and their evolution over the course of time, making them highly effective when trying to expose underlying trends - whether they're positive or negative.
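library(dplyr) # assumes the dplyr and ggplot2 packages are installed
library(ggplot2)
#data("airquality") #dataset used
airquality %>%
group_by(Day) %>% # assumed grouping step, inferred from the chart title
summarise(mean_wind = mean(Wind)) %>%
ggplot() +
geom_area(aes(x = Day, y = mean_wind)) +
labs(title = "Area Chart of Average Wind per Day",
subtitle = "using airquality data",
y = "Mean Wind")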
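A similar chart can be drawn with base R alone. This is only a sketch, assuming the same per-day average of Wind in airquality; the name daily, the fill colour, and the use of aggregate() with polygon() are choices made here for illustration.

```r
# Average wind speed for each day of the month, across all months
daily <- aggregate(Wind ~ Day, data = airquality, FUN = mean)

# Set up empty axes, then fill the region between the line and zero
plot(daily$Day, daily$Wind, type = 'n',
     ylim = c(0, max(daily$Wind)),
     xlab = 'Day', ylab = 'Mean Wind',
     main = 'Area Chart of Average Wind per Day')
polygon(c(daily$Day[1], daily$Day, daily$Day[nrow(daily)]),
        c(0, daily$Wind, 0),
        col = 'steelblue', border = NA)
```

The polygon is closed by adding a zero-height point at the first and last day, which is what turns the line into a filled area.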
Graphs and charts make data easier to remember and are an easier alternative to spreadsheets and reports. Businesses and institutions have started embracing this kind of data visualization in their work.
NOTE: Code for the first 5 visualizations has been provided by Elisa Due. Code for the final 2 visualizations has been provided by Abdul Majed Raja.