5 min read • August 14, 2024
Lists and data frames are essential data structures in R, allowing you to organize and manipulate complex datasets. Lists offer flexibility, storing elements of different types and lengths, while data frames provide a structured format for tabular data.
Understanding these structures is crucial for effective data analysis in R. You'll learn how to create, access, and manipulate lists and data frames, as well as combine and reshape data for various analytical tasks. These skills form the foundation for working with real-world datasets in R.
Create a list with the list() function. A list can hold elements of different types and lengths, including other lists:
my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))
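One quick way to confirm that a list keeps each element's own type and length is to apply class() and length() to every element. Here is a minimal sketch that reuses the list above:

```r
# The same mixed list as above: number, string, logical vector, nested list
my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))

# Each element keeps its own type...
element_classes <- sapply(my_list, class)
print(element_classes)  # "numeric" "character" "logical" "list"

# ...and its own length
element_lengths <- sapply(my_list, length)
print(element_lengths)  # 1 1 2 3
```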
Create a data frame with the data.frame() function, which combines named vectors of equal length into columns:
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
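data.frame() checks that its columns line up. The sketch below inspects the column types and shows the error raised when the lengths cannot be matched. The mismatched vector is a made-up example:

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# 3 rows, 2 columns; each column keeps its own type
print(dim(my_df))
col_types <- sapply(my_df, class)
print(col_types)  # x: "integer", y: "character" (strings stay character in R >= 4.0)

# Columns of lengths 3 and 2 cannot form a rectangle, so data.frame() stops
msg <- tryCatch(
  data.frame(x = 1:3, y = c("a", "b")),
  error = function(e) conditionMessage(e)
)
print(msg)
```

A single value such as y = "a" is recycled to fill every row, so that case does not raise an error.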
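Vectors can also be combined with cbind() or rbind():
cbind(): combines vectors column-wise
rbind(): combines vectors row-wise
Note that cbind() on plain vectors returns a matrix, not a data frame, and coerces every value to a single type (here character). Wrap the result in as.data.frame() to get a data frame:
my_df <- as.data.frame(cbind(x = 1:3, y = c("a", "b", "c")))  # x becomes character: "1", "2", "3"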
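The difference between data.frame() and cbind() matters in practice: a data frame keeps a separate type for each column, while cbind() on plain vectors builds a matrix that holds one type. A small comparison sketch:

```r
vec_x <- 1:3
vec_y <- c("a", "b", "c")

# data.frame() keeps x as integer and y as character
df_ab <- data.frame(x = vec_x, y = vec_y)

# cbind() on bare vectors builds a matrix: everything becomes character
mat_cols <- cbind(x = vec_x, y = vec_y)

# rbind() stacks vectors as rows instead of columns
mat_rows <- rbind(first = 1:3, second = 4:6)

print(class(df_ab$x))    # "integer"
print(typeof(mat_cols))  # "character"
print(dim(mat_rows))     # 2 3
```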
Name list elements with the names() function, or assign names directly when creating the list:
names(my_list) <- c("num", "char", "log", "list")
my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE), list = list(1, 2, 3))
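Both naming approaches produce the same object. Here is a short check, assuming the same four elements as above:

```r
# Approach 1: create unnamed, then assign names
my_list <- list(1, "apple", c(TRUE, FALSE), list(1, 2, 3))
print(names(my_list))  # NULL: no names yet
names(my_list) <- c("num", "char", "log", "list")

# Approach 2: name the elements at creation time
my_list2 <- list(num = 1, char = "apple", log = c(TRUE, FALSE), list = list(1, 2, 3))

# The two lists are identical, and names() reads the names back
print(identical(my_list, my_list2))  # TRUE
print(names(my_list2))
```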
Name data frame columns with the colnames() function (names() works too), or assign them when creating the data frame:
colnames(my_df) <- c("x", "y")
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))
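colnames() can also rename a single column when combined with indexing. Here is a minimal sketch; the new name "letter" is only an illustration:

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Replace only the second column name
colnames(my_df)[2] <- "letter"
print(colnames(my_df))  # "x" "letter"

# For data frames, names() and colnames() return the same vector
print(identical(names(my_df), colnames(my_df)))  # TRUE
```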
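Common ways to modify lists and data frames:
append(): adds elements to the end of a vector or list (for data frames, add columns with cbind() or $ and rows with rbind())
Assigning NULL: removes an element or column, e.g. my_list$age <- NULL
Reassigning: modifies an element or column in place, e.g. my_list$age <- 31
merge(): combines data frames based on common columns
For example:
my_list <- list(name = "John", age = 30, scores = c(85, 92, 88))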
my_df <- data.frame(name = c("John", "Alice", "Bob"), age = c(30, 25, 35), score = c(85, 92, 88))
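In base R, elements and columns are typically added with append() or assignment, modified by reassignment, and removed by assigning NULL. Here is a sketch on the two example objects above. The extra values (city, the passed column) are made up for illustration:

```r
my_list <- list(name = "John", age = 30, scores = c(85, 92, 88))
my_df <- data.frame(name = c("John", "Alice", "Bob"),
                    age = c(30, 25, 35),
                    score = c(85, 92, 88))

# Lists: add with append(), modify by assignment, remove by assigning NULL
my_list <- append(my_list, list(city = "Paris"))
my_list$age <- 31
my_list$scores <- NULL
print(names(my_list))  # "name" "age" "city"

# Data frames: the same assignment patterns work on columns
my_df$passed <- my_df$score >= 90       # add a logical column
my_df$age[my_df$name == "Alice"] <- 26  # modify one value
my_df$score <- NULL                     # remove a column
print(names(my_df))    # "name" "age" "passed"
```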
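Key data frame attributes and input functions:
row.names: provides names for each row in the data frame
colnames: provides names for each column in the data frame
read.csv(): reads data from a CSV file and creates a data frame
read.table(): reads data from a delimited text file and creates a data frame
Access list elements with [] and the element's index or name. Single brackets return a sub-list that contains the element, not the element itself: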
my_list[1]
my_list["num"]
Use double brackets [[]] to extract the element itself:
my_list[[1]]
my_list[["num"]]
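The difference between [] and [[]] is easy to see by checking the class of each result. Here is a minimal sketch with the named list from earlier:

```r
my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE), list = list(1, 2, 3))

sub_list <- my_list["num"]    # a list of length 1 that keeps the name "num"
element  <- my_list[["num"]]  # the element itself: the number 1

print(class(sub_list))  # "list"
print(class(element))   # "numeric"

# Single brackets can also pick several elements at once
two <- my_list[c("num", "char")]
print(length(two))      # 2
```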
Use the $ operator followed by the element name to access a named list element:
my_list$num
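One practical limit of $ is that it takes the name literally, so it cannot look up an element whose name is stored in a variable; [[]] can. Here is a sketch; the variable key is hypothetical:

```r
my_list <- list(num = 1, char = "apple")
key <- "char"

by_dollar   <- my_list$key    # NULL: looks for an element literally named "key"
by_brackets <- my_list[[key]] # "apple": [[ ]] evaluates the variable

print(is.null(by_dollar))  # TRUE
print(by_brackets)         # "apple"
```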
Access a data frame column with the $ operator followed by the column name:
my_df$x
Or use [] with the column index or name, which returns a one-column data frame rather than a vector:
my_df[1]
my_df["x"]
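The column-access forms above return different shapes: $ gives a vector, while single brackets give a data frame. This sketch compares them and also shows matrix-style my_df[, "x"] indexing with its drop argument, which are base R features not covered above:

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

col_vec  <- my_df$x                     # integer vector
col_df   <- my_df["x"]                  # one-column data frame
col_drop <- my_df[, "x"]                # [row, col] drops to a vector
col_keep <- my_df[, "x", drop = FALSE]  # keep the data frame shape

print(class(col_vec))   # "integer"
print(class(col_df))    # "data.frame"
print(class(col_drop))  # "integer"
print(class(col_keep))  # "data.frame"
```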
Select rows with [] by putting a row index or a logical vector in the row position and leaving the column position empty:
my_df[1, ]
my_df[c(TRUE, FALSE, TRUE), ]
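In practice the logical vector is usually built from a condition on a column, and rows and columns can be selected in one step. Here is a minimal sketch:

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

keep <- my_df$x > 1        # FALSE TRUE TRUE
big_rows <- my_df[keep, ]  # rows 2 and 3, all columns
print(nrow(big_rows))      # 2

# Rows where x > 1, column y only (returned as a vector)
big_y <- my_df[my_df$x > 1, "y"]
print(big_y)               # "b" "c"
```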
Filter rows by a condition with the subset() function:
subset(my_df, x > 1)
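subset() and bracket indexing behave differently when the condition contains NA: subset() treats NA as FALSE, while [] returns a row of NAs. This sketch uses a made-up missing value and also shows subset()'s select argument:

```r
my_df <- data.frame(x = c(1, NA, 3), y = c("a", "b", "c"))

# subset() drops rows where the condition is NA; select keeps only column y
via_subset <- subset(my_df, x > 1, select = y)

# Bracket indexing turns the NA into an all-NA row
via_bracket <- my_df[my_df$x > 1, ]

print(nrow(via_subset))   # 1
print(nrow(via_bracket))  # 2
```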
Apply a function across elements with the apply() family of functions:
lapply(): applies a function to each element of a list and returns a list
sapply(): applies a function to each element of a list and returns a simplified vector or matrix
apply(): applies a function over the margins (rows or columns) of a matrix or data frame; a data frame is first converted to a matrix
lapply(my_list, length)
sapply(Filter(is.numeric, my_df), mean)  # mean of each numeric column; mean() of a character column would return NA
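The three apply functions can be compared on a small list and a numeric data frame. Here is a sketch; the values are illustrative:

```r
my_list <- list(num = 1, char = "apple", log = c(TRUE, FALSE))
scores <- data.frame(age = c(30, 25, 35), score = c(85, 92, 88))

lens_list <- lapply(my_list, length)  # a list: num = 1, char = 1, log = 2
lens_vec  <- sapply(my_list, length)  # simplified to a named integer vector
col_means <- sapply(scores, mean)     # one mean per column
row_sums  <- apply(scores, 1, sum)    # MARGIN = 1 for rows, 2 for columns

print(lens_vec)   # num 1, char 1, log 2
print(col_means)  # age 30, score 88.33333
print(row_sums)   # 115 117 123
```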
Add columns to a data frame with the cbind() function:
cbind(my_df, new_column = c(1, 2, 3))
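Add rows with the rbind() function. Bind a one-row data frame with matching column names; binding a plain vector such as c(4, "d") would coerce the x column to character:
rbind(my_df, data.frame(x = 4, y = "d"))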
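The sketch below shows why the type of the bound object matters for rbind(). It also adds a column both with cbind() and with $ assignment, an equivalent base R alternative:

```r
my_df <- data.frame(x = 1:3, y = c("a", "b", "c"))

# Adding columns: cbind() and $ assignment give the same result
with_cbind  <- cbind(my_df, new_column = c(1, 2, 3))
with_dollar <- my_df
with_dollar$new_column <- c(1, 2, 3)
print(all.equal(with_cbind, with_dollar))  # TRUE

# Adding rows: a plain character vector coerces x to character...
bad <- rbind(my_df, c(4, "d"))
print(class(bad$x))   # "character"

# ...while a one-row data frame keeps x numeric
good <- rbind(my_df, data.frame(x = 4, y = "d"))
print(class(good$x))  # "numeric"
```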
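Merge data frames with the merge() function, which works like a SQL join (an inner join by default; set all = TRUE to keep unmatched rows from both sides):
merge(df1, df2, by = "common_column")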
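To make the join behavior concrete, here is a sketch with two small made-up data frames (the id, name, and score columns are hypothetical). It compares merge()'s default inner join with the all and all.x arguments:

```r
df1 <- data.frame(id = c(1, 2, 3), name = c("John", "Alice", "Bob"))
df2 <- data.frame(id = c(2, 3, 4), score = c(92, 88, 75))

inner <- merge(df1, df2, by = "id")                # ids in both: 2, 3
outer <- merge(df1, df2, by = "id", all = TRUE)    # all ids; NA where missing
left  <- merge(df1, df2, by = "id", all.x = TRUE)  # every row of df1

print(inner)
print(nrow(outer))  # 4
print(left)         # John (id 1) gets score NA
```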
The reshape2 package reshapes data frames:
melt(): converts a data frame from wide to long format
dcast(): converts a data frame from long to wide format
The newer tidyr package also reshapes data frames, with a more intuitive syntax:
pivot_longer(): converts a data frame from wide to long format
pivot_wider(): converts a data frame from long to wide format
library(reshape2)
melt(my_df, id.vars = "name")
library(tidyr)
pivot_longer(my_df, cols = c("score1", "score2"), names_to = "test", values_to = "score")  # assumes my_df has score1 and score2 columns
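A round trip through tidyr shows what the wide and long formats look like. This sketch assumes tidyr is installed; the score1/score2 data are made up to match the column names in the pivot_longer() example above:

```r
library(tidyr)

wide <- data.frame(name = c("John", "Alice"),
                   score1 = c(85, 92),
                   score2 = c(88, 79))

# Wide to long: one row per (name, test) pair
long <- pivot_longer(wide, cols = c("score1", "score2"),
                     names_to = "test", values_to = "score")
print(long)  # 4 rows: name, test, score

# Long back to wide: one column per test
back <- pivot_wider(long, names_from = test, values_from = score)
print(back)  # 2 rows: name, score1, score2
```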
The dplyr package manipulates and transforms data frames in a concise, readable way:
select(): selects specific columns from a data frame
filter(): filters rows based on logical conditions
arrange(): arranges rows based on one or more columns
mutate(): creates new columns or modifies existing ones
summarise(): summarizes data by calculating aggregate values
library(dplyr)
my_df %>%
  select(name, age) %>%
  filter(age > 30) %>%
  arrange(desc(age)) %>%
  mutate(age_squared = age^2)
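summarise(), listed above, is usually paired with group_by(), another dplyr verb, to compute one summary row per group. This sketch assumes dplyr is installed; the group column and the extra row are made-up data:

```r
library(dplyr)

my_df <- data.frame(name  = c("John", "Alice", "Bob", "Dana"),
                    group = c("A", "B", "A", "B"),
                    score = c(85, 92, 88, 79))

# One row per group: the mean score and the number of rows (n())
summary_df <- my_df %>%
  group_by(group) %>%
  summarise(mean_score = mean(score), n = n())

print(summary_df)  # A: 86.5, 2 rows; B: 85.5, 2 rows
```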