4 min read • August 9, 2024
Subsetting vectors and matrices is a crucial skill in R programming. It allows you to extract specific elements or subsets of data, making it easier to analyze and manipulate information. This topic builds on the broader concept of subsetting data, teaching you how to access and work with specific parts of your datasets.
Understanding these techniques is essential for efficient data handling in R. From simple indexing to advanced subsetting functions, these tools give you the power to precisely select and manipulate data. Mastering these skills will significantly enhance your ability to work with various data structures in R.
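- Logical vectors mark which elements to keep (`c(TRUE, FALSE, TRUE)`)
- Numeric indices select elements by position (`v[1]`, `v[3:5]`)
- Names select elements of a named vector (`v["apple"]`, `v[c("apple", "banana")]`)
- Negative indices exclude elements (`v[-2]` returns all elements except the second)

- A range selects consecutive elements (`v[2:5]` selects elements 2 through 5)
- A condition selects matching values (`v[v > 5]` selects elements greater than 5)
- A short logical index is recycled along the vector (`v[c(TRUE, FALSE)][1:3]`)
- Conditions can be combined with `&` (`v[v > 5 & v < 10]` selects elements strictly between 5 and 10)
- `which()` turns a condition into positions (`v[which(v %% 2 == 0)]` selects even numbers)

- `rep()` function for repeating index patterns (`v[rep(c(TRUE, FALSE), 5)]`)
- `seq()` function for creating index sequences (`v[seq(1, length(v), by = 2)]` selects odd-indexed elements)
- An index can be stored in a variable (`i <- 3; v[i]` selects the third element)

- `subset()` function extracts elements meeting specified conditions (`subset(df, age > 30)`)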
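
The vector indexing forms described so far can be tried on a small example. This is a minimal sketch: the named vector `v` and its values are hypothetical, chosen only so that each form returns something easy to check.

```r
# Hypothetical named vector, used only for illustration
v <- c(apple = 3, banana = 8, cherry = 12, date = 6, elder = 9, fig = 4)

first    <- v[1]                      # by position (R counts from 1)
middle   <- v[3:5]                    # a range of positions
by_name  <- v[c("apple", "banana")]   # by name
no_2nd   <- v[-2]                     # negative index drops the second element
mid_vals <- v[v > 5 & v < 10]         # values strictly between 5 and 10
evens    <- v[which(v %% 2 == 0)]     # even values

# A short logical index is recycled, so both lines pick positions 1, 3, 5
odd_pos_a <- v[c(TRUE, FALSE)]
odd_pos_b <- v[seq(1, length(v), by = 2)]

i <- 3
third <- v[i]                         # index stored in a variable

# subset() applies the same kind of condition to a vector
big <- subset(v, v > 5)
```

Names are carried along with the selected values, which makes results from a named vector easier to read.
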
- `[` extract function provides flexible subsetting across various data structures (`m[1:3, 2:4]`)
- `[[` extract function accesses single elements in lists and data frames (`mylist[[3]]`, `df[["age"]]`)
- `$` operator offers shorthand notation for accessing named list or data frame elements (`df$age`, `mylist$element_name`)
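
The difference between `[`, `[[`, and `$` is easiest to see by comparing what each one returns. The sketch below assumes a small matrix, list, and data frame whose contents are hypothetical.

```r
# Hypothetical objects, used only for illustration
m      <- matrix(1:20, nrow = 4)   # 4 x 5 matrix, filled column by column
mylist <- list(a = 1:3, b = "text", element_name = c(2.5, 3.5))
df     <- data.frame(name = c("Ann", "Bob", "Cy"), age = c(28, 35, 41))

block <- m[1:3, 2:4]          # `[` keeps the matrix structure (3 x 3)
sub   <- mylist[3]            # `[` on a list returns a shorter list
third <- mylist[[3]]          # `[[` returns the element itself
ages  <- df[["age"]]          # a data frame column as a plain vector
same  <- df$age               # `$` shorthand for the same column
el    <- mylist$element_name  # `$` shorthand for a named list element
```

Here `sub` is still a list of length one, while `third` is the numeric vector stored inside it.
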
- `which()` function returns indices of TRUE elements in a logical vector (`v[which(v > 5)]`; `max(which(v > 0))` gives the position of the last positive element)
- `head()` and `tail()` functions subset the beginning or end of an object (`head(v, 10)`, `tail(df, 5)`)
- `slice()` function from dplyr package subsets rows by position (`slice(df, 1:10, 15, 20:25)`)
- `filter()` function from dplyr package subsets rows based on conditions, much like base R `subset()` (`filter(df, age > 30 & salary < 50000)`)
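
A short sketch of these helper functions follows. The vector and data frame are hypothetical, and the dplyr calls run only if the package happens to be installed, so the base R lines work on their own.

```r
# Hypothetical data, used only for illustration
v <- c(-2, 7, 0, 9, -1, 4)

idx      <- which(v > 5)         # positions where the condition holds
big      <- v[idx]               # the values at those positions
last_pos <- max(which(v > 0))    # position of the last positive element
start    <- head(v, 3)           # first three elements
end      <- tail(v, 2)           # last two elements

df <- data.frame(
  age    = c(25, 32, 41, 29, 38),
  salary = c(40000, 45000, 60000, 52000, 48000)
)

# Base R row selection with a compound condition
base_rows <- df[df$age > 30 & df$salary < 50000, ]

# dplyr versions, guarded so the script still runs without the package
if (requireNamespace("dplyr", quietly = TRUE)) {
  by_position  <- dplyr::slice(df, 1:2, 4)   # rows 1, 2 and 4
  by_condition <- dplyr::filter(df, age > 30 & salary < 50000)
}
```

Note that `max(which(v > 0))` yields a position, so `v[last_pos]` is needed to get the value itself.
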
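- `m[2, 3]` selects the element in the 2nd row, 3rd column
- `m[1, ]` selects the first row, `m[, 2]` selects the second column
- Index vectors pick several rows and columns at once (`m[c(1, 3), c(2, 4)]`)
- `m[m > 5]` selects all elements greater than 5
- `m[-1, ]` removes the first row
- A logical matrix of the same shape selects the matching positions (`m[matrix(c(TRUE, FALSE), nrow = nrow(m), ncol = ncol(m))]`)
- Names work as indices where they exist, as with data frame columns (`df[c("name", "age")]`)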
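
The matrix forms above can be checked on a small matrix. This sketch assumes a hypothetical 3 x 4 matrix; the final line, using `drop = FALSE`, is an addition not covered in the text above and is included because single-row selections otherwise collapse to a vector.

```r
# Hypothetical 3 x 4 matrix, filled column by column with 1..12
m <- matrix(1:12, nrow = 3)

cell    <- m[2, 3]               # single element: row 2, column 3
row1    <- m[1, ]                # first row, returned as a vector
col2    <- m[, 2]                # second column, returned as a vector
corners <- m[c(1, 3), c(2, 4)]   # rows 1 and 3, columns 2 and 4
large   <- m[m > 5]              # condition returns a plain vector
no_row1 <- m[-1, ]               # all rows except the first

# Logical matrix of the same shape; TRUE/FALSE is recycled column by column
mask   <- matrix(c(TRUE, FALSE), nrow = nrow(m), ncol = ncol(m))
masked <- m[mask]

# Assumption beyond the text: drop = FALSE keeps the result a matrix
row1_matrix <- m[1, , drop = FALSE]
```
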
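- Use single brackets for sublists (`mylist[1:3]`) or double brackets for single elements (`mylist[[2]]`)
- Chain double brackets to reach nested elements (`mylist[[1]][[2]]`)
- Combine the `$` operator with brackets for named list elements (`mylist$element[1:3]`)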
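
A brief sketch of the list forms, assuming a hypothetical list that contains a nested list and a named element called `element`:

```r
# Hypothetical list, used only for illustration
mylist <- list(
  first   = list(10, "inner"),
  second  = letters[1:4],
  element = c(5, 6, 7, 8),
  flag    = TRUE
)

sublist <- mylist[1:3]          # single brackets: still a list, length 3
single  <- mylist[[2]]          # double brackets: the character vector itself
nested  <- mylist[[1]][[2]]     # second item of the first (nested) list
part    <- mylist$element[1:3]  # named element, then positions 1 to 3
```
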
- Select columns by name (`df[c("name", "age")]`)
- Select rows and columns by position (`df[1:5, 2:4]`)
- Filter rows with a logical condition (`df[df$age > 30, ]`)
- Apply the `subset()` function to data frames for condition-based selection (`subset(df, age > 30 & salary < 50000)`)
- Use dplyr functions for more readable data frame subsetting:
  - `select()` for column selection (`select(df, name, age, salary)`)
  - `filter()` for row filtering (`filter(df, age > 30)`)
  - `slice()` for row selection by position (`slice(df, 1:10)`)
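
The data frame forms above can be compared side by side. In this sketch the data frame, its column names, and its values are hypothetical; the dplyr calls are wrapped in a check so that the base R part runs even when dplyr is not installed.

```r
# Hypothetical data frame, used only for illustration
df <- data.frame(
  name   = c("Ann", "Bob", "Cy", "Dee", "Eli", "Fay"),
  age    = c(28, 35, 41, 30, 52, 33),
  salary = c(42000, 48000, 61000, 39000, 45000, 55000),
  dept   = c("ops", "dev", "dev", "ops", "hr", "dev")
)

cols   <- df[c("name", "age")]                    # columns by name
block  <- df[1:5, 2:4]                            # rows 1-5, columns 2-4
older  <- df[df$age > 30, ]                       # rows meeting a condition
picked <- subset(df, age > 30 & salary < 50000)   # same idea via subset()

# dplyr equivalents, run only if the package is available
if (requireNamespace("dplyr", quietly = TRUE)) {
  d_cols <- dplyr::select(df, name, age, salary)  # column selection
  d_rows <- dplyr::filter(df, age > 30)           # row filtering
  d_top  <- dplyr::slice(df, 1:3)                 # rows by position
}
```

Inside `subset()` and the dplyr functions, column names are written bare (`age`), whereas bracket indexing needs the `df$age` form.
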