We’ve seen that every element of a vector in R must have the same mode (number, character, etc.).
This requirement exists for efficiency, because it enables fast computation. However, it is also limiting: a vector cannot store a record that mixes different types, such as
Name: Fatma
Salary: 5624.25
Full time: yes
A data-processing language needs to provide a convenient structure for such mixed-type data. R has lists for this purpose.
A list's elements can be of different types, which allows for more complex data representations. Lists also form the bridge between vectors and data frames, which we'll see later.
In its simplest form, a list is created with a call to the list() function.
ftm <- list("Fatma", 5624.25, TRUE)
ftm
[[1]]
[1] "Fatma"
[[2]]
[1] 5624.25
[[3]]
[1] TRUE
We can access list elements using the double bracket [[...]] notation.
ftm[[1]]
[1] "Fatma"
ftm[[2]]
[1] 5624.25
A list can contain objects of any type, such as vectors, matrices, and other lists (sublists).
list(1, c(2,3), list("abc",4))
[[1]]
[1] 1
[[2]]
[1] 2 3
[[3]]
[[3]][[1]]
[1] "abc"
[[3]][[2]]
[1] 4
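You reach elements inside a nested list by chaining the brackets, one level at a time. Here is a minimal sketch that stores the list above under the hypothetical name nested:

```r
# The nested list from the example above; the name nested is hypothetical
nested <- list(1, c(2,3), list("abc",4))

nested[[2]][2]        # second element of the vector c(2,3): 3
nested[[3]][[1]]      # first element of the inner list: "abc"
nested[[3]][[2]] + 1  # inner elements are ordinary values: 5
```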
List elements can be accessed in three ways:
- mylist[[1]] notation, by position
- mylist$tag notation, if tags are given
- mylist[["tag"]] notation, if tags are given
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
ftm$name # or, ftm[[1]], ftm[["name"]]
[1] "Fatma"
ftm[[2]] # or, ftm$salary, ftm[["salary"]]
[1] 5624.25
ftm[["fulltime"]] # or, ftm$fulltime, ftm[[3]]
[1] TRUE
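The two tag-based forms differ in one detail. For lists, $ also accepts an unambiguous abbreviation of a tag (partial matching). By contrast, [[...]] matches tags exactly unless its exact argument says otherwise. Here is a short sketch using the same ftm list:

```r
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)

ftm$sal                     # partial match on "salary": 5624.25
ftm[["sal"]]                # exact matching by default: NULL
ftm[["sal", exact=FALSE]]   # partial matching requested explicitly: 5624.25
```

Relying on abbreviations makes code fragile, so writing out the full tag is the safer habit.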
If a list element is a vector, you can apply the [...] operator to it to select elements of that vector.
ftm <- list(name="Fatma", grades=c(10,12,9))
ftm$grades
[1] 10 12 9
ftm[[2]][3]
[1] 9
ftm$grades[3]
[1] 9
The syntax listname[["tagname"]] is useful when tag names are stored in a variable.
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
x <- "salary"
ftm[[x]]
[1] 5624.25
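By contrast, $ treats whatever follows it as a literal tag, so it cannot read a tag from a variable. The sketch below contrasts the two forms and then loops over a vector of tag names. The vector fields is a hypothetical example:

```r
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
x <- "salary"

ftm[[x]]   # uses the tag stored in x, "salary": 5624.25
ftm$x      # looks for a tag literally named "x": NULL

# Hypothetical vector of tags to read
fields <- c("name", "fulltime")
for (tag in fields) {
  print(ftm[[tag]])
}
```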
A range of indices can be selected using the familiar vector syntax with a single bracket. This returns a sublist.
ftm[1:2]
$name
[1] "Fatma"
$salary
[1] 5624.25
ftm[c(1,3)]
$name
[1] "Fatma"
$fulltime
[1] TRUE
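However, this does not work with the double bracket notation. With a vector index, [[...]] indexes recursively, so ftm[[1:2]] means ftm[[1]][[2]]: the second element of "Fatma", which does not exist.
ftm[[1:2]]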
Error in ftm[[1:2]]: subscript out of bounds
Having two types of brackets for list indexing can be confusing. You can tell them apart by what they return:
- [i] returns a list with a single component.
- [[i]] returns a single component.
ftm[1] # returns a list with a single component.
$name
[1] "Fatma"
ftm[[1]] # returns a one-element vector
[1] "Fatma"
mode(ftm[1])
[1] "list"
mode(ftm[[1]])
[1] "character"
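The difference matters as soon as you use the value in a computation. Arithmetic works on the component returned by [[...]], but not on the one-element list returned by [...]. Here is a small sketch that reuses the tagged ftm list:

```r
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)

class(ftm[2])     # "list": still a container
class(ftm[[2]])   # "numeric": the value itself

ftm[[2]] * 2      # 11248.5
# ftm[2] * 2      # error: non-numeric argument to binary operator
```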
You can start with an incomplete list and add new elements as you go along.
ftm <- list(name="Fatma", salary=5624.25)
ftm
$name
[1] "Fatma"
$salary
[1] 5624.25
ftm$fulltime <- TRUE
ftm
$name
[1] "Fatma"
$salary
[1] 5624.25
$fulltime
[1] TRUE
New list elements can also be added via vector indices.
ftm[[4]] <- 28
ftm[5:7] <- c(a=FALSE,b=TRUE,c=TRUE)
ftm
$name
[1] "Fatma"
$salary
[1] 5624.25
$fulltime
[1] TRUE
[[4]]
[1] 28
[[5]]
[1] FALSE
[[6]]
[1] TRUE
[[7]]
[1] TRUE
This last example also shows that a list can have both tagged and untagged elements.
You can delete an element by setting it to NULL.
ftm$fulltime <- NULL
ftm
$name
[1] "Fatma"
$salary
[1] 5624.25
[[3]]
[1] 28
[[4]]
[1] FALSE
[[5]]
[1] TRUE
[[6]]
[1] TRUE
Note that after deletion, all elements below the deleted one are moved up and their indices are decreased by one.
ftm[[3]] <- NULL
ftm
$name
[1] "Fatma"
$salary
[1] 5624.25
[[3]]
[1] FALSE
[[4]]
[1] TRUE
[[5]]
[1] TRUE
ftm[3:5] <- NULL
ftm
$name
[1] "Fatma"
$salary
[1] 5624.25
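Because assigning NULL deletes an element, you have to create an element whose value really is NULL differently: assign list(NULL) through single brackets. Here is a minimal sketch, with manager as a hypothetical tag:

```r
ftm <- list(name="Fatma", salary=5624.25)

ftm["manager"] <- list(NULL)  # hypothetical tag; adds an element whose value is NULL
length(ftm)                   # 3
is.null(ftm$manager)          # TRUE

ftm$manager <- NULL           # assigning NULL through $ deletes it again
length(ftm)                   # 2
```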
The familiar c() function can be used on lists, too.
c(list("abc", 32, TRUE), list(5.1))
[[1]]
[1] "abc"
[[2]]
[1] 32
[[3]]
[1] TRUE
[[4]]
[1] 5.1
c(list(name="Fatma", salary=5624.25, fulltime=TRUE), list(hobby="painting"))
$name
[1] "Fatma"
$salary
[1] 5624.25
$fulltime
[1] TRUE
$hobby
[1] "painting"
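When you combine a list with an atomic vector, c() spreads out the vector: each of its elements becomes a separate list element. To keep a vector together as a single element, wrap it in list() first. Here is a short sketch:

```r
a <- c(list("abc"), c(2, 3))        # the vector's elements are spread out
length(a)                           # 3

b <- c(list("abc"), list(c(2, 3)))  # the vector stays a single element
length(b)                           # 2
b[[2]]                              # 2 3
```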
To get the number of elements in a list, we can use the length() function.
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
length(ftm)
[1] 3
To get the tags in a list, we use the names() function.
names(ftm)
[1] "name" "salary" "fulltime"
To obtain the values as a vector, we can use the unlist() function.
unlist(ftm)
name salary fulltime
"Fatma" "5624.25" "TRUE"
unname(unlist(ftm))
[1] "Fatma" "5624.25" "TRUE"
Note that unlist() returns a vector, so the numeric and logical values are converted to strings. Every element of a vector must have the same type, and character is the only type here that can represent all three values. In the second example, unname() removes the tags.
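The conversion to strings happens only because the values have mixed types. When all elements share a type, unlist() keeps that type. Its use.names argument also drops the tags without a separate unname() call. Here is a sketch with hypothetical salary data:

```r
# Hypothetical data: every value is numeric
salaries <- list(fatma=5624.25, ekrem=4800)

unlist(salaries)                   # named numeric vector, no conversion to strings
unlist(salaries, use.names=FALSE)  # 5624.25 4800, without tags
sum(unlist(salaries))              # arithmetic works: 10424.25
```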
The lapply() function applies a function to each element of a list and returns the results as a list.
lapply(list(2,3.5,4), sqrt)
[[1]]
[1] 1.414214
[[2]]
[1] 1.870829
[[3]]
[1] 2
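You can also write the function passed to lapply() inline, as an anonymous function. Any further arguments given to lapply() are passed on to that function. Here is a brief sketch with hypothetical data:

```r
# Hypothetical data
nums <- list(c(1, 2, 3), c(10, 20))

# An anonymous function applied to each element: averages 2 and 15
averages <- lapply(nums, function(v) sum(v) / length(v))

# Arguments after the function are passed on to it, here round(x, digits = 1)
rounded <- lapply(list(2.567, 3.14159), round, digits = 1)
```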
Working with tagged elements:
grades_1 <- c(10,12,11,14,8,12)
grades_2 <- c(13,11,10,11,9)
allgrades <- list(section1=grades_1, section2=grades_2)
allgrades
$section1
[1] 10 12 11 14 8 12
$section2
[1] 13 11 10 11 9
mean(allgrades$section1)
[1] 11.16667
lapply(allgrades, mean)
$section1
[1] 11.16667
$section2
[1] 10.8
The sapply() (simple apply) function works like lapply(), but where possible it simplifies the result into a vector or a matrix.
sapply(allgrades, mean)
section1 section2
11.16667 10.80000
mode(sapply(allgrades, mean))
[1] "numeric"
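You get a matrix when the function returns more than one value per element. When you know the result type in advance, vapply() is a stricter relative of sapply(). Its third argument declares the type and length of each result, and R stops with an error if a result does not match. Here is a sketch that reuses the grade data:

```r
allgrades <- list(section1=c(10,12,11,14,8,12), section2=c(13,11,10,11,9))

# range() returns two numbers per section, so sapply() builds a 2 x 2 matrix
grade_ranges <- sapply(allgrades, range)
grade_ranges

# vapply() checks that every result is a single number
vapply(allgrades, mean, numeric(1))

# vapply(allgrades, range, numeric(1)) would stop with an error,
# because range() returns two values per element
```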
We can define our own functions to specify what to do with each element.
mult_by2 <- function(x) {2*x}
mult_by2(c(1,2,3,4))
[1] 2 4 6 8
lapply( list(1, 2, 3:7), mult_by2)
[[1]]
[1] 2
[[2]]
[1] 4
[[3]]
[1] 6 8 10 12 14
Set up a list of staff members, where each element is itself a list holding a name, an hourly wage, and the number of hours worked.
staff <- list(
    id000=list(name="Fatma", wage=12.5, hours=20),
    id001=list(name="Ekrem", wage=11.7, hours=30),
    id002=list(name="Deniz", wage=13.3, hours=25)
)
staff
$id000
$id000$name
[1] "Fatma"
$id000$wage
[1] 12.5
$id000$hours
[1] 20
$id001
$id001$name
[1] "Ekrem"
$id001$wage
[1] 11.7
$id001$hours
[1] 30
$id002
$id002$name
[1] "Deniz"
$id002$wage
[1] 13.3
$id002$hours
[1] 25
Define a function that takes one staff member, represented as above, and returns their weekly pay.
payroll <- function(person){person$wage * person$hours}
payroll(list(name="Deniz", wage=13.3, hours=25))
[1] 332.5
Now apply this function to every staff member in the list staff.
lapply(staff, payroll)
$id000
[1] 250
$id001
[1] 351
$id002
[1] 332.5
sapply(staff, payroll)
id000 id001 id002
250.0 351.0 332.5
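The same pattern can extract a single field from every staff member, or feed a further summary such as a total. This sketch re-creates the staff list so that it runs on its own:

```r
staff <- list(
  id000=list(name="Fatma", wage=12.5, hours=20),
  id001=list(name="Ekrem", wage=11.7, hours=30),
  id002=list(name="Deniz", wage=13.3, hours=25)
)

# Names of all staff members, as a named character vector
staff_names <- sapply(staff, function(person) person$name)
staff_names

# Total weekly pay over all staff members (about 933.5)
total_pay <- sum(sapply(staff, function(person) person$wage * person$hours))
total_pay
```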
We have a vector of numbers, some of which are repeated.
mydata <- c(1,2,3,15,1,2,3,4,1)
We want to keep the count of each number in a list, such that counts[[i]] stores how many times the number i occurs in mydata.
Initialize the counts list with zeros. It needs at least as many elements as the largest number in mydata (15); here we use 17.
counts <- as.list(rep(0, 17))  # a list of 17 zeros
Now loop over the data vector, and increase the count of the appropriate number.
for (x in mydata) {
    counts[[x]] <- counts[[x]] + 1
}
counts
[[1]]
[1] 3
[[2]]
[1] 2
[[3]]
[1] 2
[[4]]
[1] 1
[[5]]
[1] 0
[[6]]
[1] 0
[[7]]
[1] 0
[[8]]
[1] 0
[[9]]
[1] 0
[[10]]
[1] 0
[[11]]
[1] 0
[[12]]
[1] 0
[[13]]
[1] 0
[[14]]
[1] 0
[[15]]
[1] 1
[[16]]
[1] 0
[[17]]
[1] 0
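The loop above makes the counting logic explicit. Base R also has built-in counting functions. As an alternative sketch (not the method used above): tabulate() counts the occurrences of 1, 2, ..., nbins in a vector of positive integers, while table() counts only the values that actually occur.

```r
mydata <- c(1,2,3,15,1,2,3,4,1)

# Counts of 1, 2, ..., 17 as an integer vector, turned into a list like counts
counts2 <- as.list(tabulate(mydata, nbins = 17))
counts2[[1]]   # 3
counts2[[15]]  # 1

# table() reports only the values that occur, labelled by value
table(mydata)
```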
Here is a simple application to text analysis. Consider the following short text, which has been preprocessed to remove punctuation and convert uppercase letters to lowercase.
sometext <- "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"
We wish to create a list wordcounts whose tags are the words of the text and whose values are their counts, so that, for example, wordcounts$holmes gives the number of occurrences of "holmes".
This problem is similar to the earlier example where we counted the occurrences of numbers. However, we don't know in advance which words will appear or how many distinct words there are, so we cannot initialize the counts to zero.
We will approach the problem as follows. For every word in the text:
- if the word is already in the list, increase its count;
- otherwise, add the word to the list with a count of 1.
If we use [[...]] to look up a tag that a list does not contain, the result is NULL. We can use this to check whether an element exists in a list.
wordcounts <- list()
wordcounts
list()
wordcounts[["sherlock"]]
NULL
is.null(wordcounts[["sherlock"]])
[1] TRUE
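Another way to test for existence is to look for the word among the tags returned by names(). Here is a small sketch. Note that names() of an empty list is NULL, and %in% then simply returns FALSE:

```r
wordcounts <- list()
"sherlock" %in% names(wordcounts)   # FALSE, since names(list()) is NULL

wordcounts[["sherlock"]] <- 1
"sherlock" %in% names(wordcounts)   # TRUE
```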
So, beginning with the first word, we add it to our list:
word <- "my"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}
wordcounts
$my
[1] 1
Similarly, the second word:
word <- "dear"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}
wordcounts
$my
[1] 1
$dear
[1] 1
The list now contains exactly the words we have added, and nothing more.
wordcounts
$my
[1] 1
$dear
[1] 1
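Since the if/else block is the same for every word, you can wrap it in a function. R functions work on copies of their arguments, so such a function cannot change wordcounts in place. It has to return the updated list, which we then assign back. Here is a sketch with a hypothetical helper named add_word():

```r
# Hypothetical helper: returns a copy of counts with the count of word increased
add_word <- function(counts, word) {
  if (is.null(counts[[word]])) {
    counts[[word]] <- 1
  } else {
    counts[[word]] <- counts[[word]] + 1
  }
  counts
}

wordcounts <- list()
wordcounts <- add_word(wordcounts, "my")
wordcounts <- add_word(wordcounts, "dear")
wordcounts <- add_word(wordcounts, "my")
wordcounts$my    # 2
```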
Adding the words one at a time by hand is impractical. A better solution is to loop over every word in the text, which first requires converting the long string into a vector of words.
The strsplit() function does that for us:
strsplit(sometext, split=" ")
[[1]]
[1] "my" "dear" "fellow"
[4] "said" "sherlock" "holmes"
[7] "as" "we" "sat"
[10] "on" "either" "side"
[13] "of" "the" "fire"
[16] "in" "his" "lodgings"
[19] "at" "baker" "street"
[22] "life" "is" "infinitely"
[25] "stranger" "than" "anything"
[28] "which" "the" "mind"
[31] "of" "man" "could"
[34] "invent" "we" "would"
[37] "not" "dare" "to"
[40] "conceive" "the" "things"
[43] "which" "are" "really"
[46] "mere" "commonplaces" "of"
[49] "existence" "if" "we"
[52] "could" "fly" "out"
[55] "of" "that" "window"
[58] "hand" "in" "hand"
[61] "hover" "over" "this"
[64] "great" "city" "gently"
[67] "remove" "the" "roofs"
[70] "and" "peep" "in"
[73] "at" "the" "queer"
[76] "things" "which" "are"
[79] "going" "on" "the"
[82] "strange" "coincidences" "the"
[85] "plannings" "the" "cross"
[88] "purposes" "the" "wonderful"
[91] "chains" "of" "events"
[94] "working" "through" "generations"
[97] "and" "leading" "to"
[100] "the" "most" "outré"
[103] "results" "it" "would"
[106] "make" "all" "fiction"
[109] "with" "its" "conventionalities"
[112] "and" "foreseen" "conclusions"
[115] "most" "stale" "and"
[118] "unprofitable"
Note that strsplit() returns a list, and the first element of this list is the vector of words we are looking for.
wordsintext <- strsplit(sometext, split=" ")[[1]]
wordsintext
[1] "my" "dear" "fellow"
[4] "said" "sherlock" "holmes"
[7] "as" "we" "sat"
[10] "on" "either" "side"
[13] "of" "the" "fire"
[16] "in" "his" "lodgings"
[19] "at" "baker" "street"
[22] "life" "is" "infinitely"
[25] "stranger" "than" "anything"
[28] "which" "the" "mind"
[31] "of" "man" "could"
[34] "invent" "we" "would"
[37] "not" "dare" "to"
[40] "conceive" "the" "things"
[43] "which" "are" "really"
[46] "mere" "commonplaces" "of"
[49] "existence" "if" "we"
[52] "could" "fly" "out"
[55] "of" "that" "window"
[58] "hand" "in" "hand"
[61] "hover" "over" "this"
[64] "great" "city" "gently"
[67] "remove" "the" "roofs"
[70] "and" "peep" "in"
[73] "at" "the" "queer"
[76] "things" "which" "are"
[79] "going" "on" "the"
[82] "strange" "coincidences" "the"
[85] "plannings" "the" "cross"
[88] "purposes" "the" "wonderful"
[91] "chains" "of" "events"
[94] "working" "through" "generations"
[97] "and" "leading" "to"
[100] "the" "most" "outré"
[103] "results" "it" "would"
[106] "make" "all" "fiction"
[109] "with" "its" "conventionalities"
[112] "and" "foreseen" "conclusions"
[115] "most" "stale" "and"
[118] "unprofitable"
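Given several strings, strsplit() returns one list element per input string. That is why a single string produces a list with one element. Its split argument is a regular expression by default, so a pattern such as " +" treats a run of spaces as a single separator. Here is a sketch with two hypothetical lines of text:

```r
# Hypothetical input; note the two spaces after "said"
lines <- c("my dear fellow", "said  sherlock holmes")

# One list element per input string; the double space produces an empty string ""
strsplit(lines, split=" ")

# " +" matches one or more spaces, so no empty strings appear
strsplit(lines, split=" +")

# unlist() merges the pieces into a single vector of words
unlist(strsplit(lines, split=" +"))
```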
For each word in the text, increase the count if the word exists in the counter list, otherwise set it to one.
wordcounts <- list()
for (word in wordsintext){
    if (is.null(wordcounts[[word]])){
        wordcounts[[word]] <- 1
    } else {
        wordcounts[[word]] <- wordcounts[[word]] + 1
    }
}
wordcounts
$my
[1] 1
$dear
[1] 1
$fellow
[1] 1
$said
[1] 1
$sherlock
[1] 1
$holmes
[1] 1
$as
[1] 1
$we
[1] 3
$sat
[1] 1
$on
[1] 2
$either
[1] 1
$side
[1] 1
$of
[1] 5
$the
[1] 10
$fire
[1] 1
$`in`
[1] 3
$his
[1] 1
$lodgings
[1] 1
$at
[1] 2
$baker
[1] 1
$street
[1] 1
$life
[1] 1
$is
[1] 1
$infinitely
[1] 1
$stranger
[1] 1
$than
[1] 1
$anything
[1] 1
$which
[1] 3
$mind
[1] 1
$man
[1] 1
$could
[1] 2
$invent
[1] 1
$would
[1] 2
$not
[1] 1
$dare
[1] 1
$to
[1] 2
$conceive
[1] 1
$things
[1] 2
$are
[1] 2
$really
[1] 1
$mere
[1] 1
$commonplaces
[1] 1
$existence
[1] 1
$`if`
[1] 1
$fly
[1] 1
$out
[1] 1
$that
[1] 1
$window
[1] 1
$hand
[1] 2
$hover
[1] 1
$over
[1] 1
$this
[1] 1
$great
[1] 1
$city
[1] 1
$gently
[1] 1
$remove
[1] 1
$roofs
[1] 1
$and
[1] 4
$peep
[1] 1
$queer
[1] 1
$going
[1] 1
$strange
[1] 1
$coincidences
[1] 1
$plannings
[1] 1
$cross
[1] 1
$purposes
[1] 1
$wonderful
[1] 1
$chains
[1] 1
$events
[1] 1
$working
[1] 1
$through
[1] 1
$generations
[1] 1
$leading
[1] 1
$most
[1] 2
$outré
[1] 1
$results
[1] 1
$it
[1] 1
$make
[1] 1
$all
[1] 1
$fiction
[1] 1
$with
[1] 1
$its
[1] 1
$conventionalities
[1] 1
$foreseen
[1] 1
$conclusions
[1] 1
$stale
[1] 1
$unprofitable
[1] 1
Now we can ask questions about the statistics of words in the text. For example, which words occur more than twice in the text?
wordcounts[wordcounts > 2]
$we
[1] 3
$of
[1] 5
$the
[1] 10
$`in`
[1] 3
$which
[1] 3
$and
[1] 4
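Base R's Filter() expresses the same selection with a predicate function, which can read more clearly when the condition is complex. Here is a sketch on a small hypothetical count list:

```r
# Hypothetical counts
counts <- list(the=10, of=5, holmes=1, and=4)

Filter(function(n) n > 2, counts)          # keeps "the", "of" and "and"
names(Filter(function(n) n == 1, counts))  # "holmes"
```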
What are the most frequent words? Sort the list, most frequent word first.
wordcounts[order(unlist(wordcounts), decreasing = TRUE)]
$the
[1] 10
$of
[1] 5
$and
[1] 4
$we
[1] 3
$`in`
[1] 3
$which
[1] 3
$on
[1] 2
$at
[1] 2
$could
[1] 2
$would
[1] 2
$to
[1] 2
$things
[1] 2
$are
[1] 2
$hand
[1] 2
$most
[1] 2
$my
[1] 1
$dear
[1] 1
$fellow
[1] 1
$said
[1] 1
$sherlock
[1] 1
$holmes
[1] 1
$as
[1] 1
$sat
[1] 1
$either
[1] 1
$side
[1] 1
$fire
[1] 1
$his
[1] 1
$lodgings
[1] 1
$baker
[1] 1
$street
[1] 1
$life
[1] 1
$is
[1] 1
$infinitely
[1] 1
$stranger
[1] 1
$than
[1] 1
$anything
[1] 1
$mind
[1] 1
$man
[1] 1
$invent
[1] 1
$not
[1] 1
$dare
[1] 1
$conceive
[1] 1
$really
[1] 1
$mere
[1] 1
$commonplaces
[1] 1
$existence
[1] 1
$`if`
[1] 1
$fly
[1] 1
$out
[1] 1
$that
[1] 1
$window
[1] 1
$hover
[1] 1
$over
[1] 1
$this
[1] 1
$great
[1] 1
$city
[1] 1
$gently
[1] 1
$remove
[1] 1
$roofs
[1] 1
$peep
[1] 1
$queer
[1] 1
$going
[1] 1
$strange
[1] 1
$coincidences
[1] 1
$plannings
[1] 1
$cross
[1] 1
$purposes
[1] 1
$wonderful
[1] 1
$chains
[1] 1
$events
[1] 1
$working
[1] 1
$through
[1] 1
$generations
[1] 1
$leading
[1] 1
$outré
[1] 1
$results
[1] 1
$it
[1] 1
$make
[1] 1
$all
[1] 1
$fiction
[1] 1
$with
[1] 1
$its
[1] 1
$conventionalities
[1] 1
$foreseen
[1] 1
$conclusions
[1] 1
$stale
[1] 1
$unprofitable
[1] 1
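To see only the top few words, you can truncate the sorted list with head(). Alternatively, calling sort() on the unlisted counts gives a named numeric vector, which prints more compactly. Here is a sketch on a small hypothetical count list:

```r
# Hypothetical counts
counts <- list(holmes=1, the=10, and=4, of=5)

sorted <- counts[order(unlist(counts), decreasing = TRUE)]
head(sorted, 3)                          # the three most frequent: the, of, and

sort(unlist(counts), decreasing = TRUE)  # named numeric vector, largest first
```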
Where in the text does a word occur? Generate a list such that words are tags and the corresponding value is a vector of positions.
wordlocations <- list()
for (i in 1:length(wordsintext)){
    word <- wordsintext[i]
    # c(NULL, i) is just i, so a word's first occurrence needs no special case
    wordlocations[[word]] <- c(wordlocations[[word]], i)
}
wordlocations
$my
[1] 1
$dear
[1] 2
$fellow
[1] 3
$said
[1] 4
$sherlock
[1] 5
$holmes
[1] 6
$as
[1] 7
$we
[1] 8 35 51
$sat
[1] 9
$on
[1] 10 80
$either
[1] 11
$side
[1] 12
$of
[1] 13 31 48 55 92
$the
[1] 14 29 41 68 74 81 84 86 89 100
$fire
[1] 15
$`in`
[1] 16 59 72
$his
[1] 17
$lodgings
[1] 18
$at
[1] 19 73
$baker
[1] 20
$street
[1] 21
$life
[1] 22
$is
[1] 23
$infinitely
[1] 24
$stranger
[1] 25
$than
[1] 26
$anything
[1] 27
$which
[1] 28 43 77
$mind
[1] 30
$man
[1] 32
$could
[1] 33 52
$invent
[1] 34
$would
[1] 36 105
$not
[1] 37
$dare
[1] 38
$to
[1] 39 99
$conceive
[1] 40
$things
[1] 42 76
$are
[1] 44 78
$really
[1] 45
$mere
[1] 46
$commonplaces
[1] 47
$existence
[1] 49
$`if`
[1] 50
$fly
[1] 53
$out
[1] 54
$that
[1] 56
$window
[1] 57
$hand
[1] 58 60
$hover
[1] 61
$over
[1] 62
$this
[1] 63
$great
[1] 64
$city
[1] 65
$gently
[1] 66
$remove
[1] 67
$roofs
[1] 69
$and
[1] 70 97 112 117
$peep
[1] 71
$queer
[1] 75
$going
[1] 79
$strange
[1] 82
$coincidences
[1] 83
$plannings
[1] 85
$cross
[1] 87
$purposes
[1] 88
$wonderful
[1] 90
$chains
[1] 91
$events
[1] 93
$working
[1] 94
$through
[1] 95
$generations
[1] 96
$leading
[1] 98
$most
[1] 101 115
$outré
[1] 102
$results
[1] 103
$it
[1] 104
$make
[1] 106
$all
[1] 107
$fiction
[1] 108
$with
[1] 109
$its
[1] 110
$conventionalities
[1] 111
$foreseen
[1] 113
$conclusions
[1] 114
$stale
[1] 116
$unprofitable
[1] 118
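The positions support simple context queries. For example, adding 1 to the positions of a word and indexing the word vector gives the words that immediately follow it. An occurrence at the very end of the text would yield NA. Here is a sketch on a short hypothetical sentence:

```r
# Hypothetical sentence, split into words as above
words <- strsplit("the cat saw the dog and the bird", split=" ")[[1]]

wordlocations <- list()
for (i in seq_along(words)) {
  # c(NULL, i) is just i, so the first occurrence needs no special case
  wordlocations[[words[i]]] <- c(wordlocations[[words[i]]], i)
}

wordlocations$the                # 1 4 7
words[wordlocations$the + 1]     # "cat" "dog" "bird"
```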
As a side benefit, once we have the wordlocations list, we can get the number of occurrences of each word without passing over the data again. We only need to apply the length() function to each element of the list.
lapply(wordlocations, length)
$my
[1] 1
$dear
[1] 1
$fellow
[1] 1
$said
[1] 1
$sherlock
[1] 1
$holmes
[1] 1
$as
[1] 1
$we
[1] 3
$sat
[1] 1
$on
[1] 2
$either
[1] 1
$side
[1] 1
$of
[1] 5
$the
[1] 10
$fire
[1] 1
$`in`
[1] 3
$his
[1] 1
$lodgings
[1] 1
$at
[1] 2
$baker
[1] 1
$street
[1] 1
$life
[1] 1
$is
[1] 1
$infinitely
[1] 1
$stranger
[1] 1
$than
[1] 1
$anything
[1] 1
$which
[1] 3
$mind
[1] 1
$man
[1] 1
$could
[1] 2
$invent
[1] 1
$would
[1] 2
$not
[1] 1
$dare
[1] 1
$to
[1] 2
$conceive
[1] 1
$things
[1] 2
$are
[1] 2
$really
[1] 1
$mere
[1] 1
$commonplaces
[1] 1
$existence
[1] 1
$`if`
[1] 1
$fly
[1] 1
$out
[1] 1
$that
[1] 1
$window
[1] 1
$hand
[1] 2
$hover
[1] 1
$over
[1] 1
$this
[1] 1
$great
[1] 1
$city
[1] 1
$gently
[1] 1
$remove
[1] 1
$roofs
[1] 1
$and
[1] 4
$peep
[1] 1
$queer
[1] 1
$going
[1] 1
$strange
[1] 1
$coincidences
[1] 1
$plannings
[1] 1
$cross
[1] 1
$purposes
[1] 1
$wonderful
[1] 1
$chains
[1] 1
$events
[1] 1
$working
[1] 1
$through
[1] 1
$generations
[1] 1
$leading
[1] 1
$most
[1] 2
$outré
[1] 1
$results
[1] 1
$it
[1] 1
$make
[1] 1
$all
[1] 1
$fiction
[1] 1
$with
[1] 1
$its
[1] 1
$conventionalities
[1] 1
$foreseen
[1] 1
$conclusions
[1] 1
$stale
[1] 1
$unprofitable
[1] 1
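Each count is just the length of a position vector, so sapply() or base R's lengths() can return the counts as a compact named vector instead of a list. Here is a sketch on a small excerpt of the positions above:

```r
# Excerpt of the word positions listed above
wordlocations <- list(we=c(8, 35, 51), sat=9, of=c(13, 31, 48, 55, 92))

sapply(wordlocations, length)   # we 3, sat 1, of 5
lengths(wordlocations)          # the same named integer vector
```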