10  Lists

We’ve seen that every element of a vector in R must have the same mode (numeric, character, etc.).

This requirement exists for efficiency; it enables fast computations. However, it is also limiting. A vector cannot store a record whose fields have different types, such as

    Name: Fatma
    Salary: 5624.25
    Full time: yes

A data-processing language needs to provide a convenient structure for such mixed-type data. R has lists for this purpose.
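To see the limitation concretely, here is a small sketch of what happens when the record above is forced into a vector. The variable name rec is only illustrative.

```r
# A vector coerces all of its elements to one common type.
rec <- c("Fatma", 5624.25, TRUE)
rec          # all three values are now strings
mode(rec)    # "character"
```

The salary can no longer be used in arithmetic without converting it back, which is the inconvenience lists avoid.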

A list’s elements can be of different types, allowing for more complicated data representations. Lists form the bridge between vectors and data frames, which we’ll see later.

Creating lists

In its simplest form, a list is created by calling the list() function.

ftm <- list("Fatma", 5624.25, TRUE)
ftm
[[1]]
[1] "Fatma"

[[2]]
[1] 5624.25

[[3]]
[1] TRUE

We can access list elements using the double bracket [[...]] notation.

ftm[[1]]
[1] "Fatma"
ftm[[2]]
[1] 5624.25

Tags of list elements

Instead of using integer indices, we can assign names (tags) to list components and refer to them using these tags.

ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE
ftm$name
[1] "Fatma"
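Tags do not have to be given at creation time. As a short sketch, the names() function can attach or change them afterwards (the tag "pay" is an illustrative choice, not part of the running example):

```r
ftm <- list("Fatma", 5624.25, TRUE)
names(ftm)                      # NULL: no tags yet
names(ftm) <- c("name", "salary", "fulltime")
ftm$salary                      # 5624.25
names(ftm)[2] <- "pay"          # rename a single tag
names(ftm)                      # "name" "pay" "fulltime"
```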

Mixing different objects

A list can comprise any type of object, such as vectors, matrices, sublists, etc.

list(1, c(2,3), list("abc",4))
[[1]]
[1] 1

[[2]]
[1] 2 3

[[3]]
[[3]][[1]]
[1] "abc"

[[3]][[2]]
[1] 4
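Elements inside a nested object can be reached by chaining index operators. A minimal sketch using the same list, stored here in an illustrative variable x:

```r
x <- list(1, c(2, 3), list("abc", 4))
x[[2]][2]        # second element of the vector: 3
x[[3]][[1]]      # first element of the sublist: "abc"
x[[3]][[2]] + 1  # elements of the sublist keep their type: 5
```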

List indexing

List elements can be accessed with three methods:

  1. using integer indices: mylist[[1]]
  2. using the mylist$tag notation, if tags are given
  3. using the mylist[["tag"]] notation, if tags are given
ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
ftm$name  # or, ftm[[1]], ftm[["name"]]
[1] "Fatma"
ftm[[2]]  # or, ftm$salary, ftm[["salary"]]
[1] 5624.25
ftm[["fulltime"]] # or, ftm$fulltime , ftm[[3]]
[1] TRUE

If the list element is a vector, the [...] operator can be used afterwards in order to select elements of that vector.

ftm <- list(name="Fatma", grades=c(10,12,9))
ftm$grades
[1] 10 12  9
ftm[[2]][3]
[1] 9
ftm$grades[3]
[1] 9

The syntax listname[["tagname"]] is useful when the tag name is stored in a variable.

ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
x <- "salary"
ftm[[x]]
[1] 5624.25
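The $ notation cannot be used for this, because it takes the tag literally. The sketch below contrasts the two notations and then loops over all tags; the loop variable tag is an illustrative name.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)
x <- "salary"
ftm$x        # NULL: looks for a tag literally called "x"
ftm[[x]]     # 5624.25: uses the value stored in x

# Print every tag together with its value.
for (tag in names(ftm)) {
    cat(tag, ":", format(ftm[[tag]]), "\n")
}
```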

Selecting a range of indices

A range of indices can be selected using the familiar vector syntax with a single bracket. This returns a sublist.

ftm[1:2]
$name
[1] "Fatma"

$salary
[1] 5624.25
ftm[c(1,3)]
$name
[1] "Fatma"

$fulltime
[1] TRUE

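However, this does not work with the double bracket notation. Given a vector of indices, [[...]] indexes recursively, so ftm[[1:2]] means ftm[[1]][[2]], and ftm[[1]] has only one element.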

ftm[[1:2]]
Error in ftm[[1:2]]: subscript out of bounds
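When [[...]] receives a vector of indices, R treats it as a path into nested lists rather than as a range, which is why the call above fails on a flat list. A small sketch with a hypothetical nested list shows the intended use:

```r
nested <- list(a = list(b = 42, c = "z"), d = 1:3)
nested[[c(1, 2)]]        # same as nested[[1]][[2]]: "z"
nested[[c("a", "b")]]    # same as nested$a$b: 42
nested[1:2]              # single bracket: a sublist of two elements
```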

Difference between indexing with single and double brackets

The availability of two types of brackets for list indexing can be confusing. They can be distinguished by their return types:

  • [i] returns a list with a single component
  • [[i]] returns a single component.
ftm[1]  # returns a list with a single component.
$name
[1] "Fatma"
ftm[[1]]  # returns a one-element vector
[1] "Fatma"
mode(ftm[1])
[1] "list"
mode(ftm[[1]])
[1] "character"
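The distinction matters as soon as the result is used in a computation. In this sketch, arithmetic works on the component but not on the one-element list; tryCatch() is used only to capture the error message instead of stopping the script.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)
ftm[["salary"]] * 2      # 11248.5: the component is a number
is.list(ftm["salary"])   # TRUE: still a list
# Arithmetic on the one-element list fails; capture the error message.
msg <- tryCatch(ftm["salary"] * 2, error = function(e) conditionMessage(e))
msg
```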

Adding new elements to a list

You can start with an incomplete list and add new elements as you go along.

ftm <- list(name="Fatma", salary=5624.25)
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25
ftm$fulltime <- TRUE
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

New list elements can also be added via vector indices.

ftm[[4]] <- 28
ftm[5:7] <- c(a=FALSE,b=TRUE,c=TRUE)
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

[[4]]
[1] 28

[[5]]
[1] FALSE

[[6]]
[1] TRUE

[[7]]
[1] TRUE

This last example also shows that a list can have both tagged and untagged elements.
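Two related behaviors are worth knowing. The sketch below uses the base function append() to insert at a chosen position, and shows that assigning past the end of a list fills the gap with NULL elements. The tag age and the list x are illustrative.

```r
ftm <- list(name = "Fatma", salary = 5624.25)
# append() inserts after a given position instead of at the end.
ftm <- append(ftm, list(age = 28), after = 1)
names(ftm)             # "name" "age" "salary"

# Assigning past the end pads the gap with NULL elements.
x <- list("a")
x[[4]] <- "d"
length(x)              # 4
is.null(x[[2]])        # TRUE
```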

Deleting elements from a list

You can delete an element by setting it to NULL.

ftm$fulltime <- NULL
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25

[[3]]
[1] 28

[[4]]
[1] FALSE

[[5]]
[1] TRUE

[[6]]
[1] TRUE

Note that after deletion, all elements after the deleted one move up, and their indices decrease by one.

ftm[[3]] <- NULL
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25

[[3]]
[1] FALSE

[[4]]
[1] TRUE

[[5]]
[1] TRUE
ftm[3:5] <- NULL
ftm
$name
[1] "Fatma"

$salary
[1] 5624.25
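Assigning NULL modifies the list in place. As a sketch of two alternatives: a negative index returns a copy without the given element, and assigning list(NULL) with single brackets keeps the tag while storing NULL as its value. The variable rest is an illustrative name.

```r
ftm <- list(name = "Fatma", salary = 5624.25, fulltime = TRUE)
rest <- ftm[-2]                 # copy without the second element
names(rest)                     # "name" "fulltime"
names(ftm)                      # the original still has three tags

# To keep a tag whose value is NULL, assign list(NULL) with single brackets.
ftm["salary"] <- list(NULL)
names(ftm)                      # still three tags
is.null(ftm$salary)             # TRUE
```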

Concatenating lists

The familiar c() function can be used on lists, too.

c(list("abc", 32, TRUE), list(5.1))
[[1]]
[1] "abc"

[[2]]
[1] 32

[[3]]
[1] TRUE

[[4]]
[1] 5.1
c(list(name="Fatma", salary=5624.25, fulltime=TRUE), list(hobby="painting"))
$name
[1] "Fatma"

$salary
[1] 5624.25

$fulltime
[1] TRUE

$hobby
[1] "painting"
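One pitfall when concatenating: if a plain vector is passed to c() together with a list, each vector element becomes a separate list element. A minimal sketch, with an illustrative list a:

```r
a <- list("abc", 32)
length(c(a, c(1, 2, 3)))        # 5: each number becomes its own element
length(c(a, list(c(1, 2, 3))))  # 3: the vector is stored as one element
```

Wrapping the vector in list() is the way to add it as a single element.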

Getting information on lists

To get the number of elements in a list, we can use the length() function.

ftm <- list(name="Fatma", salary=5624.25, fulltime=TRUE)
length(ftm)
[1] 3

To get the tags in a list, we use the names() function.

names(ftm)
[1] "name"     "salary"   "fulltime"

To obtain the values as a vector, we can use the unlist() function.

unlist(ftm)
     name    salary  fulltime 
  "Fatma" "5624.25"    "TRUE" 
unname(unlist(ftm))
[1] "Fatma"   "5624.25" "TRUE"   

Note that unlist() returns a vector, so the numeric and logical values are converted to strings. The reason is that every element of a vector must be of the same type, and character is the only type that can represent all three values.
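The conversion depends on what the list contains. As a sketch with made-up lists: when there are no strings, the result stays numeric, and elements that are themselves vectors are flattened with numbered names.

```r
nums <- unlist(list(a = 1.5, b = TRUE, c = 2L))
nums                 # TRUE becomes 1; the result is numeric
mode(nums)           # "numeric"

flat <- unlist(list(x = 1:3, y = 10))
names(flat)          # "x1" "x2" "x3" "y"
```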

Applying functions to lists

The lapply() function applies a function to each element of a list, and returns the results as a list.

lapply(list(2,3.5,4), sqrt)
[[1]]
[1] 1.414214

[[2]]
[1] 1.870829

[[3]]
[1] 2

Working with tagged elements:

grades_1 <- c(10,12,11,14,8,12)
grades_2 <- c(13,11,10,11,9)
allgrades <- list(section1=grades_1, section2=grades_2)
allgrades
$section1
[1] 10 12 11 14  8 12

$section2
[1] 13 11 10 11  9
mean(allgrades$section1)
[1] 11.16667
lapply(allgrades, mean)
$section1
[1] 11.16667

$section2
[1] 10.8

The sapply() function (a simplifying version of lapply()) returns a vector or a matrix resulting from the application of the function.

sapply(allgrades, mean)
section1 section2 
11.16667 10.80000 
mode(sapply(allgrades, mean))
[1] "numeric"
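The matrix case arises when the applied function returns more than one value per element. The sketch below uses range() for this, and also shows vapply(), a base R relative of sapply() that checks each result against a declared template. The names r and m are illustrative.

```r
grades_1 <- c(10, 12, 11, 14, 8, 12)
grades_2 <- c(13, 11, 10, 11, 9)
allgrades <- list(section1 = grades_1, section2 = grades_2)

# range() returns two numbers per section, so sapply() builds a matrix
# with one column per list element.
r <- sapply(allgrades, range)
r
dim(r)                               # 2 2

# vapply() is like sapply() but requires each result to match the template.
m <- vapply(allgrades, mean, numeric(1))
m
```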

We can define our own functions to specify what to do with each element.

mult_by2 <- function(x) {2*x}
mult_by2(c(1,2,3,4))
[1] 2 4 6 8
lapply( list(1, 2, 3:7), mult_by2)
[[1]]
[1] 2

[[2]]
[1] 4

[[3]]
[1]  6  8 10 12 14
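For one-off operations the function does not need a name. The following sketch passes an anonymous function directly, and also shows that extra arguments given after the function are forwarded to it. The names spread and firsttwo are illustrative.

```r
allgrades <- list(section1 = c(10, 12, 11, 14, 8, 12),
                  section2 = c(13, 11, 10, 11, 9))

# Difference between the best and worst grade in each section.
spread <- sapply(allgrades, function(g) max(g) - min(g))
spread

# n = 2 is passed on to head(), giving the first two grades of each section.
firsttwo <- lapply(allgrades, head, n = 2)
firsttwo
```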

Examples

Calculate weekly payrolls

Set up a list of staff members, where each element is itself a list consisting of a name, an hourly wage, and the number of hours worked.

staff <- list(
    id000=list(name="Fatma", wage=12.5, hours=20),
    id001=list(name="Ekrem", wage=11.7, hours=30),
    id002=list(name="Deniz", wage=13.3, hours=25)
)
staff
$id000
$id000$name
[1] "Fatma"

$id000$wage
[1] 12.5

$id000$hours
[1] 20


$id001
$id001$name
[1] "Ekrem"

$id001$wage
[1] 11.7

$id001$hours
[1] 30


$id002
$id002$name
[1] "Deniz"

$id002$wage
[1] 13.3

$id002$hours
[1] 25

Define a function that takes one person as defined above, and returns the weekly pay.

payroll <- function(person){person$wage * person$hours}
payroll(list(name="Deniz", wage=13.3, hours=25))
[1] 332.5

Now apply this function to every staff member on the list staff.

lapply(staff, payroll)
$id000
[1] 250

$id001
[1] 351

$id002
[1] 332.5
sapply(staff, payroll)
id000 id001 id002 
250.0 351.0 332.5 
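Because sapply() returns a named numeric vector, the usual vector functions apply to the result. As a sketch, the total payroll and the highest-paid staff member can be found as follows (pay, total and top_id are illustrative names):

```r
staff <- list(
    id000 = list(name = "Fatma", wage = 12.5, hours = 20),
    id001 = list(name = "Ekrem", wage = 11.7, hours = 30),
    id002 = list(name = "Deniz", wage = 13.3, hours = 25)
)
payroll <- function(person) person$wage * person$hours

pay <- sapply(staff, payroll)
total <- sum(pay)                  # total weekly payroll
top_id <- names(which.max(pay))    # tag of the highest weekly pay
staff[[top_id]]$name               # look the person up by tag
```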

Count the occurrences of numbers in a vector

We have a vector of numbers in which some values repeat.

mydata <- c(1,2,3,15,1,2,3,4,1)

We want to keep the count of each number in a list, such that counts[[i]] stores how many times the number i occurs in mydata.

Initialize the counts list with zeros. The list needs at least as many elements as the largest number in mydata (15); here it has 17.

counts <- rep(list(0), 17)

Now loop over the data vector, and increase the count of the appropriate number.

for (x in mydata) {
    counts[[x]] <- counts[[x]] + 1
}
counts
[[1]]
[1] 3

[[2]]
[1] 2

[[3]]
[1] 2

[[4]]
[1] 1

[[5]]
[1] 0

[[6]]
[1] 0

[[7]]
[1] 0

[[8]]
[1] 0

[[9]]
[1] 0

[[10]]
[1] 0

[[11]]
[1] 0

[[12]]
[1] 0

[[13]]
[1] 0

[[14]]
[1] 0

[[15]]
[1] 1

[[16]]
[1] 0

[[17]]
[1] 0
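The loop above is useful for practicing list updates. For comparison, base R also provides tabulate(), which counts positive integers directly and returns an integer vector rather than a list. This sketch is an alternative technique, not the approach taken in the example:

```r
mydata <- c(1, 2, 3, 15, 1, 2, 3, 4, 1)
counts <- tabulate(mydata, nbins = 17)   # one bin per number from 1 to 17
counts
counts[15]                               # how many times 15 occurs
```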

Count the occurrences of words in a text

Here is a simple application of textual analysis. Consider the following (short) text. It is preprocessed to remove punctuation marks and uppercase letters.

sometext <- "my dear fellow said sherlock holmes as we sat on either side of the fire in his lodgings at baker street life is infinitely stranger than anything which the mind of man could invent we would not dare to conceive the things which are really mere commonplaces of existence if we could fly out of that window hand in hand hover over this great city gently remove the roofs and peep in at the queer things which are going on the strange coincidences the plannings the cross purposes the wonderful chains of events working through generations and leading to the most outré results it would make all fiction with its conventionalities and foreseen conclusions most stale and unprofitable"

We wish to create a list wordcounts such that wordcounts$word gives the number of occurrences of word in the given text.

This problem is similar to the example above where we counted the occurrences of numbers. However, we don’t know in advance what words and how many words we are going to encounter. So we cannot initialize the counts to zero.

We will approach the problem as follows:

for every word in the word list
    if the word is already in the list, increase the count.
    otherwise, add this word with a count of 1.

Looking up a tag that is not in the list with the double bracket notation returns NULL. This can be used to check for the existence of an element in a list.

wordcounts <- list()
wordcounts
list()
wordcounts[["sherlock"]]
NULL
is.null(wordcounts[["sherlock"]])
[1] TRUE
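Another way to test for existence, sketched below, is to look for the tag among names(). The sketch also shows why the double bracket is the safer notation for such checks: $ accepts an unambiguous prefix of a tag, while [[...]] requires an exact match by default.

```r
wordcounts <- list(my = 1, dear = 1)
"dear" %in% names(wordcounts)       # TRUE
"sherlock" %in% names(wordcounts)   # FALSE

# $ matches tags partially; [[...]] requires an exact match by default.
wordcounts$de          # 1 (partial match on "dear")
wordcounts[["de"]]     # NULL
```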

So, beginning with the first word, we add it to our list:

word <- "my"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}
wordcounts
$my
[1] 1

Similarly, the second word:

word <- "dear"
if (is.null(wordcounts[[word]])){
    wordcounts[[word]] <- 1
} else {
    wordcounts[[word]] <- wordcounts[[word]] + 1
}
wordcounts
$my
[1] 1

$dear
[1] 1

And now the list contains the elements we gave, and nothing more.

wordcounts
$my
[1] 1

$dear
[1] 1

We can’t go word by word manually. The better solution is to loop over every word in the text. We need to find a way to convert the large string of text to a vector, so that we can take the words one by one.

The strsplit() function does that for us:

strsplit(sometext, split=" ")
[[1]]
  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46] "mere"              "commonplaces"      "of"               
 [49] "existence"         "if"                "we"               
 [52] "could"             "fly"               "out"              
 [55] "of"                "that"              "window"           
 [58] "hand"              "in"                "hand"             
 [61] "hover"             "over"              "this"             
 [64] "great"             "city"              "gently"           
 [67] "remove"            "the"               "roofs"            
 [70] "and"               "peep"              "in"               
 [73] "at"                "the"               "queer"            
 [76] "things"            "which"             "are"              
 [79] "going"             "on"                "the"              
 [82] "strange"           "coincidences"      "the"              
 [85] "plannings"         "the"               "cross"            
 [88] "purposes"          "the"               "wonderful"        
 [91] "chains"            "of"                "events"           
 [94] "working"           "through"           "generations"      
 [97] "and"               "leading"           "to"               
[100] "the"               "most"              "outré"            
[103] "results"           "it"                "would"            
[106] "make"              "all"               "fiction"          
[109] "with"              "its"               "conventionalities"
[112] "and"               "foreseen"          "conclusions"      
[115] "most"              "stale"             "and"              
[118] "unprofitable"     

Note that strsplit() returns a list. The first element of this list is the vector of strings we are looking for.

wordsintext <- strsplit(sometext, split=" ")[[1]]
wordsintext
  [1] "my"                "dear"              "fellow"           
  [4] "said"              "sherlock"          "holmes"           
  [7] "as"                "we"                "sat"              
 [10] "on"                "either"            "side"             
 [13] "of"                "the"               "fire"             
 [16] "in"                "his"               "lodgings"         
 [19] "at"                "baker"             "street"           
 [22] "life"              "is"                "infinitely"       
 [25] "stranger"          "than"              "anything"         
 [28] "which"             "the"               "mind"             
 [31] "of"                "man"               "could"            
 [34] "invent"            "we"                "would"            
 [37] "not"               "dare"              "to"               
 [40] "conceive"          "the"               "things"           
 [43] "which"             "are"               "really"           
 [46] "mere"              "commonplaces"      "of"               
 [49] "existence"         "if"                "we"               
 [52] "could"             "fly"               "out"              
 [55] "of"                "that"              "window"           
 [58] "hand"              "in"                "hand"             
 [61] "hover"             "over"              "this"             
 [64] "great"             "city"              "gently"           
 [67] "remove"            "the"               "roofs"            
 [70] "and"               "peep"              "in"               
 [73] "at"                "the"               "queer"            
 [76] "things"            "which"             "are"              
 [79] "going"             "on"                "the"              
 [82] "strange"           "coincidences"      "the"              
 [85] "plannings"         "the"               "cross"            
 [88] "purposes"          "the"               "wonderful"        
 [91] "chains"            "of"                "events"           
 [94] "working"           "through"           "generations"      
 [97] "and"               "leading"           "to"               
[100] "the"               "most"              "outré"            
[103] "results"           "it"                "would"            
[106] "make"              "all"               "fiction"          
[109] "with"              "its"               "conventionalities"
[112] "and"               "foreseen"          "conclusions"      
[115] "most"              "stale"             "and"              
[118] "unprofitable"     
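The reason strsplit() returns a list is that it accepts several strings at once and produces one vector per input. The sketch below illustrates this with two short made-up strings, and shows how a regular expression can be used as the separator when words may be divided by more than one space.

```r
# One vector of words is returned for each input string.
parts <- strsplit(c("my dear fellow", "said sherlock holmes"), split = " ")
length(parts)     # 2
parts[[2]]        # "said" "sherlock" "holmes"

# A single-space split yields an empty string when spaces are doubled;
# the pattern " +" treats a run of spaces as one separator.
strsplit("my  dear", split = " ")[[1]]    # "my" "" "dear"
strsplit("my  dear", split = " +")[[1]]   # "my" "dear"
```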

For each word in the text, increase the count if the word exists in the counter list, otherwise set it to one.

wordcounts <- list()
for (word in wordsintext){
    if (is.null(wordcounts[[word]])){
        wordcounts[[word]] <- 1
    } else {
        wordcounts[[word]] <- wordcounts[[word]] + 1
    }
}
wordcounts
$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$we
[1] 3

$sat
[1] 1

$on
[1] 2

$either
[1] 1

$side
[1] 1

$of
[1] 5

$the
[1] 10

$fire
[1] 1

$`in`
[1] 3

$his
[1] 1

$lodgings
[1] 1

$at
[1] 2

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$which
[1] 3

$mind
[1] 1

$man
[1] 1

$could
[1] 2

$invent
[1] 1

$would
[1] 2

$not
[1] 1

$dare
[1] 1

$to
[1] 2

$conceive
[1] 1

$things
[1] 2

$are
[1] 2

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hand
[1] 2

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$and
[1] 4

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 1

$generations
[1] 1

$leading
[1] 1

$most
[1] 2

$outré
[1] 1

$results
[1] 1

$it
[1] 1

$make
[1] 1

$all
[1] 1

$fiction
[1] 1

$with
[1] 1

$its
[1] 1

$conventionalities
[1] 1

$foreseen
[1] 1

$conclusions
[1] 1

$stale
[1] 1

$unprofitable
[1] 1
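The explicit loop shows how a list can serve as a lookup table. For comparison, base R's table() function performs the same tally in one call and returns a table object instead of a list. The sketch below uses a short made-up vector words rather than the full text:

```r
words <- c("the", "mind", "of", "man", "the", "of", "the")
counts <- table(words)
counts
counts["the"]                       # count for a single word
sort(counts, decreasing = TRUE)     # most frequent word first
```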

Now we can ask questions about the statistics of words in the text. For example, which words occur more than twice in the text?

wordcounts[unlist(wordcounts) > 2]
$we
[1] 3

$of
[1] 5

$the
[1] 10

$`in`
[1] 3

$which
[1] 3

$and
[1] 4

What are the most frequent words? Sort the list, most frequent word first.

wordcounts[order(unlist(wordcounts), decreasing = TRUE)]
$the
[1] 10

$of
[1] 5

$and
[1] 4

$we
[1] 3

$`in`
[1] 3

$which
[1] 3

$on
[1] 2

$at
[1] 2

$could
[1] 2

$would
[1] 2

$to
[1] 2

$things
[1] 2

$are
[1] 2

$hand
[1] 2

$most
[1] 2

$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$sat
[1] 1

$either
[1] 1

$side
[1] 1

$fire
[1] 1

$his
[1] 1

$lodgings
[1] 1

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$mind
[1] 1

$man
[1] 1

$invent
[1] 1

$not
[1] 1

$dare
[1] 1

$conceive
[1] 1

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 1

$generations
[1] 1

$leading
[1] 1

$outré
[1] 1

$results
[1] 1

$it
[1] 1

$make
[1] 1

$all
[1] 1

$fiction
[1] 1

$with
[1] 1

$its
[1] 1

$conventionalities
[1] 1

$foreseen
[1] 1

$conclusions
[1] 1

$stale
[1] 1

$unprofitable
[1] 1
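Printing the whole sorted list is rarely needed. As a sketch, unlist() combined with sort() and head() gives just the top few entries as a named vector. The small wordcounts list here is a made-up excerpt so that the block runs on its own.

```r
wordcounts <- list(my = 1, the = 10, of = 5, and = 4, dear = 1)
top <- head(sort(unlist(wordcounts), decreasing = TRUE), 3)
top            # the three largest counts with their words as names
names(top)     # "the" "of" "and"
```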

Where in the text does a word occur? Generate a list such that words are tags and the corresponding value is a vector of positions.

wordlocations <- list()

for (i in seq_along(wordsintext)){
    word <- wordsintext[i]
    wordlocations[[word]] <- c(wordlocations[[word]],i)
}
wordlocations
$my
[1] 1

$dear
[1] 2

$fellow
[1] 3

$said
[1] 4

$sherlock
[1] 5

$holmes
[1] 6

$as
[1] 7

$we
[1]  8 35 51

$sat
[1] 9

$on
[1] 10 80

$either
[1] 11

$side
[1] 12

$of
[1] 13 31 48 55 92

$the
 [1]  14  29  41  68  74  81  84  86  89 100

$fire
[1] 15

$`in`
[1] 16 59 72

$his
[1] 17

$lodgings
[1] 18

$at
[1] 19 73

$baker
[1] 20

$street
[1] 21

$life
[1] 22

$is
[1] 23

$infinitely
[1] 24

$stranger
[1] 25

$than
[1] 26

$anything
[1] 27

$which
[1] 28 43 77

$mind
[1] 30

$man
[1] 32

$could
[1] 33 52

$invent
[1] 34

$would
[1]  36 105

$not
[1] 37

$dare
[1] 38

$to
[1] 39 99

$conceive
[1] 40

$things
[1] 42 76

$are
[1] 44 78

$really
[1] 45

$mere
[1] 46

$commonplaces
[1] 47

$existence
[1] 49

$`if`
[1] 50

$fly
[1] 53

$out
[1] 54

$that
[1] 56

$window
[1] 57

$hand
[1] 58 60

$hover
[1] 61

$over
[1] 62

$this
[1] 63

$great
[1] 64

$city
[1] 65

$gently
[1] 66

$remove
[1] 67

$roofs
[1] 69

$and
[1]  70  97 112 117

$peep
[1] 71

$queer
[1] 75

$going
[1] 79

$strange
[1] 82

$coincidences
[1] 83

$plannings
[1] 85

$cross
[1] 87

$purposes
[1] 88

$wonderful
[1] 90

$chains
[1] 91

$events
[1] 93

$working
[1] 94

$through
[1] 95

$generations
[1] 96

$leading
[1] 98

$most
[1] 101 115

$outré
[1] 102

$results
[1] 103

$it
[1] 104

$make
[1] 106

$all
[1] 107

$fiction
[1] 108

$with
[1] 109

$its
[1] 110

$conventionalities
[1] 111

$foreseen
[1] 113

$conclusions
[1] 114

$stale
[1] 116

$unprofitable
[1] 118
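The same kind of position list can also be built without a loop. The sketch below uses base R's split() on a short made-up vector words; note that split() orders the tags alphabetically, whereas the loop above keeps them in order of first appearance.

```r
words <- c("the", "mind", "of", "man", "the", "of", "the")
locs <- split(seq_along(words), words)   # group positions by word
locs$the      # 1 5 7
locs$of       # 3 6
names(locs)   # alphabetical order
```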

As a side benefit, once we have the wordlocations list, we can get the number of occurrences of words without passing over the data again. We only need to apply the length() function to each element of the list.

lapply(wordlocations, length)
$my
[1] 1

$dear
[1] 1

$fellow
[1] 1

$said
[1] 1

$sherlock
[1] 1

$holmes
[1] 1

$as
[1] 1

$we
[1] 3

$sat
[1] 1

$on
[1] 2

$either
[1] 1

$side
[1] 1

$of
[1] 5

$the
[1] 10

$fire
[1] 1

$`in`
[1] 3

$his
[1] 1

$lodgings
[1] 1

$at
[1] 2

$baker
[1] 1

$street
[1] 1

$life
[1] 1

$is
[1] 1

$infinitely
[1] 1

$stranger
[1] 1

$than
[1] 1

$anything
[1] 1

$which
[1] 3

$mind
[1] 1

$man
[1] 1

$could
[1] 2

$invent
[1] 1

$would
[1] 2

$not
[1] 1

$dare
[1] 1

$to
[1] 2

$conceive
[1] 1

$things
[1] 2

$are
[1] 2

$really
[1] 1

$mere
[1] 1

$commonplaces
[1] 1

$existence
[1] 1

$`if`
[1] 1

$fly
[1] 1

$out
[1] 1

$that
[1] 1

$window
[1] 1

$hand
[1] 2

$hover
[1] 1

$over
[1] 1

$this
[1] 1

$great
[1] 1

$city
[1] 1

$gently
[1] 1

$remove
[1] 1

$roofs
[1] 1

$and
[1] 4

$peep
[1] 1

$queer
[1] 1

$going
[1] 1

$strange
[1] 1

$coincidences
[1] 1

$plannings
[1] 1

$cross
[1] 1

$purposes
[1] 1

$wonderful
[1] 1

$chains
[1] 1

$events
[1] 1

$working
[1] 1

$through
[1] 1

$generations
[1] 1

$leading
[1] 1

$most
[1] 2

$outré
[1] 1

$results
[1] 1

$it
[1] 1

$make
[1] 1

$all
[1] 1

$fiction
[1] 1

$with
[1] 1

$its
[1] 1

$conventionalities
[1] 1

$foreseen
[1] 1

$conclusions
[1] 1

$stale
[1] 1

$unprofitable
[1] 1
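For a more compact result, base R (version 3.2.0 and later) has the lengths() function, which returns the length of every list element as a named vector. The sketch below uses a small made-up excerpt of wordlocations and checks that sapply() with length gives the same answer.

```r
wordlocations <- list(we = c(8, 35, 51), on = c(10, 80), sat = 9)
n <- lengths(wordlocations)
n                                   # we: 3, on: 2, sat: 1
identical(n, sapply(wordlocations, length))
```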