flowchart B[Code body] --> C{Condition} C -- True --> D[Execute if True] D --> C C -- False --> E[Code body]
Repetition flow diagram.
Repetition (also known as “loops”) are another fundamental structure of all programming languages.
Repetition, like decision, requires a condition to execute a block of code. However, unlike decision, at the end of the block it checks the condition again. If the condition still holds, the block is executed again.
flowchart B[Code body] --> C{Condition} C -- True --> D[Execute if True] D --> C C -- False --> E[Code body]
Repetition flow diagram.
Simple example: Print the squares of 1, 2, … 10.
Without loops, it would be cumbersome:
<- 1
i cat(i,i^2,"\n")
1 1
<- i+1
i cat(i,i^2, "\n")
2 4
<- i+1
i cat(i,i^2, "\n")
3 9
With loops, it is shorter and more flexible:
<- 1 # initialization
i while (i<=10) { # condition
cat(i,i^2,"\n")
<- i+1 # update
i }
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
while
loop<initialization>
while( <condition> ) {
<statements>
<update variables>
}
<condition>
is a Boolean expression, usually involving existing variables.<condition>
is true the block is executed (this is called one iteration).<- 1 # initialization
i while (i<=10) { # condition
cat(i,i^2,"\n")
<- i+1 # update
i }
1 1
2 4
3 9
4 16
5 25
6 36
7 49
8 64
9 81
10 100
Find the sum of elements in a vector
Without a loop:
<- c(-1,4,2,5,1,4,6,2,0)
mydata <- 0
total <- 1
i <- total + mydata[i]
total <- i+1
i <- total + mydata[i]
total <- i+1
i <- total + mydata[i]
total #...
With a loop
<- c(-1,4,2,5,1,4,6,2,0)
mydata <- 0
total <- 1
i while (i<=length(mydata)){
<- total + mydata[i]
total cat(i, mydata[i], total, "\n")
<- i + 1
i }
1 -1 -1
2 4 3
3 2 5
4 5 10
5 1 11
6 4 15
7 6 21
8 2 23
9 0 23
total
[1] 23
Find the maximum element in a vector
<- c(1,4,2,5,1,4,6,2,0,7,3,1)
mydata <- mydata[1]
largest <- 2
i while (i<=length(mydata)){
if(mydata[i]>largest)
<- mydata[i]
largest <- i + 1
i
} largest
[1] 7
while
that does not take a Boolean condition.for
statement takes elements from a vector one by one, and runs the loop body with the current element.for(i in c(2,-1,5,3,7)) {
cat(i,i^2,"\n")
}
2 4
-1 1
5 25
3 9
7 49
In many cases for
is simpler than while
, especially when you need to iterate over the elements of a vector.
Note that by using for
you don’t need to keep track of the element index. This is OK if you are interested in the element values only, but it might not work if the element’s location is relevant.
break
statementWhen the program encounters a break
statement, it terminates the loop. The remainder of the block is skipped over.
<- 1
i while(i<=10){
if(i==7)
break
cat("i =",i,"\n")
<- i+1
i }
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
cat("Goodbye")
Goodbye
Same with a for
loop
for(i in 1:10){
if (i==7)
break
cat("i =",i,"\n")
}
i = 1
i = 2
i = 3
i = 4
i = 5
i = 6
next
statement<-0
iwhile(i<=10){
<- i+1
i if( i%%3==0 )
next
cat(i, "")
}
1 2 4 5 7 8 10 11
Caution: If the update statement is located after next
, the loop may not terminate.
<-1
iwhile(i<=10){
if( i%%3==0 )
next
cat(i, "")
<- i+1
i }
repeat
looprepeat
statement provides an infinite loop.break
.<- 1
i repeat {
<- i + 3
i cat(i,"")
if(i>10) break
}
4 7 10 13
The next
statement works in the same way in repeat
loops.
<- -2
i repeat {
<- i + 1
i if (i%%3==0) next
cat(i,"")
if(i>10) break
}
-1 1 2 4 5 7 8 10 11
while
is the most general one.repeat
is the same as while(TRUE)
.for
is more convenient when going over sequences such as vectors.A loop body can contain anything, including other loops.
for (i in 1:4) {
cat("i =",i,"\n")
for (j in c(7,8,9))
cat(" i+j =",i+j,"\n")
}
i = 1
i+j = 8
i+j = 9
i+j = 10
i = 2
i+j = 9
i+j = 10
i+j = 11
i = 3
i+j = 10
i+j = 11
i+j = 12
i = 4
i+j = 11
i+j = 12
i+j = 13
<- c(1,2,3)
x <- c(4,5,6)
y <- x + y
z z
[1] 5 7 9
Alternative way by using a for
loop
<- vector(length=length(x))
z for(i in 1:length(x))
<- x[i] + y[i]
z[i] z
[1] 5 7 9
Let’s measure the time that the computer takes for each operation. We use large vectors so that we can see the time difference clearly.
<- runif(1000000)
x <- runif(1000000)
y system.time(z <- x+y) # Time taken by vectorized addition
user system elapsed
0.003 0.002 0.005
<- vector(length=1000000)
z system.time(for(i in 1:length(x)) z[i] <- x[i] + y[i]) # time taken by explicit loop.
user system elapsed
0.088 0.004 0.093
for()
loop, the :
operator, the index operator [
are all function calls, which slows the code.Moving averages are a way to reduce fluctuations in data. For example, the 2-element sample moving average of a vector \(v\) with \(n\) elements is defined as
\[ \left(\frac{v_1+v_2}{2},\frac{v_2+v_3}{2},\frac{v_3+v_4}{2}, \ldots, \frac{v_{n-1}+v_n}{2}\right)\]
Let’s generate some synthetic data with an upward trend and some random noise.
<- cumsum(sample(c(-1,2),size = 100, replace=TRUE))
data plot(data, type="l")
<- vector(length=length(data)-1)
movav for(i in 1:length(data)-1)
<- (data[i] + data[i+1])/2 movav[i]
plot(data, type="l")
lines(movav, col="red")
Implement this in vectorized form:
<- (data[1:length(data)-1] + data[2:length(data)])/2 movav
A Fibonacci sequence starts with 1 and 1, and each new value is the sum of the two previous values. Formally: \[\begin{eqnarray} F_1 &=& 1\\ F_2 &=& 1\\ F_{n} &=& F_{n-1} + F_{n-2} \end{eqnarray}\]
Each number in this sequence is called a Fibonacci number. Let us write R code that displays the first 20 Fibonacci numbers.
<- 1
f1 <- 1
f2 for (i in 3:20){
<- f1
temp <- f2
f1 <- f2 + temp
f2 cat(f2,"\n")
}
2
3
5
8
13
21
34
55
89
144
233
377
610
987
1597
2584
4181
6765
<- 10000
account_balance <- 0.1
interest_rate <- 10
years <- account_balance
balance_vec for (y in 1:years){
<- account_balance * (1+interest_rate)
account_balance <- c(balance_vec, account_balance)
balance_vec cat("After",y,"years your account balance is",account_balance,"\n")
}
After 1 years your account balance is 11000
After 2 years your account balance is 12100
After 3 years your account balance is 13310
After 4 years your account balance is 14641
After 5 years your account balance is 16105.1
After 6 years your account balance is 17715.61
After 7 years your account balance is 19487.17
After 8 years your account balance is 21435.89
After 9 years your account balance is 23579.48
After 10 years your account balance is 25937.42
Evaluate the sum \[\sum_{i=1}^{n} 2^{-i} = \frac{1}{2} + \frac{1}{4} +\ldots + \frac{1}{2^n}\] for given \(n\).
<- 5
n <- 0
total for (i in 1:n) {
<- total + 1/2^i
total
} total
[1] 0.96875
As \(n\rightarrow\infty\), the sum must approach 1. To see this, let us wrap another loop around the code to change n.
for (n in 2:20){
<- 0
total for (i in 1:n) {
<- total + 1/2^i
total
}cat("n =",n,", series total =",total,"\n")
}
n = 2 , series total = 0.75
n = 3 , series total = 0.875
n = 4 , series total = 0.9375
n = 5 , series total = 0.96875
n = 6 , series total = 0.984375
n = 7 , series total = 0.9921875
n = 8 , series total = 0.9960938
n = 9 , series total = 0.9980469
n = 10 , series total = 0.9990234
n = 11 , series total = 0.9995117
n = 12 , series total = 0.9997559
n = 13 , series total = 0.9998779
n = 14 , series total = 0.999939
n = 15 , series total = 0.9999695
n = 16 , series total = 0.9999847
n = 17 , series total = 0.9999924
n = 18 , series total = 0.9999962
n = 19 , series total = 0.9999981
n = 20 , series total = 0.999999
sum
and cumsum
Implement the sum()
and cumsum()
functions using R’s loop structures. Test your functions with some simple cases to ensure that they work correctly. Using random vectors of size 1,000,000 as input, compare their speed with the built-in versions.
Write a function digits(x)
that takes a positive integer and returns a vector of its digits. For example, digits(0667230)
should return the vector (6,6,7,2,3,0)
.
Write a function named digitsum
which takes a positive integer and returns the sum of digits (the “digital sum”) of the input. For example, the function call digitsum(35274)
should return 3+5+2+7+4=21.
Using this function, find the number between 1 and 1 million that has the largest digital sum.
Currently there are 1,000,000 inhabitants in city A, and 500,000 in city B. Each year, 2% of people in city A move to city B, and 3% of people in city A move to city B. The intrinsic growth rate of both cities is 1% (i.e., the growth in the absence of any migration).
Plot the population of both cities for the next 20 years.
The Collatz sequence is defined as follows:
For example, starting with 10, the Collatz sequence is 10, 5, 16, 8, 4, 2, 1. The number of steps required to reach 1 is 6.
It is believed that for any starting point the sequence ends with 1. However, this is not proven.
collatzlen(n)
that returns the number of steps required to go from n to 1 in a Collatz sequence. For example, collatzlen(10)
should return 6.