13 File input-output
Reading data from a text file
Data is frequently stored in tabular form in text files. The read.table()
function can read a file from your disk and return a data frame containing that data.
Suppose we have a data file mydata.txt with the following contents:
Can 1.70 65
Cem 1.75 66
Hande 1.62 61
Lale 1.76 64
Arda 1.78 63
Bilgin 1.77 84
Cem 1.69 75
Ozlem 1.75 65
Ali 1.73 75
Haluk 1.71 81
The file can be read into a data frame simply with:
hwdata <- read.table("mydata.txt")
hwdata
V1 V2 V3
1 Can 1.70 65
2 Cem 1.75 66
3 Hande 1.62 61
4 Lale 1.76 64
5 Arda 1.78 63
6 Bilgin 1.77 84
7 Cem 1.69 75
8 Ozlem 1.75 65
9 Ali 1.73 75
10 Haluk 1.71 81
class(hwdata)
[1] "data.frame"
We can change the column names of the data frame as usual:
names(hwdata) <- c("Name", "Height","Weight")
hwdata
Name Height Weight
1 Can 1.70 65
2 Cem 1.75 66
3 Hande 1.62 61
4 Lale 1.76 64
5 Arda 1.78 63
6 Bilgin 1.77 84
7 Cem 1.69 75
8 Ozlem 1.75 65
9 Ali 1.73 75
10 Haluk 1.71 81
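As a minimal sketch, the names can also be supplied while reading, through the col.names parameter of read.table(). The example assumes an inline copy of the first three rows (passed through the text parameter, with the hypothetical variable name raw) so that it runs without mydata.txt on disk.

```r
# Inline copy of the first rows of mydata.txt (illustrative data only).
raw <- "Can 1.70 65
Cem 1.75 66
Hande 1.62 61"

# col.names sets the column names at read time, instead of
# assigning names(hwdata) afterwards.
hwdata <- read.table(text = raw, col.names = c("Name", "Height", "Weight"))
names(hwdata)
```

Either approach should give the same data frame; col.names simply saves a separate assignment.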
The function read.table() is quite versatile, and it has many parameters to tune its behavior. The help documentation, help(read.table), can be helpful.
Let’s read a new data file mydata2.txt. It has a header row, and we want to set the column names of the resulting data frame accordingly:
Name Height Weight
Can 1.70 65
Cem 1.75 66
Hande 1.62 61
Lale 1.76 64
Arda 1.78 63
Bilgin 1.77 84
Cem 1.69 75
Ozlem 1.75 65
Ali 1.73 75
Haluk 1.71 81
All we need is to set the header parameter to TRUE:
hwdata <- read.table("mydata2.txt", header = TRUE)
hwdata
Name Height Weight
1 Can 1.70 65
2 Cem 1.75 66
3 Hande 1.62 61
4 Lale 1.76 64
5 Arda 1.78 63
6 Bilgin 1.77 84
7 Cem 1.69 75
8 Ozlem 1.75 65
9 Ali 1.73 75
10 Haluk 1.71 81
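To see why the header setting matters, the following sketch reads the same text with and without header = TRUE. It assumes a two-record inline sample (the variable names raw, no_header and with_header are hypothetical) rather than the full mydata2.txt.

```r
# Inline sample with a header line (illustrative data only).
raw <- "Name Height Weight
Can 1.70 65
Cem 1.75 66"

no_header   <- read.table(text = raw)                  # header line treated as data
with_header <- read.table(text = raw, header = TRUE)   # header line used for names

nrow(no_header)                  # 3: the header line counts as a record
nrow(with_header)                # 2
is.numeric(no_header$V2)         # FALSE: the text "Height" prevents numeric conversion
is.numeric(with_header$Height)   # TRUE
```

Without the header setting, the column labels end up mixed in with the data, so the numeric columns are no longer read as numbers.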
Now we have another file mydata3.txt whose fields are separated with commas, instead of spaces:
Name,Height,Weight
Can,1.70,65
Cem,1.75,66
Hande,1.62,61
Lale,1.76,64
Arda,1.78,63
Bilgin,1.77,84
Cem,1.69,75
Ozlem,1.75,65
Ali,1.73,75
Haluk,1.71,81
To accommodate that, we set the sep parameter to the separator character, a comma:
hwdata <- read.table("mydata3.txt", header = TRUE, sep = ",")
hwdata
Name Height Weight
1 Can 1.70 65
2 Cem 1.75 66
3 Hande 1.62 61
4 Lale 1.76 64
5 Arda 1.78 63
6 Bilgin 1.77 84
7 Cem 1.69 75
8 Ozlem 1.75 65
9 Ali 1.73 75
10 Haluk 1.71 81
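For comma-separated files, base R also provides read.csv(), a wrapper around read.table() whose defaults include header = TRUE and sep = ",". The sketch below compares the two calls on a short inline sample (raw, a and b are hypothetical names); it is meant as an illustration, not as a replacement for the approach above.

```r
# Inline comma-separated sample (illustrative data only).
raw <- "Name,Height,Weight
Can,1.70,65
Cem,1.75,66"

a <- read.table(text = raw, header = TRUE, sep = ",")
b <- read.csv(text = raw)   # header = TRUE and sep = "," are the defaults here

all.equal(a, b)
```

Note that read.csv() sets comment.char = "" by default, so it does not strip comments unless asked to.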
Now consider a more complicated data file mydata4.txt, which contains some comments added by the data collector.
Name,Height,Weight
Can,1.70,65
Cem,1.75,66
# Here is a comment
Hande,1.62,61
Lale,1.76,64
Arda,1.78,63
Bilgin,1.77,84 # another comment
Cem,1.69,75
Ozlem,1.75,65
Ali,1.73,75
Haluk,1.71,81
The comment character can be set with the comment.char parameter of read.table(). Everything from the comment character # to the end of the line is then ignored, whether the comment occupies a whole line or follows the data:
hwdata <- read.table("mydata4.txt", header = TRUE, sep = ",", comment.char = "#")
hwdata
Name Height Weight
1 Can 1.70 65
2 Cem 1.75 66
3 Hande 1.62 61
4 Lale 1.76 64
5 Arda 1.78 63
6 Bilgin 1.77 84
7 Cem 1.69 75
8 Ozlem 1.75 65
9 Ali 1.73 75
10 Haluk 1.71 81
Actually, this was a redundant setting, because by default comment.char is already set to "#".
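The default can be checked directly, as in this small sketch. It assumes a shortened inline version of mydata4.txt (the name raw is hypothetical) and uses formals() to inspect the default argument value.

```r
# Shortened inline version of mydata4.txt (illustrative data only).
raw <- "Name,Height,Weight
Can,1.70,65
# Here is a comment
Bilgin,1.77,84 # another comment"

# No comment.char argument: the default is relied upon.
hwdata <- read.table(text = raw, header = TRUE, sep = ",")
hwdata

# Inspect the default value of the parameter.
formals(read.table)$comment.char
```

Both the full-line comment and the trailing comment are dropped, leaving two records. Setting comment.char = "" would turn comment handling off altogether.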
Sometimes the separator character also appears inside a text field, such as a space within a name. In such cases, we use quotes to delimit the field's content, as below (mydata5.txt):
Name Height Weight
"Can Can" 1.70 65
"Cem Cem" 1.75 66
"Hande Hande" 1.62 61
"Lale Lale" 1.76 64
"Arda Arda" 1.78 63
"Bilgin Bilgin" 1.77 84
"Cem Cim" 1.69 75
"Ozlem Ozlem" 1.75 65
"Ali Ali" 1.73 75
"Haluk Haluk" 1.71 81
The function read.table() recognizes both single and double quotes by default.
hwdata <- read.table("mydata5.txt", header = TRUE)
hwdata
Name Height Weight
1 Can Can 1.70 65
2 Cem Cem 1.75 66
3 Hande Hande 1.62 61
4 Lale Lale 1.76 64
5 Arda Arda 1.78 63
6 Bilgin Bilgin 1.77 84
7 Cem Cim 1.69 75
8 Ozlem Ozlem 1.75 65
9 Ali Ali 1.73 75
10 Haluk Haluk 1.71 81
Other quote characters can be specified using the quote parameter. For example, consider the data file mydata6.txt:
Name Height Weight
%Can Can% 1.70 65
%Cem Cem% 1.75 66
%Hande Hande% 1.62 61
%Lale Lale% 1.76 64
%Arda Arda% 1.78 63
%Bilgin Bilgin% 1.77 84
%Cem Cim% 1.69 75
%Ozlem Ozlem% 1.75 65
%Ali Ali% 1.73 75
%Haluk Haluk% 1.71 81
Writing data to a file
Suppose that we read mydata6.txt, setting quote="%", and then process the data, e.g., by adding a BMI column.
hwdata <- read.table("mydata6.txt", header = TRUE, quote = "%")
hwdata$BMI <- hwdata$Weight / hwdata$Height^2
hwdata$BMI <- round(hwdata$BMI, 2) # round to two decimal places
hwdata
Name Height Weight BMI
1 Can Can 1.70 65 22.49
2 Cem Cem 1.75 66 21.55
3 Hande Hande 1.62 61 23.24
4 Lale Lale 1.76 64 20.66
5 Arda Arda 1.78 63 19.88
6 Bilgin Bilgin 1.77 84 26.81
7 Cem Cim 1.69 75 26.26
8 Ozlem Ozlem 1.75 65 21.22
9 Ali Ali 1.73 75 25.06
10 Haluk Haluk 1.71 81 27.70
The function write.table() can be used to store a data frame in a file.
write.table(hwdata,"mydata7.txt")
This function writes the table together with the row names and column names:
"Name" "Height" "Weight" "BMI"
"1" "Can Can" 1.7 65 22.49
"2" "Cem Cem" 1.75 66 21.55
"3" "Hande Hande" 1.62 61 23.24
"4" "Lale Lale" 1.76 64 20.66
"5" "Arda Arda" 1.78 63 19.88
"6" "Bilgin Bilgin" 1.77 84 26.81
"7" "Cem Cim" 1.69 75 26.26
"8" "Ozlem Ozlem" 1.75 65 21.22
"9" "Ali Ali" 1.73 75 25.06
"10" "Haluk Haluk" 1.71 81 27.7
We can omit the row and column names with the following parameter settings.
write.table(hwdata,"mydata7.txt",row.names = FALSE, col.names = FALSE)
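A common middle ground is to keep the column names but drop the row names, so that the file can be read back with header = TRUE. The following round-trip sketch assumes a small hand-built data frame and a temporary file created with tempfile() (the names path and back are hypothetical); it does not touch mydata7.txt.

```r
# Small hand-built data frame (illustrative data only).
hwdata <- data.frame(
  Name   = c("Can Can", "Cem Cem"),
  Height = c(1.70, 1.75),
  Weight = c(65L, 66L),
  stringsAsFactors = FALSE
)

path <- tempfile(fileext = ".txt")             # temporary file for the demo
write.table(hwdata, path, row.names = FALSE)   # keep the header, drop row names

# Character fields are written in double quotes by default,
# so names containing spaces survive the round trip.
back <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
all.equal(hwdata, back)

unlink(path)                                   # clean up the temporary file
```

If col.names = FALSE is used instead, the file has no header line and should be read back without header = TRUE, which yields the default names V1, V2, and so on.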