## R Syntax, Data Structures, and Functions

Here we give a brief overview of some R programming fundamentals. For more detail, see these chapters of Grolemund and Wickham's *R for Data Science*:
- [Basics](http://r4ds.had.co.nz/workflow-basics.html)
- [Vectors](http://r4ds.had.co.nz/vectors.html)
- [Functions](http://r4ds.had.co.nz/functions.html)

### Vectors

In [61]:
a <- 4
a

In [62]:
b <- 5
a + b

In [63]:
d <- c(1,10,11,12)
d

In [64]:
a + d

In [65]:
f <- c(1,2)
d + f

In [66]:
d <- 1:5
str(d)

 int [1:5] 1 2 3 4 5


In [67]:
d <- 2*d
str(d)

 num [1:5] 2 4 6 8 10


In [69]:
d
d[3]

In [70]:
d
d[c(1,4)]

In [71]:
a <- as.integer(4)
str(a) ## scalars are length 1 vectors in R
length(a)
a[1] 

 int 4


In [73]:
d
d[3] <- "howdy"
d

In [74]:
str(d)

 chr [1:5] "2" "4" "howdy" "8" "10"


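Important point: vectors are homogeneous, i.e. all elements have the same type. When a value of a different type is assigned into a vector, R silently coerces every element to a common type without a warning. This is much like arrays in NumPy.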
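The coercion behaviour described above can be sketched as follows. This is a minimal illustration, and the variable names `x`, `y`, `z`, and `back` are hypothetical. It relies on R's coercion order for atomic vectors (logical, then integer, then double, then character).

```r
# Mixing types in c() coerces everything to the most general type present.
x <- c(TRUE, 2L)        # logical + integer  -> integer
y <- c(2L, 3.5)         # integer + double   -> double
z <- c(3.5, "howdy")    # double + character -> character

typeof(x)
typeof(y)
typeof(z)

# Assigning a string into a numeric vector converts every element, silently.
d <- c(2, 4, 6)
d[2] <- "howdy"
d

# Converting back is lossy: "howdy" cannot be parsed as a number and becomes NA.
# as.numeric() would normally warn here; the warning is suppressed for the demo.
back <- suppressWarnings(as.numeric(d))
back
```

Checking `typeof()` or `str()` after an assignment is a cheap way to notice that a numeric vector has turned into character.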

### Matrices and Arrays

In [35]:
a <- 1:4
b <- matrix(a,nrow=2,ncol=2)

In [37]:
b

     [,1] [,2]
[1,]    1    3
[2,]    2    4


In [38]:
b + 4

     [,1] [,2]
[1,]    5    7
[2,]    6    8


In [41]:
b + c(1,2)

     [,1] [,2]
[1,]    2    4
[2,]    4    6


In [45]:
d <- matrix(5:8,nrow=2,ncol=2)
d
d[1,1]
d[1,]

     [,1] [,2]
[1,]    5    7
[2,]    6    8


In [44]:
b
d
b*d ## elementwise

     [,1] [,2]
[1,]    1    3
[2,]    2    4


     [,1] [,2]
[1,]    5    7
[2,]    6    8


     [,1] [,2]
[1,]    5   21
[2,]   12   32


In [46]:
b%*%d ## matrix multiplication

     [,1] [,2]
[1,]   23   31
[2,]   34   46


In [47]:
str(b)

 int [1:2, 1:2] 1 2 3 4


In [49]:
d <- array(rnorm(16),dim=c(2,2,4))

In [50]:
dim(d)

In [54]:
str(d)
d
d[1,1,4]

 num [1:2, 1:2, 1:4] -0.327 -1.544 1.837 1.332 -1.315 ...


In [None]:
## do rowSums, colMeans, apply
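
The three functions named in the cell above can be sketched on a small matrix. This is an illustrative example, not the original notebook's code, and the matrix `m` is a hypothetical name chosen to avoid overwriting the variables used earlier.

```r
m <- matrix(1:6, nrow = 2, ncol = 3)   # filled column by column
m

rowSums(m)    # one sum per row
colMeans(m)   # one mean per column

# apply(X, MARGIN, FUN): MARGIN = 1 works over rows, MARGIN = 2 over columns
row_sums  <- apply(m, 1, sum)
col_means <- apply(m, 2, mean)
row_max   <- apply(m, 1, max)   # any function of a vector works here
row_sums
col_means
row_max
```

`rowSums` and `colMeans` cover the common cases; `apply` is the general form for when no dedicated function exists.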

### Lists and Dataframes

Lists are like heterogeneous vectors: each element can have a different type. They are more flexible than vectors, but they take up more memory and fewer mathematical operations apply to them directly.
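
A minimal sketch of these trade-offs, using hypothetical names (`p`, `v`, `l`, `res`): a list holds mixed types, but arithmetic that works on a vector fails on a list until it is converted back.

```r
# A list can mix types; each element keeps its own type.
p <- list(count = 10, label = "howdy", fn = mean)
sapply(p, class)

# `[` returns a sub-list, `[[` (or `$`) returns the element itself.
class(p["count"])
class(p[["count"]])

# Arithmetic is defined on vectors, not on lists.
v <- c(1, 2, 3)
l <- list(1, 2, 3)
v + 1
res <- tryCatch(l + 1, error = function(e) "error: arithmetic on a list failed")
res

# unlist() converts back to a vector when the elements share a type.
unlist(l) + 1
```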

In [80]:
a <- list("first"=10,"second"="howdy","third"=mean)
a


In [81]:
a$first

In [83]:
a[[1]]

In [86]:
b <- rnorm(1e7)
d <- as.list(rnorm(1e7))

In [87]:
head(b)
head(d)

In [89]:
length(b)
length(d)

In [88]:
## huge difference: the list takes about 7 times as much memory as the vector
object.size(b)
object.size(d)

80000040 bytes

560000040 bytes

In [116]:
iris
head(iris)

Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa
4.6,3.4,1.4,0.3,setosa
5.0,3.4,1.5,0.2,setosa
4.4,2.9,1.4,0.2,setosa
4.9,3.1,1.5,0.1,setosa


Sepal.Length,Sepal.Width,Petal.Length,Petal.Width,Species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
4.7,3.2,1.3,0.2,setosa
4.6,3.1,1.5,0.2,setosa
5.0,3.6,1.4,0.2,setosa
5.4,3.9,1.7,0.4,setosa


In [117]:
str(iris)

'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...


In [119]:
## a data frame is a list of vectors of the same length, but possibly of different types
## pandas in Python is meant to replicate R data frames
head(iris[1]) ## first column

Sepal.Length
5.1
4.9
4.7
4.6
5.0
5.4
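
The "list of vectors" view of a data frame can be checked directly on the built-in `iris` data set. This is a small sketch; the names `col_classes`, `col_lengths`, and `first_col` are hypothetical.

```r
is.list(iris)                          # a data frame is a list ...
col_classes <- sapply(iris, class)     # ... whose elements (columns) may differ in type
col_lengths <- sapply(iris, length)    # ... but all have the same length
col_classes
col_lengths

class(iris[1])                # `[` keeps the data frame wrapper
first_col <- iris[[1]]        # `[[` extracts the column as a plain vector
class(first_col)
identical(first_col, iris$Sepal.Length)
```

This is why `head(iris[1])` prints a one-column table rather than a bare numeric vector.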


In [None]:
### example of lapply for computing set of functions on vector
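
One possible version of the example named in the cell above, offered as a sketch rather than the original notebook's code. The vector `x` and the list `funs` are hypothetical names; the functions are all from base R and its default `stats` package.

```r
x <- c(1, 10, 11, 12)

# A named list of functions to apply to the same vector
funs <- list(mean = mean, median = median, sd = sd, max = max)

# lapply() always returns a list, one element per function
res_list <- lapply(funs, function(f) f(x))
res_list

# sapply() simplifies the result to a named numeric vector when it can
res_vec <- sapply(funs, function(f) f(x))
res_vec
```

Because functions are ordinary objects in R, they can be stored in a list and iterated over like any other value.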

### For Loops versus Vectorization

In C, C++, and many other compiled languages, for loops are commonly used to repeat an operation across the elements of a vector. In R it is more computationally efficient, and often clearer, to use built-in vectorized functions.

In [93]:
for(ii in 1:10){
    print(ii)
}

[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
[1] 9
[1] 10


In [95]:
a <- 1:5
for(ii in 1:length(a)){
    a[ii] <- a[ii] + 1
}
a
a + 1

In [96]:
### time these with large a
### use proc.time() function

a <- 1:1e7
tm <- proc.time()
for(ii in 1:length(a)){
    a[ii] <- a[ii] + 1
}
proc.time() - tm

   user  system elapsed 
  0.672   0.012   0.687 

In [98]:
a <- 1:1e7
tm <- proc.time()
a <- a + 1
proc.time() - tm


   user  system elapsed 
  0.036   0.000   0.033 



In [100]:
### built-in vectorized functions
a <- 1:4
sum(a)
mean(a)
var(a)


In [101]:
max(a)
min(a)

In [102]:
## sum rows with a for loop
a <- matrix(1:1000,nrow=500,ncol=2)
head(a)

rs <- rep(0,nrow(a))
for(ii in 1:nrow(a)){
    rs[ii] <- sum(a[ii,])
}
head(rs)

     [,1] [,2]
[1,]    1  501
[2,]    2  502
[3,]    3  503
[4,]    4  504
[5,]    5  505
[6,]    6  506


In [103]:
rs2 <- rowSums(a)
head(rs2)
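
A quick check that the loop and `rowSums` give the same answer, with both timed. This is a sketch: `system.time()` is used here as a shorthand for the `proc.time()` bookkeeping shown earlier, and `seq_len()` stands in for `1:nrow(a)`.

```r
a <- matrix(1:1000, nrow = 500, ncol = 2)

# Loop version: preallocate the result, then fill it row by row
rs <- numeric(nrow(a))
for (ii in seq_len(nrow(a))) {
    rs[ii] <- sum(a[ii, ])
}

# Vectorized version
rs2 <- rowSums(a)

# Both approaches produce the same values
all.equal(rs, rs2)

# Timings depend on the machine, so no expected numbers are shown
system.time(for (ii in seq_len(nrow(a))) rs[ii] <- sum(a[ii, ]))
system.time(rowSums(a))
```

On a matrix this small both versions finish almost instantly; the gap only becomes visible with many more rows.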

### Functions and Scoping

In [104]:
AddTwo <- function(x,y){
    return(x+y)
}
AddTwo(4,9)

In [107]:
## scoping: where R looks up the value of a symbol
## by default 1) variables created within a function are destroyed after the function has run
##            2) a function first looks for a variable within its own environment,
##                 then in the environment where the function was defined (lexical scoping)


rm(a) ## remove a so not found

## a is not defined in function, 
## so will look outside function for value
## this is not good programming practice
f <- function(){
    print(a)
    return(10)
}
f()




ERROR: Error in print(a): object 'a' not found


In [108]:
a <- 13
f()

[1] 13


In [109]:
## a is now defined inside the function,
## so the local value is used
## and nothing outside the function is consulted
f <- function(){
    a <- 6
    print(a)
    return(10)
}
rm(a)
f()

[1] 6


In [110]:
## a changed within function, does not change our a
## this is good
a <- 4
f()
a

[1] 6


In [114]:
## a is assigned inside the function with <<-,
## which modifies a outside the function (here, in the global environment)
## this is not good programming practice
f <- function(){
    a <<- 6 ## changes a outside function, ALMOST ALWAYS A BAD IDEA, similar to python global
    print(a)
    return(10)
}


In [115]:
a <- 4
a
b <- f()
a

[1] 6
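
R resolves a variable that is not defined inside a function by looking in the environment where the function was defined, not in the environment of whoever calls it. A minimal sketch of that rule follows; the names `show_x`, `caller`, `make_adder`, and `add_two` are hypothetical.

```r
x <- "global"

show_x <- function() {
    x                     # free variable: not defined inside show_x
}

caller <- function() {
    x <- "caller"         # local to caller; invisible to show_x
    show_x()
}

# Lookup follows where show_x was defined, so this returns "global"
caller()

# A function defined inside another function sees that function's variables
make_adder <- function(n) {
    function(y) y + n     # n is found in make_adder's environment
}
add_two <- make_adder(2)
add_two(9)                # 11
```

Passing values in as arguments, as `make_adder` does with `n`, keeps a function's inputs explicit and avoids depending on whatever happens to exist outside it.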
