# 3 Deep dive

Many complex queries will be described in R and SQL syntax. We will examine the R feature of computing on the language and check how lazy evaluation enables optimizations. We will write C code in an R package using OpenMP for parallel processing, and see how the AutoML feature of `h2o` can search for the best models and ensembles.

## 3.1 R language

### 3.1.1 Computing on the language

#### 3.1.1.1 building a call

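```r
fun_name = function(x, y, ...) list(x, y, ...)
var1_name = 1
var2_name = 2
# a call is built from a list: the function first, then its arguments
l = list(
  as.name("fun_name"),
  as.name("var1_name"),
  as.name("var2_name")
)
# unevaluated call with positional arguments
as.call(l)
# names on the list become argument names; the name on the function is ignored
as.call(setNames(l, c("ignored in as.call","x","y")))
as.call(setNames(l, c("","y","x")))
# eval() runs the call
eval(as.call(l))
eval(as.call(setNames(l, c("","y","x"))))
```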
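As a complement to the listing above, the same call can be built with base R's `call()`, and a call object can be edited like a list before it is evaluated. This is only a minimal sketch; it repeats the definitions above so that it runs on its own, and the names `cl1`, `cl2` and `cl3` are introduced here for illustration.

```r
fun_name = function(x, y, ...) list(x, y, ...)
var1_name = 1
var2_name = 2

# call() takes the function name as a string, followed by the arguments
cl1 = call("fun_name", as.name("var1_name"), as.name("var2_name"))
# as.call() takes a list whose first element is the function
cl2 = as.call(list(as.name("fun_name"), as.name("var1_name"), as.name("var2_name")))
identical(cl1, cl2)

# a call can be modified like a list before it is evaluated
cl3 = cl1
cl3[[3]] = 100                # replace the second argument with a constant
names(cl3) = c("", "y", "")   # pass the first argument by name as `y`
cl3
eval(cl3)
```

Because the first argument is now named `y`, the unnamed constant is matched to `x` when the call is evaluated.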

### 3.1.2 Lazy evaluation

```r
x = 10:1
y = rnorm(10)
plot(x, y)    # y-axis label is "y"
asd = y
plot(x, asd)  # same data, but the y-axis label is now "asd"
```
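The two plots above show the same data but get different axis labels, because a function can see the unevaluated expression behind an argument. A minimal sketch of that mechanism follows; `label_of` and `first_only` are hypothetical helpers written for this illustration, not part of any package.

```r
# hypothetical helper: returns the expression behind the argument as text,
# without evaluating the argument
label_of = function(arg) deparse(substitute(arg))

y = rnorm(10)
asd = y
label_of(y)
label_of(asd)

# an argument that is never used inside the function is never evaluated,
# so the stop() below does not raise an error
first_only = function(a, b) a
first_only(1, stop("never evaluated"))
```

Arguments are passed as promises: the expression is stored at call time and evaluated only when the function body first needs its value.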

### 3.1.3 Tips

#### 3.1.3.1 Assignment `<-` vs `=`

There is no strict rule for choosing one over the other.

```r
x <- 1
x = 1
f <- function() 1
f = function() 1
```

In the cases above it makes no difference whether you use `<-` or `=`, but there are cases where the distinction matters.

##### 3.1.3.1.1 assigning names to values

When assigning names to values we have to use the `=` sign:

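```r
c(a=1, b=2)             # named vector
c(a<-1, b<-2)           # unnamed vector; also creates variables a and b
data.frame(a=1, b=2)    # columns named a and b
data.frame(a<-1, b<-2)  # column names are generated from the expressions
```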
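The difference can be checked directly. The sketch below uses the names `p` and `q`, chosen here only so that the side effect is easy to spot.

```r
v1 = c(a=1, b=2)
names(v1)   # the elements are named "a" and "b"

v2 = c(p<-1, q<-2)
names(v2)   # no names: `<-` did not name anything
p           # but `<-` created p and q as a side effect
q
```

Inside a call, `=` names an argument, while `<-` is an ordinary assignment whose value is then passed positionally.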
##### 3.1.3.1.2 passing arguments to function

When passing arguments to a function by name you must use `=`:

```r
f <- function(x, y) list(x=x, y=y)
f(y=1, x=2)    # matched by name: x is 2, y is 1
f(y<-1, x<-2)  # matched by position: x is 1, y is 2
```
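To make the positional matching and its side effect visible, here is a small sketch that stores both results and then inspects the variables left behind in the calling environment. The values 10 and 20 are arbitrary and differ from those above only to keep the two calls apart.

```r
f <- function(x, y) list(x=x, y=y)

r1 <- f(y=1, x=2)       # arguments matched by name
r2 <- f(y<-10, x<-20)   # arguments matched by position

str(r1)
str(r2)

# the second call also assigned these in the calling environment
y
x
```

In the second call the first argument, the value of `y<-10`, is bound to the parameter `x`, so the result is swapped relative to what the names suggest.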

There are cases where using `<-` inside a call may actually be useful. Consider the following functions:

```r
f <- function(x, y, z) {
  if (isTRUE(x)) {
    cat("f doing branch 1\n")
    list(y, z)
  } else {
    cat("f doing branch 2\n")
    invisible(FALSE)
  }
}
g <- function(x) {
  cat("g doing heavy computation\n")
  Sys.sleep(5)
  x
}
h <- function(x) x+1
```

Suppose we want to calculate

``f(x=TRUE, y=v<-g(1), z=h(v))``

Many people would advocate writing it as

```r
v = g(1)
f(x=TRUE, y=v, z=h(v))
```

which is quite reasonable, but it does not take advantage of lazy evaluation, a feature of the language.

Using the first call

``f(x=TRUE, y=v<-g(1), z=h(v))``

lazy evaluation makes the function evaluate its arguments only when they are actually used inside the function. The first argument acts as a switch: with `x=FALSE` the function exits early and never evaluates the time-consuming `y=v<-g(1)`. Without relying on assignment inside the call, we can still achieve the same behaviour by wrapping `f` in another function that handles the switch:

```r
ff <- function(x, val) {
  if (isTRUE(x)) {
    v = g(val)
    z = h(v)
  } else {
    v = NULL
    z = NULL
  }
  f(x, v, z)
}
ff(TRUE, 1)
ff(FALSE, 1)
```

Keep in mind that the above example is simplified.
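To check that the lazy call really skips the expensive argument, the following sketch replaces the `Sys.sleep()` in `g` with a counter. The counter `calls_to_g`, the variable `w`, and the stripped-down versions of `f`, `g` and `h` are assumptions made for this illustration only.

```r
calls_to_g <- 0   # counter, used only to observe whether g ran

f <- function(x, y, z) {
  if (isTRUE(x)) list(y, z) else invisible(FALSE)
}
g <- function(x) {
  calls_to_g <<- calls_to_g + 1   # stands in for the heavy computation
  x
}
h <- function(x) x + 1

# x=FALSE: neither y nor z is used, so g() is never called and w is never created
res_false <- f(x=FALSE, y=w<-g(1), z=h(w))
ran_after_false <- calls_to_g
w_exists_after_false <- exists("w", inherits = FALSE)

# x=TRUE: y is forced first, which assigns w, and only then z=h(w) is evaluated
res_true <- f(x=TRUE, y=w<-g(1), z=h(w))
calls_to_g
```

Note that this pattern depends on the order in which `f` uses its arguments: it works because `y` is evaluated before `z`. If `f` touched `z` first, `w` would not exist yet and the call would fail.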

One of the main tasks of a data scientist or analyst is investigating data. In this chapter I am going to cover advanced queries that can be run on datasets, presented through their SQL syntax. SQL has been the standard language for querying data for many years.

## 3.3 Machine Learning

automl

put model into production

## 3.4 Package development

### 3.4.2 src

compiled code, most commonly C or C++

init.c?

#### 3.4.2.4 gcc options

`-O0` vs `-O3`

#### 3.4.2.5 debugging C code

gdb valgrind

##### 3.4.2.6.1 `#pragma omp parallel for`

memory allocation: pre-allocate memory with `allocVector()` before the parallel region, then use C pointers obtained with `REAL()` from within the parallel region