Scaling data.table using index

Written on 2015-11-23

R can handle fairly big data on a single machine: 2B (2E9) rows and a couple of columns require about 100 GB of memory.
That is already big enough to care about performance.
In this post I'm going to discuss scalability...

Read More

Utilize function body inline comments for documentation

Written on 2015-09-18

When writing a long function that has to deal with multiple checks and complex processes, it is valuable to put comments in the function body. This lets readers (including you) grasp the process workflow without going into...
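As a rough illustration of the idea (in Python rather than R, with a hypothetical `clean_sales` function), section-style inline comments can label each step of the workflow so a reader can skim the process before reading the details:

```python
def clean_sales(rows):
    """Return valid sales rows with a computed total (hypothetical example)."""
    # -- input checks ---------------------------------------------------------
    if not isinstance(rows, list):
        raise TypeError("rows must be a list of dicts")

    # -- drop incomplete records (missing quantity or price) ------------------
    complete = [
        r for r in rows
        if r.get("qty") is not None and r.get("price") is not None
    ]

    # -- drop invalid values (non-positive quantity or negative price) --------
    valid = [r for r in complete if r["qty"] > 0 and r["price"] >= 0]

    # -- derive the total, copying each row so the input is not modified -----
    return [{**r, "total": r["qty"] * r["price"]} for r in valid]
```

Reading only the comment lines gives the outline: check input, drop incomplete rows, drop invalid rows, derive a column.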

Read More

Accept payments in shiny app

Written on 2015-08-04

Have you ever thought about accepting payments in your Shiny app?
Probably not, but now you can start ;)
Shiny apps are usually single-task, lightweight websites. It may not be easy to turn them into online...

Read More

Data Warehousing with R

Written on 2015-06-30

At this link you can find today's slides from the Cardiff R User Group meeting.
The slides present packages of interest from a Data Warehousing / ETL perspective, including a few examples and a lot of links to...

Read More

Auditing data transformation

Written on 2015-06-03

Auditing data transformation can be simply described as gathering metadata about the transformation process. The most basic metadata would be: a timestamp, a description of the atomic transformation, the data volume on input, the data volume on output, and the time elapsed.
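A minimal sketch of gathering that metadata, written in Python rather than R. The `audited` helper, the `audit_log` list and the field names are hypothetical, and "data volume" is assumed here to mean the number of rows:

```python
import time
from datetime import datetime, timezone

audit_log = []  # one dict of metadata per atomic transformation step

def audited(description, transform, data):
    """Apply transform(data) and record audit metadata about the step."""
    timestamp = datetime.now(timezone.utc).isoformat()  # when the step started
    started = time.perf_counter()
    result = transform(data)
    audit_log.append({
        "timestamp": timestamp,
        "description": description,
        "rows_in": len(data),      # data volume on input
        "rows_out": len(result),   # data volume on output
        "elapsed_s": time.perf_counter() - started,
    })
    return result

# Usage: two chained steps, each leaving one entry in audit_log.
data = list(range(10))
evens = audited("keep even values", lambda xs: [x for x in xs if x % 2 == 0], data)
squared = audited("square values", lambda xs: [x * x for x in xs], evens)
```

Wrapping each step, rather than logging by hand, keeps the metadata consistent across the whole process.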

If you work with...

Read More

R in Business Intelligence

Written on 2015-01-19

Business Intelligence (BI) can be simply described as extracting useful information from data. This is quite a broad process, as the structure (and quality) of the source data can vary, and so can the structure of the useful information. More technically, process...

Read More

Data anonymization in R

Written on 2014-11-07

Use cases

  • Public reports.
  • Public data sharing, e.g. the R package download logs from CRAN's RStudio mirror (cran-logs.rstudio.com) mask IP addresses.
  • Reports or data shared with external vendors.
  • Development work can operate on anonymized PRODUCTION data.
    ...
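One common way to mask IP addresses, sketched here in Python, is to replace each address with a keyed hash (HMAC-SHA256): the same address always maps to the same token, so per-user counts still work, while the original address is not stored. This is an illustrative assumption, not necessarily how cran-logs.rstudio.com does it; `mask_ip`, `SECRET_KEY` and the sample log are hypothetical:

```python
import hashlib
import hmac

# Hypothetical secret; it must never be shared together with the data.
SECRET_KEY = b"replace-with-a-secret-kept-out-of-the-shared-data"

def mask_ip(ip: str, key: bytes = SECRET_KEY) -> str:
    """Replace an IP address with a short keyed hash (HMAC-SHA256)."""
    digest = hmac.new(key, ip.encode("utf-8"), hashlib.sha256).hexdigest()
    return digest[:12]  # truncated token, still consistent per address

# Usage on a small, made-up download log (documentation IP ranges).
log = [
    {"ip": "192.0.2.10", "package": "data.table"},
    {"ip": "198.51.100.7", "package": "shiny"},
    {"ip": "192.0.2.10", "package": "shiny"},
]
anonymized = [{**row, "ip": mask_ip(row["ip"])} for row in log]
```

A keyed hash is used instead of a plain hash because the IPv4 space is small enough to brute-force: without the secret key, anyone could hash every possible address and reverse the mapping.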
Read More

Hello World

Written on 2014-11-05

Hi,
I've just set up my new blog. It will most probably be about R, data processing, etc.

A few comments about the technology behind the blog in this very first post.
Posts on the blog are md files. These can be written by...

Read More