bch441-work-abc-units/RPR-Pipe.R

# tocID <- "RPR-Pipe.R"
#
# Purpose:  A Bioinformatics Course:
#              Discussing pipe operators.
#
# Version:  1.0
#
# Date:     2021  10
# Author:   Boris Steipe (boris.steipe@utoronto.ca)
#
# Versions:
#           1.0    New code
#
#
# TODO:
#   - find more interesting examples
#
# == DO NOT SIMPLY  source()  THIS FILE! =======================================
#
# If there are portions you don't understand, use R's help system, Google for an
# answer, or ask your instructor. Don't continue if you don't understand what's
# going on. That's not how it works ...
#
# ==============================================================================


#TOC> ==========================================================================
#TOC>
#TOC>   Section  Title                            Line
#TOC> ------------------------------------------------
#TOC>   1        Pipe  Concept                      41
#TOC>   2        Nested Expression                  73
#TOC>   3        magrittr:: Pipe                    78
#TOC>   4        Base R Pipe                        93
#TOC>   5        Intermediate Assignment           108
#TOC>   6        Postscript                        127
#TOC>
#TOC> ==========================================================================


# =    1  Pipe  Concept  =======================================================

# Pipes are actually an awesome idea for any code that implements a workflow -
# a sequence of operations, each of which transforms data in a specialized way.
#
# This principle is familiar from maths: chained functions. If have a function
# y = f(x) and want to use those results as in z = g(y), I can just write
# z = g(f(x))
#
# On the unix command line, pipes were used from the very beginning, implemented
# with the "|" pipe character.
#
# In R, the magrittr package provided the %>% operator, and recently the |>
# operator has been introduced into base R.
#
# However there are alternatives: intermediate assignment, and nested functions
# that have always existed in base R anyway.
#
# Let us look at an example. In writing this, I found out that virtually
# ALL non-trivial examples I came up with don't translate well into this idiom
# at all. It is actually quite limited to simple filtering operations on
# data. A more interesting example might be added in the future, let me know if
# you have a good idea.
#
# A somewhat contrived example is to sort a list of files by the
# length of the file names:

myFiles <- list.files(pattern = "\\.R$")

# nchar() gives the number of characters in a string, order() produces indices
# that map an array to its sorted form.
#
# =    2  Nested Expression  ===================================================

myFiles[order(nchar(myFiles))]


# =    3  magrittr:: Pipe  =====================================================

if (! requireNamespace("magrittr", quietly = TRUE)) {
  install.packages("magrittr")
}
# Package information:
#  library(help = magrittr)       # basic information
#  browseVignettes("magrittr")    # available vignettes
#  data(package = "magrittr")     # available datasets


library(magrittr)

myFiles  %>% nchar %>% order %>% myFiles[.]

# =    4  Base R Pipe  =========================================================

# Since version 4.1, base R now supports a pipe operator without the need
# to load a special package. Such an introductions of external functionality
# into the language is very rare.
#
# Unfortunately it won't (yet) work with the '[' function, so we need to write
# an intermediate function for this example
extract <- function(x, v) {
  return(v[x])
}

myFiles |> nchar() |> order() |> extract(myFiles)


# =    5  Intermediate Assignment  =============================================

# So what's the problem? As you can see, the piped code may be concise and
# expressive. But there is also a large amount of implicit assignment and
# processing going on and that is usually a bad idea because it makes code hard
# to maintain. I am NOT a big fan of the nested syntax, but I don't think that
# replacing it with the pipe makes things much better. My preferred idiom is
# to use intermediate assignments. Only then is it convenient to examine
# the code step by step and validate every single step. And that is the most
# important objective at all: no code is good if it does not compute
# correctly.


x <- nchar(myFiles)
x <- order(x)
myFiles[x]


# =    6  Postscript  ==========================================================

# I tried to write an example that strips all comments from a list of files, and
# another example that finds all files that were not yet updated this year
# (according to the "# Date: in the header). Neither examples can be well
# written without intermediate assignments, or at least sapply() functions
# that are not simpler at all than the intermediate assignment.

# [END]