New script - illustrating the pipe
RPR-Pipe.R (normal file, 135 lines)
							| @@ -0,0 +1,135 @@ | ||||
| # tocID <- "RPR-Pipe.R" | ||||
| # | ||||
| # Purpose:  A Bioinformatics Course: | ||||
| #              Discussing pipe operators. | ||||
| # | ||||
| # Version:  1.0 | ||||
| # | ||||
| # Date:     2021  10 | ||||
| # Author:   Boris Steipe (boris.steipe@utoronto.ca) | ||||
| # | ||||
| # Versions: | ||||
| #           1.0    New code | ||||
| # | ||||
| # | ||||
| # TODO: | ||||
| #   - find more interesting examples | ||||
| # | ||||
| # == DO NOT SIMPLY  source()  THIS FILE! ======================================= | ||||
| # | ||||
| # If there are portions you don't understand, use R's help system, Google for an | ||||
| # answer, or ask your instructor. Don't continue if you don't understand what's | ||||
| # going on. That's not how it works ... | ||||
| # | ||||
| # ============================================================================== | ||||
|  | ||||
|  | ||||
| #TOC> ========================================================================== | ||||
| #TOC> | ||||
| #TOC>   Section  Title                            Line | ||||
| #TOC> ------------------------------------------------ | ||||
| #TOC>   1        Pipe  Concept                      41 | ||||
| #TOC>   2        Nested Expression                  73 | ||||
| #TOC>   3        magrittr:: Pipe                    78 | ||||
| #TOC>   4        Base R Pipe                        93 | ||||
| #TOC>   5        Intermediate Assignment           108 | ||||
| #TOC>   6        Postscript                        127 | ||||
| #TOC> | ||||
| #TOC> ========================================================================== | ||||
|  | ||||
|  | ||||
| # =    1  Pipe  Concept  ======================================================= | ||||
|  | ||||
| # Pipes are actually an awesome idea for any code that implements a workflow - | ||||
| # a sequence of operations, each of which transforms data in a specialized way. | ||||
| # | ||||
| # This principle is familiar from maths: chained functions. If I have a function | ||||
| # y = f(x) and want to use those results as in z = g(y), I can just write | ||||
| # z = g(f(x)) | ||||
| # | ||||
| # On the unix command line, pipes were used from the very beginning, implemented | ||||
| # with the "|" pipe character. | ||||
| # | ||||
| # In R, the magrittr package provided the %>% operator, and recently the |> | ||||
| # operator has been introduced into base R. | ||||
| # | ||||
| # However there are alternatives: intermediate assignment, and nested functions | ||||
| # that have always existed in base R anyway. | ||||
| # | ||||
| # Let us look at an example. In writing this, I found out that virtually | ||||
| # ALL non-trivial examples I came up with don't translate well into this idiom | ||||
| # at all. It is actually quite limited to simple filtering operations on | ||||
| # data. A more interesting example might be added in the future, let me know if | ||||
| # you have a good idea. | ||||
| # | ||||
| # A somewhat contrived example is to sort a list of files by the | ||||
| # length of the file names: | ||||
|  | ||||
| myFiles <- list.files(pattern = "\\.R$") | ||||
|  | ||||
| # nchar() gives the number of characters in a string, order() returns the | ||||
| # permutation of indices that puts a vector into sorted order. | ||||
| # | ||||
| # =    2  Nested Expression  =================================================== | ||||
|  | ||||
| myFiles[order(nchar(myFiles))] | ||||
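# The nested expression above is evaluated inside-out. A minimal sketch that
# unpacks it on a fixed vector, so the result does not depend on the files in
# the working directory. The names in "demo" and the intermediate variables are
# made up for illustration.

```r
# A fixed vector stands in for the result of list.files()
demo <- c("bbb.R", "a.R", "cc.R")

lens <- nchar(demo)      # innermost call: 5 3 4
idx  <- order(lens)      # indices that sort lens: 2 3 1
sortedDemo <- demo[idx]  # outermost step: subset by those indices
sortedDemo               # "a.R" "cc.R" "bbb.R"

# The nested one-liner produces the same result
identical(sortedDemo, demo[order(nchar(demo))])   # TRUE
```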
|  | ||||
|  | ||||
| # =    3  magrittr:: Pipe  ===================================================== | ||||
|  | ||||
| if (! requireNamespace("magrittr", quietly = TRUE)) { | ||||
|   install.packages("magrittr") | ||||
| } | ||||
| # Package information: | ||||
| #  library(help = magrittr)       # basic information | ||||
| #  browseVignettes("magrittr")    # available vignettes | ||||
| #  data(package = "magrittr")     # available datasets | ||||
|  | ||||
|  | ||||
| library(magrittr) | ||||
|  | ||||
| nchar(myFiles) %>% order %>% myFiles[.] | ||||
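# In the magrittr pipeline above, the "." marks where the left-hand value is
# placed. A small sketch on a fixed vector (the names in "demo" are made up)
# compares that form with magrittr's extract() alias for `[`. This assumes
# magrittr is installed; the code is skipped otherwise.

```r
demo <- c("bbb.R", "a.R", "cc.R")

if (requireNamespace("magrittr", quietly = TRUE)) {
  library(magrittr)

  # "." as a direct argument: the piped indices go where the dot is
  viaDot <- nchar(demo) %>% order %>% demo[.]

  # "." only inside a nested call: demo is still passed as the first argument,
  # so this is evaluated as demo[order(nchar(demo))]
  viaAlias <- demo %>% magrittr::extract(order(nchar(.)))

  print(identical(viaDot, viaAlias))
}
```

# Writing magrittr::extract() explicitly avoids confusion with any other
# function called extract() that may be defined in the workspace.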
|  | ||||
| # =    4  Base R Pipe  ========================================================= | ||||
|  | ||||
| # Since version 4.1, base R supports a pipe operator without the need | ||||
| # to load a special package. Such an introduction of external functionality | ||||
| # into the language is very rare. | ||||
| # | ||||
| # Unfortunately it won't (yet) work with the '[' function, so we need to write | ||||
| # an intermediate function for this example: | ||||
| extract <- function(x, v) { | ||||
|   return(v[x]) | ||||
| } | ||||
|  | ||||
| nchar(myFiles) |> order() |> extract(myFiles) | ||||
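# As an alternative to a named helper, the subsetting step can be wrapped in an
# anonymous function that is called immediately. This is only a sketch: it
# assumes R >= 4.1 (for both |> and the \(i) shorthand), and "demo" is a
# made-up vector standing in for myFiles.

```r
demo <- c("bbb.R", "a.R", "cc.R")

# The parenthesized function is called with the piped indices as its argument
sortedLambda <- nchar(demo) |> order() |> (\(i) demo[i])()
sortedLambda

# Same result as the nested expression
identical(sortedLambda, demo[order(nchar(demo))])
```

# Whether this reads better than a small named function is a matter of taste;
# the trailing () is easy to forget.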
|  | ||||
|  | ||||
| # =    5  Intermediate Assignment  ============================================= | ||||
|  | ||||
| # So what's the problem? As you can see, the piped code may be concise and | ||||
| # expressive. But there is also a large amount of implicit assignment and | ||||
| # processing going on and that is usually a bad idea because it makes code hard | ||||
| # to maintain. I am NOT a big fan of the nested syntax, but I don't think that | ||||
| # replacing it with the pipe makes things much better. My preferred idiom is | ||||
| # to use intermediate assignments. Only then is it convenient to examine | ||||
| # the code step by step and validate every single step. And that is the most | ||||
| # important objective of all: no code is good if it does not compute | ||||
| # correctly. | ||||
|  | ||||
|  | ||||
| x <- nchar(myFiles) | ||||
| x <- order(x) | ||||
| myFiles[x] | ||||
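# One way to use the intermediate values for validation is to check each step
# before moving on. A minimal sketch on a fixed vector; the variable names and
# the particular checks are illustrative choices, not part of the original
# example.

```r
demo <- c("bbb.R", "a.R", "cc.R")

len <- nchar(demo)
stopifnot(length(len) == length(demo))       # one length per file name

idx <- order(len)
stopifnot(setequal(idx, seq_along(demo)))    # idx is a permutation of 1..n

sorted <- demo[idx]
stopifnot(!is.unsorted(nchar(sorted)))       # lengths are non-decreasing

sorted
```

# Each stopifnot() halts with an error at the first step that fails, which is
# hard to arrange inside a nested or piped expression.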
|  | ||||
|  | ||||
|  | ||||
| # =    6  Postscript  ========================================================== | ||||
|  | ||||
| # I tried to write an example that strips all comments from a list of files, and | ||||
| # another example that finds all files that were not yet updated this year | ||||
| # (according to the "# Date:" in the header). Neither example can be well | ||||
| # written without intermediate assignments, or at least sapply() functions | ||||
| # that are not simpler at all than the intermediate assignment. | ||||
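# For illustration, here is one possible sketch of the second example, written
# with intermediate assignments. It is not the author's attempt. It assumes a
# header line of the form "# Date:     YYYY  MM" as in this file, and it works
# on two throw-away files in a temporary directory. headerYear() and all file
# names are hypothetical.

```r
# Hypothetical helper: year from the first "# Date:" line, NA if none is found
headerYear <- function(path) {
  txt <- readLines(path, warn = FALSE)
  dateLine <- grep("^# Date:", txt, value = TRUE)
  if (length(dateLine) == 0) { return(NA_integer_) }
  yr <- regmatches(dateLine[1], regexpr("[0-9]{4}", dateLine[1]))
  if (length(yr) == 0) { return(NA_integer_) }
  return(as.integer(yr))
}

# Two demo files: one with an old date, one dated in the current year
thisYear <- as.integer(format(Sys.Date(), "%Y"))
demoDir <- file.path(tempdir(), "pipeDemo")
dir.create(demoDir, showWarnings = FALSE)
writeLines(c("# Date:     2019  10", "x <- 1"),
           file.path(demoDir, "old.R"))
writeLines(c(sprintf("# Date:     %d  01", thisYear), "x <- 2"),
           file.path(demoDir, "new.R"))

# Step by step: list the files, read the years, select the stale ones
demoFiles <- list.files(demoDir, pattern = "\\.R$", full.names = TRUE)
years <- vapply(demoFiles, headerYear, integer(1))
stale <- demoFiles[is.na(years) | years < thisYear]
basename(stale)
```

# Files without a recognizable date are reported as stale here; that is a
# design choice of this sketch.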
|  | ||||
| # [END] | ||||