add ID mapping instructions

hyginn 2017-10-09 17:32:07 -04:00
parent 32a45fc059
commit 0c069a66ca

@ -3,11 +3,12 @@
# Purpose: A Bioinformatics Course: # Purpose: A Bioinformatics Course:
# R code accompanying the BIN-Storing_data unit # R code accompanying the BIN-Storing_data unit
# #
# Version: 1.0 # Version: 1.1
# #
# Date: 2017 09 23 # Date: 2017 10 08
# Author: Boris Steipe (boris.steipe@utoronto.ca) # Author: Boris Steipe (boris.steipe@utoronto.ca)
# #
# V 1.1 Add instructions to retrieve UniProt ID from ID mapping service.
# V 1.0 First live version, completely rebuilt. Now using JSON data sources. # V 1.0 First live version, completely rebuilt. Now using JSON data sources.
# V 0.1 First code copied from BCH441_A03_makeYFOlist.R # V 0.1 First code copied from BCH441_A03_makeYFOlist.R
# #
@ -23,33 +24,35 @@
# going on. That's not how it works ... # going on. That's not how it works ...
# #
# ============================================================================== # ==============================================================================
#TOC> ========================================================================== #TOC> ==========================================================================
#TOC> #TOC>
#TOC> Section Title Line #TOC> Section Title Line
#TOC> ------------------------------------------------------------ #TOC> ------------------------------------------------------------
#TOC> 1 A Relational Datamodel in R: review 55 #TOC> 1 A Relational Datamodel in R: review 58
#TOC> 1.1 Building a sample database structure 95 #TOC> 1.1 Building a sample database structure 98
#TOC> 1.1.1 completing the database 206 #TOC> 1.1.1 completing the database 209
#TOC> 1.2 Querying the database 241 #TOC> 1.2 Querying the database 244
#TOC> 1.3 Task: submit for credit (part 1/2) 270 #TOC> 1.3 Task: submit for credit (part 1/2) 273
#TOC> 2 Implementing the protein datamodel 282 #TOC> 2 Implementing the protein datamodel 285
#TOC> 2.1 JSON formatted source data 308 #TOC> 2.1 JSON formatted source data 311
#TOC> 2.2 "Sanitizing" sequence data 343 #TOC> 2.2 "Sanitizing" sequence data 346
#TOC> 2.3 Create a protein table for our data model 363 #TOC> 2.3 Create a protein table for our data model 366
#TOC> 2.3.1 Initialize the database 365 #TOC> 2.3.1 Initialize the database 368
#TOC> 2.3.2 Add data 377 #TOC> 2.3.2 Add data 380
#TOC> 2.4 Complete the database 397 #TOC> 2.4 Complete the database 400
#TOC> 2.4.1 Examples of navigating the database 424 #TOC> 2.4.1 Examples of navigating the database 427
#TOC> 2.5 Updating the database 456 #TOC> 2.5 Updating the database 459
#TOC> 3 Add your own data 468 #TOC> 3 Add your own data 471
#TOC> 3.1 Find a protein 476 #TOC> 3.1 Find a protein 479
#TOC> 3.2 Put the information into JSON files 505 #TOC> 3.2 Put the information into JSON files 508
#TOC> 3.3 Create an R script to create the database 522 #TOC> 3.3 Create an R script to create the database 531
#TOC> 3.3.1 Check and validate 542 #TOC> 3.3.1 Check and validate 551
#TOC> 3.4 Task: submit for credit (part 2/2) 583 #TOC> 3.4 Task: submit for credit (part 2/2) 592
#TOC> #TOC>
#TOC> ========================================================================== #TOC> ==========================================================================
# = 1 A Relational Datamodel in R: review ================================= # = 1 A Relational Datamodel in R: review =================================
@ -203,7 +206,7 @@ str(philDB)
# go back, re-read, play with it, and ask for help. This is essential. # go back, re-read, play with it, and ask for help. This is essential.
# === 1.1.1 completing the database # === 1.1.1 completing the database
# Next I'll add one more person, and create the other two tables: # Next I'll add one more person, and create the other two tables:
@ -362,7 +365,7 @@ dbSanitizeSequence(x)
# == 2.3 Create a protein table for our data model ========================= # == 2.3 Create a protein table for our data model =========================
# === 2.3.1 Initialize the database # === 2.3.1 Initialize the database
# The function dbInit contains all the code to return a list of empty # The function dbInit contains all the code to return a list of empty
@ -374,7 +377,7 @@ myDB <- dbInit()
str(myDB) str(myDB)
# === 2.3.2 Add data # === 2.3.2 Add data
# fromJSON() returns a dataframe that we can readily process to add data # fromJSON() returns a dataframe that we can readily process to add data
@ -421,7 +424,7 @@ source("./scripts/ABC-createRefDB.R")
str(myDB) str(myDB)
# === 2.4.1 Examples of navigating the database # === 2.4.1 Examples of navigating the database
# You can look at the contents of the tables in the usual way we access # You can look at the contents of the tables in the usual way we access
@ -512,6 +515,12 @@ myDB$taxonomy$species[sel]
# "name" of your protein. Open the file in the RStudio editor and replace # "name" of your protein. Open the file in the RStudio editor and replace
# all of the MBP1_SACCE data with the corresponding data of your protein. # all of the MBP1_SACCE data with the corresponding data of your protein.
# #
# The UniProt ID may not be discoverable from the NCBI page. To retrieve
# it, navigate to http://www.uniprot.org/mapping/ , paste your RefSeq ID
# into the query field, make sure "RefSeqProtein" is selected for "From"
# and "UniProtKB" is selected for "To", and click "Go". In case this does
# not retrieve a single UniProt ID, contact me.
#
# - Do a similar thing for the MYSPE taxonomy entry. Copy # - Do a similar thing for the MYSPE taxonomy entry. Copy
# "./data/refTaxonomy.json" and make a new file named "MYSPEtaxonomy.json". # "./data/refTaxonomy.json" and make a new file named "MYSPEtaxonomy.json".
# Create a valid JSON file with only one single entry - that of MYSPE. # Create a valid JSON file with only one single entry - that of MYSPE.
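#
# The ID mapping step described above should yield exactly one UniProt ID
# for your RefSeq ID. The following is a minimal sketch of that check, not
# part of the course scripts. It assumes you have typed the mapping result
# into a data frame with the columns "From" and "To"; the function name
# pickUniProtID() and both IDs in the example are invented placeholders.

```r
# Hypothetical helper: return the UniProt ID that maps to one RefSeq ID,
# and stop with an error unless exactly one distinct ID is found.
pickUniProtID <- function(mapping, refSeqID) {
  # mapping:  data frame with character columns "From" (RefSeq IDs) and
  #           "To" (UniProt IDs). This layout is an assumption.
  # refSeqID: the RefSeq ID to look up
  hits <- unique(mapping$To[mapping$From == refSeqID])
  if (length(hits) != 1) {
    stop(sprintf("Expected exactly one UniProt ID for %s, found %d.",
                 refSeqID, length(hits)))
  }
  return(hits)
}

# Placeholder IDs for illustration only: replace with your own lookup result.
myMap <- data.frame(From = "XP_000000001",
                    To   = "A0A000AAA0",
                    stringsAsFactors = FALSE)

pickUniProtID(myMap, "XP_000000001")
```

# If the function stops with an error, the lookup returned no ID or more
# than one, which is the case in which the instructions ask you to get in
# touch.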
@ -539,7 +548,7 @@ myDB$taxonomy$species[sel]
# in any of the JSON files. Later you will add more information ... # in any of the JSON files. Later you will add more information ...
# === 3.3.1 Check and validate # === 3.3.1 Check and validate
# Is your protein named according to the pattern "MBP1_MYSPE"? It should be. # Is your protein named according to the pattern "MBP1_MYSPE"? It should be.
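#
# One way to test the naming pattern is a regular expression. This is a
# sketch only: it assumes that MYSPE stands for a five-letter, upper-case
# species code like SACCE, and isValidProteinName() is a hypothetical
# helper, not a function from the course scripts.

```r
# Hypothetical check: TRUE for names of the form "MBP1_" followed by exactly
# five upper-case letters, FALSE otherwise.
isValidProteinName <- function(x) {
  grepl("^MBP1_[A-Z]{5}$", x)
}

# One well-formed name and two malformed ones
isValidProteinName(c("MBP1_SACCE", "mbp1_sacce", "MBP1_SACCER"))
```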