Bioinformatics I: December 2016

Monday, December 19, 2016

Setting up Biopython

http://www.codesdope.com/python-introduction
Open IDLE
When running a new file and trying to add it to the old code (if you have new code that you are experimenting with, but you don't want to clutter your old code with failed attempts), you create a new file and then

In Python (in IDLE) your code is color coded:
Keywords (orange):
'False', 'None', 'True', 'and', 'as', 'assert', 'break', 'class', 'continue', 'def', 'del', 'elif', 'else', 'except', 'finally', 'for', 'from', 'global', 'if', 'import', 'in', 'is', 'lambda', 'nonlocal', 'not', 'or', 'pass', 'raise', 'return', 'try', 'while', 'with', 'yield'

Outputs (blue):
Anything you write after print

This will be the output:

In order for a command to be executed and create an output, you need the >>> prompt; nothing will run if you just write it like this

You can write new code in a new file window in IDLE, where it will not run because it has no >>> prompt. Then, while in the new window, click Run at the top and then Run Module. It will ask you to save the file; save it. It will then run in the main IDLE window, the one that runs all your instructions.

To make a comment, type a # and then the comment. It will appear in red and will not affect your code; it's just notes for yourself.

You can define variables and then manipulate them with operators that add, subtract, multiply, or divide them.
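A minimal sketch of defining variables and doing arithmetic with them. The variable names and values are made up for illustration.

```python
# Hypothetical variable names and values, chosen only for illustration.
a = 12
b = 5

total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division (gives a float in Python 3)

print(total, difference, product, quotient)
```

Saved in a file and run with Run Module, this prints the four results on one line.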

There are certain words that you cannot use as variable names. These are keywords, and they appear in orange. Each one has a different function in Python.
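One way to check whether a word is a keyword before using it as a variable name is the standard library's keyword module. This is a small sketch; the name "strain" is only an example of a word that is not a keyword.

```python
import keyword

# True for reserved words, False for names you are free to use.
for_is_keyword = keyword.iskeyword("for")
strain_is_keyword = keyword.iskeyword("strain")  # "strain" is an example name

print(for_is_keyword)
print(strain_is_keyword)

# keyword.kwlist holds every keyword for the Python version you are running.
print(keyword.kwlist)
```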

You can get Python to identify the type of what you are inputting (I don't understand the use of this yet). If you write type() with a word in quotes inside the parentheses, it will identify it as a string (str); if you give it a whole number it will say integer (int); and if you give it a number with a decimal point it will say float.
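A short sketch of type() on the three kinds of values described above. The values are arbitrary examples. One possible use, shown at the end, is noticing that a number stored as text has to be converted before you can do arithmetic with it.

```python
word_type = type("Borrelia")   # a word in quotes is a string
whole_type = type(25)          # a whole number is an integer
decimal_type = type(2.5)       # a number with a decimal point is a float

print(word_type)
print(whole_type)
print(decimal_type)

# "25" in quotes is text, not a number, so convert it before adding.
raw = "25"
converted = int(raw) + 1
print(type(raw), converted)
```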

Issues: because the website I am using to learn to code was written for Python 2 and not Python 3, some of the commands, syntax, and general rules have changed. As I try to complete the exercises the website gives me, I need to modify the code, and it takes a lot of trial and error before it does what I want. What I am currently trying to figure out is how to fix these two functions. I've been working on them for the past hour and reading articles about the changes from 2 to 3, and I can't figure it out.
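The entry does not say which two functions were failing, so the following is only a sketch of three changes that commonly break Python 2 tutorial code in Python 3. It may or may not cover the actual problem. The value of typed is a stand-in for what a user would type.

```python
# Python 2 wrote: print "hello"
# Python 3: print is a function, so it needs parentheses.
print("hello")

# Python 2: 7 / 2 gave 3.
# Python 3: / always gives a float; use // for whole-number division.
half = 7 / 2
floored = 7 // 2
print(half, floored)

# Python 2 used raw_input() to read text; Python 3 calls it input().
# input() always returns a string, so convert it before doing arithmetic.
typed = "42"            # stand-in for the string input() would return
as_number = int(typed) + 1
print(as_number)
```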

https://docs.python.org/3.0/whatsnew/3.0.html

What has changed in Python from 2 to 3.

Help for beginner functions (colors) http://www.annedawson.net/Python3_Intro.htm
http://www.annedawson.net/Python3Programs.txt (example programs)

Sunday, December 11, 2016

Progress report 3

https://docs.google.com/presentation/d/1dQfjkiDvQALoH287uLhs4TDPdfTRDmXyto4qvGHUdmk/edit#slide=id.g1a0be35812_0_23

Sunday, December 4, 2016

Including Protein Structure and Docking sites in Website (RaptorX)

RaptorX is a free downloadable program that predicts a protein's 3D tertiary structure as well as possible binding sites. It also allows you to compare two or more protein structures through protein structure alignment. You send in your sequence (a job) and they send you back "its secondary and tertiary structures as well as contact map, solvent accessibility, disordered regions and binding sites." In order to download the software you must be affiliated with an organization. Since I am not, after I have designed my website I need to "please send us your name, organization and your email address through contact us and we will do some internal setup so that you can download the software."
Official Website:
http://raptorx.uchicago.edu/
User guide:
https://www.oregon.gov/OMD/OEM/docs/plan_train/RAPTOR/RAPTOR_User_Guide.pdf
Protein structure, modeling and applications:
https://www.ncbi.nlm.nih.gov/books/NBK6824/

ProDy:
https://pypi.python.org/pypi/ProDy/1.8.2

Thursday, December 1, 2016

Matlab Bioinformatics

The MATLAB Bioinformatics Toolbox includes functions that let you read data from FASTA, SAM, CEL, and CDF files. You can also access information from GenBank and the NCBI Gene Expression Omnibus. The program can open the files and then present the data found within them as visuals (sequence browsers, spatial heatmaps, and clustergrams). It also has "statistical techniques for detecting peaks, imputing values for missing data, and selecting features."

After reviewing the overview of the Bioinformatics Toolbox in MATLAB, I have come to the conclusion that it would be no help to me right now. The primary purpose of the program is to let scientists input their data, write code for how they'd like the program to organize it, and then analyze it. This helps them reach conclusions faster than if they had looked at completely unorganized data, and it helps them detect trends. I am not trying to analyze data at this point. I am trying to collect all the proteins' functions and put them in one place. Later on, once I have designed the website, I could use MATLAB to analyze the data I've collected and come to a conclusion myself. Ideally, scientists would take the data I've compiled, organize it themselves, and draw their own conclusions. I am just the mediator: I let them access all the data in one place, and then they analyze it.

What I take from this: I need to include the protein sequences in a FASTA file so that if researchers do want to put my data into MATLAB and analyze the trends, they can.
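A minimal sketch of producing a FASTA record in plain Python: a header line starting with >, followed by the sequence wrapped onto lines. The function name format_fasta, the header text, and the sequence are all placeholders made up for illustration, not real protein data. The 60-letter line width is a common convention, assumed here.

```python
def format_fasta(header, sequence, width=60):
    """Return one FASTA record: a '>' header line, then wrapped sequence lines."""
    lines = [">" + header]
    # Cut the sequence into chunks of at most `width` letters.
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Placeholder header and sequence, NOT real data.
example_header = "example_protein placeholder record"
example_sequence = "MKKYLLGIGLILALIA" * 5   # 80 letters

record = format_fasta(example_header, example_sequence)
print(record)
```

To save it, the record string could be written to a text file with open() in "w" mode; several records can go one after another in the same file.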

Process for strain name retrieval

1. Go onto ScienceDirect
2. Click on the button for Bronx Science proxy
3. Search subspecies name borrelia... in this order

Borrelia burgdorferi

Borrelia garinii

Borrelia afzelii

Borrelia bavariensis

Borrelia valaisiana

Borrelia lusitaniae

Borrelia finlandensis

Borrelia bissettii

Borrelia spielmanii

Borrelia carolinensis

Borrelia kurtenbachii

Borrelia andersonii

Borrelia americana

Borrelia turdi

Borrelia yangtze

Borrelia japonica

Borrelia chilensis

Borrelia parkeri

Borrelia duttonii

Borrelia hermsii

Borrelia turicatae

Borrelia recurrentis

Borrelia crocidurae

Borrelia mayonii

Borrelia persica

4. Click on every link that has both keywords in it (Borrelia+subspecies name)

5. Go to the page of the link and see if it is the full article (it should be since I have full access to the database)

6. Scroll down and look for the chart or pictures listing every strain used in the study. It will list both the strain and the subspecies (genospecies) the strain is a part of.
7. To identify whether the chart holds strains, look for key words like strain name, species and isolate, or just isolate.
8. Copy any data in the column under the key words. If it is split up by species, make sure you record each strain under its specific species.

9. Go onto my master list of strains and find the subspecies that the strain you have found is listed under. Scan to see whether you have already collected that strain name.

10. If you have not yet collected that strain, copy it down under the proper subspecies.

11. Do this for every paper that includes the subspecies in its title when you search it.

12. While going through the papers, if the chart includes a new subspecies that you don't have on your master list, add it to the master list along with any strains that fall under its category.
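The bookkeeping in the last few steps (check whether a strain is already on the master list, add it if not, and add any new subspecies) could be sketched in Python with a dictionary that maps each subspecies to a set of strain names. The function name add_strains and the strain names below are placeholders made up for illustration; the real master list is kept by hand.

```python
def add_strains(master, species, strains):
    """Add strains under a species, skipping ones already collected.

    Returns the list of strain names that were actually new.
    """
    # setdefault creates an empty entry if the species is not on the list yet.
    known = master.setdefault(species, set())
    added = []
    for strain in strains:
        name = strain.strip()
        if name and name not in known:
            known.add(name)
            added.append(name)
    return added


# Placeholder strain names, NOT taken from any paper.
master_list = {"Borrelia burgdorferi": {"strain-A"}}

new_for_known = add_strains(master_list, "Borrelia burgdorferi", ["strain-A", "strain-B"])
new_for_unseen = add_strains(master_list, "Borrelia turdi", ["strain-C"])

print(new_for_known)
print(new_for_unseen)
print(sorted(master_list))
```

A set is used for each subspecies so that copying the same strain from two different papers never creates a duplicate entry.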