basic programming and bioinformatics

Indeed, the question is broad and quite hard to answer, but I'll give it a try. Edits that improve this answer are very welcome.

Bioinformatics is a big field. Bioinformaticians need basic knowledge of

  • biology
  • molecular genetics
  • population genetics
  • programming
  • statistics

You may find courses on statistics applied to bioinformatics here (R-language) and here (I haven't watched these sources).

How to start programming? - Python

You seem to be mostly interested in programming. I think Python is a very good way to get started. Programming might look a bit scary when you don't really know what it is about, but in a few days you can easily acquire basic knowledge and already solve some pretty neat problems. Many people actually have lots of fun learning how to program, and you'll probably be amazed by the power this tool offers you. I personally really enjoyed learning to program in Python. I did it in a day or two (I was mostly interested in object-oriented programming; you'll learn what that means) with a very good source, but unfortunately that source is not available in English. There are tons of introductory documents, though, so you'll have no difficulty finding a good one. I'd advise you to download Python right away and to look at online courses on Khan Academy or edX (I haven't watched them).
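
To give a flavour of the kind of small problem you could solve after a few days, here is a minimal sketch in Python. It is only an illustration: the function names (gc_content, reverse_complement) and the example sequence are made up for this answer, and it assumes the DNA contains only the letters A, C, G and T.

```python
# Translation table mapping each base to its complement (A<->T, C<->G).
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        # Avoid dividing by zero on an empty sequence.
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the resulting string.
    return seq.upper().translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    dna = "ATGCGCTA"  # made-up example sequence
    print(gc_content(dna))          # prints 0.5
    print(reverse_complement(dna))  # prints TAGCGCAT
```

Nothing here needs more than the standard Python installation. Libraries such as Biopython offer ready-made versions of these operations, but writing them yourself once is a good exercise.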

Data analysis - R

While Python is very popular, I think that, as a biologist, it is very important that you know about R. R is a programming language which is slow (compared to Python, C, Java, …) but very useful for statistical analysis and visual display of data. Also, many people use R in bioinformatics (typically for phylogenetic analysis). I think that acquiring basic knowledge in R takes more time than in Python: the main reason to use R is its huge collection of existing functions, so we have to learn many of these functions before seeing that R can indeed be much more useful than Python for some tasks.

Command line - Shell script

Shell scripting (Bash, for example) is a very specific and very important skill too. It is very useful for manipulating and transferring files, managing processes, and pretty much anything else that happens on your computer.

Other

C and C++ are very fast and widely used as well. Perl is commonly used for genomic sequence analysis (although Perl is slowly losing users to Python).

Usefulness of programming

You also ask about the usefulness of programming. Well, it is used in pretty much all areas of biology: analyzing empirical data, computer simulations in population genetics, graph theory, annotating DNA sequences, … My guess (and it is only a guess) is that most biologists have at least some basic knowledge of programming. The main point of programming is that a computer performs calculations much faster than anything you could ever do with a calculator. In bioinformatics, the analysis of DNA sequences typically requires very intensive calculation and a lot of computing power. Tasks such as constructing phylogenetic trees, assessing the goodness of fit of evolutionary models, annotating DNA, aligning DNA sequences, analyzing microarray data and many others all require programming.
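
As a toy illustration of the sequence-analysis tasks listed above, here is a rough Python sketch of two very simple building blocks: counting k-mers (substrings of length k) and measuring the Hamming distance between two equal-length sequences. The function names and example sequences are invented for this answer, and real alignment or phylogenetics tools are far more sophisticated than this.

```python
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping substring of length k in a sequence."""
    seq = seq.upper()
    # Slide a window of width k along the sequence, one base at a time.
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming_distance(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    # Made-up example sequences.
    print(count_kmers("ATATAT", 2))                # AT occurs 3 times, TA 2 times
    print(hamming_distance("GATTACA", "GACTATA"))  # prints 2
```

By hand this is tedious even for a few dozen bases; the same few lines of code work unchanged on sequences that are millions of bases long, which is where the speed of a computer really matters.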

Author by

golgicik

Updated on August 07, 2022

Comments

  • golgicik
    golgicik less than a minute

    As a molecular biology graduate student I have decided to learn some basic programming and bioinformatics since everybody says that it is crucial. For example, what would you learn if you need to work with RNA-Seq data, compare and interpret them?

    Thanks!

    • kmm
      kmm over 8 years
      I think that you should try to be more specific with your question. Right now it is too broad (you can do an entire degree in bioinformatics). What do you do now? What do you want to be able to do?
    • golgicik
      golgicik over 8 years
      I know it's too broad, and that's actually why I am asking. I am currently a master's student in neuroscience and will continue with a PhD in the same area. A post-doc from my lab told me that I need to be able to understand basic programming and bioinformatics. Let's say I am going to compare RNA-seq results and interpret them.
    • kmm
      kmm over 8 years
      Then the question might be better suited for academia.SE or biostars. I think you're going to get too many discussion-type answers here.
    • shigeta
      shigeta over 8 years
      The Coursera classes are a good introduction - one is starting next week! They focus on Python a lot, I think...
    • golgicik
      golgicik over 8 years
      Thanks! Is it the one that you mentioned?: coursera.org/course/pkubioinfo
    • shigeta
      shigeta over 8 years
      I was thinking of this one, coursera.org/course/bioinfomethods2, but that one looks good too. If you are a PhD student you should learn computation closely related to your research - that will motivate you. Either R or, better yet, Python as tools.
  • golgicik
    golgicik over 8 years
    This is the answer I was looking for actually. Thanks a lot!
  • terdon
    terdon over 8 years
    I'm not at all sure that R is slower than python. Both are interpreted scripting languages. C and Java are completely different.
  • MattDMo
    MattDMo over 8 years
    What you need to remember is that R is more of a domain-specific language (statistics and related fields like visualization), while Python is more of a "generic" programming language like the C superfamily, Java, Ruby, etc. What sets Python apart is its comparative ease of learning and use, the "batteries included" philosophy of the standard library, and the huge number of 3rd-party modules available for everything from bioinformatics (Biopython, etc.) to visualization (matplotlib) to numerical analysis (numpy/scipy) to web frameworks to natural language analysis and more...
  • MattDMo
    MattDMo over 8 years
    Python also has interfaces to other languages (R, C/C++, Fortran, Java, etc.) so you can do domain-specific work in the best language for that project, and use Python as the "glue" to piece it all together.
  • Faheem Mitha
    Faheem Mitha over 8 years
    @terdon R is definitely slower than Python, though not dramatically.
  • AMR
    AMR over 6 years
    Why are you making a trivial edit to a question that was closed almost two years ago?
  • Remi.b
    Remi.b over 6 years
    I did not realize it was closed when I edited it. I often improve my past posts. Why are you asking?
  • Remi.b
    Remi.b over 6 years
    ...because it comes back into everybody's view and probably wastes your time... mmhh, OK. Well, sorry about that.