Ankita Katdare
Computer Science
25 Aug 2012

Bioinformatics Project Ideas/Topics Collection For Engineering Students

Here is a list of project ideas based on Bioinformatics. Students in their third or final year can use these projects as mini-projects as well as mega-projects. This list has been compiled after searching for project ideas across the net.

If you have questions regarding these projects, feel free to ask them in the replies below. You may also ask for an abstract of a project idea that you have or want to work on.
  1. Predicting Cellular Localization. Eukaryotic cells contain several sub-compartments. The Cellular Localization problem consists of predicting in which compartment a protein is most likely to be found, on the basis of sequence information alone. The project may consist of a review of the literature and/or a novel analysis (I have access to a data-set that has never been used in a predictive context).
  2. Regulatory motifs. Review of the literature on algorithms to automatically determine regulatory motifs (short sequence signals) in DNA sequence data. I have a Java library that can be used to implement a prototype application; see suffix tree below.

  3. SNP (Single Nucleotide Polymorphism). Review the literature on methods for detecting SNPs, as well as their applications. Single nucleotide polymorphisms (SNPs) are common DNA sequence variations among individuals. They promise to significantly advance our ability to understand and treat human disease. (Excerpt from snp.cshl.org). See also Linkage analysis. (S)

  4. Metabolic Pathways. Proteins interact together to perform specific functions. Such a network of interactions is called a molecular pathway. There are two main aspects to this field: how to infer/determine the connections and how to simulate cellular processes. There exist several computational approaches to model molecular pathways, including Petri nets.
  5. Molecular microarrays. Today's technology (which borrows from inkjet technology) makes it possible to fix tens of thousands of different macromolecules (DNA or protein molecules) onto a small surface. This technology reveals which macromolecules are expressed at different times, within different tissues, or in different cellular states (disease vs non-disease). DNA chips measure the level of expression of each gene.

  6. Mass spectrometry (MS). MS produces a spectrum of all the masses of all the compounds that are present in a sample. When an input protein is cut at specific sites, it will produce a specific spectrum. Such technology can now be used to fingerprint the content of a cell.

  7. Expression data + motif discovery. DNA microarrays make it possible to find genes that are simultaneously expressed. Those genes are most likely co-regulated, i.e. they share a common sequence signal in their promoter region. Daniela Cerna implemented a suffix tree library in Java, in the context of her honours project. Here, we would be re-using the library to help find conserved motifs.

  8. Expression data + cell localization. Can the use of (predicted and experimental) data on cellular localization help distinguish between true and false positives when expression data is analyzed to find actors and inhibitors?
  9. Genome comparison. Implementing a MUMMER-like algorithm using Daniela's suffix tree (Java) library. This involves writing a hybrid algorithm: k-band dynamic programming + suffix trees.
  10. Genome rearrangements. Genomes evolve at several scales: from point mutations to large rearrangements. In the late 80s, it became evident that several closely related genomes had genes that were extremely similar to one another (say 99% identical), but the order of genes along the chromosomes was not preserved. Review and present the main algorithms to compare entire genomes. Topics include: sorting by reversals (Sankoff), breakpoint graph, Hannenhalli and Pevzner algorithm.

  11. Accurate Phylogenetic Reconstruction from Gene-Order Data.

  12. Ontologies. What is an ontology? What tools and knowledge representation formalisms (languages) are available to support the development of ontologies? Give examples of ontologies. Expose the problems associated with ontologies. An ontology is a controlled vocabulary (e.g. gene ontology). It helps resolve some of the problems associated with data integration.

  13. Genome assembly. Because of physical limitations, only relatively short DNA sequences can be read (some 500 nt). For processing a complete genome, one approach, called shotgun sequencing, consists of sampling small reads (500 nt) at random locations along the chromosomes. The total number of reads is chosen so that the likelihood that each nucleotide is included in more than one read is high (typically each nt is part of 3, 5 or 10 reads). Computers are then used to stitch the reads together. One solution to this problem is related to the shortest superstring problem.

  14. Grammatical frameworks for RNA structure. RNA secondary structure information can be represented using context-free grammars. As with most biological data, the information is better represented within a statistical framework. A Stochastic Context-Free Grammar (SCFG) has probabilities attached to its production rules. The two main issues with SCFGs are the parsing and the induction of the grammar. Review the literature on SCFGs (this includes COVE, infernal and pfold), and build a prototype parser in Java.
  15. Predicting Gene-Gene (Protein-Protein) interactions. There exist a vast number of algorithms for predicting whether two genes will interact. These include: text-mining, co-location along the chromosomes, phylogenetic footprinting, etc.

  16. Lattice models. Predicting the three-dimensional structure of a protein is a notoriously difficult problem. So much so that alternative problems have been devised to circumvent it: secondary structure prediction, inverse folding problem, etc. Some authors have also been studying simpler systems, such as 2D and 3D lattices. Create your own implementation; this includes an algorithm to efficiently search the folding space and a scoring function. Run some simulations.

  17. Structure comparison methods. Review the literature on 3D structure comparison. Implement at least one algorithm. Input: 2 three-dimensional structures, output: a measure of distance (typically root-mean-square deviation expressed in Å), and a list of equivalent residues.

  18. Methods for detecting trans-membrane helices. There is a class of transmembrane proteins whose secondary structure can be reliably predicted. Those proteins are mainly made of helices, such that if the loop connecting helices i and i + 1 is exposed to the inside of the cell, then the next one will be exposed to the outside of the cell. Use a Hidden Markov Model or Neural Network to reproduce this result.

  19. Secondary Structure Prediction. Implement a secondary structure prediction method and compare its accuracy to known methods. Common choices for your implementation include: Neural Networks, Hidden Markov Models, and possibly decision trees.
  20. Surface/Interior. Implement an algorithm to predict solvent accessibility. Common choices for your implementation include: Neural Networks, Hidden Markov Models, and possibly decision trees.

  21. Applications of suffix trees. Use Daniela Cernea's suffix tree library and implement some of the following algorithms: a linear-time algorithm for finding the longest common substring of k strings (interestingly, Knuth had predicted that no linear-time algorithm would be found for solving this problem), finding all maximal repetitive substrings in linear time, finding all maximum palindromes, and the k-mismatch algorithm.

  22. Bio-Ethics. Bioinformatics deals with biological and medical data; accordingly, there are numerous related ethical issues. Should patenting genes be allowed? How should patient data be handled? How should genomic data be dealt with? Imagine that the analysis of a dataset allows conclusions to be drawn about a population, a religious group, people who live in a specific region, etc. The consequences can be severe: it could be that this group is more likely to suffer from certain diseases, and such information could be used by insurance companies, employers, etc. to screen candidates.
  23. Genome motifs viewer. Construct a flexible graphical user interface to visualize shared motifs. Suggestions: make it 3D to ease viewing multiple strings. Motifs would be extracted from a suffix tree.
  24. Teaching tools: interactive linear time construction of a suffix tree, showing the suffix links, interactive tools for software alignments.
  25. Expectation-Maximization (EM) algorithm and some of its applications in molecular biology. EM is used for training certain Hidden Markov Models, Covariance Models and building phylogenetic trees. What is it? What are the main applications? Prototype implementation. (S)
  26. Gibbs sampling. This technique forms the basis for several motif detection tools. What is it? What are the main applications? Prototype implementation. (S)
  27. Bayesian networks. What are Bayesian networks? What is interesting about them? What are the bioinformatics applications of Bayesian networks? Carry out a small experiment. (S)

  28. Predicting Phenotype from Patterns of Annotation, microarrays, etc. One of the goals of bioinformatics research is to transform molecular biology into a predictive science. For example, given a certain pattern of gene expression, detected by microarrays for example, what would be the best treatment (personalized medicine)? Survey the literature on the use of bioinformatics techniques to assist medical diagnosis, prognosis and treatments. Where are we heading? When will personalized medicine become a reality? How much data is needed? Which problems remain to be solved?

  29. Statistics behind BLAST. Good candidate for multi-team work, where one team would focus on the statistics of word matching while the other would focus on hashing. Produce a Java implementation of hashing techniques for speeding up the sequence alignment problem. The part on the statistical analysis of hits requires a statistical background (S), but the algorithmic part does not.

  30. Constructing phylogenetic trees. Read an overview of the construction of phylogenetic trees using a neighbour-joining approach. For this project, you will produce a prototype implementation, in Java, of a modern method such as: quartet method, maximum likelihood or maximum parsimony. (S)

  31. QSAR. One of the main bioinformatics contributions to drug discovery is the Quantitative Structure Activity Relationship analysis (QSAR); the other is molecular docking. QSAR analyses take as input a set of compounds and their relative activity/efficacy. It then finds the commonalities between those molecules. The commonalities are then used to design new/better drugs.

  32. Molecular docking consists of predicting how two molecules will interact. This can either be two proteins or one protein and a small compound, such as a new drug. The two main factors that are taken into account are the shape and electrostatics of the two molecules.

  33. BioJava is a large collection of classes for solving bioinformatics problems. See www.biojava.org.

  34. Java3D. A protein viewer was developed two years ago in the context of a CSI 4900 project. Extensions of this project could be considered.

  35. Tandem repeats. Review the literature on tandem repeat detection and implement a prototype application. Tandem repeats are repeats of the form α^n, s.t. 2 < |α| < 5 in the case of micro-satellites, and each unit, α, is degenerate (which implies that the algorithms must allow for mismatches).

  36. Simultaneous alignment and structure prediction for two RNA sequences. Implement a simplified version of Dynalign, where the secondary structure prediction is calculated using the Nussinov algorithm, i.e. it finds the maximum number of base pairs.

  37. 3 way genome alignments.
Source: Pondicherry University
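
To give a flavour of what a prototype for one of these ideas can look like, here is a minimal sketch of the base-pair maximization mentioned in item 36 (the Nussinov algorithm). It is written in Python rather than Java purely for brevity. The function name, the set of allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop length of 3 are assumptions made for this illustration; none of them come from the list above.

```python
def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs in an RNA sequence.

    Assumptions (not from the original list): Watson-Crick and G-U
    wobble pairs are allowed, and at least `min_loop` unpaired bases
    must separate the two partners of a pair.
    """
    allowed = {("A", "U"), ("U", "A"), ("G", "C"),
               ("C", "G"), ("G", "U"), ("U", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: position j stays unpaired
            # case 2: position j pairs with some k, far enough to its left
            for k in range(i, j - min_loop):
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + 1 + dp[k + 1][j - 1])
            dp[i][j] = best
    return dp[0][n - 1]
```

For example, `nussinov_max_pairs("GGGAAAUCC")` finds three nested pairs around the AAA loop. The cubic running time is fine for short sequences; a full Dynalign-style project would combine this kind of recurrence with alignment of the two sequences.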
1. Testing for absence of secondary structure in combinatorial sets of DNA strands.
The purpose of this project is to understand the complexity of the
following question: given s1, s1', ..., sn, sn',
is it the case that all 2^n strands in T are structure-free?

Specifically, is there a way to extend the dynamic programming
structure prediction algorithms to obtain a polynomial time algorithm
for the above question?

2. Combinatorial Design of Universal DNA Tag Systems.
Short DNA tags are used to label molecules in a
chemical library or to anchor DNA molecules to DNA arrays. A
combinatorial approach to design of DNA tags was proposed by Ben-Dor
et al. (RECOMB, 2000), based on the melting temperature of
strand duplexes. However, the measure of melting temperature used in
the paper is less than ideal; more accurate measures, based on nearest
neighbour calculations, are preferred.

3. A resource bounded theory of DNA self-assembly.
Molecular self-assembly naturally gives rise to
numerous complex forms. Self-assembly of DNA molecules with
programmable interactions can be used to construct structures at a
nanometre scale.

Rothemund and Winfree propose a formal model for studying
self-assembled objects from the point of view of computational
complexity, and study the complexity of self-assembly of a
square in this model.

4. Phylogenetic classification of organisms based on sequence tags.
Given a (potentially partial) phylogenetic tree, develop strategies
and algorithms for placing sequences in that tree based on knowledge
of a short subsequence (tag) only.

This project is potentially of immediate and significant practical relevance
in the context of a collaboration we have with Bill Mohn and Michael Murphy
from the UBC Dept of Microbiology and Immunology.

5. RNA Secondary Structure Prediction.
Develop and test a (heuristic) algorithm that can predict RNA secondary structures
with pseudoknots. Since the problem is combinatorially hard, we expect that
stochastic local search techniques might be particularly effective for solving
it algorithmically.

6. Multiple digestion restriction site mapping.
Implement a few known techniques for solving this problem,
try to improve them with modern heuristic techniques, and
do a comparative analysis of their performance on real genomic data.

Via: UBC Computer Science
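
As a starting point for item 6 above, here is a minimal exhaustive-search sketch for the simplest case, with two enzymes (usually called the double digest problem). You are given the fragment lengths produced by enzyme A alone, by enzyme B alone, and by both together, and you look for fragment orderings whose combined cut sites reproduce the double-digest fragments. The function name and the toy data are made up for this illustration, and brute force is only a baseline to compare heuristics against, not one of the specific techniques the item refers to.

```python
from itertools import accumulate, permutations


def double_digest(a, b, ab):
    """Return all (order_a, order_b) pairs consistent with the digest `ab`.

    a, b : fragment lengths from digesting with enzyme A or B alone
    ab   : fragment lengths from digesting with both enzymes
    Exhaustive search: factorial time, so only usable on tiny inputs.
    """
    if not (sum(a) == sum(b) == sum(ab)):
        raise ValueError("all three digests must cover the same total length")
    target = sorted(ab)
    solutions = []
    # set() drops orderings that are identical because of repeated lengths
    for order_a in set(permutations(a)):
        sites_a = set(accumulate(order_a))
        for order_b in set(permutations(b)):
            # merge cut sites from both enzymes, plus the start of the molecule
            sites = sorted({0} | sites_a | set(accumulate(order_b)))
            fragments = sorted(y - x for x, y in zip(sites, sites[1:]))
            if fragments == target:
                solutions.append((order_a, order_b))
    return solutions
```

With the toy input `double_digest([3, 4, 3], [5, 5], [3, 2, 2, 3])` the only consistent ordering is A = (3, 4, 3) and B = (5, 5). On real data the search space explodes and mirror-image solutions appear, which is where the heuristic techniques mentioned in the project description come in.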
mpk

Branch Unspecified
11 Sep 2012
Hello ,
I am pursuing an ME in Information Technology and would like to do my final project on bioinformatics. Can you please guide me in choosing a topic? I proposed a comparison of existing methods of protein secondary structure to my guide, but she rejected this topic. Can you please help?
Rupam Das

Branch Unspecified
12 Sep 2012
Look for Genomic signal processing topics. You have plenty of research papers in that direction. Would be helpful.
Ankita Katdare

Computer Science
05 Aug 2015
If anyone has any problems, questions, suggestions about these topics please post them in replies below. We will be glad to help you in the best possible manner.
Akshay Sanap

Branch Unspecified
17 Aug 2015
I am doing a BE and I got the topic "Resolving Complexity of Bioinformatic Algorithms using Python". As I am not from a biology background, I don't know anything about it. Can you please explain what we basically need to do and what is expected from the project?
17 Aug 2015
Akshay Sanap
"Resolving Complexity of Bioinformatic Algorithms using Python".
Well, did you search on Google to find out what Bioinformatic Algorithms are? I think your professor chose the topic for you and I think you should discuss with them what the project is all about.
Akshay Sanap

Branch Unspecified
17 Aug 2015
Yup, you are right, but he is the principal and has no time to discuss it with us.
I googled it and found algorithms like greedy search, exhaustive search and so on, but I am not getting what exactly to implement.
AKINMADE KAYODE SAMUEL

Branch Unspecified
10 Dec 2015
All the project topics listed above are good. I want a diversion in my field: I hold a B.Sc. in chemical engineering and I really need a topic for a master's program that will engage me in simulation or modelling using design software related to bioinformatics.
Ankita Katdare

Computer Science
02 Feb 2016
@AKINMADE KAYODE SAMUEL Here are some ideas:

1. Designing a neural network simulator—the MENS modelling environment for network systems
Designing a neural network simulator—the MENS modelling environment for network systems: I

2. Predict structures of proteins using various methods such as Homology Modelling, Threading and Ab-initio.

3. https://www.rsi.co.jp/kagaku/cs/products/pdf/moe_brochure_rsi.pdf
sadaf ijaz

Branch Unspecified
11 Apr 2016
I am a BS bioinformatics student and I want to choose a good and easy topic from the ones above. If I select one of them, will you help me with my project? I want to select the SNP project, but I have no guidelines on how to start.
M Menna

Branch Unspecified
20 Apr 2016
Hello
What if I want to work on the first project, "Predicting Cellular Localization"?

How can I get the data set? Is it available on the Internet, or do you have it exclusively?

Thanks,
VamsiKrishna1414

Branch Unspecified
20 May 2016
Hello, do you have any ideas relating to forensics using Bioinformatics? Any ideas would be really appreciated!
Prakshal doshi

Branch Unspecified
04 Aug 2016
Hi there, I'm a final-year B.Tech biotechnology student. I want to start my mini project on protein structure, but I have no idea how to go ahead with this topic. Can you please guide me?
HalimaFarooq

Branch Unspecified
28 Feb 2017
Can you explain more about Molecular-arrays? I want an abstract and project description please.
Areeba Fatima

Branch Unspecified
06 Jan 2018
Could I get an abstract for "Predicting cellular localisation"?

Thank you
umer ali

Computer Science
02 Oct 2018

Hi, I'm Umer, currently trying to find a better final project for a master's in bioinformatics. Can you please send me extra details related to the topic "Predicting Gene-Gene (Protein-Protein) interactions. There exist a vast number of algorithms that allow to predict if two genes will be interacting. This includes: text-mining, co-location along the chromosomes, phylogenetic footprinting, etc." at umerali9000@gmail.com

Dureshahwar Waseem

Computer Science
03 Oct 2018

Hello, I am a final-year student of bioinformatics and I want to select molecular docking as my final-year project. Kindly provide me with a dataset for this project and its code as well.

Tooba Nadeem

Computer Science
07 Oct 2018

Hi, I am a final-year student of bioinformatics. I want to do a good programming project using Neural Networks and deep learning. Please suggest any idea or dataset to work on.

Aseel Abu Rajab

Biochemical
28 Jan 2019

Can I get help with a next-generation sequencing topic, on genes related to breast, pancreatic or lung cancer?

Chudi Victor

Biomedical
14 Mar 2019

Good day! I am an undergraduate and I need project topics on biomedical informatics.

