AlphaFold2 Database:
AlphaFold 2 (AF2) is a landmark in computational biology: it relies on attention mechanisms rather than traditional convolutional neural networks to predict a protein's 3D structure directly from its amino acid sequence with high accuracy. It was developed by Google DeepMind for CASP14 (Critical Assessment of protein Structure Prediction, a biennial community experiment that benchmarks structure prediction methods). AF2 outperformed its predecessor, AlphaFold 1, with a median score of 92.4 GDT (the Global Distance Test is a widely used measure of how closely a predicted structure matches the experimental one, on a scale of 0 to 100). In July 2021, DeepMind partnered with EMBL-EBI to launch the AlphaFold Protein Structure Database. The database has since grown to give the scientific community access to more than 200 million protein structures predicted by AlphaFold 2, including the human proteome and the proteomes of 47 other organisms. If the structure you need is not in the database, you can generate your own prediction with the publicly available open-source code or a Colab notebook.
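To make the GDT score more concrete, here is a minimal Python sketch of the commonly quoted GDT_TS formula: the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose C-alpha atom lies within that cutoff of the experimental position. This is a simplification: the real score searches over many superpositions for each cutoff, whereas this sketch assumes the per-residue distances have already been computed from one superposition. The function name and the example distances are illustrative only.

```python
from typing import Sequence


def gdt_ts(distances: Sequence[float],
           cutoffs: Sequence[float] = (1.0, 2.0, 4.0, 8.0)) -> float:
    """Simplified GDT_TS from per-residue C-alpha distances (in angstroms).

    Assumes the predicted and experimental structures were already
    superimposed; the official score optimizes the superposition per cutoff.
    """
    if not distances:
        raise ValueError("need at least one residue distance")
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    # Average the fractions and express the result on a 0-100 scale.
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical distances for an 8-residue toy protein.
example_distances = [0.4, 0.9, 1.5, 2.5, 3.0, 5.0, 7.5, 12.0]
print(gdt_ts(example_distances))
```

A prediction that places every residue within 1 Å scores 100, so a median of 92.4 means most residues fall within the tighter cutoffs.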
Applications of AlphaFold2:
Protein structure prediction with AlphaFold2 is used in many areas, including drug development, protein engineering, and closing the gap between the number of known protein sequences and the number of known structures.
Drug Development Process:
The drug development process involves several key steps. First, potential drug targets are identified, for example through a subtractive genomics approach. Next, the protein structures of the potential targets are predicted. This is followed by molecular docking, where various small molecules are tested for their ability to bind to the target proteins; when docking is run across a large compound library, it is known as virtual screening. Finally, molecular dynamics simulations are conducted to study the stability and behavior of these protein-ligand interactions over time.
How to get a protein structure from the AlphaFold2 Database:
1. Go to the AlphaFold Database.
2. On the home page, enter your query in the search bar as a UniProt accession, sequence, gene name, protein name, or organism name.
3. Run the search and click on the protein name or accession number in the results. The entry page shows the predicted 3D structure of your protein, with model confidence scores (pLDDT) for different regions of the protein and a predicted aligned error (PAE) plot.
4. Download the structure by clicking the download option for your desired file type (for example, PDB).
5. Interpret the PAE plot. The PAE plot shows how confident the model is in the relative position of each pair of residues; it does not depict residue interactions or spatial proximity the way inter-residue distance or contact maps do. Each pixel corresponds to a pair of residues, one on the vertical axis and one on the horizontal axis. A dark green tile corresponds to a good prediction (low expected error), whereas a light green tile indicates a poor prediction (high expected error). A dark green square along the diagonal typically indicates a domain; if there are two such squares, your protein structure likely has two domains.
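The way dark green blocks in the PAE plot map to domains can be sketched in Python as follows. The sketch assumes you already have the PAE values as a square matrix (in the database's downloadable JSON this matrix is, to the best of my understanding, stored under a `predicted_aligned_error` key, which you should verify) and that you supply candidate domain boundaries yourself. The function names, the toy matrix, and the boundaries are all hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def mean_pae(pae: Matrix, rows: range, cols: range) -> float:
    """Average PAE (in angstroms) over one rectangular block of the plot."""
    values = [pae[i][j] for i in rows for j in cols]
    return sum(values) / len(values)


def domain_pae_summary(pae: Matrix,
                       domains: List[Tuple[int, int]]) -> Dict[Tuple[int, int], float]:
    """Mean PAE within and between domains.

    `domains` holds hypothetical (start, end) residue index ranges,
    0-based and half-open, chosen by the user.
    """
    summary = {}
    for a, (s1, e1) in enumerate(domains):
        for b, (s2, e2) in enumerate(domains):
            summary[(a, b)] = mean_pae(pae, range(s1, e1), range(s2, e2))
    return summary


# Toy 4-residue matrix: two 2-residue "domains" with low error inside each
# block (dark green) and high error between them (light green).
toy_pae = [
    [0.5, 1.0, 20.0, 22.0],
    [1.0, 0.5, 21.0, 23.0],
    [19.0, 20.0, 0.5, 1.5],
    [21.0, 22.0, 1.5, 0.5],
]
summary = domain_pae_summary(toy_pae, [(0, 2), (2, 4)])
for (a, b), value in sorted(summary.items()):
    print(f"domain {a} vs domain {b}: mean PAE {value:.2f}")
```

Low values on the diagonal entries and high values off the diagonal suggest that each domain is predicted confidently on its own, while the relative placement of the two domains is uncertain.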
Figure: IL6 cytokine protein structure from the AlphaFold Database with its PAE (predicted aligned error) plot. Dragging over a region of the PAE plot highlights the corresponding region of the protein structure.
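The model confidence scores shown for different regions of the protein can also be read from the downloaded file. AlphaFold writes the per-residue confidence (pLDDT, 0 to 100) into the B-factor column of its PDB files. The Python sketch below parses that column and groups residues into the confidence bands used by the database. The direct-download URL pattern in `afdb_pdb_url` is an assumption (the file version in particular changes over time), so confirm it against the entry's download button; the toy PDB text is generated only for illustration and is not a real structure.

```python
from typing import Dict


def afdb_pdb_url(accession: str, version: int) -> str:
    # Assumed URL pattern for AlphaFold Database files; verify before use.
    return f"https://alphafold.ebi.ac.uk/files/AF-{accession}-F1-model_v{version}.pdb"


def plddt_per_residue(pdb_text: str) -> Dict[int, float]:
    """Read pLDDT from the B-factor column (columns 61-66) of C-alpha atoms."""
    scores = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[int(line[22:26])] = float(line[60:66])
    return scores


def confidence_band(plddt: float) -> str:
    """Confidence bands as displayed on database entry pages."""
    if plddt > 90:
        return "very high"
    if plddt > 70:
        return "high"
    if plddt > 50:
        return "low"
    return "very low"


def toy_atom_line(serial: int, name: str, res_seq: int, plddt: float) -> str:
    # Builds a fixed-width PDB ATOM record with dummy coordinates.
    return "ATOM  {:>5d} {:<4s} GLY A{:>4d}    {:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}".format(
        serial, name, res_seq, 0.0, 0.0, 0.0, 1.0, plddt)


# Toy 4-residue "structure" with made-up pLDDT values.
lines, serial = [], 1
for res_seq, plddt in [(1, 95.2), (2, 82.0), (3, 61.5), (4, 35.0)]:
    for atom_name in ("N", "CA"):
        lines.append(toy_atom_line(serial, atom_name, res_seq, plddt))
        serial += 1
toy_pdb = "\n".join(lines)

scores = plddt_per_residue(toy_pdb)
bands = {res: confidence_band(score) for res, score in scores.items()}
print(bands)
```

For a real entry, you would read the downloaded file's text (for example the human IL6 entry, UniProt accession P05231) and pass it to `plddt_per_residue` in place of the toy text.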
But what if your desired sequence’s structure is not present in the AlphaFold2 Database?
The solution is simple: use the open-source ColabFold notebook, a community-maintained notebook that runs AlphaFold 2 on Google Colab: https://colab.research.google.com/github/sokrypton/ColabFold/blob/main/AlphaFold2.ipynb
1. Enter your query sequence.
2. Open the Runtime menu, click "Change runtime type", and select ‘GPU’.
3. Then select the "Run all" option in the Runtime menu. After some time, your protein structure will be generated.
4. Running time depends on the length of the sequence.
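Before pasting a sequence into the notebook, it can help to clean it up and check its length, since the length drives the running time. The Python sketch below strips an optional FASTA header and whitespace and rejects characters outside the 20 standard amino acids. The function name and the example sequence are hypothetical, and the notebook may well accept inputs that this strict check rejects.

```python
VALID_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_query(raw: str) -> str:
    """Return an uppercase one-letter amino acid sequence without headers or spaces."""
    parts = []
    for line in raw.splitlines():
        line = line.strip()
        # Skip blank lines and FASTA header lines.
        if not line or line.startswith(">"):
            continue
        parts.append(line.replace(" ", ""))
    sequence = "".join(parts).upper()
    if not sequence:
        raise ValueError("no sequence found in input")
    unexpected = sorted(set(sequence) - VALID_AMINO_ACIDS)
    if unexpected:
        raise ValueError(f"unexpected characters in sequence: {unexpected}")
    return sequence


# Hypothetical query copied from a FASTA file.
raw_query = ">example_protein\nmkt ay\nIAK\n"
query = clean_query(raw_query)
print(query, len(query))
```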
Genetic Mutations:
1. To detect changes in protein structure caused by mutations, use ChimeraX together with AlphaFold 2.
2. Download the wild-type protein structure from the AlphaFold Database or RCSB PDB, or predict it with the Colab notebook.
3. Predict the structure of your mutated query sequence using the Colab notebook.
4. Open both structures in ChimeraX and run the 'matchmaker #1 to #2' command to superimpose model #1 onto model #2. The superposition helps you interpret the changes in your structure.
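The mutation workflow above can be prepared with a short Python sketch: one function builds the mutated sequence to submit for prediction, and another writes the ChimeraX commands as a script. The mutation notation (wild-type residue, 1-based position, new residue, such as T3A), the function names, and the file names are illustrative assumptions. The script relies on ChimeraX numbering models in the order they are opened in a fresh session, and uses only the `open` command and the `matchmaker` command quoted in the steps.

```python
import re


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution written like 'T3A' (wild type, position, mutant)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"cannot parse mutation: {mutation!r}")
    wild_type, position, mutant = match.group(1), int(match.group(2)), match.group(3)
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    # Guard against applying the mutation to the wrong sequence or numbering.
    if sequence[position - 1] != wild_type:
        raise ValueError(
            f"expected {wild_type} at position {position}, found {sequence[position - 1]}")
    return sequence[:position - 1] + mutant + sequence[position:]


def chimerax_script(wild_type_path: str, mutant_path: str) -> str:
    """ChimeraX commands: open both models, then superimpose #1 onto #2."""
    return "\n".join([
        f"open {wild_type_path}",   # becomes model #1 in a fresh session
        f"open {mutant_path}",      # becomes model #2
        "matchmaker #1 to #2",
    ]) + "\n"


# Hypothetical wild-type sequence and mutation.
wild_type_sequence = "MKTAYIAK"
mutant_sequence = apply_point_mutation(wild_type_sequence, "T3A")
print(mutant_sequence)
print(chimerax_script("wild_type.pdb", "mutant.pdb"))
```

Saving the script text to a file with a `.cxc` extension and opening it in ChimeraX runs the commands in order.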
Figure: ChimeraX used to compare the structures of the IL6 protein from Ovis aries (sheep) and Homo sapiens (human). Both protein structures were downloaded from the AlphaFold Database.
AlphaFold3:
AlphaFold 3 goes beyond predicting the structures of individual proteins: it predicts how proteins interact with other biological molecules, and its developers report higher accuracy for these interactions than earlier tools. It was developed by Google DeepMind in collaboration with Isomorphic Labs and is offered through the AlphaFold Server, a free platform that scientists around the world can use for non-commercial research. With AlphaFold 3, biologists can model complex structures containing proteins, DNA, RNA, ligands, ions, and chemical modifications. This is significant because proteins usually work together with other molecules inside the cell.
How to use AlphaFold3:
- Visit the AlphaFold Server: https://alphafoldserver.com
- Using AlphaFold: One major advantage of AlphaFold 3 is its web server, which has a user-friendly interface and requires no coding. Anyone with a Google account can enter a protein sequence in FASTA format, or a nucleic acid sequence, to predict the structures of complexes it might form with other molecules.
- Adding Post-Translational Modifications (PTMs): To add PTMs, click on the three dots next to the protein sequence input bar, then select "PTMs." A new window will appear. Click the dropdown arrow, choose your specific PTM for the appropriate amino acid, and enter the modification sequence. Be cautious, as once PTMs are added and saved, the sequence cannot be edited.
- Adding Other Entities: Click on "Add Entity" to include sequences for DNA, RNA, ligands, or ions as needed.
- Preview and Submit Job: Click on "Continue and preview job", then submit the job. Wait for the predictions; the time required varies with the sequence length.
- Retrieving Results: Finally, click on the three dots and select "open Results" to view the predictions. You can download the predictions by clicking the download option.
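When a job combines several entities, it helps to sort your sequences by type before pasting them into the server. The Python sketch below parses FASTA text and guesses whether each record is protein, DNA, or RNA from its alphabet. It only organizes your input locally and does not talk to the AlphaFold Server. The function names and example records are hypothetical, and the guess is a heuristic: a short peptide made only of A, C, G and T would be labeled as DNA.

```python
from typing import List, Tuple

DNA_ALPHABET = set("ACGT")
RNA_ALPHABET = set("ACGU")
PROTEIN_ALPHABET = set("ACDEFGHIKLMNPQRSTVWY")


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs from FASTA-formatted text."""
    records: List[Tuple[str, str]] = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if header is not None:
                records.append((header, "".join(chunks).upper()))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks).upper()))
    return records


def guess_entity_type(sequence: str) -> str:
    """Heuristic guess of the entity type from the letters used."""
    letters = set(sequence)
    if letters <= DNA_ALPHABET:
        return "DNA"
    if letters <= RNA_ALPHABET:
        return "RNA"
    if letters <= PROTEIN_ALPHABET:
        return "protein"
    return "unknown"


# Hypothetical input with one record of each type.
fasta_text = ">my_protein\nMKTAYIAK\n>my_dna\nACGT\nTTGA\n>my_rna\nACGUU\n"
records = parse_fasta(fasta_text)
for header, sequence in records:
    print(header, guess_entity_type(sequence), len(sequence))
```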
Figure: After the job completes, click on the three dots to view your results.
Figure: Predicted structure of the IL-6 cytokine with its ligand and Zn ions, as it may occur inside the cell.
Conclusion: AlphaFold 2 excels at highly accurate protein structure prediction, while AlphaFold 3 extends this to predicting structures together with their interactions with other molecules. AlphaFold 2 is fully open source; the AlphaFold 3 inference code was released later for non-commercial use, with model weights available on request. Both tools are invaluable in drug and vaccine design, protein engineering, agriculture, and in improving our understanding of cellular processes.