Tuesday, September 25, 2018
Bioinformatics Python Uniprot Protein Sequence Fasta Downloader with Obsolete Check and Custom Range
bioinformatics, fasta downloader, obsolete protein checker, protein, python, uniprot sequence downloader, urllib, webpage query
Explanation:
This Python script downloads protein FASTA sequences from the UniProt protein database. It can be used in two ways: by pasting protein identifiers directly into the list, or by reading them from a file. If the download stalls or the connection drops partway through, the loop range can be changed so that the script keeps the files already downloaded and resumes from a chosen index. Some proteins may be obsolete, for example because of a tagging problem or because a researcher removed the entry. The script handles these by collecting their identifiers in a list and printing it at the end. By default, the data is saved in the fastas/mouse folder.
Code:
import os
import urllib.request

all_proteins = []

"""
# Use only if your data is in (ID Position Sequence) format
with open("peptide_data/allfasta.txt", "r") as text_file:
    for line in text_file:
        all_proteins.append(line.split(' ')[0])
all_proteins = sorted(set(all_proteins))
print(all_proteins)
"""

# If you want to use this list, keep the block above commented out, and vice versa
all_proteins = ['P62821', 'Q9R1P0', 'P63101', 'Q8CAQ8', 'Q9ET01']

obsolete_list = []

# Make sure the output folder exists before writing to it
output_dir = "fastas/mouse"
os.makedirs(output_dir, exist_ok=True)

# Change the range here to query a custom range of proteins instead of all of them
for i in range(0, len(all_proteins)):
    query_page = 'https://www.uniprot.org/uniprot/' + all_proteins[i] + '.fasta'
    print(query_page)
    try:
        # Query the website and decode the response body into a string
        with urllib.request.urlopen(query_page) as url:
            fasta_string = url.read().decode("utf8")
        print(fasta_string)
        if len(fasta_string) == 0:
            # An empty response is treated as an obsolete entry
            obsolete_list.append(all_proteins[i])
        else:
            # "w" rather than "a", so re-running a range does not duplicate the sequence
            with open(os.path.join(output_dir, all_proteins[i] + ".fasta"), "w") as p:
                p.write(fasta_string)
    except (OSError, UnicodeDecodeError):
        # Network, HTTP, or file errors: record the identifier and move on
        obsolete_list.append(all_proteins[i])
    print("Now at: ", i)

print(list(set(obsolete_list)))
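Restarting from a custom range means editing the loop bounds by hand after each interruption. As a possible alternative, the sketch below works out which identifiers still need downloading by checking which .fasta files already exist in the output folder. The helper name pending_ids is hypothetical and not part of the original script. It assumes files are named <ID>.fasta, as in the script, and that any existing non-empty file is a complete download.

```python
import os


def pending_ids(protein_ids, output_dir):
    """Return the IDs that have no non-empty .fasta file in output_dir yet.

    Hypothetical helper, not part of the original script.
    """
    remaining = []
    for protein_id in protein_ids:
        path = os.path.join(output_dir, protein_id + ".fasta")
        # Treat a missing or zero-byte file as "not downloaded yet"
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            remaining.append(protein_id)
    return remaining
```

One way to use it would be to call all_proteins = pending_ids(all_proteins, "fastas/mouse") just before the download loop, so that a rerun only queries the proteins that are still missing. Identifiers that were previously found obsolete have no file on disk, so they would be queried again on each rerun.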