Every time I want to download several EMBL files (e.g. all the bacterial genomes) I spend at least an hour trying to find the right URL syntax. This post is a public note to self that will help me next time, and perhaps help others who are also receiving a few lines of HTML when all they want is a verdammt plain-text EMBL-formatted file.
There is actual documentation on the right syntax here, but it again takes a while to find: searching for wget, curl, EMBL and various related combinations doesn't get you there quickly. However, the main issue I have is that if I go to the recommended sequence record, e.g. here, none of the links work with a simple "wget URL" or "curl -G URL".
So, if I want to fetch the Roseobacter denitrificans genome sequence with EMBL accession CP000362, I use:
wget http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/embl/CP000362
or if you're into curl:
curl -G http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/embl/CP000362 > CP000362.embl
Simple!
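For the several-files case mentioned at the top, the single-accession command can be wrapped in a loop. This is only a minimal sketch: it assumes a file with one accession per line, the helper names embl_url and fetch_embl_list are made up for illustration, and the URL pattern is simply the one used above, which may change at the EBI's end.

```shell
#!/bin/sh
# Base URL copied from the single-accession commands above.
DBFETCH_BASE="http://www.ebi.ac.uk/Tools/dbfetch/dbfetch/embl"

# Print the dbfetch URL for one accession (hypothetical helper).
embl_url() {
    printf '%s/%s\n' "$DBFETCH_BASE" "$1"
}

# Download every accession listed (one per line) in the file given as $1,
# saving each record as ACCESSION.embl in the current directory.
fetch_embl_list() {
    while IFS= read -r acc; do
        # Skip blank lines in the list.
        [ -n "$acc" ] || continue
        wget -O "${acc}.embl" "$(embl_url "$acc")"
        # Pause between requests; the one-second delay is an arbitrary choice.
        sleep 1
    done < "$1"
}
```

With the accessions saved in, say, accessions.txt (a hypothetical file name), running `fetch_embl_list accessions.txt` after sourcing the snippet should leave one .embl file per accession.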
I came across the same problem, as I needed to download 50+ shotgun sequences of the same organism. I just googled "download embl wget" and Google sent me here! Small world. Hope you are having fun back home!
Miao (former rotation student)
Hi Miao, great to hear from you! I have to keep looking this up too.
Is there a reason the accessions don't match?
To annoy the OCD? Most likely due to my incompetence.
Fixed now. Unless I screwed something else up. Thnx.