2014-04-02

Using sed to edit the header line of an NCBI RefSeq FASTA file

cat test.fa
>gi|254939587|ref|NG_012567.1| Homo sapiens NADPH oxidase 1 (NOX1), RefSeqGene on chromosome X
ATTCTGTGATCACCAGCTTATCAAAAGACTTCCTAGTACTCTGATATTGGGAATGGGGGTCCTACCTCACAGACATAAGG
GTCCAATCAGCATGGCATATATAATTCTTTAGATAATACATAAATTGTCATCCAGATTATAGATCATTCTTTTATGAATC
ACAGGATCTCAATGTTGGAGTATATTTAAGGGACATTTAGTTAACCATCTACCTGGTGCTGATATTCCCCTTATAAAAAG
CTGACAAGGGGTTGTCCATTTTTCCTTGAGAGTCTCAAGTAATAGGAAACTCATTACCTCTTTACTTCTCTGAATACCCG
TGTTGGAAAATTCTGCTTTATATTGAAAAAAAATTGTGTTACTTTATTTATTTTTATTTTTATTTTTTGACACAGAATCT
TATTCTTTCGCCCAGTCTGGAGTGCAGTGGCGTGACCTTGGCTCACCACAACCCTCGCCTCCTGGATTCAAGCGATTCAA
GTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCATGCACCACCATGCCAGGCTAATTTTTGTATTTTTGGTAGAGATG
GGGGTTTGCCATGTTGGCCAGCCAGGCTTGTCTCAAACTCCTGACCCCAGGTGATCTGCCCGCCTCAGCCTCCCAAAGTG
CTGGGATTATAGGCGTGAGCCACCATGCCCAACCTAAAGTGTGTTACTTTAGAGCCTCTATTCCTTGGTTGACTCCTTGG


sed -e '/>/s/^[^|]*|[^|]*|[^|]*|\([^|]*\)|.*$/>\1/' -e '/>/s/\.[0-9]*$//' test.fa > test2.fa
cat test2.fa
>NG_012567
ATTCTGTGATCACCAGCTTATCAAAAGACTTCCTAGTACTCTGATATTGGGAATGGGGGTCCTACCTCACAGACATAAGG
GTCCAATCAGCATGGCATATATAATTCTTTAGATAATACATAAATTGTCATCCAGATTATAGATCATTCTTTTATGAATC
ACAGGATCTCAATGTTGGAGTATATTTAAGGGACATTTAGTTAACCATCTACCTGGTGCTGATATTCCCCTTATAAAAAG
CTGACAAGGGGTTGTCCATTTTTCCTTGAGAGTCTCAAGTAATAGGAAACTCATTACCTCTTTACTTCTCTGAATACCCG
TGTTGGAAAATTCTGCTTTATATTGAAAAAAAATTGTGTTACTTTATTTATTTTTATTTTTATTTTTTGACACAGAATCT
TATTCTTTCGCCCAGTCTGGAGTGCAGTGGCGTGACCTTGGCTCACCACAACCCTCGCCTCCTGGATTCAAGCGATTCAA
GTGCCTCAGCCTCCCGAGTAGCTGGGACTACAGGCATGCACCACCATGCCAGGCTAATTTTTGTATTTTTGGTAGAGATG
GGGGTTTGCCATGTTGGCCAGCCAGGCTTGTCTCAAACTCCTGACCCCAGGTGATCTGCCCGCCTCAGCCTCCCAAAGTG
CTGGGATTATAGGCGTGAGCCACCATGCCCAACCTAAAGTGTGTTACTTTAGAGCCTCTATTCCTTGGTTGACTCCTTGG
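
GNU sed treats `\|` in a basic regular expression as alternation rather than a literal pipe, so pipe-delimited headers are easy to mismatch with a single sed regex. As an alternative, here is a minimal sketch that splits each header on `|` with awk instead of matching the pipes with a regex. It assumes every header follows the `gi|<gi number>|ref|<accession.version>|<description>` layout shown above, so the accession is the 4th field. The function name `rename_refseq_headers` is hypothetical. Headers with fewer than four fields, and all sequence lines, pass through unchanged.

```shell
# Hypothetical helper: rewrite RefSeq headers such as
#   >gi|254939587|ref|NG_012567.1| Homo sapiens ...
# to the bare accession without its version suffix:
#   >NG_012567
# Assumption: the accession is always the 4th '|'-separated field.
# Reads the files given as arguments, or stdin if none; writes stdout.
rename_refseq_headers() {
    awk -F'|' '
        # Header lines with at least 4 fields: keep only field 4.
        /^>/ && NF >= 4 {
            acc = $4
            sub(/\.[0-9]+$/, "", acc)   # drop the version suffix, e.g. .1
            print ">" acc
            next
        }
        # Everything else (sequence lines, other headers) is copied as-is.
        { print }
    ' "$@"
}
```

With the same file names as above, usage would be `rename_refseq_headers test.fa > test2.fa`. Because each field is addressed by number, keeping a different field only means changing `$4`.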
