Summary

This document is a tutorial on Python dictionaries and files. It explains the key concepts associated with dicts and how to use them.

Full Transcript

Dicts and Files
https://developers.google.com/edu/python/dict-files

Dict Hash Table

Python's efficient key/value hash table structure is called a "dict". The contents of a dict can be written as a series of key:value pairs within braces { }, e.g. dict = {key1:value1, key2:value2, ...}. The "empty dict" is just an empty pair of curly braces {}.

Dict (aka hash)

Looking up or setting a value in a dict uses square brackets, e.g. dict['foo'] looks up the value under the key 'foo'. Strings, numbers, and tuples work as keys, and any type can be a value. Strings and tuples work cleanly as keys since they are immutable; mutable types such as lists cannot be used as keys. Looking up a key that is not in the dict raises a KeyError. To avoid it, use "in" to check whether the key is in the dict, or use dict.get(key), which returns the value, or None if the key is not present. get(key, not-found) lets you specify what value to return in the not-found case.

C++ analogy

    #include <iostream>
    #include <string>
    using namespace std;

    int main(void) {
        string keys[3];
        keys[0] = "alpha";
        keys[1] = "omega";
        keys[2] = "gamma";
    }

Dict/Hash

A hash is similar to a fancy array, but instead of needing to know the index of the array element to find your value, you just need to know the key. Lots of examples of convenient key -> value pairs:

ATG -> M (codon -> amino acid)
terry -> braun (first name -> last name)
tbraun -> 8675309 (hawkID -> student ID)
etc.
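The lookup rules described above (square brackets, "in", get(), and the KeyError for a missing key) can be tried out in a few lines. This is only a minimal sketch: the names codon_to_amino and grid are made up for illustration, and the data is borrowed from the key -> value examples listed above.

```python
# Hypothetical example data, borrowed from the codon -> amino acid pairs above.
codon_to_amino = {"ATG": "M"}
codon_to_amino["AAA"] = "K"            # setting a value uses square brackets

first = codon_to_amino["ATG"]          # lookup by key
has_ttt = "TTT" in codon_to_amino      # membership test, no exception
missing = codon_to_amino.get("TTT")    # None when the key is absent
fallback = codon_to_amino.get("TTT", "?")  # caller-chosen not-found value

# A plain square-bracket lookup of a missing key raises KeyError.
try:
    codon_to_amino["TTT"]
    raised = False
except KeyError:
    raised = True

# A tuple is immutable, so it can be a key; a list cannot.
grid = {(1, 2): 5}
try:
    bad = {[1, 2]: 5}
    list_key_ok = True
except TypeError:
    list_key_ok = False

print(first, has_ttt, missing, fallback, raised, list_key_ok)
```

get() with a second argument is usually the shortest way to handle a missing key when a sensible default exists; the "in" test reads better when the missing case needs its own branch.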

Dict/Hash

    ## Can build up a dict by starting with the empty dict {}
    ## and storing key/value pairs into the dict like this:
    ## dict[key] = value-for-that-key
    dict = {}
    dict['a'] = 'alpha'
    dict['g'] = 'gamma'
    dict['o'] = 'omega'

    print(dict)  ## {'a': 'alpha', 'g': 'gamma', 'o': 'omega'}

    print(dict['a'])      ## Simple lookup, returns 'alpha'
    dict['a'] = 6         ## Store a new value under the existing key 'a'
    'a' in dict           ## True
    ## print(dict['z'])                  ## Throws KeyError
    if 'z' in dict: print(dict['z'])     ## Avoid KeyError
    print(dict.get('z'))  ## None (instead of KeyError)

Removing Items

    key_to_remove = "c"
    d = {"a": 1, "b": 2}
    del d[key_to_remove]  # Raises `KeyError: 'c'`

    # and

    key_to_remove = "c"
    d = {"a": 1, "b": 2}
    d.pop(key_to_remove)  # Raises `KeyError: 'c'`

Removing Items

    # Always check that the item exists before trying
    # to remove it, so that no exception is raised
    key_to_remove = "c"
    d = {"a": 1, "b": 2}
    if key_to_remove in d:
        del d[key_to_remove]

Dicts as N-dimensional arrays

You can do N-dimensional arrays (such as 2-D arrays) in Python. The notation is a little cumbersome to set up (use of numpy and/or pandas is much better). You can use a dict as a simple container:

    matrix = {}
    matrix[1,2] = 5  # the "tuple" of 1,2 becomes the key

What is the order of a dict?

A for loop on a dictionary iterates over its keys by default. In Python 3.7 and later, the keys appear in the order in which the key-value pairs were originally inserted; in older versions the order was arbitrary. The methods dict.keys() and dict.values() return views of the keys or values explicitly. There's also an items(), which returns a view of (key, value) tuples and is the most efficient way to examine all the key/value data in the dictionary. All of these views can be passed to the sorted() function, or to list() when an actual list is needed.

Hash Operation Examples

    ## By default, iterating over a dict iterates over its keys.
    ## Note that the keys come out in insertion order (Python 3.7+).
    ## (here dict is {'a': 'alpha', 'g': 'gamma', 'o': 'omega'})
    for key in dict: print(key)
    ## prints a g o

    ## Exactly the same as above
    for key in dict.keys(): print(key)

    ## Get the .keys() view:
    print(dict.keys())  ## dict_keys(['a', 'g', 'o'])

    ## Likewise, there's a .values() view of values
    print(dict.values())  ## dict_values(['alpha', 'gamma', 'omega'])

    ## Common case -- loop over the keys in sorted order,
    ## accessing each key/value
    for key in sorted(dict.keys()):
        print(key, dict[key])

    ## .items() is the dict expressed as (key, value) tuples
    print(dict.items())  ## dict_items([('a', 'alpha'), ('g', 'gamma'), ('o', 'omega')])

    ## This loop syntax accesses the whole dict by looping
    ## over the .items() tuples, accessing one (key, value)
    ## pair on each iteration.
    for k, v in dict.items(): print(k, '>', v)
    ## a > alpha    g > gamma    o > omega

Dict/Hash

In general, a Dict/Hash provides a performance advantage: looking up a key takes roughly constant time on average, however many entries the dict holds. Strategy note: from a performance point of view, the dictionary is one of your greatest tools, and you should use it where you can as an easy way to organize data.

Incremental Development

When building a Python program, don't write the whole thing in one step. Instead, identify just a first milestone, e.g. "well, the first step is to extract the list of words." Write the code to get to that milestone, and just print your data structures at that point, and then you can do a sys.exit(0) so the program does not run ahead into its not-done parts. Once the milestone code is working, you can work on code for the next milestone. Being able to look at the printout of your variables at one state can help you think about how you need to transform those variables to get to the next state. Rapid prototyping with Python is very quick with this approach, allowing you to make a little change and run the program to see how it works.
Take advantage of that quick turnaround to build your program in little steps. You would use a similar strategy with AI guidance tools.

Introduction: Lab 4 - dictionaries

For Lab 4, you will create a dictionary of codons and amino acids. See lab for full starting code:

    import random

    # Input: empty dictionary
    # Return: populated dictionary
    def make_aminos(aminos):
        # this creates an empty dictionary
        aminos = { }  # empty dictionary
        aminos['AAA'] = 'K'
        aminos['AAC'] = 'N'
        aminos['AAG'] = 'K'
        aminos['AAT'] = 'N'
        aminos['ACA'] = 'T'
        :
        :

Dictionary

Part 1: You will need to complete the dictionary -- I provide 50-ish entries.

Starting Code

    def main():
        # Note, you may assume seq1, seq2 and seq3 are multiples of 3 (they are).
        seq1 = "ATGACCCGGGATGCATGGCCAATATAGATAGATAGTGCCCATTTGGGCCCATGCCATGCTAGC"

Part 2

    ##################### Beginning of "seq1"
    # This will be the growing amino acid sequence after
    # translating each codon
    protein1=""
    # This will be the growing sequence of new codons,
    # based on the original translated
    # amino acid sequence
    new_seq1=""

    # TODO: Part 2:
    # Write a loop (probably a while() loop) that pulls
    # 3 nucleotides at a time from seq1. Use the codon
    # and "aminos" dictionary to translate the codon
    # into the amino acid character. We did string manipulations
    # in Lab1
    # Append the amino acid to the "protein1" string.
    # Then call pick_a_codon( )
    # You will have to pass the "aminos" dictionary, and the 1 character
    # amino acid string to pick_a_codon( ).
    # Append the "new codon" to the growing string of "new_seq1"

    # LOOP GOES HERE.
    # Don't forget to initialize the loop variable to 0

    # After your loop
    print("seq1 ",seq1)
    print("new_seq1 ",new_seq1)
    print("protein1 ",protein1)

Code given in lab4_codon_R3.py

    # INPUT1: dictionary of amino acids CODON -> AMINO ACID
    # INPUT2: The single character amino acid (string) for which
    #         to generate a codon (randomly if more than 1)
    #         For example, for amino "H", codon could be "CAC" or "CAT"
    #         "CAC" or "CAT" will be selected at random.
    #
    # RETURNS: selected codon as string. Example: ATG
    # You probably don't have to modify this function.
    def pick_a_codon( aminos,AminoAcid):
        # This generates the list of possible codons from the dictionary
        codon_list = [key for key,value in aminos.items() if value == AminoAcid]
        #print("list",codon_list)
        count = len(codon_list)
        # randint returns a value in [0,count-1] inclusive.
        # if count == 2, then select_random could be (0,1),
        # which are exactly the valid indexes of the 2 codons,
        # so each codon is equally likely to be picked
        select_random = random.randint(0,count-1)
        new_codon=codon_list[select_random]
        return new_codon

Part 3

    # TODO: Part 3
    # Determine if the "protein1" is an open reading frame, meaning
    # search for the "stop" amino acid, that is stored as a "." (period)
    # character in the dictionary.
    # We searched for a string/character back in Lab1

    # Just so "stop_test" has a value that will print...
    stop_test=222
    print("seq1 is an open reading frame")
    print("seq1 NOT ORF. First stop detected in seq1 at position ",stop_test)
    ############################## END seq1

Sample code

    #create the dictionary of amino acids
    aminos = { }
    aminos = make_aminos( aminos )

    # Here is an example of how to get the first codon
    # You will probably want to comment these 2 lines out
    # But you don't have to.
    codon=seq1[0:3]
    print(f'for codon {codon} amino acid = {aminos[codon]}')

    # This is how I tested the pick_a_codon function
    # I simply give it the P amino acid 10 times in a row
    # and look at the codons it gives me
    #for i in range(10):
    #    new_codon = pick_a_codon(aminos,"P")
    #    print(new_codon)

Program that I give you -- Runs, but does not complete the lab:

If you run the starting program (without modifications), the output should be:

    for codon ATG amino acid = M
    seq1 ATGACCCGGGATGCATGGCCAATATAGATAGATAGTGCCCATTTGGGCCCATGCCATGCTAGC
    new_seq1
    protein1
    seq1 is an open reading frame
    seq1 NOT ORF. First stop detected in seq1 at position 222

Completed Program

Sample output for "seq1" with completed program (new_seq1 will vary from run to run because the codons are picked at random):

    for codon ATG amino acid = M
    seq1 ATGACCCGGGATGCATGGCCAATATAGATAGATAGTGCCCATTTGGGCCCATGCCATGCTAGC
    new_seq1 ATGACAAGAGACGCGTGGCCGATATGAATTGATTCAGCGCACCTCGGCCCATGTCACGCGAGT
    protein1 MTRDAWPI.IDSAHLGPCHAS
    seq1 NOT ORF. First stop detected in seq1 at position 8
    seq2 ATGGCTGAGGAGAGAGTCG.... [REDACTED]

End
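
The overall flow of Parts 2 and 3 (pull three nucleotides at a time, translate, pick a replacement codon, then search the protein for the "." stop character) might look roughly like the sketch below. It is one possible shape, not the lab's reference solution. The tiny codon table, the made-up sequence seq, and the helper name translate are assumptions for illustration only, and pick_a_codon is rewritten here with random.choice instead of the lab's randint version.

```python
import random

# Hypothetical, partial codon table for illustration only;
# the lab's make_aminos() builds the full dictionary.
aminos = {
    "ATG": "M",
    "ACC": "T", "ACA": "T",
    "ATA": "I", "ATT": "I",
    "TAG": ".", "TGA": ".",   # "." marks a stop codon, as in the lab
}

def pick_a_codon(aminos, amino_acid):
    """Return one codon, chosen at random, that encodes amino_acid."""
    codon_list = [codon for codon, aa in aminos.items() if aa == amino_acid]
    return random.choice(codon_list)

def translate(seq, aminos):
    """Translate seq codon by codon; also build a synonymous new sequence."""
    protein = ""
    new_seq = ""
    i = 0                          # loop variable starts at 0
    while i < len(seq):
        codon = seq[i:i + 3]       # pull 3 nucleotides at a time
        amino = aminos[codon]      # dict lookup: codon -> amino acid
        protein += amino
        new_seq += pick_a_codon(aminos, amino)
        i += 3
    return protein, new_seq

seq = "ATGACCATATAGATT"            # made-up sequence, length a multiple of 3
protein, new_seq = translate(seq, aminos)

stop = protein.find(".")           # -1 when no stop character is present
if stop == -1:
    print("seq is an open reading frame")
else:
    print("seq NOT ORF. First stop detected at position", stop)
```

Because every replacement codon encodes the same amino acid as the original, translating new_seq again should give back the same protein string, which makes a convenient self-check while developing the loop.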
