Saturday, January 5, 2019

Learning Python


0. Set up python env

    install homebrew on macos
    install python3 with homebrew: brew install python3
    install python modules with pip3, pip3 is included in python3: pip3 install requests

0.1 Trick: nice for testing out Python scripts on the command line

python << EOF
import sys
print(sys.version)
EOF

input([prompt]) #evaluates to a string equal to the user's input text. If prompt is not empty, it is printed without a trailing newline

str(), int(), float()

print() #prints an empty line

print([object [, object]*] [, sep=' '] [, end='\n'] [, file=sys.stdout] [, flush=False])

1. Basics

    Math operators:
        **    #(exponent)
        //      #(integer division/floored quotient, rounds towards negative infinity)
    Strings
        'Alice' + 'Bob'    #concatenations, 'AliceBob'
        'Alice' * 3          #string replication, 'AliceAliceAlice'

    Command line arguments:
        https://www.tutorialspoint.com/python/python_command_line_arguments.htm
        sys.argv
        getopt
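
A minimal sketch of how sys.argv and getopt fit together. The option letters (-v, -o), the helper name parse_args, and the file names are made up for illustration; a literal list stands in for sys.argv so the snippet runs anywhere.

```python
import getopt

def parse_args(argv):
    """Parse a sys.argv-style list; argv[0] is the script name."""
    # "v" is a flag without a value, "o:" is an option that takes a value
    opts, rest = getopt.getopt(argv[1:], "vo:", ["output="])
    settings = {"verbose": False, "output": None, "files": rest}
    for name, value in opts:
        if name == "-v":
            settings["verbose"] = True
        elif name in ("-o", "--output"):
            settings["output"] = value
    return settings

# In a real script you would call parse_args(sys.argv) after "import sys".
demo = parse_args(["script.py", "-v", "-o", "out.txt", "a.txt", "b.txt"])
print(demo)
```

Everything after the recognized options comes back as the list of remaining positional arguments.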

2. Flow Control

    comparison: ==, !=, <, >, <=, >=
    boolean: True/False, and, or, not
    after any math and comparison operators evaluate, Python evaluates the not operators first, then the and operators, and then the or operators.
    Branching and loops:
        if/elif/else
            if a<8:
                a = a*2
            elif a<10:
                a = a*8
            else:
                a = a-1
        general ternary form: value_true if <test> else value_false
            print('home') if is_home else print('not home')
        while
        break
        continue
        for i in range()
        range(start, stop, step)
    Ending a program early with sys.exit()
    the colon and the indentation design history:
        http://python-history.blogspot.com/2011/07/karin-dewar-indentation-and-colon.html
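
A small sketch of the points above: not/and/or precedence, range(start, stop, step), break/continue, and the ternary form. The variable names are arbitrary.

```python
# not binds tighter than and, which binds tighter than or,
# so this reads as ((not False) and False) or True
result = not False and False or True

picked = []
for i in range(0, 10, 2):   # start, stop (excluded), step
    if i == 2:
        continue            # skip this iteration
    if i == 8:
        break               # leave the loop early
    picked.append(i)

# ternary form: value_true if <test> else value_false
label = 'home' if len(picked) > 2 else 'not home'
print(result, picked, label)
```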

3. Functions

    def name([arg,... arg=value,... *arg, **kwarg]):
        suite
#order counts. Just as non-default arguments have to precede default arguments, so *args must come before **kwargs.
#positional arguments, keyword (named) arguments
#Python args and kwargs: Demystified
    return
    None    #absence of a value, the only value of the NoneType data type
     *args #collects any number of extra positional arguments into a tuple
     **kwargs #similar to *args, but collects extra keyword arguments into a dictionary
     keyword arguments:    #often used for optional parameters
        print('Hello', 'cats', 'dogs', sep=',', end='')
    Exception handling:
        def spam(divideBy):
            try:
                return 42 / divideBy
            except ZeroDivisionError:
                print('Error: Invalid argument.')
#Unpacking With the Asterisk Operators: * & **
# extract_list_body.py 
my_list = [1, 2, 3, 4, 5, 6] 
a, *b, c = my_list
# merging_lists.py 
my_first_list = [1, 2, 3] 
my_second_list = [4, 5, 6] 
my_merged_list = [*my_first_list, *my_second_list]
# merging_dicts.py 
my_first_dict = {"A": 1, "B": 2} 
my_second_dict = {"C": 3, "D": 4} 
my_merged_dict = {**my_first_dict, **my_second_dict}
# string_to_list.py 
a = [*"RealPython"]
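
To tie the pieces of this section together, here is a short sketch. The function name describe and its parameters are hypothetical; spam is adapted from the example above to return None instead of printing.

```python
def describe(first, second='x', *args, **kwargs):
    # extra positional arguments land in the tuple args,
    # extra keyword arguments land in the dict kwargs
    return first, second, args, kwargs

packed = describe(1, 2, 3, 4, color='red')

# * and ** also unpack at the call site
unpacked = describe(*(1, 2), **{'color': 'blue'})

def spam(divide_by):
    try:
        return 42 / divide_by
    except ZeroDivisionError:
        return None   # signal the invalid argument to the caller

# starred assignment: b takes whatever a and c leave over
a, *b, c = [1, 2, 3, 4, 5, 6]
print(packed, unpacked, spam(0), a, b, c)
```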

4. Scoping

    Python does not have block scoping. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).
    Local Scope: variables assigned in a called function are said to exist in that function's local scope
    Global Scope: variables assigned outside all functions are said to exist in the global scope
    Python scoping rules:
        Local variables cannot be used in the global scope
        local scopes cannot use variables in other local scopes
        global variables can be read from a local scope
        local and global variables can have the same name (inside the function, the local one shadows the global one)
    global statement to declare a variable as global in a function
    how to tell whether a variable is in local scope or global scope:
        if a variable is being used in global scope, then it is always a global variable.
        if there is a global statement for that variable in a function, it is a global variable.
        otherwise, if the variable is used in an assignment statement in the function, it is a local variable.
        But if the variable is not used in an assignment statement, it is a global variable
    in a function, a variable will either always be global or always be local.
    nonlocal statement for nested functions to access local variables of its enclosing function
    There is only one global scope, and it is created when your program begins
    a local scope is created whenever a function is called.
    for-loop scoping issue:
        the for-loop makes assignments to the variables in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.
        https://eli.thegreenplace.net/2015/the-scope-of-index-variables-in-pythons-for-loops/
        https://stackoverflow.com/questions/4198906/python-list-comprehension-rebind-names-even-after-scope-of-comprehension-is-thi
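
The scoping rules above can be checked with a small sketch. All names here (counter, shadow, bump, make_counter) are invented for the illustration.

```python
counter = 0                 # global scope

def read_global():
    return counter          # no assignment here, so this is the global

def shadow():
    counter = 99            # assignment without "global": a new local
    return counter

def bump():
    global counter          # now assignment targets the global
    counter += 1

def make_counter():
    count = 0
    def increment():
        nonlocal count      # rebinds the enclosing function's local
        count += 1
        return count
    return increment

shadow_result = shadow()    # the global is untouched by this call
bump()
inc = make_counter()
inc()
second = inc()

for i in range(3):
    pass
leaked = i                  # no block scope: i survives the loop
print(shadow_result, counter, second, leaked)
```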

5. Lists, tuples, mutable and immutable data type, passing references

    lists ['a', 'b', 'c'], []
    tuple ('a', 'b', 'c')
    slices:
        0-based indexing, http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
        Half-open intervals: https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html
        negative indexes (-1 the last index, -2 second-to-last index)
        spam[:2], spam[1:], spam[-3], spam[:]
    get a list's length with len()
    list concatenation and replication:
        [1, 2, 3] + ['a', 'b', 'c']    #[1, 2, 3, 'a', 'b', 'c']
        ['a', 'b', 'c'] * 2    #['a', 'b', 'c', 'a', 'b', 'c']
    del statement to delete values at an index in a list:
        del spam[2]
    for-loop with list:
        for i in [1, 3, 4, 5]:
        for i in range(len(spam)):
    in and not in operator:
        'howdy' in ['hello', 'howdy', 'hans']
        if name not in spam
    multiple assignment trick:
        size, color, disposition = spam    #spam is a list of 3 elements
    list methods,
        #l is a list name below
        l.index(value) #find the first appearance
        l.append(value), l.insert(index, value),
        l.remove(value) #remove the first appearance
        l.sort(key=str.lower, reverse=False) #default is ASCII order (uppercase sorts before lowercase); for alphabetical order, compare in lowercase with key=str.lower. reverse defaults to False.
        l.count(X) #return the number of occurrences of X in the list l
        l.clear() #remove all items from list l
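    Python variables always hold references to objects. This matters most for mutable data types such as lists or dictionaries: a change made through one reference is visible through every other reference to the same object. Values of immutable data types such as strings, integers, or tuples cannot change in place, so variables behave as if they stored the value itself.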
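
A quick sketch of what passing references means in practice. The names cheese, independent, and append_item are illustrative only.

```python
import copy

spam = [1, 2, 3]
cheese = spam                   # second name for the same list object
cheese[0] = 'changed'           # visible through spam as well

independent = copy.copy(spam)   # shallow copy: a separate list
independent[1] = 'only here'    # does not affect spam

def append_item(items):
    items.append('added')       # mutates the caller's list in place

append_item(spam)
print(spam, cheese is spam, independent)
```

For lists that contain other lists, copy.deepcopy() copies the inner lists too.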

6. dictionary {'a':4, 'b':4, 'c':5}

    keys(), values(), items() #return list-like view objects, not true lists
    del dict[key]
    color in somedict    #shorthand to check whether a key exists in the dict
    get(key, fallback value)
    setdefault(KEY, default value)
    pretty printing: pprint.pprint(), pprint.pformat()
    dict = {}
    dict['a'] = ...

    Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().
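
A short sketch of get(), setdefault(), and the key rules described above. The sample message and the grid dictionary are made up.

```python
message = 'hello world'
counts = {}
for ch in message:
    counts.setdefault(ch, 0)   # insert the key with 0 only if it is missing
    counts[ch] += 1

missing = counts.get('z', 0)   # fallback value instead of a KeyError

grid = {(0, 0): 'origin'}      # a tuple of numbers is a valid key
list_key_allowed = True
try:
    grid[[1, 2]] = 'x'         # a list is mutable, so it cannot be a key
except TypeError:
    list_key_allowed = False

print(counts['l'], missing, list_key_allowed)
```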

7. Strings

    Escape character: \n    #commonly referred to as a singular escape character
    raw strings:
        r'that is carol\s cat'    #escape sequences are not processed; the backslash and the character after it are both kept in the string
    Concatenate strings:
        a) "string0" "string1" "string2" #this form may span multiple lines if parenthesized
        b) "string0" "string1" + "string2"
    string formatting: '{0}, {1}, {2:.2f}'.format(42, 'spam', 1/3.0)
    string formatting since python3.6 (string interpolation syntax): f'{self.id} ({self.title})'
    multiple line strings with triple quotes: """ 
    multiline comments: '''
    indexing and slicing strings, same as lists
    upper(), lower(), isupper(), islower()    #do not change string, but return a new string
    isX(), isalpha(), isalnum(), isdecimal(), isspace(), istitle()
    startswith() and endswith()
    join(), split(sep=None, maxsplit=-1):
        https://note.nkmk.me/en/python-split-rsplit-splitlines-re/
        ', '.join(['cats', 'rats', 'bats'])
        'MyABCnameABCisABCSimon'.split('ABC')    #resulting list does not include 'ABC'
        #if the argument is omitted, the string is split on whitespace. Whitespace includes spaces, newlines \n and tabs \t, and consecutive whitespace is treated as a single separator.
    strip([chars]), rstrip(), lstrip() #remove leading and/or trailing whitespace
    #The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped
    #'www.example.com'.strip('cmowz.') evaluates to 'example'
    #non-empty strings evaluate to True
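
The string methods above, run on small inputs. The sample strings other than those already in the notes are arbitrary.

```python
joined = ', '.join(['cats', 'rats', 'bats'])
parts = 'MyABCnameABCisABCSimon'.split('ABC')
words = ' a  b\n\tc '.split()                # no argument: split on runs of whitespace
limited = 'a,b,c,d'.split(',', maxsplit=1)   # at most one split
stripped = 'www.example.com'.strip('cmowz.') # chars is a set, not a prefix/suffix
formatted = '{0}, {1}, {2:.2f}'.format(42, 'spam', 1/3.0)
name = 'Al'
interpolated = f'{name} has {len(name)} letters'
print(joined, parts, words, limited, stripped, formatted, interpolated)
```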

8. Regular expression

    re.search vs re.match: re.match only matches at the beginning of the string, while re.search scans the whole string for the first match.
    pattern = r'lsakdfjldfj'
    mo = re.search(pattern, 'text to search')
    mo.group(), mo.group(0)    #return entire matched text
    mo.group(1),
    mo.groups()    #return a tuple of all the group matches

    |    #matching multiple groups, e.g. r'Bat(man|mobile|copter|bat)'
    ?    #optional matching, matching 0 or 1 of the group preceding ?
    .    #match 1 of any character, except for newline(unless pass re.DOTALL options)
    *    #match 0 or more with *
    +    #match 1 or more with +
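    {3}    #match specific repetitions of the group preceding {}
    {3,5}    #match 3, 4, or 5 repetitions
    {,5}    #match 0 to 5 repetitions
    {3,}    #match 3 or more repetitions
    {3,5}? or *? or +?    #non-greedy matching
    ^    #match start of string (start of each line with re.MULTILINE)
    $    #match end of string (end of each line with re.MULTILINE)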
    .* #match everything greedy way
    [aeiouAEIOU]    # character class
    [^aeiouAEIOU]    #negative character class, will match anything not in the character class
 
    re.findall(pattern, text):
        #if there are groups in the regular expression, then findall() will return a list of tuples. Each tuple represents a found match, and its items are the matched strings for each group in the regex.
        pattern = r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)'
        re.findall(pattern, 'Cell: 415-555-9999 Work: 212-555-0000')
        #result is [('415', '555', '9999'), ('212', '555', '0000')]
    Substituting strings with sub():
        pattern = r'Agent (\w)\w*'
        re.sub(pattern, r'\1****', 'Agent Alice told Agent Carol that Agent Eve was a double agent')
        #above returns 'A**** told C**** that E**** was a double agent'
    ignore whitespace and comments inside regex string:
        regex = re.compile(r'''(
            (\d{3})    #alsdkfj
            (\s+)        #laksdjf
            )''', re.VERBOSE)
    re.search(r'lskdj', string, re.IGNORECASE | re.DOTALL | re.VERBOSE)
    re.split(pattern, string [, maxsplit=0]) #more powerful than str.split
    #re.split(r'\d+', s_nums)
    #re.split('[-+#]', s_marks)
    #re.split('XXX|YYY|ZZZ', s_strs)
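
A compact sketch that runs the main calls from this section on sample text. The angle-bracket string used for greedy vs non-greedy matching is made up.

```python
import re

text = 'Cell: 415-555-9999 Work: 212-555-0000'
phone = re.compile(r'(\d{3})-(\d{3})-(\d{4})')

at_start = phone.match(text)    # None: the text does not start with digits
mo = phone.search(text)         # scans the string for the first match
whole = mo.group()              # entire matched text
groups = mo.groups()            # tuple of all group matches
found = phone.findall(text)     # list of tuples because the regex has groups

greedy = re.search(r'<.*>', '<a> <b>').group()
lazy = re.search(r'<.*?>', '<a> <b>').group()

censored = re.sub(r'Agent (\w)\w*', r'\1****', 'Agent Alice told Agent Carol')
pieces = re.split(r'[-+#]', 'a-b+c#d')
print(whole, groups, found, greedy, lazy, censored, pieces)
```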

9. Reading and writing files

    system folder structures:
        os.path.join('user', 'bin', 'spam')
        os.getcwd()
        os.chdir()
        absolute path vs relative path
        . and ..
        os.makedirs()    #will create any necessary folders in order to ensure that the full path exists
        os.path.abspath(path)
        os.path.isabs(path)
        os.path.relpath(path, [start])
        os.path.dirname(path)
        os.path.basename(path):
            /usr/bin/python3    #dirname: /usr/bin   basename: python3
        os.path.split()    #return a tuple with two strings, (dirname, basename)
        os.path.sep    #system folder separator
        os.path.getsize(path)    #size in bytes
        os.listdir(path)    #list of filename strings for each file in the path argument
        os.path.exists(path)
        os.path.isfile(path)
        os.path.isdir(path)
    File reading and writing:
        open(path, mode)    #mode: 'r' (read, default), 'w' (write), 'a' (append), 'x' (create, fails if the file exists); add 'b' for binary, e.g. 'wb'. Returns a File object
        File.read()    #entire contents of a file as a string value
        File.readlines()    #list of string values from the file, one string for each line of text
        File.write()
        File.close()
    Two methods to explicitly close files after use:
        1) try/finally
        myfile = open(r'some_file_path', 'w')
        try:
            ...use myfile...
        finally:
            myfile.close()
        2) with statement
        with open(r'some_file_path', 'w') as a, open(r'another_file_path', 'w') as b:
            ...process a and b (auto-closed on suite exit)...
    Shelve module    #store any type of data with string keys, dictionary like
        import shelve
        shelfFile = shelve.open('mydata')
        cats = ['Zophie', 'Pooka', 'Simon']
        shelfFile['cats'] = cats
        shelfFile.close()
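
A runnable sketch of the os.path helpers and the with statement, using a temporary directory so nothing is left behind. The folder and file names are placeholders.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    folder = os.path.join(tmp, 'a', 'b')
    os.makedirs(folder)                 # creates both a and a/b
    path = os.path.join(folder, 'notes.txt')

    with open(path, 'w') as f:          # file is closed when the block exits
        f.write('line one\nline two\n')

    with open(path) as f:
        lines = f.readlines()           # one string per line, newline kept
    closed_after_with = f.closed

    base = os.path.basename(path)
    parent = os.path.dirname(path)
    listing = os.listdir(folder)
    was_file = os.path.isfile(path)

gone = not os.path.exists(path)         # the temporary directory was removed
print(lines, base, listing, was_file, gone)
```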

10. Organizing files

    Zip files:
        import zipfile
        zf = zipfile.ZipFile('zip_file_name')
        zf.namelist()
        zf.read('file_name')        #return single file content in bytes datatype
        zinfo = zf.getinfo('file_name')
        zinfo.file_size
        zinfo.compress_size
        zinfo.comment        #return bytes datatype
        extract files
    Extracting from ZIP files:
        zf = zipfile.ZipFile('zip_file_name')
        zf.extractall([destination_dir_path])    #extract all files in the zip file to current or destination dir
        zf.extract('single_file_name', 'destination_dir_path')    #extract a single file from zip file
    Creating and adding to ZIP files:
        newzip = zipfile.ZipFile('new.zip', 'w')    #mode 'w' (write) erases all existing content; mode 'a' (append) adds files to an existing zip
        newzip.write('file_to_add_to_zip', compress_type=zipfile.ZIP_DEFLATED)
        newzip.close()
    Bytes type to string
        b'xxx'.decode()        #python 3 default using utf-8 encoding
        b'xxx'.decode(encoding='utf-8')    #specify the encoding to use
    String decode:
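        'xxxx'.decode('bz2')    #Python 2 only: treat the string as bz2-compressed content and decompress it; in Python 3 use bz2.decompress(data) on a bytes object
    Linux file command to identify a file's type:
        open('f', 'wb').write(data)    #first write the bytes to a file named f
        file f    #shell command: identify the file type and its related program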
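
The zipfile calls above, exercised end to end in a temporary directory. The file names (hello.txt, new.zip, out) are placeholders for this sketch.

```python
import os
import tempfile
import zipfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, 'hello.txt')
    with open(src, 'w') as f:
        f.write('hello zip')

    zip_path = os.path.join(tmp, 'new.zip')
    with zipfile.ZipFile(zip_path, 'w') as zf:      # 'w' starts a fresh archive
        zf.write(src, arcname='hello.txt', compress_type=zipfile.ZIP_DEFLATED)

    with zipfile.ZipFile(zip_path) as zf:           # default mode is read
        names = zf.namelist()
        raw = zf.read('hello.txt')                  # bytes, not str
        original_size = zf.getinfo('hello.txt').file_size
        out_dir = os.path.join(tmp, 'out')
        zf.extractall(out_dir)
        extracted = os.path.isfile(os.path.join(out_dir, 'hello.txt'))

text = raw.decode('utf-8')                          # bytes to str
print(names, original_size, extracted, text)
```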
   

11. CSV Files

      import csv

      csvreader = csv.reader(file_object, delimiter='\t')
      csvwriter = csv.writer(file_object, delimiter='\t')
      #For large CSV files, you’ll want to use the reader object in a for loop. This avoids loading the entire file into memory at once.
      for line in csvreader:
            csvwriter.writerow(line)


      csvreader = csv.DictReader(file_object, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
     #Create an object that operates like a regular reader but maps the information in each row to a dict whose keys are given by the optional fieldnames parameter. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames.
     #All other optional or keyword arguments are passed to the underlying reader instance. this means all reader/writer arguments are accepted. same goes for DictWriter
      fieldnames = ['first name', 'last name']
      csvwriter = csv.DictWriter(file_object, fieldnames=fieldnames, delimiter='\t')
      csvwriter.writeheader()
      for line in csvreader:
          del line['email']
          csvwriter.writerow(line)
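
A self-contained version of the DictReader/DictWriter loop above, using io.StringIO in place of real files. The sample row and the email address are invented.

```python
import csv
import io

source = io.StringIO(
    'first name\tlast name\temail\n'
    'Ada\tLovelace\tada@example.com\n'
)
target = io.StringIO()

# the first row of the input supplies the field names
reader = csv.DictReader(source, delimiter='\t')
writer = csv.DictWriter(target, fieldnames=['first name', 'last name'],
                        delimiter='\t')
writer.writeheader()
for row in reader:
    del row['email']        # drop the column we do not want to copy
    writer.writerow(row)

output_lines = target.getvalue().splitlines()
print(output_lines)
```

With real files, open them with newline='' as the csv documentation recommends.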

12. SQLITE database

      tutorials: A SQLite Tutorial with Python (a great, comprehensive tutorial)
      video tutorials: Python SQLite Tutorial: Complete Overview - Creating a Database, Table, and Running Queries
      import sqlite3
      conn = sqlite3.connect('/dir_path/xxx.db') #db in files
      conn = sqlite3.connect(':memory:') #every time run script, the db start from fresh in memory
      cur = conn.cursor()
      cur.execute("""CREATE table employees(
                                  first TEXT,
                                  last TEXT,
                                  pay INTEGER
                                  )""")
      #two ways with placeholders for parameterized queries
      cur.execute("INSERT INTO employees VALUES (?, ?, ?)", ("John", "Doe", 80000))
      cur.execute("INSERT INTO employees VALUES (:first, :last, :pay)", {"first": "John", "last": "Doe", "pay": 80000})
      cur.execute("SELECT * FROM employees WHERE last=?", ("Doe", ))
      cur.execute("SELECT * FROM employees WHERE last=:last", {"last": "Doe"})

    "Parameterized queries are an important feature of essentially all database interfaces to modern high level programming languages such as the sqlite3 module in Python. This type of query serves to improve the efficiency of queries that are repeated several times. Perhaps more important, they also sanitize inputs that take the place of the ? placeholders which are passed in during the call to the execute method of the cursor object to prevent nefarious inputs leading to SQL injection.

    I feel compelled to give one additional piece of advice as a student of software craftsmanship. When you find yourself doing multiple database manipulations (INSERTs in this case) in order to accomplish what is actually one cumulative task (ie, creating an order) it is best to wrap the subtasks (creating customer, order, then line items) into a single database transaction so you can either commit on success or rollback if an error occurs along the way."

      #two ways to commit, or rollback if exception happens
      #a: explicit writing try statement
      try:
          codd_id = create_customer(conn, 'Edgar', 'Codd')
          codd_order = create_order(conn, codd_id, '1969-01-12')
          codd_li = create_lineitem(conn, codd_order, 4, 1, 16.99)
          # commit the statements
          conn.commit()
      except:
          # rollback all database actions since last commit
          conn.rollback()
          raise RuntimeError("Uh oh, an error occurred ...") 
      #b: using with statement and context manager
      with conn:
          conn.execute(xxxx)
       #automatically commit if no exception, otherwise will rollback
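
A runnable sketch of both placeholder styles and of the commit/rollback behaviour of "with conn", against an in-memory database. The rows and the simulated failure are made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE employees (first TEXT, last TEXT, pay INTEGER)')

with conn:  # commits when the block finishes without an exception
    conn.execute('INSERT INTO employees VALUES (?, ?, ?)',
                 ('John', 'Doe', 80000))
    conn.execute('INSERT INTO employees VALUES (:first, :last, :pay)',
                 {'first': 'Jane', 'last': 'Doe', 'pay': 90000})

try:
    with conn:  # rolls back because the block raises
        conn.execute('INSERT INTO employees VALUES (?, ?, ?)',
                     ('Temp', 'Row', 1))
        raise ValueError('simulated failure')
except ValueError:
    pass

rows = conn.execute(
    'SELECT first, pay FROM employees WHERE last = ? ORDER BY pay',
    ('Doe',)).fetchall()
total = conn.execute('SELECT COUNT(*) FROM employees').fetchone()[0]
conn.close()
print(rows, total)
```

Note that "with conn" manages the transaction only; it does not close the connection.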

13. XML with xml.etree.ElementTree

    XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two objects for this purpose – ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading, writing, finding interesting elements) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements is done on the Element level.

The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.

  • Tag It is a string representing the type of data being stored
  • Attributes Consists of a number of attributes stored as dictionaries
  • Text String A text string having information that needs to be displayed
  • Tail String Can also have tail strings if necessary
  • Child Elements Consists of a number of  child elements stored as sequences
ElementTree is a class that wraps the element structure and allows conversion to and from XML.

To create element or subelement:
    import xml.etree.ElementTree as ET
    # build a tree structure
    root = ET.Element("html")
    head = ET.SubElement(root, "head")
    title = ET.SubElement(head, "title")
    title.text = "Page Title"
    body = ET.SubElement(root, "body")
    body.set("bgcolor", "#ffffff")
    body.text = "Hello, World!"
    # wrap it in an ElementTree instance, and save as XML
    tree = ET.ElementTree(root)
    tree.write("page.xhtml")
When creating a new element, you can pass in element attributes using keyword arguments. For example:

    elem = ET.Element("tag", first="1", second="2")

Attribute:
The Element type provides shortcuts for attrib.get, attrib.keys, and attrib.items. There’s also a set method, to set the value of an element attribute:

from xml.etree.ElementTree import Element

elem = Element("tag", first="1", second="2")

# print 'first' attribute
print(elem.attrib.get("first"))
# same, using shortcut
print(elem.get("first"))

# print list of keys (using shortcuts)
print(elem.keys())
print(elem.items())

# the 'third' attribute doesn't exist
print(elem.get("third"))
print(elem.get("third", "default"))

# add the attribute and try again
elem.set("third", "3")
print(elem.get("third", "default"))

Search for subelement:

  • find(pattern) returns the first subelement that matches the given pattern, or None if there is no matching element.
  • findtext(pattern) returns the value of the text attribute for the first subelement that matches the given pattern. If there is no matching element, this method returns None.
  • findall(pattern) returns a list (or another iterable object) of all subelements that match the given pattern and are direct children of the current element. What makes it powerful is that it can take XPath expressions, e.g. root.findall("./genre/decade/movie/[year='1992']") will find the movie elements that have a child element year with the text '1992' (use [@year='1992'] to match an attribute instead). Python doc: supported XPath syntax
In ElementTree 1.2 and later, the pattern argument can either be a tag name, or a path expression. If a tag name is given, only direct subelements are checked. Path expressions can be used to search the entire subtree.
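
A small sketch of find, findtext, and findall on an inline document. The XML content below is invented to mirror the genre/decade/movie path used above.

```python
import xml.etree.ElementTree as ET

xml_text = '''<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="Batman Returns"><year>1992</year></movie>
      <movie title="Reservoir Dogs"><year>1992</year></movie>
      <movie title="Speed"><year>1994</year></movie>
    </decade>
  </genre>
</collection>'''
root = ET.fromstring(xml_text)

direct = root.findall('movie')        # tag name: direct children only
all_movies = root.findall('.//movie') # path expression: whole subtree

# [year='1992'] tests the text of a child element named year
from_1992 = [m.get('title')
             for m in root.findall("./genre/decade/movie[year='1992']")]

# [@title='Speed'] tests an attribute
speed = root.find(".//movie[@title='Speed']")
speed_year = speed.findtext('year')
no_match = root.find('nothing')       # None when nothing matches
print(len(all_movies), from_1992, speed_year)
```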

Read and Write XML files:
The Element type can be used to represent XML files in memory. The ElementTree wrapper class is used to read and write XML files.

To load an XML file into an Element structure, use the parse function:
                    tree = parse(filename)
                    elem = tree.getroot()
You can also pass in a file handle (or any object with a read method):
                    file = open(filename, "r")
                    tree = parse(file)
                    elem = tree.getroot()

The parse method returns an ElementTree object. To get the topmost element object, use the getroot method.

In recent versions of the ElementTree module, you can also use the file keyword argument to create a tree, and fill it with contents from a file in one operation:
          tree = ElementTree(file=filename)
          elem = tree.getroot()

To save an element tree back to disk, use the write method on the ElementTree class. Like the parse function, it takes either a filename or a file object (or any object with a write method):
          tree.write(outfile)

If you want to write an Element object hierarchy to disk, wrap it in an ElementTree instance:

          from xml.etree.ElementTree import Element, SubElement, ElementTree
          html = Element("html")
          body = SubElement(html, "body")
          ElementTree(html).write(outfile)

Note that the standard element writer creates a compact output. There is no built-in support for pretty printing or user-defined namespace prefixes in the current version, so the output may not always be suitable for human consumption (to the extent XML is suitable for human consumption, that is).

#read the python doc to understand the following parameters!
write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)

One way to produce nicer output is to add whitespace to the tree before saving it; since Python 3.9 the standard library provides xml.etree.ElementTree.indent(tree) for this.

To convert between XML and strings, you can use the XML, fromstring, and tostring helpers:

from xml.etree.ElementTree import XML, fromstring, tostring
elem = XML(text)
elem = fromstring(text) # same as XML(text)
text = tostring(elem)

When in doubt, print it out (print(ET.tostring(root, encoding='utf8').decode('utf8'))) - use this helpful print statement to view the entire XML document at once. It helps to check when editing, adding, or removing from an XML.

Appendix I Book References

References: online resources from the book: https://nostarch.com/automatestuffresources

Appendix II HTML Tags and CSS Selectors

    A good reference for HTML/CSS: https://www.htmldog.com/guides/css/intermediate/classid/
    class and id Selectors:
        In the CSS, a class selector is a name preceded by a full stop (“.”) and an ID selector is a name preceded by a hash character (“#”).
        The difference between an ID and a class is that an ID can be used to identify one element, whereas a class can be used to identify more than one.


Appendix III Python Concepts

    with statements:
        with expression [as variable] 
                [, expression [as variable]]*:
            suite
    The with statement wraps a nested block of code in a context manager, which can run block entry actions, and ensure that block exit actions are run whether exceptions are raised or not. with can be alternative to try/finally for exit actions, but only for objects having context managers.
    expression is assumed to return an object that supports the context management protocol. This object may also return a value that will be assigned to the name variable if the optional as clause is present. Classes may define custom context managers, and some built-in types such as files and threads provide context managers with exit actions that close files, release thread locks, etc.
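
A minimal sketch of a custom context manager, assuming a made-up class named Tracker that just records when its entry and exit actions run.

```python
class Tracker:
    """Hypothetical context manager that logs entry and exit."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append('enter')
        return self             # this value is bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append('exit')   # runs even when the block raises
        return False            # False: do not suppress the exception

tracker = Tracker()
propagated = False
try:
    with tracker as t:
        t.events.append('body')
        raise ValueError('boom')
except ValueError:
    propagated = True

print(tracker.events, propagated)
```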


    list comprehensions

Appendix IV Bitwise Operation

Python integers behave as if stored in two's complement with unlimited precision.
bitwise operators:
&, |, ^, ~, <<, >>
#right shift is an arithmetic shift instead of a logical shift, so the
#vacated leftmost bits are filled with the sign bit. This makes a difference
#when dealing with negative numbers.

Appendix V Type Conversion

Number to Strings:
hex(X), oct(X), bin(X), str(X)
chr(I) #integer to character
String to Number:
int(S [, base]), float(S)
ord(C) #character to Unicode value (including ASCII)
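
The conversion functions above, together with the bitwise operators and the arithmetic right shift from the bitwise appendix, can be checked with a few one-liners. The sample values are arbitrary.

```python
# number to string
as_hex, as_oct, as_bin, as_str = hex(255), oct(8), bin(5), str(3.5)

# string to number
from_hex = int('ff', 16)
from_bin = int('101', 2)
as_float = float('3.5')

# character <-> code point
code = ord('A')
char = chr(65)

# bitwise operators
and_, or_, xor_, inv = 5 & 3, 5 | 3, 5 ^ 3, ~5
shifted_left = 1 << 10

# arithmetic right shift keeps the sign and rounds towards negative
# infinity, just like floor division by 2
neg_shift = -7 >> 1
neg_floor = -7 // 2

# masking to 8 bits first emulates a logical shift on a byte
logical = (-8 & 0xFF) >> 1
print(as_hex, as_oct, as_bin, neg_shift, logical)
```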
 

 


C Programming

Header Files and Includes https://cplusplus.com/forum/articles/10627/ https://stackoverflow.com/questions/2762568/c-c-include-header-file-or...