0. Set up Python env
install Homebrew on macOS
install python3 with Homebrew: brew install python3
install Python modules with pip3 (pip3 is included with python3): pip3 install requests
0.1 Trick: nice for testing out Python scripts on the command line
python << EOF
import sys
print(sys.version)
EOF
input([prompt]) #evaluates to a string equal to the user's input text. If prompt is not empty, it is printed first without a trailing newline
str(), int(), float()
print() #prints an empty line
print([object [, object]*] [, sep=' '] [, end='\n'] [, file=sys.stdout] [, flush=False])
1. Basics
Math operators:
** #(exponent)
// #(integer division/floored quotient, rounds towards negative infinity)
Strings
'Alice' + 'Bob' #concatenations, 'AliceBob'
'Alice' * 3 #string replication, 'AliceAliceAlice'
Command line arguments:
https://www.tutorialspoint.com/python/python_command_line_arguments.htm
sys.argv #list of command line arguments; sys.argv[0] is the script name
getopt
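A minimal sketch of how sys.argv and getopt fit together. The option names (-h/--help, -o/--output) and the helper parse_args are made up for illustration; in a real script the list passed in would be sys.argv[1:].

```python
import getopt


def parse_args(argv):
    """Parse hypothetical -h/--help and -o/--output options from a list of arguments."""
    # "ho:" means -h takes no value and -o requires one ("output=" likewise).
    opts, rest = getopt.getopt(argv, "ho:", ["help", "output="])
    settings = {"help": False, "output": None}
    for flag, value in opts:
        if flag in ("-h", "--help"):
            settings["help"] = True
        elif flag in ("-o", "--output"):
            settings["output"] = value
    # rest holds the remaining positional arguments.
    return settings, rest


# In a script this would be parse_args(sys.argv[1:]); sys.argv[0] is the script name.
settings, rest = parse_args(["-o", "out.txt", "in.txt"])
print(settings, rest)
```

getopt stops at the first non-option argument, so options placed after in.txt would end up in rest.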
2. Flow Control
comparison: ==, !=, <, >, <=, >=
boolean: True/False, and, or, not
after any math and comparison operators evaluate, Python evaluates the not operators first, then the and operators, and then the or operators.
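A small sketch of that evaluation order. The variable names are only illustrative.

```python
# "and" binds tighter than "or", so this parses as True or (False and False).
implicit = True or False and False
forced = (True or False) and False

# "not" binds tighter than "and": (not False) and False.
negated = not False and False

# Math and comparisons are evaluated before any boolean operator.
check = 2 + 2 == 4 and not 2 + 2 == 5

print(implicit, forced, negated, check)
```

Parentheses are the simplest way to make the intended grouping explicit.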
Loops:
if/elif/else
if a<8:
    a = a*2
elif a<10:
    a = a*8
else:
    a = a-1
general ternary form: value_true if <test> else value_false
print('home') if is_home else print('not home')
while
break
continue
for i in range()
range(start, stop, step)
Ending a program early with sys.exit()
the colon and the indentation design history:
http://python-history.blogspot.com/2011/07/karin-dewar-indentation-and-colon.html
3. Functions
def name([arg,... arg=value,... *arg, **kwarg]):
    suite
#order counts. Just as non-default arguments have to precede default arguments, so *args must come before **kwargs.
#positional arguments, keyword (named) arguments
#Python args and kwargs: Demystified
return
None #absence of a value, the only value of the NoneType data type
*args #collects a varying number of positional arguments into a tuple
**kwargs #similar to *args, but collects keyword arguments into a dictionary
keyword arguments: #often used for optional parameters
print('Hello', 'cats', 'dogs', sep=',', end='')
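A short sketch of how the four kinds of parameters receive their values. The function describe and its parameter names are hypothetical.

```python
def describe(first, second="default", *args, **kwargs):
    """Return each kind of argument so the binding is visible."""
    return first, second, args, kwargs


# Only the required positional argument is given.
only_required = describe(1)

# Extra positionals land in args (a tuple), extra keywords in kwargs (a dict).
everything = describe(1, 2, 3, 4, color="red")

# A function that ends without an explicit return statement returns None.
def no_return():
    pass

print(only_required, everything, no_return())
```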
Exception handling:
def spam(divideBy):
    try:
        return 42 / divideBy
    except ZeroDivisionError:
        print('Error: Invalid argument.')
#Unpacking With the Asterisk Operators: * & **
# extract_list_body.py
my_list = [1, 2, 3, 4, 5, 6]
a, *b, c = my_list #a is 1, b is [2, 3, 4, 5], c is 6
# merging_lists.py
my_first_list = [1, 2, 3]
my_second_list = [4, 5, 6]
my_merged_list = [*my_first_list, *my_second_list] #[1, 2, 3, 4, 5, 6]
# merging_dicts.py
my_first_dict = {"A": 1, "B": 2}
my_second_dict = {"C": 3, "D": 4}
my_merged_dict = {**my_first_dict, **my_second_dict} #{"A": 1, "B": 2, "C": 3, "D": 4}
# string_to_list.py
a = [*"RealPython"] #['R', 'e', 'a', 'l', 'P', 'y', 't', 'h', 'o', 'n']
4. Scoping
Python does not have block scoping. Python does not have nested lexical scopes below the level of a function, unlike some other languages (C and its progeny, for example).
Local Scope: variables assigned in a called function are said to exist in that function's local scope
Global Scope: variables assigned outside all functions are said to exist in the global scope
Python scoping rules:
Local variables cannot be used in the global scope
local scopes cannot use variables in other local scopes
global variables can be read from a local scope
local and global variables can have the same name
global statement to declare a variable as global in a function
how to tell whether a variable is in local scope or global scope:
if a variable is being used in global scope, then it is always a global variable.
if there is a global statement for that variable in a function, it is a global variable.
otherwise, if the variable is used in an assignment statement in the function, it is a local variable.
But if the variable is not used in an assignment statement, it is a global variable
in a function, a variable will either always be global or always be local.
nonlocal statement for nested functions to access local variables of its enclosing function
There is only one global scope, and it is created when your program begins
a local scope is created whenever a function is called.
for-loop scoping issue:
the for-loop makes assignments to the variables in the target list. [...] Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop.
https://eli.thegreenplace.net/2015/the-scope-of-index-variables-in-pythons-for-loops/
https://stackoverflow.com/questions/4198906/python-list-comprehension-rebind-names-even-after-scope-of-comprehension-is-thi
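A quick sketch of the behaviour quoted above, assuming Python 3 (where a list comprehension has its own scope). The names are illustrative.

```python
# The loop variable survives the loop and keeps its last value.
for i in range(3):
    pass
print(i)

# With an empty sequence the target name is never assigned at all.
for never_bound in []:
    pass
try:
    never_bound
    was_bound = True
except NameError:
    was_bound = False

# In Python 3 the comprehension variable does not leak into the enclosing scope.
x = "outer"
squares = [x * x for x in range(3)]
print(was_bound, x, squares)
```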
5. Lists, tuples, mutable and immutable data type, passing references
lists ['a', 'b', 'c'], []
tuple ('a', 'b', 'c')
slices:
0-based indexing, http://python-history.blogspot.com/2013/10/why-python-uses-0-based-indexing.html
Half-open intervals: https://www.cs.utexas.edu/~EWD/transcriptions/EWD08xx/EWD831.html
negative indexes (-1 is the last index, -2 the second-to-last index)
spam[:2], spam[1:], spam[-3], spam[:]
get a list's length with len()
list concatenation and replication:
[1, 2, 3] + ['a', 'b', 'c'] #[1, 2, 3, 'a', 'b', 'c']
['a', 'b', 'c'] * 2 #['a', 'b', 'c', 'a', 'b', 'c']
del statement to delete values at an index in a list:
del spam[2]
for-loop with list:
for i in [1, 3, 4, 5]:
for i in range(len(spam)):
in and not in operator:
'howdy' in ['hello', 'howdy', 'hans']
if name not in spam:
multiple assignment trick:
size, color, disposition = spam #spam is a list of 3 elements
list methods,
#l is a list name below
l.index(value) #find the first appearance
l.append(value), l.insert(index, value),
l.remove(value) #remove the first appearance
l.sort(key=str.lower, reverse=True/False) #default is ASCII order; for alphabetical order, compare everything in lowercase with key=str.lower. reverse defaults to False.
l.count(X) #return the number of occurrences of X in the list l
l.clear() #remove all items from list l
Python variables hold references to values. This matters for mutable data types such as lists or dictionaries: two names can refer to the same object, so a change made through one name is visible through the other. Values of immutable data types such as strings, integers, or tuples cannot be changed in place, so their variables behave as if they stored the value itself.
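A small sketch of what passing references means in practice for lists. The names alias, shallow and add_item are hypothetical.

```python
import copy

spam = [1, 2, 3]
alias = spam          # both names refer to the same list object
alias.append(4)       # so the change is visible through spam too

shallow = copy.copy(spam)  # a new list with the same items
shallow.append(5)          # spam is not affected


def add_item(items):
    # The function receives a reference, so it mutates the caller's list.
    items.append("x")


add_item(spam)

# Rebinding an integer name never changes another name.
a = 10
b = a
b += 1

print(spam, shallow, a, b)
```

For nested lists, copy.deepcopy() is the usual way to also copy the inner lists.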
6. dictionary {'a':4, 'b':4, 'c':5}
keys(), values(), items() #return list-like view objects, not true lists
del dict[key]
color in somedict #short hand for check if key exists in dict
get(key, fallback value)
setdefault(KEY, default value)
pretty printing: pprint.pprint(), pprint.pformat()
dict = {}
dict['a'] = ...
Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by keys, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().
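A minimal sketch of get(), setdefault() and the key rules described above. The dictionaries counts and grid are made up for illustration.

```python
# Count characters: setdefault() inserts the key only if it is missing.
counts = {}
for ch in "hello":
    counts.setdefault(ch, 0)
    counts[ch] += 1

# get() returns the fallback instead of raising KeyError.
missing = counts.get("z", 0)

# Tuples of immutable values are hashable and work as keys.
grid = {(0, 0): "origin"}

# Lists are mutable, so using one as a key raises TypeError.
try:
    grid[[1, 2]] = "bad"
    list_key_allowed = True
except TypeError:
    list_key_allowed = False

print(counts, missing, "l" in counts, list_key_allowed)
```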
#two ways to commit, or roll back if an exception happens
#a: explicit try statement
try:
    codd_id = create_customer(con, 'Edgar', 'Codd')
    codd_order = create_order(con, codd_id, '1969-01-12')
    codd_li = create_lineitem(con, codd_order, 4, 1, 16.99)
    # commit the statements
    con.commit()
except Exception:
    # rollback all database actions since last commit
    con.rollback()
    raise RuntimeError("Uh oh, an error occurred ...")
#b: using with statement and context manager
with conn:
    conn.execute(xxxx)
    #automatically commits if no exception, otherwise rolls back
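A runnable sketch of option b against an in-memory database. The customers table and its NOT NULL column are assumptions chosen only to force an error in the second block.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (name TEXT NOT NULL)")

# No exception inside the block: the insert is committed on exit.
with conn:
    conn.execute("INSERT INTO customers VALUES (?)", ("Edgar",))

# The second insert violates NOT NULL, so the whole block is rolled back.
try:
    with conn:
        conn.execute("INSERT INTO customers VALUES (?)", ("Codd",))
        conn.execute("INSERT INTO customers VALUES (?)", (None,))
except sqlite3.IntegrityError:
    pass  # the context manager rolled back, then re-raised the error

names = [row[0] for row in conn.execute("SELECT name FROM customers")]
print(names)
conn.close()
```

Note that the with block only manages the transaction; it does not close the connection, hence the explicit conn.close().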
To create element or subelement:
import xml.etree.ElementTree as ET
# build a tree structure
root = ET.Element("html")
head = ET.SubElement(root, "head")
title = ET.SubElement(head, "title")
title.text = "Page Title"
body = ET.SubElement(root, "body")
body.set("bgcolor", "#ffffff")
body.text = "Hello, World!"
# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)
tree.write("page.xhtml")
When creating a new element, you can pass in element attributes using keyword arguments. The previous example is better written as:
elem = Element("tag", first="1", second="2")
Attribute:
The Element type provides shortcuts for attrib.get, attrib.keys, and attrib.items. There’s also a set method, to set the value of an element attribute:
from xml.etree.ElementTree import Element
elem = Element("tag", first="1", second="2")
# print 'first' attribute
print(elem.attrib.get("first"))
# same, using shortcut
print(elem.get("first"))
# print list of keys (using shortcuts)
print(elem.keys())
print(elem.items())
# the 'third' attribute doesn't exist
print(elem.get("third"))
print(elem.get("third", "default"))
# add the attribute and try again
elem.set("third", "3")
print(elem.get("third", "default"))
Search for subelement:
Read and Write XML files:
The Element type can be used to represent XML files in memory. The ElementTree wrapper class is used to read and write XML files.
To load an XML file into an Element structure, use the parse function:
tree = parse(filename)
elem = tree.getroot()
You can also pass in a file handle (or any object with a read method):
file = open(filename, "r")
tree = parse(file)
elem = tree.getroot()
The parse method returns an ElementTree object. To get the topmost element object, use the getroot method.
In recent versions of the ElementTree module, you can also use the file keyword argument to create a tree, and fill it with contents from a file in one operation:
tree = ElementTree(file=filename)
elem = tree.getroot()
To save an element tree back to disk, use the write method on the ElementTree class. Like the parse function, it takes either a filename or a file object (or any object with a write method):
tree.write(outfile)
If you want to write an Element object hierarchy to disk, wrap it in an ElementTree instance:
from xml.etree.ElementTree import Element, SubElement, ElementTree
html = Element("html")
body = SubElement(html, "body")
ElementTree(html).write(outfile)
Note that the standard element writer creates a compact output. There is no built-in support for pretty printing or user-defined namespace prefixes in the current version, so the output may not always be suitable for human consumption (to the extent XML is suitable for human consumption, that is).
#read the python doc to understand the following parameters!
write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)
One way to produce nicer output is to add whitespace to the tree before saving it; in Python 3.9+ xml.etree.ElementTree.indent(tree) does this in place (for older versions, see the indent function on the Element Library Functions page for an example).
To convert between XML and strings, you can use the XML, fromstring, and tostring helpers:
from xml.etree.ElementTree import XML, fromstring, tostring
elem = XML(text)
elem = fromstring(text) # same as XML(text)
text = tostring(elem)
When in doubt, print it out (print(ET.tostring(root, encoding='utf8').decode('utf8'))) - use this helpful print statement to view the entire XML document at once. It helps to check when editing, adding, or removing from an XML.
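A short sketch of parsing a string and searching for subelements with the standard library module. The XML snippet reuses the page structure built earlier in these notes; find() and iter() are the only search calls shown.

```python
import xml.etree.ElementTree as ET

text = (
    "<html><head><title>Page Title</title></head>"
    '<body bgcolor="#ffffff">Hello, World!</body></html>'
)
root = ET.fromstring(text)

# find() takes a simple path relative to the element it is called on.
title = root.find("head/title").text

# Attributes are read with get(), with an optional default.
bgcolor = root.find("body").get("bgcolor")
missing = root.find("body").get("class", "none")

# iter() walks the element and all descendants in document order.
tags = [el.tag for el in root.iter()]

# encoding="unicode" makes tostring() return str instead of bytes.
as_text = ET.tostring(root, encoding="unicode")
print(title, bgcolor, missing, tags)
```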
class and id Selectors:
In the CSS, a class selector is a name preceded by a full stop (“.”) and an ID selector is a name preceded by a hash character (“#”).
The difference between an ID and a class is that an ID can be used to identify one element, whereas a class can be used to identify more than one.
with expression [as variable]
        [, expression [as variable]]*:
    suite
The with statement wraps a nested block of code in a context manager, which can run block entry actions, and ensure that block exit actions are run whether exceptions are raised or not. with can be alternative to try/finally for exit actions, but only for objects having context managers.
expression is assumed to return an object that supports the context management protocol. This object may also return a value that will be assigned to the name variable if the optional as clause is present. Classes may define custom context managers, and some built-in types such as files and threads provide context managers with exit actions that close files, release thread locks, etc.:
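A minimal sketch of a custom context manager, assuming a made-up class Tracker that only records when its entry and exit actions run.

```python
class Tracker:
    """Hypothetical context manager that logs entry and exit."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this is the value bound by "as"

    def __exit__(self, exc_type, exc_value, traceback):
        # Runs whether or not the block raised.
        self.events.append("exit")
        return False  # False means: do not suppress the exception


tracker = Tracker()
try:
    with tracker as t:
        t.events.append("body")
        raise ValueError("boom")
except ValueError:
    pass

print(tracker.events)
```

The exit action runs even though the block raised, which is the same guarantee try/finally gives.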
list comprehensions
7. Strings
Escape character: \n #commonly referred to as a singular escape character
raw strings:
r'that is carol\'s cat' #all escape characters are ignored; a backslash and the character after it are both kept in the string
Concatenate strings:
a) "string0" "string1" "string2" #this form may span multiple lines if parenthesized
b) "string0" "string1" + "string2"
string formatting: '{0}, {1}, {2:.2f}'.format(42, 'spam', 1/3.0)
string formatting since python3.6 (string interpolation syntax): f'{self.id} ({self.title})'
multiple line strings with triple quotes: """
multiline comments: '''
indexing and slicing strings, same as lists
upper(), lower(), isupper(), islower() #do not change string, but return a new string
isX(): isalpha(), isalnum(), isdecimal(), isspace(), istitle()
startswith() and endswith()
join(), split(sep=None, maxsplit=-1):
https://note.nkmk.me/en/python-split-rsplit-splitlines-re/
', '.join(['cats', 'rats', 'bats'])
'MyABCnameABCisABCSimon'.split('ABC') #resulting list does not include 'ABC'
#if the argument is omitted, the string is split on whitespace. Whitespace includes spaces, newlines \n and tabs \t, and consecutive whitespace is treated as a single separator.
strip([chars]), rstrip(), lstrip() #remove whitespace
#The chars argument is a string specifying the set of characters to be removed. If omitted or None, the chars argument defaults to removing whitespace. The chars argument is not a prefix or suffix; rather, all combinations of its values are stripped
#'www.example.com'.strip('cmowz.') evaluates to 'example'
#non-empty strings evaluate to True
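The split(), join() and strip() notes above, as a runnable sketch (variable names are illustrative):

```python
# Splitting on an explicit separator drops the separator itself.
words = "MyABCnameABCisABCSimon".split("ABC")

# With no argument, runs of spaces, tabs and newlines act as one separator.
tokens = "  a \t b\n c ".split()

# maxsplit limits the number of splits performed.
head_tail = "a,b,c,d".split(",", maxsplit=1)

# strip(chars) removes any of the given characters from both ends,
# not a fixed prefix or suffix.
stripped = "www.example.com".strip("cmowz.")

joined = ", ".join(words)

# Empty strings are falsy, non-empty strings are truthy.
truthiness = (bool(""), bool(" "))

print(words, tokens, head_tail, stripped, joined, truthiness)
```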
8. Regular expression
What is the difference between re.search and re.match?
pattern = r'lsakdfjldfj'
mo = re.search(pattern, 'text to search')
mo.group(), mo.group(0) #return entire matched text
mo.group(1),
mo.groups() #return a tuple of all the group matches
| #matching multiple groups, e.g. r'Bat(man|mobile|copter|bat)'
? #optional matching, matching 0 or 1 of the group preceding ?
. #match 1 of any character, except for newline(unless pass re.DOTALL options)
* #match 0 or more with *
+ #match 1 or more with +
{3} #match a specific number of repetitions of the group preceding {}
{3,5} #match 3, 4, or 5 repetitions
{,5} #match 0 to 5 repetitions
{3,} #match 3 or more repetitions
{3,5}? or *? or +? #non-greedy matching
^ #match start of line
$ #match end of line
.* #match everything greedy way
[aeiouAEIOU] # character class
[^aeiouAEIOU] #negative character class, will match anything not in the character class
re.findall(pattern, text):
#if there are groups in the regular expression, then findall() will return a list of tuples. Each tuple represents a found match, and its items are the matched strings for each group in the regex.
pattern = r'(\d\d\d)-(\d\d\d)-(\d\d\d\d)'
re.findall(pattern, 'Cell: 415-555-9999 Work: 212-555-0000')
#result is [('415', '555', '9999'), ('212', '555', '0000')]
Substituting strings with sub():
pattern = r'Agent (\w)\w*'
re.sub(pattern, r'\1****', 'Agent Alice told Agent Carol that Agent Eve was a double agent')
#above returns 'A**** told C**** that E**** was a double agent'
ignore whitespace and comments inside regex string:
regex = re.compile(r'''(
(\d{3}) #alsdkfj
(\s+) #laksdjf
)''', re.VERBOSE)
re.search(r'lskdj', string, re.IGNORECASE | re.DOTALL | re.VERBOSE)
re.split(pattern, string [, maxsplit=0]) #more powerful than str.split
#re.split(r'\d+', s_nums)
#re.split('[-+#]', s_marks)
#re.split('XXX|YYY|ZZZ', s_strs)
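On the question at the top of this section: re.match only tries to match at the beginning of the string, while re.search scans the whole string for the first match. A small sketch with a made-up sample text:

```python
import re

text = "Call 415-555-9999 or 212-555-0000"
pattern = r"(\d{3})-(\d{3})-(\d{4})"

# match() anchors at position 0; the text starts with "Call", so no match.
at_start = re.match(pattern, text)

# search() finds the first occurrence anywhere in the string.
mo = re.search(pattern, text)
whole = mo.group()
parts = mo.groups()

# match() succeeds when the pattern really does start the string.
starts_with_call = re.match(r"Call", text) is not None

# findall() with groups returns one tuple per match.
all_numbers = re.findall(pattern, text)

print(at_start, whole, parts, starts_with_call, all_numbers)
```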
9. Reading and writing files
system folder structures:
os.path.join('user', 'bin', 'spam')
os.getcwd()
os.chdir()
absolute path vs relative path
. and ..
os.makedirs() #will create any necessary folders in order to ensure that the full path exists
os.path.abspath(path)
os.path.isabs(path)
os.path.relpath(path, [start])
os.path.dirname(path)
os.path.basename(path):
/usr/bin/python3 #dirname: /usr/bin/ basename: python3
os.path.split() #return a tuple with two strings, (dirname, basename)
os.path.sep #system folder separator
os.path.getsize(path) #size in bytes
os.listdir(path) #list of filename strings for each file in the path argument
os.path.exists(path)
os.path.isfile(path)
os.path.isdir(path)
File reading and writing:
open(path, [r(ead)/w(rite)/a(ppend)/x(exclusive create)/wb(write binary)]) #return a File object
File.read() #entire contents of a file as a string value
File.readlines() #list of string values from the file, one string for each line of text
File.write()
File.close()
Two methods to explicitly close files after use:
1) try/finally
myfile = open(r'some_file_path', 'w')
try:
    ...use myfile...
finally:
    myfile.close()
2) with statement
with open(r'some_file_path', 'w') as a, open(r'other_file_path', 'w') as b:
    ...process a and b (both auto-closed on suite exit)...
Shelve module #store any type of data with string keys, dictionary like
import shelve
shelfFile = shelve.open('mydata')
cats = ['Zophie', 'Pooka', 'Simon']
shelfFile['cats'] = cats
shelfFile.close()
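A sketch that exercises several of the os.path calls and the with-based file handling above. It works inside a temporary directory so nothing is left behind; the folder and file names are made up.

```python
import os
import tempfile

# Pure path manipulation: nothing has to exist on disk.
path = os.path.join("usr", "bin", "python3")
dirname, basename = os.path.split(path)

with tempfile.TemporaryDirectory() as tmp:
    nested = os.path.join(tmp, "a", "b")
    os.makedirs(nested)  # creates both "a" and "a/b"

    file_path = os.path.join(nested, "hello.txt")
    with open(file_path, "w") as f:  # closed automatically on block exit
        f.write("hello\nworld\n")

    with open(file_path) as f:
        lines = f.readlines()

    is_file = os.path.isfile(file_path)
    is_dir = os.path.isdir(nested)
    size = os.path.getsize(file_path)
    listing = os.listdir(nested)

# The temporary directory is removed once its with block ends.
gone = not os.path.exists(file_path)
print(dirname, basename, lines, listing, size, gone)
```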
10. Organizing files
Zip files:
import zipfile
zf = zipfile.ZipFile('zip_file_name')
zf.namelist()
zf.read('file_name') #return single file content in bytes datatype
zinfo = zf.getinfo('file_name')
zinfo.file_size
zinfo.compress_size
zinfo.comment #return bytes datatype
extract files
Extracting from ZIP files:
zf = zipfile.ZipFile('zip_file_name')
zf.extractall([destination_dir_path]) #extract all files in the zip file to current or destination dir
zf.extract('single_file_name', 'destination_dir_path') #extract a single file from zip file
Creating and adding to ZIP files:
newzip = zipfile.ZipFile('new.zip', 'w') #'w' (write) mode erases all existing content; 'a' (append) mode adds files to an existing zip
newzip.write('file_to_add_to_zip', compress_type=zipfile.ZIP_DEFLATED)
newzip.close()
Bytes type to string
b'xxx'.decode() #python 3 default using utf-8 encoding
b'xxx'.decode(encoding='utf-8') #specify the encoding to use
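A self-contained sketch of the zipfile calls above plus the bytes-to-string step. It builds the archive in an io.BytesIO buffer instead of a file on disk, which is an assumption made only to keep the example side-effect free; the member name hello.txt is made up.

```python
import io
import zipfile

buffer = io.BytesIO()

# Write mode creates a fresh archive; writestr() adds a member from a string.
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("hello.txt", "hello zip", compress_type=zipfile.ZIP_DEFLATED)

# Reopen the same buffer for reading.
with zipfile.ZipFile(buffer) as zf:
    names = zf.namelist()
    data = zf.read("hello.txt")      # bytes
    info = zf.getinfo("hello.txt")
    original_size = info.file_size   # uncompressed size in bytes

# Bytes become a string only after decoding.
text = data.decode("utf-8")
print(names, data, original_size, text)
```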
Bytes decompress:
bz2.decompress(b'xxxx') #treat bytes as bz2-compressed content and decompress it (needs import bz2; 'xxxx'.decode('bz2') worked only in Python 2, str has no decode() in Python 3)
Linux file command to identify:
open('f', 'wb').write(b'xxx') #write the bytes to a file named f
file f #identify the file type and its related program to open
11. CSV Files
import csv
csvreader = csv.reader(file_object, delimiter='\t')
csvwriter = csv.writer(file_object, delimiter='\t')
#For large CSV files, you’ll want to use the reader object in a for loop. This avoids loading the entire file into memory at once.
for line in csvreader:
    csvwriter.writerow(line)
csvreader = csv.DictReader(file_object, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds)
#Create an object that operates like a regular reader but maps the information in each row to a dict whose keys are given by the optional fieldnames parameter. If fieldnames is omitted, the values in the first row of file f will be used as the fieldnames.
#All other optional or keyword arguments are passed to the underlying reader instance. This means all reader/writer arguments are accepted. The same goes for DictWriter.
fieldnames = ['first name', 'last name']
csvwriter = csv.DictWriter(file_object, fieldnames=fieldnames, delimiter='\t')
csvwriter.writeheader()
for line in csvreader:
    del line['email']
    csvwriter.writerow(line)
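The DictReader/DictWriter loop above as a runnable sketch. It uses io.StringIO objects in place of real files, and the sample row is made up.

```python
import csv
import io

source = io.StringIO(
    "first name\tlast name\temail\n"
    "Ada\tLovelace\tada@example.com\n"
)
# fieldnames is omitted, so the first row supplies the dictionary keys.
reader = csv.DictReader(source, delimiter="\t")

out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["first name", "last name"], delimiter="\t")
writer.writeheader()

for row in reader:
    # DictWriter raises ValueError for keys not in fieldnames, so drop the extra column.
    del row["email"]
    writer.writerow(row)

result = out.getvalue()
print(repr(result))
```

The csv writer ends each row with \r\n by default, which is why real output files are usually opened with newline=''.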
12. SQLITE database
tutorials: A great tutorial (comprehensive), A SQLite Tutorial with Python
video tutorials: Python SQLite Tutorial: Complete Overview - Creating a Database, Table, and Running Queries
import sqlite3
conn = sqlite3.connect('/dir_path/xxx.db') #db in files
conn = sqlite3.connect(':memory:') #every time run script, the db start from fresh in memory
cur = conn.cursor()
cur.execute("""CREATE TABLE employees(
    first TEXT,
    last TEXT,
    pay INTEGER
    )""")
#two ways with placeholders for parameterized queries
cur.execute("INSERT INTO employees VALUES (?, ?, ?)", ("John", "Doe", 80000))
cur.execute("INSERT INTO employees VALUES (:first, :last, :pay)", {"first": "John", "last": "Doe", "pay": 80000})
cur.execute("SELECT * FROM employees WHERE last=?", ("Doe", ))
cur.execute("SELECT * FROM employees WHERE last=:last", {"last": "Doe"})
"Parameterized queries are an important feature of essentially all database interfaces to modern high level programming languages such as the sqlite3 module in Python. This type of query serves to improve the efficiency of queries that are repeated several times. Perhaps more important, they also sanitize inputs that take the place of the ? placeholders which are passed in during the call to the execute method of the cursor object to prevent nefarious inputs leading to SQL injection.
I feel compelled to give one additional piece of advice as a student of software craftsmanship. When you find yourself doing multiple database manipulations (INSERTs in this case) in order to accomplish what is actually one cumulative task (ie, creating an order) it is best to wrap the subtasks (creating customer, order, then line items) into a single database transaction so you can either commit on success or rollback if an error occurs along the way."
#two ways to commit, or roll back if an exception happens
#a: write the try statement explicitly
try:
    codd_id = create_customer(conn, 'Edgar', 'Codd')
    codd_order = create_order(conn, codd_id, '1969-01-12')
    codd_li = create_lineitem(conn, codd_order, 4, 1, 16.99)
    # commit the statements
    conn.commit()
except Exception:
    # roll back all database actions since the last commit
    conn.rollback()
    raise RuntimeError("Uh oh, an error occurred ...")
#b: use the connection as a context manager
with conn:
    conn.execute(xxxx)
#automatically commits if no exception is raised, otherwise rolls back (the connection itself is not closed)
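A small runnable sketch of the context-manager form, assuming an in-memory database and the employees table from above. The second row and the simulated ValueError are made up to show the rollback:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (first TEXT, last TEXT, pay INTEGER)")

# Successful block: the insert is committed when the with block exits.
with conn:
    conn.execute("INSERT INTO employees VALUES (?, ?, ?)", ("John", "Doe", 80000))

# Failing block: the insert is rolled back when the exception propagates.
try:
    with conn:
        conn.execute(
            "INSERT INTO employees VALUES (:first, :last, :pay)",
            {"first": "Jane", "last": "Roe", "pay": 90000},
        )
        raise ValueError("simulated failure")  # hypothetical error
except ValueError:
    pass

rows = conn.execute("SELECT first, last, pay FROM employees").fetchall()
print(rows)  # only the committed row remains
conn.close()
```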
13. XML with xml.etree.ElementTree
Elements and Element Trees
Python XML with ElementTree: Beginner's Guide
What characters do I need to escape in XML documents?: follow Welbog's and kjhughes's answers
XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. ET has two objects for this purpose – ElementTree represents the whole XML document as a tree, and Element represents a single node in this tree. Interactions with the whole document (reading, writing, finding interesting elements) are usually done on the ElementTree level. Interactions with a single XML element and its sub-elements are done on the Element level.
The Element type is a flexible container object, designed to store hierarchical data structures in memory. The type can be described as a cross between a list and a dictionary.
- Tag: a string identifying the type of data being stored
- Attributes: a number of attributes, stored in a Python dictionary
- Text string: a text string holding the information to be displayed
- Tail string: an optional string holding the text that follows the element's end tag
- Child elements: a number of child elements, stored in a Python sequence
ElementTree is a class that wraps the element structure and allows conversion to and from XML.
To create element or subelement:
import xml.etree.ElementTree as ET
# build a tree structure
root = ET.Element("html")
head = ET.SubElement(root, "head")
title = ET.SubElement(head, "title")
title.text = "Page Title"
body = ET.SubElement(root, "body")
body.set("bgcolor", "#ffffff")
body.text = "Hello, World!"
# wrap it in an ElementTree instance, and save as XML
tree = ET.ElementTree(root)
tree.write("page.xhtml")
When creating a new element, you can pass in element attributes using keyword arguments, instead of calling set afterwards:
elem = ET.Element("tag", first="1", second="2")
Attribute:
The Element type provides shortcuts for attrib.get, attrib.keys, and attrib.items. There’s also a set method, to set the value of an element attribute:
from xml.etree.ElementTree import Element
elem = Element("tag", first="1", second="2")
# print 'first' attribute
print(elem.attrib.get("first"))
# same, using shortcut
print(elem.get("first"))
# print list of keys (using shortcuts)
print(elem.keys())
print(elem.items())
# the 'third' attribute doesn't exist
print(elem.get("third"))
print(elem.get("third", "default"))
# add the attribute and try again
elem.set("third", "3")
print(elem.get("third", "default"))
Search for subelement:
- find(pattern) returns the first subelement that matches the given pattern, or None if there is no matching element.
- findtext(pattern) returns the value of the text attribute for the first subelement that matches the given pattern. If there is no matching element, this method returns None.
- findall(pattern) returns a list of all subelements that match the given pattern. A plain tag name matches only direct children of the current element; the powerful part is that the pattern can also be an XPath expression that reaches deeper. e.g. root.findall("./genre/decade/movie[year='1992']") finds every movie that has a child element year with the text '1992'; to match on an attribute instead, use the form [@year='1992']. python doc: supported XPath syntax
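A short sketch of the three search methods on a made-up document. The collection/genre/decade/movie layout and the movie titles are assumptions for illustration only:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
xml_text = """
<collection>
  <genre category="Action">
    <decade years="1990s">
      <movie title="Batman Returns"><year>1992</year></movie>
      <movie title="Reservoir Dogs"><year>1992</year></movie>
      <movie title="Jurassic Park"><year>1993</year></movie>
    </decade>
  </genre>
</collection>
"""
root = ET.fromstring(xml_text)

first_genre = root.find("genre")   # first matching direct child
missing = root.find("missing")     # None when nothing matches
first_year = root.findtext("./genre/decade/movie/year")  # text of first match

# Predicate on a child element's text.
movies_1992 = root.findall("./genre/decade/movie[year='1992']")
titles_1992 = [movie.get("title") for movie in movies_1992]

# Predicate on an attribute, searching all descendants.
by_attribute = root.findall(".//movie[@title='Jurassic Park']")

print(first_genre.get("category"), missing, first_year)
print(titles_1992)
print(len(by_attribute))
```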
Read and Write XML files:
The Element type can be used to represent XML files in memory. The ElementTree wrapper class is used to read and write XML files.
To load an XML file into an Element structure, use the parse function:
tree = ET.parse(filename)
elem = tree.getroot()
You can also pass in a file handle (or any object with a read method):
file = open(filename, "r")
tree = ET.parse(file)
elem = tree.getroot()
The parse method returns an ElementTree object. To get the topmost element object, use the getroot method.
In recent versions of the ElementTree module, you can also use the file keyword argument to create a tree, and fill it with contents from a file in one operation:
tree = ET.ElementTree(file=filename)
elem = tree.getroot()
To save an element tree back to disk, use the write method on the ElementTree class. Like the parse function, it takes either a filename or a file object (or any object with a write method):
tree.write(outfile)
If you want to write an Element object hierarchy to disk, wrap it in an ElementTree instance:
from xml.etree.ElementTree import Element, SubElement, ElementTree
html = Element("html")
body = SubElement(html, "body")
ElementTree(html).write(outfile)
Note that the standard element writer creates a compact output. The write method does no pretty printing by itself, so the output may not always be suitable for human consumption (to the extent XML is suitable for human consumption, that is). User-defined namespace prefixes can be set with ET.register_namespace.
#read the python doc to understand the following parameters!
write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True)
One way to produce nicer output is to add whitespace to the tree before saving it. Since Python 3.9 the built-in ET.indent function does this; for older versions, see the indent function on the Element Library Functions page for an example.
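A minimal sketch of adding whitespace before serializing, assuming Python 3.9 or later (where ET.indent is available). The tiny html tree is just an illustration:

```python
import xml.etree.ElementTree as ET

root = ET.Element("html")
body = ET.SubElement(root, "body", bgcolor="#ffffff")
body.text = "Hello, World!"

# Compact output: everything on one line.
compact = ET.tostring(root, encoding="unicode")

# ET.indent (Python 3.9+) modifies the tree in place by inserting whitespace.
ET.indent(root, space="  ")
pretty = ET.tostring(root, encoding="unicode")

print(compact)
print(pretty)
```

Note that ET.indent changes the text and tail strings of the elements, so it is best applied just before writing.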
To convert between XML and strings, you can use the XML, fromstring, and tostring helpers:
from xml.etree.ElementTree import XML, fromstring, tostring
elem = XML(text)
elem = fromstring(text) # same as XML(text)
text = tostring(elem)
When in doubt, print it out (print(ET.tostring(root, encoding='utf8').decode('utf8'))) - use this helpful print statement to view the entire XML document at once. It helps to check when editing, adding, or removing from an XML.
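A round-trip sketch tying the pieces together: write a tree to disk, parse it back, then convert to and from a string. The temporary directory and the file name page.xhtml are assumptions for illustration:

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("html")
body = ET.SubElement(root, "body")
body.text = "Hello, World!"

# Write the tree to a file in a temporary directory, then parse it back.
path = os.path.join(tempfile.mkdtemp(), "page.xhtml")
ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)
reloaded = ET.parse(path).getroot()

# Convert the element to a string and back again.
text = ET.tostring(reloaded, encoding="unicode")
again = ET.fromstring(text)

print(text)
print(again.find("body").text)
```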
Appendix I Book References
References: online resources from the book: https://nostarch.com/automatestuffresources
Appendix II HTML Tags and CSS Selectors
A good reference for HTML/CSS: https://www.htmldog.com/guides/css/intermediate/classid/
class and id Selectors:
In the CSS, a class selector is a name preceded by a full stop (“.”) and an ID selector is a name preceded by a hash character (“#”).
The difference between an ID and a class is that an ID can be used to identify one element, whereas a class can be used to identify more than one.
Appendix III Python Concepts
with statements:
with expression [as variable]
     [, expression [as variable]]*:
    suite
The with statement wraps a nested block of code in a context manager, which can run block entry actions and ensure that block exit actions are run whether exceptions are raised or not. with can be an alternative to try/finally for exit actions, but only for objects that have context managers.
expression is assumed to return an object that supports the context management protocol. This object may also return a value that will be assigned to the name variable if the optional as clause is present. Classes may define custom context managers, and some built-in types such as files and threads provide context managers with exit actions that close files, release thread locks, etc.:
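A minimal sketch of a custom context manager, to show when the entry and exit actions run. The Tracker class and its events list are hypothetical names used only for this illustration:

```python
class Tracker:
    """Records when the block is entered and exited."""

    def __init__(self):
        self.events = []

    def __enter__(self):
        self.events.append("enter")
        return self  # this value is bound to the name after 'as'

    def __exit__(self, exc_type, exc_value, traceback):
        self.events.append("exit")  # runs even if the block raised
        return False  # False means: do not suppress the exception


tracker = Tracker()
try:
    with tracker as t:
        t.events.append("body")
        raise ValueError("boom")
except ValueError:
    tracker.events.append("caught")

print(tracker.events)  # ['enter', 'body', 'exit', 'caught']
```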
list comprehensions
Appendix III Bitwise Operation
Python integers behave as if they were stored in two's complement with unlimited precision.
bitwise operators:
&, |, ^, ~, <<, >>
#right shift is an arithmetic shift, not a logical shift: the vacated
#leftmost bits are filled with the sign bit. this makes a difference
#when dealing with negative numbers.
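A few worked values for the operators above. The 32-bit mask 0xFFFFFFFF is an assumption chosen only to imitate a logical shift on a fixed-width value:

```python
a, b = 6, 3            # 0b110 and 0b011

and_result = a & b     # 0b010 -> 2
or_result = a | b      # 0b111 -> 7
xor_result = a ^ b     # 0b101 -> 5
not_result = ~a        # -(a + 1) -> -7
left_shift = 1 << 4    # 16

# Arithmetic right shift keeps the sign of a negative number.
arithmetic = -8 >> 1   # -4
all_ones = -1 >> 100   # still -1

# To imitate a logical shift, mask to a fixed width first (32 bits here).
logical = (-8 & 0xFFFFFFFF) >> 1   # 0x7FFFFFFC -> 2147483644

print(and_result, or_result, xor_result, not_result, left_shift)
print(arithmetic, all_ones, logical)
```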
Appendix IV Type Conversion
Number to Strings:
hex(X), oct(X), bin(X), str(X)
chr(I) #integer to character
String to Number:
int(S [, base]), float(S)
ord(C) #character to Unicode value (including ASCII)
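A quick sketch of the conversions listed above, with sample values chosen for illustration:

```python
# Number to string
hex_text = hex(255)      # '0xff'
oct_text = oct(8)        # '0o10'
bin_text = bin(5)        # '0b101'
plain_text = str(3.5)    # '3.5'
letter = chr(65)         # 'A'

# String to number
from_hex = int("ff", 16)     # 255
from_bin = int("101", 2)     # 5
from_decimal = int("42")     # base defaults to 10
as_float = float("3.5")      # 3.5
code_point = ord("A")        # 65
euro_code = ord("€")         # 8364, beyond the ASCII range

print(hex_text, oct_text, bin_text, plain_text, letter)
print(from_hex, from_bin, from_decimal, as_float, code_point, euro_code)
```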