In an earlier story, I was exulting about Sphinx, a documentation generator that turns reStructuredText (plain text with a smattering of punctuation) into a handsome website.
Turning your Python project into something with official-looking, state-of-the-art documentation is an ego boost for anyone.
Sometimes a lonely slog through some project needs a shot in the arm, and boosting one’s ego is just the ticket. “Maybe this project will have more lasting value, now that it’s documented,” thinks the little cogito.
I’m still a big Sphinx fan, bigger than ever, but have since learned not to dwell on it too much at first. One must sometimes temper one’s enthusiasm.
Show Sphinx off as a valuable asset in the ecosystem, but start with Jupyter Notebooks instead, since the skills are similar.
Instead of reStructuredText, we use Markdown. The cheat sheet is pretty short.
You get handsome HTML / CSS out the other end, and the code stays interactive.
Put it on GitHub. The ego gratification is more immediate. Then use Git to keep making it better, whatever it is.
Once you have a stash of Jupyter Notebooks, how best to learn Python?
I approach this task in terms of levels, which I spiral through, adding more to each level in turn, in contrast to providing any exhaustive treatment of one before advancing to the next. By “levels” I mean:
- keywords and punctuation, basic syntax, dot notation, brackets and colons.
- the built-ins, what you don’t need to import, the initial vocabulary (dir, id, hex, input, print) including primitive types (int, str, float, list, tuple, dict, set, range).
- the special names, what I call the __ribs__ of Python, a whole vocabulary unto itself and in a way the key to understanding OOP (object-oriented programming) as elegantly implemented in this language.
- the Standard Library (“batteries included”), a vast “cheese shop” of wonderful cheeses (that’s the obligatory allusion to Monty Python, for which the language was named).
- the 3rd Party ecosystem of Sphinx and Pandas, Django and Requests. This is the infinite jungle into which one’s own coding projects likely fit, unless you’re a core developer.
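To make the first few levels concrete, here is a minimal sketch that touches the built-ins, the special names and the Standard Library in a dozen or so lines. The `Polyhedron` class and its attribute names are hypothetical, made up for this illustration.

```python
from math import sqrt  # Standard Library level: needs an import, but no install


# Built-ins level: available with no import at all.
number = 255
print(hex(number))  # '0xff'
print(type(range(3)).__name__)


# Special names level: the __ribs__ hook a class into Python's own syntax.
class Polyhedron:
    """Hypothetical shape class, for illustration only."""

    def __init__(self, name, volume):
        self.name = name
        self.volume = volume

    def __repr__(self):
        # Used by repr() and by the interactive prompt.
        return f"Polyhedron({self.name!r}, {self.volume})"

    def __lt__(self, other):
        # Used by the < operator, and therefore by sorted().
        return self.volume < other.volume


tetra = Polyhedron("T", 1)
cube = Polyhedron("C", 3)

print(tetra < cube)           # triggers __lt__
print(sorted([cube, tetra]))  # triggers __lt__ for ordering, __repr__ for display
print(sqrt(cube.volume))
```

Nothing here requires the fifth level; a 3rd party package would need to be installed first.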
I’m tempted to add a sixth level: ways to learn all of the above, curriculum materials, videos, “meta-Python” if you will.
What I’ve neglected entirely is the machinery of the interpreter itself, which may be implemented in C (as in CPython), C# (as in IronPython) or some other language.
The Python virtual machine may be coded in any other Turing-complete language, in theory. But I wouldn’t call learning the guts of a Python virtual machine the same thing as learning Python, which is our focus here.
So yes, I spiral through these five levels, sharing a bit more from each with each turn of the spiral.
Then comes the need for overview, combined with zoomed-in treatments of the nitty-gritty, and for relating the two. That’s where my most recent Jupyter Notebook fits in (the one that inspires me to write this story).
The title is pretty ordinary: Data Structures: Keeping Data Organized.
The concept of “organizing” is fairly complex.
In Intro to OOP: Organizing Polyhedrons, a separate notebook, the topic is sorting shapes by name, or by volume.
Given ((‘VE’, 20), (‘O’, 4), (‘RD’, 6), (‘RT’, 5), (‘C’, 3), (‘T’, 1)) how does one sort these into:
- Name order: ((‘C’, 3), (‘O’, 4), (‘RD’, 6), (‘RT’, 5), (‘T’, 1), (‘VE’, 20))
- Volume order: ((‘T’, 1), (‘C’, 3), (‘O’, 4), (‘RT’, 5), (‘RD’, 6), (‘VE’, 20))
Hint: the sorted function takes a named argument, key=, which you may use to tell it which element of each pair is the sort key. By default, pairs are compared by their leftmost element, with ties broken by the elements to the right.
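Here is one way the hint might play out, as a minimal sketch. The variable names are mine, and the lambda is only one option; `operator.itemgetter(1)` from the Standard Library would do the same job.

```python
shapes = (('VE', 20), ('O', 4), ('RD', 6), ('RT', 5), ('C', 3), ('T', 1))

# With no key=, tuples compare left to right, so the name decides the order.
by_name = tuple(sorted(shapes))

# key= is called on each pair and returns the value to sort on: the volume.
by_volume = tuple(sorted(shapes, key=lambda pair: pair[1]))

print(by_name)
print(by_volume)
```

Note that sorted always returns a list, whatever it was given, hence the conversion back to a tuple.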
However, sorting requires first getting the data into a structure, in the above case a “tuple”. But that’s pretty nitty-gritty. We could use a bigger picture going in.
So I start with the whole idea of a website as a data store. Readers relate to websites, and to the fact that they:
(a) synthesize web pages on demand and
(b) use stored data to do so.
Classic MVC-style web frameworks, such as Web2py, Django or Flask, help us set the stage.
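Stripped of any framework, points (a) and (b) can be sketched with nothing but the Standard Library. This is a toy under stated assumptions, not how Django or Flask actually work: the `VOLUMES` dict stands in for a database, and `PAGE_TEMPLATE` and `render_page` are hypothetical names.

```python
from html import escape

# (b) the stored data; a real site would keep this in a database.
VOLUMES = {"T": 1, "C": 3, "O": 4, "RT": 5, "RD": 6, "VE": 20}

# The page layout, with slots to be filled in per request.
PAGE_TEMPLATE = "<html><body><h1>{name}</h1><p>Volume: {volume}</p></body></html>"


def render_page(name):
    """(a) synthesize a web page on demand from the stored record."""
    volume = VOLUMES[name]  # raises KeyError for an unknown shape
    return PAGE_TEMPLATE.format(name=escape(name), volume=volume)


print(render_page("VE"))
```

The data, the template and the function that joins them map loosely onto the model, view and controller roles.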
More generally, and mythologically, I find it useful to transform our Python into a Dragon at this point (by way of serpent if necessary).
In fairy tales, such as The Hobbit, a dragon guards a hoard of treasure. That’s Python managing (organizing) our data.
Only after looking at the anatomy of a website do I then get nitty-gritty and dive into the built-in data structures.
At this point, I’ll likely use some emoji as string elements, mainly to emphasize that the string type long ago stopped being limited to ASCII (American Standard Code for Information Interchange). As any grade schooler knows, we’re in the Age of Unicode these days.
### BUILT-IN DATA STRUCTURES
# TUPLE
the_tuple = ('🐙', '🐳', '🐯')  # <-- emoji are Unicode
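To back up the Unicode claim and carry the same elements into the other built-in structures, here is a small sketch. The snake emoji and the dictionary keys are my own additions for illustration.

```python
the_tuple = ('🐙', '🐳', '🐯')

for emoji in the_tuple:
    # ord() gives the Unicode code point, far beyond ASCII's 0 to 127 range.
    code_point = ord(emoji)
    utf8_length = len(emoji.encode('utf-8'))
    print(emoji, hex(code_point), utf8_length, 'bytes in UTF-8')

# LIST: mutable, so it can grow
the_list = list(the_tuple)
the_list.append('🐍')

# SET: unordered, no duplicates
the_set = set(the_list + the_list)

# DICT: look up a value by key
the_dict = {'octopus': '🐙', 'whale': '🐳', 'tiger': '🐯'}

print(the_list, len(the_set), the_dict['whale'])
```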
Then finally, for my finishing segment, I move from built-in data structures to 3rd party.
The NumPy n-dimensional array is the bread and butter, the meat and potatoes, of computational Python.
The built-in list is great for “orchestration” (big-picture organizing of program flow) but when you need to do solid number crunching, like inverting a matrix, that’s where your NumPy array enters the picture.
All elements will need to be the same type. Slice notation is on steroids (because of the multiple dimensions).
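A minimal sketch of those three points (one shared element type, slicing along more than one axis, and matrix inversion) might look like this. It assumes NumPy is installed; the 2 x 2 matrix is an arbitrary example of mine.

```python
import numpy as np

matrix = np.array([[2.0, 1.0],
                   [1.0, 1.0]])

# One dtype for the whole array; mixing ints and floats upcasts to float.
print(matrix.dtype)
print(np.array([1, 2.5]).dtype)

# Slicing works per axis: rows first, then columns.
first_column = matrix[:, 0]
second_row = matrix[1, :]
print(first_column, second_row)

# Number crunching: invert the matrix and check the result.
inverse = np.linalg.inv(matrix)
print(inverse)
print(np.allclose(matrix @ inverse, np.eye(2)))
```

A nested list could hold the same numbers, but it has no notion of columns, let alone of an inverse.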
The NumPy array is a data structure for grownups.
We need more big picture though, like: “so what about NumPy arrays, what are they used for?”
Enter Machine Learning.
Here, my innovation is to create a pattern coming from the piano keyboard and learning to play piano.
Have you ever played Chopsticks? Certain pairs of notes start the melody, namely: (F,G), (E,G), (D,B), (C,C).
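Treat each of the eight keys, C to C, as a slot and fill it with 1 or 0 depending on whether the key is pressed. Generate random sequences of eight 1s and 0s, like 01001000 or 10010001. Neither of these is a “chopstick pattern”. Reading the slots left to right as C, D, E, F, G, A, B, C, those would be: 00011000 (F,G), 00101000 (E,G), 01000010 (D,B) and 10000001 (C,C).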
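The encoding can be sketched in plain Python. I am assuming the slots run C, D, E, F, G, A, B, C from left to right and that (C,C) means the low and high C together; the helper names are hypothetical.

```python
import random

KEYS = ('C', 'D', 'E', 'F', 'G', 'A', 'B', 'C')  # one octave, low C to high C


def encode(pressed):
    """Return eight 1s and 0s, with a 1 at each pressed slot position."""
    return ''.join('1' if i in pressed else '0' for i in range(len(KEYS)))


# Slot positions for (F,G), (E,G), (D,B) and (C,C), under the assumption above.
CHOPSTICK_PAIRS = ((3, 4), (2, 4), (1, 6), (0, 7))
CHOPSTICKS = {encode(pair) for pair in CHOPSTICK_PAIRS}


def random_pattern(rng):
    """Return a random sequence of eight 1s and 0s."""
    return ''.join(rng.choice('01') for _ in range(len(KEYS)))


rng = random.Random(0)  # seeded so the run is repeatable
for _ in range(5):
    pattern = random_pattern(rng)
    print(pattern, int(pattern in CHOPSTICKS))  # second column is the label

print(sorted(CHOPSTICKS))
```

Only 4 of the 256 possible patterns are chopsticks, so a purely random sample will be heavily lopsided toward 0 labels.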
Add a ninth column, with a 1 if the pattern is a chopstick, with a 0 if it’s not, and feed the whole data set to a machine learning algorithm using the scikit-learn API.
I compare two such algorithms: K Nearest Neighbors (KNN), and a Multi-layer Perceptron Classifier (MLPC).
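Here is a rough sketch of such a comparison, assuming scikit-learn and NumPy are installed. To keep it short and repeatable I enumerate all 256 patterns instead of sampling at random, and I score on the training data, so this shows the API's shape and is not a fair evaluation. The hyperparameter values are arbitrary picks of mine.

```python
from itertools import product

import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.neural_network import MLPClassifier

# Assumed encoding: slots are C D E F G A B C; (C,C) is low C plus high C.
chopsticks = {
    (0, 0, 0, 1, 1, 0, 0, 0),  # (F,G)
    (0, 0, 1, 0, 1, 0, 0, 0),  # (E,G)
    (0, 1, 0, 0, 0, 0, 1, 0),  # (D,B)
    (1, 0, 0, 0, 0, 0, 0, 1),  # (C,C)
}

# Every possible pattern of eight 1s and 0s, plus the "ninth column" of labels.
patterns = list(product((0, 1), repeat=8))
X = np.array(patterns)
y = np.array([int(p in chopsticks) for p in patterns])

# Both classifiers share the same fit / score interface.
knn = KNeighborsClassifier(n_neighbors=1).fit(X, y)
mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                    random_state=0).fit(X, y)

knn_score = knn.score(X, y)
mlp_score = mlp.score(X, y)
print("KNN accuracy on training data: ", knn_score)
print("MLPC accuracy on training data:", mlp_score)
```

With a single neighbor, KNN simply memorizes the table. The MLPC has to learn the pattern from only four positive examples, which makes it the more interesting one to tinker with.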
Because this is all in a Jupyter Notebook, the student is encouraged to grab it, trust it (a button push), and make changes. Try fine-tuning the hyperparameters of the MLPC model, why not?
This is a notebook to keep coming back to, as one’s knowledge expands. I’ve got a whole maze of notebooks to wander within. Students discover their own pathways, and are inspired to make new ones. Make your own maze, why not?
My final remarks mention “big data” and some of the tools one might use there.
I don’t get into PyTorch or TensorFlow in this notebook, as it’s the concepts that matter and scikit-learn does a fine job with those.