I spent quite a bit of my career operating in the not-for-profit sector, which included a lot of technology-challenged yet mission-critical operations, at least from the point of view of those they served. This was in the closing decades of the previous century, when free and open software was just getting established. Whereas many of these non-profits were using proprietary software, we were starting to see the downward pressure on prices exerted by such packages as OpenOffice, later forked as LibreOffice.
The grant-giving organizations, at the core of this sector, were especially frustrated to see their limited funds financing copy after copy of the same office software. Databases were proportionately more expensive. Meyer Memorial Trust, in our neighborhood, was especially keen to have applicants use copyleft tools whenever possible, in the name of doing the greatest good. To this end, the trust undertook to “eat its own dog food,” which meant replacing a proprietary grant applicant processing system in FileMaker with something more relational, written in Perl or one of those “P-languages” from the LAMP stack days (LAMP stands for Linux, Apache, MySQL and any P-language: Perl, Python and of course PHP).
The same pressure on the non-profits to avoid expensive software was operative in academia as well. Those with a technical and engineering background were especially singled out, as surely these people would be a part of the open software revolution. They could write their own tools, right? Many of them did. That’s how free and open source software got started: people who coded for a living decided to stop paying for at least some of the tools, and instead work together on making and sharing their own. With digital IP, such licensing arrangements are especially sane, given the infinite copyability, without degradation, of the original.
When I flew to Baltimore that time, to train the Hubble Space Telescope instrumentation team in Python, the economics became clear to me. The astronomers wanted to share raw data and the signal processing pipelines that resulted in more meaningful visualizations. However, their code was written in a proprietary language which university departments could scarcely afford. Spreading their computation-intensive infrastructure in some affordable form, to collaborators and peers, thereby developing their natural constituency (think future funding), had become an existential requirement. Python, and especially numpy (a third-party free package), was starting to make inroads, thanks to the astronomers themselves. That community had some talented coders on board.
Jake VanderPlas picks up the story in his PyCon keynote of 2017. Python was helping to save the day in many academic circles. He spoke for the astronomers. By this time we had our Jupyter Notebooks and JupyterLab. Jake, with the University of Washington, dove into evangelizing around these tools, in the form of excellent on-line and in-person tutorials and workshops. The positive synergy continued making waves, to where The Atlantic caught wind of these developments, and published “The Scientific Paper Is Obsolete” by James Somers in April 2018.
What was James on about in this article? For a long time now, academics have dreamed of a “notebook” format in which the mathematical computations were “live”. Instead of a slim diet of inert formulae, a scientific publication could hook to an “engine” or “kernel” and carry out its number crunching internally to the presentation. The article correctly points out that this dream had come true for Mathematica users. Wolfram had delivered on the promise of “literate programming,” meaning the code would be embedded within quality prose with pictures, a journal-quality publication in some cases. However, Mathematica was proprietary and we’ve already seen the heavy pressure to economize.
In the next chapter, the software revolution entered a new phase. Breakthroughs in machine learning made it feasible to harvest meaningful results from “big data,” or such was the promise. “Big data” meant server farms, and companies such as Google and Amazon were getting a lot of their revenue from selling server space, i.e. storage. Clients were tempted to move their data to these services not only because of the socialized costs, but because of the software ecosystems offered, in many cases for free. You might learn Google’s TensorFlow package using small data on a laptop, but then your company would upload terabytes of data for “at scale” GPU processing. We were now in the era of microservices and cloud services, which features both containerization and virtualization. Amazon had its own tool suite, or you could run TensorFlow there too. Facebook offered PyTorch. Microsoft could run everything on Azure. IBM was in there too.
Behind these machine learning tools, which come towards the end of the pipeline, I should mention map-reduce and the Apache Foundation tool chains, along with Hadoop. These tools work directly with big data. The goal is to push this data through the machine learning algorithms to come up with speech and facial recognition models, and all manner of useful decision-making aids, collectively known as “AI” (artificial intelligence).
Now that we’ve reviewed a progression which pushed a lot of open source to the cloud, in service of data science, let’s talk about curriculum and some other cultural matters.
Speaking of ARM, I haven’t yet mentioned one of the signature changes that came with the new millennium: the proliferation of smartphones around the world. The Android system developed by Google is built on the Linux kernel, has its apps written largely in Java, and runs mostly on smartphones with ARM chips. As corporations pushed governments for greater protections against competition from around the world, the high tech trade wars started escalating. As of 2019, we’re teetering on the brink of a global recession, brought on by deteriorating international relations among the “virtual nations” (our newest global “super citizens,” the corporate persons).
Back to code schools: the emphasis on Python in data science shows no signs of abating in the near future, given the robustness of the tools and their free, open source nature. Python has a liberal license that allows for proprietary products to be made with it, whereas the costs associated with learning the language itself stay low. Aspiring careerists tackle Python and data science on their own time, making use of both free and tuition-charging courses. The pressure to learn Python, and to use Jupyter Notebooks, is having ripple effects through universities, and by extension to their feeder schools, the prep academies. Most public high schools have not kept up, but they feel the same pressures. Instead of redesigning the mathematics curriculum, the high schools have opted to enhance computer science with the new material.
At the Oregon Curriculum Network website, I chronicle my first few chapters as a curriculum developer, branching out to GitHub as we approach the more recent years. How my curriculum has developed will take more stories to explain, and the precise twists and turns may not be that relevant. What I’m establishing here is that I’m aware of my environment, and so the work I’m doing is not in some proverbial vacuum. I’m well equipped to share about Python and the AI revolution more generally. However, my approach to object-oriented programming is through “math objects” and not specifically computer science. In particular, I introduce polyhedrons as a bridging concept, as they’re literally objects, if abstract, and so feed our imaginations even as we learn to program.
Polyhedrons, let’s remember, are also “graphs” in the sense of wire frames. Juxtaposed with Planet Earth, polyhedrons provide grid patterns for organizing and displaying geospatial data. Developing some fluency with spatial geometry, while coding with data sets at the same time, is putting students where they need to be, in terms of having relevant skill sets. The focus on geospatial data and planet Earth is what the ESRI products have been about, with Python used internally. The Oregon Curriculum Network is well-positioned at a confluence of these many trends.
Those of you already well schooled in linear algebra may see the logic behind a polyhedrons-intensive approach. Our vector spaces tend to use a Euclidean metric to measure distance, and our multi-dimensional data sets point to the space of the Euclidean polytopes. Regular Polytopes, by the late H.S.M. Coxeter (University of Toronto), is the classic in this field. In other words, linear algebra is deeply imbued with the language of spatial geometry. A “vector space” is indeed spatial, even when n-dimensional.
The OCN curriculum is lucid about the “dimension” concept in part because of the company behind it: 4D Solutions. The 4D meme took off around the turn of the last century, including in the art world. I’ve done a lot of research on this topic and developed some interesting lectures based around what I’ve found. I’ve been sharing some of these results on YouTube, always with a tip of the hat to Coxeter and the linear algebra of vector spaces. Much of my material is suitable for high schoolers and even middle schoolers.
I’ve done numerous pilots. The current work is to share these ideas with a wider group of math and literature teachers. I’ve been working with the literature and history curriculum most specifically, as Oregon connects to the spatial geometry story through Linus Pauling especially.
Linus Pauling, the only person in history to win two unshared Nobel Prizes, helped get contemporary chemistry off the ground, with its focus on the structure of organic molecules, macro-molecules in particular. At the core of organic chemistry is of course carbon, and in the mid-1980s we saw a major breakthrough: the discovery of C60, or buckminsterfullerene. Sixty carbon atoms form a network of only hexagons and pentagons, the form of a soccer ball. This and other fullerenes form spontaneously, but require advanced processes to isolate and purify. Nanotubes (“buckytubes”) have much the same story, and then came graphene. Oregon State saw the writing on the wall and dove into nano-technology big time, occupying a former Hewlett-Packard building. Nano-technology is all about spatial geometry, and the volume to surface area relationship. The Santa Fe Institute shares dovetailing curriculum materials, relating properties to scale (size).
Pauling’s peace work had to do with making clearer to people the true costs of messing around with radio-toxic isotopes in a cavalier manner. The military had been using its own personnel as guinea pigs, as well as Marshall Islanders and the people of Nevada. The results had been coming in and were sobering. A lot of vested players wanted to keep a lid on that story, but the hard science was not that difficult to follow. Another Silicon Forest-based physicist, Professor Rudi Nussbaum, had been a part of the same effort.
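As a quick check on that soccer-ball description of C60, here is a minimal sketch in Python. It assumes something not spelled out above: that each carbon atom bonds to exactly three neighbors, so three edges meet at every vertex. Given that, Euler's formula for convex polyhedra (V - E + F = 2) fixes how many pentagons and hexagons there must be. The variable names are chosen for illustration only.

```python
# Counting the faces of C60, assuming three edges meet at every vertex.
V = 60                 # one vertex per carbon atom
E = 3 * V // 2         # each edge is shared by two vertices
F = 2 - V + E          # Euler's formula: V - E + F = 2

# Let p = pentagons and h = hexagons. Then:
#   p + h   = F        (every face is one or the other)
#   5p + 6h = 2E       (each edge borders exactly two faces)
# Subtracting 5 times the first equation from the second gives h.
hexagons = 2 * E - 5 * F
pentagons = F - hexagons

print(V, E, F, pentagons, hexagons)
```

The counts come out to 90 edges and 32 faces, of which 12 are pentagons and 20 are hexagons, matching a standard soccer ball.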
Might we model a soccer ball or C60 using object-oriented Python? Might we do so in a Jupyter Notebook? I’m introducing this approach on GitHub, with a generic Polyhedron class at the top of my hierarchy. With a polyhedron come properties, or attributes, such as the number of vertexes, facets, and edges, along with volume and surface area. The vertexes may be specified with vectors. We’re doing mathematics, and learning to code, and creating a basis for further linear algebra down the road.
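What might such a hierarchy look like? The following is a minimal sketch, not the code from the repository mentioned above. The class layout, the helper functions (`sub`, `dot`, `cross`) and the choice of a tetrahedron as the first subclass are all assumptions made for illustration.

```python
from itertools import combinations
from math import sqrt


def sub(p, q):
    """Vector from q to p."""
    return tuple(a - b for a, b in zip(p, q))


def dot(p, q):
    return sum(a * b for a, b in zip(p, q))


def cross(p, q):
    return (p[1] * q[2] - p[2] * q[1],
            p[2] * q[0] - p[0] * q[2],
            p[0] * q[1] - p[1] * q[0])


class Polyhedron:
    """Hypothetical base class: named vertexes plus faces given as vertex names."""

    def __init__(self, vertices, faces):
        self.vertices = vertices   # e.g. {"a": (x, y, z), ...}
        self.faces = faces         # e.g. [("a", "b", "c"), ...]

    @property
    def edges(self):
        """Unique edges, found by walking around the rim of each face."""
        found = set()
        for face in self.faces:
            for i, v in enumerate(face):
                found.add(frozenset((v, face[(i + 1) % len(face)])))
        return found

    def euler_characteristic(self):
        return len(self.vertices) - len(self.edges) + len(self.faces)


class Tetrahedron(Polyhedron):
    """Four vertexes, with every triple of them forming a triangular face."""

    def __init__(self, a, b, c, d):
        vertices = {"a": a, "b": b, "c": c, "d": d}
        super().__init__(vertices, list(combinations("abcd", 3)))

    def volume(self):
        a, b, c, d = (self.vertices[k] for k in "abcd")
        # One sixth of the absolute scalar triple product of three edge vectors.
        return abs(dot(sub(b, a), cross(sub(c, a), sub(d, a)))) / 6

    def surface_area(self):
        total = 0.0
        for face in self.faces:
            p, q, r = (self.vertices[k] for k in face)
            n = cross(sub(q, p), sub(r, p))
            total += sqrt(dot(n, n)) / 2   # triangle area from the cross product
        return total


# A regular tetrahedron inscribed in a cube of edge 2, centered at the origin.
tet = Tetrahedron((1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1))
print(len(tet.vertices), len(tet.edges), len(tet.faces))
print(tet.euler_characteristic(), tet.volume(), tet.surface_area())
```

Storing faces as tuples of vertex names, rather than raw coordinates, keeps the wire-frame (graph) structure separate from where the vertexes happen to sit in space. A C60 subclass would need only a different table of vertexes and faces.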
Speaking of linear algebra, it’s very true that a vector represents a point in some n-dimensional space, with n likely above three. A first confusion to clear up is that we don’t need numpy data arrays of more than two dimensions to capture n-dimensional data. Each row vector, of however many columns, is a sample. The number of columns may be the number of dimensions in a linear algebra sense, but the data structure needed to hold these vectors is always 2D. Sometimes students get spooked into thinking they’ll need a lot more axes than are actually required, from a numpy standpoint.
Partly why I bring this up is to underline the importance of the “dimension” concept, and the need to disambiguate this term in order to have it make sense in the “language game” (see Wittgenstein, Remarks on the Foundations of Mathematics).
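To make the distinction concrete, here is a small sketch using numpy. The sample count, the column count and the variable names are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Five samples, each one a vector in 10-dimensional space.
samples = rng.normal(size=(5, 10))

print(samples.ndim)    # number of array axes: 2
print(samples.shape)   # (rows, columns): (5, 10)

# Euclidean distance between the first two samples, computed in 10-space.
distance = np.linalg.norm(samples[0] - samples[1])
print(distance)
```

The array has two axes no matter how many columns we add. The "ten dimensions" live in the length of each row, not in the number of axes numpy reports through `ndim`.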