Technical aspects of KA Lite: taking the web offline
We're currently experiencing what we're calling an "online learning revolution" — but what about the 65% of the world that can't take advantage of it? KA Lite is a lightweight web app for serving core Khan Academy content (videos and exercises) without needing internet connectivity, from a local server. This is the story of how the project came about, and why I think it's important.
For more details, or to get started with using KA Lite, please visit the KA Lite homepage.
This article is part 3 of a 3-part series:
Part 1) What I was up to at Khan Academy this past summer: From Khanberry Pi to KA Lite
Part 2) Introducing KA Lite, an offline version of Khan Academy
Part 3) Technical aspects of KA Lite: taking the web offline
Taking the web offline
In addition to being highly motivating because of its potential for positive impact, this project has been fun from a technological tinkering perspective. It comes with several requirements:
- Running on a diverse range of operating systems and hardware (some of it very old or cobbled together)
- Supporting a wide variety of deployment scenarios (e.g. directly on an end-user's computer, as a server in a lab, or in a roving tech van)
- Seamlessly updating content and software, and synchronizing user data, between computers with intermittent connectivity
- Reporting usage statistics to a central server, when possible, to inform organizational policy-making and reporting
All server-side code is pure Python with no non-Python dependencies, which means all required libraries (e.g. Django, requests, rsa) can be bundled into a single cross-platform package. Originally, it depended on a separately installed WSGI-capable web server such as Apache, but through the magic of django-wsgiserver, even the server itself is now pure Python. Packaging the Python dependencies in the same repository has two further advantages: 1) we never have to worry about dependency conflicts, and 2) updating the entire system to a new version, with all dependencies in sync, is as easy as "git pull" (which we can do P2P when needed).
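As a rough illustration of the bundling idea (a sketch, not the actual KA Lite code), a startup script can put a directory of vendored pure-Python packages at the front of the import path, so that the bundled copies are the ones that get imported. The directory name `python-packages` and the function name `add_bundled_packages` are assumptions made up for this example.

```python
import os
import sys


def add_bundled_packages(project_root, dirname="python-packages"):
    """Put the bundled dependency directory first on sys.path.

    `dirname` is a hypothetical folder inside the repository that holds
    pure-Python copies of the libraries the project depends on.
    """
    bundled = os.path.join(project_root, dirname)
    if bundled not in sys.path:
        # Index 0 means the bundled copies take priority over any
        # system-wide installs of the same libraries.
        sys.path.insert(0, bundled)
    return bundled
```

Calling `add_bundled_packages(...)` with the repository root before any other imports means the versions checked into the repository are the ones in use, which is why a "git pull" is enough to update code and dependencies together.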
After developing everything on laptops, it was exciting to discover that the server ran smoothly on the Raspberry Pi, out of the box. With the addition of a USB Wi-Fi adapter running in Access Point mode and a low-current 5V power supply, this could make for a very inexpensive wireless server. It could be placed in a classroom where students connect to it using cheap tablets such as the Aakash, which is now available to students in India for ~$20.
KA Lite is designed to enable ad hoc P2P syncing of database records between devices, or between devices and a central server. The goal is eventual consistency within a syncing zone, which makes devices and facilities effectively interchangeable from the end-user's point of view (user accounts and progress data are kept in sync). This is accomplished through a public-private key system: every syncable database record is hashed and signed using the private key of the originating device, and the signature is stored along with the record. Self-signed device records (containing the device's public key) are passed around as well, so that receiving devices can verify the integrity and origin of all incoming records. The "zone" membership of devices is administered through a central server, which uses its own private key to sign a certificate stating that a device belongs to a particular zone. That certificate allows the device to convince other devices in the same zone to sync with it. (Note that in the current iteration, syncing is only done via the central server, but the way records are stored will support P2P syncing through the same mechanism once a few details have been worked out.)
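To make the hash-and-sign idea concrete, here is a minimal sketch using only the Python standard library. It is not the KA Lite implementation, which relies on the rsa library mentioned above. The key below is a toy built from two well-known Mersenne primes, with no padding, so it is completely insecure and only there to keep the example self-contained. The record fields and the function names (`record_digest`, `sign_record`, `verify_record`) are invented for illustration.

```python
import hashlib
import json

# TOY KEY, FOR ILLUSTRATION ONLY: two known Mersenne primes, no padding.
# A real device would generate a random RSA key pair with a crypto library.
P = 2**521 - 1
Q = 2**607 - 1
N = P * Q                           # public modulus
E = 65537                           # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent (needs Python 3.8+)


def record_digest(record):
    """Hash a record deterministically: canonical JSON, then SHA-256."""
    payload = json.dumps(record, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(payload.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def sign_record(record, private_exponent, modulus):
    """Originating device: sign the record's hash with its private key."""
    return pow(record_digest(record), private_exponent, modulus)


def verify_record(record, signature, public_exponent, modulus):
    """Receiving device: check the signature with the sender's public key."""
    return pow(signature, public_exponent, modulus) == record_digest(record)


# Hypothetical progress record created on "device-A".
record = {"model": "VideoLog", "user": "student-1", "points": 750,
          "signed_by": "device-A"}
signature = sign_record(record, D, N)

# An untouched record verifies; a record altered in transit does not.
tampered = dict(record, points=9999)
print(verify_record(record, signature, E, N))
print(verify_record(tampered, signature, E, N))
```

Because the signature travels with the record, any device holding the originating device's public key can check a record it received second-hand, which is what makes relaying records through intermediaries plausible.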
There were some questions about how the public/private key negotiations work when setting up a device; here is a rough outline of the steps the system goes through:
- Admin signs up on central server, for an email-verified account.
- Admin creates an organization, and adds "zones" inside it.
- Admin installs the software on a distributed device (a process of a couple of steps), generating a local admin account along the way.
- Device auto-generates a public/private RSA key pair.
- Admin signs in to the local device, which directs her to the central server to add the device to a zone, passing the local device's public key as a GET parameter. Once she is logged in on the central server, she can assign this public key to a particular zone under her control.
- The local device can now negotiate with the central server, and prove that it belongs to that zone. It then downloads a record signed by the central server, which the device can use to prove zone membership to other devices.
- All records (user accounts, video/exercise progress, facility info, etc.) from within that zone are now serialized and synced, in batches. The sync process is initiated at intervals by a Python thread running in the background.
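The last step above, batched syncing driven by a background thread, could look roughly like the following. This is a sketch under assumptions rather than the project's code: `get_unsynced` and `send_batch` are hypothetical callables standing in for the database query and the network upload, and the class and function names are made up for the example.

```python
import logging
import threading


def batches(records, batch_size):
    """Yield successive slices of `records`, at most `batch_size` long."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


class PeriodicSync(threading.Thread):
    """Background thread that uploads unsynced records at intervals."""

    def __init__(self, get_unsynced, send_batch, interval_seconds,
                 batch_size=100):
        super().__init__(daemon=True)
        self.get_unsynced = get_unsynced    # hypothetical: returns a list
        self.send_batch = send_batch        # hypothetical: uploads one batch
        self.interval_seconds = interval_seconds
        self.batch_size = batch_size
        self._halt = threading.Event()

    def sync_once(self):
        """Send everything currently unsynced; return the record count."""
        sent = 0
        for batch in batches(self.get_unsynced(), self.batch_size):
            self.send_batch(batch)
            sent += len(batch)
        return sent

    def run(self):
        while not self._halt.is_set():
            try:
                self.sync_once()
            except Exception:
                # Connectivity is intermittent, so a failed attempt is
                # logged and retried at the next interval.
                logging.exception("sync attempt failed")
            # Sleep until the next interval, or wake early if stopped.
            self._halt.wait(self.interval_seconds)

    def stop(self):
        self._halt.set()
```

A device would create one `PeriodicSync`, call `start()` once at server startup, and call `stop()` on shutdown. Sending in batches keeps each request small, which matters on slow or unreliable links.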
Today, we're launching with Khan Academy content, but we have a broader vision of making the internet and its treasure trove of knowledge and community-building more available to the outermost fringes of the global network, and beyond. Imagine being able to subscribe to RSS feeds on an offline device that will eventually be populated with content via a chain of P2P contacts between syncing devices. Now, imagine an isolated community in Nepal being able to blog via the ad hoc sneakernet to a live website that connects them with the global community, whose comments are fed back through the ad hoc network to the authors, allowing them to build a dialogue and have a voice.
Find out more about KA Lite, and sign up for updates, on the project homepage!
You can also read more blog posts about this project, by my co-conspirators:
Dylan Barth: KA-Lite: Khan Academy For The Other 70%
Matt O'Rourke: KA Lite: Bringing Education To Those Who Need It Most