After creating a short-list of languages and frameworks to be evaluated (see here for the first installment in this series) the next step is going to be a lot harder. It involves getting my hands dirty and at a minimum going through the website of the framework, looking at each in turn to see if they actually match the criteria and then to decide whether or not to build a ‘toy’ application in that framework or not. Time consuming and annoying because it requires you to learn the essentials of each and every framework involved. On the other hand, there is no substitute for hard work in this case, you can’t really pass judgment on anything before you have become familiar with it. The previous article in this series sparked a ton of interest and I got a large amount of very useful (and some not so useful) feedback.
Quite a few people addressed the ‘what framework to use’ question with arguments rather than agenda pushing or religion, corrected a metric ton of mistakes in the previous post and helped to guide me on my quest. To those people I’m extremely grateful for their hard work, every word got read and will be integrated into the revision of the original post. This has caused some changes to the set of languages/frameworks still under consideration.
Toy Application
Because I want to compare the frameworks in as realistic a setting as possible I’ll be building a real application with all of the ones that made it to the short-list.
The toy application will be the thing this all started with, a flash-card (or Spaced Repetition application. It will not contain any user interface components related to creating cards, the database will be ‘magically’ populated with a series of flashcards grouped into several decks that you could use to study/practice some subject. New users should be able to create accounts and select decks they want to study, and the studying process should work as well.
By the way, studying a language over a long-distance medium is not exactly new. In the 1950’s you could already learn languages over the phone!
The schema for the toy application to evaluate the various webframeworks is (relatively) simple. Iteration on the design starts before the first line of code is written. This is the 4th or so revision of what I came up with initially (which was totally different and way too rigid). Some days I feel like I’m in the Monty Python ‘Spanish Inquisition’ sketch. Every time I finish a sentence I have to back up to correct it without ever getting out of the loop and making the sentences longer/different every time!
Apologies for stealing 8 minutes of your life there. And that ‘toy’ application is starting to look pretty complex.
Database Schema
So, the final (hah!) version of the human readable version of the schema looks like this:
We have ‘users’, who use the application to learn stuff and to review and refresh what they learned afterwards.
We have ‘cards’ that have a front and a back side representing the ‘question’ and the desired ‘answer’ to the question.
Cards can be tagged, 0 or more tags are allowed, these are stored in the card separated by ‘/’ characters.
The format of the fronts and backs is json encoded strings, the interpretation of which is left up to the application. Originally I experimented with fixed fields, s-expressions and other trickery but after a short while I realized that since json is now more or less an accepted standard and present as a library module for serialization and deserialization in almost every language I would want to use that this would save me the time of having to implement/maintain a parser for the data chunks.
The trick with this field is that because it is just a text string unforeseen future developments (multimedia, for instance) can be tacked on without re-doing the whole application so it is fairly future proof. All I need to do is come up with a reasonable interpretation of the data. And if the change would be so drastic that the json format itself would need revision then that could be fairly easily done using a background job without having to take the whole thing down. It also allows for easy generation of huge amounts of cards from permutations of the various bits and pieces.
Cards have a difficulty associated with them and you’re supposed to master one level of difficulty before you’re allowed to move on to the next, this ensures that cards that build on knowledge from simpler cards will not be presented until the simpler stuff is fully grokked.
A deck is simply a pre-programmed filter that lifts out cards with certain (combinations of) tags. For instance, this filter would select for an English speaker all the cards that teach you about numbers in Romanian: “/en-ro/ & /numbers/” (note that the ‘numbers’ is in the language of the user, the ‘en’ in the first part of the tag). This also allows for quick, impromptu and unforeseen course creation (such as: learn to count to ten in all different languages: “/numbers/ & /upto10/“)
Every exposure of a card is logged, as is how long it took to get an answer back and whether or not that was the correct answer.
A user can have none, some or all of the cards of any given course in play and has the capability to de-activate specific cards.
Users acquire cards as they learn, new cards are added if fewer than a user configurable number of cards are in the ‘learning’ phase.
Cards can be reversed in a deck, but only if the card allows it.
You could pull really nifty tricks in interpreted languages, for instance, you could make a front contain code that generated a different card every time it was exposed, or have the cards adapt to circumstances (for instance, a card that explains ‘today’ could always use the actual day as an example). For a compiled language such tricks are not so easy, you’d always have to bundle some kind of interpreter along. Or, alternatively, you could assume the display environment to be capable of doing this, and embed a chunk of javascript in the code! We’re not going down that route though (because we also want to do mobile at some point).
For the future, multimedia (audio, pictures, video, LaTex, html, whatever else you can come up with) could be added to the cards.
So, for the first time ever I used mysql-workbench to create a database schema. I wished I could rave about how easy it was and how well it worked to do this simple task. But truth be told I’ve never ever seen such a piece of junk. It crashed a great many times, user interface components are all over the place, it is totally counter-intuitive and makes something simple much harder than typing in the commands by hand. The save dialog had the ‘save’ button off-screen, and it happily creates broken SQL during the export… Incredible that a company like Oracle would let a thing like this pass Q&A with their name on it.
Anyway, after multiple attempts the job was done (count 4 hours or so, not bad for a handful of tables).
Here is the machine readable version of the schema:
-- MySQL dump 10.13 Distrib 5.5.37, for debian-linux-gnu (i686)
--
-- Host: localhost Database: srs
-- ------------------------------------------------------
-- Server version 5.5.37-0ubuntu0.14.04.1
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
--
-- Table structure for table `cards`
--
DROP TABLE IF EXISTS `cards`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `cards` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`creator_id` int(10) unsigned NOT NULL COMMENT 'who made the card\n',
`reversible` int(1) DEFAULT '0' COMMENT 'not all cards can be reversed\n',
`front` mediumtext COLLATE utf8_bin NOT NULL,
`back` mediumtext COLLATE utf8_bin NOT NULL,
`difficulty` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'difficulty is a number that starts at ''0'' and ascends as far as required to order cards. Two cards can be of the same difficulty (they their order of appearance is not guaranteed).\n',
`source` int(10) unsigned NOT NULL DEFAULT '0',
`prompt` int(10) unsigned NOT NULL COMMENT 'which prompt to use\n',
`tags` varchar(256) COLLATE utf8_bin DEFAULT NULL,
`prerequisites` int(10) unsigned NOT NULL COMMENT 'up to and including which card should be learned before the user can be exposed to this one.\n\n',
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `source` (`source`),
KEY `creator_id` (`creator_id`),
CONSTRAINT `creator_id` FOREIGN KEY (`creator_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `source` FOREIGN KEY (`source`) REFERENCES `sourcedata` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=1505 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='cards are the ''units of learning'' in the system, cards are t /* comment truncated */ /*agged, the first tag is the name of the deck. If a user makes a card for themselves then it starts out without tags.\n\n*/';
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `courses`
--
DROP TABLE IF EXISTS `courses`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `courses` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`description` varchar(250) COLLATE utf8_bin DEFAULT NULL COMMENT 'A description of the course, if it is a language course it should be in the language of the student\n',
`tag` varchar(45) COLLATE utf8_bin DEFAULT NULL COMMENT 'the unique tag identifying cards in this course',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='a course is something a user could follow, for instance "Rom /* comment truncated */ /*anian for English speakers", using the unique tag "en-ro" for the cards.\n\n*/';
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `exposures`
--
DROP TABLE IF EXISTS `exposures`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `exposures` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`side` int(11) DEFAULT NULL,
`timestamp` int(10) unsigned NOT NULL DEFAULT '0',
`duration` int(11) NOT NULL DEFAULT '0',
`user_id` int(10) unsigned NOT NULL,
`card_id` int(10) unsigned NOT NULL,
`answer` varchar(256) COLLATE utf8_bin NOT NULL COMMENT 'as long as the answer is null there was no answer from the user for this card\n',
`score` int(10) unsigned NOT NULL DEFAULT '0',
`grade` int(11) DEFAULT '0',
PRIMARY KEY (`id`),
KEY `fk_exposures_users1_idx` (`user_id`),
KEY `card_id_idx` (`card_id`),
KEY `grade` (`grade`),
CONSTRAINT `card_id` FOREIGN KEY (`card_id`) REFERENCES `cards` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `user_id` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=481 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='an exposure is a single showing of a card to a user, and wh /* comment truncated */ /*at their response was*/';
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `user_cards`
--
DROP TABLE IF EXISTS `user_cards`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `user_cards` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`user_id` int(10) unsigned NOT NULL,
`card_id` int(10) unsigned NOT NULL DEFAULT '0',
`first` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'first time this card was exposed\n',
`last` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'last time this card was exposed\n',
`next` int(11) NOT NULL DEFAULT '0' COMMENT 'next time this card should be exposed',
`interval` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'the interval between showings\n',
`reversed` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'reverse the card or not on viewing\n',
`exposures` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'times the card has been exposed\n',
`correct` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'times the user got it right\n',
`wrong` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'times the user got it wrong\n',
`score` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'score is in percents, so 10 exposures correct would yield a score of 1000\n',
`state` int(10) unsigned NOT NULL DEFAULT '1' COMMENT '0: inactive\n1: active\n2: ignored\n\n',
`learning` int(10) unsigned NOT NULL DEFAULT '0' COMMENT 'card is currently considered to be in the learning phase or if it should count as ''learned'', this counts down from a number to ''0''\n\n',
`tags` varchar(250) COLLATE utf8_bin DEFAULT NULL COMMENT 'initially translated to the language of the user, these are copies of the tags the cards are created with, but the user can modify them\n',
`difficulty` float DEFAULT '0',
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
KEY `user_id_idx` (`user_id`),
KEY `card_id_idx` (`card_id`),
CONSTRAINT `user_cards_card_id` FOREIGN KEY (`card_id`) REFERENCES `cards` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `user_cards_user_id` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=77 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='user cards are all cards in play for a user at this moment, /* comment truncated */ /*they are in verification or learning mode or have been set to be ignored\n*/';
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `user_courses`
--
DROP TABLE IF EXISTS `user_courses`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `user_courses` (
`user_id` int(10) unsigned NOT NULL,
`course_id` int(10) unsigned NOT NULL,
`score` int(11) NOT NULL DEFAULT '0',
PRIMARY KEY (`user_id`,`course_id`),
KEY `fk_user_courses_users1_idx` (`user_id`),
KEY `fk_user_courses_courses1_idx` (`course_id`),
CONSTRAINT `fk_user_courses_courses1` FOREIGN KEY (`course_id`) REFERENCES `courses` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
CONSTRAINT `fk_user_courses_users1` FOREIGN KEY (`user_id`) REFERENCES `users` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT='a user_course is a course that a user has started on';
/*!40101 SET character_set_client = @saved_cs_client */;
--
-- Table structure for table `users`
--
DROP TABLE IF EXISTS `users`;
/*!40101 SET @saved_cs_client = @@character_set_client */;
/*!40101 SET character_set_client = utf8 */;
CREATE TABLE `users` (
`id` int(10) unsigned NOT NULL AUTO_INCREMENT,
`nick` varchar(45) COLLATE utf8_bin NOT NULL,
`pwhash` varchar(250) COLLATE utf8_bin NOT NULL,
`score` int(11) DEFAULT '0',
`email` varchar(250) COLLATE utf8_bin NOT NULL,
`newcardsperday` int(11) DEFAULT '20',
`cardspersession` int(11) DEFAULT '25',
`decay` int(11) DEFAULT '3',
`language` char(2) COLLATE utf8_bin DEFAULT 'en',
`initialinterval` int(11) DEFAULT '5',
PRIMARY KEY (`id`),
UNIQUE KEY `id_UNIQUE` (`id`),
UNIQUE KEY `nick_UNIQUE` (`nick`),
UNIQUE KEY `email_UNIQUE` (`email`)
) ENGINE=InnoDB AUTO_INCREMENT=3 DEFAULT CHARSET=utf8 COLLATE=utf8_bin COMMENT='users are people that use the system\n';
/*!40101 SET character_set_client = @saved_cs_client */;
/*!40103 SET TIME_ZONE=@OLD_TIME_ZONE */;
/*!40101 SET SQL_MODE=@OLD_SQL_MODE */;
/*!40014 SET FOREIGN_KEY_CHECKS=@OLD_FOREIGN_KEY_CHECKS */;
/*!40014 SET UNIQUE_CHECKS=@OLD_UNIQUE_CHECKS */;
/*!40101 SET CHARACTER_SET_CLIENT=@OLD_CHARACTER_SET_CLIENT */;
/*!40101 SET CHARACTER_SET_RESULTS=@OLD_CHARACTER_SET_RESULTS */;
/*!40101 SET COLLATION_CONNECTION=@OLD_COLLATION_CONNECTION */;
/*!40111 SET SQL_NOTES=@OLD_SQL_NOTES */;
-- Dump completed on "2014-05-27"
SRS Algorithm
I’m going to do the actual SRS stuff based on ‘exponential back-off’. If you respond to a question immediately and get it right then you earn a 5x increase in schedule interval. If you get it right but after a longer pause then the scheduling interval remains unchanged, get it almost right (within 1 character of the right answer) (this only works for languages!!) the interval will be reduced by a factor of 5, get it completely wrong and you start all over again from the initial interval (set to 5 seconds for now).
Intervals below 5 seconds will be reset to 5 seconds.
So this makes the scheduled intervals 5, 25, 125, 625, 3125, 15625, 78125 seconds etc. So after answering right 7 times in a row you’re on a rough schedule of reviewing a card in another day. Of course people will use this whenever they feel like so we need to make sure that we don’t overwhelm them with a backlog of cards to review, and we need to present those cards that are most likely to be forgotten first.
To allow some personalization we can make the ‘factor’ (5 in this case) configurable on a per-user basis, maybe automatically adjust it for users that score particularly good (or bad) once we have enough data to figure out how well they retain information.
I don’t like the concept of ‘lessons’, so I’ll drop that entirely, you’re ‘learning’ a deck or you’re doing something else, you can learn as long as you want and you can ‘pause’ or ‘stop’ the process at any point in time.
I don’t want users to ‘grade themselves’, the program should do any and all grading completely automatic (this creates some very interesting challenges later on when audio support gets added). And I want people to be able to start learning without being logged in the first time they hit the site.
I’m keeping all this information nicely organized in a Leo Outline Editor file, Leo is absolutely fantastic and I wished there was a version of it that used a centralized docs store.
Command-line version
The simplest framework is of course none at all.
To make sure I understand all the business logic involved and have a working knowledge of the whole process I first built a command line version of the program playing off the same ‘magically populated’ database. That way I can very quickly test stuff without having to bother with layouts, RPC, servers, client server issues and other complications. An interactive command line version of this is about as simple as it gets (and can be built in a few hours if you have some experience).
So, that seems to work. And if it is ugly/buggy that’s fine, it did what it was supposed to do, bring out the flaws in my reasoning about the data model.
The Stack
If you’re reading this far you no doubt have an excellent understanding of web development, but just in case you don’t: The web is an ugly thing these days. What used to be a single binary serving up some static files from a directory on a server has turned into a whole slew of tecnologies, software, standards and options. Collectively, any set of specific choices from all those tools is called a ‘stack’. Typically, a stack will include an operating system, some sort of database for persistence, a back-end language (server side stuff), a front end language (typically JavaScript or something that compiles down to JavaScript), a web server (for instance, Apache, nginx) or ‘application server’, and some bits and pieces to glue it all together. The reason for this hodgepodge of loosely connected bits of tech is that we’re trying to shoehorn application interaction across a medium designed to serve static files on a fire-and-forget basis.
In my case, only the operating system (Linux) and the database (MySQL) are set in stone, the rest is variable. I do have an apache installation on this development machine so if any of the language/framework combos that we will be looking at can run under apache I’ll definitely try to get that to work since it stops me from having to learn how to configure yet another web server. I’m a lazy person.
Next up, the language/framework combos.