Ideas Not Working Out (or WILL THEY?!?!?)

Well this is a bit discouraging. I was experimenting with implementing an ELK style EL classification oriented reasoner primarily using SQLite. I love SQLite and it is super cool. I want to do more with it. If you look at the algorithms in the Incredible ELK paper you see that they involve a fair number of joins and indexes, etc. They even discuss join order and reference some attempts to implement on a database. I figured using SQLite in memory databases could be a nice win.

Nope nope nope nope. Well, not yet. The problem is that it’s way too slow to interact with the database. Consider:

43393 loops done in 1.2992407480875652 minutes
    3224564 function calls in 77.975 seconds

    Ordered by: internal time

    ncalls tottime percall cumtime percall filename:lineno(function)
    183895 70.709 0.000 75.125 0.000 {method 'execute' of 'apsw.Cursor' objects}
...
    28951 0.079 0.000 34.495 0.001 calf.py:443(in_subs)

I’m testing for whether a subsumption in my todo list is in the closure. I’m using a query:

SELECT 1 FROM subs 
WHERE subclass = ? and superclass = ?

Ok that should probably be an EXISTS, but still. It’s called a fair bit (28,951 times!).
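For what it’s worth, the EXISTS form might look something like the sketch below. It uses the standard-library sqlite3 module rather than apsw, and the table definition and sample rows are assumptions made up for illustration; only the table and column names come from the query above.

```python
import sqlite3

# Assumed stand-in for the real schema: subs holds (subclass, superclass) pairs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subs (subclass INTEGER, superclass INTEGER)")
conn.executemany("INSERT INTO subs VALUES (?, ?)", [(1, 2), (2, 3)])


def in_subs(conn, sub, sup):
    """Return True if the pair (sub, sup) is already in the subs table."""
    row = conn.execute(
        "SELECT EXISTS(SELECT 1 FROM subs WHERE subclass = ? AND superclass = ?)",
        (sub, sup),
    ).fetchone()
    # EXISTS always yields exactly one row containing 0 or 1.
    return bool(row[0])


found = in_subs(conn, 1, 2)
missing = in_subs(conn, 1, 3)
```

EXISTS lets the engine stop at the first matching row, though on its own it does nothing about how that row is located.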

My subsumptions here are just binary tuples of integers, so I can use a Python set to track ’em easily enough. And here’s the results:

43393 loops done in 0.6202206969261169 minutes
    2819244 function calls in 37.231 seconds

    Ordered by: internal time

    ncalls tottime percall cumtime percall filename:lineno(function)
    154944 32.796 0.000 35.274 0.000 {method 'execute' of 'apsw.Cursor' objects}
...
    28951 0.025 0.000 0.053 0.000 calf.py:443(in_subs)

Well that leaves a mark.
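The set-based check is about as simple as it sounds. A minimal sketch, with hypothetical helper names (`add_sub` is not from the real code) and assuming subsumptions are stored as pairs of integers:

```python
# Subsumptions are (subclass, superclass) pairs of integers, so they hash cheaply.
subs = set()


def add_sub(sub, sup):
    """Record a derived subsumption (hypothetical helper)."""
    subs.add((sub, sup))


def in_subs(sub, sup):
    """Membership test with no database round trip."""
    return (sub, sup) in subs


add_sub(1, 2)
```

The cost is that the set has to be kept in step with whatever still lives in the database.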

It’s an in memory database and I’ve dorked with the transactions and no joy.

Now it’s not hopeless maybe. Consider these:

  22592   0.320 SELECT ?, superclass FROM concIncs  WHERE concIncs.subclass = ?
  22591   0.302 SELECT type, arg1, arg2 FROM IdxConcept WHERE id=?
  22592   0.180 INSERT INTO subs VALUES (?, ?)
  14442   0.141 SELECT 1 FROM inits WHERE id = ?

Same order of magnitude of queries. Sometimes returning more data. The INSERTs into subs are pretty fast. Is it just the conjunction?

Oh crap. I didn’t have a primary key on subs. It’s compound and I forgot…D’OH! Let’s fix that and…

43393 loops done in 0.14656875133514405 minutes
         3224564 function calls in 8.811 seconds

   Ordered by: internal time

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
   183895    6.106    0.000    7.675    0.000 {method 'execute' of 'apsw.Cursor' objects}
...
   28951     0.033    0.000    0.462    0.000 calf.py:443(in_subs)
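The fix amounts to declaring the compound key when the table is created. Here is a rough sketch of the difference, again using the standard-library sqlite3 module rather than apsw. The `subs_nokey` table is hypothetical and exists only for comparison, and the integer column types are an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Without a key, the lookup has nothing to use but a full table scan.
conn.execute("CREATE TABLE subs_nokey (subclass INTEGER, superclass INTEGER)")
# With a compound primary key, SQLite builds an index on (subclass, superclass).
conn.execute(
    "CREATE TABLE subs (subclass INTEGER, superclass INTEGER, "
    "PRIMARY KEY (subclass, superclass))"
)

QUERY = "SELECT 1 FROM {} WHERE subclass = ? AND superclass = ?"


def plan(table):
    """Return the planner's description of how it would run the lookup."""
    rows = conn.execute(
        "EXPLAIN QUERY PLAN " + QUERY.format(table), (1, 2)
    ).fetchall()
    # The human-readable detail is the last column of each row.
    return " ".join(row[-1] for row in rows)


plan_nokey = plan("subs_nokey")
plan_keyed = plan("subs")
```

Printing the two plans should show a scan for the keyless table and an index search for the keyed one, which is the sort of thing worth checking before blaming the database.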

Well, that’s much better! Game on! The links query is still a bit slow:

  22592  4.977 SELECT links.subclass as E, negExists.existential as F
               FROM  (links JOIN hier as h1 ON links.role = h1.subrole),
                     (hier as h2 JOIN negExists ON h2.superrole = negExists.role)
               WHERE links.filler = ? AND negExists.filler = ?

No obvious issues. I’m unclear whether an index will help…but still! Back in the game!


Two Months of Nearly Daily Blogging to Go

I just might make it.

It’s been touch and go with classes and illness and feeling overwhelmed. I’ve not had the energy to write enough analytical stuff. My backlog there is growing.

I did fix a superfun bug in my exam authoring tool. It is a classic “dynamic typing plus truthiness” bug that gets static typing folks very excited.

I have a Python object that represents true-false questions. It inherits from an abstract question object. So far, fine. It is parsable from and printable to a simple format eg:

Q: This sentence is true!
+ F
Marks: 1
By: BJP

(Yes the sentence is true if you assume it’s true and false if you assume it’s false.)

This is fine until I used the “shuffle” option which reorganises the exam and question options. What I was getting out was:

Q: This sentence is true!
+ T
Marks: 1
By: BJP

This is bad! And has nothing to do with shuffling. The culprit looked like this:

options = '+ T' if self.key else '+ F'

This would have worked if the key was a Boolean. But it was a string: “True” or “False”. Any non empty string is truthy so the else never gets taken.

D’oh!

Either types on variables or no truthiness would have flushed this out. It was damned hard to spot esp when I wrote tests that constructed the object directly and set the key to a Boolean. (Unit testing didn’t work! It needed an integration test at least.)
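A stripped-down sketch of the bug and one possible fix follows. The `parse_key` and `render_option` names are hypothetical, not the actual methods in the tool; the idea is just to convert the parsed text to a real Boolean at the boundary and reject anything unexpected.

```python
def parse_key(text):
    """Convert the parsed answer text into a real bool, rejecting anything else."""
    normalised = text.strip().lower()
    if normalised in ("true", "t"):
        return True
    if normalised in ("false", "f"):
        return False
    raise ValueError(f"not a true/false key: {text!r}")


def render_option(key):
    """Same shape as the culprit line: relies on the truthiness of key."""
    return "+ T" if key else "+ F"


buggy = render_option("False")             # non-empty string, so truthy
fixed = render_option(parse_key("False"))  # real bool, so the else is taken
```

A test that round-trips through the parser, rather than constructing the object with a Boolean directly, would have caught it.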

Oy!

Two months to go…

All About A*

Amit Patel’s multi part discussion of A* (esp in games) is a very nice read. Even if you just read the first page, you’ll get a clear picture of various pathfinding algorithms and their trade offs. Some later bits aren’t quite as nice to follow (e.g., the trade offs between terms in the heuristic could be fruitfully visually indicated), but overall, it’s great.

It’s part of a series teaching math and computer science via games. Not necessarily wildly original in concept (cf AI: A Modern Approach), but here execution is everything. Check out the discussion of pathfinding for a tower defence game.

PythonTeX

I went down the rabbit hole trying to code up some simple class data analysis tools. I have a custom script for exam results which uses Jinja2 as a template language and generates a Markdown doc. It’s alright, but I figured there must be something ready made that wasn’t a “note book” and could generate documents and reveal.js slides.

Nope. Not so I could use.

I had knitpy in a tab for forever so I thought I’d give it a spin. It seems both moribund and broken, as far as all my struggling could demonstrate.

I poked at Pweave. I don’t know if I just lost the will to live or I really couldn’t get it working. Either way I was spending more time looking at the tool than pulling in and massaging the data, so I gave up.

Along the way I came across a paper on PythonTeX which has some interesting arguments:

PythonTeX makes possible reproducible LaTeX documents in which Python, Ruby, and Julia code is embedded. This can mitigate the potential for copy-and-paste errors, simplify the creation of figures, and allow significant portions of documents to be generated automatically. Built-in utilities for dependency tracking reduce the need for makefiles or similar systems. Tight LaTeX integration allows code to adapt to its context in a document. User-defined sessions that run in parallel provide high performance. Synchronization of errors and warnings with document line numbers mean that writing and coding remain efficient. Finally, the depythontex utility ensures that PythonTeX may be used even when plain LaTeX documents are needed.

In particular, they argue that tight integration with the host language is an advantage as you can more easily pass data back and forth:

  • As a LaTeX package, PythonTeX allows users to write valid LaTeX documents in which the full power of LaTeX is immediately accessible, rather than hybrid documents that contain LaTeX markup. Unlike in IPython and Pweave, it is simple and straightforward to pass information such as page dimensions from LaTeX into Python. It is even possible to create LaTeX macros that mix LaTeX with other languages.

I think it may be possible to abstract some of that out. I don’t see a strong need for super tight integration to get most of this. But who knows? It’s worth exploring.

Oh Mistune

I am about ready to give up on you.

I gained control of your renderers and they work great! Yay!

Lexers, grammars, all the parsing is a maze of twisty passages where I bark my shins constantly and make no progress.

I tried to enable math support which is a standard contribution and…it’s still not working. I’m going to try this other thing now. As far as I can tell, no one uses the “mistune-contribs” version.

I think this signals the end for me. I’ll have to migrate to something else more manageable.

Python Class and Instance Methods Share a Namespace

Which sucks.

Coming from Smalltalk, I expect a lot of polymorphism. Class/instance polymorphism seems pretty obvious…if I send a message to a class it goes to the class for handling. If I send a message to an instance it goes to the instance for handling. The methods can be different for each.

This is really useful for cases where much of the time you have a “fire and forget” method (e.g., serialising an object) for which you want to hide the effort of dorking with an instance in the normal case but sometimes you want to reuse the object. This just happened to me!

import csv

class BlackboardDelimitedSerialiser(ExamVisitor):
    def serialise_to(self, exam, path):
        # Open for writing; newline='' lets the csv module control line endings.
        with path.open('w', newline='') as f:
            writer = csv.writer(f, delimiter='\t', dialect='excel', quoting=csv.QUOTE_NONE, escapechar='"')
            for q_row in self.visit(exam):
                writer.writerow(q_row)

To use this as a one off I have to first instantiate the object. So something like:

bbserialiser = BlackboardDelimitedSerialiser()
bbserialiser.serialise_to(exam, path)

Since in my command line tool, I just do this serialisation once, I hoped to add a convenience class method:

class BlackboardDelimitedSerialiser(ExamVisitor):
    @classmethod
    def serialise_to(cls, exam, path):
        serialiser = cls()
        serialiser.serialise_to(exam, path)

    def serialise_to(self, exam, path):
        with path.open('w', newline='') as f:
            writer = csv.writer(f, delimiter='\t', dialect='excel', quoting=csv.QUOTE_NONE, escapechar='"')
            for q_row in self.visit(exam):
                writer.writerow(q_row)

So my call site would look like:

BlackboardDelimitedSerialiser.serialise_to(exam, path)

But all methods share a namespace so the instance method (which is lexically last) wins.
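A cut-down illustration of the collision, using a hypothetical `Serialiser` class in place of the real one: the class body is just a namespace, so the second `def` rebinds the name and the classmethod is gone.

```python
class Serialiser:
    @classmethod
    def serialise_to(cls, exam):
        return "class method"

    def serialise_to(self, exam):  # rebinds the name; the classmethod is discarded
        return "instance method"


# Only a plain function survives in the class namespace.
kind = type(Serialiser.__dict__["serialise_to"]).__name__

via_instance = Serialiser().serialise_to("exam")

try:
    # Called on the class, "exam" is taken as self and the exam argument is missing.
    Serialiser.serialise_to("exam")
    class_call_error = None
except TypeError as err:
    class_call_error = err
```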

Boo! I don’t want the names to be different! I started hacking on clever, horrible tricks when I realised that I was being completely daft for this case. If I just add instantiating parens, the call site looks like:

BlackboardDelimitedSerialiser().serialise_to(exam, path)

So my use case was bad. There’s no need for a class method so better not to have the extra method. This won’t be true for other cases (where I need some extra logic in the class method), but let’s not borrow trouble. After all, it’s not even clear that I’ll ever need to reuse the object, so having the class method and an uglier name for the instance method would be fine.
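For those other cases, one well-known trick (not necessarily the one I was hacking on) is a small descriptor that binds to the class when looked up on the class and to the instance otherwise. The `class_or_instance_method` name and the toy `Serialiser` below are hypothetical, and this is a sketch rather than something I’d vouch for in production.

```python
import functools


class class_or_instance_method:
    """Hypothetical descriptor: binds to the class when accessed on the class,
    and to the instance when accessed on an instance."""

    def __init__(self, func):
        self.func = func

    def __get__(self, obj, objtype=None):
        # obj is None when the attribute is looked up on the class itself.
        target = objtype if obj is None else obj
        return functools.partial(self.func, target)


class Serialiser:
    def __init__(self):
        self.calls = 0

    @class_or_instance_method
    def serialise_to(target, exam):
        # target is the class for Serialiser.serialise_to(...), else the instance.
        serialiser = target() if isinstance(target, type) else target
        serialiser.calls += 1
        return serialiser


one_off = Serialiser.serialise_to("exam")  # fire and forget via the class

reused = Serialiser()                      # or keep an instance around
reused.serialise_to("exam A")
reused.serialise_to("exam B")
```

One name, two call sites. Whether that is worth the surprise for the next reader is another question.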

Terminal Funkinesses

Two terminal graphicsy thingies crossed my radar:

  1. Brow.sh, a terminal front end for the Web. It uses a headless browser backend in a separate process and renders everything to the terminal, even graphics!
  2. On a much smaller scale, there’s termgraph.py, a library for rendering bar graphs to the terminal.

I’m pretty excited about the second because, as I’ve whined before, getting a simple graph out of Python is way more difficult than it should be.
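To give a flavour of the kind of output I mean, here is a plain-Python sketch of a text bar chart. This is not termgraph’s API; `bar_chart` is a hypothetical helper and the marks are made-up illustrative numbers.

```python
def bar_chart(data, width=40, tick="#"):
    """Return one text line per (label, value) pair, scaled to `width` columns."""
    if not data:
        return []
    largest = max(value for _, value in data)
    pad = max(len(label) for label, _ in data)
    lines = []
    for label, value in data:
        # Scale each bar relative to the largest value.
        n = round(width * value / largest) if largest > 0 else 0
        lines.append(f"{label:<{pad}} | {tick * n} {value}")
    return lines


marks = [("A", 12), ("B", 30), ("C", 6)]  # made-up illustrative data
chart = bar_chart(marks, width=10)
print("\n".join(chart))
```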