Two Months of Nearly Daily Blogging to Go

I just might make it.

It’s been touch and go with classes and illness and feeling overwhelmed. I’ve not had the energy to write enough analytical stuff. My backlog there is growing.

I did fix a superfun bug in my exam authoring tool. It is a classic “dynamic typing plus truthiness” bug that gets static typing folks very excited.

I have a Python object that represents true-false questions. It inherits from an abstract question object. So far, fine. It is parsable from and printable to a simple format, e.g.:

Q: This sentence is true!
+ F
Marks: 1
By: BJP

(Yes the sentence is true if you assume it’s true and false if you assume it’s false.)

This was fine until I used the “shuffle” option, which reorganises the exam and question options. What I got out was:

Q: This sentence is true!
+ T
Marks: 1
By: BJP

This is bad! And has nothing to do with shuffling. The culprit looked like this:

options = '+ T' if self.key else '+ F'

This would have worked if the key was a Boolean. But it was a string: “True” or “False”. Any non-empty string is truthy, so the else branch never gets taken.

D’oh!

Either types on variables or no truthiness would have flushed this out. It was damned hard to spot esp when I wrote tests that constructed the object directly and set the key to a Boolean. (Unit testing didn’t work! It needed an integration test at least.)
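To make the failure concrete, here’s a minimal sketch. The class and the parse_key helper are hypothetical stand-ins, not the real exam tool; the only assumption is that the parser hands over the key as the string “True” or “False”.

```python
class TrueFalseQuestion:
    """Hypothetical stand-in for the real question class; only the key matters."""

    def __init__(self, key):
        # The parser may hand us the *string* "True"/"False", not a bool.
        self.key = key

    def buggy_option(self):
        # Any non-empty string is truthy, so "False" still yields '+ T'.
        return '+ T' if self.key else '+ F'

    def fixed_option(self):
        # Convert explicitly instead of relying on truthiness.
        return '+ T' if parse_key(self.key) else '+ F'


def parse_key(raw):
    """Turn a parsed key into a real bool, rejecting anything unexpected."""
    if isinstance(raw, bool):
        return raw
    text = str(raw).strip().lower()
    if text in ('true', 't'):
        return True
    if text in ('false', 'f'):
        return False
    raise ValueError(f'not a true/false key: {raw!r}')


parsed = TrueFalseQuestion('False')   # what the parser produced
direct = TrueFalseQuestion(False)     # what the unit tests constructed
print(parsed.buggy_option(), direct.buggy_option(), parsed.fixed_option())
```

The directly constructed object behaves, which is why tests that set the key to a Boolean never saw the problem.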

Oy!

Two months to go…


All About A*

Amit Patel’s multi-part discussion of A* (esp in games) is a very nice read. Even if you just read the first page, you’ll get a clear picture of various pathfinding algorithms and their trade-offs. Some later bits aren’t quite as nice to follow (e.g., the trade-offs between terms in the heuristic could be fruitfully visually indicated), but overall, it’s great.

It’s part of a series teaching math and computer science via games. Not necessarily wildly original in concept (cf AI: A Modern Approach), but here execution is everything. Check out the discussion of pathfinding for a tower defence game.

PythonTeX

I went down the rabbit hole trying to code up some simple class data analysis tools. I have a custom script for exam results which uses Jinja2 as a template language and generates a Markdown doc. It’s alright, but I figured there must be something ready made that wasn’t a “note book” and could generate documents and reveal.js slides.

Nope. Not so I could use.

I had knitpy in a tab for forever so I thought I’d give it a spin. It seems both moribund and broken, as far as all my struggling could demonstrate.

I poked at Pweave. I don’t know if I just lost the will to live or I really couldn’t get it working. Either way I was spending more time looking at the tool than pulling in and massaging the data, so I gave up.

Along the way I came across a paper on PythonTeX which has some interesting arguments:

PythonTeX makes possible reproducible LaTeX documents in which Python, Ruby, and Julia code is embedded. This can mitigate the potential for copy-and-paste errors, simplify the creation of figures, and allow significant portions of documents to be generated automatically. Built-in utilities for dependency tracking reduce the need for makefiles or similar systems. Tight LaTeX integration allows code to adapt to its context in a document. User-defined sessions that run in parallel provide high performance. Synchronization of errors and warnings with document line numbers mean that writing and coding remain efficient. Finally, the depythontex utility ensures that PythonTeX may be used even when plain LaTeX documents are needed.

In particular, they argue that tight integration with the host language is an advantage as you can more easily pass data back and forth:

  • As a LaTeX package, PythonTeX allows users to write valid LaTeX documents in which the full power of LaTeX is immediately accessible, rather than hybrid documents that contain LaTeX markup. Unlike in IPython and Pweave, it is simple and straightforward to pass information such as page dimensions from LaTeX into Python. It is even possible to create LaTeX macros that mix LaTeX with other languages.

I think it may be possible to abstract some of that out. I don’t see a strong need for super tight integration to get most of this. But who knows? It’s worth exploring.

Oh Mistune

I am about ready to give up on you.

I gained control of your renderers and they work great! Yay!

Lexers, grammars, all the parsing is a maze of twisty passages where I bark my shins constantly and make no progress.

I tried to enable math support, which is a standard contribution, and…it’s still not working. I’m going to try this other thing now. As far as I can tell, no one uses the “mistune-contribs” version.

I think this signals the end for me. I’ll have to migrate to something else more manageable.

Python Class and Instance Methods Share a Namespace

Which sucks.

Coming from Smalltalk, I expect a lot of polymorphism. Class/instance polymorphism seems pretty obvious…if I send a message to a class it goes to the class for handling. If I send a message to an instance it goes to the instance for handling. The methods can be different for each.

This is really useful for cases where much of the time you have a “fire and forget” method (e.g., serialising an object) for which you want to hide the effort of dorking with an instance in the normal case but sometimes you want to reuse the object. This just happened to me!

import csv

class BlackboardDelimitedSerialiser(ExamVisitor):
    def serialise_to(self, exam, path):
        # Open for writing; newline='' lets the csv module control line endings
        with path.open('w', newline='') as f:
            writer = csv.writer(f, delimiter='\t', dialect='excel', quoting=csv.QUOTE_NONE, escapechar='"')
            for q_row in self.visit(exam):
                writer.writerow(q_row)

To use this as a one off I have to first instantiate the object. So something like:

bbserialiser = BlackboardDelimitedSerialiser()
bbserialiser.serialise_to(exam, path)

Since in my command line tool, I just do this serialisation once, I hoped to add a convenience class method:

class BlackboardDelimitedSerialiser(ExamVisitor):
    @classmethod
    def serialise_to(cls, exam, path):
        serialiser = cls()
        serialiser.serialise_to(exam, path)

    def serialise_to(self, exam, path):
        # Open for writing; newline='' lets the csv module control line endings
        with path.open('w', newline='') as f:
            writer = csv.writer(f, delimiter='\t', dialect='excel', quoting=csv.QUOTE_NONE, escapechar='"')
            for q_row in self.visit(exam):
                writer.writerow(q_row)

So my call site would look like:

BlackboardDelimitedSerialiser.serialise_to(exam, path)

But all methods share a namespace so the instance method (which is lexically last) wins.

Boo! I don’t want the names to be different! I started hacking on clever, horrible tricks when I realised that I was being completely daft for this case. If I just add instantiating parens, the call site looks like:

BlackboardDelimitedSerialiser().serialise_to(exam, path)

So my use case was bad. There’s no need for a class method so better not to have the extra method. This won’t be true for other cases (where I need some extra logic in the class method), but let’s not borrow trouble. After all, it’s not even clear that I’ll ever need to reuse the object, so having the class method and an uglier name for the instance method would be fine.
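For the cases where a class-level variant really does need its own logic, here is a sketch of both the shadowing and one possible workaround. The hybridmethod descriptor is hypothetical (my own name, not a standard library feature, and not necessarily the trick I was hacking on); the serialisers are toy stand-ins that return a string instead of writing a file.

```python
class ShadowedSerialiser:
    @classmethod
    def serialise_to(cls, exam, path):
        return cls().serialise_to(exam, path)

    # Same name: this rebinds 'serialise_to' in the class body,
    # so the classmethod above is simply gone.
    def serialise_to(self, exam, path):
        return f'wrote {exam} to {path}'


outcome = 'no error'
try:
    # Calls the plain function with 'exam' as self and no 'path'.
    ShadowedSerialiser.serialise_to('exam', 'out.tsv')
except TypeError as err:
    outcome = f'TypeError: {err}'


class hybridmethod:
    """Hypothetical descriptor: one function for the class, another for instances."""

    def __init__(self, for_class):
        self.for_class = for_class
        self.for_instance = None

    def instancemethod(self, func):
        # Register the instance-level variant; return self so the name stays bound to us.
        self.for_instance = func
        return self

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: bind the class-level function to the class.
            return self.for_class.__get__(objtype, type(objtype))
        # Accessed on an instance: bind the instance-level function.
        return self.for_instance.__get__(obj, objtype)


class HybridSerialiser:
    @hybridmethod
    def serialise_to(cls, exam, path):
        return cls().serialise_to(exam, path)

    @serialise_to.instancemethod
    def serialise_to(self, exam, path):
        return f'wrote {exam} to {path}'


print(outcome)
print(HybridSerialiser.serialise_to('exam', 'out.tsv'))
print(HybridSerialiser().serialise_to('exam', 'out.tsv'))
```

It works along the same lines as property and its setter decorator, but it is exactly the sort of cleverness that the extra parens make unnecessary here.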

Terminal Funkinesses

Two terminal graphicsy thingies crossed my radar:

  1. Brow.sh, a terminal front end for the Web. It uses a headless browser backend in a separate process and renders everything to the terminal, even graphics!
  2. On a much smaller scale, there’s termgraph.py, a library for rendering bar graphs to the terminal.

I’m pretty excited about the second because, as I’ve whined before, getting a simple graph out of Python is way more difficult than it should be.

String Formatting in Python

Most of my Python code involves string manipulation esp. conversion between text formats. Right now I’m doodling on a prez clone. prez was great for getting started with reveal.js presentations, but it’s very opinionated and some of those opinions chafe. Plus, I want to be able to manipulate the presentations in a number of ways: generating topic lists, extracting notes, etc.

The prez workflow is simple: You create one Markdown file per slide in a special directory (slides) with a simple naming convention. You can have one level of nested folder, which gives you a vertical stack. There’s other bits including the precise bits of Markdown, naming conventions for skipping slides, some config stuff, etc. There’s also some nice features like live preview and generating PDFs. But the core bit is loop over markdown files, convert to HTML, then insert into a reveal.js template.

As there are several Markdown parsers for Python, the conversion bit is easy. For this, I’m using Python Markdown because it has a wealth of features and plugins. (For another project, I used mistune which seems nicer on the rendering but I never figured out how to hack the parser.)

The next step is to figure out how to wrap the converted Markdown into the right bits of HTML. A typical reveal.js slide looks like:


<section id="welcome-course-goals" class="slide">
	...
</section>

(The id is derived by prez from the folder/filename of your source Markdown files.)

Vertical slides appear in a nested section, and the whole shebang appears in a div of class “slides”.

For the hacky person that I am, given the facilities available, a bit of string interpolation is what’s needed.
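Before picking an interpolation mechanism, the overall loop is small enough to sketch. Everything here is an assumption for illustration: the function names are made up, the id derivation (folder name plus file stem) only approximates whatever prez really does, and “convert” is any callable that turns Markdown text into HTML.

```python
from pathlib import Path


def wrap_section(html, slide_id):
    # Mirrors the reveal.js slide markup: one <section> per slide.
    return f'<section id="{slide_id}" class="slide">\n{html}\n</section>'


def build_slides(slides_dir, convert):
    """Loop over Markdown files, convert each, and nest them in sections.

    `convert` is any callable taking Markdown text and returning HTML.
    """
    parts = []
    for entry in sorted(Path(slides_dir).iterdir()):
        if entry.is_dir():
            # One level of nesting: a folder becomes a vertical stack.
            stack = [
                wrap_section(convert(md.read_text()), f'{entry.name}-{md.stem}')
                for md in sorted(entry.glob('*.md'))
            ]
            parts.append('<section>\n' + '\n'.join(stack) + '\n</section>')
        elif entry.suffix == '.md':
            parts.append(wrap_section(convert(entry.read_text()), entry.stem))
    # The whole shebang goes in a div of class "slides".
    return '<div class="slides">\n' + '\n'.join(parts) + '\n</div>'
```

With Python Markdown installed, passing markdown.markdown as “convert” should do; the result then gets dropped into the reveal.js page template.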

There Should Be Fewer Than 4 Ways To Do It

There are a bazillion templating systems available for Python, including at least 4 built in (there are many overviews of varying completeness and correctness…many!).

The new hotness is “f-strings” (see the PEP), that is, string literals with a leading “f” which can interpret arbitrary Python expressions in curly braces. They are very compact and expressive. The current implementation in the standard Python interpreter is also very fast. Thus they address a number of issues with the other methods. One of the biggest problems is the horrific amount of repetition other methods need in order to have “named” holes in the format string. Consider using standard string interpolation:

# the variables "slide_id", "slide_txt", and "note_txt" are set before this

"""<section %(slide_id)s class="slide">
%(slide_txt)s
<aside class="notes">
%(note_txt)s
</aside>
</section>""" % {"slide_id": slide_id, "slide_txt": slide_txt, "note_txt": note_txt}

As you can see, there’s truly a bonkers amount of repetition. It’s hard to read and maintain (imagine adding another variable or, worse, changing a variable name). One workaround is to pass in the dictionary of local variables using the “locals()” function. This is terser but feels rather dodgy. It passes too much (and sometimes too little!) information and uses a reflective mechanism for what is, after all, a very basic, first order operation. f-strings solve this in an elegant way:

# the variables "slide_id", "slide_txt", and "note_txt" are set before this string
f"""<section {slide_id} class="slide">
{slide_txt}
<aside class="notes">
{note_txt}
</aside>
</section>""" # That's it! The variables are found in context!

So, there’s some magic there, but it’s pretty cool magic. This is orthogonal to the expressivity increase. With standard interpolation I can pass in strings and numbers and do a bit of padding but that’s it. With f-strings, I can put pretty arbitrary Python expressions in there. You might notice that “slide_id” is a string and contains the whole attribute string (typically, 'id="some_derived_id"'). That’s left over from the string interpolation approach…my id variable had to contain the whole serialisation because if there’s no id, I don’t want a dangling attribute. With f-strings I could do something like:

slide_id = 'a_cool_id'
f"""<section {'id="%s"'% slide_id if slide_id else ''} class="slide">"""

Here, I used a conditional expression (and a traditional string interpolation!) to conditionally add an id. I have to at least bind “slide_id” to “None” or the empty string, but I can leave my variable free from output serialisation cruft.

This all seems good, but f-strings suffer from one severe defect: You can’t save them in variables and instantiate them elsewhere. Given that I have HTML fragments ranging from under 10 lines to 100s, this is a non-starter. I need to give them meaningful names. This is especially important if I want to serialise to different formats like LaTeX. Basically, f-strings are compiled in their lexical context only, to mitigate the extreme security risk that comes from the fact that they can embed arbitrary Python expressions. Phooey! Frankly, I’d trade expressions in format strings for the ability to save them in variables. (I also think that taking f-strings from random user input is bonkers.)

So, f-strings as is are basically useless to me for the Prez clone. (They still seem great for all sorts of other things.)
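For comparison, that trade (no expressions, but storable) is roughly what str.format already offers. A minimal sketch, with SLIDE_TEMPLATE and render_slide as made-up names:

```python
# A named, reusable template: the fields are filled in later, wherever it is used.
SLIDE_TEMPLATE = """<section {slide_id} class="slide">
{slide_txt}
<aside class="notes">
{note_txt}
</aside>
</section>"""


def render_slide(slide_id, slide_txt, note_txt):
    # Inside this function locals() holds exactly the three parameters,
    # which limits the "passes too much" worry about locals().
    return SLIDE_TEMPLATE.format(**locals())


print(render_slide('id="welcome"', '<p>Hello</p>', 'Say hello.'))
```

The holes can only name values (plus attribute and index lookups), so something like the conditional id has to be computed before the call.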

By using ‘compile’ and ‘eval’, you can make f-string strings storable:

f_slide = compile('''f"""<section {slide_id} class="slide">
{slide_txt}
</section>"""''', '<string>', 'eval')

# the variables "slide_id" and "slide_txt" are set AFTER the f-string!

eval(f_slide)

Now, that’s pretty nasty and opens the door wide open for injection attacks. I don’t like seeing “eval” in my code. Passing “locals()” to a formatter seems way better. There’s a PEP proposing a different variant of f-strings where a designated “format” function would do the “eval” in a slightly more controlled way. My current move is to use a wrapper function, i.e.:

def f_slide(slide_id, slide_txt, note_txt):
    return f"""<section {slide_id} class="slide">
{slide_txt}
<aside class="notes">
{note_txt}
</aside>
</section>"""

# the variables "text_of_slide", and "note_txt" are set

f_slide('an_id', text_of_slide, note_txt)

We lost some terseness, but we are better than the other solutions. We get a bit of flexibility in the names as well: We can vary the variable names between the caller and the template (e.g., “text_of_slide” to “slide_txt”). We could do some other munging in the body of the function as well instead of popping it into the f-string. You can also do really funky stuff with calling f-string functions from f-strings!

def f_slide_combo(slide_id, slide_txt, note_txt=None):
    return f"""<section {slide_id} class="slide">
{slide_txt}
{f_note(note_txt)}
</section>"""

def f_note(note_txt):
    if note_txt:
        return f"""\t<aside class="notes">
{note_txt}
</aside>"""
    else:
        return ''
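Putting the pieces together, here is a sketch of the pattern with both the id attribute and the notes made optional. The f_attr helper and the changed signature are my own illustration rather than what the clone actually uses.

```python
def f_attr(name, value):
    # Hypothetical helper: emit ' name="value"' only when there is a value,
    # so a missing id leaves no dangling attribute.
    return f' {name}="{value}"' if value else ''


def f_note(note_txt):
    if not note_txt:
        return ''
    return f"""<aside class="notes">
{note_txt}
</aside>
"""


def f_slide(slide_txt, slide_id=None, note_txt=None):
    # f-string functions called from inside an f-string.
    return f"""<section{f_attr('id', slide_id)} class="slide">
{slide_txt}
{f_note(note_txt)}</section>"""


print(f_slide('<p>Hello</p>'))
print(f_slide('<p>Hello</p>', slide_id='welcome', note_txt='Say hello.'))
```

The variables stay free of serialisation cruft and each fragment gets a meaningful name.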

We may lose some performance with the extra function calls. I did a little timeit testing on regular f-string vs. compile/eval vs. f-string function. I called each 10,000,000 times (I needed that many to get clear seconds level differences):

  • Regular: 3.58s
  • Compile: 9.12s
  • Function: 5.25s

So compile/eval sucks as well as being nasty. The function approach looks pretty good though it has a measurable cost. All of this is picayune for the size strings I’m using and I imagine the function call overhead proportion gets smaller as the strings get larger and more complex.
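A comparison along these lines can be sketched with timeit. This is a reconstruction under assumptions, not the script behind the numbers above: the sample values, the function names, and the smaller repeat count are mine, and wrapping every variant in a function adds call overhead to all of them.

```python
import timeit

slide_id, slide_txt = 'id="welcome"', '<p>Hello</p>'


def regular():
    # Plain f-string evaluated in place.
    return f"""<section {slide_id} class="slide">
{slide_txt}
</section>"""


# The same f-string, compiled once and evaluated on demand.
compiled = compile('''f"""<section {slide_id} class="slide">
{slide_txt}
</section>"""''', '<string>', 'eval')


def via_eval():
    return eval(compiled)


def f_slide(slide_id, slide_txt):
    return f"""<section {slide_id} class="slide">
{slide_txt}
</section>"""


def via_function():
    # The wrapper function pattern: one extra call per render.
    return f_slide(slide_id, slide_txt)


def percent():
    # Standard interpolation with named holes.
    return """<section %(slide_id)s class="slide">
%(slide_txt)s
</section>""" % {"slide_id": slide_id, "slide_txt": slide_txt}


N = 100_000  # far fewer than 10,000,000, just to keep the run short
for fn in (regular, via_eval, via_function, percent):
    print(f'{fn.__name__}: {timeit.timeit(fn, number=N):.3f}s')
```

Absolute timings will vary by machine and Python version, so only the relative ordering is worth reading.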

So! This makes f-strings a complete win for me. In the worst case I have less redundancy (or ugliness), more flexibility, and more speed (standard interpolation came in at 14.85s!). There’s no loss of safety and I can make things reasonably composable. It reduces the need for more complex templating systems like Jinja2, though those have powerful features like template inheritance. You can’t use f-strings for file based templates (without some hackery) and you can’t dork the delimiters. It’d be cool to have a templating system that used basic f-string syntax but added these two features. That’d handle all my current template needs.

So Many Ways…

It is a shame that there are so many ways but string interpolation is a hard problem. f-strings seem, on the whole, to be an advance, esp if coupled with the f-string function pattern. I’ll see how I feel next week.