Browsing Program Structure with Worgle and SQLite

2019-09-02

Literate Programs as Structured Data

A compelling aspect of writing literate programs is that they can be represented as structured data. In Worgle, there are two main tree representations of the overall program structure. The woven tree represents the document structure as a collection of headers found in the Org markup language. The tangled tree represents the generated code structure as a series of named codeblocks created using the noweb syntax.

I have always thought it would be powerful to explore literate programs as trees, and this is now beginning to be possible in Worgle. Worgle can write its data to a SQLite database, which can then be queried using a program called Worgmap.

Extracting Data from a Worgle Literate Program

Before a literate program made using Worgle can be queried, data must first be written to an intermediate format. I have chosen to use SQLite, as it is a robust and mature data format that is trivial for other programs to parse.

A database is generated using the "-d" flag in Worgle. The code below generates a database from the main Worgle org file.

 worgle -d a.db worgle.org

"a.db" is the default database name that Worgmap opens for queries.

Database write times are reasonable. My largest program written in Worgle to date, Monolith, writes a database in under half a second on my 2015 MacBook Pro. My GPD laptop running Alpine Linux takes a few seconds. That difference feels larger than I expected, even accounting for the hardware, but it is still manageable.

Some Querying Via Worgmap

Once a database is generated, it can be queried using "get" utilities found in a program called Worgmap. The database is a pure SQLite database, so it is possible to just do raw SQL queries using the sqlite3 CLI. The worgmap get interface saves a few keystrokes.
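Since the database is plain SQLite, it can also be inspected from any language with SQLite bindings. The following is a minimal sketch in Python, not part of Worgle or Worgmap. It makes no assumptions about Worgle's schema: it only asks SQLite which tables exist and which columns they have. The function name describe_database is made up for this example.

```python
import sqlite3


def describe_database(path="a.db"):
    """Return {table_name: [column_name, ...]} for every table in the file.

    Nothing here is specific to Worgle: the table names come from
    SQLite's own sqlite_master catalog, not from a known schema.
    """
    conn = sqlite3.connect(path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info rows are (cid, name, type, notnull, dflt, pk).
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [column[1] for column in columns]
        return schema
    finally:
        conn.close()
```

Calling describe_database() with no arguments opens "a.db" in the current working directory, the same file Worgmap looks for. Note that sqlite3.connect creates an empty file if none exists, so it is worth checking that the database was generated first.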

When worgmap is run, it assumes the database is in the current working directory and is named "a.db". In the future, this will be more customizable.

To get a list of files from the database, run

 worgmap get filelist

This will return the list of files tangled by Worgle:

 worgle.c
 worgle.h
 worgle_private.h

The "ffile" utility can be used to get metadata on the file "worgle.c":

 worgmap get ffile worgle.c

This returns the following:

 id = 2
 filename = worgle.c
 top = 1
 next_file = 29

The id is the UUID associated with this resource. filename is the stored filename (duh). top refers to the top-level code block represented. next_file is the UUID of the next file in the list.
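The next_file field means the files form a linked list that can be walked record by record. Here is a rough sketch of that walk in Python over in-memory records. Only the worgle.c record (id 2, top 1, next_file 29) comes from the output above. The other ids and top values are invented for illustration, and so is the assumption that a next_file matching no record marks the end of the list.

```python
def walk_files(files, first_id):
    """Yield filenames in list order by following next_file links.

    `files` maps an id to a record shaped like the ffile output.
    The walk stops at an id with no record (assumed end of list)
    or at an id already visited (guard against a malformed loop).
    """
    seen = set()
    current = first_id
    while current in files and current not in seen:
        seen.add(current)
        record = files[current]
        yield record["filename"]
        current = record["next_file"]


# Hypothetical data: only the first record matches real output.
files = {
    2: {"filename": "worgle.c", "top": 1, "next_file": 29},
    29: {"filename": "worgle.h", "top": 30, "next_file": 40},
    40: {"filename": "worgle_private.h", "top": 41, "next_file": 0},
}

for filename in walk_files(files, 2):
    print(filename)
```

With the made-up records above, this prints the three filenames in the same order as the filelist output.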

To get more information on the top level block:

 worgmap get blk 1

Which returns:

 1 3 worgle-top

This displays, in order: the UUID of the block (1), the UUID of its top-level segment (3), and the name of the block (worgle-top). "worgle-top" is the block that contains the entire structure of the tangled C file "worgle.c". A tree view of this block can be printed using:

 worgmap get tree worgle-top

The results:

 global_variables
 enums
  parse_modes
 static_function_declarations
 functions
  loadfile_localvars
  loadfile
  parser_local_variables
  parser_initialization
  getline
  parse_mode_org
  parse_mode_code
  parse_mode_begincode
   begin_the_code
  worgle_block_set_id
  worgle_file_set_id
  worgle_segment_string_set_id
  worgle_segment_reference_set_id
  worgle_init
  worgle_free
  worgle_string_init
  worgle_segment_init
  worgle_block_init
  hashmap_hasher
  worgle_file_init_id
 local_variables
 initialization
  parse_cli_args
   append_filename
   turn_on_debug_macros
   turn_on_warnings
   map_source_code
   generate_database
  check_filename
 loading
 parsing
 generation
 mapping
 database
 cleanup
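A listing like this can be produced by a depth-first walk over blocks that reference other blocks, indenting one space per level. The sketch below shows the idea in Python. It is not how Worgmap is implemented, and the blocks mapping is a hand-written, abbreviated stand-in for a few entries of the tree above rather than data read from the database. It also assumes block references never form a cycle.

```python
def tree_lines(blocks, name, depth=0):
    """Return the indented lines for every block referenced under `name`.

    `blocks` maps a block name to the names it references, in order.
    The root itself is not printed, matching the listing above.
    """
    lines = []
    for child in blocks.get(name, []):
        lines.append(" " * depth + child)
        # Recurse so grandchildren appear directly under their parent.
        lines.extend(tree_lines(blocks, child, depth + 1))
    return lines


# Abbreviated, hand-written stand-in for part of the worgle-top tree.
blocks = {
    "worgle-top": ["global_variables", "enums", "initialization"],
    "enums": ["parse_modes"],
    "initialization": ["parse_cli_args", "check_filename"],
    "parse_cli_args": ["append_filename"],
}

print("\n".join(tree_lines(blocks, "worgle-top")))
```

Blocks with no entry in the mapping are treated as leaves, which is why names like global_variables need no record of their own.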

Future Plans

Lots of things to be done here, really. Using something like SQLite allows me to dump way more metadata than I know what to do with right now.

For starters, I'd like to save org structure in addition to tangled code structure. I'm hoping to build more utilities that generate interesting representations of the document, including a better static HTML generator than the simple one I have currently written. I also want to build a simple HTTP server that dynamically generates HTML content. Maybe throw in a few dot graph generators for good measure?

Being able to write multiple Worgle programs into one database is important to me as well, as this would allow more incremental (and hopefully faster) development.