Worgle
1 What is Worgle?
Hello, fellow human. I'm glad you could stop by.
This is a document written in Org markup, talking about a thing I'm building called Worgle. The name Worgle is derived from what it is: a Worg Tangler. Worg is the name of this project. It too gets its name from what it is: a WEB + Org project.
Org is the very decent markup language from org-mode. WEB is the name of the first literate programming tool, created by Donald Knuth.
In literate programming, one writes prose and code together in a markup language, which can then be parsed two ways. The weaver parses the markup to produce a human-readable document, usually a (La)TeX or HTML file. The tangler parses the markup and produces source code that a computer can run or compile.
In other words, Worgle is a literate programming tangler used to convert org-like markup into (primarily) code.
Worgle itself is a literate program, so what tangles the worgle code? Orgle does! Orgle is a program written in C without literate programming. It is designed to be just enough of a program to bootstrap Worgle. Worgle will then be used as the tangler for the rest of Worg.
Worgle will initially start out as a literate program of Orgle. In fact, this document will initially start out as an outline for the Orgle program. The Orgle program will be considered done when it is able to produce a similar program by parsing this Worgle document. After that is done, more work will be put into Worgle to make it more suitable for managing larger projects written in C.
Following me so far? No? Yes? Great, let's get started.
2 Top-level files
Like Orgle, Worgle is self contained inside of a single C file. For the time being, this is suitable enough. The current scope of Worgle is to be a self-contained standalone CLI application.
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parg.h"
<<global_variables>>
<<enums>>
<<structs>>
<<static_function_declarations>>
<<function_declarations>>
<<functions>>
int main(int argc, char *argv[])
{
<<local_variables>>
<<initialization>>
<<loading>>
<<parsing>>
<<generation>>
<<cleanup>>
}
3 An Outline of What Worgle does
This section gives a broad overview of how Orgle (and Worgle) work. Orgle is a bootstrap program written in C, used to generate C code for Worgle (this program here). At the highest level, the two programs share the same basic program structure.
3.1 Initialization
3.1.1 Initialize worgle data
Worgle data is initialized before anything is loaded.
worgle_d worg;
worgle_init(&worg);
3.1.2 Get and set filename
The file name is currently acquired from the command line, so the program must check that the right number of arguments has been supplied. If not, return an error.
char *filename;
filename = NULL;
if(argc < 2) {
fprintf(stderr, "Usage: %s filename.org\n", argv[0]);
return 1;
}
<<parse_cli_args>>
<<check_filename>>
Check the filename. If the filename was not set by the command line, return an error.
if(filename == NULL) {
fprintf(stderr, "No filename specified\n");
return 1;
}
The filename is then set inside of the Worgle struct.
worg.filename.str = filename;
worg.filename.size = strlen(filename);
3.1.3 Initialize return codes
3.1.3.1 Main Return Code
The main return code determines the overall state of the program.
int rc;
By default, it is set to be okay, which is 0 on POSIX systems.
rc = 0;
3.1.3.2 Line Status Code
The getline function used by the parser returns a status code, which tells the program when it has reached the end of the file.
int status;
This is set to be 0 by default. The value is overwritten by the first call to the getline function before it is ever tested.
status = 0;
3.1.3.3 Mode
The overall parser mode state is set by the local variable mode.
int mode;
It is set to be the initial mode of MODE_ORG.
mode = MODE_ORG;
3.2 Load file into memory
The first thing the program will do is load the file.
While most parsers tend to parse things on a line by line basis via a file stream, this parser will load the entire file into memory. This is done due to the textual nature of the program. It is much easier to simply allocate everything in one big block and reference chunks than to allocate smaller chunks as you go.
3.2.1 Open file
The file is opened through a local file handle fp.
FILE *fp;
fp = fopen(filename, "r");
if(fp == NULL) {
fprintf(stderr, "Could not find file %s\n", filename);
return 1;
}
3.2.2 Get file size
The size is acquired by going to the end of the file and getting the current file position.
size_t size;
fseek(fp, 0, SEEK_END);
size = ftell(fp);
3.2.3 Allocate memory, read, and close
Memory is allocated in a local buffer variable via calloc. The buffer is then stored inside of the worg struct.
char *buf;
buf = calloc(1, size);
worg.buf = buf;
The file is rewound back to the beginning and then read into the buffer. The file is no longer needed at this point, so it is closed.
fseek(fp, 0, SEEK_SET);
fread(buf, size, 1, fp);
fclose(fp);
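The loading steps above leave out error checks for brevity. A minimal sketch of the same load-everything approach with those checks added might look like the following. The helper name load_file is hypothetical and not part of Worgle, and opening the file in binary mode is an assumption made here so that the size reported by ftell matches the number of bytes fread returns.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Worgle): read a whole file into one
 * heap-allocated block. Returns NULL on failure. On success the caller
 * owns the buffer and *size holds the number of bytes read. */
static char *load_file(const char *path, size_t *size)
{
    FILE *fp;
    long end;
    char *buf;

    *size = 0;
    fp = fopen(path, "rb");
    if (fp == NULL) return NULL;

    /* Seek to the end to learn the size, then rewind. */
    if (fseek(fp, 0, SEEK_END) != 0 || (end = ftell(fp)) < 0 ||
        fseek(fp, 0, SEEK_SET) != 0) {
        fclose(fp);
        return NULL;
    }

    /* Allocate one extra byte so an empty file still yields a buffer. */
    buf = calloc(1, (size_t)end + 1);
    if (buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* A short read is treated as a failure. */
    if (end > 0 && fread(buf, 1, (size_t)end, fp) != (size_t)end) {
        free(buf);
        fclose(fp);
        return NULL;
    }

    fclose(fp);
    *size = (size_t)end;
    return buf;
}
```

A caller would release the returned buffer with free once parsing and generation are finished.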
3.3 Parsing
The second phase of the program is the parsing stage.
The parsing stage will parse files line-by-line. The program will find a line by skimming through the block up to a line break character, then pass that off to be parsed. Line by line, the parser will read the program and produce a structure of the tangled code in memory.
while(1) {
<<getline>>
if(mode == MODE_ORG) {
<<parse_mode_org>>
} else if(mode == MODE_CODE) {
<<parse_mode_code>>
} else if(mode == MODE_BEGINCODE) {
<<parse_mode_begincode>>
}
}
3.3.1 Parser Local Variables
The parsing stage requires a local variable called str to be used from time to time. Not sure where else to put this.
worgle_string str;
worgle_string_init(&str);
line is a pointer that is set to the start of the line most recently read.
char *line;
line = NULL;
pos refers to the current buffer position.
size_t pos;
pos = 0;
The local variable read holds the size of the line most recently read.
size_t read;
3.3.2 Reading a line at a time
Despite being loaded into memory, the program still reads in code one line at a time. The parsing relies on new line feeds to denote the beginnings and endings of sections and code references.
Before reading the line, the line number inside worgle is incremented.
A special readline function has been written based on getline that reads lines of text from an allocated block of text. This function is called worgle_getline.
After the line has been read, the program checks the return code status. If all the lines of text have been read, the program breaks out of the while loop.
worg.linum++;
status = worgle_getline(buf, &line, &pos, &read, size);
if(!status) break;
static int worgle_getline(char *fullbuf,
char **line,
size_t *pos,
size_t *line_size,
size_t buf_size);
fullbuf refers to the full text buffer.
line is a pointer where the current line will be stored.
pos is the current buffer position.
line_size is a variable written to that returns the size of the line. This includes the line break character.
buf_size is the size of the whole buffer.
static int worgle_getline(char *fullbuf,
char **line,
size_t *pos,
size_t *line_size,
size_t buf_size)
{
size_t p;
size_t s;
*line_size = 0;
p = *pos;
*line = &fullbuf[p];
s = 0;
while(1) {
s++;
if(p >= buf_size) return 0;
if(fullbuf[p] == '\n') {
*pos = p + 1;
*line_size = s;
return 1;
}
p++;
}
}
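As a usage sketch, the loop below drives the same scanning logic over an in-memory buffer and counts the lines it yields. The scanner is restated under the name getline_sketch so the example compiles on its own, and count_lines is a hypothetical helper that does not appear in Worgle.

```c
#include <stddef.h>

/* Same scanning logic as worgle_getline, restated so this example
 * compiles on its own. */
static int getline_sketch(char *fullbuf, char **line, size_t *pos,
                          size_t *line_size, size_t buf_size)
{
    size_t p = *pos;
    size_t s = 0;

    *line_size = 0;
    *line = &fullbuf[p];
    while (1) {
        s++;
        if (p >= buf_size) return 0;
        if (fullbuf[p] == '\n') {
            *pos = p + 1;
            *line_size = s;
            return 1;
        }
        p++;
    }
}

/* Hypothetical helper: count the lines the scanner yields for a buffer. */
static size_t count_lines(char *buf, size_t size)
{
    char *line = NULL;
    size_t pos = 0;
    size_t len = 0;
    size_t n = 0;

    while (getline_sketch(buf, &line, &pos, &len, size)) n++;
    return n;
}
```

One consequence worth noting: text after the last line break is never returned as a line, so a source file should end with a newline if its final line matters.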
3.3.3 Parsing Modes
The parser is implemented as a relatively simple state machine, whose behavior shifts between parsing org-mode markup (MODE_ORG), and code blocks (MODE_BEGINCODE and MODE_CODE).
The state machine makes a distinction between the start of a new code block (MODE_BEGINCODE), which provides information like the name of the code block and optionally the name of the file to tangle to, and the code block itself (MODE_CODE).
enum {
<<parse_modes>>
};
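The transitions between the three modes can be summarized in a small sketch. This is not how Worgle structures its parser (the real transitions live inside the parsing loop); next_mode and starts_with are hypothetical helpers used only to show which line prefix moves the parser from one mode to the next.

```c
#include <stddef.h>
#include <string.h>

enum { MODE_ORG, MODE_BEGINCODE, MODE_CODE };

/* Hypothetical helper: does the line start with the given marker? */
static int starts_with(const char *line, size_t len, const char *marker)
{
    size_t mlen = strlen(marker);
    return len >= mlen && strncmp(line, marker, mlen) == 0;
}

/* Return the mode the parser would be in after seeing this line,
 * or -1 when a #+NAME: line is not followed by #+BEGIN_SRC. */
static int next_mode(int mode, const char *line, size_t len)
{
    switch (mode) {
    case MODE_ORG:
        return starts_with(line, len, "#+NAME:") ? MODE_BEGINCODE : MODE_ORG;
    case MODE_BEGINCODE:
        return starts_with(line, len, "#+BEGIN_SRC") ? MODE_CODE : -1;
    case MODE_CODE:
        return starts_with(line, len, "#+END_SRC") ? MODE_ORG : MODE_CODE;
    default:
        return -1;
    }
}
```

The sketch ignores code references inside MODE_CODE, since those change what gets recorded but not the mode.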
3.3.3.1 MODE_ORG
MODE_ORG,
When the parser is in MODE_ORG, it is only searching for the start of the next named block. When it finds a match, it extracts the name, gets ready to begin a new block, and changes the mode to MODE_BEGINCODE.
if(read >= 7) {
if(!strncmp(line, "#+NAME:",7)) {
mode = MODE_BEGINCODE;
parse_name(line, read, &str);
worgle_begin_block(&worg, &str);
}
}
3.3.3.1.1 Extracting information from #+NAME
Name extraction of the current line is done with a function called parse_name.
static int parse_name(char *line, size_t len, worgle_string *str);
static int parse_name(char *line, size_t len, worgle_string *str)
{
size_t n;
size_t pos;
int mode;
line+=7;
len-=7;
/* *namelen = 0; */
str->size = 0;
str->str = NULL;
if(len <= 0) return 1;
pos = 0;
mode = 0;
for(n = 0; n < len; n++) {
if(mode == 2) break;
switch(mode) {
case 0:
if(line[n] == ' ') {
} else {
str->str = &line[n];
str->size++;
pos++;
mode = 1;
}
break;
case 1:
if(line[n] == 0xa) {
mode = 2;
break;
}
pos++;
str->size++;
break;
default:
break;
}
}
/* *namelen = pos; */
return 1;
}
3.3.3.1.2 Beginning a new block
A new code block is started with the function worgle_begin_block.
void worgle_begin_block(worgle_d *worg, worgle_string *name);
When a new block begins, the current block in Worgle is set to be a value retrieved from the block dictionary.
void worgle_begin_block(worgle_d *worg, worgle_string *name)
{
worg->curblock = worgle_hashmap_get(&worg->dict, name);
}
3.3.3.2 MODE_BEGINCODE
MODE_BEGINCODE,
A parser set to mode MODE_BEGINCODE is only interested in finding the #+BEGIN_SRC line that begins the block. If it doesn't find one, it reports a syntax error. If it does, it goes on to extract a potential new filename to tangle, which then gets appended to the Worgle file list.
if(read >= 11) {
if(!strncmp(line, "#+BEGIN_SRC",11)) {
<<begin_the_code>>
if(parse_begin(line, read, &str) == 2) {
worgle_append_file(&worg, &str);
}
continue;
} else {
fwrite(line, read, 1, stderr);
fprintf(stderr, "line %lu: Expected #+BEGIN_SRC\n", worg.linum);
rc = 1;
break;
}
}
fprintf(stderr, "line %lu: Expected #+BEGIN_SRC\n", worg.linum);
rc = 1;
break;
3.3.3.2.1 Extracting information from #+BEGIN_SRC
The begin source flag in org-mode can have a number of options, but the only one we really care about for this tangler is the ":tangle" option.
static int parse_begin(char *line, size_t len, worgle_string *str);
The state machine begins right after the BEGIN_SRC declaration, which is why the string is offset by 11.
The state machine for this parser is linear, and has 5 modes:
mode 0: Skip whitespace after BEGIN_SRC
mode 1: Find ":tangle" pattern
mode 2: Ignore immediate whitespace after "tangle", and begin getting filename
mode 3: Get filename size by reading up to the next space or line break
mode 4: Don't do anything, wait for line to end.
static int parse_begin(char *line, size_t len, worgle_string *str)
{
size_t n;
int mode;
int rc;
line += 11;
len -= 11;
if(len <= 0) return 0;
mode = 0;
n = 0;
rc = 1;
str->str = NULL;
str->size = 0;
while(n < len) {
switch(mode) {
case 0: /* initial spaces after BEGIN_SRC */
if(line[n] == ' ') {
n++;
} else {
mode = 1;
}
break;
case 1: /* look for :tangle */
if(line[n] == ' ') {
mode = 0;
n++;
} else {
if(line[n] == ':') {
if(!strncmp(line + n + 1, "tangle", 6)) {
n+=7;
mode = 2;
rc = 2;
}
}
n++;
}
break;
case 2: /* save file name, spaces after tangle */
if(line[n] != ' ') {
str->str = &line[n];
str->size++;
mode = 3;
}
n++;
break;
case 3: /* read up to next space or line break */
if(line[n] == ' ' || line[n] == '\n') {
mode = 4;
} else {
str->size++;
}
n++;
break;
case 4: /* countdown til end */
n++;
break;
}
}
return rc;
}
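For comparison, the same extraction can be written without an explicit mode variable. The sketch below is a hypothetical compact variant, not the parse_begin used by Worgle: it scans the whole line rather than starting at offset 11, and it assumes ":tangle" must be followed by at least one space. The name find_tangle_target and its output parameters are made up for this example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical compact variant of parse_begin: locate ":tangle" on a
 * #+BEGIN_SRC line and report the filename that follows it.
 * Returns 1 and sets *name and *name_len when a filename is found,
 * otherwise returns 0. The line is length-delimited and does not need
 * to be zero-terminated. */
static int find_tangle_target(const char *line, size_t len,
                              const char **name, size_t *name_len)
{
    static const char key[] = ":tangle";
    const size_t klen = sizeof(key) - 1;
    size_t n, start, end;

    *name = NULL;
    *name_len = 0;

    for (n = 0; n + klen <= len; n++) {
        if (memcmp(line + n, key, klen) != 0) continue;

        /* Require a space after the keyword (rejects ":tangled"). */
        start = n + klen;
        if (start >= len || line[start] != ' ') continue;

        /* Skip the spaces between ":tangle" and the filename. */
        while (start < len && line[start] == ' ') start++;

        /* The filename runs up to the next space or line break. */
        end = start;
        while (end < len && line[end] != ' ' && line[end] != '\n') end++;

        if (end == start) return 0;
        *name = line + start;
        *name_len = end - start;
        return 1;
    }
    return 0;
}
```

As with parse_begin, the result points into the original line rather than copying it, which fits the way worgle_string references chunks of the loaded buffer.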
3.3.3.2.2 Setting up code for a new read
When a new codeblock has indeed been found, the mode is switched to MODE_CODE, and the block_started boolean flag gets set. In addition, the string used to keep track of the new block is reset.
mode = MODE_CODE;
worg.block_started = 1;
worgle_string_reset(&worg.block);
3.3.3.2.3 Appending a new file
If a new file is found, the filename gets appended to the file list via the function worgle_append_file.
void worgle_append_file(worgle_d *worg, worgle_string *filename);
void worgle_append_file(worgle_d *worg, worgle_string *filename)
{
worgle_filelist_append(&worg->flist, filename, worg->curblock);
}
3.3.3.3 MODE_CODE
MODE_CODE
In MODE_CODE, actual code is parsed inside of the code block. The parser will keep reading chunks of code until one of two things happens: a code reference is found, or the END_SRC command is found.
if(read >= 9) {
if(!strncmp(line, "#+END_SRC", 9)) {
mode = MODE_ORG;
worg.block_started = 0;
worgle_append_string(&worg);
continue;
}
}
if(check_for_reference(line, read, &str)) {
worgle_append_string(&worg);
worgle_append_reference(&worg, &str);
worg.block_started = 1;
worgle_string_reset(&worg.block);
continue;
}
worg.block.size += read;
if(worg.block_started) {
worg.block.str = line;
worg.block_started = 0;
worg.curline = worg.linum;
}
void worgle_append_string(worgle_d *worg);
void worgle_append_string(worgle_d *worg)
{
if(worg->curblock == NULL) return;
worgle_block_append_string(worg->curblock,
&worg->block,
worg->curline,
&worg->filename);
}
void worgle_append_reference(worgle_d *worg, worgle_string *ref);
void worgle_append_reference(worgle_d *worg, worgle_string *ref)
{
if(worg->curblock == NULL) return;
worgle_block_append_reference(worg->curblock,
ref,
worg->linum,
&worg->filename);
}
static int check_for_reference(char *line , size_t size, worgle_string *str);
static int check_for_reference(char *line , size_t size, worgle_string *str)
{
int mode;
size_t n;
mode = 0;
str->size = 0;
str->str = NULL;
for(n = 0; n < size; n++) {
if(mode < 0) break;
switch(mode) {
case 0: /* spaces */
if(line[n] == ' ') continue;
else if(line[n] == '<') mode = 1;
else mode = -1;
break;
case 1: /* second < */
if(line[n] == '<') mode = 2;
else mode = -1;
break;
case 2: /* word setup */
str->str = &line[n];
str->size++;
mode = 3;
break;
case 3: /* the word */
if(line[n] == '>') {
mode = 4;
break;
}
str->size++;
break;
case 4: /* last > */
if(line[n] == '>') mode = 5;
else mode = -1;
break;
}
}
return (mode == 5);
}
3.4 Generation
The last phase of the program is code generation.
A parsed file generates a structure of how the code will look. The generation stage involves iterating through the structure and producing the code.
Due to the hierarchical nature of the data structures, the generation stage is surprisingly elegant with a single expanding entry point.
At the very top, generation consists of writing all the files in the filelist. Each file will then go and write the top-most block associated with that file. A block will then write the segment list it has embedded inside of it. A segment will either write a string literal to disk, or recursively expand a block reference.
if(!rc) if(!worgle_generate(&worg)) rc = 1;
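The recursive expansion described above can be sketched with simplified stand-in types. Everything in this example is hypothetical: the seg and block types use zero-terminated strings and arrays instead of Worgle's length-delimited strings and linked lists, lookup is a linear search rather than a hash map, and output goes to a character buffer instead of a file so the result is easy to inspect.

```c
#include <stddef.h>
#include <string.h>

enum { SEG_TEXT, SEG_REF };

/* Simplified stand-ins for Worgle's segment and block types. */
typedef struct {
    int type;
    const char *str;   /* literal text, or the name of another block */
} seg;

typedef struct {
    const char *name;
    const seg *segs;
    size_t nsegs;
} block;

/* Linear lookup by name. Worgle uses a hash map for this step. */
static const block *find_block(const block *blocks, size_t nblocks,
                               const char *name)
{
    size_t n;
    for (n = 0; n < nblocks; n++) {
        if (strcmp(blocks[n].name, name) == 0) return &blocks[n];
    }
    return NULL;
}

/* Append the expansion of block b to out, which has capacity cap and
 * is kept zero-terminated. Returns 1 on success, or 0 on a missing
 * reference or when out is full. */
static int expand(const block *b, const block *blocks, size_t nblocks,
                  char *out, size_t cap)
{
    size_t n;
    for (n = 0; n < b->nsegs; n++) {
        const seg *s = &b->segs[n];
        if (s->type == SEG_TEXT) {
            size_t used = strlen(out);
            size_t len = strlen(s->str);
            if (used + len + 1 > cap) return 0;
            memcpy(out + used, s->str, len + 1);
        } else {
            const block *ref = find_block(blocks, nblocks, s->str);
            if (ref == NULL) return 0;
            if (!expand(ref, blocks, nblocks, out, cap)) return 0;
        }
    }
    return 1;
}
```

The sketch has no cycle detection, so a block that references itself, directly or indirectly, would recurse without end.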
If the use_warnings flag is turned on, Worgle will scan the dictionary after generation and flag warnings about any unused blocks.
if(use_warnings) if(worgle_warn_unused(&worg)) rc = 1;
3.5 Cleanup
At the end of the program, all allocated memory is freed via worgle_free.
worgle_free(&worg);
return rc;
int worgle_generate(worgle_d *worg);
int worgle_generate(worgle_d *worg)
{
return worgle_filelist_write(&worg->flist, &worg->dict);
}
4 Core Data Structures
The Worgle/Orgle program is very much a data-structure driven program. Understanding the hierarchy of data here will provide a clear picture for how the tangling works.
<<worgle_string>>
<<worgle_segment>>
<<worgle_block>>
<<worgle_blocklist>>
<<worgle_hashmap>>
<<worgle_file>>
<<worgle_filelist>>
<<worgle_struct>>
4.1 Top Level Struct
All Worgle operations are contained in a top-level struct called worgle_d. For the most part, this struct aims to be dynamically populated.
typedef struct {
<<worgle_struct_contents>>
} worgle_d;
4.1.1 Worgle Initialization
Worgle data is initialized using the function worgle_init.
void worgle_init(worgle_d *worg);
void worgle_init(worgle_d *worg)
{
<<worgle_init>>
}
4.1.2 Worgle Deallocation
When worgle is done, the program deallocates memory using the function worgle_free.
void worgle_free(worgle_d *worg);
void worgle_free(worgle_d *worg)
{
<<worgle_free>>
}
4.1.3 Worgle Data
4.1.3.1 Current Block Name
The name of the current block being parsed is stored in a variable called block.
worgle_string block; /* TODO: rename */
It is initialized to be an empty string.
worgle_string_init(&worg->block);
4.1.3.2 Current Line
The starting line number of the current block is stored in a variable called curline.
size_t curline;
The current line is initialized to -1 to mark that it has not been set yet. Because curline is a size_t, this value wraps around to the largest value the type can hold.
worg->curline = -1;
4.1.3.3 Block Started Flag
The block started flag is used by the parser to check whether or not a code block was started on the last iteration.
int block_started;
It is set to be FALSE (0).
worg->block_started = 0;
4.1.3.4 Dictionary
All code blocks are stored in a dictionary, also referred to here as a hash map.
worgle_hashmap dict;
The dictionary is initialized using the function worgle_hashmap_init.
worgle_hashmap_init(&worg->dict);
When freeing time comes around, the hashmap will free itself using the function worgle_hashmap_free.
worgle_hashmap_free(&worg->dict);
4.1.3.5 File List
All files to be written to are stored in a local file list called flist.
worgle_filelist flist;
Initialization.
worgle_filelist_init(&worg->flist);
Destruction.
worgle_filelist_free(&worg->flist);
4.1.3.6 Text Buffer
The text file parsed is loaded into memory and stored into a buffer called buf.
char *buf;
Loading happens after initialization, so the buffer is set to be NULL for now.
worg->buf = NULL;
If the buffer is non-null, the memory will be freed.
if(worg->buf != NULL) free(worg->buf);
4.1.3.7 Current Block
A pointer to the currently populated code block is stored in a variable called curblock.
worgle_block *curblock;
There is no block on startup, so set it to be NULL.
worg->curblock = NULL;
4.1.3.8 Line Number
The currently parsed line number is stored in a variable called linum.
size_t linum;
The line number is incremented before each line is read, so the starting value is 0 and the first line is line 1. Do not be tempted to set this to be -1, because it won't work.
worg->linum = 0;
4.1.3.9 Filename
The filename is stored inside of a worgle string called filename.
worgle_string filename;
This value does not get set on init, but it is zeroed out and initialized.
worgle_string_init(&worg->filename);
4.2 String
A string is a wrapper around a raw char pointer and a size. This is used as the base string literal.
typedef struct {
char *str;
size_t size;
} worgle_string;
4.2.1 Reset or initialize a string
Strings in worgle are reset with the function worgle_string_reset.
void worgle_string_reset(worgle_string *str);
void worgle_string_reset(worgle_string *str)
{
str->str = NULL;
str->size = 0;
}
A string being initialized is identical to a string being reset. The function worgle_string_init is just a wrapper around worgle_string_reset.
void worgle_string_init(worgle_string *str);
void worgle_string_init(worgle_string *str)
{
worgle_string_reset(str);
}
4.2.2 Writing a String
A string is written to a particular filehandle with the function worgle_string_write. Worgle strings are not zero-terminated and can't be passed directly to functions like printf that expect zero-terminated strings.
int worgle_string_write(FILE *fp, worgle_string *str);
This function is a wrapper around a call to fwrite.
int worgle_string_write(FILE *fp, worgle_string *str)
{
return fwrite(str->str, 1, str->size, fp);
}
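When a length-delimited string does need to appear inside a formatted message, the standard precision form %.*s is one option, since it reads at most the given number of bytes and does not require a terminator. The sketch below restates the struct so it compiles alone. The helper format_name is hypothetical, and the example assumes the string is shorter than INT_MAX bytes because the precision argument must be an int.

```c
#include <stdio.h>
#include <stddef.h>

typedef struct {
    char *str;
    size_t size;
} worgle_string;

/* Hypothetical helper: format a message containing a worgle_string.
 * The precision argument of %.*s must be an int, hence the cast.
 * Returns the value of snprintf: the length of the full message. */
static int format_name(char *out, size_t cap, const worgle_string *s)
{
    return snprintf(out, cap, "block '%.*s'", (int)s->size, s->str);
}
```

Worgle itself avoids this by splitting messages into an fprintf, a worgle_string_write, and another fprintf, which also works on compilers without snprintf.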
4.3 Segment
A segment turns a string into a linked list component that has a type. A segment type flag can either be a text chunk or a reference.
enum {
<<worgle_segment_types>>
};
typedef struct worgle_segment {
int type;
worgle_string str;
<<worgle_segment_line_control>>
struct worgle_segment *nxt;
} worgle_segment;
Segments also keep track of where they are in the original org file. This information can be used to generate line control preprocessor commands for C/C++.
size_t linum;
worgle_string *filename;
4.3.1 Text Chunk Type
A text chunk is a literal string of text.
When a text chunk segment is processed, it gets written to file directly.
SEGTYPE_TEXT,
4.3.2 Reference Type
A reference contains a string reference to another block.
When a reference segment gets processed, it looks up the reference and processes all the segments in that code block.
SEGTYPE_REFERENCE
4.3.3 Initializing a Segment
A segment is initialized with the function worgle_segment_init.
void worgle_segment_init(worgle_segment *s,
int type,
worgle_string *str,
worgle_string *filename,
size_t linum);
void worgle_segment_init(worgle_segment *s,
int type,
worgle_string *str,
worgle_string *filename,
size_t linum)
{
s->type = type;
s->str = *str;
s->filename = filename;
s->linum = linum;
s->nxt = NULL;
}
4.3.4 Writing a Segment
A segment is written to a file handle using the function worgle_segment_write.
In addition to taking in a filehandle and segment, a hashmap is also passed in, in the event that the segment is a reference.
On success, the function returns TRUE (1). On failure, FALSE (0).
int worgle_segment_write(worgle_segment *s, worgle_hashmap *h, FILE *fp);
Different behaviors happen depending on the segment type.
If the segment is a chunk of text (SEGTYPE_TEXT), then the string is written. If the use_debug global variable is enabled, then C preprocessor #line directives are written indicating the position in the original file. This only needs to happen for text blocks and not references.
If the segment is a reference (SEGTYPE_REFERENCE), the function attempts to look up a block and write it to disk. If it cannot find the reference, a warning is printed to stderr. Unless warnings are set to error mode, the function still returns TRUE. If warning errors are turned on, it returns FALSE.
int worgle_segment_write(worgle_segment *s, worgle_hashmap *h, FILE *fp)
{
worgle_block *b;
if(s->type == SEGTYPE_TEXT) {
if(use_debug) {
fprintf(fp, "#line %lu \"", s->linum);
worgle_string_write(fp, s->filename);
fprintf(fp, "\"\n");
}
worgle_string_write(fp, &s->str);
} else {
if(!worgle_hashmap_find(h, &s->str, &b)) {
fprintf(stderr, "Warning: could not find reference segment '");
worgle_string_write(stderr, &s->str);
fprintf(stderr, "'\n");
if(use_warnings == 2) {
return 0;
} else {
return 1;
}
}
return worgle_block_write(b, h, fp);
}
return 1;
}
4.4 Code Block
A code block is a top-level unit that stores some amount of code. It is made up of a list of segments. Every code block has a unique name.
typedef struct worgle_block {
int nsegs;
worgle_segment *head;
worgle_segment *tail;
worgle_string name;
int am_i_used;
struct worgle_block *nxt;
} worgle_block;
4.4.1 Initializing a code block
A worgle code block is initialized using the function worgle_block_init.
void worgle_block_init(worgle_block *b);
The initialization will zero out all the variables related to the segment linked list, as well as initialize the string holding the name of the block.
void worgle_block_init(worgle_block *b)
{
b->nsegs = 0;
b->head = NULL;
b->tail = NULL;
b->nxt = NULL;
b->am_i_used = 0;
worgle_string_init(&b->name);
}
4.4.2 Freeing a code block
A code block is freed using the function worgle_block_free.
void worgle_block_free(worgle_block *lst);
This function iterates through the segment linked list contained inside the block, and frees each one. Since there is nothing to free below a segment, the standard free function is called directly.
void worgle_block_free(worgle_block *lst)
{
worgle_segment *s;
worgle_segment *nxt;
int n;
s = lst->head;
for(n = 0; n < lst->nsegs; n++) {
nxt = s->nxt;
free(s);
s = nxt;
}
}
4.4.3 Appending a segment to a code block
A generic segment is appended to a code block with the function worgle_block_append_segment.
The block b, the segment's string str, and the segment type type are mandatory parameters which describe the segment. The location in the file is also required, so the line number linum and name of file filename are provided as well.
This function is called inside of a type-specific append function instead of being called directly.
void worgle_block_append_segment(worgle_block *b,
worgle_string *str,
int type,
size_t linum,
worgle_string *filename);
It is worthwhile to note that it is in this function that the segment gets allocated.
void worgle_block_append_segment(worgle_block *b,
worgle_string *str,
int type,
size_t linum,
worgle_string *filename)
{
worgle_segment *s;
s = malloc(sizeof(worgle_segment));
if(b->nsegs == 0) {
b->head = s;
b->tail = s;
}
worgle_segment_init(s, type, str, filename, linum);
b->tail->nxt = s;
b->tail = s;
b->nsegs++;
}
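The same head/tail append pattern is used for segments, blocks, and files. A stripped-down sketch of it is shown below, using an int payload as a stand-in for the real data. The names node, list, list_append, and list_free are hypothetical. Unlike the function above, this variant checks the result of malloc and leaves the last node's nxt pointer set to NULL.

```c
#include <stdlib.h>

/* Stand-in types: an int payload replaces the segment contents. */
typedef struct node {
    int value;
    struct node *nxt;
} node;

typedef struct {
    node *head;
    node *tail;
    int count;
} list;

/* Append a value at the tail in constant time.
 * Returns 1 on success, 0 if allocation fails. */
static int list_append(list *l, int value)
{
    node *n = malloc(sizeof(node));
    if (n == NULL) return 0;
    n->value = value;
    n->nxt = NULL;
    if (l->count == 0) l->head = n;   /* first node becomes the head */
    else l->tail->nxt = n;            /* otherwise link after the tail */
    l->tail = n;
    l->count++;
    return 1;
}

/* Free every node, reading the next pointer before each free. */
static void list_free(list *l)
{
    node *n = l->head;
    while (n != NULL) {
        node *nxt = n->nxt;
        free(n);
        n = nxt;
    }
    l->head = NULL;
    l->tail = NULL;
    l->count = 0;
}
```

Keeping a tail pointer is what lets segments be appended in source order without walking the list each time.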
4.4.3.1 Appending a string segment
A string segment is appended to a code block using the function worgle_block_append_string.
void worgle_block_append_string(worgle_block *b,
worgle_string *str,
size_t linum,
worgle_string *filename);
void worgle_block_append_string(worgle_block *b,
worgle_string *str,
size_t linum,
worgle_string *filename)
{
worgle_block_append_segment(b, str, SEGTYPE_TEXT, linum, filename);
}
4.4.3.2 Appending a reference segment
A reference segment is appended to a code block using the function worgle_block_append_reference.
void worgle_block_append_reference(worgle_block *b,
worgle_string *str,
size_t linum,
worgle_string *filename);
void worgle_block_append_reference(worgle_block *b,
worgle_string *str,
size_t linum,
worgle_string *filename)
{
worgle_block_append_segment(b, str, SEGTYPE_REFERENCE, linum, filename);
}
4.4.4 Appending a code block to a code block
In both CWEB and Org-tangle, existing code blocks can be appended to in different sections. Because of how this program works, we get this functionality for free!
4.4.5 Writing a code block to filehandle
Writing a code block to a filehandle can be done using the function worgle_block_write. In addition to the file handle fp, a code block requires a hashmap, which is needed by the lower-level function worgle_segment_write for expanding code references.
This function returns a boolean TRUE (1) on success or FALSE (0) on failure.
int worgle_block_write(worgle_block *b, worgle_hashmap *h, FILE *fp);
A code block iterates over its segment list, writing each segment to disk. A block will also be marked as being used, which is useful for supplying warning information later.
int worgle_block_write(worgle_block *b, worgle_hashmap *h, FILE *fp)
{
worgle_segment *s;
int n;
s = b->head;
b->am_i_used = 1;
for(n = 0; n < b->nsegs; n++) {
if(!worgle_segment_write(s, h, fp)) return 0;
s = s->nxt;
}
return 1;
}
4.5 Code Block List
A code block list is a linked list of blocks, which is used inside of a hash map.
typedef struct {
int nblocks;
worgle_block *head;
worgle_block *tail;
} worgle_blocklist;
4.5.1 Block List Initialization
A block list is initialized using the function worgle_blocklist_init.
void worgle_blocklist_init(worgle_blocklist *lst);
void worgle_blocklist_init(worgle_blocklist *lst)
{
lst->head = NULL;
lst->tail = NULL;
lst->nblocks = 0;
}
4.5.2 Freeing a Block List
Blocks allocated by the block list are freed using the function worgle_blocklist_free.
void worgle_blocklist_free(worgle_blocklist *lst);
void worgle_blocklist_free(worgle_blocklist *lst)
{
worgle_block *b;
worgle_block *nxt;
int n;
b = lst->head;
for(n = 0; n < lst->nblocks; n++) {
nxt = b->nxt;
worgle_block_free(b);
free(b);
b = nxt;
}
}
4.5.3 Appending a Block
An allocated block is appended to a block list using the function worgle_blocklist_append.
void worgle_blocklist_append(worgle_blocklist *lst, worgle_block *b);
void worgle_blocklist_append(worgle_blocklist *lst, worgle_block *b)
{
if(lst->nblocks == 0) {
lst->head = b;
lst->tail = b;
}
lst->tail->nxt = b;
lst->tail = b;
lst->nblocks++;
}
4.6 Hash Map
A hash map is a key-value data structure used as a dictionary for storing references to code blocks.
#define HASH_SIZE 256
typedef struct {
worgle_blocklist blk[HASH_SIZE];
int nwords;
} worgle_hashmap;
4.6.1 Hash map Initialization
A hash map is initialized using the function worgle_hashmap_init.
void worgle_hashmap_init(worgle_hashmap *h);
A hashmap is composed of an array of block lists which must be initialized.
void worgle_hashmap_init(worgle_hashmap *h)
{
int n;
h->nwords = 0;
for(n = 0; n < HASH_SIZE; n++) {
worgle_blocklist_init(&h->blk[n]);
}
}
4.6.2 Freeing a Hash Map
Information allocated inside the hash map is freed using the function worgle_hashmap_free.
void worgle_hashmap_free(worgle_hashmap *h);
To free a hash map is to free each block list in the array.
void worgle_hashmap_free(worgle_hashmap *h)
{
int n;
for(n = 0; n < HASH_SIZE; n++) {
worgle_blocklist_free(&h->blk[n]);
}
}
4.6.3 Looking up an entry
A hashmap lookup can be done with the function worgle_hashmap_find. This will attempt to look for a value with the key name, and save it in the block pointer b. If nothing is found, the function returns FALSE (0). On success, TRUE (1).
int worgle_hashmap_find(worgle_hashmap *h, worgle_string *name, worgle_block **b);
<<hashmap_hasher>>
int worgle_hashmap_find(worgle_hashmap *h, worgle_string *name, worgle_block **b)
{
int pos;
worgle_blocklist *lst;
int n;
worgle_block *blk;
pos = hash(name->str, name->size);
lst = &h->blk[pos];
blk = lst->head;
for(n = 0; n < lst->nblocks; n++) {
if(name->size == blk->name.size) {
if(!strncmp(name->str, blk->name.str, name->size)) {
*b = blk;
return 1;
}
}
blk = blk->nxt;
}
return 0;
}
Like any hashmap, a hashing algorithm is used to compute which list to place the entry in. This is one I've used on a number of projects now.
static int hash(const char *str, size_t size)
{
unsigned int h = 5381;
size_t i = 0;
for(i = 0; i < size; i++) {
h = ((h << 5) + h) ^ str[i];
h %= 0x7FFFFFFF;
}
return h % HASH_SIZE;
}
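To make the bucket computation concrete, the sketch below restates the hash function so it compiles alone and adds a hypothetical convenience wrapper, bucket_for, that takes a zero-terminated name. The wrapper is only for illustration; Worgle always passes a pointer and a size.

```c
#include <stddef.h>
#include <string.h>

#define HASH_SIZE 256

/* The hash function from the text, restated so the example compiles
 * on its own. */
static int hash(const char *str, size_t size)
{
    unsigned int h = 5381;
    size_t i;
    for (i = 0; i < size; i++) {
        h = ((h << 5) + h) ^ str[i];
        h %= 0x7FFFFFFF;
    }
    return h % HASH_SIZE;
}

/* Hypothetical wrapper: bucket index for a zero-terminated name. */
static int bucket_for(const char *name)
{
    return hash(name, strlen(name));
}
```

Working through two small cases by hand: the empty name never enters the loop, so its bucket is 5381 mod 256, which is 5. For the one-character name a, the loop computes (5381 * 33) XOR 97 = 177604, and 177604 mod 256 is 196. Names that land in the same bucket simply share one block list, which worgle_hashmap_find then searches linearly.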
4.6.4 Getting an entry
To "get" an entry means to always return a block, whether or not it already exists: return the entry that exists, or make a new one. This can be done with the function worgle_hashmap_get.
worgle_block * worgle_hashmap_get(worgle_hashmap *h, worgle_string *name);
worgle_block * worgle_hashmap_get(worgle_hashmap *h, worgle_string *name)
{
worgle_block *b;
worgle_blocklist *lst;
int pos;
if(worgle_hashmap_find(h, name, &b)) return b;
pos = hash(name->str, name->size);
b = NULL;
b = malloc(sizeof(worgle_block));
worgle_block_init(b);
b->name = *name;
lst = &h->blk[pos];
worgle_blocklist_append(lst, b);
return b;
}
4.7 File
A worgle file is an abstraction for a single file worgle will write to. Every file has a filename, and a top-level code block. A worgle file does not hold a filehandle. Files will only be created at the generation stage.
typedef struct worgle_file {
worgle_string filename;
worgle_block *top;
struct worgle_file *nxt;
} worgle_file;
4.7.1 Writing A File to a filehandle
A file is written to a filehandle using the function worgle_file_write. A hashmap is also required because it contains all the named code blocks needed for any code expansion.
int worgle_file_write(worgle_file *f, worgle_hashmap *h);
A filehandle is opened, the top-most code block is written using worgle_block_write, and then the file is closed.
Because worgle strings are not zero terminated, they must be copied to a temporary string buffer with a null terminator. Any filename greater than 127 characters will be truncated.
int worgle_file_write(worgle_file *f, worgle_hashmap *h)
{
FILE *fp;
char tmp[128];
size_t n;
size_t size;
int rc;
if(f->filename.size > 127) size = 127;
else size = f->filename.size;
for(n = 0; n < size; n++) tmp[n] = f->filename.str[n];
tmp[size] = 0;
fp = fopen(tmp, "w");
if(fp == NULL) {
fprintf(stderr, "Could not open file %s for writing\n", tmp);
return 0;
}
rc = worgle_block_write(f->top, h, fp);
fclose(fp);
return rc;
}
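The copy-and-terminate step can be isolated into a small helper, which makes the truncation boundary easier to reason about. This is a sketch only: copy_terminated is a hypothetical name, not a function in Worgle. It reserves one byte of the destination for the terminator, so a buffer of 128 bytes holds at most 127 characters.

```c
#include <stddef.h>

/* Hypothetical helper: copy a length-delimited string into a fixed
 * buffer and zero-terminate it, truncating when it does not fit.
 * Returns the number of characters copied (at most cap - 1). */
static size_t copy_terminated(char *dst, size_t cap,
                              const char *src, size_t size)
{
    size_t n;
    if (cap == 0) return 0;
    if (size > cap - 1) size = cap - 1;
    for (n = 0; n < size; n++) dst[n] = src[n];
    dst[size] = '\0';
    return size;
}
```

A caller could compare the return value against the original size to detect truncation and warn about it instead of silently writing to a shortened filename.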
4.8 The File List
A file list is a linked list of worgle files.
typedef struct {
worgle_file *head;
worgle_file *tail;
int nfiles;
} worgle_filelist;
4.8.1 Initializing a file list
A file list is zeroed out and initialized using the function worgle_filelist_init.
void worgle_filelist_init(worgle_filelist *flist);
void worgle_filelist_init(worgle_filelist *flist)
{
flist->head = NULL;
flist->tail = NULL;
flist->nfiles = 0;
}
4.8.2 Freeing a file list
A filelist is freed using the function worgle_filelist_free.
void worgle_filelist_free(worgle_filelist *flist);
void worgle_filelist_free(worgle_filelist *flist)
{
worgle_file *f;
worgle_file *nxt;
int n;
f = flist->head;
for(n = 0; n < flist->nfiles; n++) {
nxt = f->nxt;
free(f);
f = nxt;
}
}
4.8.3 Appending a file to a file list
A file is appended to the file list using the function worgle_filelist_append. The name, as well as the top-level code block, are required here.
void worgle_filelist_append(worgle_filelist *flist,
worgle_string *name,
worgle_block *top);
void worgle_filelist_append(worgle_filelist *flist,
worgle_string *name,
worgle_block *top)
{
worgle_file *f;
f = malloc(sizeof(worgle_file));
f->filename = *name;
f->top = top;
if(flist->nfiles == 0) {
flist->head = f;
flist->tail = f;
}
flist->tail->nxt = f;
flist->tail = f;
flist->nfiles++;
}
4.8.4 Writing a filelist to disk
A file list is written to disk using the function worgle_filelist_write.
A hashmap containing all named code blocks is all that is required.
int worgle_filelist_write(worgle_filelist *flist, worgle_hashmap *h);
int worgle_filelist_write(worgle_filelist *flist, worgle_hashmap *h)
{
worgle_file *f;
int n;
f = flist->head;
for(n = 0; n < flist->nfiles; n++) {
if(!worgle_file_write(f, h)) return 0;
f = f->nxt;
}
return 1;
}
5 Command Line Arguments
This section outlines command line arguments in Worgle.
5.1 Parsing command line flags
Command line argument parsing is done using the third-party library parg, included in this source distribution.
struct parg_state ps;
int c;
parg_init(&ps);
while((c = parg_getopt(&ps, argc, argv, "gW:")) != -1) {
switch(c) {
case 1:
filename = (char *)ps.optarg;
break;
case 'g':
<<turn_on_debug_macros>>
break;
case 'W':
<<turn_on_warnings>>
break;
default:
fprintf(stderr, "Unknown option -%c\n", c);
return 1;
}
}
5.2 Turning on debug macros (-g)
Worgle has the ability to generate debug macros when generating C files. This will turn on a global boolean flag called use_debug.
use_debug = 1;
By default, use_debug is set to be false in order to allow other non-C languages to be used.
static int use_debug = 0;
5.3 Turning on Warnings (-W)
Worgle can print out warnings about things like unused sections of code. By default, this is turned off.
static int use_warnings = 0;
if(!strncmp(ps.optarg, "soft", 4)) {
use_warnings = 1;
} else if(!strncmp(ps.optarg, "error", 5)) {
use_warnings = 2;
} else {
fprintf(stderr, "Unidentified warning mode '%s'\n", ps.optarg);
return 1;
}
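The strncmp calls above compare only a prefix, so an argument such as softer is accepted as soft. If exact matching is preferred, the mapping can be sketched as a small function. The name parse_warning_mode and its return convention are assumptions made for this example, not part of Worgle.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: map a -W argument to a warning level.
 * Returns 1 for "soft", 2 for "error", and -1 for anything else.
 * strcmp requires an exact match, so "softer" is rejected. */
static int parse_warning_mode(const char *arg)
{
    if (arg == NULL) return -1;
    if (strcmp(arg, "soft") == 0) return 1;
    if (strcmp(arg, "error") == 0) return 2;
    return -1;
}
```

The two positive return values line up with the values that use_warnings takes in the code above, so a caller could assign the result directly after checking for -1.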
5.3.1 Checking for unused blocks
One thing that warnings can do is check for unused blocks. This is done after the files are generated with the function worgle_warn_unused.
int worgle_warn_unused(worgle_d *worg);
int worgle_warn_unused(worgle_d *worg)
{
worgle_hashmap *dict;
worgle_block *blk;
worgle_blocklist *lst;
int n;
int b;
int rc;
dict = &worg->dict;
rc = 0;
for(n = 0; n < HASH_SIZE; n++) {
lst = &dict->blk[n];
blk = lst->head;
for(b = 0; b < lst->nblocks; b++) {
if(blk->am_i_used == 0) {
fprintf(stderr, "Warning: block '");
worgle_string_write(stderr, &blk->name);
fprintf(stderr, "' unused.\n");
if(use_warnings == 2) rc = 1;
}
blk = blk->nxt;
}
}
return rc;
}