Worgle

1 What is Worgle?

Hello, fellow human. I'm glad you could stop by.

This is a document written in Org markup, talking about a thing I'm building called Worgle. The name Worgle is derived from what it is: a Worg Tangler. Worg is the name of this project. It too gets its name from what it is: a WEB + Org project. Org is the very decent markup language from org-mode. WEB is the name of the first literate programming tool, created by Donald Knuth. In literate programming, one writes prose and code together in a markup language, which can then be parsed two ways. The weaver parses the markup to produce a human-readable document, usually a (La)TeX or HTML file. The tangler parses the markup and produces code that a computer can compile or run.

In other words, Worgle is a literate programming tangler used to convert org-like markup into (primarily) code.

Worgle itself is a literate program, so what tangles the worgle code? Orgle does! Orgle is a program written in C without literate programming. It is designed to be just enough of a program to bootstrap Worgle. Worgle will then be used as the tangler for the rest of Worg.

Worgle will initially start out as a literate program of Orgle. In fact, this document will initially start out as an outline for the Orgle program. The Orgle program will be considered done when it is able to produce a similar program by parsing this Worgle document. After that is done, more work will be put into Worgle to make it more suitable for managing larger projects written in C.

Following me so far? No? Yes? Great, let's get started.

2 Top-level files

Like Orgle, Worgle is self contained inside of a single C file. For the time being, this is suitable enough. The current scope of Worgle is to be a self-contained standalone CLI application.

<<worgle-top>>=

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "parg.h"
<<global_variables>>
<<enums>>
<<structs>>
<<static_function_declarations>>
<<function_declarations>>
<<functions>>
int main(int argc, char *argv[])
{
    <<local_variables>>
    <<initialization>>
    <<loading>>
    <<parsing>>
    <<generation>>
    <<cleanup>>
}

3 An Outline of What Worgle does

This section gives a broad overview of how Orgle (and Worgle) work. Orgle is a bootstrap program written in C, used to generate the C code for Worgle (this program here). At the highest level, the two programs share the same basic structure: initialization, loading, parsing, generation, and cleanup.

3.1 Initialization

3.1.1 Initialize worgle data

Worgle is initialized before stuff is loaded.

<<local_variables>>=

worgle_d worg;

<<initialization>>=

worgle_init(&worg);

3.1.2 Get and set filename

The filename is currently acquired from the command line, so the program must make sure that there are enough arguments. If there aren't, it prints a usage message and returns an error.

<<local_variables>>=

char *filename;

<<initialization>>=

filename = NULL;
if(argc < 2) {
    fprintf(stderr, "Usage: %s filename.org\n", argv[0]);
    return 1;
}
<<parse_cli_args>>
<<check_filename>>

Next, check the filename. If the command line did not set it, return an error.

<<check_filename>>=

if(filename == NULL) {
    fprintf(stderr, "No filename specified\n");
    return 1;
}

The filename is then set inside of the Worgle struct.

<<initialization>>=

worg.filename.str = filename;
worg.filename.size = strlen(filename);

3.1.3 Initialize return codes

3.1.3.1 Main Return Code

The main return code determines the overall state of the program.

<<local_variables>>=

int rc;

By default, it is set to be okay, which is 0 on POSIX systems.

<<initialization>>=

rc = 0;

3.1.3.2 Line Status Code

The worgle_getline function used by the parser returns a status code, which tells the program when it has reached the end of the file.

<<local_variables>>=

int status;

It is initialized to FALSE (0). The first call to worgle_getline overwrites it.

<<initialization>>=

status = 0;

3.1.3.3 Mode

The overall parser mode state is set by the local variable mode.

<<local_variables>>=

int mode;

It is set to be the initial mode of MODE_ORG.

<<initialization>>=

mode = MODE_ORG;

3.2 Load file into memory

The first thing the program will do is load the file.

While most parsers tend to parse things on a line-by-line basis via a file stream, this parser will load the entire file into memory. This is done due to the textual nature of the program: it is much easier to allocate everything in one big block and reference chunks of it than to allocate smaller chunks as you go.

3.2.1 Open file

The file is opened with a local file handle fp.

<<local_variables>>=

FILE *fp;

<<loading>>=

fp = fopen(filename, "r");
if(fp == NULL) {
    fprintf(stderr, "Could not open file %s\n", filename);
    return 1;
}

3.2.2 Get file size

The size is acquired by going to the end of the file and getting the current file position.

<<local_variables>>=

size_t size;

<<loading>>=

fseek(fp, 0, SEEK_END);
size = ftell(fp);

3.2.3 Allocate memory, read, and close

Memory is allocated in a local buffer variable via calloc. The buffer is then stored inside of the worg struct.

<<local_variables>>=

char *buf;

<<loading>>=

buf = calloc(1, size);
worg.buf = buf;

The file is rewound back to the beginning and then read into the buffer. The file is no longer needed at this point, so it is closed.

<<loading>>=

fseek(fp, 0, SEEK_SET);
fread(buf, size, 1, fp);
fclose(fp);
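The loading steps above do not check for errors after fopen. As an illustration only, here they are collected into one helper that checks each step. The name load_file is hypothetical and not part of Worgle; it opens the file in "rb" mode so that the ftell byte count matches what fread returns on platforms that translate line endings.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper (not part of Worgle): read a whole file into a
 * heap buffer. On success, returns the buffer and stores its length in
 * *size; on failure, returns NULL. The caller frees the buffer. Like
 * Worgle's buffer, it is NOT NUL-terminated. */
char *load_file(const char *filename, size_t *size)
{
    FILE *fp;
    long end;
    char *buf;

    *size = 0;
    fp = fopen(filename, "rb");
    if(fp == NULL) return NULL;

    /* find the size by seeking to the end */
    if(fseek(fp, 0, SEEK_END) != 0 || (end = ftell(fp)) < 0) {
        fclose(fp);
        return NULL;
    }

    /* calloc(1, 0) may legally return NULL, so ask for at least 1 byte */
    buf = calloc(1, end > 0 ? (size_t)end : 1);
    if(buf == NULL) {
        fclose(fp);
        return NULL;
    }

    /* go back to the start and read everything in one call */
    rewind(fp);
    if(end > 0 && fread(buf, (size_t)end, 1, fp) != 1) {
        free(buf);
        fclose(fp);
        return NULL;
    }
    fclose(fp);
    *size = (size_t)end;
    return buf;
}
```

The main function could then fail early with a single NULL check, instead of handling each step separately.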

3.3 Parsing

The second phase of the program is the parsing stage.

The parsing stage will parse files line-by-line. The program will find a line by skimming through the block up to a line break character, then pass that off to be parsed. Line by line, the parser will read the program and produce a structure of the tangled code in memory.

<<parsing>>=

while(1) {
    <<getline>>
    if(mode == MODE_ORG) {
        <<parse_mode_org>>
    } else if(mode == MODE_CODE) {
        <<parse_mode_code>>
    } else if(mode == MODE_BEGINCODE) {
        <<parse_mode_begincode>>
    }
}

3.3.1 Parser Local Variables

The parsing stage uses a local worgle_string called str to hold names extracted from a line: block names, tangle filenames, and references. Not sure where else to put this.

<<local_variables>>=

worgle_string str;

<<initialization>>=

worgle_string_init(&str);

line points to the start of the current line inside the loaded buffer. It is set by worgle_getline.

<<local_variables>>=

char *line;

<<initialization>>=

line = NULL;

pos refers to the current buffer position.

<<local_variables>>=

size_t pos;

<<initialization>>=

pos = 0;

read stores the size of the current line in bytes, including the line break. It is set by worgle_getline.

<<local_variables>>=

size_t read;

3.3.2 Reading a line at a time

Despite being loaded into memory, the program still reads in code one line at a time. The parsing relies on new line feeds to denote the beginnings and endings of sections and code references.

Before reading the line, the line number inside worgle is incremented.

A special readline function has been written based on getline that reads lines of text from an allocated block of text. This function is called worgle_getline.

After the line has been read, the program checks the return code status. If all the lines of text have been read, the program breaks out of the while loop.

<<getline>>=

worg.linum++;
status = worgle_getline(buf, &line, &pos, &read, size);
if(!status) break;

<<function_declarations>>=

static int worgle_getline(char *fullbuf,
                  char **line,
                  size_t *pos,
                  size_t *line_size,
                  size_t buf_size);

fullbuf refers to the full text buffer.

line is a pointer where the current line will be stored.

pos is the current buffer position.

line_size is a variable written to that returns the size of the line. This includes the line break character, if the line has one; the last line of a file may not.

buf_size is the size of the whole buffer.

<<functions>>=

static int worgle_getline(char *fullbuf,
                  char **line,
                  size_t *pos,
                  size_t *line_size,
                  size_t buf_size)
{
    size_t p;
    size_t s;
    *line_size = 0;
    p = *pos;
    *line = &fullbuf[p];
    s = 0;
    while(1) {
        s++;
        if(p >= buf_size) {
            /* end of buffer: return a final line that has no
             * trailing line break, if one is left */
            if(s == 1) return 0;
            *pos = p;
            *line_size = s - 1;
            return 1;
        }
        if(fullbuf[p] == '\n') {
            *pos = p + 1;
            *line_size = s;
            return 1;
        }
        p++;
    }
}
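To show the scanning idea in isolation, here is a standalone sketch of the same loop. The names next_line and count_lines are hypothetical, and the buffer is not NUL-terminated, as in Worgle. This version also returns a final line that has no trailing line break.

```c
#include <stddef.h>

/* Hypothetical standalone line scanner. Finds the line starting at
 * *pos in buf (buf_size bytes, not NUL-terminated). On success, *line
 * points at the line, *line_size is its length including the '\n'
 * (if any), *pos moves past it, and 1 is returned. Returns 0 once the
 * buffer is exhausted. */
int next_line(const char *buf, size_t buf_size,
              size_t *pos, const char **line, size_t *line_size)
{
    size_t p = *pos;

    if(p >= buf_size) return 0;            /* nothing left */
    *line = &buf[p];
    while(p < buf_size && buf[p] != '\n') p++;
    if(p < buf_size) p++;                  /* include the line break */
    *line_size = p - *pos;
    *pos = p;
    return 1;
}

/* Usage: count the lines in a buffer. */
size_t count_lines(const char *buf, size_t buf_size)
{
    size_t pos = 0, len, n = 0;
    const char *line;
    while(next_line(buf, buf_size, &pos, &line, &len)) n++;
    return n;
}
```

For the 10-byte buffer "ab\nc\n\nlast", the scanner returns lines of size 3, 2, 1 (an empty line is just its '\n'), and 4.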

3.3.3 Parsing Modes

The parser is implemented as a relatively simple state machine, whose behavior shifts between parsing org-mode markup (MODE_ORG), and code blocks (MODE_BEGINCODE and MODE_CODE). The state machine makes a distinction between the start of a new code block (MODE_BEGINCODE), which provides information like the name of the code block and optionally the name of the file to tangle to, and the code block itself (MODE_CODE).
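The mode transitions described above can be modeled as a small pure function. This is a hypothetical toy: it only tracks modes, not names or code, and it assumes NUL-terminated lines. The extra TOY_ERROR state stands in for the "Expected #+BEGIN_SRC" error.

```c
#include <string.h>

/* Hypothetical toy version of the parser's mode transitions. */
enum { TOY_ORG, TOY_BEGINCODE, TOY_CODE, TOY_ERROR };

static int starts_with(const char *s, const char *prefix)
{
    return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Returns the mode the parser is in after reading `line` in `mode`. */
int toy_next_mode(int mode, const char *line)
{
    switch(mode) {
        case TOY_ORG:
            return starts_with(line, "#+NAME:") ? TOY_BEGINCODE : TOY_ORG;
        case TOY_BEGINCODE:
            /* a #+NAME line must be followed by #+BEGIN_SRC */
            return starts_with(line, "#+BEGIN_SRC") ? TOY_CODE : TOY_ERROR;
        case TOY_CODE:
            return starts_with(line, "#+END_SRC") ? TOY_ORG : TOY_CODE;
        default:
            return TOY_ERROR;
    }
}
```

Consider the lines "Some prose", "#+NAME: foo", "#+BEGIN_SRC c", "int x;", "#+END_SRC". The toy visits ORG, BEGINCODE, CODE, CODE, ORG.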

<<enums>>=

enum {
<<parse_modes>>
};

3.3.3.1 MODE_ORG
<<parse_modes>>=

MODE_ORG,

When the parser is in MODE_ORG, it is only searching for the start of the next named block. When it finds a #+NAME: line, it extracts the name, begins a new block, and changes the mode to MODE_BEGINCODE.

<<parse_mode_org>>=

if(read >= 7) {
    if(!strncmp(line, "#+NAME:",7)) {
        mode = MODE_BEGINCODE;
        parse_name(line, read, &str);
        worgle_begin_block(&worg, &str);
    }
}

3.3.3.1.1 Extracting information from #+NAME

Name extraction of the current line is done with a function called parse_name.

<<static_function_declarations>>=

static int parse_name(char *line, size_t len, worgle_string *str);

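parse_name skips past the 7-character "#+NAME:" prefix and any spaces after it, then reads the name up to the line break. The name is not copied: str points directly into the loaded buffer, and its size is stored alongside it. The function always returns 1.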

<<functions>>=

static int parse_name(char *line, size_t len, worgle_string *str)
{
    size_t n;
    int mode;
    /* skip past the "#+NAME:" prefix */
    line += 7;
    len -= 7;
    str->size = 0;
    str->str = NULL;
    if(len == 0) return 1;
    mode = 0;
    for(n = 0; n < len; n++) {
        if(mode == 2) break;
        switch(mode) {
            case 0: /* skip spaces after "#+NAME:" */
                if(line[n] == ' ') break;
                if(line[n] == '\n') {
                    mode = 2; /* no name on this line */
                    break;
                }
                str->str = &line[n];
                str->size = 1;
                mode = 1;
                break;
            case 1: /* read the name up to the line break */
                if(line[n] == '\n') {
                    mode = 2;
                    break;
                }
                str->size++;
                break;
            default:
                break;
        }
    }
    return 1;
}
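The same extraction can be written more compactly with two while loops instead of a switch. The following is an illustrative sketch. The names name_from_line and strview are hypothetical, and strview mirrors the (pointer, size) layout of worgle_string.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, size) view, mirroring worgle_string. */
typedef struct { const char *str; size_t size; } strview;

/* Given a line starting with "#+NAME:", return a view of the name:
 * leading spaces are skipped and the name stops at the line break.
 * Nothing is copied; the view points into `line`. */
strview name_from_line(const char *line, size_t len)
{
    strview v = { NULL, 0 };
    size_t n = 7;                          /* skip "#+NAME:" */

    if(len <= n) return v;
    while(n < len && line[n] == ' ') n++;  /* leading spaces */
    v.str = &line[n];
    while(n < len && line[n] != '\n') {    /* the name itself */
        n++;
        v.size++;
    }
    return v;
}
```

For "#+NAME:   local_variables\n", the view covers the 15 bytes of "local_variables".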

3.3.3.1.2 Beginning a new block

A new code block is started with the function worgle_begin_block.

<<function_declarations>>=

void worgle_begin_block(worgle_d *worg, worgle_string *name);

When a new block begins, the current block in Worgle is set to be a value retrieved from the block dictionary.

<<functions>>=

void worgle_begin_block(worgle_d *worg, worgle_string *name)
{
    worg->curblock = worgle_hashmap_get(&worg->dict, name);
}

3.3.3.2 MODE_BEGINCODE
<<parse_modes>>=

MODE_BEGINCODE,

A parser in MODE_BEGINCODE is only interested in finding the #+BEGIN_SRC line that starts the code block. If the line is anything else, it reports a syntax error and stops parsing. If it does find one, it goes on to extract a potential filename to tangle to, which then gets appended to the Worgle file list.

<<parse_mode_begincode>>=

if(read >= 11 && !strncmp(line, "#+BEGIN_SRC", 11)) {
    <<begin_the_code>>
    if(parse_begin(line, read, &str) == 2) {
        worgle_append_file(&worg, &str);
    }
    continue;
}
fwrite(line, read, 1, stderr);
fprintf(stderr, "line %lu: Expected #+BEGIN_SRC\n",
        (unsigned long)worg.linum);
rc = 1;
break;

3.3.3.2.1 Extracting information from #+BEGIN_SRC

The begin source flag in org-mode can have a number of options, but the only one we really care about for this tangler is the ":tangle" option.

<<function_declarations>>=

static int parse_begin(char *line, size_t len, worgle_string *str);

The state machine begins right after the BEGIN_SRC declaration, which is why the string is offset by 11.

The state machine for this parser is linear, and has 5 modes:

- mode 0: Skip whitespace after BEGIN_SRC
- mode 1: Find the ":tangle" pattern
- mode 2: Ignore immediate whitespace after "tangle", and begin getting the filename
- mode 3: Get the filename size by reading up to the next space or line break
- mode 4: Don't do anything; wait for the line to end

<<functions>>=

static int parse_begin(char *line, size_t len, worgle_string *str)
{
    size_t n;
    int mode;
    int rc;
    line += 11;
    len -= 11;
    if(len <= 0) return 0;
    mode = 0;
    n = 0;
    rc = 1;
    str->str = NULL;
    str->size = 0;
    while(n < len) {
        switch(mode) {
            case 0: /* initial spaces after BEGIN_SRC */
                if(line[n] == ' ') {
                    n++;
                } else {
                    mode = 1;
                }
                break;
            case 1: /* look for :tangle */
                if(line[n] == ' ') {
                    mode = 0;
                    n++;
                } else {
                    if(line[n] == ':') {
                        if(n + 7 <= len && !strncmp(line + n + 1, "tangle", 6)) {
                            n+=7;
                            mode = 2;
                            rc = 2;
                        }
                    }
                    n++;
                }
                break;
            case 2: /* save file name, spaces after tangle */
                if(line[n] != ' ') {
                    str->str = &line[n];
                    str->size++;
                    mode = 3;
                }
                n++;
                break;
            case 3: /* read up to next space or line break */
                if(line[n] == ' ' || line[n] == '\n') {
                    mode = 4;
                } else {
                    str->size++;
                }
                n++;
                break;
            case 4: /* countdown til end */
                n++;
                break;
        }
    }
    return rc;
}
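For comparison, the same :tangle extraction can be written as a plain bounded search instead of a state machine. This is a swapped-in technique, not how parse_begin works. The names find_tangle and strview are hypothetical. It returns 1 or 0 rather than 2 or 1, and it ignores everything after the filename.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, size) view, mirroring worgle_string. */
typedef struct { const char *str; size_t size; } strview;

/* Scan a "#+BEGIN_SRC" line for ":tangle" and point `out` at the word
 * after it. Returns 1 if ":tangle" was found, 0 otherwise. Every read
 * stays inside the first `len` bytes of `line`. */
int find_tangle(const char *line, size_t len, strview *out)
{
    size_t n;
    out->str = NULL;
    out->size = 0;
    for(n = 0; n + 7 <= len; n++) {
        if(strncmp(&line[n], ":tangle", 7) != 0) continue;
        n += 7;
        while(n < len && line[n] == ' ') n++;          /* skip spaces */
        out->str = &line[n];
        while(n < len && line[n] != ' ' && line[n] != '\n') {
            n++;
            out->size++;                               /* filename */
        }
        return 1;
    }
    return 0;
}
```

For "#+BEGIN_SRC c :tangle worgle.c :exports code\n", the view covers "worgle.c".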

3.3.3.2.2 Setting up code for a new read

When a new codeblock has indeed been found, the mode is switched to MODE_CODE, and the block_started boolean flag gets set. In addition, the string used to keep track of the new block is reset.

<<begin_the_code>>=

mode = MODE_CODE;
worg.block_started = 1;
worgle_string_reset(&worg.block);

3.3.3.2.3 Appending a new file

If a new file is found, the filename gets appended to the file list via the function worgle_append_file.

<<function_declarations>>=

void worgle_append_file(worgle_d *worg, worgle_string *filename);

<<functions>>=

void worgle_append_file(worgle_d *worg, worgle_string *filename)
{
    worgle_filelist_append(&worg->flist, filename, worg->curblock);
}

3.3.3.3 MODE_CODE
<<parse_modes>>=

MODE_CODE

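In MODE_CODE, actual code is parsed inside of the code block. The parser keeps reading lines of code until one of two things happens: a code reference is found, or the #+END_SRC line is found. On a reference, the text read so far is appended as a text segment, the reference is appended as a reference segment, and a new text segment begins. On #+END_SRC, the remaining text is appended and the parser returns to MODE_ORG.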

<<parse_mode_code>>=

if(read >= 9) {
    if(!strncmp(line, "#+END_SRC", 9)) {
        mode = MODE_ORG;
        worg.block_started = 0;
        worgle_append_string(&worg);
        continue;
    }
}
if(check_for_reference(line, read, &str)) {
    worgle_append_string(&worg);
    worgle_append_reference(&worg, &str);
    worg.block_started = 1;
    worgle_string_reset(&worg.block);
    continue;
}
worg.block.size += read;
if(worg.block_started) {
    worg.block.str = line;
    worg.block_started = 0;
    worg.curline = worg.linum;
}

<<function_declarations>>=

void worgle_append_string(worgle_d *worg);

<<functions>>=

void worgle_append_string(worgle_d *worg)
{
    if(worg->curblock == NULL) return;
    worgle_block_append_string(worg->curblock,
                              &worg->block,
                              worg->curline,
                              &worg->filename);
}

<<function_declarations>>=

void worgle_append_reference(worgle_d *worg, worgle_string *ref);

<<functions>>=

void worgle_append_reference(worgle_d *worg, worgle_string *ref)
{
    if(worg->curblock == NULL) return;
    worgle_block_append_reference(worg->curblock,
                                 ref,
                                 worg->linum,
                                 &worg->filename);
}

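Reference detection is done with check_for_reference. A line counts as a reference if, after optional leading spaces, it starts with "<<", followed by a name and ">>". Anything after the closing ">>" is ignored. On success, str points at the name inside the line.

<<static_function_declarations>>=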

static int check_for_reference(char *line , size_t size, worgle_string *str);

<<functions>>=

static int check_for_reference(char *line , size_t size, worgle_string *str)
{
    int mode;
    size_t n;
    mode = 0;
    str->size = 0;
    str->str = NULL;
    for(n = 0; n < size; n++) {
        if(mode < 0) break;
        switch(mode) {
            case 0: /* spaces */
                if(line[n] == ' ') continue;
                else if(line[n] == '<') mode = 1;
                else mode = -1;
                break;
            case 1: /* second < */
                if(line[n] == '<') mode = 2;
                else mode = -1;
                break;
            case 2: /* word setup */
                str->str = &line[n];
                str->size++;
                mode = 3;
                break;
            case 3: /* the word */
                if(line[n] == '>') {
                    mode = 4;
                    break;
                }
                str->size++;
                break;
            case 4: /* last > */
                if(line[n] == '>') mode = 5;
                else mode = -1;
                break;
        }
    }
    return (mode == 5);
}
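Here is a compact sketch of the same check, using straight-line scanning in place of the mode switch. The names find_reference and strview are hypothetical. The sketch is meant to agree with check_for_reference on ordinary names, though degenerate inputs such as "<<>>>" may be treated differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical (pointer, size) view, mirroring worgle_string. */
typedef struct { const char *str; size_t size; } strview;

/* A line is a reference if, after optional leading spaces, it starts
 * with "<<", then a non-empty name, then ">>". Text after the closing
 * ">>" is ignored. On success, `out` points at the name. */
int find_reference(const char *line, size_t len, strview *out)
{
    size_t n = 0;
    out->str = NULL;
    out->size = 0;
    while(n < len && line[n] == ' ') n++;              /* indentation */
    if(n + 2 > len || line[n] != '<' || line[n + 1] != '<') return 0;
    n += 2;
    out->str = &line[n];
    while(n < len && line[n] != '>') {                 /* the name */
        n++;
        out->size++;
    }
    /* need a non-empty name followed by ">>" */
    if(out->size == 0 || n + 2 > len || line[n + 1] != '>') return 0;
    return 1;
}
```

An indented "    <<local_variables>>" line is recognized. A line of ordinary code such as "x = a << 2;" is not.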

3.4 Generation

The last phase of the program is code generation.

A parsed file generates a structure of how the code will look. The generation stage involves iterating through the structure and producing the code.

Due to the hierarchical nature of the data structures, the generation stage is surprisingly elegant with a single expanding entry point.

At the very top, generation consists of writing all the files in the file list. Each file then writes the top-most block associated with it. A block then writes the segment list embedded inside of it. A segment either writes a string literal to disk, or recursively expands a block reference.

<<generation>>=

if(!rc) if(!worgle_generate(&worg)) rc = 1;

If the use_warnings flag is turned on and generation succeeded, Worgle will scan the dictionary and flag warnings about any unused blocks. Checking rc first keeps an earlier error from being overwritten.

<<generation>>=

if(!rc && use_warnings) rc = worgle_warn_unused(&worg);
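The body of worgle_warn_unused is not part of this section. The sketch below shows one way such a scan could work, relying on the am_i_used flag that worgle_block_write sets. It is hypothetical: it uses a flat array instead of Worgle's hash map, and the names block_info and warn_unused are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical sketch of an unused-block scan. Each block carries a
 * flag that is set when it is written during generation; any block
 * still unflagged afterwards was never reached from a file. */
typedef struct { const char *name; int am_i_used; } block_info;

/* Print a warning to fp for each unused block and return how many
 * were found. */
int warn_unused(FILE *fp, const block_info *blocks, int nblocks)
{
    int i, unused = 0;
    for(i = 0; i < nblocks; i++) {
        if(!blocks[i].am_i_used) {
            fprintf(fp, "Warning: block '%s' unused\n", blocks[i].name);
            unused++;
        }
    }
    return unused;
}
```

How that count maps onto a return code (for example, failing only when warnings are treated as errors) is up to the real implementation.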

3.5 Cleanup

At the end of the program, all allocated memory is freed via worgle_free.

<<cleanup>>=

worgle_free(&worg);
return rc;

<<function_declarations>>=

int worgle_generate(worgle_d *worg);

<<functions>>=

int worgle_generate(worgle_d *worg)
{
    return worgle_filelist_write(&worg->flist, &worg->dict);
}

4 Core Data Structures

The Worgle/Orgle program is very much a data-structure driven program. Understanding the hierarchy of data here will provide a clear picture for how the tangling works.

<<structs>>=

<<worgle_string>>
<<worgle_segment>>
<<worgle_block>>
<<worgle_blocklist>>
<<worgle_hashmap>>
<<worgle_file>>
<<worgle_filelist>>
<<worgle_struct>>

4.1 Top Level Struct

All Worgle operations are contained in a top-level struct called worgle_d. For the most part, this struct aims to be dynamically populated.

<<worgle_struct>>=

typedef struct {
    <<worgle_struct_contents>>
} worgle_d;

4.1.1 Worgle Initialization

Worgle data is initialized using the function worgle_init.

<<function_declarations>>=

void worgle_init(worgle_d *worg);

<<functions>>=

void worgle_init(worgle_d *worg)
{
<<worgle_init>>
}

4.1.2 Worgle Deallocation

When worgle is done, the program deallocates memory using the function worgle_free.

<<function_declarations>>=

void worgle_free(worgle_d *worg);

<<functions>>=

void worgle_free(worgle_d *worg)
{
<<worgle_free>>
}

4.1.3 Worgle Data

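4.1.3.1 Current Block Text

The text of the segment currently being read inside a code block is stored in a variable called block. It is not a copy: it points into the loaded buffer at the first line of the segment, and its size grows one line at a time until a reference or #+END_SRC ends the segment. This is different from the block name, which is stored in curblock.

# TODO: this needs to be renamed.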

<<worgle_struct_contents>>=

worgle_string block; /* TODO: rename */

It is initialized to be an empty string.

<<worgle_init>>=

worgle_string_init(&worg->block);

4.1.3.2 Current Line

The starting line number of the current block is stored in a variable called curline.

<<worgle_struct_contents>>=

size_t curline;

The current line is initialized to -1. Since curline is an unsigned size_t, this wraps around to SIZE_MAX, which marks that it has not been set yet.

<<worgle_init>>=

worg->curline = -1;

4.1.3.3 Block Started Flag

The block started flag is used by the parser to check whether or not a code block was started on the last iteration.

<<worgle_struct_contents>>=

int block_started;

It is set to be FALSE (0).

<<worgle_init>>=

worg->block_started = 0;

4.1.3.4 Dictionary

All code blocks are stored in a dictionary, also referred to here as a hash map.

<<worgle_struct_contents>>=

worgle_hashmap dict;

The dictionary is initialized using the function worgle_hashmap_init.

<<worgle_init>>=

worgle_hashmap_init(&worg->dict);

When free-ing time comes around, the hashmap will free itself using the function worgle_hashmap_free.

<<worgle_free>>=

worgle_hashmap_free(&worg->dict);

4.1.3.5 File List

All files to be written to are stored in a file list called flist.

<<worgle_struct_contents>>=

worgle_filelist flist;

The file list is initialized with worgle_filelist_init.

<<worgle_init>>=

worgle_filelist_init(&worg->flist);

It is freed with worgle_filelist_free.

<<worgle_free>>=

worgle_filelist_free(&worg->flist);

4.1.3.6 Text Buffer

The parsed text file is loaded into memory and stored in a buffer called buf.

# TODO: put this into a data struct so that multiple org files can be read at once

<<worgle_struct_contents>>=

char *buf;

Loading happens after initialization, so the buffer is set to NULL for now.

<<worgle_init>>=

worg->buf = NULL;

If the buffer is non-null, the memory will be freed.

<<worgle_free>>=

if(worg->buf != NULL) free(worg->buf);

4.1.3.7 Current Block

A pointer to the currently populated code block is stored in a variable called curblock.

<<worgle_struct_contents>>=

worgle_block *curblock;

There is no block on startup, so set it to be NULL.

<<worgle_init>>=

worg->curblock = NULL;

4.1.3.8 Line Number

The currently parsed line number is stored in a variable called linum.

<<worgle_struct_contents>>=

size_t linum;

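The line number is incremented before each line is read, so it starts at 0 and the first line is line 1. Do not be tempted to set this to -1: linum is an unsigned size_t, so the first increment would wrap it around to 0 and every line number would be off by one.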

<<worgle_init>>=

worg->linum = 0;

4.1.3.9 Filename

The filename is stored inside of a worgle string called filename.

<<worgle_struct_contents>>=

worgle_string filename;

This value is not set on init, but it is zeroed out and initialized.

<<worgle_init>>=

worgle_string_init(&worg->filename);

4.2 String

A string is a wrapper around a raw char pointer and a size. This is used as the base string literal.

<<worgle_string>>=

typedef struct {
    char *str;
    size_t size;
} worgle_string;

4.2.1 Reset or initialize a string

Strings in worgle are reset with the function worgle_string_reset.

<<function_declarations>>=

void worgle_string_reset(worgle_string *str);

<<functions>>=

void worgle_string_reset(worgle_string *str)
{
    str->str = NULL;
    str->size = 0;
}

A string being initialized is identical to a string being reset. The function worgle_string_init is just a wrapper around worgle_string_reset.

<<function_declarations>>=

void worgle_string_init(worgle_string *str);

<<functions>>=

void worgle_string_init(worgle_string *str)
{
    worgle_string_reset(str);
}

4.2.2 Writing a String

A string is written to a particular filehandle with the function worgle_string_write. Worgle strings are not zero-terminated, so they can't be passed to functions like printf with a plain %s.

<<function_declarations>>=

int worgle_string_write(FILE *fp, worgle_string *str);

This function is a wrapper around a call to fwrite.

<<functions>>=

int worgle_string_write(FILE *fp, worgle_string *str)
{
    return fwrite(str->str, 1, str->size, fp);
}
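A worgle string can still go through the printf family if the "%.*s" precision field supplies the size. The C standard permits a non-NUL-terminated array when the precision limits how many bytes are read. The type worgle_string_demo and the function string_view_format are hypothetical stand-ins for this illustration.

```c
#include <stdio.h>

/* Hypothetical stand-in with the same layout as worgle_string. */
typedef struct { char *str; size_t size; } worgle_string_demo;

/* Format a (pointer, size) string into dst. "%.*s" reads at most
 * `size` bytes instead of searching for a NUL; the precision argument
 * must be an int, hence the cast. Returns what snprintf returns. */
int string_view_format(char *dst, size_t cap, const worgle_string_demo *s)
{
    return snprintf(dst, cap, "%.*s", (int)s->size, s->str);
}
```

The same idea works with fprintf(fp, "%.*s", (int)str->size, str->str), although fwrite, as used above, avoids the int cast.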

4.3 Segment

A segment turns a string into a linked list component that has a type. A segment type flag can either be a text chunk or a reference.

<<worgle_segment>>=

enum {
<<worgle_segment_types>>
};
typedef struct worgle_segment {
    int type;
    worgle_string str;
    <<worgle_segment_line_control>>
    struct worgle_segment *nxt;
} worgle_segment;

Segments also keep track of where they are in the original org file. This information can be used to generate line control preprocessor commands for C/C++.

<<worgle_segment_line_control>>=

size_t linum;
worgle_string *filename;

4.3.1 Text Chunk Type

A text chunk is a literal string of text.

When a text chunk segment is processed, it gets written to file directly.

<<worgle_segment_types>>=

SEGTYPE_TEXT,

4.3.2 Reference Type

A reference contains a string reference to another block.

When a reference segment gets processed, it looks up the reference and processes all the segments in that code block.

<<worgle_segment_types>>=

SEGTYPE_REFERENCE

4.3.3 Initializing a Segment

A segment is initialized with the function worgle_segment_init.

<<function_declarations>>=

void worgle_segment_init(worgle_segment *s,
                        int type,
                        worgle_string *str,
                        worgle_string *filename,
                        size_t linum);

<<functions>>=

void worgle_segment_init(worgle_segment *s,
                        int type,
                        worgle_string *str,
                        worgle_string *filename,
                        size_t linum)
{
    s->type = type;
    s->str = *str;
    s->filename = filename;
    s->linum = linum;
    s->nxt = NULL;
}

4.3.4 Writing a Segment

A segment is written to a file handle using the function worgle_segment_write. In addition to a filehandle and a segment, a hashmap is also passed in, in case the segment is a reference.

On success, the function returns TRUE (1). On failure, FALSE (0).

<<function_declarations>>=

int worgle_segment_write(worgle_segment *s, worgle_hashmap *h, FILE *fp);

Different behaviors happen depending on the segment type.

If the segment is a chunk of text (SEGTYPE_TEXT), the string is written. If the use_debug global variable is enabled, a #line preprocessor directive is written first, pointing at the segment's position in the original file. This only needs to happen for text segments, not references.

If the segment is a reference (SEGTYPE_REFERENCE), the function attempts to look up the referenced block and write it to disk. If it cannot find the reference, a warning is printed to stderr. If use_warnings is set to 2 (warnings are treated as errors), the function returns FALSE; otherwise it returns TRUE.

<<functions>>=

int worgle_segment_write(worgle_segment *s, worgle_hashmap *h, FILE *fp)
{
    worgle_block *b;
    if(s->type == SEGTYPE_TEXT) {
        if(use_debug) {
            fprintf(fp, "#line %lu \"", (unsigned long)s->linum);
            worgle_string_write(fp, s->filename);
            fprintf(fp, "\"\n");
        }
        worgle_string_write(fp, &s->str);
    } else {
        if(!worgle_hashmap_find(h, &s->str, &b)) {
            fprintf(stderr, "Warning: could not find reference segment '");
            worgle_string_write(stderr, &s->str);
            fprintf(stderr, "'\n");
            if(use_warnings == 2) {
                return 0;
            } else {
                return 1;
            }
        }
        return worgle_block_write(b, h, fp);
    }
    return 1;
}
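The debug output works because a #line directive tells the C compiler to report the next source line as line N of the named file. As a result, compiler errors point back into the org document. Below is a minimal sketch of the text branch with use_debug on. The helper write_text_with_line is hypothetical, and it takes a plain C string for the org filename instead of a worgle_string.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper mirroring the SEGTYPE_TEXT branch with use_debug
 * enabled: emit a #line directive, then the segment text itself. The
 * line after "#line N" is reported by the compiler as line N. */
void write_text_with_line(FILE *fp, unsigned long linum,
                          const char *orgfile,
                          const char *text, size_t size)
{
    fprintf(fp, "#line %lu \"%s\"\n", linum, orgfile);
    fwrite(text, 1, size, fp);
}
```

Writing the text "int x;\n" from line 42 of worgle.org produces #line 42 "worgle.org" followed by int x;.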

4.4 Code Block

A code block is a top-level unit that stores some amount of code. It is made up of a list of segments. Every code block has a unique name.

<<worgle_block>>=

typedef struct worgle_block {
    int nsegs;
    worgle_segment *head;
    worgle_segment *tail;
    worgle_string name;
    int am_i_used;
    struct worgle_block *nxt;
} worgle_block;

4.4.1 Initializing a code block

A worgle code block is initialized using the function worgle_block_init.

<<function_declarations>>=

void worgle_block_init(worgle_block *b);

The initialization will zero out all the variables related to the segment linked list, as well as initialize the string holding the name of the block.

<<functions>>=

void worgle_block_init(worgle_block *b)
{
    b->nsegs = 0;
    b->head = NULL;
    b->tail = NULL;
    b->nxt = NULL;
    b->am_i_used = 0;
    worgle_string_init(&b->name);
}

4.4.2 Freeing a code block

A code block is freed using the function worgle_block_free.

<<function_declarations>>=

void worgle_block_free(worgle_block *lst);

This function iterates through the segment linked list contained inside the block, and frees each one. Since there is nothing to free below a segment, the standard free function is called directly.

<<functions>>=

void worgle_block_free(worgle_block *lst)
{
    worgle_segment *s;
    worgle_segment *nxt;
    int n;
    s = lst->head;
    for(n = 0; n < lst->nsegs; n++) {
        nxt = s->nxt;
        free(s);
        s = nxt;
    }
}

4.4.3 Appending a segment to a code block

A generic segment is appended to a code block with the function worgle_block_append_segment. The block b, the segment string str, and the segment type type are mandatory parameters which describe the segment. The location in the file is also required, so the line number linum and the name of the file filename are provided as well. This function is called inside of a type-specific append function instead of being called directly.

<<function_declarations>>=

void worgle_block_append_segment(worgle_block *b,
                                worgle_string *str,
                                int type,
                                size_t linum,
                                worgle_string *filename);

Note that this is the function where segments are allocated.

<<functions>>=

void worgle_block_append_segment(worgle_block *b,
                                worgle_string *str,
                                int type,
                                size_t linum,
                                worgle_string *filename)
{
    worgle_segment *s;
    s = malloc(sizeof(worgle_segment));
    worgle_segment_init(s, type, str, filename, linum);
    if(b->nsegs == 0) {
        b->head = s;
    } else {
        b->tail->nxt = s;
    }
    b->tail = s;
    b->nsegs++;
}
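The head/tail pattern used here gives O(1) appends while keeping segments in source order. Below is a minimal generic sketch of the same pattern with hypothetical names (node, list, list_append). It also checks the result of malloc and terminates each new node with a NULL next pointer.

```c
#include <stdlib.h>

/* Hypothetical singly linked list with head and tail pointers, the
 * same shape as the segment list inside worgle_block. */
typedef struct node { int value; struct node *nxt; } node;
typedef struct { node *head; node *tail; int count; } list;

void list_init(list *l) { l->head = l->tail = NULL; l->count = 0; }

/* O(1) append: link the new node after the current tail.
 * Returns 1 on success, 0 if allocation fails. */
int list_append(list *l, int value)
{
    node *n = malloc(sizeof(node));
    if(n == NULL) return 0;
    n->value = value;
    n->nxt = NULL;                 /* the new node is always last */
    if(l->count == 0) l->head = n;
    else l->tail->nxt = n;
    l->tail = n;
    l->count++;
    return 1;
}

/* Free every node and leave the list empty. */
void list_free(list *l)
{
    node *n = l->head;
    while(n != NULL) {
        node *nxt = n->nxt;
        free(n);
        n = nxt;
    }
    list_init(l);
}
```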

4.4.3.1 Appending a string segment

A string segment is appended to a code block using the function worgle_block_append_string.

<<function_declarations>>=

void worgle_block_append_string(worgle_block *b,
                               worgle_string *str,
                               size_t linum,
                               worgle_string *filename);

<<functions>>=

void worgle_block_append_string(worgle_block *b,
                               worgle_string *str,
                               size_t linum,
                               worgle_string *filename)
{
    worgle_block_append_segment(b, str, SEGTYPE_TEXT, linum, filename);
}

4.4.3.2 Appending a reference segment

A reference segment is appended to a code block using the function worgle_block_append_reference.

<<function_declarations>>=

void worgle_block_append_reference(worgle_block *b,
                                  worgle_string *str,
                                  size_t linum,
                                  worgle_string *filename);

<<functions>>=

void worgle_block_append_reference(worgle_block *b,
                                  worgle_string *str,
                                  size_t linum,
                                  worgle_string *filename)
{
    worgle_block_append_segment(b, str, SEGTYPE_REFERENCE, linum, filename);
}

4.4.4 Appending a code block to a code block

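In both CWEB and Org-tangle, existing code blocks can be appended to in different sections. Because worgle_begin_block retrieves blocks by name from the dictionary, a later #+NAME with an existing name resolves to the same block, and its segments are appended to the end of that block's list. We get this functionality for free!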
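A toy dictionary shows why this comes for free: looking up a name returns the existing block when there is one, so later appends land on the same block. This sketch is hypothetical. It uses a fixed-size table with linear search instead of Worgle's hash map, and blocks that accumulate plain text instead of segments.

```c
#include <string.h>

/* Hypothetical toy: a fixed-size table of named blocks. Looking up a
 * name returns the existing block, or creates a new one. */
#define MAX_BLOCKS 8
#define MAX_TEXT 256

typedef struct { char name[32]; char text[MAX_TEXT]; } toy_block;
typedef struct { toy_block blocks[MAX_BLOCKS]; int n; } toy_dict;

toy_block *toy_get(toy_dict *d, const char *name)
{
    int i;
    for(i = 0; i < d->n; i++)
        if(strcmp(d->blocks[i].name, name) == 0) return &d->blocks[i];
    if(d->n == MAX_BLOCKS || strlen(name) >= sizeof(d->blocks[0].name))
        return NULL;                       /* table or name too big */
    strcpy(d->blocks[d->n].name, name);
    d->blocks[d->n].text[0] = '\0';
    return &d->blocks[d->n++];
}

/* Append text to the named block. Returns 1 on success. */
int toy_append(toy_dict *d, const char *name, const char *text)
{
    toy_block *b = toy_get(d, name);
    if(b == NULL || strlen(b->text) + strlen(text) >= MAX_TEXT) return 0;
    strcat(b->text, text);
    return 1;
}
```

Appending "int rc;\n" and later "int status;\n" under local_variables leaves one block containing both lines, in order. This mirrors how the repeated <<local_variables>>= chunks in this document accumulate.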

4.4.5 Writing a code block to filehandle

Writing a code block to a filehandle can be done using the function worgle_block_write. In addition to the file handle fp, a code block requires a hashmap, which the lower-level function worgle_segment_write uses to expand code references.

This function returns a boolean TRUE (1) on success or FALSE (0) on failure.

<<function_declarations>>=

int worgle_block_write(worgle_block *b, worgle_hashmap *h, FILE *fp);

A code block iterates over its segment list, writing each segment to the filehandle. Writing stops at the first segment that fails. The block is also marked as used, which is needed later for warnings about unused blocks.

<<functions>>=

int worgle_block_write(worgle_block *b, worgle_hashmap *h, FILE *fp)
{
    worgle_segment *s;
    int n;
    s = b->head;
    b->am_i_used = 1;
    for(n = 0; n < b->nsegs; n++) {
        if(!worgle_segment_write(s, h, fp)) return 0;
        s = s->nxt;
    }
    return 1;
}

4.5 Code Block List

A code block list is a linked list of blocks. Each bucket of the hash map is one of these lists.

<<worgle_blocklist>>=

typedef struct {
    int nblocks;
    worgle_block *head;
    worgle_block *tail;
} worgle_blocklist;

4.5.1 Block List Initialization

A block list is initialized using the function worgle_blocklist_init.

<<function_declarations>>=

void worgle_blocklist_init(worgle_blocklist *lst);

<<functions>>=

void worgle_blocklist_init(worgle_blocklist *lst)
{
    lst->head = NULL;
    lst->tail = NULL;
    lst->nblocks = 0;
}

4.5.2 Freeing a Block List

Every block stored in a block list is freed using the function worgle_blocklist_free. The blocks are expected to have been allocated with malloc, as worgle_hashmap_get does.

<<function_declarations>>=

void worgle_blocklist_free(worgle_blocklist *lst);

<<functions>>=

void worgle_blocklist_free(worgle_blocklist *lst)
{
    worgle_block *b;
    worgle_block *nxt;
    int n;
    b = lst->head;
    for(n = 0; n < lst->nblocks; n++) {
        nxt = b->nxt;
        worgle_block_free(b);
        free(b);
        b = nxt;
    }
}

4.5.3 Appending a Block

An allocated block is appended to a block list using the function worgle_blocklist_append.

<<function_declarations>>=

void worgle_blocklist_append(worgle_blocklist *lst, worgle_block *b);

<<functions>>=

void worgle_blocklist_append(worgle_blocklist *lst, worgle_block *b)
{
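    b->nxt = NULL;
    if(lst->nblocks == 0) {
        /* First block: it is both the head and the tail. */
        lst->head = b;
    } else {
        /* Otherwise link it after the current tail. */
        lst->tail->nxt = b;
    }
    lst->tail = b;
    lst->nblocks++;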
}

4.6 Hash Map

A hash map is a key-value data structure. Here it is used as a dictionary that maps names to code blocks.

<<worgle_hashmap>>=

#define HASH_SIZE 256
typedef struct {
    worgle_blocklist blk[HASH_SIZE];
    int nwords;
} worgle_hashmap;

4.6.1 Hash Map Initialization

A hash map is initialized using the function worgle_hashmap_init.

<<function_declarations>>=

void worgle_hashmap_init(worgle_hashmap *h);

A hash map is composed of an array of block lists, each of which must be initialized.

<<functions>>=

void worgle_hashmap_init(worgle_hashmap *h)
{
    int n;
    h->nwords = 0;
    for(n = 0; n < HASH_SIZE; n++) {
        worgle_blocklist_init(&h->blk[n]);
    }
}

4.6.2 Freeing a Hash Map

Information allocated inside the hash map is freed using the function worgle_hashmap_free.

<<function_declarations>>=

void worgle_hashmap_free(worgle_hashmap *h);

To free a hash map is to free each block list in the array.

<<functions>>=

void worgle_hashmap_free(worgle_hashmap *h)
{
    int n;
    for(n = 0; n < HASH_SIZE; n++) {
        worgle_blocklist_free(&h->blk[n]);
    }
}

4.6.3 Looking up an entry

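A hash map lookup is done with the function worgle_hashmap_find. It searches for the block whose name matches name. If one is found, a pointer to it is stored in *b and the function returns TRUE (1). If nothing is found, it returns FALSE (0) and leaves *b untouched. Because worgle strings are not null-terminated, names are compared by length first and then with strncmp.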

<<function_declarations>>=

int worgle_hashmap_find(worgle_hashmap *h, worgle_string *name, worgle_block **b);

<<functions>>=

<<hashmap_hasher>>
int worgle_hashmap_find(worgle_hashmap *h, worgle_string *name, worgle_block **b)
{
    int pos;
    worgle_blocklist *lst;
    int n;
    worgle_block *blk;
    pos = hash(name->str, name->size);
    lst = &h->blk[pos];
    blk = lst->head;
    for(n = 0; n < lst->nblocks; n++) {
        if(name->size == blk->name.size) {
            if(!strncmp(name->str, blk->name.str, name->size)) {
                *b = blk;
                return 1;
            }
        }
        blk = blk->nxt;
    }
    return 0;
}

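Like any hash map, this one uses a hashing algorithm to compute which list an entry is placed in. This is one I've used on a number of projects now: a variant of Dan Bernstein's djb2 string hash that multiplies by 33 and mixes in each byte with XOR, reduced modulo HASH_SIZE to select a bucket.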

<<hashmap_hasher>>=

static int hash(const char *str, size_t size)
{
    unsigned int h = 5381;
    size_t i = 0;
    for(i = 0; i < size; i++) {
        h = ((h << 5) + h) ^ str[i];
        h %= 0x7FFFFFFF;
    }
    return h % HASH_SIZE;
}

4.6.4 Getting an entry

To "get" an entry means to return the block with a given name, creating and inserting a new, empty block if none exists yet. This is done with the function worgle_hashmap_get.

<<function_declarations>>=

worgle_block * worgle_hashmap_get(worgle_hashmap *h, worgle_string *name);

<<functions>>=

worgle_block * worgle_hashmap_get(worgle_hashmap *h, worgle_string *name)
{
    worgle_block *b;
    worgle_blocklist *lst;
    int pos;
    if(worgle_hashmap_find(h, name, &b)) return b;
    /* Not found: allocate a new, empty block and add it to its bucket. */
    pos = hash(name->str, name->size);
    b = malloc(sizeof(worgle_block));
    if(b == NULL) return NULL; /* out of memory */
    worgle_block_init(b);
    b->name = *name;
    lst = &h->blk[pos];
    worgle_blocklist_append(lst, b);
    return b;
}

4.7 File

A worgle file is an abstraction for a single file that worgle will write to. Every file has a filename and a top-level code block. A worgle file does not hold a filehandle; the actual file is only created at the generation stage.

<<worgle_file>>=

typedef struct worgle_file {
    worgle_string filename;
    worgle_block *top;
    struct worgle_file *nxt;
} worgle_file;

4.7.1 Writing a file to disk

A file is written to disk using the function worgle_file_write. A hashmap is also required, because it contains all the named code blocks needed to expand code references. The function returns TRUE (1) on success and FALSE (0) on failure.

<<function_declarations>>=

int worgle_file_write(worgle_file *f, worgle_hashmap *h);

A filehandle is opened, the top-level code block is written using worgle_block_write, and then the file is closed.

Because worgle strings are not null-terminated, the filename must first be copied to a temporary buffer with a null terminator. Filenames longer than 127 characters are truncated.

<<functions>>=

int worgle_file_write(worgle_file *f, worgle_hashmap *h)
{
    FILE *fp;
    char tmp[128];
    size_t n;
    size_t size;
    int rc;
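    /* Leave room for the null terminator: at most 127 characters. */
    if(f->filename.size > 127) size = 127;
    else size = f->filename.size;
    for(n = 0; n < size; n++) tmp[n] = f->filename.str[n];
    tmp[size] = 0;
    fp = fopen(tmp, "w");
    if(fp == NULL) {
        fprintf(stderr, "Could not open '%s' for writing.\n", tmp);
        return 0;
    }
    rc = worgle_block_write(f->top, h, fp);
    fclose(fp);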
    return rc;
}

4.8 The File List

A file list is a linked list of worgle files.

<<worgle_filelist>>=

typedef struct {
    worgle_file *head;
    worgle_file *tail;
    int nfiles;
} worgle_filelist;

4.8.1 Initializing a file list

A file list is zeroed out and initialized using the function worgle_filelist_init.

<<function_declarations>>=

void worgle_filelist_init(worgle_filelist *flist);

<<functions>>=

void worgle_filelist_init(worgle_filelist *flist)
{
    flist->head = NULL;
    flist->tail = NULL;
    flist->nfiles = 0;
}

4.8.2 Freeing a file list

A file list is freed using the function worgle_filelist_free. Only the worgle_file structs themselves are freed here; the code blocks they point to are not.

<<function_declarations>>=

void worgle_filelist_free(worgle_filelist *flist);

<<functions>>=

void worgle_filelist_free(worgle_filelist *flist)
{
    worgle_file *f;
    worgle_file *nxt;
    int n;
    f = flist->head;
    for(n = 0; n < flist->nfiles; n++) {
        nxt = f->nxt;
        free(f);
        f = nxt;
    }
}

4.8.3 Appending a file to a file list

A file is appended to the file list using the function worgle_filelist_append. The filename, as well as the top-level code block, are required here.

<<function_declarations>>=

void worgle_filelist_append(worgle_filelist *flist,
                           worgle_string *name,
                           worgle_block *top);

<<functions>>=

void worgle_filelist_append(worgle_filelist *flist,
                           worgle_string *name,
                           worgle_block *top)
{
    worgle_file *f;
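    f = malloc(sizeof(worgle_file));
    if(f == NULL) {
        fprintf(stderr, "Out of memory.\n");
        return;
    }
    f->filename = *name;
    f->top = top;
    f->nxt = NULL;
    if(flist->nfiles == 0) {
        /* First file: it is both the head and the tail. */
        flist->head = f;
    } else {
        /* Otherwise link it after the current tail. */
        flist->tail->nxt = f;
    }
    flist->tail = f;
    flist->nfiles++;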
}

4.8.4 Writing a filelist to disk

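A file list is written to disk using the function worgle_filelist_write.

The only other thing required is a hashmap containing all named code blocks. Files are written in the order they were appended, and writing stops at the first file that fails, in which case FALSE (0) is returned.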

<<function_declarations>>=

int worgle_filelist_write(worgle_filelist *flist, worgle_hashmap *h);

<<functions>>=

int worgle_filelist_write(worgle_filelist *flist, worgle_hashmap *h)
{
    worgle_file *f;
    int n;
    f = flist->head;
    for(n = 0; n < flist->nfiles; n++) {
        if(!worgle_file_write(f, h)) return 0;
        f = f->nxt;
    }
    return 1;
}

5 Command Line Arguments

This section outlines command line arguments in Worgle.

5.1 Parsing command line flags

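Command line argument parsing is done using the third-party library parg, which is included in this source distribution. parg_getopt returns 1 for a non-option argument, which Worgle treats as the input filename.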

<<local_variables>>=

struct parg_state ps;
int c;

<<parse_cli_args>>=

parg_init(&ps);
while((c = parg_getopt(&ps, argc, argv, "gW:")) != -1) {
    switch(c) {
        case 1:
            filename = (char *)ps.optarg;
            break;
        case 'g':
            <<turn_on_debug_macros>>
            break;
        case 'W':
            <<turn_on_warnings>>
            break;
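        default:
            /* parg returns '?' for an unknown option or a missing
               option argument; the option character is in ps.optopt. */
            fprintf(stderr, "Unknown option or missing argument: -%c\n",
                    ps.optopt);
            return 1;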
    }
}

5.2 Turning on debug macros (-g)

Worgle has the ability to generate debug macros when generating C files.

This sets the global boolean flag use_debug.

<<turn_on_debug_macros>>=

use_debug = 1;

By default, use_debug is false, since debug macros only make sense for C output and Worgle may be used to tangle code in other languages.

<<global_variables>>=

static int use_debug = 0;

5.3 Turning on Warnings (-W)

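Worgle can print warnings about things like unused sections of code. By default, this is turned off. The -W flag takes a mode argument: soft only prints warnings, while error additionally makes the warning check (such as worgle_warn_unused) report failure.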

<<global_variables>>=

static int use_warnings = 0;

<<turn_on_warnings>>=

if(!strcmp(ps.optarg, "soft")) {
    use_warnings = 1;
} else if(!strcmp(ps.optarg, "error")) {
    use_warnings = 2;
} else {
    fprintf(stderr, "Unidentified warning mode '%s'\n", ps.optarg);
    return 1;
}

5.3.1 Checking for unused blocks

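One thing warnings can do is check for unused blocks. After the files are generated, the function worgle_warn_unused goes through every block in the dictionary and reports each one that was never written (its am_i_used flag is still 0). It returns 1 if unused blocks were found and warnings are treated as errors (-W error), and 0 otherwise.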

<<function_declarations>>=

int worgle_warn_unused(worgle_d *worg);

<<functions>>=

int worgle_warn_unused(worgle_d *worg)
{
    worgle_hashmap *dict;
    worgle_block *blk;
    worgle_blocklist *lst;
    int n;
    int b;
    int rc;
    dict = &worg->dict;
    rc = 0;
    for(n = 0; n < HASH_SIZE; n++) {
        lst = &dict->blk[n];
        blk = lst->head;
        for(b = 0; b < lst->nblocks; b++) {
            if(blk->am_i_used == 0) {
                fprintf(stderr, "Warning: block '");
                worgle_string_write(stderr, &blk->name);
                fprintf(stderr, "' unused.\n");
                if(use_warnings == 2) rc = 1;
            }
            blk = blk->nxt;
        }
    }
    return rc;
}