Port dagzet to rust

task id: dagzet-rust

2024-07-03 08:35: I really need to rewrite dagzet #dagzet-rust

My lua implementation was always designed to just be a prototype, and nowadays it's getting quite heavy. Now that I'm a bit more familiar with the language, I don't think it'd be too difficult to build it in Rust, tbh. I think the standard library has enough rich data types that it should be pretty straightforward.

2024-07-11 08:39: Some initial boilerplate rust code would be nice to set up today if there's time #dagzet-rust

2024-07-12 08:14: Did not get to this yesterday. Today maybe? #dagzet-rust

2024-07-12 09:16: initial thoughts #dagzet-rust #timelog:00:18:23

My hope is that the lua implementation is trivial enough that I can bring it over to rust without too many complications. I will outline some of the broad-strokes steps required to get this program up and running.

While writing these thoughts

The first thing I'll need to be able to do is read a file from disk, possibly entirely into memory. The dagzet parser works line by line, so if there's some iterator abstraction that allows me to do this, great.
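For reference, the standard library's `BufRead::lines()` is that iterator. A minimal sketch, assuming the parser only needs owned `String` lines and can skip blank ones; `read_nonempty_lines` and `read_file_lines` are hypothetical names, not from the actual port. Making the reader generic over `BufRead` lets tests feed in a byte string instead of a real file.

```rust
use std::fs::File;
use std::io::{self, BufRead, BufReader};

/// Collect every non-empty line from any buffered reader (hypothetical helper).
/// Generic over `BufRead`, so a file, stdin, or an in-memory buffer all work.
pub fn read_nonempty_lines<R: BufRead>(reader: R) -> io::Result<Vec<String>> {
    let mut lines = Vec::new();
    // `lines()` yields io::Result<String> with the trailing newline stripped.
    for line in reader.lines() {
        let line = line?;
        if !line.is_empty() {
            lines.push(line);
        }
    }
    Ok(lines)
}

/// Open a file from disk and read it line by line (hypothetical helper).
pub fn read_file_lines(path: &str) -> io::Result<Vec<String>> {
    let file = File::open(path)?;
    read_nonempty_lines(BufReader::new(file))
}
```

On the command line, the path could come from `std::env::args().nth(1)`.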

For parsing commands, I need to be able to read the first three characters of each line to determine the command code. Commands are two characters followed by a space, with the rest of the line being arguments.
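A rough sketch of that check, assuming every command line really is two characters plus a space (`split_command` is a hypothetical name). Using `str::get` instead of slicing with `[0..2]` means a short or malformed line yields `None` rather than a panic.

```rust
/// Split a dagzet line into its two-character command code and the
/// argument text after the separating space (hypothetical helper).
/// Returns None if the line is too short or the third character isn't a space.
pub fn split_command(line: &str) -> Option<(&str, &str)> {
    // `get` returns None instead of panicking on short lines or
    // on non-ASCII character boundaries.
    let code = line.get(0..2)?;
    let rest = line.get(2..)?;
    // The third character must be the separating space.
    let args = rest.strip_prefix(' ')?;
    Some((code, args))
}
```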

Commands need to map to functions which can parse the argument data of that line, and potentially append to or modify a rust data struct representing the graph being built up somehow. But just setting up a convenient way to map commands to functions would be great. There will be quite a few, and I often find myself wanting to add more commands to meet my needs. I made use of Lua tables to create a look-up table of callbacks. Hopefully I can do a similar thing in Rust without too much fuss?
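Rust can do the Lua-style callback table with a `HashMap` from command code to a plain function pointer, as long as every handler shares one signature. A minimal sketch: `Graph`, `Handler`, `command_table`, `dispatch`, and the two handlers are hypothetical stand-ins, not the real dagzet types.

```rust
use std::collections::HashMap;

/// Hypothetical graph state; the real struct would hold much more.
#[derive(Default, Debug)]
pub struct Graph {
    pub namespace: Option<String>,
    pub nodes: Vec<String>,
}

/// Every handler shares one signature: mutate the graph using the
/// argument text, or report an error message.
pub type Handler = fn(&mut Graph, &str) -> Result<(), String>;

fn cmd_namespace(g: &mut Graph, args: &str) -> Result<(), String> {
    g.namespace = Some(args.to_string());
    Ok(())
}

fn cmd_new_node(g: &mut Graph, args: &str) -> Result<(), String> {
    g.nodes.push(args.to_string());
    Ok(())
}

/// Build the command code -> handler look-up table.
pub fn command_table() -> HashMap<&'static str, Handler> {
    let mut table: HashMap<&'static str, Handler> = HashMap::new();
    table.insert("ns", cmd_namespace);
    table.insert("nn", cmd_new_node);
    table
}

/// Look up and run the handler for one command.
pub fn dispatch(
    table: &HashMap<&'static str, Handler>,
    g: &mut Graph,
    code: &str,
    args: &str,
) -> Result<(), String> {
    match table.get(code) {
        Some(handler) => handler(g, args),
        None => Err(format!("unknown command: {code}")),
    }
}
```

Adding a command is then one function plus one `insert` line. A big `match` on the code string is the other common option; the table just stays closer to the Lua version.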

I need to be able to split data up by spaces. Lua does not have a built-in split() like you'd see in other languages. I'm hoping the Rust standard library has one somewhere (this functionality should be standard, not outsourced to a crate, right? right?!)
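Good news on that front: `str::split_whitespace`, `str::split`, and `str::split_once` are all in the standard library, no crate needed. A small sketch (the helper names are hypothetical, and treating the first word as a node name is just for illustration):

```rust
/// Split command arguments on runs of whitespace (hypothetical helper).
pub fn args_list(args: &str) -> Vec<&str> {
    args.split_whitespace().collect()
}

/// Split off only the first word, keeping the rest intact, which is handy
/// when the remainder is free-form text (hypothetical helper).
pub fn first_and_rest(args: &str) -> Option<(&str, &str)> {
    args.split_once(' ')
}
```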

I need a top-level data struct that can be populated with the parsed information. In the lua implementation, I used tables: usually as a hashmap or array structure, sometimes an array of arrays. IIRC the Rust standard library has hashmaps, and I think those plus vectors should be enough. I can't foresee too many ownership issues due to how imperative this is, but who knows.

I need to implement topological sort (Kahn's algorithm). My lua approach used node IDs instead of references, so I think this is going to be mostly Rust-friendly to port. Still, I get the feeling that I may be forgetting something that will be a pain point with Rust.
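Since the nodes are plain integer IDs, Kahn's algorithm can be written with nothing but vectors and a queue, so the borrow checker never sees a reference into the graph. A minimal sketch, assuming nodes are numbered `0..n` and edges are `(from, to)` pairs; `topsort` is a hypothetical name. When a cycle exists it returns the set of nodes that never reached in-degree zero, which includes the cycle itself plus anything downstream of it.

```rust
use std::collections::{HashSet, VecDeque};

/// Topologically sort nodes 0..n given (from, to) edges using Kahn's algorithm.
/// Ok(order) if acyclic; otherwise Err with the IDs that could not be sorted.
pub fn topsort(n: usize, edges: &[(usize, usize)]) -> Result<Vec<usize>, HashSet<usize>> {
    // Build adjacency lists and in-degree counts.
    let mut adj = vec![Vec::new(); n];
    let mut indegree = vec![0usize; n];
    for &(from, to) in edges {
        adj[from].push(to);
        indegree[to] += 1;
    }

    // Start with every node that has no incoming edges.
    let mut queue: VecDeque<usize> = (0..n).filter(|&v| indegree[v] == 0).collect();
    let mut order = Vec::with_capacity(n);

    while let Some(v) = queue.pop_front() {
        order.push(v);
        // "Remove" v's outgoing edges; nodes that drop to zero join the queue.
        for &w in &adj[v] {
            indegree[w] -= 1;
            if indegree[w] == 0 {
                queue.push_back(w);
            }
        }
    }

    if order.len() == n {
        Ok(order)
    } else {
        // Anything still holding a positive in-degree is stuck behind a cycle.
        Err((0..n).filter(|&v| indegree[v] > 0).collect())
    }
}
```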

When it has been determined that the graph contains no cycles, generate the SQLite code to standard output. This seems pretty straightforward to me: more or less going through the generated struct (read-only) and printing equivalent SQLite code.

2024-07-12 09:45: Starting initial boilerplate code. #dagzet-rust #timelog:01:07:29

2024-07-12 09:46: get it to read lines of a file #dagzet-rust

Going to try to use neovim for this now.

tangent: trying to get auto-import working. Found: <<neovim/nvim_cmp>>.

how to get this working with lazy, README only has vim-plug?

Wait, it's already installed, according to the Lazy control panel. What does InsertEnter mean?

Okay, I get the recommendations, but I don't know how to insert it.

Got it! Typing "File" then hitting ctrl-y does it.



2024-07-12 10:19: haha. flopping around with rust compiler on trivial things. nailing it. #dagzet-rust

2024-07-12 10:24: give it a file on the command line #dagzet-rust

2024-07-12 10:39: Parse lines, find their command code. #dagzet-rust

2024-07-12 10:51: That's some good enough boilerplate. #dagzet-rust

I have it parsing command codes in the test file, and there is some placeholder stuff where I can eventually do stuff with those commands.

2024-07-15 08:45: As it turns out, there was no time yesterday. #dagzet-rust

2024-07-15 09:50: dagzet in rust today #dagzet-rust #timelog:01:24:57

2024-07-15 09:53: Beginning initial top-level struct #dagzet-rust

Eventually this will be populated with data from the commands.

2024-07-15 10:02: Now would be a good time to figure out rust docstrings #dagzet-rust

Being able to add in-line descriptions of struct contents that get rendered to rust documentation would be helpful.

2024-07-15 10:44: namespace and graph remarks mostly figured out #dagzet-rust

I'm taking an incremental TDD approach to porting this, which feels nice.

2024-07-15 10:45: Make use of =Option= to indicate uninitialized values #dagzet-rust
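Both the docstring idea and the `Option` idea fit in one struct: `///` doc comments on the struct and its fields are rendered by `cargo doc`, and `Option` marks values like the namespace that start out unset. A sketch with hypothetical type, field, and method names, not the real dagzet struct:

```rust
use std::collections::HashMap;

/// Top-level state built up while parsing a dagzet file (hypothetical).
///
/// Doc comments like these are picked up by `cargo doc`.
#[derive(Default, Debug)]
pub struct DagZet {
    /// Current namespace set by `ns`; `None` until one is declared.
    pub namespace: Option<String>,
    /// Node name -> node ID, so a node is never created twice.
    pub nodes: HashMap<String, usize>,
    /// ID of the most recently created node, if any.
    pub last_node: Option<usize>,
}

impl DagZet {
    /// Return the namespace, or an error if it was never set.
    pub fn require_namespace(&self) -> Result<&str, String> {
        self.namespace
            .as_deref()
            .ok_or_else(|| "no namespace set".to_string())
    }
}
```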

2024-07-15 11:14: Better error handling, new node command incomplete #dagzet-rust

I'm making use of Result and putting return codes into a single Enum. This is just mirroring how I'd do it in C. Hopefully it's idiomatic enough in Rust.
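For what it's worth, a single error enum returned through `Result` is a common Rust pattern, and `?` takes care of the early returns a C version would do by hand. A sketch with hypothetical variant names; the `namespace/name` path format in the hypothetical `new_node` is an assumption for illustration. Implementing `Display` and `std::error::Error` is optional but makes the codes printable.

```rust
use std::fmt;

/// One enum for every way parsing can fail (hypothetical variants),
/// like C return codes but able to carry context.
#[derive(Debug, PartialEq)]
pub enum ReturnCode {
    NoNamespace,
    NodeAlreadyExists(String),
    InvalidCommand(String),
}

impl fmt::Display for ReturnCode {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            ReturnCode::NoNamespace => write!(f, "no namespace has been set"),
            ReturnCode::NodeAlreadyExists(name) => write!(f, "node already exists: {name}"),
            ReturnCode::InvalidCommand(cmd) => write!(f, "invalid command: {cmd}"),
        }
    }
}

impl std::error::Error for ReturnCode {}

/// Example fallible operation: `?` propagates a ReturnCode to the caller.
pub fn new_node(
    namespace: Option<&str>,
    existing: &[String],
    name: &str,
) -> Result<String, ReturnCode> {
    let ns = namespace.ok_or(ReturnCode::NoNamespace)?;
    let full = format!("{ns}/{name}");
    if existing.contains(&full) {
        return Err(ReturnCode::NodeAlreadyExists(full));
    }
    Ok(full)
}
```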

2024-07-15 15:49: More work on dagzet in rust #dagzet-rust #timelog:00:35:58

2024-07-15 15:50: Let's run clippy on my dagzet project so far #dagzet-rust

Turns out it doesn't have a lot to say. Voxbox, on the other hand, is a mess.

2024-07-15 16:19: finished new node command, now lines. #dagzet-rust

2024-07-15 20:25: back to dagzet #dagzet-rust #timelog:00:22:25

added a lines command. now I'm very tired.

2024-07-17 16:06: Attempting to implement connect #dagzet-rust #timelog:01:07:59

Connect adds an edge (a pair of nodes) to the graph.

These are set to be local connections. ID values work just fine here.

I'm curious how I did cx. Was that another structure? Am I checking for cycles there?

cx is another structure, and loops/cycles are not being checked for there.

2024-07-17 16:37: I think connections don't check for existing nodes until the end? #dagzet-rust

My original dagzet implementation represents the dagzet connections as strings that are only checked at the end of parsing, when all the nodes have been added. Only then does it attempt to resolve the symbols. This makes for a more permissive parser.

My original dagzet parser does the topological sort with the strings. I think it would be better to resolve those strings into local ID values before the sort.

2024-07-17 17:09: Connections work, but unverified #dagzet-rust

This is intentional. Verification is deferred until everything is parsed; it is not the responsibility of the connect command.

There is a linear check for already existing connections, and a test is put into place to make sure this is all properly caught.

2024-07-17 17:13: Shortcuts are not yet supported in co #dagzet-rust

Adding TODOs for that.

I've implemented enough today.

2024-07-18 08:47: dagzet could be a good portfolio project #dagzet-rust

It's got a good scope, it's got data structures, and I'm putting in the time to incrementally test as I go.

This is also a migration project. If I do this right, I should be able to drop in the dagzet program and have it replace my ad hoc lua code for my knowledge tree generator, which I'm using to power the knowledge graph here.

2024-07-18 09:34: connection shortcuts #dagzet-rust #timelog:01:03:58

2024-07-18 09:39: oh no, I'm getting my task tags wrong #dagzet-rust

2024-07-18 09:42: back on track. #dagzet-rust

2024-07-18 10:00: Reminding myself why string hashmaps are used for nodes #dagzet-rust

hashmaps are used as the data structure to ensure that a node isn't created twice.

If I wanted to get a node name from its ID, how would I do that? Right now, the answer seems to be to enumerate through all the keys and find the one with a matching ID. I could also just make a separate inverse lookup table out of a vector.

2024-07-18 10:20: I made an inverse lookup table #dagzet-rust

It's a memory hit, and there's room for the eventual possibility that the tables will go out of sync. But it's good enough for now. I imagine I'll be doing this lookup operation quite a few times, and I don't want to do a linear sweep every time.
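One way to keep the two tables from drifting apart is to only ever update them together, inside a single method. A minimal sketch, assuming node IDs are assigned sequentially from zero so a `Vec` can serve as the inverse table; `NodeTable` and its methods are hypothetical names.

```rust
use std::collections::HashMap;

/// Forward map (name -> ID) plus an inverse lookup table (ID -> name)
/// stored in a Vec indexed by ID (hypothetical type).
#[derive(Default)]
pub struct NodeTable {
    ids: HashMap<String, usize>,
    names: Vec<String>,
}

impl NodeTable {
    /// Create a node, refusing duplicates. Both tables are only updated
    /// here, which is what keeps them in sync.
    pub fn insert(&mut self, name: &str) -> Result<usize, String> {
        if self.ids.contains_key(name) {
            return Err(format!("node already exists: {name}"));
        }
        let id = self.names.len();
        self.names.push(name.to_string());
        self.ids.insert(name.to_string(), id);
        Ok(id)
    }

    /// Name -> ID via the hashmap.
    pub fn id(&self, name: &str) -> Option<usize> {
        self.ids.get(name).copied()
    }

    /// ID -> name by indexing the Vec: no linear sweep over the keys.
    pub fn name(&self, id: usize) -> Option<&str> {
        self.names.get(id).map(String::as_str)
    }
}
```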

2024-07-18 10:50: Up next: connection remarks #dagzet-rust

Variation on a theme, I suspect. Unplugging and getting lunch now.

2024-07-19 08:12: Connection remarks next, then what? #dagzet-rust

My goal is to use this rust port as a drop-in replacement for my current dagzet program. So, a good thing to do would be to write a program that scrapes all the current commands used in my RC dagzet instance.

I also have more than enough here to begin working on implementing topological sort and SQLite code generation.

2024-07-19 09:04: Connection remarks in dagzet #dagzet-rust #timelog:01:02:06

2024-07-19 09:06: Connection remarks notes: more difficult than expected #dagzet-rust

This is more difficult than I expected. A connection remark needs to reference a connection (the last connection made). So, how does one reasonably accomplish this?

Since the connections are a pair, one approach is to use a 2-dimensional hashmap of values. It feels like a lot more space is used up here. It also duplicates some of the logic in the connections hashmap.

Another thing to do is to somehow reference the connection. An actual Rust memory reference could cause the borrow checker to be grumpy. Using unique ID values to reference each connection, and having that be a key in a hash table could work. This could become a bookkeeping problem if I ever added the ability to remove connections, but there are no plans to add such a feature.

Connection IDs can be their positions in the "connections" vector.

Since there is no way of selecting a specific connection, it is reasonable to assume that remarks are always implicitly written to the last connection. So, the ID will always be the length of the vector minus one (to get the index).
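A sketch of that scheme, assuming connections live in a `Vec` of name pairs and remarks are keyed by connection index (`Connections`, `connect`, and `remark` are hypothetical names). `checked_sub` covers the edge case of a remark appearing before any connection exists.

```rust
use std::collections::HashMap;

/// Connections in insertion order; a connection's ID is its index (hypothetical type).
#[derive(Default)]
pub struct Connections {
    pub pairs: Vec<(String, String)>,
    /// Connection ID -> remark lines attached to it.
    pub remarks: HashMap<usize, Vec<String>>,
}

impl Connections {
    pub fn connect(&mut self, left: &str, right: &str) {
        self.pairs.push((left.to_string(), right.to_string()));
    }

    /// Attach a remark to the most recent connection (ID = len - 1).
    /// Fails if no connection has been made yet.
    pub fn remark(&mut self, text: &str) -> Result<(), String> {
        let id = self
            .pairs
            .len()
            .checked_sub(1)
            .ok_or_else(|| "no connection to remark on".to_string())?;
        self.remarks.entry(id).or_default().push(text.to_string());
        Ok(())
    }
}
```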

2024-07-19 09:23: Making tests for connection remarks #dagzet-rust

2024-07-19 09:33: Implementing connection remarks #dagzet-rust

2024-07-19 09:59: I am weirdly stumped as to why this test is failing #dagzet-rust

It was silently erroring. I needed to add a namespace.

2024-07-19 10:16: Time for unknown nodes. #dagzet-rust #timelog:00:21:57

2024-07-19 10:20: before topsort, but unknown nodes #dagzet-rust

I initially thought I was ready for topsort and cycle checking. But unknown node resolution should happen first.

An edge list should be created from the connections for the topsort, but those should all be resolved ID values. The "check unknown" step should return a set of referenced nodes that were never created.

Rust does have a set in its standard library! Huzzah! It's called HashSet. That's what I need.
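A minimal sketch of the unknown-node check, assuming nodes are in a name-to-ID `HashMap` and connections are still unresolved string pairs (`check_unknown_nodes` is a hypothetical name). Collecting into a `HashSet` means a missing node that shows up in several connections is only reported once.

```rust
use std::collections::{HashMap, HashSet};

/// Return every node name referenced by a connection that was never
/// created. An empty set means every connection can be resolved.
pub fn check_unknown_nodes(
    nodes: &HashMap<String, usize>,
    connections: &[(String, String)],
) -> HashSet<String> {
    let mut unknown = HashSet::new();
    for (left, right) in connections {
        for name in [left, right] {
            if !nodes.contains_key(name) {
                // Repeats of the same missing name are absorbed by the set.
                unknown.insert(name.clone());
            }
        }
    }
    unknown
}
```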

2024-07-19 10:26: Building test for unknown node checker #dagzet-rust

2024-07-19 10:37: Implementing unknown nodes. #dagzet-rust

2024-07-19 10:39: Begin cycle checking #dagzet-rust #timelog:00:22:41

2024-07-19 10:42: How to make this a testable component? #dagzet-rust

The trick here is that the connections need to be resolved, which is another step to test independently.

A connection list, after it has been verified, should be turned into an edge list. This function is still allowed to error out. This edge list then gets passed in to the cycle checker.

Note: Topsort makes use of sets that will expand and grow. I expect to dynamically generate these inside the function.

Generating an edge list should come first. Panic on missing nodes for now; that can be fixed later.
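A sketch of the edge-list step, assuming a name-to-ID `HashMap` and string pairs, with the panic-for-now behavior described above. `generate_edges` is a hypothetical name. The output is the `(from, to)` ID list a topological sort can consume, and every ID comes from a name lookup rather than from assumed positions.

```rust
use std::collections::HashMap;

/// Turn verified string connections into an edge list of node IDs.
/// Unknown nodes should already have been reported, so a missing
/// name here is treated as a bug and panics (for now).
pub fn generate_edges(
    nodes: &HashMap<String, usize>,
    connections: &[(String, String)],
) -> Vec<(usize, usize)> {
    // Resolve a node name to its ID by looking it up.
    let lookup = |name: &String| match nodes.get(name) {
        Some(&id) => id,
        None => panic!("unresolved node: {name}"),
    };
    connections
        .iter()
        .map(|(left, right)| (lookup(left), lookup(right)))
        .collect()
}
```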

The top-level cycle checker should be a method called check_for_cycles. The original implementation was able to get some information on these cycles.

Looks like the original topsort populates a "loops found" list. I will do the same, only it'll be a HashSet. On success it will return an Ok, otherwise an error code.

2024-07-19 10:51: Implementing initial cycle checker test #dagzet-rust

2024-07-19 11:02: Placeholder tests and functions in place. #dagzet-rust

Implementation will come next. Signing off for now.

2024-07-20 12:14: Work on loop checker #dagzet-rust #timelog:00:54:23

2024-07-20 13:22: Loop checker passes test. I think it works. #dagzet-rust

2024-07-20 13:52: Begin SQLite code generation primitives #dagzet-rust

2024-07-20 13:53: Initial thoughts #dagzet-rust

Most of this boils down to string generation. It'd be good to have some intermediate structures before that.

Being able to generate SQLite schemas is a good start. A table is a set of named values (columns), each with a type.

It would be nice to have a consistent ordering of these names, for things like generating insert statements.

A table itself also has a distinct name.

stringify() could be a behavior that types and tables implement.

Ah, stringify is already a behavior the standard library implements.
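Rust's usual way to turn a value into a string is the `Display` trait, which also provides `to_string()` through the blanket `ToString` impl. A sketch of a schema type along those lines, assuming a small hypothetical set of column types; `ParamType`, `Table`, and the `dz_nodes` table name are illustrative only. Keeping the columns in a `Vec` gives the consistent ordering wanted for insert statements.

```rust
use std::fmt;

/// Column types used in the generated schema (hypothetical subset).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ParamType {
    IntegerPrimaryKey,
    Integer,
    Text,
}

impl fmt::Display for ParamType {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        let s = match self {
            ParamType::IntegerPrimaryKey => "INTEGER PRIMARY KEY",
            ParamType::Integer => "INTEGER",
            ParamType::Text => "TEXT",
        };
        f.write_str(s)
    }
}

/// A table: a name plus columns in a Vec, so their order is stable.
pub struct Table {
    pub name: String,
    pub columns: Vec<(String, ParamType)>,
}

impl fmt::Display for Table {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        writeln!(f, "CREATE TABLE IF NOT EXISTS {} (", self.name)?;
        for (i, (col, ty)) in self.columns.iter().enumerate() {
            // Comma after every column except the last.
            let sep = if i + 1 < self.columns.len() { "," } else { "" };
            writeln!(f, "    {col} {ty}{sep}")?;
        }
        write!(f, ");")
    }
}
```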

2024-07-20 14:04: Making a new file. Thinking about SQLite params. #dagzet-rust

2024-07-20 14:30: Got some tests going. Now working on a table. #dagzet-rust

2024-07-20 14:51: Now, how about some insert statements #dagzet-rust #timelog:02:22:15

The best thing would be to have some struct map to the VALUES part of an INSERT statement. A method on the table, such as sqlize_insert, could then take in a reference to a row struct that is able to generate those values.

Trying to un-confuse myself: suppose I had table A. I'd want A.sqlize_insert(row) to generate an insert statement from the data in row. Another table B attempting to call B.sqlize_insert(row) would get a compiler error due to a type mismatch.

Generics for trait? Like this trait can only work when the type is A, not B.

2024-07-20 15:07: Attempting initial insert row logic #dagzet-rust

2024-07-20 15:18: My table abstraction could be better #dagzet-rust

The table schema itself needs to be a concrete type, which it is not right now. If that happens, then I can make a row type for that table.

2024-07-20 15:21: Make table a concrete type. #dagzet-rust

2024-07-20 15:33: Working backwards. #dagzet-rust

Define the interface that I want to see in a test, then work backwards from there.

2024-07-20 15:36: Maybe what I want are phantom types? #dagzet-rust

See: <<rust/phantom_types>>.
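A sketch of the phantom type approach, assuming each table's rows implement a small `Row` trait (`Row`, `Table`, `NodeRow`, and this `sqlize_insert` signature are hypothetical). A struct can't declare a type parameter it never uses, so `PhantomData<R>` is what ties `Table<R>` to one row type; passing some other table's row to `sqlize_insert` is then a compile error.

```rust
use std::marker::PhantomData;

/// A row type knows how to render its values, in column order.
pub trait Row {
    fn values(&self) -> Vec<String>;
}

/// A table tied at compile time to one row type `R`.
/// PhantomData<R> takes no space; it only records the type parameter.
pub struct Table<R: Row> {
    pub name: String,
    pub columns: Vec<String>,
    _row: PhantomData<R>,
}

impl<R: Row> Table<R> {
    pub fn new(name: &str, columns: &[&str]) -> Self {
        Table {
            name: name.to_string(),
            columns: columns.iter().map(|&c| c.to_string()).collect(),
            _row: PhantomData,
        }
    }

    /// Only accepts rows of type R.
    pub fn sqlize_insert(&self, row: &R) -> String {
        format!(
            "INSERT INTO {}({}) VALUES({});",
            self.name,
            self.columns.join(", "),
            row.values().join(", ")
        )
    }
}

/// Hypothetical row for a nodes table.
pub struct NodeRow {
    pub name: String,
}

impl Row for NodeRow {
    fn values(&self) -> Vec<String> {
        // Quoted text value; escaping is omitted in this sketch.
        vec![format!("'{}'", self.name)]
    }
}
```

The `id` column is simply left out of `columns` here; in SQLite an `INTEGER PRIMARY KEY` column is assigned automatically when an insert omits it.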

2024-07-20 15:55: The code feels close, but Rust compiler still doesn't like it #dagzet-rust

2024-07-20 16:02: Hooked something up using phantom types #dagzet-rust

2024-07-20 16:13: Tests passed. I believe this is what I want? #dagzet-rust

2024-07-20 16:32: start working on the CLI, generate nodes table #dagzet-rust #timelog:00:28:30

Real quick stuff.

2024-07-20 17:01: Some things print. #dagzet-rust

Lots of elaboration needed, but it's a great start. Things work.

2024-07-20 18:35: Get more things to print #dagzet-rust #timelog:02:10:58

2024-07-20 18:41: IntegerPrimaryKey shouldn't be used in insert. #dagzet-rust

2024-07-20 18:56: Oh goodness. These generics are beginning to be chaotic #dagzet-rust

I'm trying to rework the nodes table code into something more re-usable.

It feels like it's just starting to slip into something out of my grasp.

2024-07-20 18:59: Ah okay, the rust syntax gets weirder too #dagzet-rust

This makes me feel better. It has an easy code smell.

Make this an interface somehow?

2024-07-20 19:03: This isn't working either. #dagzet-rust

You know what, they are just going to be functions. Refactor later.

2024-07-20 19:12: rework things to use writer instead of print! #dagzet-rust

I'm hoping to use the trick in C where the file handle can be standard out as well as a file.
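Rust's version of that trick is to write against `std::io::Write`, which `Stdout`, `File`, and `Vec<u8>` all implement. A sketch with hypothetical function names (`write_statements`, `open_output`); `Box<dyn Write>` handles choosing the destination at runtime, much like a `FILE *` that may or may not be stdout.

```rust
use std::fs::File;
use std::io::{self, Write};

/// Write generated SQL to any `Write` sink: stdout, a File, or a Vec<u8>.
pub fn write_statements<W: Write>(out: &mut W, statements: &[String]) -> io::Result<()> {
    for stmt in statements {
        writeln!(out, "{stmt}")?;
    }
    out.flush()
}

/// Pick the output at runtime: a file if a path is given, stdout otherwise.
pub fn open_output(path: Option<&str>) -> io::Result<Box<dyn Write>> {
    let out: Box<dyn Write> = match path {
        Some(p) => Box::new(File::create(p)?),
        None => Box::new(io::stdout()),
    };
    Ok(out)
}
```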

2024-07-20 19:38: writer works, but I can't get things to abstract well #dagzet-rust

2024-07-20 19:49: A useable interface, now to move stuff out of main #dagzet-rust

2024-07-20 19:54: Make clippy happy again #dagzet-rust

2024-07-20 20:02: Finally, let's get another table in there. #dagzet-rust

Let's try the connections table.

2024-07-20 20:11: Add the lines table. #dagzet-rust

2024-07-20 20:47: Lines work. This is a great stopping point #dagzet-rust

Most of the hard stuff is accomplished I think.

2024-07-21 09:44: Figuring out which commands I need to make #dagzet-rust #timelog:00:14:09

I just need enough to generate my RC dagzet and replace the lua implementation.

The results of my shell-ing:

$ cat *.dz | awk '{print $1}' | grep -v "^$" | sort | uniq -c
    127 co
     50 cr
     15 cx
     59 fr
     38 gr
    165 hl
    357 ln
    280 nn
     40 ns
     96 rm
      1 sn
     15 td
     76 tg
      4 zz

Done: co, cr, gr, ln, nn, ns.

TODO: cx, fr, hl, rm, sn, td, tg, zz

Wow that's more than I expected. And there's going to be a non-trivial amount of time troubleshooting the SQLite generated code because of course it's not going to work on the first try.

More slow, incremental testing?

No, the errors are going to come from incorrectly generated SQLite code, and SQL code validation is outside the scope of dagzet.

2024-07-21 10:00: Let's port some more commands. #dagzet-rust #timelog:01:05:39

2024-07-21 10:03: graph remarks table generation #dagzet-rust

2024-07-21 10:18: connection remarks: needs edges table #dagzet-rust

I need to generate that once, and cache it in dagzet.

2024-07-21 10:23: actually, no not really. SQLite does ID lookup #dagzet-rust

The connections can remain as strings.

2024-07-21 10:31: comments now #dagzet-rust

2024-07-21 10:37: node remarks #dagzet-rust

2024-07-21 10:51: File ranges. (fr) #dagzet-rust

Oh yeah, that's right. This has some shorthand behavior as well. '$' is used to reference the last file. This might take up the rest of my morning before I break.
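A sketch of just the `$` resolution, assuming the parser remembers the last explicit file name and that `$` before any file is an error. The rest of the `fr` argument layout isn't described here, so it is left out; `FileRanges` and `resolve_file` are hypothetical names.

```rust
/// Tracks the most recent file named in an `fr` command so that
/// `$` can stand in for it (hypothetical type).
#[derive(Default)]
pub struct FileRanges {
    last_file: Option<String>,
}

impl FileRanges {
    /// `$` means "the previous file"; anything else becomes the new previous file.
    pub fn resolve_file(&mut self, arg: &str) -> Result<String, String> {
        if arg == "$" {
            self.last_file
                .clone()
                .ok_or_else(|| "'$' used before any file was given".to_string())
        } else {
            self.last_file = Some(arg.to_string());
            Ok(arg.to_string())
        }
    }
}
```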

Some of the slowness comes from building up the test correctly. I'm doing this one right, because I've already done it wrong.

2024-07-21 11:09: Some placeholder test code for file range. #dagzet-rust

Comments in place, testing a handful of edge cases. Now I just need to implement incrementally. That's for another time.

2024-07-22 16:55: dagzet things? #dagzet-rust #timelog:00:05:00

2024-07-22 18:55: dagzet: file range work #dagzet-rust #timelog:00:28:08

2024-07-22 19:16: initial tests pass, but they are wrong #dagzet-rust

2024-07-22 19:38: Attempts to get the file range table working. #dagzet-rust

Okay, never mind.

2024-07-23 09:31: file range table generation #dagzet-rust #timelog:00:07:14

2024-07-23 09:39: Off to implement hyperlinks #dagzet-rust #timelog:00:37:18

2024-07-23 10:16: TODO task #dagzet-rust #timelog:00:13:18

2024-07-23 10:30: Tags (tg) #dagzet-rust #timelog:00:31:30

This one is slightly more interesting because of how the data is represented in SQLite vs how it's entered in dagzet. From a dagzet point of view, tags are a list of entries associated with a node. In SQLite, they boil down into pairs.

Tags should be able to be called more than once.

I think I'm only going to have time for the initial test before I break for lunch. Maybe.

2024-07-23 11:01: Test is breaking in a weird way. #dagzet-rust

I'm not sure why it's not getting the correct number of tags? Oh well, will have to look at it later.

2024-07-23 12:21: What is going on with this test #dagzet-rust #timelog:00:31:14

I was misinterpreting the boolean result of the hashset insert method. Negating that fixed stuff.
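For the record, `HashSet::insert` returns `true` when the value was newly added and `false` when it was already there. A sketch of tag storage built on that, assuming duplicate tags should be rejected; `Tags`, `add`, and `pairs` are hypothetical names. `pairs` flattens the per-node sets into the (node, tag) rows SQLite wants, sorted so the output is deterministic.

```rust
use std::collections::{HashMap, HashSet};

/// Tags as entered in dagzet: a set of tags per node (hypothetical type).
#[derive(Default)]
pub struct Tags {
    pub by_node: HashMap<String, HashSet<String>>,
}

impl Tags {
    /// Add tags to a node; may be called more than once per node.
    /// `insert` returning false means the tag was already present.
    pub fn add(&mut self, node: &str, tags: &[&str]) -> Result<(), String> {
        let set = self.by_node.entry(node.to_string()).or_default();
        for &tag in tags {
            if !set.insert(tag.to_string()) {
                return Err(format!("duplicate tag '{tag}' on {node}"));
            }
        }
        Ok(())
    }

    /// Flatten into sorted (node, tag) pairs for the SQLite table.
    pub fn pairs(&self) -> Vec<(String, String)> {
        let mut out: Vec<(String, String)> = self
            .by_node
            .iter()
            .flat_map(|(node, tags)| tags.iter().map(move |t| (node.clone(), t.clone())))
            .collect();
        out.sort();
        out
    }
}
```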

tags finished.

2024-07-23 12:53: Attempts to move test to another place #dagzet-rust #timelog:00:08:11

I think for the most part these could be thought of as integration tests.

Never mind. I just won't fight it right now.

2024-07-23 13:03: Select node command is last on my list I think #dagzet-rust #timelog:00:12:18

2024-07-23 13:18: I think it's all implemented? Now for an initial replacement test #dagzet-rust #timelog:00:05:23

2024-07-23 13:21: Wow, I think it might have worked on the first go. #dagzet-rust

Installed with =cargo install --path .=. Replaced my lua program with my rust program. Looked around at the generated output. It seems to be right so far? Publishing to see what will happen.

Yeah, it says it is fine. so that's pretty cool.

2024-07-23 13:23: Adding a README #dagzet-rust #timelog:00:19:13

2024-07-23 13:43: I forgot about cx. #dagzet-rust

I've done enough for one session. It's almost there.

2024-07-23 13:45: I need to see it crash. #dagzet-rust #timelog:00:16:18

It should have crashed. Make it crash, then I'm done for now.

oooh. the "dagzet" tool and "gendb.sh" are different. I gotta replace it there.

2024-07-23 13:51: Okay it crashed because I forgot to escape. #dagzet-rust
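The log doesn't say what went unescaped; assuming it was single quotes inside text values, the SQL rule is to double them inside a single-quoted literal. A minimal sketch (`sql_quote` is a hypothetical name):

```rust
/// Wrap a string as a single-quoted SQLite literal (hypothetical helper).
/// SQL escapes a single quote by doubling it: it's -> 'it''s'.
pub fn sql_quote(s: &str) -> String {
    format!("'{}'", s.replace('\'', "''"))
}
```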

2024-07-23 14:01: Lots more errors. This makes more sense now. Halting for now. #dagzet-rust

I need better error reporting. It seems like this version of dagzet has stricter parsing (you can't have duplicate tags, for example, whereas my old implementation could).

2024-07-24 10:27: More dagzet integration work #dagzet-rust #timelog:00:42:14

2024-07-24 10:34: I don't think I finished the topsort work #dagzet-rust

2024-07-24 11:09: topsort bug fixed. I just didn't finish writing #dagzet-rust

The test I had for it was too small.

2024-07-24 11:10: nice! running into invalid command error, which is what we want. #dagzet-rust

2024-07-24 11:11: Working on cx command. #dagzet-rust

"cx" is an external connection: it allows you to make connections to nodes that are external.

A partial implementation of cx is all that is needed for now. No aliases, no shorthands. It's all just full paths.

Any nodes used by cx get stored in a list (set?). When unknown nodes are checked, if it can't find the node in the locally created nodes, it'll check the nodes here.

Connections will work the same way since it utilizes a SQL command. The burden is on the dagzetter to ensure that nodes get created before they are connected with cx. This is different from co, which follows a more declarative style.
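A sketch of that fallback, assuming local nodes and cx nodes are kept in two `HashSet`s of full path names (`unknown_nodes` and its parameters are hypothetical). A name only counts as unknown if neither set has it.

```rust
use std::collections::HashSet;

/// Unknown-node check that also accepts nodes registered by `cx`.
/// `local`: nodes created in this file. `external`: full paths from cx,
/// which are assumed to already exist in the database.
pub fn unknown_nodes<'a>(
    local: &HashSet<String>,
    external: &HashSet<String>,
    referenced: &[&'a str],
) -> HashSet<&'a str> {
    referenced
        .iter()
        .copied()
        .filter(|name| !local.contains(*name) && !external.contains(*name))
        .collect()
}
```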

2024-07-24 11:38: set up tests for initial cx behavior #dagzet-rust

2024-07-24 11:50: now the implementation #dagzet-rust #timelog:01:40:38

2024-07-24 12:09: now to try and hook things up again #dagzet-rust

2024-07-24 12:12: In generate edges, I need to ignore external connections #dagzet-rust

At this point, any unknown nodes should have been checked for.

Is there ever a situation where an edge would have an unknown node missed by the unknown node error check?

2024-07-24 12:22: I think it works. One quick look around. #dagzet-rust

2024-07-24 12:24: Nope. Full page generation is causing a crash in genpage.lua #dagzet-rust

It's a sqlite error. I need to figure out what the error is.

I need to use db:errmsg

I didn't configure the column name correctly.

2024-07-24 12:34: More sqlite errors. I think I'm forgetting to name the columns #dagzet-rust

2024-07-24 12:36: The rest of the tables need to be implemented #dagzet-rust

2024-07-24 12:54: Running into a really odd stack overflow error #dagzet-rust

I'm going to have to call it quits after this.

2024-07-24 13:00: We are getting loops in the output. #dagzet-rust

It is random too, probably due to randomness in lua tables.

Something is going wrong in the SQLite table generation. When I place debug prints in the shortname generator, I get a ton of stuff.

2024-07-24 16:46: debugging #dagzet-rust #timelog:00:45:48

My goal is to take a look at the generated table outputs of both.

2024-07-24 17:00: number of nodes are consistent, name order is not #dagzet-rust

This is probably due to the randomized hashmap structure in Rust.

But, the node IDs are the same, monotonically increasing unique values.

2024-07-24 17:02: lines output is identical as well #dagzet-rust

2024-07-24 17:14: something is wrong with the connections #dagzet-rust

There seem to be duplicates. I generated and sorted the connections alphabetically and found these.

The rust version also has inconsistent connections.

2024-07-24 17:18: connections: 127 rust dagzet vs 142 lua dagzet #dagzet-rust

It is consistently this. I wonder if "cx" is partially to blame. disabling all the cxs now.

2024-07-24 17:21: connections are the same without cx. #dagzet-rust

There are 15 cx connections made, and that accounts for the missing connections.

2024-07-24 17:22: I wonder if it's the edge generation somehow #dagzet-rust

2024-07-24 17:25: I think I should be doing a name lookup, not using the ID values directly #dagzet-rust

2024-07-24 17:30: Okay! that fixed the connections issues. #dagzet-rust

Now, let's see if the error goes away.

2024-07-24 17:32: FINALLY. IT WORKS. #dagzet-rust

I think I really mean it this time. It generates the data without crashing and everything. Peeking around, it seems to produce the data just fine.

2024-07-24 18:21: I have taken a break to broadcast my success. Heading out now for dinner. #dagzet-rust