dpbriggs blog

Adding FFI Support in x7

FFI, or Foreign Function Interface, is a mechanism whereby different programming languages can interact with each other in a direct way. x7 is a lisp I made to explore language design and interpreters. Here's the hello world for it:

(println "Hello World!")

I want other rust programs to be able to conveniently embed an x7 interpreter, pass data back and forth, and have x7 call foreign functions. Here's a TLDR example:

// Some necessary imports
use x7::ffi::{ForeignData, IntoX7Function, X7Interpreter};

// A foreign function we want to add to x7.
// Note: It's typed in u32, which x7 does _not_ understand.
fn my_foreign_function(a: u32, b: u32) -> u32 {
    a + b
}

fn main() {
    // Make an interpreter instance.
    let interpreter = X7Interpreter::new();
    // Add my function to the interpreter
    interpreter.add_function("my-ff", my_foreign_function.to_x7_fn());
    // Run an x7 program using my function
    let res: u32 = interpreter.run_program("(my-ff 1 2)").unwrap();
    // Verify the results
    assert_eq!(res, 3);
}

This blog post will detail the inner workings of making FFI convenient in x7.

An Overview of x7's internal structure

To better understand how x7's FFI system works, we'll need to quickly cover how x7 represents types internally. All types, from numbers to lists to functions, are instances of the Expr enum:

pub enum Expr {
    Num(Num),
    String(String),
    Symbol(Symbol),
    List(Vector<Expr>),
    Function(Function),
    Nil,
    // ... some elided variants
}

Fundamental to the FFI system is being able to convert foreign data types like u64 to a sensible Expr variant like Expr::Num. x7 only understands Expr, and it's impossible to use any other type. Throughout this article I will refer to non-Expr types as foreign.

The next fundamental concern is converting foreign functions to x7's Function struct. The struct contains a few useful things, like the function pointer that actually does the work, and some auxiliary information like the minimum number of arguments and whether or not to evaluate arguments.
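We'll see Function::new used later in this article; for now, here's a rough sketch of what the struct plausibly holds (the field names are illustrative, not x7's exact definition, and X7FunctionPtr is the function-pointer type we'll meet later):

// Illustrative sketch only - the real definition lives in x7's symbols module
pub struct Function {
    symbol: String,        // the function's name, e.g. "my-ff"
    minimum_args: usize,   // fewest arguments the function accepts
    f: X7FunctionPtr,      // the pointer that does the actual work
    eval_args: bool,       // whether to evaluate arguments before calling
}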

In subsequent sections of the article we'll cover the rust traits responsible for these conversions.

Motivating Example

My original motivation was embedding x7 as a scripting language in the database redis-oxide.

In particular, I wanted this to be possible:

;; basic x7 program, running inside of redis-oxide
(+ "redis-oxide returned: "
   (redis "get" "mykey"))

That redis function is foreign, so as alluded to in the FFI example above, it operates on its own foreign types. This implies that x7's FFI system will need to transparently and automatically convert between foreign types and its own native type Expr.

An implementation detail of the redis-oxide database is that it operates internally on the RedisValueRef type. So concretely, we need a way to convert "get" and "mykey" into a RedisValueRef. On the other end, redis-oxide returns the same type after an operation, so we need to convert it into something x7 understands (Expr) before x7 can operate on it.

What should x7's FFI interface support?

At a minimum we need to support:

  1. A nice way to transparently operate on foreign data types.
  2. Foreign functions being typed in their native types, not x7's Expr.
    1. If you've worked with other FFI systems, like Python's excellent FFI system, functions added to the interpreter are typed in Python's types, not the types of your program. This is fine, but not convenient, as you need to manually convert Python's types to your types and back.
  3. A nice way to construct and add these functions to the interpreter.

Reasoning about Foreign Data

A fundamental concern of an FFI system is to reason about data passed between the foreign program and x7. In x7's FFI system, this behaviour is captured in the ForeignData trait:

pub trait ForeignData
where
    Self: Sized,
{
    /// Convert from Self to x7's Expr type.
    fn to_x7(&self) -> Result<Expr, Box<dyn Error + Send>>;
    /// Convert x7's Expr type to Self.
    fn from_x7(expr: &Expr) -> Result<Self, Box<dyn Error + Send>>;
}

This trait encodes a fallible forward-and-backward mapping between some foreign type and x7's Expr type. If you're not familiar with rust, this adds methods to a type, and each operation returns a Result, which encodes a success path and a failure path of type Box<dyn Error + Send>.

If the conversion fails for any reason, we try our best to type erase that error in terms of Box<dyn Error + Send>. This error trait object helps a lot with wrangling different error types into just one that we can reason about. That Error trait is the std::error::Error trait, which helps interoperability between different rust programs. The Send trait bound is required as our motivating example is redis-oxide, which requires that errors be sent between threads. This creates some unfortunate boilerplate around handling errors, but thankfully it's only around twenty lines.

Example implementation of ForeignData

The last few paragraphs sound fancy, but we're really doing something simple here: converting between two different types. We'll have some foreign data type called MyData, defined as a rust enum:

#[derive(Debug)]
enum MyData {
    Int(u32),
    String(String),
}

We'll also need a way to express errors as a Box<dyn Error + Send>, so we'll add a basic MyError type with some boilerplate:

// A struct to hold the error string
#[derive(Debug)]
struct MyError(String);

// Display is required for std::error::Error
impl std::fmt::Display for MyError {
    fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
        write!(f, "Error: {}", self.0)
    }
}

// Actually implement std::Error::Error for MyError
impl std::error::Error for MyError {}

impl MyError {
    // An easy way to construct this error
    fn boxed(err: String) -> Box<dyn std::error::Error + Send> {
        Box::new(MyError(err))
    }
}

This will help us return errors when someone, for example, tries to convert an x7 function into a MyData.

Now that we have the boilerplate out of the way, we can implement ForeignData for MyData!

// Necessary imports
use x7::ffi::{ExprHelper, ForeignData};
use x7::symbols::Expr;

// ForeignData implementation for MyData
impl ForeignData for MyData {
    fn to_x7(&self) -> Result<Expr, Box<dyn std::error::Error + Send>> {
        // The forward direction is actually infallible
        let res = match self {
            MyData::Int(i) => Expr::Num((*i).into()),
            MyData::String(s) => Expr::String(s.clone()),
        };
        Ok(res)
    }

    fn from_x7(expr: &Expr) -> Result<Self, Box<dyn std::error::Error + Send>> {
        // The backward direction _is_ fallible.
        // e.g. There's no way to express x7 functions in terms of MyData
        let res = match expr {
            Expr::Num(i) => MyData::Int(i.to_u64()?), // to_u64 is an ExprHelper method
            Expr::String(s) => MyData::String(s.clone()),
            unknown_type => {
                let err_msg = format!("{} cannot be converted to MyData", unknown_type);
                return Err(MyError::boxed(err_msg));
            }
        };
        Ok(res)
    }
}

Relatively speaking that's a lot of code for a simple conversion, but we can leverage the ForeignData trait to make some cool things for our FFI.
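To see the implementation in action, here's a quick round trip through x7's type (hypothetical test code building on the impl above):

// Round trip: MyData -> Expr -> MyData
let data = MyData::Int(42);
let expr = data.to_x7().unwrap();           // Expr::Num(42)
let back = MyData::from_x7(&expr).unwrap(); // MyData::Int(42)
println!("{:?}", back);                     // prints: Int(42)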

Implementing ForeignData for common types

To maximize convenience and skirt around rust's orphan rule, we'll provide ForeignData implementations for common types like String and u64. There are a lot of these implementations in the ffi.rs file in x7.

impl ForeignData for u64 {
    fn to_x7(&self) -> Result<Expr, Box<dyn Error + Send>> {
        Ok(Expr::Num((*self).into()))
    }

    fn from_x7(expr: &Expr) -> Result<Self, Box<dyn Error + Send>> {
        expr.to_u64()
    }
}
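The String implementation follows the same idea. Here's a sketch (illustrative; the real version in ffi.rs may differ in how it extracts the string, and the error path reuses the MyError helper from earlier for demonstration):

impl ForeignData for String {
    fn to_x7(&self) -> Result<Expr, Box<dyn Error + Send>> {
        Ok(Expr::String(self.clone()))
    }

    fn from_x7(expr: &Expr) -> Result<Self, Box<dyn Error + Send>> {
        match expr {
            Expr::String(s) => Ok(s.clone()),
            // The real impl may be more permissive, e.g. stringifying numbers
            other => Err(MyError::boxed(format!("{} is not a string", other))),
        }
    }
}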

This will allow users to make functions using common rust types, and not worry about the details.

Foreign Functions: The IntoX7Function Trait

Now that we can reason about foreign data, we want a way to convert functions written in terms of foreign data types into something x7 can use. We'll use a technique known as extension traits to add a .to_x7_fn() method to functions. We'll ignore variadic (any number of arguments) functions for now, and only focus on functions with a known number of arguments.

The basic ingredients for an x7 function are:

  1. A X7FunctionPtr, the pointer to the x7 function, which has some serious restrictions.
    1. The function has the shape Fn(Vector<Expr>, &SymbolTable) -> LispResult<Expr>.
    2. We require Sync + Send bounds to ensure thread safety (and make it compile).
  2. A minimum number of arguments.
  3. A symbol to represent the function (my-ff, etc).

We can use some trait magic to automate the first two, by having a trait that takes self and returns the minimum number of args and the X7FunctionPtr:

pub trait IntoX7Function<Args, Out> {
    fn to_x7_fn(self) -> (usize, crate::symbols::X7FunctionPtr);
}

An interesting part of this trait is the <Args, Out> types. We need those types to help differentiate our implementations of IntoX7Function, and support variadics later on. The general implementation concept for a function that takes n arguments is:

  1. Make an x7 function: Fn(args: Vector<Expr>, _sym: &SymbolTable) -> LispResult<Expr>
  2. Verify that exactly n arguments are passed (args.len()==n).
  3. Convert each x7 Expr argument to the type we need with ForeignData::from_x7().
    1. Return a nice error message if that fails.
  4. Call the foreign function with those now-converted arguments.
    1. Then capture and convert the function output with Out::to_x7() - x7 needs an Expr!
    2. Massage any errors into the error type x7 expects (anyhow::Error)

With that in mind, here's what the two argument implementation looks like without comments:

impl<F, A, B, Out> IntoX7Function<(A, B), Out> for F
where
    A: ForeignData,
    B: ForeignData,
    Out: ForeignData,
    F: Fn(A, B) -> Out + Sync + Send + 'static,
{
    fn to_x7_fn(self) -> (usize, crate::symbols::X7FunctionPtr) {
        let f = move |args: Vector<Expr>, _sym: &SymbolTable| {
            crate::exact_len!(args, 2);
            let a = convert_arg!(A, &args[0]);
            let b = convert_arg!(B, &args[1]);
            (self)(a, b).to_x7().map_err(|e| anyhow!("{:?}", e))
        };
        (2, Arc::new(f))
    }
}

And here's what it looks like with comments:

// We're working with a function with two arguments,
// that returns a single output type Out.
impl<F, A, B, Out> IntoX7Function<(A, B), Out> for F
where
    // All inputs and outputs to this function require ForeignData
    A: ForeignData,
    B: ForeignData,
    Out: ForeignData,
    // X7FunctionPtr requires Sync + Send, so we add that restriction to F
    F: Fn(A, B) -> Out + Sync + Send + 'static,
{
    fn to_x7_fn(self) -> (usize, crate::symbols::X7FunctionPtr) {
        // This closure conforms to the shape X7FunctionPtr requires,
        // namely a function that takes a Vector<Expr> and a symbol table reference.
        //
        // args: (<ff-symbol> 1 2) -> vector![Expr::Num(1), Expr::Num(2)]; actual args passed
        // _sym: A symbol table reference. Unused.
        let f = move |args: Vector<Expr>, _sym: &SymbolTable| {
            // exact_len: macro to throw an error if args.len() != 2.
            crate::exact_len!(args, 2);
            // convert_arg: macro that calls A::from_x7(&args[0]) and return a nice error if that fails
            let a = convert_arg!(A, &args[0]);
            let b = convert_arg!(B, &args[1]);
            (self)(a, b) // (self)(a,b) calls the foreign function with args a, b
                .to_x7() // convert the output to an x7 Expr
                .map_err(|e| anyhow!("{:?}", e)) // massage error type to x7's error type (anyhow)
        };
        // Finally, return a tuple of minimum args + our function
        (2, Arc::new(f))
    }
}

x7's FFI system contains similar implementations for functions that take one to five arguments. The appendix will cover variadic functions. All of this gives us the power to do:

let my_ff = |a: u64, b: u64| a + b; 
let my_ff_x7 = my_ff.to_x7_fn();

Which is great, but now we need a way to make an x7 interpreter instance and actually add that my_ff function to it.

The X7Interpreter

This part is thankfully much simpler than previous sections. We only need a struct holding a SymbolTable and a way to run programs on it.

#[derive(Clone)]
pub struct X7Interpreter {
    symbol_table: SymbolTable,
}

And a way to make a new one:

impl X7Interpreter {
    /// Make a new interpreter instance.
    pub fn new() -> Self {
        X7Interpreter {
            symbol_table: crate::stdlib::create_stdlib_symbol_table_no_cli(),
        }
    }
   // elided...
}

Finally, we need a way to run programs. We'll just take the program string as a &str, and call the usual lisp functions read and eval:

impl X7Interpreter {
    /// Run an x7 program.
    pub fn run_program<T: 'static + ForeignData>(
        &self,
        program: &str,
    ) -> Result<T, Box<dyn Error + Send>> {
        let mut last_expr = Expr::Nil;
        for expr in read(program) {
            last_expr = expr
                .and_then(|expr| expr.eval(&self.symbol_table))
                .map_err(ErrorBridge::new)?;
        }
        T::from_x7(&last_expr)
    }
}

This function parses the input (read), and then for every expression in the program calls eval, returning errors as necessary. We default last_expr to Expr::Nil so there's always something to return, and hand back the last expression evaluated. The user needs to specify the output type T, so we know what to convert that final Expr into. It's now possible to do:

fn main() {
    let interpreter = X7Interpreter::new();
    let output = interpreter.run_program::<u64>("(+ 1 1)").unwrap();
    println!("output: {}", output);
    // prints "output: 2"
}

Finally, we need a way to add functions to the interpreter. This is also straightforward thanks to IntoX7Function:

impl X7Interpreter {
    /// Add a function to the interpreter under the symbol `function_symbol`
    pub fn add_function(
        &self,
        function_symbol: &'static str,
        fn_tuple: (usize, crate::symbols::X7FunctionPtr),
    ) {
        let (minimum_args, fn_ptr) = fn_tuple;
        // The `true` at the end tells x7 to evaluate arguments passed in.
        let f = Function::new(function_symbol.into(), minimum_args, fn_ptr, true);
        self.symbol_table
            .add_symbol(function_symbol, Expr::Function(f));
    }
}

All we're doing is making a new Function struct instance, and adding the function to the interpreter under the symbol function_symbol. Here's an example:

fn main() {
    let interpreter = X7Interpreter::new();

    // Make and add the function to x7 under the symbol my-ff
    let my_ff = |a: u64, b: u64| a + b;
    interpreter.add_function("my-ff", my_ff.to_x7_fn());

    let output = interpreter.run_program::<u64>("(my-ff 1 1)").unwrap();
    println!("output: {}", output); // prints "output: 2"
}

Interestingly, we can also see our careful handling of errors has paid off. The following x7 program:

(my-ff 1 "i am a string")

Fails with this error message, as my-ff expects a u64 not a string:

Error in Fn<my-ff, 2, [ ]>, with args (1 "i am a string")

Caused by:
    Error: Expected num, but got type 'str': "i am a string"

    Caused by:
        BadTypes

Conclusion

The x7 FFI system was very interesting for me to figure out, and this is what came out. I hope you enjoyed the article - it's certainly more code and concept heavy than previous articles.

Overall I think there's some room for improvement in the FFI system, but it's powerful and convenient enough to be embedded into redis-oxide, so I am content for now.

Appendix: Variadics

Thanks to the signature of IntoX7Function, we can add a type Variadic and implement IntoX7Function in terms of it.

pub struct Variadic<T>(Vec<T>);

impl<T> Variadic<T> {
    pub fn to_vec(self) -> Vec<T> {
        self.0
    }
}

Note: The underlying Vec<T> forces us to use a single type for every variadic argument. This isn't necessarily a big deal, but it is something to keep in mind.

We now implement IntoX7Function in terms of Variadic:

impl<F, T, Out> IntoX7Function<(Variadic<T>,), Out> for F
where
    T: ForeignData,
    Out: ForeignData,
    F: Fn(Variadic<T>) -> Out + Sync + Send + 'static,
{
    fn to_x7_fn(self) -> (usize, crate::symbols::X7FunctionPtr) {
        let f = move |args: Vector<Expr>, _sym: &SymbolTable| {
            let args = match args.iter().map(T::from_x7).collect::<Result<Vec<_>, _>>() {
                Ok(v) => v,
                Err(e) => return Err(anyhow!("{:?}", e)),
            };
            (self)(Variadic(args))
                .to_x7()
                .map_err(|e| anyhow!("{:?}", e))
        };
        (1, Arc::new(f))
    }
}

For interest, this is what using Variadic in redis-oxide looks like:

let send_fn = move |args: Variadic<RedisValueRef>| {
    let args = args.to_vec();
    // ... use args as a vec ...
    // ... implementation details coming in future article ..
};
self.interpreter.add_function("redis", send_fn.to_x7_fn());

This variadic FFI interface allows us to provide a fully featured redis-oxide API in about 20 lines of code, as commands are converted directly into the internal representation for command processing:

;; No special API code! Directly converted and passed to the command processor
(redis "mset" "key1" "value1" "key2" "value2")
(redis "lpush" "list-key" "value1" "value2")

Implementing Records in x7

x7 is a lisp I made to explore language design and interpreters. Here's the hello world for it:

(println "Hello World!")

It's got all the functional niceness that lisps provide:

;; Define a function which squares the input
(defn square (x)
  (* x x))

;; predicate for x mod 4 == 1
(defn is-one-mod-4 (x)
  (= 1 (% x 4)))

;; Filter and map the first two hundred numbers
(filter is-one-mod-4
        (map square (range 200)))

;; Outputs: (1 9 25 49 81 121 169 225 ...)

x7 has all the nice immutable, stateless fun that functional languages provide. What it can't do is represent types which are internally stateful, like files or sockets. This article will explain how we can add state to the language in the form of Records.

The motivation for this is that the OS expects you to maintain some state if you want to operate on files (or really any IO). There are tricks to avoid internally mutable state, but it's way more fun to add it to the language.

Records, or objects, allow us to encapsulate state in one place that we can call methods on. We'll work towards opening, writing, and reading from a file in x7. This is what it looks like:

;; Open a file
(def my-file (fs::open "my_file.txt"))

;; .write is a method of the record File,
;; which will write strings to the file
(.write my-file "Hello World")

;; Similarly, we can use the .read_to_string
;; method to read the file contents
(.read_to_string my-file)

Before we can implement Records, we'll need a better idea of x7's interpreter internals.

x7 Interpreter Internals

To run x7 programs, we need to parse the incoming strings into a form we can actually evaluate. Central to this is the type Expr, which represents all possible types in the interpreter:

pub(crate) enum Expr {
    Num(Num),
    Symbol(Symbol),
    List(Vector<Expr>),
    Function(Function),
    // elided variants...
}

The shown members of that enum are the minimum required variants to evaluate simple expressions like (+ 1 1). If you're not familiar with lisps, this is the same thing as 1 + 1.

Parsing "(+ 1 1)"

The string "(+ 1 1)" needs to get transformed into a type we can operate on. The details of the parser aren't relevant to this article. All we really need to know is that (+ 1 1) gets transformed into the following Expr type:

Expr::List(
    Expr::Symbol("+"),
    Expr::Num(1),
    Expr::Num(1),
)

This form is very convenient as we can recursively evaluate it to facilitate computation.

Evaluating (+ 1 1)

Crucial to lisps is the expression, or the idea that types just exist and propagate through computation. Evaluation, simply stated, is the process of taking an expression and returning another expression. For example, typing 3.3 in the x7 interpreter returns 3.3, another expression. In contrast, (+ 1 1) is considerably more complex, as it represents a function call to add two numbers. We'll need to work our way there.

To build our way to evaluating (+ 1 1), we'll need to discuss core concepts like Expressions, Symbols, and Lists.

Evaluating Expressions

As mentioned earlier, evaluation is fundamental to the function of x7. More importantly, different types evaluate differently!

Here's a quick preview of what different types evaluate to:

Table 1: Eval Behaviour for different types

Type       Eval Behaviour
Num        Itself - another num.
Function   Itself - the same function.
Symbol     A lookup to find whatever the symbol represents (function, etc).
Lists      A function call, where the first member is interpreted as a function.

If we start the x7 interpreter, we can see this in action:

>>> 3.3                ;; Num
3.3
>>> (fn (x) (x))       ;; Function
Fn<AnonFn, 1, [ x ]>
>>> +                  ;; Symbol
Fn<+, 1, [ ]>
>>> (+ 1 1)            ;; List
2

With this behaviour in mind, we'll need to better understand how Symbols work in the interpreter.

Evaluating Symbols

Symbols act as references to something else in the interpreter - things like constants or functions. x7 uses the SymbolTable type, which provides a lookup(&self, key: &Expr) method to map symbols to expressions in the interpreter.

The implementation of eval for Symbol looks like:

// self is an Expr
if self.is_symbol() {
    return symbol_table.lookup(&self);
}

Part of the x7 initialization process is to populate the symbol table with the standard library - either from rust with the make_stdlib_fns! macro or the x7 files in stdlib/. If we disable symbol population you'll see that x7 runs just fine, but isn't very useful:

;; x7 without a stdlib isn't very useful
>>> +
Error: Unknown Symbol +

Evaluating Lists

The next area to understand is the interaction between Lists and Symbols. A list evaluation in x7 (and lisps) is a function call, with the following convention:

(<fn-expr> <arg1> <arg2> ...)

The goal then is for the x7 interpreter to evaluate <fn-expr> to a function, and then call the function with the args. The vast majority of the time <fn-expr> will be a symbol, like + or -, so it'll be a symbol lookup. The process we want is then:

  1. Evaluate <fn-expr>, and hope it returns a Function
  2. Call the Function with the args provided.

In rust this looks like:

if let Ok(mut list) = self.get_list() {
    // Empty lists are _not_ function calls
    if list.is_empty() {
        return Ok(self.clone());
    }

    let head = list.pop_front().unwrap();
    let tail = list;

    return head.eval(&symbol_table)?.call_fn(tail, symbol_table);
}

The last line is the operative one - we evaluate the first item (head), and then use the call_fn method to call the function.

If you're not super familiar with rust, the last line can be understood as:

  1. Evaluate head, and early return if we get an error.
    1. The most common error is simply the symbol not resolving. More exotic errors could be something like head is itself a function call that failed.
    2. To elaborate further, head can evaluate to anything. While this case is intended to map symbols to functions, anything can happen here. If we didn't evaluate head, Symbol("+") would never become the function Fn<+, 1, [ ]>.
      1. For the curious, ((if (random_bool) + -) 10 5) is a valid x7 program. It randomly returns 5 or 15.
  2. Call the method call_fn.
    1. If head doesn't evaluate to a function, return the error NotAFunction
    2. If it's a function, call it with args tail

Notably, we don't evaluate tail. To allow conditional constructs like if or cond to not evaluate branches not taken, we need a way to opt out of evaluation! This is implemented as a flag on the Function struct which can be controlled by the rust portion of the standard library.
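As a sketch of how that flag enables special forms, here's roughly how an if could be written as a rust function that receives its arguments unevaluated (hypothetical code; is_truthy stands in for whatever truthiness check x7 actually uses):

// Hypothetical: registered with eval-args = false, so `exprs` arrive unevaluated
fn lisp_if(exprs: Vector<Expr>, symbol_table: &SymbolTable) -> LispResult<Expr> {
    exact_len!(exprs, 3);
    // Only the condition is evaluated up front...
    let cond = exprs[0].eval(symbol_table)?;
    if cond.is_truthy() {
        exprs[1].eval(symbol_table) // ...then just the branch taken
    } else {
        exprs[2].eval(symbol_table)
    }
}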

Now that we have an overview of the x7 interpreter internals, we can actually add records to the language!

The Record Trait

To represent types which are internally stateful, we'll add a trait called Record to the language. It needs to express the following behaviours:

  1. Call methods with arguments.
  2. Represent the type in a display and debug way.
  3. Represent the type name as a string, and list its methods.
  4. Uniquely identify the type (hash).
  5. Clone the type safely.

Aside from calling methods, most of these items are related to making sure error messages are nice, or slotting it in cleanly with the rest of the interpreter machinery.

Here's the trait in rust:

/// Fundamental trait for records.
///
/// Records allow x7 to represent a variety of internally mutable types
/// while not expanding the Expr enum too much. These types are responsible for
/// implementing RecordDoc if they want to have documentation.
pub(crate) trait Record: Sync + Send {
    /// Call a method on this record.
    /// (.method_name <rec> arg1 arg2 arg3)
    /// Becomes:
    /// (&self: <rec>, sym: "method_name", args: vector![arg1, arg2, arg3])
    fn call_method(&self, sym: &str, args: Vector<Expr>) -> LispResult<Expr>;

    /// Uniquely identify this record
    fn id(&self) -> u64 {
        0
    }
    /// Nicely display the record type.
    fn display(&self) -> String;

    /// Add more information for debug printing
    fn debug(&self) -> String;

    /// Clone the object.
    fn clone(&self) -> RecordType;

    /// Return the names of the methods for help messages.
    fn methods(&self) -> Vec<&'static str>;

    /// Return the type name for nice help messages
    fn type_name(&self) -> &'static str;
}

Now that we have a trait, we need a fundamental type we can export and use throughout x7. Since we want to use trait objects, a Box is a natural choice:

pub(crate) type RecordType = Box<dyn Record>;

I'll elide the implementation details to get Record implemented for RecordType, but if you're curious, they can be found here.

Wiring RecordType into x7

To integrate RecordType into the language, we'll need to add it to aforementioned Expr enum. Here's what it looked like before we add Record:

#[derive(Clone, Hash)]
pub(crate) enum Expr {
    Num(Num),
    Symbol(Symbol),
    String(String),
    // truncated...
}

And after:

#[derive(Clone, Hash)]
pub(crate) enum Expr {
    Num(Num),
    Symbol(Symbol),
    String(String),
    Record(crate::records::RecordType),
    // truncated...
}

It's just that easy. We can use compiler errors to figure out what we're missing.

The compiler errors tell us that we're missing Hash, PartialEq, and Clone on RecordType:

/// We can stick whatever we want into hashmaps, so we need Hash implemented
impl Hash for RecordType {
    fn hash<H: Hasher>(&self, state: &mut H) {
        self.id().hash(state);
    }
}

/// We also need to equality check RecordTypes
/// As their internal state may differ, always return false.
/// This could be improved.
impl PartialEq for RecordType {
    fn eq(&self, _other: &RecordType) -> bool {
        false
    }
}

/// x7 clones types all over the place
impl Clone for RecordType {
    fn clone(&self) -> RecordType {
        Record::clone(self)
    }
}

Great! We now have the requisite traits implemented for RecordType, and aside from a few changes like my custom Display implementation, we're good to go.

The last thing we'll want is a way to grab a RecordType out of the enum when we want it:

impl Expr {
  // ... elided...
  pub(crate) fn get_record(&self) -> LispResult<RecordType> {
      if let Expr::Record(r) = self {
          Ok(r.clone())
      } else {
          bad_types!("record", &self)
      }
  }
}

This will let us grab a record type in our standard library and have nice error messages if we don't.

The next thing we'll want to do is add a standard library function to call methods!

Adding call_method to the standard library

Now that RecordType is embedded in the interpreter's machinery, we can actually use it! We will want a way to explicitly call methods in the standard library, stdlib::call_method.

We won't have the .method-call syntactic sugar yet, so a free standing x7 function will have to do.

Recall the signature of Record::call_method:

fn call_method(&self, sym: &str, args: Vector<Expr>) -> LispResult<Expr>;

We'll want a standard library method that takes:

  1. A record
  2. A method on that record
  3. Some args for that method.

Thankfully this is pretty straightforward. All functions in x7 must have the following type:

pub(crate) type X7FunctionPtr =
    Arc<dyn Fn(Vector<Expr>, &SymbolTable) -> LispResult<Expr> + Sync + Send>;

In rust, this looks like:

fn call_method(exprs: Vector<Expr>, _symbol_table: &SymbolTable) -> LispResult<Expr> {
   // TODO: Implement the function
}

The next issue to tackle is the argument layout for stdlib::call_method.

We're only given a list of arguments, so we'll need to define a calling convention:

(call_method <record> <method-name> <args>...)

So we'll expect a record as the first member, then the method name, and finally the args. For example, here's how we'd expect writing to a file to look:

(call_method my-file "write" "hello world")

Thanks to the Expr::get_record function we made earlier, we can easily implement stdlib::call_method:

fn call_method(exprs: Vector<Expr>, _symbol_table: &SymbolTable) -> LispResult<Expr> {
    // First list member is a record.
    let rec = exprs[0].get_record()?;

    // Second list member is the method string.
    let method = &exprs[1].get_string()?;

    // Collect the args in the list.
    let args = exprs.clone().slice(2..); // .clone() is O(1) and .slice needs a &mut

    // Finally, call the method on the record with args
    use crate::records::Record;
    rec.call_method(method, args)
}

Now that we have the function, we'll need to make it accessible from the interpreter. x7 uses a macro called make_stdlib_fns to expose rust functions to the interpreter, so we just need to slot it in:

make_stdlib_fns!{
  // elided functions...
  ("call_method", 2, call_method, true, "<doc-string>"),
}

This can be read as:

  1. Expose stdlib::call_method to the interpreter with the symbol call_method, and
  2. It expects at least two arguments, and
  3. Ask the interpreter to evaluate the arguments (true), and finally
  4. Have the docstring "<doc-string>".

We can start the interpreter and ask it to evaluate the symbol call_method:

>>> call_method
Fn<call_method, 2, [ ]>

Nice! We can't really do much with it as we haven't actually implemented Record on any types yet, so let's do that!

Implementing the File Record

The original motivation for adding Record to x7 is the ability to open, read, and write to files. We'll back the x7 File implementation with rust's File struct, so let's make a new file in x7 - records/file.rs:

We will start by making a FileRecord struct:

#[derive(Clone, Debug)]
pub(crate) struct FileRecord {
    path: String,
    // The Record trait requires Sync + Send
    file: Arc<Mutex<std::fs::File>>,
}

The type Arc<Mutex<std::fs::File>> is necessary as x7 requires all types to be thread safe.

Now that we have a struct, let's expose a way to generate one from x7. We want the following x7 expression to work:

(fs::open "file-name")

This will map to a Expr::String("file-name") in the interpreter, so we need two methods:

  1. A way to open files given a String
  2. A way to open files given an Expr::String

With that in mind, here's the two relevant methods:

impl FileRecord {
    /// Open a file with the given Path
    pub(crate) fn open_file(path: String) -> LispResult<Expr> {
        // Open the file with liberal permissions.
        let f = OpenOptions::new()
            .write(true)
            .create(true)
            .read(true)
            .open(path.clone())
            .map_err(|e| anyhow!("Could not open file \"{}\" because {}", &path, e))?;
        // Make the path pretty.
        let abs_path = fs::canonicalize(path)
            .map_err(|e| anyhow!("Could not canonicalize path! {}", e))?
            .to_str()
            .ok_or_else(|| anyhow!("Could not represent path as UTF-8 string"))?
            .into();
        // record! is a macro to assist in making LispResult<Expr::Record> types
        record!(FileRecord::new(f, abs_path))
    }

    /// Open a file from x7
    /// This function signature will let us expose it directly to the interpreter
    pub(crate) fn from_x7(exprs: Vector<Expr>, _symbol_table: &SymbolTable) -> LispResult<Expr> {
        exact_len!(exprs, 1);
        let path = exprs[0].get_string()?;
        FileRecord::open_file(path)
    }
}
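The FileRecord::new constructor used above isn't shown here; given the struct definition, a plausible sketch is:

impl FileRecord {
    // Sketch: wrap the file handle so FileRecord is Sync + Send
    fn new(f: std::fs::File, path: String) -> Self {
        FileRecord {
            file: Arc::new(Mutex::new(f)),
            path,
        }
    }
}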

Now that we have the ability to make a FileRecord, we'll need to implement Record so it can be understood by the interpreter (Expr::Record).

impl Record for FileRecord {
    fn call_method(&self, sym: &str, args: Vector<Expr>) -> LispResult<Expr> {
        // We have no methods yet.
        unknown_method!(self, sym)
    }

    fn type_name(&self) -> &'static str {
        "FileRecord"
    }

    fn display(&self) -> String {
        format!("File<{}>", self.path)
    }

    fn debug(&self) -> String {
        self.display()
    }

    fn clone(&self) -> RecordType {
        Box::new(Clone::clone(self))
    }

    fn methods(&self) -> Vec<&'static str> {
        Vec::new()
    }

    fn id(&self) -> u64 {
        use std::collections::hash_map::DefaultHasher;
        use std::hash::{Hash, Hasher};
        let mut h = DefaultHasher::new();
        self.path.hash(&mut h);
        h.finish()
    }
}

We also need to expose FileRecord::from_x7 to the interpreter, so let's head back and add it to make_stdlib_fns:

 make_stdlib_fns!{
  // elided functions...
  ("call_method", 2, call_method, true, "<doc-string>"),
  // Open a file
  ("fs::open", 1, FileRecord::from_x7, true, "Open a file."),
}

We can now compile and run x7 to see what happens:

>>> (def f (fs::open "hello-world.txt"))
nil
>>> f
File</home/david/programming/x7/hello-world.txt>

Nice! We've opened a file. We can now implement some other useful methods on FileRecord like reading from a file:

impl FileRecord {
    /// Read the contents of a file to a String,
    /// rewinding the cursor to the front.
    fn read_all(&self) -> LispResult<String> {
        let mut buf = String::new();
        let mut guard = self.file.lock();
        guard
            .read_to_string(&mut buf)
            .map_err(|e| anyhow!("Failed to read to string {}", e))?;
        rewind_file!(guard);
        Ok(buf)
    }

    /// Read the contents of a FileRecord to a string.
    fn read_to_string(&self, args: Vector<Expr>) -> LispResult<Expr> {
        // We want no arguments.
        exact_len!(args, 0);
        self.read_all().map(Expr::String)
    }
}

We can update our Record implementation for FileRecord to include this method:

impl Record for FileRecord {
    fn call_method(&self, sym: &str, args: Vector<Expr>) -> LispResult<Expr> {
        match sym {
            "read_to_string" => self.read_to_string(args),
            _ => unknown_method!(self, sym),
        }
    }
}

And use it:

~ echo "hello" > hello-world.txt
~ x7
>>> (def f (fs::open "hello-world.txt"))
>>> (call_method f "read_to_string")
"hello"

Awesome! We're able to call methods on FileRecord. It's the same process to implement .write and other useful file operations, so we'll elide it. This is great stuff, and would be even better with some syntactic sugar.
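Before we get to the sugar, here's a sketch of what that elided .write method could look like (hypothetical code; the real implementation lives in records/file.rs and may differ):

use std::io::Write;

impl FileRecord {
    /// Write a string to the file (sketch)
    fn write(&self, args: Vector<Expr>) -> LispResult<Expr> {
        // Exactly one argument: the string to write
        exact_len!(args, 1);
        let contents = args[0].get_string()?;
        let mut guard = self.file.lock();
        guard
            .write_all(contents.as_bytes())
            .map_err(|e| anyhow!("Failed to write to file: {}", e))?;
        // Return the number of bytes written
        Ok(Expr::Num((contents.len() as u64).into()))
    }
}

Wiring it into call_method is one more match arm, exactly like read_to_string.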

Let's add method call syntax so these two expressions are equal:

>>> (call_method f "read_to_string")
>>> (.read_to_string f)

Adding Method call syntax in x7

Without getting too deep into the parser weeds, .method is parsed into an Expr::Symbol. We can modify the parser to recognize the period, and instead parse it into a Function that calls our method for us.

The parse_symbol function is defined as:

fn parse_symbol<'a>(i: &'a str) -> IResult<&'a str, Expr, VerboseError<&'a str>> {
    map(take_while1(is_symbol_char), |sym: &str| {
        Expr::Symbol(sym.into())
    })(i)
}

So all it does is try to recognize a symbol, and then transform the type when it fully parses one. We'll modify it to recognize if a symbol starts with a period, and if so, call make_method_call and return an Expr::Function.

Let's first make make_method_call:

fn make_method_call(method: String) -> Expr {
    // Import some useful types
    use crate::symbols::Function;
    use std::sync::Arc;
    let method_clone = method.clone();
    // This is the cool part. We're making a closure that obeys
    // the X7FunctionPtr type.
    // When we call .write, it'll call this function.
    let method_fn = move |args: Vector<Expr>, _sym: &SymbolTable| {
        // First item is a record, so get it
        let rec = match args[0].get_record() {
            Ok(rec) => rec,
            Err(e) => return Err(e),
        };
        // `rec` is a record, so call the method.
        // Note that we move `method_clone` into this closure!
        use crate::records::Record;
        // The layout of `args` is: (<record> <arg1> <arg2> ...),
        // and the type signature we have is Record::call_method(method, args)
        rec.call_method(&method_clone, args.clone().slice(1..))
    };
    // Make a Function struct
    let f = Function::new(
        format!("method_call<{}>", method), // function name
        1,                                  // number of args
        Arc::new(method_fn),                // function pointer
        true,                               // eval args
    );
    // Return an Expr::Function
    Expr::Function(f)
}

This is pretty cool - we're transforming a symbol into a function. All we need to do is add an if-gate into parse_symbol, and we're set!

fn parse_symbol<'a>(i: &'a str) -> IResult<&'a str, Expr, VerboseError<&'a str>> {
    map(take_while1(is_symbol_char), |sym: &str| {
        if sym.starts_with('.') {
            make_method_call(sym[1..].into()) // sym[1..] => drop the period
        } else {
            Expr::Symbol(sym.into())
        }
    })(i)
}

We can start x7 and test it out:

>>> .read_to_string
Fn<method_call<read_to_string>, 1, [ ]>

Nice, we're getting a function from our parser. We can try using it:

>>> (def f (fs::open "hello-world.txt"))
>>> (.read_to_string f)
"hello"

And that's it! We've implemented records in x7. I hope you enjoyed reading the article!


Rust Traits: Iterator

The Iterator trait in Rust allows you to conveniently operate over a sequence of elements. Iterators provide an expressive, functional, convenient, and performant way to do computations.

This article will focus on the mechanics of the trait and offer a deeper look into it. If you're not already familiar with traits in Rust, I recommend skimming the rust book chapter before reading this article.

The Basics

Iterators are very common in Rust, and if you've written Rust you have likely used them. Here is a basic example:

for val in 0..10 {
  // ... use val
}

0..10 is a Range, which implements Iterator, which is central to the function of for-loops (desugar here).
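Roughly speaking, that for-loop desugars into a while-let over next() (simplified; the real desugaring goes through IntoIterator):

// Simplified desugaring of `for val in 0..10 { ... }`
let mut iter = (0..10).into_iter();
while let Some(val) = iter.next() {
    // ... use val
}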

You may have written something more complex as well:

let v1 = vec![1, 2, 3];
let v2 = vec![4, 5, 6];

let dot_product: u32 = v1
    .iter()
    .zip(v2)
    .map(|(l, r)| l * r)
    .sum(); // 32

Either way, we're iterating and operating on some sequence of values. The Iterator trait provides convenient ways to construct, transform, and consume these sequences.

The Iterator Trait

The Iterator trait can be thought of as having three parts:

  1. The foundation: a next() function which returns some type Item if it can.
  2. Lots and lots of methods for iterator transformations (e.g. functional tools like map and filter).
  3. A function called collect(), which allows you to evaluate iterators into some collection type.

The Foundation

The foundation to the Iterator is a type Item, and a method next() which returns Option<Item>:

trait Iterator {
  type Item;
  fn next(&mut self) -> Option<Self::Item>;
  // ... several elided methods
}

Annotated with comments:

trait Iterator {
  type Item;
  //   ^-- Associated type; the type we are returning each iteration
  fn next(&mut self) -> Option<Self::Item>;
  // ^    ^             ^ returns either an Item, or nothing
  // |    | it mutates something each iteration
  // | `next` method gets somehow called each iteration in for-loops

  // ... several elided methods
}

The trait signature tells us a lot about how it works:

  1. We need to mutate the type for which we implement Iterator (something needs to book-keep).
  2. If we have a value to yield, return Some(val)
    1. Otherwise, stop iteration by yielding None
  3. We return the same type each iteration.

So now that we've seen the foundation, let's preview some Iterator trait methods for transforming iterators.

The Thousand Elided Methods

While it's nice that we can cleanly define a way to retrieve a single element at a time from a collection, it would be very nice to operate on the iterable itself. The Iterator trait provides a LOT of functions to conveniently work with iterators in a functional style. We can succinctly express more complex logic with these methods – for example:

let some_iterable = 0..100;
let sum: u32 = some_iterable
    .filter(|&e| e > 50)
    .map(|e| e * e)
    .sum();

vs

let some_iterable = 0..100;
let mut sum = 0;
for e in some_iterable {
    if !(e > 50) {
        continue;
    }
    let squared = e * e;
    sum += squared;
}

I personally find the first far easier to read as it requires much less parsing. This isn't always true of iterators in Rust, but most of the time it is.

Other methods we'll use in this article include Iterator::take(N), which only takes up to N elements from the iterator. This is useful for infinite iterators, and is common in functional languages.

You can view a list of the iterator methods here.

The collect() Method

While important, this article won't focus much on the mechanics of collect(). In short, this method uses the FromIterator trait to convert iterators into some collection. You'll find yourself using this often when working with iterators to convert them into tangible and convenient types.

There's a good example of collect() here.
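As a quick taste, collect() infers its target collection from the type annotation:

// Collect the even numbers into a Vec
let evens: Vec<u32> = (0..10).filter(|e| e % 2 == 0).collect();
assert_eq!(evens, vec![0, 2, 4, 6, 8]);

// collect() can also gather Results, stopping at the first Err
let parsed: Result<Vec<u32>, _> = ["1", "2", "3"].iter().map(|s| s.parse()).collect();
assert_eq!(parsed.unwrap(), vec![1, 2, 3]);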

Now that we've seen an overview of what's provided, we can implement Iterator!

Part 1: The Natural Numbers

To get more familiar with the trait, let's make a useful construct: The Natural Numbers.

To implement this, we'll need a struct holding the current value:

// Book keeping struct
struct NaturalNumbers {
    curr: u32,
}

// Start at 0 because computers
impl NaturalNumbers {
    fn new() -> Self {
        Self { curr: 0 }
    }
}

And implement Iterator by incrementing curr:

impl Iterator for NaturalNumbers {
    type Item = u32;

    fn next(&mut self) -> Option<Self::Item> {
        let ret = self.curr;
        self.curr += 1;
        Some(ret)
    }
}

Nice! We have a struct NaturalNumbers which will yield every natural number until it panics on overflow.

This is certainly useful, and will serve as a bedrock for later functions. Unfortunately our terminals don't appreciate printing millions of integers, so we'll use the method Iterator::take(N) which limits the number of iterations to at most N.

We can then test NaturalNumbers with:

fn main() {
    for i in NaturalNumbers::new().take(5) {
        println!("{}", i);
    }
}

Which outputs:

~ cargo run
   Compiling iterator-article v0.1.0 (/home/david/programming/iterator-article)
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/iterator-article`
0
1
2
3
4

You can run this example yourself on the Rust playground!

So now that we can generate a sequence of values, let's implement some familiar functional friends: map, filter, and reduce (fold).

Implementing Map

A frequent programming task is to loop over some collection and transform each element it yields.

This occurs commonly when retrieving data from some source, and you need to bind the data in some useful construct (class / struct / etc). Or if you're crunching numbers you may want to operate on each element individually before some other step.

Either way, this pattern is so common that most languages offer the map construct – a way to provide an iterable and a function, and get the function applied to each element of the iterable returned.

For example, let's square each number in a vector. Rust offers a map() method on iterators, so we'll use that first:

Pseudo-code:

seq: 0, 1, 2, 3, ...
fn:  |e| e * e
out: 0, 1, 4, 9, ...

Rust:

let input = vec![1, 2, 3];
let squared: Vec<_> = input
    .iter()
    .map(|e| e * e)
    .collect();

So we provide a function, |e| e * e, which squares numbers, and map implicitly takes self, which is an iterator. This may not make sense right now, so let's dig deeper into building our own Map.

Things are going to get a little higher-order here, so let's outline what we'll need:

  1. We need a type Iter, which implements Iterator
  2. We need a function, which maps Iter::Item to some output type Out
    1. Syntax: Iter::Item is the associated type Item from the implementation of Iterator on Iter.
    2. We can express the map function in Rust then as FnMut(Iter::Item) -> Out
      1. FnMut as we're consuming the element and may want to mutate captured variables. Feel free to use Fn if you don't want that. More on this later in the Reduce section.

Putting the above together we'll need a struct to store our function and iterator:

 // Our Map struct
struct Map<Iter, Fn> {
    iter: Iter,
    f: Fn,
}

// We'll want to instantiate one later, so add a constructor method:
impl<Iter, Fn> Map<Iter, Fn> {
    fn new(iter: Iter, f: Fn) -> Self {
        Self { iter, f }
    }
}

Great, we can now tackle implementing Iterator. The first challenge is getting the types set up for our impl. As described above, we'll need an Iter, F (map fn), and Out types:

impl<Iter, F, Out> Iterator ...

But we need further guarantees as described above:

impl<Iter: Iterator, F: FnMut(Iter::Item) -> Out, Out> Iterator ...

I recommend the reader really make sure the type signature above makes sense. Rust has a tendency to hit type soup, and it is worthwhile to take a minute to understand it.

We can now implement Iterator in a straightforward way:

impl<Iter: Iterator, F: FnMut(Iter::Item) -> Out, Out> Iterator for Map<Iter, F> {
    type Item = Out;

    fn next(&mut self) -> Option<Self::Item> {
        self.iter.next().map(|e| (self.f)(e))
    }
}

So we're calling next() on our stored iterator to iterate once, and mapping the value with our stored function, and returning it. This is very efficient and something that rustc / llvm love to optimize, which gives some insight into why Rust iterators are so fast.

Now that we have it, let's use it!

fn main() {
    let nat = NaturalNumbers::new().take(5);
    let seq = Map::new(nat, |e| e * e);
    for i in seq {
        println!("{}", i);
    }
}

And run it:

$ cargo run
     Compiling iterator-article v0.1.0 (/home/david/programming/iterator-article)
      Finished dev [unoptimized + debuginfo] target(s) in 0.17s
       Running `target/debug/iterator-article`
  0
  1
  4
  9
  16

Nice! We can transform sequences using our own struct. If you want to see it in action yourself, you can play with it on the rust playground.

This is certainly powerful, but it would be nice to filter the element as well. Map only has access to a single element at a time, and must operate on the element. We can play around with the function types passed but most of the time we just want to filter out certain elements.

Filter

Filter is an interesting abstraction, as it concerns itself with retaining elements of a sequence which satisfy some criteria, and dropping the rest. The criteria function, or predicate function, borrows a value from the iterator and returns true or false. If the predicate evaluates to true on an element, return it to the caller. If the predicate is false, forget about it and continue searching.

This abstraction is also very common in other languages, and is just as essential as Map for functional programming.

The other wrinkle is that we need to care about ownership in Rust. Map would want to own each element as it needs to transform it, but filter just needs to borrow the element. We won't cover the magic involved with the Fn family and references, but this will work:

FnMut(&Iter::Item) -> bool

Our job is then similar to Map, we need a struct and constructor:

// struct to hold iterator and predicate function pointer
struct Filter<Iter, Predicate> {
    iter: Iter,
    pred: Predicate,
}

// And a default constructor
impl<Iter, Predicate> Filter<Iter, Predicate> {
    fn new(iter: Iter, pred: Predicate) -> Self {
        Self { iter, pred }
    }
}

Same idea as Map – store the iterator and function in a struct. Now we can implement Iterator in a similar fashion:

impl<Iter, Predicate> Iterator for Filter<Iter, Predicate>
where
    Iter: Iterator,
    Predicate: FnMut(&Iter::Item) -> bool,
{
    type Item = Iter::Item;
    fn next(&mut self) -> Option<Self::Item> {
        while let Some(ele) = self.iter.next() {
            if (self.pred)(&ele) {
                return Some(ele);
            }
        }
        None
    }
}

We're again iterating over our underlying iterator, and then testing each element with our predicate. If it passes, we return the element. We're implicitly mutating self.iter as it's also an iterator, so no state is lost. When the caller calls next() we'll simply continue where we left off in self.iter and repeat the process. Eventually we'll exhaust the underlying iterator and stop iteration by returning None.

So now that we have it, let's use it! We'll build off of the Map example above to retain the even elements:

fn main() {
    let nat = NaturalNumbers::new().take(10);
    let seq = Map::new(nat, |e| e * e);
    let seq = Filter::new(seq, |e: &u32| *e % 2 == 0);
    for i in seq {
        println!("{}", i);
    }
}

Which when run prints out (run it on the playground here):

~ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/iterator-article`
0
4
16
36
64

Great! We can now selectively retain elements in a sequence. The final tool to make is reduce (also called fold) which is the most powerful tool yet.

Reduce

The motivation for reduce (fold in Rust) is pretty simple: we need a way to collapse entire sequences into some type. Map and Filter only operate on each element one at a time, not an entire sequence. How would we sum all numbers in a list?

The mechanics are pretty simple thankfully:

  1. We have a base type; the accumulator. In the summing example, this would be 0.
  2. We have a function FnMut(acc, ele) -> acc which melds the accumulator and the given element.

For example, to multiply a list of integers we will need:

  1. The accumulator, with initial value 1.
  2. The function |acc, ele| acc * ele
  3. A list [1, 2, 3]

We can view the computation with the table below:

Table 1: Final result: 6

iter  acc  ele  product
1     1    1    1
2     1    2    2
3     2    3    6

So the idea is to accumulate values into the accumulator. We don't need the Iterator trait just yet, so we can implement reduce with a free standing function:

fn reduce<Acc, Iter, ReduceFn>(iterator: Iter, acc: Acc, reducefn: ReduceFn) -> Acc
where
    Iter: Iterator,
    ReduceFn: Fn(Acc, Iter::Item) -> Acc,
{
    let mut acc = acc;
    for ele in iterator {
        acc = reducefn(acc, ele);
    }
    acc
}

We can now use it:

fn main() {
    let nat = NaturalNumbers::new().take(4);
    let seq = Filter::new(nat, |e: &u32| *e > 0);
    let prod = reduce(seq, 1, |acc, ele| acc * ele);
    println!("{}", prod);
}

Which outputs 1 * 1 * 2 * 3 = 6 as expected (rust playground):

~ cargo run
    Blocking waiting for file lock on build directory
   Compiling iterator-article v0.1.0 (/home/david/programming/iterator-article)
    Finished dev [unoptimized + debuginfo] target(s) in 0.33s
     Running `target/debug/iterator-article`
6

Quick note on reduce

reduce is strictly more powerful than Map and Filter as it has access to the whole collection and an accumulator. We can easily implement Filter in terms of reduce for example:

let mut empty_vec = vec![];
let bigger_than_five = reduce(
    NaturalNumbers::new().take(10),
    &mut empty_vec,
    |acc, ele| {
        if ele > 5 {
            acc.push(ele);
        }
        acc
    },
);

I would recommend playing around with this function. It's useful to internalize that reduce (fold) can produce any output type. However, keep in mind that unnecessary uses of reduce like the example above remove access to the Iterator performance optimizations.
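For instance, building on the reduce and NaturalNumbers defined above, we can fold numbers into a completely different type, like a String:

// reduce can change the output type entirely: numbers -> String
let csv = reduce(NaturalNumbers::new().take(5), String::new(), |mut acc, ele| {
    if !acc.is_empty() {
        acc.push(',');
    }
    acc.push_str(&ele.to_string());
    acc
});
assert_eq!(csv, "0,1,2,3,4");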

Part 2: Our own Iterator Trait

The following code is certainly nice:

let nat = NaturalNumbers::new().take(4);
let squared = Map::new(nat, |e| e * e);
let seq = Filter::new(squared, |e: &u32| *e % 2 == 0);
let prod = reduce(seq, 1, |acc, ele| acc * ele);

But this is far easier to read:

let prod = NaturalNumbers::new()
    .take(4)
    .map(|e| e * e)
    .filter(|e: &u32| *e % 2 == 0)
    .reduce(1, |acc, ele| acc * ele);

The question is then: How does Iterator provide this interface?

As mentioned above, Iterator provides a whole bunch of default methods to facilitate this clean API. To better understand how this works, let's define our own Iterator trait:

trait MyIterator {
    type Item;
    fn next(&mut self) -> Option<Self::Item>;
}

And update our previous Iterator implementations:

-impl<Iter, Predicate> Iterator for Filter<Iter, Predicate>
+impl<Iter, Predicate> MyIterator for Filter<Iter, Predicate>
...

You can view the whole refactor on the rust playground. Unfortunately, our changes don't compile as we no longer have an Iterator::take(N) method:

error[E0599]: no method named `take` found for struct `NaturalNumbers` in the current scope
   --> src/main.rs:116:37
    |
1   | struct NaturalNumbers {
    | ---------------------
    | |
    | method `take` not found for this
    | doesn't satisfy `NaturalNumbers: std::iter::Iterator`
...
116 |     let nat = NaturalNumbers::new().take(4);
    |                                     ^^^^ method not found in `NaturalNumbers`
    |
    = note: the method `take` exists but the following trait bounds were not satisfied:
            `NaturalNumbers: std::iter::Iterator`
            which is required by `&mut NaturalNumbers: std::iter::Iterator`
    = help: items from traits can only be used if the trait is implemented and in scope
    = note: the following trait defines an item `take`, perhaps you need to implement it:
            candidate #1: `std::iter::Iterator`

It's looking like we'll need to implement Take ourselves. It's a very similar process as before. We'll need a struct and Iterator implementation:

struct Take<Iter> {
    iterator: Iter,
    left: usize,
}

impl<Iter> Take<Iter> {
    fn new(iterator: Iter, left: usize) -> Self {
        Self { iterator, left }
    }
}

impl<Iter: MyIterator> MyIterator for Take<Iter> {
    type Item = Iter::Item;
    fn next(&mut self) -> Option<Self::Item> {
        if self.left > 0 {
            self.left -= 1;
            self.iterator.next()
        } else {
            None
        }
    }
}

Now that we have the struct, we need to modify MyIterator to achieve the desired API. Things will get a bit introspective, as we cannot refer to any concrete types. We instead rely on the Self language feature to specify that types which implement MyIterator will be the ones used in the method calls. We'll want to transfer ownership of iterators in these methods, so our MyIterator::take(N) signature will read:

fn take(self, left: usize) -> Take<Self>

The other wrinkle is that this won't compile, as the Rust compiler is not confident it can lay out the Take struct properly, as Self can be !Sized. This can seem obscure, but the error message is pretty good:

error[E0277]: the size for values of type `Self` cannot be known at compilation time
   --> src/main.rs:116:37
    |
90  | struct Take<Iter> {
    |             ---- required by this bound in `Take`
...
116 |     fn take(self, amount: usize) -> Take<Self> {
    |                                     ^^^^^^^^^^- help: consider further restricting `Self`: `where Self: std::marker::Sized`
    |                                     |
    |                                     doesn't have a size known at compile-time
    |
    = help: the trait `std::marker::Sized` is not implemented for `Self`
    = note: to learn more, visit <https://doc.rust-lang.org/book/ch19-04-advanced-types.html#dynamically-sized-types-and-the-sized-trait>

To better understand this error, what is the type of seq in the following?

let seq = NaturalNumbers::new()
    .take(4)
    .map(|e| e * e)
    .filter(|e: &u32| *e % 2 == 0);

The answer is Filter<Map<Take<NaturalNumbers>, fn#1>, fn#2>.

Recall that Map, Filter, and Take all take a type Iter: MyIterator by value, so each struct needs to physically store that iterator in its memory layout. The Rust language tracks this information with the Sized trait. So if a type is Sized, Rust can properly lay out the struct. If a type is !Sized, then indirection or obscure language features are required to embed that type in the struct. The compiler has helpfully told us to add a Sized bound on Self:

 fn take(self, amount: usize) -> Take<Self>
+where
+    Self: std::marker::Sized,
 {
     Take::new(self, amount)
 }

This compiles and works! Let's run our main again:

fn main() {
    let nat = NaturalNumbers::new().take(4);
    let doubled = Map::new(nat, |e| e * e);
    let seq = Filter::new(doubled, |e: &u32| *e > 0);
    let prod = reduce(seq, 1, |acc, ele| acc * ele);
    println!("{}", prod);
}

Which outputs:

~ cargo run
    Finished dev [unoptimized + debuginfo] target(s) in 0.03s
     Running `target/debug/iterator-article`
36

We can now do the same procedure for Map and Filter. We can reuse the constructors but replace Iter with Self:

trait MyIterator {
    // elided ...

    fn map<Out, F>(self, f: F) -> Map<Self, F>
    where
        F: FnMut(Self::Item) -> Out,
        Self: std::marker::Sized,
    {
        Map::new(self, f)
    }

    fn filter<F>(self, f: F) -> Filter<Self, F>
    where
        F: FnMut(&Self::Item) -> bool,
        Self: std::marker::Sized,
    {
        Filter::new(self, f)
    }
}

Our main function is now:

fn main() {
    let seq = NaturalNumbers::new()
        .take(4)
        .map(|e| e * e)
        .filter(|e: &u32| *e > 0);
    let prod = reduce(seq, 1, |acc, ele| acc * ele);
    println!("{}", prod);
}

Which outputs 36 as before. Now we just need to implement reduce in a similar way as before:

trait MyIterator {
  // elided...

  fn reduce<Acc, ReduceFn>(mut self, mut acc: Acc, mut reducefn: ReduceFn) -> Acc
  where
      ReduceFn: FnMut(Acc, Self::Item) -> Acc,
      Self: std::marker::Sized,
  {
      while let Some(ele) = self.next() {
          acc = reducefn(acc, ele);
      }
      acc
  }
}

And change our main function to be:

fn main() {
    let prod = NaturalNumbers::new()
        .take(4)
        .map(|e| e * e)
        .filter(|e: &u32| *e > 0)
        .reduce(1, |acc, ele| acc * ele);
    println!("{}", prod);
}

Which outputs 36 as expected (rust playground):

~ cargo run
   Compiling iterator-article v0.1.0 (/home/david/programming/iterator-article)
    Finished dev [unoptimized + debuginfo] target(s) in 0.15s
     Running `target/debug/iterator-article`
36

Conclusion

Phew, 3.6k words later we've accomplished our goal. We've recreated the Iterator trait, and delved into its mechanics. I hope you've learned something from this article, as I certainly learned a lot writing it. I really like this language feature, and think it represents some of the best API design Rust offers.

Appendix: The Primes

We started our journey by defining the NaturalNumbers, so it would be cool if we could generate an infinite sequence of Primes:

struct Primes {
    seen: Vec<u32>,
    curr: u32,
}

impl Primes {
    fn new() -> Self {
        Self {
            seen: vec![],
            curr: 2,
        }
    }
}

impl Iterator for Primes {
    type Item = u32;

    fn next(&mut self) -> Option<u32> {
        for ele in self.curr.. {
            if !self.seen.iter().any(|prime| ele % prime == 0) {
                self.seen.push(ele);
                self.curr = ele + 1;
                return Some(ele);
            }
        }
        None
    }
}

Which we can use:

fn main() {
    println!("{:?}", Primes::new().take(20).collect::<Vec<_>>());
}

And this outputs the first twenty primes (rust playground):

~ cargo run
      Finished dev [unoptimized + debuginfo] target(s) in 0.19s
       Running `target/debug/iterator-article`
  [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71]

It's just that easy to generate a sequence of Primes using Iterator in Rust. The reader is encouraged to use MyIterator::reduce to achieve the same effect.
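
For instance, here's a minimal sketch of that exercise. It assumes we also write an impl MyIterator for Primes mirroring the Iterator impl above, and it emulates collect() by reducing into a Vec:

// A sketch, assuming `impl MyIterator for Primes` exists.
// We build the Vec ourselves with MyIterator::reduce.
fn main() {
    let primes = Primes::new()
        .take(20)
        .reduce(Vec::new(), |mut acc, prime| {
            acc.push(prime);
            acc
        });
    println!("{:?}", primes);
}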


Implementing a Copyless Redis Protocol in Rust with Parsing Combinators

Redis is a fantastic NoSQL database with a beautifully simple design. One of the fundamental responsibilities of the redis server is to encode and decode RESP (Redis Serialization Protocol) messages. For example, when a client issues the command:

SET foo bar

That's encoded by the client and sent to the server as:

*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n

It's not necessarily important to understand that RESP message right now, but the server will need to decode it back into something equivalent so it can perform the operation. This blog post will go through my efforts to implement a copyless RESP parser in redis-oxide.

Redis Serialization Protocol (RESP)

The redis folks were kind enough to simply document the protocol on their website. The protocol is CRLF (\r\n) delimited, with each word carrying a type. The simplest types are Simple Strings and Errors, which look like:

+OK\r\n
-Error Msg\r\n

There are also bulk strings, which are strings prefixed with a length:

$3\r\nFOO\r\n

We also have integers:

:1337\r\n

And finally we have arrays, which simply have a size indicating how many redis values follow.

*3\r\n$3\r\nSET\r\n$3\r\nfoo\r\n$3\r\nbar\r\n

We can read the array RESP as:

*3\r\n        -- We have three elements in this array
$3\r\nSET\r\n -- First element is a bulk string of length 3 with value SET
$3\r\nfoo\r\n -- Second element is also a bulk string of length 3 with value foo
$3\r\nbar\r\n -- Third element is also a bulk string of length 3 with value bar

Now that we're familiar with the protocol, let's get an idea of what parsing combinators actually are.

Parsing Combinators

The basic idea behind parsing combinators is that you build more complex parsers from simpler parsers. A simple parser could do something like fetch a word, and then we can use that later to parse sentences, as they're composed of words.

From the RESP examples above, we can see that it's a series of words delimited by CRLF, so it would be very handy to have a word parser. We can implement one pretty simply by collecting all bytes until we hit a CRLF. As we'll see later, almost everything in RESP will be parsed by varying the output of our word parser.

All parsing combinators need a way of representing their position in the input. The strategy I'll be using is to thread a cursor through each parser, tracking our current position (starting at 0).
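
To make that concrete, here's a tiny, RESP-free sketch of the idea. The names word and sentence are illustrative, not the parsers we'll build below:

// A word parser: take the input and a cursor, return the next cursor
// position plus the word we found (or None if we're out of input).
fn word(input: &[u8], pos: usize) -> Option<(usize, &[u8])> {
    if pos >= input.len() {
        return None;
    }
    let end = input[pos..]
        .iter()
        .position(|&b| b == b' ')
        .map(|i| pos + i)
        .unwrap_or(input.len());
    Some((end + 1, &input[pos..end]))
}

// A sentence parser is just the word parser run repeatedly -- a more
// complex parser built from a simpler one.
fn sentence(input: &[u8]) -> Vec<&[u8]> {
    let mut words = Vec::new();
    let mut pos = 0;
    while let Some((next_pos, w)) = word(input, pos) {
        words.push(w);
        pos = next_pos;
    }
    words
}

fn main() {
    assert_eq!(
        sentence(b"parse simple words"),
        vec![&b"parse"[..], &b"simple"[..], &b"words"[..]]
    );
}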

Copyless

Now that we understand RESP, and have an idea on what a parser combinator is, we'll need to understand how we can avoid copying in redis-oxide. For context, I previously heap allocated almost all bytes in redis-oxide. Without digging too deep into the details, my fundamental types were defined as:

/// These types are used by state and ops to actually perform useful work.
pub type Value = Vec<u8>;
/// Key is the standard type to index our structures
pub type Key = Vec<u8>;

Which meant my parser needed to output Vecs, which are heap allocated. Value and Key are used almost everywhere in the application to represent almost all values in a redis command. So we need to change these types to be cheap handles that share the underlying buffer, rather than freshly allocated copies. No matter what direction we take, we need to play nice with tokio's codec scheme.

Now it should be understood that tokio's Decoder trait works as follows:

  1. redis-oxide uses the tokio framework which provides services to listen on sockets.
  2. One of the APIs provided is the Decoder trait, which you use to carefully read bytes off a socket to produce a type.
  3. Tokio maintains a large buffer and will copy bytes received off a socket into this buffer.
  4. It will pass this buffer to the parser until the parser signals that a type can be produced.
  5. The tokio managed buffer is smart, and allows for several byte slices to be made safely (Bytes type).
  6. The parser will cleave off enough bytes from the buffer when producing a type to allow tokio to safely copy later received bytes.
  7. We bypass lifetime issues as the Bytes type maintains reference counts, so we can pass slices further up the application.
  8. Once redis-oxide is done with the produced types, we'll drop the slice references, and memory can be reclaimed.

So our parser will need to dance this careful dance. As described above, we can safely share byte slices of this underlying buffer using the Bytes type. So we'll redefine our fundamental types in terms of Bytes:

/// These types are used by state and ops to actually perform useful work.
pub type Value = Bytes;
/// Key is the standard type to index our structures
pub type Key = Bytes;

Aside from a massive related refactoring job, we now just need to write the parser πŸ˜›.

Writing the Parser

Writing the parser will require us to solve a few problems:

  1. Data representation and type transformations.
  2. Error handling and type setup.
  3. Writing the fundamental parsers.
  4. Dealing with arrays.

Data Representation and Type Transformations

To better understand our requirements, let us first consider our output type:

/// RedisValueRef is the canonical type for values flowing
/// through the system. Inputs are converted into RedisValues,
/// and outputs are converted into RedisValues.
#[derive(PartialEq, Clone)]
pub enum RedisValueRef {
    String(Bytes),
    Error(Bytes),
    Int(i64),
    Array(Vec<RedisValueRef>),
    NullArray,
    NullBulkString,
    ErrorMsg(Vec<u8>), // This is not a RESP type. This is an redis-oxide internal error type.
}

This is the type that redis-oxide uses to later run commands, so our parser will eventually need to output this type. This means we'll need to transform the given RESP buffer into one of those enums above. Doing it directly however is expensive – recall that the Bytes type needs to fiddle with reference counts. So we'll use a simpler type:

/// Fundamental struct for viewing byte slices
///
/// Used for zero-copy redis values.
struct BufSplit(usize, usize);

/// BufSplit based equivalent to our output type RedisValueRef
enum RedisBufSplit {
    String(BufSplit),
    Error(BufSplit),
    Int(i64),
    Array(Vec<RedisBufSplit>),
    NullArray,
    NullBulkString,
}

So as we're parsing, we'll need to track the start and end of the byte slice that represents each RedisBufSplit. Later we'll use BufSplit and the true tokio buffer to transform RedisBufSplit β†’ RedisValueRef.

So for example, if I have the following RESP fragment:

frag:  $3\r\nFOO\r\n
index: 012 3 4567 8
(\r,\n are single characters)

We'd have the following type:

RedisBufSplit::String(BufSplit(4,7))

Representing the string byte slice "FOO".
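
As a quick sanity check, here's that example expressed in code. It leans on the BufSplit::as_slice helper we'll define in the "Putting everything together" section below:

use bytes::BytesMut;

fn main() {
    // $3\r\nFOO\r\n -- the bulk string fragment from the example above.
    let buf = BytesMut::from(&b"$3\r\nFOO\r\n"[..]);
    // BufSplit(4, 7) covers indices 4..7, i.e. the bytes F, O, O.
    assert_eq!(BufSplit(4, 7).as_slice(&buf), b"FOO");
}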

Now that we can represent our values, we'll need to consider error handling.

Error handling and Types

There are a lot of ways that parsing can fail. A client could send us straight garbage, or something more subtle like an off-by-one error. We'll list each error in an enum type:

#[derive(Debug)]
pub enum RESPError {
    UnexpectedEnd,
    UnknownStartingByte,
    IOError(std::io::Error),
    IntParseFailure,
    BadBulkStringSize(i64),
    BadArraySize(i64),
}

As we're writing rust, it's natural to use the Result<T, E> type. Our success type needs to track our current position as well as return a sensible type. We'll also need to signal our parsing status to tokio. The Decoder trait has the following signature:

fn decode(&mut self, src: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error>;

This is a peculiar type, so let's work through the possible cases:

Case                 | Meaning
---------------------|------------------------------------------------------------------
Ok(Some(Self::Item)) | We successfully parsed a value!
Ok(None)             | Looks fine but incomplete. We need the client to send more data.
Err(Self::Error)     | Parsing failed somehow.

So now we have all the information required. Our Item type needs to track position and the actual type, so we can use a tuple (usize, RedisBufSplit). Our fundamental parsing type is then:

type RedisResult = Result<Option<(usize, RedisBufSplit)>, RESPError>;

All subsequent parsers will eventually need to output RedisResult.

Writing the Fundamental Parser

Now that we understand our data representation and errors, let's write our first parser! As mentioned several times, RESP is a word based protocol, so let's write a word parser! The only thing we care about is finding the position (index) of the next CRLF.

As finding a word can't error (we either find one or need more data), we don't necessarily need to use the RedisResult type. So our function can have the following signature:

fn word(buf: &BytesMut, pos: usize) -> Option<(usize, BufSplit)>

So we'll take the tokio provided buffer buf, and our current position pos, and if we can, output Some((next_pos, BufSplit)). We'll use burntsushi's fantastic memchr crate to accelerate searching for CLRF (\r\n):

/// Get a word from `buf` starting at `pos`
#[inline]
fn word(buf: &BytesMut, pos: usize) -> Option<(usize, BufSplit)> {
    // We're at the edge of `buf`, so we can't find a word.
    if buf.len() <= pos {
        return None;
    }
    // Find the index of the b'\r' -- note that `end` is relative to `pos`
    memchr(b'\r', &buf[pos..]).and_then(|end| {
        if pos + end + 1 < buf.len() {
            // pos + end == first index of b'\r' after `pos`
            // pos + end + 2 == ..word\r\n<HERE> -- skip to after CRLF
            Some((pos + end + 2, BufSplit(pos, pos + end)))
        } else {
            // Edge case: We received just enough bytes from the client
            // to get the \r but not the \n
            None
        }
    })
}

Great! We can now efficiently grab individual words from our input buffer. Even better, simple strings and errors are simple type transformations of this:

fn simple_string(buf: &BytesMut, pos: usize) -> RedisResult {
    Ok(word(buf, pos).map(|(pos, word)| (pos, RedisBufSplit::String(word))))
}

fn error(buf: &BytesMut, pos: usize) -> RedisResult {
    Ok(word(buf, pos).map(|(pos, word)| (pos, RedisBufSplit::Error(word))))
}

If that syntax isn't super familiar, both of the above are equivalent to:

fn simple_string(buf: &BytesMut, pos: usize) -> RedisResult {
    match word(buf, pos) {
        Some((pos, word)) => Ok(Some((pos, RedisBufSplit::String(word)))),
        None => Ok(None),
    }
}

So all we're doing is wrapping the BufSplit returned by word in the appropriate RedisBufSplit type.

Nice! So our easy types are out of the way. We now need to parse ints, bulk strings, and finally arrays.

Parsing Ints

Ints are the first non-trivial type to parse. RESP represents signed 64 bit integers as a base 10 string, so we'll need to:

  1. Grab a word (BufSplit, can turn into byte slice with BufSplit::as_slice)
  2. Convert byte slice to a str
  3. Convert the str to an i64

This process can fail on steps 2 and 3. Rust requires that strings are UTF-8 encoded, so converting to a str can fail. And someone could pass "abc" as the int, so converting to i64 can also fail. Keeping those in mind, we can now write the int function:

fn int(buf: &BytesMut, pos: usize) -> Result<Option<(usize, i64)>, RESPError> {
    match word(buf, pos) {
        Some((pos, word)) => {
            // word.as_slice(buf) is the method call BufSplit::as_slice(&self, &BytesMut) to access the byte slice.
            let s = str::from_utf8(word.as_slice(buf)).map_err(|_| RESPError::IntParseFailure)?;
            // Convert the string to an i64. Note the `?` for early returns.
            let i = s.parse().map_err(|_| RESPError::IntParseFailure)?;
            Ok(Some((pos, i)))
        }
        None => Ok(None),
    }
}

Nice, so we can grab ints from the input. We only need a trivial function to get the desired RedisResult type:

fn resp_int(buf: &BytesMut, pos: usize) -> RedisResult {
    Ok(int(buf, pos)?.map(|(pos, int)| (pos, RedisBufSplit::Int(int))))
}

Bulk Strings

So bulk strings in RESP start with a length (i64), and then the string content (delimited by CRLF of course). So we can use our previous int function, and then work through the possible cases (see the second code block for comments).

Here's the code without comments:

fn bulk_string(buf: &BytesMut, pos: usize) -> RedisResult {
    match int(buf, pos)? {
        Some((pos, -1)) => Ok(Some((pos, RedisBufSplit::NullBulkString))),
        Some((pos, size)) if size >= 0 => {
            let total_size = pos + size as usize;
            if buf.len() < total_size + 2 {
                Ok(None)
            } else {
                let bb = RedisBufSplit::String(BufSplit(pos, total_size));
                Ok(Some((total_size + 2, bb)))
            }
        }
        Some((_pos, bad_size)) => Err(RESPError::BadBulkStringSize(bad_size)),
        None => Ok(None),
    }
}

And here's the same code with comments explaining what's going on:

fn bulk_string(buf: &BytesMut, pos: usize) -> RedisResult {
    // recall that the `pos` returned by `int` is the first index of the string content.
    match int(buf, pos)? {
        // special case: redis defines a NullBulkString type, with length of -1.
        Some((pos, -1)) => Ok(Some((pos, RedisBufSplit::NullBulkString))),
        // We have a size >= 0
        Some((pos, size)) if size >= 0 => {
            // We trust the client here, and directly calculate the end index of string (absolute w.r.t pos)
            let total_size = pos + size as usize;
            // The client hasn't sent us enough bytes
            if buf.len() < total_size + 2 {
                Ok(None)
            } else {
                // We have enough bytes, so we can generate the correct type.
                let bb = RedisBufSplit::String(BufSplit(pos, total_size));
                // total_size + 2 == ...bulkstring\r\n<HERE> -- after CRLF
                Ok(Some((total_size + 2, bb)))
            }
        }
        // We received a garbage size (size < -1), so error out
        Some((_pos, bad_size)) => Err(RESPError::BadBulkStringSize(bad_size)),
        // Not enough bytes to parse an int (i.e. no CRLF to delimit the int)
        None => Ok(None),
    }
}

Now we have only one type left: Arrays.

Arrays: An Issue

Arrays are fundamentally more complex than the other types, as they are a sequence of redis values. We'll have to be more clever. They are defined as a size (i64) followed by that many redis values. This is naturally recursive, as we can have arrays inside arrays.

The issue is that we need a function which will parse any redis value, as fn array(..) is only responsible for redis arrays. But that generic parse function will also need to call the array parser!

Thankfully we can use some first year CS.

Mutual Recursion: Top Level Parse Function and Arrays

Let's first define our top level parse function. It's responsible for taking a buffer and returning a RedisResult, agnostic to particular RESP types. RESP tags every element with a type byte, so our function is short:

fn parse(buf: &BytesMut, pos: usize) -> RedisResult {
    // Guard against indexing past the buffer -- the recursive array case
    // below can call us with `pos` at the very end of the input.
    if buf.len() <= pos {
        return Ok(None);
    }

    match buf[pos] {
        b'+' => simple_string(buf, pos + 1),
        b'-' => error(buf, pos + 1),
        b'$' => bulk_string(buf, pos + 1),
        b':' => resp_int(buf, pos + 1),
        b'*' => array(buf, pos + 1),
        _ => Err(RESPError::UnknownStartingByte),
    }
}

So parse(..) will check the byte at pos (initially 0), and use it to delegate to the correct parser. This is very useful, and will allow us to write the array parser.

Here's the code without comments:

fn array(buf: &BytesMut, pos: usize) -> RedisResult {
    match int(buf, pos)? {
        None => Ok(None),
        Some((pos, -1)) => Ok(Some((pos, RedisBufSplit::NullArray))),
        Some((pos, num_elements)) if num_elements >= 0 => {
            let mut values = Vec::with_capacity(num_elements as usize);
            let mut curr_pos = pos;
            for _ in 0..num_elements {
                match parse(buf, curr_pos)? {
                    Some((new_pos, value)) => {
                        curr_pos = new_pos;
                        values.push(value);
                    }
                    None => return Ok(None),
                }
            }
            Ok(Some((curr_pos, RedisBufSplit::Array(values))))
        }
        Some((_pos, bad_num_elements)) => Err(RESPError::BadArraySize(bad_num_elements)),
    }
}

And the same code with comments:

fn array(buf: &BytesMut, pos: usize) -> RedisResult {
    match int(buf, pos)? {
        // Not enough bytes to determine the array size
        None => Ok(None),
        // special value: NullArray. Has size -1.
        Some((pos, -1)) => Ok(Some((pos, RedisBufSplit::NullArray))),
        // Happy path. We have a valid size (num_elements >= 0)
        Some((pos, num_elements)) if num_elements >= 0 => {
            // As we're receiving a dynamic number of elements, we need to heap allocate our BufSplits.
            let mut values = Vec::with_capacity(num_elements as usize);
            // We're going to forward iterate on `curr_pos`
            let mut curr_pos = pos;
            for _ in 0..num_elements {
                // Mutual Recursion! We need to parse the value at `curr_pos`
                match parse(buf, curr_pos)? {
                    // We got a value, so add it to the `values` vector and
                    // update `curr_pos`.
                    Some((new_pos, value)) => {
                        curr_pos = new_pos;
                        values.push(value);
                    }
                    // Not enough bytes. Abandon parsing and free vec.
                    None => return Ok(None),
                }
            }
            // We had enough bytes to fully parse the array! Return it.
            Ok(Some((curr_pos, RedisBufSplit::Array(values))))
        }
        // Client sent us a garbage size (num_elements < -1)
        Some((_pos, bad_num_elements)) => Err(RESPError::BadArraySize(bad_num_elements)),
    }
}

So we can now parse arrays, and it's time to put everything together.

Putting everything together

We're so close! We just need a few conversion functions before we can implement Decoder. Once we're done parsing, we're guaranteed to have a contiguous slice of memory that corresponds to the RedisBufSplit types we've generated so far. So we just need two functions:

  1. Take the large Bytes buffer and a BufSplit(start,end) slice into it to make a byte slice (also Bytes type)
  2. Take the RedisBufSplit and the large Bytes buffer and produce RedisValueRef types.

The conversion function is actually pretty mechanical:

// First, we need a convenient way to convert our index pairs into byte slices.
impl BufSplit {
    /// Get a lifetime appropriate slice of the underlying buffer.
    ///
    /// Constant time.
    #[inline]
    fn as_slice<'a>(&self, buf: &'a BytesMut) -> &'a [u8] {
        &buf[self.0..self.1]
    }

    /// Get a Bytes object representing the appropriate slice
    /// of bytes.
    ///
    /// Constant time.
    #[inline]
    fn as_bytes(&self, buf: &Bytes) -> Bytes {
        buf.slice(self.0..self.1)
    }
}
// Second, we'll need to convert a RedisBufSplit -> RedisValueRef given a Bytes buffer.
impl RedisBufSplit {
    fn redis_value(self, buf: &Bytes) -> RedisValueRef {
        match self {
            // bfs is BufSplit(start, end), which has the as_bytes method defined above
            RedisBufSplit::String(bfs) => RedisValueRef::String(bfs.as_bytes(buf)),
            RedisBufSplit::Error(bfs) => RedisValueRef::Error(bfs.as_bytes(buf)),
            RedisBufSplit::Array(arr) => {
                RedisValueRef::Array(arr.into_iter().map(|bfs| bfs.redis_value(buf)).collect())
            }
            RedisBufSplit::NullArray => RedisValueRef::NullArray,
            RedisBufSplit::NullBulkString => RedisValueRef::NullBulkString,
            RedisBufSplit::Int(i) => RedisValueRef::Int(i),
        }
    }
}

We can now implement the Decoder trait so our parser fits in with the tokio machinery:

/// The struct we're using. We don't need to store anything in the struct.
/// Later on we can expand this struct for optimization purposes.
#[derive(Default)]
pub struct RespParser;

impl Decoder for RespParser {
    type Item = RedisValueRef;
    type Error = RESPError;
    fn decode(&mut self, buf: &mut BytesMut) -> Result<Option<Self::Item>, Self::Error> {
        if buf.is_empty() {
            return Ok(None);
        }

        match parse(buf, 0)? {
            Some((pos, value)) => {
                // We parsed a value! Shave off the bytes so tokio can continue filling the buffer.
                let our_data = buf.split_to(pos);
                // Use `redis_value` defined above to get the correct type
                Ok(Some(value.redis_value(&our_data.freeze())))
            }
            None => Ok(None),
        }
    }
}

We did it! We can now decode RedisValueRef's from bytes off a socket! A complete parser includes encoding RedisValueRef's, but the code is pretty simple so you can read it here. You can view the tests here and how it's actually used in redis-oxide here.
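
If you're curious what the encoding direction looks like, here's a minimal sketch (illustrative, not the exact redis-oxide code). It just walks a RedisValueRef and writes the RESP bytes back out, encoding all strings as bulk strings:

use bytes::BytesMut;

fn write_redis_value(item: RedisValueRef, dst: &mut BytesMut) {
    match item {
        // Strings become bulk strings: $<len>\r\n<bytes>\r\n
        RedisValueRef::String(s) => {
            dst.extend_from_slice(format!("${}\r\n", s.len()).as_bytes());
            dst.extend_from_slice(&s);
            dst.extend_from_slice(b"\r\n");
        }
        RedisValueRef::Error(e) => {
            dst.extend_from_slice(b"-");
            dst.extend_from_slice(&e);
            dst.extend_from_slice(b"\r\n");
        }
        RedisValueRef::ErrorMsg(msg) => {
            dst.extend_from_slice(b"-");
            dst.extend_from_slice(&msg);
            dst.extend_from_slice(b"\r\n");
        }
        RedisValueRef::Int(i) => {
            dst.extend_from_slice(format!(":{}\r\n", i).as_bytes());
        }
        // Arrays write their size, then recursively write each element.
        RedisValueRef::Array(array) => {
            dst.extend_from_slice(format!("*{}\r\n", array.len()).as_bytes());
            for value in array {
                write_redis_value(value, dst);
            }
        }
        RedisValueRef::NullArray => dst.extend_from_slice(b"*-1\r\n"),
        RedisValueRef::NullBulkString => dst.extend_from_slice(b"$-1\r\n"),
    }
}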

Conclusion

Overall, this 500+ line journey has netted us an efficient, zero-copy RESP parser built from parsing combinators. It was a lot of work to get the project refactored, but I am proud to have a solution I wrote myself and actually understand (no offense to the combine people).


Linux and Fuzzy Feelings

I've been using (GNU/)Linux for the better part of a decade now, and have come to appreciate my attachment with the software. There's plenty of disadvantages and frustrations with using it as your daily driver. In fact, I write this loving article after dealing with:

  • A BIOS update that made my motherboard forget that Linux exists.
  • After fixing that, the cool kernel I wanted to try failed to start the graphical target.
  • For some reason, CTRL+<Mouse Wheel> isn't zooming webpages in.
    • Update: 30 minutes later into writing the article, it works again. No idea why.

So, something is keeping me here.

Why do I use Linux then?

What other people say:

People usually post the following list on hackernews to incite some type of FOMO:

  • Better development experience
  • Security and Privacy
    • A sense you're not beholden to the whims of the machine god (win10 updates, etc)
  • The shellβ„’, emacsβ„’ and vimβ„’
  • Some hyper customized, one of a kind setup, that is literally impossible on OSX and Windows

What I usually say:

  • Better development experience
  • Security and Privacy
    • A sense you're not beholden to the whims of the machine god (win10 updates, etc)
  • The shellβ„’, emacsβ„’ and vimβ„’

What I actually mean:

  • I've sunk an immense amount of time and energy in understanding this operating system and how to be proficient in it.
  • I've developed custom workflows that you just can't recreate faithfully on other platforms.
    • No really, I've tried. OSX is good, but it's still OSX.
  • I have a real sense of ownership over my software and hardware using it.

That last point is very important. I'm not subservient to the machine gods. There's no product manager pushing some bullshit change for the sake of it to fuck up my established workflow without recourse (1 week slack WYSIWYG :D).

I've got my tools, and they are old. They don't change often, and rarely for the worse. The core ones – Emacs and Vim – will last well into the next century. The other tools are less essential, like my web browser, shell, and window manager. I've frequently changed those over time, and just rebind the keys to fit muscle memory as required.

What's in the back of my mind

  • Fading skills on other operating systems. I used to have the control panel memorized. Now I don't.
  • Less attention to features on other operating systems (eg. you can configure the touchbar on OSX)
  • At work, I'm "that guy". I'm the 4.5% that uses emacs + terminal. This has occasionally caused tension when someone wants to show me something, and bizarre shit happens when they mash keys.
    • And I can't really help much with IDE setup issues outside of interactions with the terminal (e.g. PATH issues).
    • Rebinding caps lock to control messes with some people at a deep level. I had no idea anyone actually used the key.
    • The anxiety of someone watching me causes bizarre stuff to happen, including lowercasing an entire file, highlighting only the whitespace, and sorting a python script lexicographically.

Fuzzy Feelings

Hopefully it's clear by now that I find Linux familiar and comfortable, if nothing else. I recently built a PC and decided to install Arch Linux for the first time, and it pretty much worked! I ended up with an operating system where I had installed all of the required bits myself, and a greater understanding of all the parts required to keep it running. The feeling of leaving that chroot and praying I made the boot entries properly was a thrill.

The fact that this new computer, shoddily constructed by myself and my brother, boots into a viable and familiar operating system that I control is fucking amazing. God bless the opensource community.

Finally: My Setup

It doesn't seem right to write a love letter blog about Linux and not share my setup.

The setup I've roughly used since my CrunchBang days is to have four virtual desktops1 set up as follows:

  1. The web browser. Used to be Chrome before Firefox became good again.
  2. Emacs.
  3. Terminal.
  4. Empty. Used for temporary or less important applications. Things like graphical file browsers, word processors, or music clients.

I deeply enjoy this setup as it's baked into my muscle memory. There's no distinct advantage in and of itself, aside from maybe separation of concerns. The base system doesn't matter much. I used Antergos until they shut down, and now I just use vanilla Arch Linux (btw). My window manager is i3 with some scripts to polish the experience. In fact, at work, I recreate this setup fairly well on the OSX machines we work on. The individual tools (emacs) don't work as well or in the same way, but it's similar.

Conclusion

I feel at home in my Linux setup. It's comfortable. I'm responsible for it, and can tweak it to my needs. And it looks pretty. And so on.

I get fuzzy feelings thinking about it sometimes.


Learning and Loving Rust Combinators

I've been working pretty regularly on redis-oxide, a pure rust clone of redis, and the following snippet surprised me:

HashOps::HLen(key) => match read_hashes!(state, &key) {
    Some(hash) => hash.len() as Count,
    None => 0,
}.into()

By itself, this code isn't extraordinary. It just checks how many elements are in a given redis hash (dictionary), defaulting to zero. The fact that it doesn't use combinators is what surprised me. Most of my code recently has taken the following form:

HashOps::HLen(key) => read_hashes!(state, &key)
    .map_or(0, |hash| hash.len() as Count)
    .into()

So What?

Combinators are nice as they allow for concise, and often more readable, type transformations. Both examples have similar line length, but differ in visual complexity. The first example has the match keyword, a block, and pattern matches on option's variants. Visually this is more text to parse and understand, especially when you're tired after work. The second, "combinated", example is in my opinion easier to read. We're reading our state, transforming the hash type into a Count, and then discarding the None case by defaulting to 0.

We'll see that there's a few more advantages, such as eliminating branches and general purity.

What are Combinators?

Combinators are a realization of a pretty common pattern: "I can't deal with this type. I'll just return it." You've likely used this pattern many, many times. Do you recognize the following snippets?

In python:

def sketchy_legacy_code(n):
    if n is None:
        return None
    return n * 2

Or in javascript:

function sketchyLegacyCode(n) {
  if (n === undefined || n === null) {
    return n;
  }
  return n * 2;
}

This pattern certainly occurs more frequently in weakly typed or dynamically typed languages. A (sometimes reasonable) solution to this problem is exceptions, but rust doesn't have them. And exceptions can be problematic, as you can forget to catch them or catching them makes the code harder to understand.

Combinators are then a generalization of this concept. If I can operate on this type, I'll do so. Otherwise, I'll pass it along. This is a very powerful pattern. You can focus on the happy path and errors will be propagated automatically if they occur.
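
In rust, that whole pattern collapses into a single combinator call. A quick sketch of the same sketchy function:

// `map` only runs the closure on Some(n), and passes None along untouched.
fn sketchy_legacy_code(n: Option<i64>) -> Option<i64> {
    n.map(|n| n * 2)
}

fn main() {
    assert_eq!(sketchy_legacy_code(Some(2)), Some(4));
    assert_eq!(sketchy_legacy_code(None), None);
}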

Why use Combinators in Rust?

In the same way that monads saved haskell, combinators help save rust. Rust is notorious for its error handling, as you're forced to encode your errors directly in the type system. This usually takes the form of returning Option<T> and Result<T, E>, which your callers consume.

So rust provides a thousand tools to cut through these types. I'll focus on Option<T> in this article, as I'm working with a forgiving database. An Option<T> in rust is an enum:

enum Option<T> {
    Some(T),
    None
}

So if I receive an Option<T>, it's either got data for me (Some(T)) or nothing for me (None). This type is very common in redis-oxide and most rust projects. Sometimes you don't care if it's None, so you can use if let Some(value) = myoption {...}, or just let value = myoption.unwrap(). But sometimes you end up with a pathological case.

A Pathological Case

The redis command ZSCORE zset_name member is such an example. It has the following (observed) behaviour:

  1. If the sorted set does not exist, return Nil.
  2. If the sorted set exists, and the member isn't found, return Nil.
  3. Otherwise, return the score of the member as Int.

Let's code this using match statements:

// Inputs: zset_key, member
match read_zsets!(state, &zset_key) {
  // The sorted set may not contain `member`, so another option!
  Some(zset) => match zset.score(&member) {
    // Notice that we have to explicitly give the ReturnValue type.
    // We can't wrap it as we need to return Nil!
    Some(score) => ReturnValue::IntRes(score),
    None => ReturnValue::Nil
  },
  None => ReturnValue::Nil
}

Now this can nest even deeper if we have more options or results to deal with. And we're repeating ourselves with ReturnValue::Nil.

Thankfully, we can use the and_then combinator to un-nest one of our match statements. Let's ignore the None from read_zsets!, as it's no different from the None from zset.score.

match read_zsets!(state, &zset_key).and_then(|zset| zset.score(&member)) {
  Some(score) => ReturnValue::IntRes(score),
  None => ReturnValue::Nil
}

Ok, it's better. But now the line is long and awkwardly formatted. Using and_then transformed our type into Option<Score>, and we would like Option<ReturnValue::IntRes>. We can get there with a map:

read_zsets!(state, &zset_key)
    .and_then(|zset| zset.score(&member))
    .map(ReturnValue::IntRes)

Now our only problem is the return type. We've reached Option<ReturnValue::IntRes>, so we can use the unwrap_or method to deal with the missing key case:

read_zsets!(state, &zset_key)
    .and_then(|zset| zset.score(&member))
    .map(ReturnValue::IntRes)
    .unwrap_or(ReturnValue::Nil)

We did it! We've concisely encoded ZSCORE in rust in a hard-to-fuck-up way. Any sweeping refactors won't forget to change a branch, as there are no branches. You've just transformed the types a few times to achieve the desired result.

A Detour into the Rust Language Source Code

The example above may seem clear to you, but sometimes it's easy to get caught up in a flow.

So what exactly does map do, and how is it different from and_then?

Thanks to modern software engineering practice, we can just check the source code.

Here's how rust implements map on an Option<T>. The signature below says: I take a function with one parameter of type T that returns a type U, and I will then give you an Option<U>.

pub fn map<U, F: FnOnce(T) -> U>(self, f: F) -> Option<U> {
    match self {
        Some(x) => Some(f(x)),
        None => None,
    }
}

So this seems reasonable. It's similar to the python and javascript examples above. However, that Some(f(x)) can cause issues. What if my function returns an option, like zset.score? Welp, this expression:

read_zsets!(state, &zset_key)
    .map(|zset| zset.score(&member))

Has type Option<Option<Score>>. Not good. That's why and_then exists:

pub fn and_then<U, F: FnOnce(T) -> Option<U>>(self, f: F) -> Option<U> {
    match self {
        Some(x) => f(x),
        None => None,
    }
}

So and_then does not wrap our result in Some; instead it relies on the passed function to return an Option. That's why we used it in our previous example: zset.score already returns an Option, so we just use that one!
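
As a quick illustration with plain Options (toy values, not redis-oxide code):

// `half` is fallible, like zset.score: it returns an Option.
fn half(n: i64) -> Option<i64> {
    if n % 2 == 0 {
        Some(n / 2)
    } else {
        None
    }
}

fn main() {
    let n = Some(4);
    // map wraps the closure's result, so we end up nested.
    let mapped: Option<Option<i64>> = n.map(half);
    // and_then trusts the closure to produce the Option itself.
    let chained: Option<i64> = n.and_then(half);
    assert_eq!(mapped, Some(Some(2)));
    assert_eq!(chained, Some(2));
}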

Conclusion

Overall, combinators are useful for making concise type transformations. You can decompose your problem into a series of handy transformations, and work your way to the solution. This is very common in functional languages, like haskell's do notation or Clojure's threading macros (and nil punning).

For sure, if this ends up on hackernews, someone will point out the issues with combinators. The worst of them is probably the "what type am I working with?" problem, or borrowing issues.

That said, their use in redis-oxide is likely of great benefit, especially for future maintenance.


Tips for Surviving the University of Waterloo

Now that my brother is in his first year at the University of Waterloo, I was inspired to write about some tips to get you through it. I've been at the University of Waterloo (abbreviated UW) for just under four years now, and I've come to appreciate the value of investments. I'm talking about investments that'll ease the burden of the University of Waterloo's relentless work-school cycle.

For those of you who are unfamiliar with UW, it's a university in Canada that specializes in providing industry connections ("co-op") for its student population. These students will flip-flop every four months between a school term and a work term. I'll refer to this process as the Co-op cycle (co-operative cycle).

In a school term, a student can expect to juggle job applications, midterm studying, assignments, midterms, interviews, and more. It's certainly no surprise that this cycle can be brutal, especially for the inexperienced or unorganized.

UW has the ability to warp reality itself. You'll find yourself marvelling at all the free time you have while working a 40+ hour job with hours of travel time a day. It's a common joke among students to recall their parents extolling the difficulties of a full-time job when you're dealing with 60+ hour spikes in workload (to be fair, these students also do not have children).

When you're in UW's bubble, you may feel like the workload is unreasonable, especially when compared to your perception of other universities. My rebuttal is this article. You can make investments in yourself and your tools to automate this burden away.

Please note, however, this article will be based on getting through Computer Science at UW, as that is my experience. However, I'm sure some of the advice will apply to other fields of study or universities.

Invest: In yourself (warning: philosophy ahead)

The best investment you can make to get through UW is to invest your spare time and resources into bettering yourself. There's the obvious advice: hit the gym, eat better, and sleep longer. That advice is well elaborated elsewhere.

I think there's some bigger advice that's often less-internalized than it should be.

What I feel is underappreciated is the need to focus on what you want, and determine pathways to get there. You're probably young and if you're at UW, you're surrounded with opportunity.

This is not new advice, as most self-help resources ask you to set a goal, and sub-goals to work towards it. This is easy to think about and easy to apply in linear pathways (e.g. you can do assignments one question at a time), but difficult to apply in less linear pathways, like life.

Life winds about. Things happen. You have successes, upsets, and plateaus. It can be easy to get caught up in the moment and just survive day-by-day.

Long term planning, by contrast, is more difficult. There are many things working against you here:

  1. Failure is something most people don't want to ponder. It's often sickening.
  2. It's easy to excuse yourself (and not others!). Your brain is wired to protect itself.
  3. It's easy to dream unrealistic dreams; easier to think than act.
  4. You're human. You're not "ideal". You will make mistakes, miss things, and probably be hypocritical.

These points center around failures of one kind or another. I obsess over these four points as they offer great advice on how to get to where I want to be. So, with these flaws in mind, what should you do?

First things first, you need to determine what you want. I want financial success and a stable career, and to use this as a platform for some sort of philanthropy. I'm sort of wired for this stuff, as I've known this from my early teens. So, unless you've been blessed like me, you'll need to determine your values and use those to guide you. Note: If you don't know what you want, that's OK. You should ensure you're always keeping opportunities open for yourself when you figure it out.

Then, you need to make plans. I've developed a system I call the "delta system". I'm sure someone else has invented it prior (or it's common knowledge), but in the same way that I don't blog unless I've made the blogging platform, I won't follow a system I didn't make myself. This simple system is: measure where you are today, where you were previously, and where you want to be.

Measuring is obviously the important part. Look critically at your life. If you experienced a failure, what caused it? If you had a success, what led to it? Are you happy with plateauing, or can you achieve more?

Now, once you've assessed where you are, determine the delta required. What actions can you take to get closer to your goals? What actionable item could you be doing in the short-term, to invest in long-term success? For example, if you want long-term financial stability, setting up a budget is an immediate delta towards achieving it. Or if your long-term goal is career success, and you've failed previously at communication, your immediate delta is to find advice. Your later deltas would be to implement the identified solutions.

Now, this is simultaneously a little robotic and highly personal, and that's on purpose. If modern times have shown anything, it's that gamified systems are easier to follow. (see: mobile games, snowball debt repayment, Chinese social credit score)

A personal example of this is the process I underwent to get into UW. I had long wanted a good career, and it was now time to identify pathways. At this time tech was ballooning despite the economic crisis, and this was a topic I had interest in.

Now, I was a pretty bad junior high-school student, with subpar grades for getting into a good university. I had to get serious. I took to Khan Academy and took a serendipitously timed math course in the summer between grades nine and ten. Great! I had the grades, but I realized I needed more. My next delta would be to build a portfolio, to help me get into UW and land tech jobs. So I built a portfolio. And then things worked out.

I made the necessary investments, and often little more. Do keep in mind that this paragraph does not detail the years of blood, sweat, and tears that go into something like this. A five hundred character "delta system" does not guarantee success, it merely organizes the effort that goes into being successful.

My final note on investing in yourself is just this: just as GΓΆdel showed a mathematical system cannot prove its own consistency1, an imperfect character cannot perfectly judge his own character. You need to surround yourself with positive, supportive people, and carefully learn from them.

Invest: In your skills

Now that we've made it past personal philosophy, we can talk about more practical matters. As a UW student, you'll need to compete against your fellow students for coveted positions. This means you need to be good at what you do. For CS in particular, there are a few areas to get good at:

  1. You should understand common algorithms and data structures, and internalize their concepts.
  2. You should be proficient in at least one industry language (One of C/Python/Java/JS2 is fine), along with familiarity with industry tech.
  3. You should understand that getting an interview, doing an interview, and working a full-time job are three separate skills.

So, let's wade deeper.

Algorithms & Data Structures

If you're in CS, you really don't have a choice here. A large part of your education in CS will be learning about algorithms and data structures, and measuring their complexity. Do your best to gain an intuitive understanding of why these things actually work.

I was blessed to have a curious little brother, so a natural framework I use is to distill these things into something a child could understand. So, for example, why does a (balanced) binary search tree have O(log n) search time? It's not immediately clear how log fits into the picture, as most people usually study logarithms in a calculus context. If you're educated in this area, do try to come up with a good, simple explanation. I would probably say something like: well, log can be used to measure how many times we can halve something3, and if you follow an example, you'll notice that we cut the tree in half each time as we search, until we reach the bottom. (with lots more elaborations)
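
If you want to make that halving idea concrete, here's a toy sketch (illustrative only):

// Count how many times we can halve n before hitting 1 -- this mirrors
// the number of levels a balanced BST search walks, roughly log2(n).
fn halvings(mut n: u64) -> u32 {
    let mut count = 0;
    while n > 1 {
        n /= 2;
        count += 1;
    }
    count
}

fn main() {
    assert_eq!(halvings(8), 3); // 8 -> 4 -> 2 -> 1
    assert_eq!(halvings(1_000_000), 19); // a million nodes is ~20 levels
}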

If you're not in CS, pick these things up. There is a dizzying amount of material out there about this, so my recommendation is to pick a popular introduction to CS and finish it. Then work your way up. Then this stuff will apply better.

So, my advice is:

  1. Study and be able to reproduce common algorithms/data structures.
    1. E.g. Can you sort a list of integers? What's an appropriate way to do it? Merge sort, radix sort, heap sort, quicksort? (hint: almost always quicksort)
    2. E.g. What's a linked list? What's a stack? What's a tree? What's a graph? What's a hash table? How can you use these primitives to model problems? Which ones would you choose?
    3. E.g. Here's some spaghetti code, what's the time complexity of it?
  2. Understand, and recognize, the importance of converting data structures into other ones. (you most certainly do this all the time).
    1. Why do we convert data structures into one another? What's the advantage of converting a list of words into a radix tree? When is it worth it?
  3. Study the memory requirements of data structures/algorithms. This is a science after all, so practical applications are sometimes necessary.

Proficiency in at Least One Industry Language & Industry Tech

This is actually pretty easy if you're curious about this stuff. Industry languages tend to be the most popular languages, so you've probably already picked one up. Proficiency is only gained through practice4, so keep making programs. Once you've gained enough experience, the world is your limit. There are few other professions where the cost of entry is so low. Don't like something? Screw it, make your own.

Similarly, industry tech tends to be the popular tech. I do recommend using Linux or at least some Unix environment. UW's CS club fee is like $2/term, and you get access to a bunch of Linux boxes. Even better, install Linux and use it as your daily driver. You'll find some rough edges you'll need to program around. Besides, it's vital to be comfortable with the command line. It's by far the most productive environment, especially when you hit a wall in your GUI editor (plug: emacs). You will probably want to understand how to traverse the file system, copy files, and pipe bash commands together.

Other bits of popular industry tech include docker, react, Django, etc. Look around online for popular frameworks and tools.

The Art of the Interview

As mentioned above, there's a reality-warping field in the tech industry which has divided the job hunt into the holy trinity of the Resume, the Interview, and then the actual Work.

There's an immense amount of advice on the internet about all three of these, but here's what personal experience has taught me:

Resume:

  1. It must stand out visually but must be familiar.
  2. Be as concise as humanly possible, and convey what you accomplished.
  3. Use active voice. If you can stick "by zombies" at the end of a sentence, it's passive voice, and it won't read well to the recruiter.

Interview:

  1. Practice beforehand; map your skills and experience to the position.
  2. Get comfortable with getting stuff wrong. You won't always get the question the instant it's asked. They'll help you, and learn more about you.
  3. Communicate, communicate, communicate. If you need a second to think, tell them.

Actually working:

  1. Be yourself, but don't be a know-it-all.
  2. Ask questions. Fuck up. Learn. That's the point of co-op.
  3. Dress nicely and be friendly. If you're introverted like me, take the effort to get to know your co-workers.

Invest: In your templates

This section will probably yield the best time savings relative to the effort.

In my previous blog article I described the importance of having an optimized system for editing text, and this section will deal with having optimized systems for producing documents.

I've developed a few such templates, and I'll share three of them.

The Resume and Cover Letter

Before I get into this, I highly recommend using LaTeX or similar templating tools. The resume websites are nice and easy, but everyone uses them, so you may not stand out. I keep my templates in source control, and as they're text, I can do dirty things to them with emacs.

This is probably my most complex template, as it contains several hundred lines of commented out sections that I add in or remove depending on the context. For example, my resume source has a bunch of these blocks:

%% \entry
%%     {2015}
%%     {REPORT ON TECHNOLOGY IN THE CLASSROOM (MSAC)}
%%     {}
%%     {I coordinated, managed, and shipped a report on technology in the classroom to the Ontario Minister of Education. \\
%%     Key Skills: \textit{Effective Communication, Problem Solving, Analytical thinking}}

Depending on the job or position, I toggle these sections to better fit my resume to the job. I then compile it and submit it.

Of course, a template also has to have variables. This is pretty easy in LaTeX. In my cover letter, you'll find a bunch of things like:

\newcommand{\positionTitle}{Developer}
\newcommand{\recruiterName}{Hiring Team}
\newcommand{\companyName}{COMPANY}
\newcommand{\sppp}{\hspace{20pt}}

And use it like this:

Regarding the \positionTitle{} position currently advertised on

Obviously, for more important applications, I add new handwritten sections and save them for future use.

Assignments

This one is pretty easy if you're using org-mode. I usually just copy/paste the header of a previous assignment, which looks like:

#+TITLE: CO487 - A2
#+AUTHOR: David Briggs (09876432)
#+EMAIL: [email protected]

#+OPTIONS: toc:nil num:0
#+LATEX_HEADER: \usepackage{mathtools}

\setlength{\parindent}{0em}
\newcommand{\eqdef}{\vcentcolon=}

Otherwise, you'll probably need to use LaTeX for your CS assignments. Develop a common enough document and reuse it later.

Work Term Report

If you're not familiar, at UW, you need to write a report on something relating to your work term. The grade weighting is all messed up, and the format is weighted as much as the content itself. So, if you have a perfectly formatted template, you can write a few thousand words of hot garbage and pass the assignment.

So, I built an extensive, nearly perfectly formatted template with LaTeX and org-mode. Nearly everything is automated. So when I need to write one, I fill in the variable section (see below), and then spew bullshit.

\newcommand{\theTitle}{Neato Documentation and Tutorial}
\newcommand{\bossCalling}{Mr.}
\newcommand{\bossFirst}{Foo}
\newcommand{\bossLast}{Bar}
\newcommand{\companyCity}{Waterloo}
\newcommand{\companyCityProvince}{\companyCity{}, Ontario}
\newcommand{\companyName}{SomethingInUW}
\newcommand{\companyPostalCode}{N2L 0Z0}
\newcommand{\companyProvinceShort}{ON}
\newcommand{\companyStreet}{Something Dr}
\newcommand{\currentProgram}{Computer Science}
% … snip (more variables/latex) … %

And that's it. Develop one of these templates in your 1B coop and it'll pay dividends for years to come.

Conclusion

All said, UW is like any university, plus the constant job-finding part. It's a lot of work, but if you make good investments, it's way easier.

David Briggs


Spacemacs for Fun and Profit

I've been using Spacemacs for a few years, and emacs for a few years before that. If you're not familiar with Spacemacs, it's an emacs/vim hybrid thing with a functional setup system. You can use it to edit text, code, email, play tetris, use M-x doctor as a cheap therapist, and uh, blog? This article is going to be about how I got into Spacemacs, and the best parts of it.

Throughout this article, you're going to see some key sequences. So when you see M-x foo, that means you've pressed the alt key, hit the x key, and then typed foo. Or if you see SPACE+b+b, you've pressed the space bar, then the b key, and then the b key again. This key sequence stuff is one of the few things in common between vim and emacs.

Getting My Feet Wet

I've been programming for around 7 years now, and have been using emacs for like 6 of those years. It's really like home for me. I used a really crappy laptop (bad even at the time), so GUI programming wasn't a question. So I started out building things like calculators, zork style games, and other text based things. My timeline would then be:

  1. Create god-awful scripts & games using Python's IDLE on Windows. All text of course.
  2. Experiment with Linux, finding out IDLE doesn't come by default, and not understanding you can just install it
  3. Trying out vim and noping out (how do I exit?)
  4. Use emacs with cua-mode, which lets you use windows keybindings for copy/paste (ctrl+c/ctrl+v). Feel like a HackerMan.
  5. Pickup clojure as hey, lisp and emacs are like PB&J. (paredit is something special).
  6. Develop repetitive motion injury (emacs pinky) developing my first real project.
  7. Shop around for an alternative, and read about vim's modal paradigm. How it prevents your hands from moving too much.
  8. Get frustrated with evil-mode (vim keybinding for emacs), discover spacemacs.
  9. Spend the next year flipping between emacs mode and vim mode. Learn enough vim to be dangerous.
  10. Learn and tinker with spacemacs ad-infinitum.

What's So Special About Spacemacs?

The biggest thing that comes to mind is uniformity.

I literally don't need to think, it's all in my hands. Everything just sort of works the same or similarly1. For example, I had to recently learn scala for my latest job, and the Spacemacs scala layer (a bundle of functionality related to a language or tool) works pretty much the same as all of the other layers. I just add the scala layer to my .spacemacs file, carefully read the documentation, and I'm up to speed. All of the keybindings and behaviors I expect are there. It's a beautiful thing.

The second biggest thing is that spacemacs just works. I mean that everything mostly works out of the box. If you've been in this scene for years, that's just a treat. Take, for example, the footnotes at the bottom of this blog article. Those didn't exist in version 303f29b of my site. To implement them, I literally just:

  1. SPACE+b+b, type blo, this fuzzy finds blog.rs, hit enter, this opens the file, and I add the footnote field to the OrgModeHtml field.
  2. / to quickly search though blog.rs for slug (the closest field to footnote), and add the parsing logic.
  3. SPACE+p+f to open a fuzzy searcher for all project files, and add the footnote section to blog_article.tera.html.
  4. cargo test to make sure I didn't break parsing somehow.
  5. Spend another few minutes coding after you realize footnotes aren't nested, as you need an Option<Vec<String>> to collect footnotes.

This whole process took less than six minutes. You're following an optimized2 text editing paradigm. You've spent years slowly memorizing a myriad of obscure keybindings3, so you get quick at making changes.

This is mostly thanks to vim's modal text editing model, and Spacemacs' fantastic keybinding system.

The Modal Model of Editing

So if you're already familiar with this concept, this section won't do much for you. Otherwise, let's talk about it with some examples.

In most IDE's and text editors, you probably use the mouse for at least some things. If you're just starting out, you'd probably use the mouse quite a lot (I did anyway, that's what you're used to).

The central idea of advanced text editing techniques is that using the mouse is slow and (for me) bad for your wrists. So, ditch it. Try not to use it.

Ok, so now we can't use the mouse. How do we move around in a file? We have, uh, arrow keys?

No, we literally learn another language to move around in and edit a file.

In particular, vim's syntax is more-or-less <repeat> verb modifier object. You're effectively writing little sentences that program your editor for you. Verbs are things like "move" or "change", modifiers are kinda complicated ("in", "find"), and objects are things like "words", "blocks", and "matching pairs". The <repeat> is a cheap way of running the same sentence over again <repeat> times. It's just a number.
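
For instance, here's how a few standard vim sentences decompose:

dw    -> verb: d (delete), object: w (word)
3dw   -> repeat: 3, verb: d (delete), object: w (word)
ci(   -> verb: c (change), modifier: i (in), object: ( (parentheses block)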

Furthermore, it's called "Modal" because there are modes. In vim, you start in normal mode, which is where you can use most of the features of vim. There's also insert mode, where vim functions like most editors (i.e. you can actually type text), and then visual and visual-block mode. As you don't have a mouse, you sometimes want to select a big chunk of text. You can do that visually with visual or visual-block mode, or you can use more complex vim sentences πŸ˜…. There's a lot of choice in this stuff.

If you're not familiar with the topic already, this may seem complicated. But it's scary just like ΠŸΡ€ΠΈΠ²Π΅Ρ‚ ("hello") looked scary before I took Russian 101. Let's do some quick examples. As there's no mouse, you'll have to pay attention to the cursor (denoted |).

Deleting some words

Say I have a file with some words in it, and I don't like those words:

moist responsibility yeet

Let's delete them. So if the cursor is at the front:

|moist responsibility yeet

What you want to do is Delete Word, or in vim terms, dw:

|moist responsibility yeet
--> type 'dw', this document becomes
|responsibility yeet

Hell yeah, moist is gone. We can just repeat this two more times and we're out of hated words:

|responsibility yeet
--> type 'dw', this document becomes
|yeet
--> type 'dw', this document is now empty
|

We did it! But that was actually inefficient. We forgot about <repeat>. We could have just deleted words three times!

|moist responsibility yeet
--> type '3dw', this document is now empty 
|

That was fast, but not fast enough. If we notice that all of the words are on the same line, we could have used the "End of Line" object (in vim, it's $):

|moist responsibility yeet
--> type 'd$', this document is now empty 
|

…and that's not even fast enough. If you've noticed that all of the vim terms we've used are lower-case, a cookie for you. There's generally an uppercase version of everything. For example, D is usually defined to be d$, so we can do this with one key!

|moist responsibility yeet
--> type 'D', this document is now empty 
|

Visually manipulating stuff

So, as we saw above, there's a bunch of ways to delete text. There's also some good ways to select text.

Say, for example, you have the rust struct below you'd like to copy to another file. Again, we'll use the pipe character | to denote the cursor.

// file: foo.rs
  |struct FooBar {
    baz: String,
    jaz: u32,
  }
// file: bar.rs

We have a couple of options. The easiest is probably Visual Selection (v), Move down a few lines (j a few times), and Copy (y, "yank"), then goto bar.rs (SPACE+f+f bar.rs), and paste (p). Phew, let's see it in action:

// file: foo.rs
  |struct FooBar {
     baz: String,
     jaz: u32,
   }
//-> press v, j j j (the |> means this line is visually selected)
  |>struct FooBar {
  |>  baz: String,
  |>  jaz: u32,
  |>}
//-> press y to copy it
  |struct FooBar {
    baz: String,
    jaz: u32,
   }
//-> SPACE+f+f bar.rs, enter, and paste it (p)
// file: bar.rs
  |struct FooBar {
    baz: String,
    jaz: u32,
   }

We did it! It made it. This probably takes me a few seconds, where the longest delay is opening bar.rs. As before, we could have used 3j to move three lines down, or notice that FooBar is at the end of the file, so G ("goto bottom") would move us to the bottom of the file to select it.

As making these diagrams is distracting me from writing about spacemacs, I'll just do one more quick example.

Visual-Block mode to quickly edit parallel text

Returning to our previous example, plus complications, let's quickly add pub to each struct field. We could just use h/j/k/l to move around (like arrow keys), or mash w, but let's look at a more elegant method.

As baz, jaz, and the rest are parallel, we can use Visual-Block mode to add pub before each of them. If you've heard of multiple cursors (a separate concept), this is kind of like having multiple cursors. We'll first position our cursor at baz, then enter Visual-Block mode, then select all the rows, and then insert "pub " (that's a space at the end). Let's see it:

// file: foo.rs
 |struct FooBar {
    baz: String,
    jaz: u32,
    kaz: u32,
    haz: u32,
  }
//-> Lets move the cursor to baz, j w
  struct FooBar {
   |baz: String,
    jaz: u32,
    kaz: u32,
    haz: u32,
  }
//-> Enter Visual-Block mode, CTRL+V, move down to select rows j j j
  struct FooBar {
   |baz: String,
   |jaz: u32,
   |kaz: u32,
   |haz: u32,
  }
//-> Insert text, I (capital i), type pub(space)
  struct FooBar {
   |pub baz: String,
   |jaz: u32,
   |kaz: u32,
   |haz: u32,
  }
//-> Press escape to enter normal mode, vim will repeat each action
  struct FooBar {
   |pub baz: String,
    pub jaz: u32,
    pub kaz: u32,
    pub haz: u32,
  }

Nice!

More Nice Things About Spacemacs

Aside from being emacs and offering a solid vim experience, Spacemacs has plenty for the average developer.

  1. Its layer system makes it very easy to add further languages and other random bits of functionality
  2. Its consistent and discoverable key sequences. Seriously, hit SPACE and read all the crazy stuff you can do.
  3. It's emacs, and therefore literally infinitely customizable. I recently spent a day getting Firefox to work in a buffer just so I don't have to leave emacs.
  4. If the programming language exists, it's got an emacs mode (which roughly translates to a spacemacs layer).
  5. Spacemacs tries its best to make vim work everywhere in emacs.

What's crazy is that this stuff (1) works and (2) works together. That section above on Visual-Block mode also works in the emacs file browser (generally dired). Need to prefix 45 test files with "test_" (so that foo.py becomes test_foo.py)? It's the same stuff, over again. Enter the dired edit mode, enter Visual-Block mode, select the start of every file (hit G), add test_, and hit escape. You just renamed 45 files in like 5 seconds. No need for bash scripts like:

for i in $(ls); do
  mv $i test_$i
done

All you really need is emacs.

Extra Reasons to Love Emacs

Org-Mode

org-mode is an emacs mode for the markup format org. It's quite amazing.

Literally anything you need to do can be done in org-mode:

  1. Create blog articles (you're reading an org-mode document right now, rendered on my website)
  2. Create large, complicated design documents. Oh, you need to export to confluence? There's an export for that: org-export-as-confluence.
  3. Write all of your assignments in LaTeX? Organize them in a nice tree structure with org-mode, inline LaTeX where needed, and export as a beautiful pdf.
  4. Need to do some credit card calculations? There's spreadsheet functionality built into org's tables.
  5. Need a calendar? org-calendar
  6. Need an agenda? org-agenda
  7. Need a quick, per-project to-do list, that integrates with your agenda + calendar? org-agenda-todo
  8. Need a simple, github render-able markdown format? org-mode.
  9. Need a nice, simple presentation? org-present

The list goes on and on. org-mode and .org are an example of a highly productive format. Nothing I've encountered comes close to the level of integration and quality that org-mode offers.

You can add org-mode to spacemacs by adding the org layer.

Magit

Magit is probably the best git porcelain to date. It's a lightweight way to manage your git repos. It offers everything one could need:

  1. A nice way to preview diffs, stage hunks, and commit them (no more git add . and a messy history)
  2. A nice way to deal with merge conflicts, with an emacs-integrated side-by-side viewer.
  3. Zero-thought pushing/pulling/branching/etc. It's just SPACE+g+s p p to push your local commits.
  4. Need to see what commit a particular line was edited in? SPACE+g+b to inline git blame

Seriously, if you use emacs, or are planning to, make this a must-have package.

You can add magit to spacemacs by adding the git layer.

More Advanced Text Editing techniques / General Cool Things

Expanding on the list above, here's some useful techniques I use frequently:

vim-surround

As a software developer, you're often surrounding stuff with brackets or quotes or something.

Spacemacs comes with vim-surround emulation out of the box. The grammar is something like <verb>s <repeat> <modifier> <object> <surrounding-object>. The most useful verbs are y, d, and c. y lets you add a surrounding, d lets you delete one, and c changes one.

So the sequence ysiw) would say "Surround (in) this word with ()". Eg. Hello -> (Hello).

d is the opposite, so ds) would delete the parentheses. Eg. (Hello) -> Hello.

c changes the surrounding thing, cs)] would change () to []. Eg. (Hello) -> [Hello]

Macros

Vim/Emacs macros are a life saver. This is very similar to an Excel macro, if you've used one.

You start recording a macro with the grammar q<reg>, where reg is a vim register (I'll get to that). Most people use qq (store the macro in the q register). You record some actions, then hit q again to stop recording. You can replay it later with the grammar @<reg>, which is usually @q.

For example, you can quickly convert a json object to a list of variables with a macro. Just record transforming one, and replay it on the rest:

{
   |"foo": "bar"
   "baz": "yaz"
   "haz": "bab"
}
//-> qq ds" v$ s/:/ =/ j 0 w q
//-> 2@q
//-> This will transform into
{
   foo = "bar"
   baz = "yaz"
   haz = "bab"
}
//-> And of course you can use ds{ to remove the outer curly braces.

All said, macros are the perfect way to trade physical effort for mental effort. This is especially important if you're prone to RMI (like me).

Registers

Like we saw with macros, we can record stuff into registers. These are just temporary places to store text. You can prefix a sentence with "<reg> (a double quote, then the register name) and it will operate on that register.

Vim automatically places yanked text (basically any text deleted or explicitly copied) into registers "0 through "9.

So if you want to paste something you deleted two operations ago, it'll probably be in the "1 register.
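
For example (these are standard vim register commands):

"ayy   -> yank the current line into register a
"ap    -> paste the contents of register a
"1p    -> paste from numbered register 1 (a recent delete)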

I personally don't use registers much after I discovered Spacemacs had a paste transient mode, where I could just CTRL-j through my history.

Marks

Sometimes you're editing a super large file, and you need to keep coming back to the same area. You could guesstimate where it'll be by jumping to its line (415gg), or you could search for it.

A better, more stable way is with marks. You mark a location with m<reg>, edit somewhere else, and jump back with `<reg>. I mostly use a as my first register, then b, and so on. Lowercase registers are specific to a particular file, and uppercase registers are global. So if I have a spot I keep jumping to in a file, I just use mA, and then jump to it when I need to with `A.
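
In sequence, that workflow looks like (standard vim mark commands):

ma    -> set mark a at the cursor (local to the current file)
`a    -> jump back to mark a
mA    -> set global mark A (works across files)
`A    -> jump back to mark A from anywhere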

Like before, vim automatically populates the anonymous mark ` with the position before your last jump. So if you jump to the end of a file with G, you can jump back with ``.

Similarly, CTRL-o jumps backward through your navigation history. So if you've jumped three files deep finding something, use CTRL-o to walk back out.

Undo-Tree

Have you ever CTRL+z'd a bunch of things, wrote some more stuff, and realized you needed something you undid?

Most editors have a linear undo-redo history. Not emacs with undo-tree (which Spacemacs comes with).

Just run undo-tree-visualize, and you'll have an entire tree of edits to jump through, where you can grab your changes, then traverse back and paste them.

Projectile-*

Spacemacs comes with some baked in features to manage "Projects". A project is sort of loosely defined, but it's usually a source control repo (eg. a git repo).

Some useful features:

  • projectile-replace replaces a particular identifier EVERYWHERE
  • projectile-find-file opens a fuzzy finder for every file in the project.

But yeah, all told: I'm a fan of Spacemacs, and I will probably be using it for years to come. Seriously, take the dive and try it out. I haven't even scratched the surface in this article.

The joys of Rust, Tera, and CSS

I've been meaning to use the dpbriggs.ca domain for a while now, and I wanted to keep learning Rust, so this blog was born. I'm happy with the result, but by god, it was a journey. This article will detail the challenges and successes of making this website. For reference, it's built with:

  1. Rocket πŸš€ (Rust Web Framework)
  2. Tera (Fast templating language/engine, similar to Jinja2)
  3. Bootstrap (CSS framework)

And the steps taken, ordered by pain:

  1. Getting the CSS/templates looking right (~5 hours)
  2. Making the rust blog portions and testing them (~4 hours)
  3. Getting Google Cloud Platform/Cloudflare setup (~3 hours)
  4. Learning how Rocket, Bootstrap, and Tera works (~2 hours)
  5. Further modifying emacs to work better (~2 hours)
  6. Setting up my VM, and not fork bombing it (~1 hour)

So by my estimate, it took about ~17 hours of real work to get it built and deployed. If I had spent more time on step 4, the other steps may have gone faster 🀷.

So, let's get into making the website.

We'll cover the basics, then model the website in Rust, then make the Tera Templates + CSS, and then finally deploy it in production.

Getting Started

Rocket makes it pretty quick to get a website started. The simplest rocket project would involve:

0 - Making sure you're on a nightly toolchain:

➜  rustup toolchain install nightly
➜  rustup default nightly

1 - Making a new project

➜  cargo new my-rocket

2 - Adding rocket to my-rocket/Cargo.toml

[dependencies]
rocket = "0.4.0"

3 - Adding the rocket route & server launch in my-rocket/src/main.rs

#![feature(proc_macro_hygiene, decl_macro)]
#[macro_use]
extern crate rocket;

#[get("/")]
fn hello() -> &'static str {
    "Hello, world!"
}

fn main() {
    rocket::ignite().mount("/", routes![hello]).launch();
}

4 - Running the project (cargo run), and viewing localhost:8000 in a browser or just curl-ing it (images aren't supported yet πŸ˜…)

➜ curl localhost:8000
Hello, world!%

(the % is just indicating there's no newline at the end)

Actually Making the Site

There are three pieces to talk about here: Rust/Rocket, Tera/Bootstrap, and Production.

  1. For Rust/Rocket, we'll need to figure out how to represent the website, and then wire things together.
  2. For Tera/Bootstrap, we'll need to figure out how to make and style the content.
  3. For Production, we'll need to figure out how to get certs for HTTPS, and how to serve our content.

If you just want the code, you can take it from the first commit of this website.

Representing the website with Rocket/Rust

As this website is mostly static, this part is pretty easy. There are six easy pieces:

  1. blog.rs to parse our org HTML files.
  2. context.rs to store all things context (eg. site name, my email, etc).
  3. routes.rs to map URLs to functions.
  4. server.rs to setup rocket & present it to main.rs.
  5. main.rs to be an entrypoint, and start the Rocket server.
  6. tests.rs to test our site and make sure it's working right.

The code should be easy enough to understand, but let's look at the most important bit, SiteContext:

/// SiteContext represents the entire context required to render
/// this website. See [get_base_context](crate::context::get_base_context)
#[derive(Serialize, Debug)]
pub struct SiteContext<'a> {
    /// base is the static key-value context of the website.
    /// All of the information in base comes from
    /// [STATIC_SITE_CONTEXT_KV](crate::context::STATIC_SITE_CONTEXT_KV)
    pub base: &'static SiteContextKv,
    /// kv is the dynamic key-value context of the website.
    pub kv: SiteContextKv,
    /// blog is all blog related items, see [OrgBlog](crate::context::OrgBlog)
    pub blog: &'static OrgBlog,
    /// curr_blog is the current blog article, if applicable.
    pub curr_blog: Option<&'a OrgModeHtml>,
}

An instance of this struct is passed to every page. The page then uses the constituent parts to render different things. For example, the links in the navbar at the top use base to set the hrefs, and actually link you to other parts of the site.
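
For instance, a navbar link in a template looks something like this (the key name blog_uri is illustrative, not necessarily the site's actual key):

<a class="nav-link" href="{{ base.blog_uri }}">Blog</a>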

Most routes then look like this:

#[get("/blog")]
fn blog_index() -> Template {
    let context = get_base_context("/blog");
    Template::render(get_template("/blog"), context)
}

We tell rocket to call this function when it sees /blog. The function then gets the base context for /blog, and renders the /blog template.
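
Routes with dynamic segments work the same way. Here's a minimal sketch of what a per-article route could look like; the <slug> capture and the Option responder are real Rocket 0.4 features, but find_article and the "/blog/article" template name are hypothetical stand-ins:

#[get("/blog/<slug>")]
fn blog_article(slug: String) -> Option<Template> {
    // <slug> is captured from the URL and passed in as an argument.
    // find_article is a stand-in for however you look up the post.
    let article = find_article(&slug)?;
    let mut context = get_base_context("/blog");
    context.curr_blog = Some(article);
    // Returning None from this function gives the client a 404.
    Some(Template::render(get_template("/blog/article"), context))
}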

Note that special care was taken to move most of the David Briggs specific stuff to context.rs. So if you want to fork this site, you can more or less just edit that (and the templates).

As the org-mode parsing libraries weren't quite what I wanted, I ended up parsing the exported org HTML. I just grab the major sections and stick them into OrgModeHtml. Eventually I'll either make my own parser or pray someone else does.
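
A minimal sketch of that approach, assuming the scraper crate (the real logic lives in blog.rs; parse_org_html is a hypothetical name, and the selectors are illustrative of org's exported HTML, which wraps sections in div.outline-2 containers):

use scraper::{Html, Selector};

// Parse exported org HTML and pull out the title and major sections.
fn parse_org_html(raw: &str) -> (String, Vec<String>) {
    let document = Html::parse_document(raw);
    let title_sel = Selector::parse("h1.title").unwrap();
    let section_sel = Selector::parse("div.outline-2").unwrap();

    let title = document
        .select(&title_sel)
        .next()
        .map(|t| t.inner_html())
        .unwrap_or_default();
    let sections = document
        .select(&section_sel)
        .map(|s| s.inner_html())
        .collect();
    (title, sections)
}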

Templating With Tera & Bootstrap

Tera is a django style templating engine for HTML. It lets us conditionally render our HTML (among other things). This lets us do things like underline blog in the navbar above. There's a variable which tracks the current URL path, and the navbar uses it to underline the relevant section.

Tera has a bunch of features, but the key ones are:

  1. Extending templates to add/adapt content generically (think: sidebar & main content)
  2. Including templates for always present content (think: HTML head & navbar)

Currently, the root document in this website is:

<!doctype html>
<html>
    <head>
        {% include "head" %}
    </head>
    <body>
        <div>
            {% include "navbar" %}

            {% block content %}
            {% endblock content %}

            {% include "scripts" %}
        </div>
    </body>
</html>

We see that head, navbar, and scripts are always present. This makes sense for this mostly static website.

The block content is more interesting. It doesn't actually do anything here, but we can extend base.html.tera and define the block in other files to add content.

So let's extend it by making the skeleton for this blog article:

{% extends "base" %}

{% block content %}
<div class="container-fluid blog-font">
    <div class="row">
        <nav class="col-md-3 … whole bunch o css …">
            <div class="sidebar-sticky monospace">
                <ul class="nav flex-column">
                    <div class="… whole bunch o css …">
                        {% block blog_sidebar_title %}
                        {% endblock blog_sidebar_title %}
                    </div>
                    {% block blogsidebar %}
                    {% endblock blogsidebar %}
                </ul>
            </div>
        </nav>

        <main role="main" class="col-md-9 ml-sm-auto px-4">
            <div class="… whole hunch o css …">
                {% block blog_title %}
                {% endblock blog_title %}
            </div>
            <div class="float-left blog-article">
                {% block blogcontent %}
                {% endblock blogcontent %}
            </div>
        </main>
    </div>
</div>
{% endblock content %}

Now, you may be thinking that's a whole lot of spaghetti, David, and you're right, but let's read through this.

First, we extend base, which lets Tera know which file to use when rendering the HTML.

Next, we redefine {% block content %} so Tera can copy/paste the stuff in it into base.html.tera. We use bootstrap here to define two vertical sections (see col-md-3 and col-md-9). That's the sidebar to the left, and the blog you're currently reading on the right.

We also define some more blocks, which we'll finally extend to make the blog content:

{% extends "blog/blog_base" %}

<!-- -------------------- Title -------------------- -->

{% block blog_title %}
<h4 class="monospace">{{ curr_blog.title }} ({{ curr_blog.date }})</h4>
{% endblock blog_title %}

<!-- -------------------- Sidebar -------------------- -->

{% block blog_sidebar_title %}
<h6 class="monospace">Table of Contents </h6>
{% endblock blog_sidebar_title %}

{% block blogsidebar %}

<ul class="nav flex-column">
    {{ curr_blog.toc | safe }}
</ul>

{% endblock blogsidebar %}

<!-- -------------------- Content -------------------- -->

{% block blogcontent %}

<div class="container bordered">
    {{ curr_blog.html | safe }}
</div>

{% endblock blogcontent %}

Phew, we've made it. This is the stuff that renders what you're currently reading.

As before, we are extending another file (blog/blog_base) and filling in blocks.

The {{ curr_blog.xyz }} bits are variable expansion. Rocket passes a struct containing the information for this blog (we saw this above), and we insert it into the document. The {{ xyz | safe }} tells Tera not to escape the given HTML.

For example, notice how we've filled in blogcontent with {{ curr_blog.html | safe }}. That's what you're reading right now: the org-mode HTML main content.
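
To see why safe matters: Tera escapes HTML output by default, so without the filter the markup would render as entities. Roughly:

{{ "<hr>" }}          renders as   &lt;hr&gt;
{{ "<hr>" | safe }}   renders as   <hr>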

But that's enough on this topic, let's get this thing into production!

Production

This part is actually pretty easy. I use Caddy to serve the content as it makes getting certs ridiculously easy.

The entire production configuration is just:

dpbriggs.ca www.dpbriggs.ca

gzip

proxy / localhost:8000

tls {
    dns cloudflare
}

The important bits are the first line (dpbriggs.ca www.dpbriggs.ca) and the proxy line.

Caddy uses the domain along with Let's Encrypt to get certs for HTTPS. It then proxies Rocket, forwarding all requests to localhost:8000.

On the VM I just have two tmux sessions, one which holds Caddy and the other holds Rocket.

Then the process to deploy the site is:

  1. cargo test and git push
  2. git pull to get the latest stuff
  3. cargo build --release
  4. tmux attach -t 0 and ./run_site.sh
  5. tmux attach -t 5 and ./run_caddy.sh

I tried at one point to automatically update the website, but the hot-reloading process more-or-less fork bombed my server. What would happen is the hot-reload script would git pull my website, then pkill -USR1 caddy to reload it. But the git stuff takes time and happens in a subprocess, so this would end up spawning many, many processes. I actually had to run sudo kill -9 … in a loop to kill them.

I then ran my website through Google's PageSpeed tool and got a low score, so I set up CloudFlare. The process was surprisingly easy, and ended up saving me money by not using Google's Cloud DNS. I'm now at 96/100, which is good enough for now.

And we're deployed. That's it β„’

Thanks for reading,

David Briggs ([email protected])
