Handling files in Nim

4th May 2018 - Guide, Nim, Programming

In a post over on Reddit someone noted that Nim doesn't really have any article or tutorial about file reading. Trying to prove them wrong led me to a half-answer over on Rosetta Code and a forum post from 2014 asking about examples on file handling. Since this is a rather simple topic I decided to write down some of the most common ways to handle files in Nim, partially in an attempt to make the tutorial coverage greater, but also as a way to talk about my binaryparse project.

The basics

For the most basic file handling you can use things built right into the system module (which doesn't require an explicit import). This also includes some convenience procedures to make the simple task of reading an entire file super easy:

let fileContent = readFile("helloworld.txt")
echo fileContent
writeFile("copy.txt", fileContent)

The above snippet will simply read a file, print out its content, and write it to a new file. This might be fine for small files, but for reading large files, or files without an end such as the standard input, this won't work very well. For this you will need a proper reading loop. Again using the things in Nim's system module we can read the file line by line, or in preset increments to fit a buffer. It is also possible to seek around in the file to set the read position explicitly:

# Set up the input and output files; fmRead is the default mode
# For a full list of file modes see:
# https://nim-lang.org/docs/system.html#FileMode
let input = open("helloworld.txt")
var output = open("copy.txt", fmWrite)
# Use the lines iterator to traverse the file
for line in input.lines:
  # And write the output line by line to a new file
  output.writeLine(line)
# Reset the input file to the start
input.setFilePos(0)
# Create a new 10 byte long string
var inputString = newString(10)
# Read 10 bytes into the string, starting at position 0
# readChars is just a wrapper around low-level C functions with some additional
# error checking. This means that it returns the number of bytes actually read,
# and this could be less than the bytes we wanted to read, hence the assert
doAssert 10 == input.readChars(inputString, 0, 10), "Unable to read 10 chars"
echo inputString
# Close the input file, any further access is not allowed
input.close()
# Set the output file back by 10 characters
output.setFilePos(output.getFileSize() - 10)
# Write out some various things
output.write(42)
output.write("Hello")
output.write(true)
# Close the output file
output.close()
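
Reading "in preset increments to fit a buffer" can also be done in a loop, so the whole file never has to sit in memory at once. The following is only a minimal sketch of that idea using readBuffer and writeBuffer from the system module; the proc name copyInChunks, the file names, and the chunk size are made up for the illustration:

```nim
# Copy a file in fixed-size chunks instead of reading it all at once.
# copyInChunks and the file names below are hypothetical, for illustration only.
proc copyInChunks(src, dst: string, chunkSize = 4096): int =
  ## Returns the total number of bytes copied.
  let input = open(src)              # fmRead is the default mode
  defer: input.close()
  let output = open(dst, fmWrite)
  defer: output.close()
  var buffer = newString(chunkSize)
  while true:
    # readBuffer returns how many bytes were actually read; 0 means end of file
    let bytesRead = input.readBuffer(addr buffer[0], chunkSize)
    if bytesRead == 0:
      break
    # Only write the part of the buffer that was filled by this read
    let written = output.writeBuffer(addr buffer[0], bytesRead)
    doAssert written == bytesRead, "Unable to write the whole chunk"
    result += bytesRead

# Create a small input file so the example runs on its own
writeFile("chunk_input.txt", "Hello, world!\n")
echo copyInChunks("chunk_input.txt", "chunk_copy.txt", 4)
```

A tiny chunk size of 4 is used here only to force several passes through the loop; for real files something in the kilobyte range is a more sensible starting point.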

All of this also works for reading from standard input and writing to standard output. Note that stdin isn't detected as closed until the program generating output to it closes it. This means that a loop reading from stdin, like the lines loop above, won't finish until that program does so. If you want to detect if the file passed to a procedure is stdin or not you can simply compare it to the globally available stdin variable:

proc detect(f: File) =
  if f == stdin:
    echo "stdin"
  else:
    echo "file"

detect(stdin)
let x = open("helloworld.txt")
detect(x)
x.close()
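
Since stdin is just another File, a procedure written against File works for both. Here is a small sketch of that; countLines and the demo file name are my own names for the example, not something from the standard library:

```nim
proc countLines(f: File): int =
  ## Counts the lines in any open File, be it a regular file or stdin.
  var line: string
  # readLine returns false once the end of the file is reached
  while f.readLine(line):
    inc result

# Demo with a regular file so the example terminates on its own
writeFile("lines_demo.txt", "one\ntwo\nthree\n")
let demo = open("lines_demo.txt")
echo countLines(demo)
demo.close()
# Calling countLines(stdin) works the same way, but it only returns once
# whatever is feeding stdin closes it (Ctrl+D in most Unix terminals).
```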

When writing to stdout it is also important to know about the flushFile procedure. Since the output can be buffered, your data might be put into a buffer before being written to the output. This means that data can appear to not be written. Closing stdout will output what's left in the buffer, but often you don't want to close it since you then can't do any more echoing. So instead you can use the flushFile proc to force the buffer to be flushed.
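
In the system module the proc that does this is called flushFile. A minimal sketch of where it tends to matter is progress output written without a trailing newline; showProgress is a hypothetical helper made up for this example:

```nim
proc showProgress(steps: int): int =
  ## Writes one marker per step and returns how many steps were shown.
  for step in 1..steps:
    # No newline is written here, so the text may sit in the buffer
    stdout.write($step & " ")
    # Force the buffered text out so it shows up right away
    stdout.flushFile()
    inc result
  stdout.write("done\n")

discard showProgress(3)
```

Whether output is actually held back without the flush depends on how stdout is buffered on your platform and on whether it goes to a terminal or a pipe, so treat this as an illustration rather than a guarantee of what you will see.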

Using streams

If you want to support not only file reading but reading from pretty much any input source, or if you need to write binary data that is structured in a certain way, you can use Nim's streams module. The streams module is basically an interface similar to the standard files you've seen earlier, but with a few more capabilities. It introduces things like peek, which does a read without moving the position, and readers for binary data of all kinds of types, along with support for streams from all sorts of different inputs. So if you're writing a library or framework it's probably best to use streams; that way the user can choose which input source they want, including of course regular files.
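
To make that a bit more concrete, here is a small sketch using a string stream as a stand-in for any input source. The readHeader proc and its two-field layout are invented for the example; the stream calls themselves come from the streams module:

```nim
import streams

# A hypothetical library proc: it only asks for a Stream, so the caller
# decides whether the data comes from a file, a string, or something else.
proc readHeader(s: Stream): tuple[version: uint8, count: int32] =
  result.version = s.readUint8()
  result.count = s.readInt32()     # read in the machine's native byte order

# Build some test data in memory; newFileStream("some.bin", fmRead) would
# give the same interface backed by a file instead.
let strm = newStringStream()
strm.write(uint8(42))              # one byte
strm.write(int32(-7))              # four bytes, native byte order
strm.write("text\n")               # raw characters, no length prefix
strm.setPosition(0)                # rewind to the start before reading

doAssert strm.peekUint8() == 42    # peek reads without moving the position
let header = readHeader(strm)
echo header.version, " ", header.count
echo strm.readLine()
doAssert strm.atEnd()
strm.close()
```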

Using binaryparse

However, if you are reading or writing binary data, I'd say there's one option that might fit your needs even better than streams, at least if the format you're reading is properly laid out and documented. This is where my own module binaryparse comes in. The idea is to leverage Nim's metaprogramming capabilities to create a DSL for binary data, which is translated at compile time into procedures that read from and write to streams. I'm not going to repeat the documentation, which can be found in the project itself (though for the moment it is not hosted anywhere), but rather explain how it works. Since binaryparse is not in the standard library you will need to install it with nimble by running "nimble install binaryparse". After that's done you can define your binary format. The main macro in binaryparse is the "createParser" macro. It takes a name to give the parser, any extra parameters, and then a body of field specifiers. These field specifiers take the form "[type]<size>: <name>[options]". The simplest format consists only of static length fields:

import binaryparse
import streams

createParser(simple):
  u8: eightBitUnsignedField
  u16: sixteenBitUnsignedField
  4: fourBitSignedField
  3: threeBitSignedField
  1: oneBitSignedField

This will read a format consisting of five fields. The first field is 8 bits unsigned, and is stored in the returned tuple as a uint8. Similarly the next field is 16 bits unsigned and stored in the tuple as a uint16. The next field is only four bits long, and signed, but Nim doesn't have a four bit data type so it's stored in a uint8. Same for the last two fields, only that they take 3 bits and 1 bit respectively.
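
To give a feel for the work such a parser has to do, here is a hand-written sketch that reads the same layout with plain streams. This is not the code binaryparse generates. It assumes big-endian byte order and that sub-byte fields are packed starting from the most significant bit, and it keeps the raw bits without any sign handling; all of those are my assumptions for the illustration, as are the names Simple and readSimple:

```nim
import streams

type
  Simple = tuple
    eightBitUnsignedField: uint8
    sixteenBitUnsignedField: uint16
    fourBitSignedField: uint8
    threeBitSignedField: uint8
    oneBitSignedField: uint8

proc readSimple(s: Stream): Simple =
  result.eightBitUnsignedField = s.readUint8()
  # Assumption: the 16-bit field is stored big-endian (high byte first)
  let highByte = uint16(s.readUint8())
  let lowByte = uint16(s.readUint8())
  result.sixteenBitUnsignedField = (highByte shl 8) or lowByte
  # The 4-, 3- and 1-bit fields together fill exactly one byte
  let bits = s.readUint8()
  # Assumption: fields are packed from the most significant bit down
  result.fourBitSignedField = bits shr 4
  result.threeBitSignedField = (bits shr 1) and 0b111'u8
  result.oneBitSignedField = bits and 0b1'u8

# 0xB5 is 1011 0101 in binary: 1011, then 010, then 1
let parsed = readSimple(newStringStream("\x12\x01\x02\xB5"))
echo parsed
```

Writing this by hand for every format gets tedious quickly, which is the kind of boilerplate the DSL is meant to take off your hands.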

But binaryparse can do much more than just simple bit-wise reading. For example if you want to read from one field and use that value later, or if you want to read until a special magic symbol is found, all that is possible:


import binaryparse
import streams

createParser(list, size: uint16):
  u8: _
  u8: data[size]

createParser(complexParser):
  u8: _ = 128
  u16: size
  4: data[size*2]
  s: str[]
  s: _ = "9xC\0"
  *list(size): inner
  u8: _ = 67

This example is a bit more complex. First off we have two parsers, instead of just a single one. In this example we only use the list parser once, but there's no problem in using it multiple times. We can also see that the list parser has a parameter "size", this is something we can pass in to read multiple unsigned 8-bit integers into the data field. So in summary the list parser takes in a size, reads the first byte without assigning it to a field in the tuple, then reads the passed amount of bytes into the data field.

The second parser is even more complex. It starts off by reading a magic byte that has to equal 128. Magic values like this are common in byte-oriented formats, and especially file formats. binaryparse also makes sure that this is validated when the data is read, and that it is written correctly when the data is written.
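
Done by hand with streams, that kind of magic check could look roughly like the sketch below. The proc names and the choice of IOError are mine; binaryparse has its own way of reporting a mismatch:

```nim
import streams

const magic = 128'u8

proc readMagic(s: Stream) =
  ## Reads one byte and fails if it is not the expected magic value.
  let value = s.readUint8()
  if value != magic:
    # IOError is an arbitrary choice for this sketch
    raise newException(IOError,
      "Expected magic byte " & $magic & " but got " & $value)

proc writeMagic(s: Stream) =
  ## Always writes the magic value; the caller never has to supply it.
  s.write(magic)

# Round trip: what writeMagic produces is accepted by readMagic
let magicStream = newStringStream()
magicStream.writeMagic()
magicStream.setPosition(0)
magicStream.readMagic()
```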

Carrying on, the parser reads a size as we saw in the previous parser, but now it uses that size in the next step to read 4-bit integers into the data field. The size is in 8-bit bytes so we need to read twice as many 4-bit integers to fill our sequence. This is no problem as binaryparse allows all sorts of things inside the brackets, even procedure calls.

The next line reads a sequence of null-terminated strings. Notice how it doesn't have any count in it, but is followed by a magic sequence? This means that it will read until that magic sequence is found, in this case a null-terminated string "9xC", but it doesn't have to be the same type as the sequence to be read.

Following this is what's called a custom parser. This means that it will pass control over to the parser named list (and pass any parameters in brackets, in this case the size read earlier) and put its result in the named field. This doesn't have to be a parser created by the binaryparse library as long as it matches the signature of a parser (to see what binaryparse outputs, use the "-d:binaryparseEcho" switch at compile time). So if you find a scenario that can't be fully parsed by binaryparse you are free to implement extra logic in your own custom procedures.

The pattern then completes with reading a magic byte which must equal 67.

As we can see, binaryparse is both powerful and flexible and can be used to read and write all kinds of binary data. It was an interesting project to work on, and I've used it myself with success to generate binary test data for another project.