mnemosyne
A very simple HTTP-range supporting file server. Stream your file in, and stream your file out! Named after the goddess of memory and file systems.
Supports:
- Multiple directory mounts
- Rate-limiting downloads
- Permission-based uploads/updates/deletes
Contents
- Background
- Usage
- Updating files
- Deleting files
- Development
Background
A new format for cloud-optimized, multidimensional data (Zarr), and the potential need for SAEON to host (or at least test hosting) cloud-optimized-geotiffs (COGs) makes this quick project worthwhile.
Usage
Mnemosyne is a relatively simple file server - GET
HTTP requests for viewing files, PUT
HTTP requests for uploading files, POST
HTTP requests for upserting files, and DELETE
HTTP requests for deleting files and (empty) directories. Start the application with the --key
argument (and at least one --login
and --permission
) to enable uploads. Otherwise uploads are disabled by default, the idea being that it’s straightforward to share any directory on a server via HTTP Range requests.
Turn your current directory into a COG-sharing HTTP Range server with a single command!
npx @saeon/mnemosyne -v ./
# Or throttle downloads per file to 2MB/s
npx @saeon/mnemosyne \
-v ./ \
-t 2097152
Things to note:
- Folders that contain
index.html
files wil be served as websites. - CORS is enabled (
*
), so the file server can be perused automatically via client-side JavaScript - Default GET retrieval options can be overridden via URL params (see below)
- When mounting multiple directories, you cannot have duplicate top-level folders, but you can have duplicate top level files
- Running Mnemosyne without specifying a volume results in the application creating a temporary volume on your file system. Mnemosyne will NOT try to create volumes that are mounted but don’t exist on the file system
Viewing/retrieving files
Tools that understand cloud-optimized formats (i.e. COGs, Zarrs, etc.) should work flawlessly when pointed to files hosted on this server. Otherwise, some examples of explicit HTTP requests:
Download entire file via cURL
curl \
-X GET \
--keepalive-time 1200 \
http://localhost:3000/filename.tif
Download partial file via cURL
curl \
-H "Range: bytes=12-20" \
-X GET \
http://localhost:3000/filename.tif
Customize GET requests via URL params
By default a GET request will serve (in order of preference):
- A file if specified
- An
index.html
file (served as a website) if present - The directory listing
You can override this logic via specifying URL params:
?noindex
: For directories with anindex.html
file, this will serve the directory listing instead of a website?json
: This will return a JSON representation of a directory listing (useful in browser environments, but rather use theAccept: Application/json
header for API calls)
In the case where there are duplicate filenames (only possible at the root), use the param ?v=N
to specify which mounted volume you are referring to (where N is an integer starting at 0).
NOTE - all files are publicly available
Customize GET requests via HTTP headers
You can request a JSON representation of a directory listing by specifying Accept: Application/json
in the HTTP headers
Serving websites
Any folder that includes an index.html
file will be served as a website. One potential use-case of this is to provide a branded / themed landing page for a particular directory. A folder that includes an index.html
file will not show the directory listings, but those listings are still accessible as direct links or via a JavaScript request. For example, to request the directory listing as JSON
in JavaScript, and then append the result to the DOM:
// Client side JavaScript
fetch('http://localhost:3000/directory?json')
.then(res => res.json())
.then(json => {
document.getElementsByTagName('body')[0].append(JSON.stringify(json))
})
You can serve the website on a custom domain via registering a CNAME record and then configuring URL-rewrites to the desired folder. Currently this has to be done by manually adjusting Nginx configuration - ask me to make this user-configurable!!
Upload/update mode
Start the application with a --key
, a --login
, and a --permission
to enable uploading, updating and deleting files. The server will log an appropriate key to use for upload/update mode when started in read-only mode.
For example:
npx @saeon/mnemosyne \
--key <256-bit (32-byte) cryptographic key for use with the AES-256-CBC encryption algorithm.>
--volume /some/directory \
--login username \
--permission username:/some/directory
All files/folder in the exposed volume (/some/directory
in this case) will be served, and the login username
will have permission to write/edit some/directory
. Permissions need to be granted explicitly.
To upload files to the server either add a directory/file to the exposed volume, or upload via the HTTP PUT
API endpoint. Directories will be created to match the specified resource URL if they do not exist already.
cURL examples (uploads)
Here are some examples using cURL
(and some notes) using the default authentication token on localhost. (Look at server logs for authentication tokens):
- For
PUT
requests I find it’s necessary to pipe the output tocat
, sincecURL
won’t print output to stdout (who knows why) when usingPOST
andPUT
requests - The
-T
header means ‘transfer file’. Use this instead of--data-binary
, since the latter will try to load your entire file into memory before sending - The
--keepalive-time
header is necessary when you are uploading large files. By defaultcURL
will only use 60 seconds as the default time to keep a connection open, which is not long enough to upload large files - In some cases it may be helpful to use the
--limit-rate
flag. For example, if you have an incredibly fast internet connection and uploads are failing, try limiting the upload speed to 3MB/sec (--limit-rate 3m
) - Here is a helpful list of cURL flags
Upload file as binary
This loads the entire file into memory first, so should probably be avoided unless the file is small. I think this is faster than other options (for small files).
cat ./some/local/cog.tiff \
| curl \
--progress-bar \
--keepalive-time 1200 \
-X PUT \
-H "Authorization: Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070" \
-H "Content-Type: application/octet-stream" \
--data-binary @- \
http://localhost:3000/some/deep/nested/directory/cog.tif \
| cat
Specify a file to upload
I think this will stream the contents of the file
curl \
--progress-bar \
--keepalive-time 1200 \
-X PUT \
-H "Authorization: Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070" \
-T \
./some/local/cog.tiff \
http://localhost:3000/some/deep/nested/directory/cog.tif \
| cat
And then that file can be retrieved at http://localhost:3000/some/deep/nested/directory/cog.tif
.
Chunked uploads
This is useful for uploading large files via cURL as contents are never fully buffered in memory.
Stream from a file
I’m not sure if there is any benefit to this approach over specifying -T /path/to/file
as above.
cat ./some/local/cog.tiff \
| curl \
--progress-bar \
--keepalive-time 1200 \
-X PUT \
-H "Authorization: Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070" \
-H "Content-Type: application/octet-stream" \
-T \
- \
http://localhost:3000/some/deep/nested/directory/cog.tif \
| cat
Stream from a file using mbuffer
This is definitely quite nifty as you get buffer information
mbuffer \
-i ./some/local/cog.tiff \
-r 2M \
| curl \
--progress-bar \
--keepalive-time 1200 \
-X PUT \
-H "Content-Type: application/octet-stream" \
-H "Authorization: Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070" \
--data-binary @- \
http://localhost:3000/some/deep/nested/directory/cog.tif \
| cat
Uploading a directory recursively
For example, a Zarr directory
base_dir=$(pwd)/path/to/dir.zarr
find "$base_dir" -type f -exec sh -c '
f="$1"
base="$2"
url=http://localhost:3000/mnt1/dir.zarr/${f#$base/}
curl \
--progress-bar \
--keepalive-time 1200 \
-X PUT \
-H "Authorization: Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070" \
--create-dirs \
-T "$f" \
$url' sh {} "$base_dir" \; | cat
PowerShell example (Windows)
The equivalent to the cURL
utility on Windows Platform is the Invoke-RestMethod
tool. This is an example on how to use it to upload a single file to Mnemosyne from the PowerShell terminal
cd /to/the/directory/with/your/file
$FILENAME = "some-file.tiff"
$headers = New-Object "System.Collections.Generic.Dictionary[[String],[String]]"
$headers.Add("Authorization", "Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070")
Invoke-RestMethod `
-Uri "http://localhost:3000/some/nested/directory/$FILENAME" `
-Method Put `
-InFile "./$FILENAME" `
-Headers $headers `
-ContentType "application/octet-stream" `
-Verbose
Customizing the client
To customize the HTML client, override the /client
directory your own /client
directory, that contains an index.html
file and related static assets. In the case of using the Dockerfile provided in this repo, mount your website client to /mnemosyne/client
.
Updating files
Use the HTTP POST
method instead of the HTTP PUT
method in the examples above. Note that this is actually an upsert operation, where the target resource is either created if it doesn’t exist, or updated if it does (assuming correct permissions).
Deleting files
Use the HTTP DELETE method to delete an existing file or empty directory (trying to delete a non-empty directory will result in a 409
response). For example, to delete a file using cURL
:
curl \
--silent \
-X DELETE \
-H "Authorization: Bearer fd1ddb014036c75f8f11532f330ab42d:403494a94ebf44728cd63e02a3f9c070" \
http://localhost:3000/some/deep/nested/directory/cog.tif
Development
Install Node.js v20.3.1 Then setup the project:
# Clone the repository
git clone git@github.com:SAEON/mnemosyne.git
cd mnemosyne
# Install chomp CLI
npm install -g chomp
chomp --version # (wait for output)
# Install dependencies
npm install
# Start the app and write code!
chomp --watch
Refer to chompfile.toml to see the start command used for local development
Testing
There are some integration tests in the test/
folder, and some unit tests in the src/
folder. To run:
chomp test
chomp unit-tests
The unit tests are run on every save, but it may be helpful to run tests as a separate server. To do this, use the --watch
flag with chomp:
chomp test --watch
Deployment
Use a process manager such as pm2
to restart on failure. See below for Dockerized deployment
# Clone the repository
git clone git@github.com:SAEON/mnemosyne.git
cd mnemosyne
# Install dependencies from lockfile
npm ci --only=production
# Start the app
NODE_ENV=production \
TZ=UTC \
node \
src \
--key <256-bit (32-byte) cryptographic key for use with the AES-256-CBC encryption algorithm.>
--volume /path/to/directory
--volume /other/path/to/directory
--login some-user
--login some-user2
--login some-user@gmail.com
--permission some-user:/path/to/directory
# Look at the startup logs, and pass access tokens to the relevant users
Docker
Build a Docker image locally
docker build -t mnemosyne .
Run a containerized instance of the server
# default volume (cache directory), uploads disabled
docker run --rm -p 3000:3000 --name mnemosyne
# Mount a host volume, uploads disabled
docker run \
--rm \
--name mnemosyne \
-p 3000:3000 \
-v /some/host/directory:/mounted-directory \
mnemosyne \
--volume /mounted-directory
# Mount a host volume, uploads enabled
docker run \
--rm \
-p 3000:3000 \
--name mnemosyne \
-v /some/host/directory:/mnt1 \
-v /some/other/host/directory:/mnt2 \
mnemosyne \
--key <256-bit (32-byte) cryptographic key for use with the AES-256-CBC encryption algorithm.> \
--volume /mnt1 \
--volume /mnt2 \
--login user1 \
--login user2 \
--permission user1:/mnt1
--permission user1:/mnt2
--permission user2:/mnt2/some/sub/directory/only
Publishing
Publish as a public package to NPM
npm publish --access public