APIer

Grey Matter APIer is a real-time data transformation service which exposes a RESTful JSON API from a backing Excel or CSV file.

What is APIer

JSON has become a ubiquitous data-interchange format, offering excellent trade-offs in brevity, expressibility, and parse-ability. Although modern web APIs utilize JSON for these reasons, many traditional data source collection practices naturally gravitate towards data record formats such as Excel spreadsheets or CSV files. To address this disconnect, Grey Matter APIer operates as a real-time data-transformation microservice, converting tabular data into a JSON API for interoperability with modern web technologies.

APIer combines performance with robust error handling and automatic documentation generation. APIer has a wide range of configuration options, with minimal prerequisites, providing for simplicity and portability while offering your organization a unique and complimentary function within the Grey Matter Data Mesh Platform.

Technical Discussions

APIer is written in Python's FastAPI which gives it unique benefits of an auto-generated OpenAPI specification and input type-checking, among others. It's also lightweight and ephemeral because APIer keeps a minimal state. Upon startup, it parses through the provided environment variables. It attempts to request a remote spreadsheet and defaults to an exponential backoff procedure if the first request fails. Upon a successful request, it parses the spreadsheet and caches the result. This allows for rapid request response times, while still maintaining data freshness. The default cache refresh rate is set reasonably, but depending on your needs it can be configured with CACHE_RATE.

Sheets and Row-Column to Key-Value

APIer's row-column transformation scheme converts every row in a spreadsheet to a JSON object. Each row's columns become a new field in the JSON object, where the column header becomes the key name, and the row's column value becomes the key's value. APIer does not perform any sanitation or transformation on the values or the column names, other than converting "None" types to proper JSON.

APIer treats all spreadsheets as a potential collection of sheets. This aligns perfectly with Excel spreadsheets, and its open source equivalents, which can contain many sheets with potentially different internal structures. This abstraction still holds for CSV files, at the expense of being slightly contrived. As a result, CSV files have a fake "base" sheet which contains the entire spreadsheet. This has the upside of being more consistent, however, potentially enabling simpler programmatic interaction.

Environment variables by sheet

At this time, APIer does not support sheet-level environment variable customization. This is only a problem if the structure of two sheets in a spreadsheet is radically different. That is, Sheet 1 starts at Column N and Row A whereas Sheet 2 starts at Column M and Row B. There are a couple of workarounds:

  1. Standardize the spreadsheets. This is the best option, unless you lack complete control over the source, the sheets cannot be reconciled to a common format, or if the structure must remain to support other dependencies or institutional concerns.

  2. Set up multiple APIer instances for each sheet, or collection of similar sheets. This strategy works due to the SHEET_NAMES environment variable. If the first two sheets follow the same row-column structure and the last two sheets follow the same row-column structure you can set up one APIer with the first sheets configured together and another instance with the last two sheets configured together.

Filtering and Querying

The mechanism behind the q query parameter is a hardened and secured python eval function. It accepts any valid Python expression, assuming it doesn't violate the function's preconditions, for instance, calling anything found in builtins.

Read the full API docs for more details.

Configuration

Since all environment variables are strings, the Type field below hints to the expected format of the string. That is, an environment variable with an integer should be formatted like "1”. List[Type] types should be formatted as comma separated lists.

All variables that affect a sheet affect all sheets in the spreadsheet.

Environment Variables

Description

Type

Default

SOURCE_URL

The URL of the spreadsheet to load. This must be a direct download link.

String

Required

SOURCE_FORMAT

The file format type of the remote spreadsheet. This field is not case sensitive. It will be auto-populated if the file URL ends in: .xls, .xlsx, .xlsm, .xlsb, .odf, or .csv.

Enum[“EXCEL,“CSV”]

Required

SHEETS

This field is only valid for Excel spreadsheets. An Excel spreadsheet can contain many sheets. This variable allows for selecting a subset of those. Leave the variable empty to expose all sheets.

List[String]

None

SHEET_SKIP_ROWS

Used by APIer to skip over specific rows. Rows are 0-indexed.

List[Integer]

None

USE_COLUMNS

A list of comma separated numbers representing the columns from the spreadsheet to use. Columns are 0-indexed.

List[Integer]

None

SERVER_PORT

The port the server listens on.

Integer

8000

SERVER_HOST

The host that the server will bind to.

String

“0.0.0.0”

ROOT_PATH

Used by the documentation site to fetch resources from the server behind a proxy with a route prefix. Set this value if your application will not be exposed at the root URL of your domain. Ensure to include as trailing slash.

String

/

CACHE_RATE

The time in seconds until the cache should refresh. A value of 0 disables the cache.

Integer

600

DATASET_NAME

A descriptive name relating to the spreadsheet which will be injected into the OpenAPI specification.

String

None

DOCS_URL

The path where APIer will serve the docs from.

String

None