Introduction

Private documents and data extracted from them can be managed via a REST API.

Three different ways to access the API are described below.

API access

The recommended way to access the Private Document API is via one of the SDKs, which are available for Python and Node.js/TypeScript.

Concepts

Scribe's Private Documents platform processes your documents. Each document uploaded to the platform corresponds to a task. Each task is processed independently.

After a task has been processed, an output model is available to download. This model contains structured data from the input document, in JSON format.

Tasks

The core collection of Private Documents is tasks. Each task corresponds to a single document uploaded to the platform and to a single output model.

Example task data:

{
  "jobid": "abcd1234",
  "client": "Example Bank",
  "clientFilename": "Portfolio Company 1 Dec-22.pdf",
  "status": "SUCCESS",
  "submitted": 1689002639,
  "modelUrl": "https://document-store.s3.eu-west-2.amazonaws.com/path/to/model.json?Signed-Link-To-Fetch-File"
}

Note that some fields are not always included: in particular, modelUrl is never present before the task has been processed.
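Because some fields may be absent, client code should read them defensively. The following is a minimal Python sketch, not part of the SDK: the Task class and parse_task function are hypothetical names, and it assumes that jobid and status are always present and that submitted is a Unix timestamp in seconds (neither is stated explicitly above).

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass
class Task:
    """Hypothetical container for the task fields shown in the example."""
    jobid: str
    status: str
    client: Optional[str] = None
    client_filename: Optional[str] = None
    submitted: Optional[datetime] = None
    model_url: Optional[str] = None  # absent until the task has been processed


def parse_task(data: dict) -> Task:
    """Build a Task from decoded JSON, tolerating missing optional fields."""
    submitted = data.get("submitted")
    return Task(
        jobid=data["jobid"],    # assumed to be always present
        status=data["status"],  # assumed to be always present
        client=data.get("client"),
        client_filename=data.get("clientFilename"),
        # Assumption: "submitted" is seconds since the Unix epoch.
        submitted=(
            datetime.fromtimestamp(submitted, tz=timezone.utc)
            if submitted is not None
            else None
        ),
        model_url=data.get("modelUrl"),
    )
```

Using data.get for the optional fields means a task that has not yet been processed parses cleanly, with model_url set to None.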

Every task has a jobid, which is a unique identifier. The jobid can be used to fetch or modify a particular task, e.g. GET /tasks/abcd1234 or DELETE /tasks/abcd1234.

Tasks are processed asynchronously. You can track the progress of a task via the status field: status: "SUCCESS" indicates that the task has been successfully processed by Scribe.

When a task is first created, its status is "PENDING_UPLOAD". This indicates that the task exists, but no document has yet been uploaded for it. Read more about task creation. If no document is uploaded, a task in the PENDING_UPLOAD state is automatically deleted after a period of time.
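Since processing is asynchronous, a client typically polls a task until its status becomes "SUCCESS". The sketch below shows one way to do this in Python. The function name wait_for_success, the polling interval and the timeout are illustrative assumptions, and fetch_task stands in for whatever call returns the task data (for example an SDK method or a GET /tasks/{jobid} request).

```python
import time
from typing import Callable


def wait_for_success(
    fetch_task: Callable[[], dict],
    interval: float = 5.0,
    timeout: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll fetch_task() until the task's status is "SUCCESS".

    Raises TimeoutError if the task has not succeeded within `timeout`
    seconds. Only the "SUCCESS" status is treated as terminal here; any
    other status simply leads to another poll.
    """
    deadline = time.monotonic() + timeout
    while True:
        task = fetch_task()
        if task.get("status") == "SUCCESS":
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach SUCCESS before the timeout")
        sleep(interval)
```

Passing the fetch function in as a parameter keeps the polling logic independent of how the API is called. If the API reports failure statuses, the loop should be extended to stop early on those as well.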

Models

When a task's status is "SUCCESS", an output model is available to download. Scribe creates one model per task.

The model can be downloaded via a signed URL, which is returned as part of the task data. Note that the signed URL is valid for a limited amount of time, so you should download the model immediately after fetching task data.
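As an illustration, the model can be fetched with Python's standard library as soon as the task data has been retrieved. This is a sketch only: download_model is a hypothetical helper, and it assumes the signed URL can be read with a plain GET request and returns a JSON body.

```python
import json
from urllib.request import urlopen


def download_model(task: dict, timeout: float = 30.0) -> dict:
    """Download and decode the output model referenced by freshly fetched task data."""
    if task.get("status") != "SUCCESS" or "modelUrl" not in task:
        raise ValueError("no output model is available for this task yet")
    # The signed URL expires, so use it straight away rather than storing it.
    with urlopen(task["modelUrl"], timeout=timeout) as response:
        return json.load(response)
```

If the download fails because the link has expired, fetch the task data again to obtain a fresh signed URL.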

Deletion

Tasks can be deleted via the DELETE /tasks/{jobid} endpoint.

Deletion is irreversible: the original document, the output model and any other data derived from the original document are permanently deleted from the Scribe platform.

If a task is accidentally deleted, you will need to create a new task, upload the original document, and wait for processing.
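For readers calling the REST API directly rather than through an SDK, a deletion request might be constructed as sketched below. The base URL is a placeholder, build_delete_request is a hypothetical helper, and authentication is not covered in this section, so any required headers must be supplied by the caller.

```python
from typing import Dict, Optional
from urllib.parse import quote
from urllib.request import Request

# Placeholder only: replace with the real API base URL for your account.
API_BASE_URL = "https://api.example.com"


def build_delete_request(
    jobid: str,
    base_url: str = API_BASE_URL,
    headers: Optional[Dict[str, str]] = None,
) -> Request:
    """Build (but do not send) a DELETE /tasks/{jobid} request."""
    # Percent-encode the jobid so it cannot alter the request path.
    url = f"{base_url.rstrip('/')}/tasks/{quote(jobid, safe='')}"
    return Request(url, method="DELETE", headers=headers or {})
```

The request can then be sent with urllib.request.urlopen. Because deletion is irreversible, building the request separately from sending it leaves room for a confirmation step in between.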

Collation

For Fund Performance outputs, multiple models can be collated. For example, you might want to export data for an entire portfolio in a given quarter, or export data from multiple quarters relating to the same companies.

This is available via the GET /fund-portfolio endpoint.

The collated model has the same format as models corresponding to individual tasks.

Scribe platform

Private Documents can be managed via this API or via Scribe's web platform. While the API provides more flexibility, the Scribe platform can be used without writing any code.

You can manage Private Documents using either or both: for example, data teams could upload documents via the web interface, and use the API to export outputs to a data lake after processing.