Defining a Transformation
How to define and run a Transformation
Review: What is a transformation?
Trellis helps you turn your documents into data. A Trellis “transformation” is essentially a formula that describes what data you want to extract and how to extract it. Once you define a transformation, you can start uploading documents to Trellis, and the extracted data will fit the schema you defined in the transformation.
How do you define a transformation?
Defining a transformation includes specifying three key parameters:
- model: which Trellis LLM engine do you want the transformation to use?
- mode: will you upload freeform document-type assets, or structured table-type assets?
- operations: what data do you want to extract, and how?
Throughout the Trellis API, the transform_params object is the single source of truth for all three of these parameters. Defining and redefining transform_params is equivalent to setting up and updating your transformation.
The transform_params object
The transform_params object contains the configuration settings for running a transformation with our AI models. Each of its parameters is detailed below:
model
- Type: string
- Description: Specifies the LLM engine to use for the transformation. Options include trellis-vertix and trellis-scale, each offering a different balance of speed and accuracy. We recommend trellis-vertix.
mode
- Type: string
- Description: The method used to process the data. The default should be document; use table if you’re parsing tables.
operations
- Type: list of operation
- Description: Each operation is a data field to extract from your data. An operation specifies the target column, that column’s data type, the type of transformation to apply, and a description of the task.
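As a rough illustration, the Python sketch below checks the three top-level fields of a transform_params dictionary. The helper check_top_level is hypothetical (it is not part of the Trellis API), and it assumes that trellis-vertix and trellis-scale are the only valid models and that document and table are the only valid modes. The non-empty operations check follows from the rule that every transform_params object needs at least one operation.

```python
# Hypothetical helper, not part of the Trellis API.
# Assumption: these are the only accepted values for model and mode.
ALLOWED_MODELS = {"trellis-vertix", "trellis-scale"}
ALLOWED_MODES = {"document", "table"}


def check_top_level(transform_params: dict) -> list[str]:
    """Return a list of problems with the top-level fields (empty if none)."""
    problems = []
    model = transform_params.get("model")
    if model not in ALLOWED_MODELS:
        problems.append(f"unknown model: {model!r}")
    mode = transform_params.get("mode")
    if mode not in ALLOWED_MODES:
        problems.append(f"unknown mode: {mode!r}")
    operations = transform_params.get("operations")
    # Every transform_params object needs at least one operation.
    if not isinstance(operations, list) or not operations:
        problems.append("operations must be a non-empty list")
    return problems


params = {"model": "trellis-vertix", "mode": "document", "operations": []}
print(check_top_level(params))  # ['operations must be a non-empty list']
```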
Defining operations
Each operation object contains the following information:
- Name of the field you wish to populate
- Data type
- Transformation type (extraction, classification, generation, etc.)
- Task description (guidelines for the Trellis LLM to populate your data)
Important: Every transform_params object requires at least one operation with the following parameters:
- column_type = 'assets'
- transform_type = 'parse'
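This requirement is easy to check before you submit a transformation. The following is a minimal Python sketch; has_asset_parse_operation and the sample operations are hypothetical and only illustrate the rule.

```python
def has_asset_parse_operation(operations: list[dict]) -> bool:
    """True if at least one operation has column_type 'assets' and transform_type 'parse'."""
    return any(
        op.get("column_type") == "assets" and op.get("transform_type") == "parse"
        for op in operations
    )


# Hypothetical operations for an invoice-processing transformation.
ops = [
    {"column_name": "invoice", "column_type": "assets",
     "transform_type": "parse", "task_description": "N/A"},
    {"column_name": "invoice_amount", "column_type": "numeric",
     "transform_type": "extraction",
     "task_description": "Extract the invoice amount from {{invoice}}"},
]

print(has_asset_parse_operation(ops))      # True
print(has_asset_parse_operation(ops[1:]))  # False: the parse operation is missing
```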
Here are the specific fields required for each operation:
column_name
- Type: string
- Description: Names the column in the dataset that the operation populates, identifying the specific data point that will be extracted or transformed. Must be in snake case.
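The exact definition of snake case that Trellis enforces is not spelled out here. The Python sketch below assumes one common reading (lowercase letters and digits, words separated by single underscores, starting with a letter); is_snake_case is a hypothetical helper, and Trellis may be stricter or more lenient.

```python
import re

# Assumed definition of snake case: starts with a lowercase letter, then
# lowercase letters/digits, with words joined by single underscores.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if the whole name matches the assumed snake_case pattern."""
    return SNAKE_CASE.fullmatch(name) is not None


print(is_snake_case("invoice_amount"))  # True
print(is_snake_case("InvoiceAmount"))   # False
print(is_snake_case("invoice-amount"))  # False
print(is_snake_case("line_items_2"))    # True
```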
column_type
- Type: string
- Description: Indicates the data type of the target column, using the PostgreSQL data types documented in the PostgreSQL documentation (https://www.postgresql.org/docs/current/datatype.html). Valid types include, but are not limited to, assets for file data, text for string data, text[] for arrays of text, numeric for numerical data, and date for date values.
For more information, see our guide on column types (Transformations: column_type).
transform_type
- Type: string
- Description: Describes the transformation or extraction method applied to the data in the target column. "parse" converts assets-type columns into parsed text, ready for other columns to reference. "extraction" indicates that the operation retrieves specific pieces of data from the column. Other types include classification and generation.
For more information, see our guide on transform types (Transformations: transform_type).
task_description
- Type: string
- Description: Provides a clear, human-readable explanation of what the operation should achieve. Use double curly braces to reference other columns at transformation runtime (e.g. Extract the invoice amount from {{invoice}}). References can be used to pull data out of parsed assets, classify extracted data, and more.
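To make runtime references concrete, here is a minimal Python sketch of how a {{column_name}} placeholder could be filled in with another column’s value. It is a conceptual illustration only: render_task_description and the sample row are hypothetical, and Trellis’s actual substitution logic is not described in this guide.

```python
import re

# Matches {{column_name}} placeholders and captures the column name.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render_task_description(template: str, row: dict[str, str]) -> str:
    """Replace each {{column_name}} in template with row[column_name].

    Conceptual illustration only; not Trellis's actual runtime logic.
    """
    def substitute(match: re.Match) -> str:
        column = match.group(1)
        if column not in row:
            raise KeyError(f"unknown column referenced: {column!r}")
        return row[column]

    return PLACEHOLDER.sub(substitute, template)


# Hypothetical parsed text of an uploaded invoice.
row = {"invoice": "Invoice 42. Total due: $1,250.00"}
print(render_task_description("Extract the invoice amount from {{invoice}}", row))
# Extract the invoice amount from Invoice 42. Total due: $1,250.00
```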
Important: Except for operations whose transform_type is in ['parse', 'manual'], every operation’s task description must reference at least one other operation, in the format {{column_name}}.
Task descriptions for operations whose transform_type is in ['parse', 'manual'] should be set to “N/A”.
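Both rules can be checked mechanically. The Python sketch below is one possible validator, not a Trellis-provided tool: check_task_descriptions and the sample operations are hypothetical. Flagging references to columns that are not defined is an extra sanity check we assume is useful, not a documented rule.

```python
import re

REFERENCE = re.compile(r"\{\{(\w+)\}\}")
# Operations of these types reference nothing and use "N/A" as their description.
NO_REFERENCE_TYPES = {"parse", "manual"}


def check_task_descriptions(operations: list[dict]) -> list[str]:
    """Return rule violations in the operations' task descriptions (empty if none)."""
    problems = []
    names = {op["column_name"] for op in operations}
    for op in operations:
        name = op["column_name"]
        description = op.get("task_description", "")
        if op["transform_type"] in NO_REFERENCE_TYPES:
            if description != "N/A":
                problems.append(f"{name}: task_description should be 'N/A'")
            continue
        refs = set(REFERENCE.findall(description))
        # Extra sanity check (an assumption): flag references to undefined columns.
        for ref in sorted(refs - names):
            problems.append(f"{name}: references unknown column {ref!r}")
        # Must reference at least one *other* defined operation.
        if not (refs & names) - {name}:
            problems.append(f"{name}: must reference at least one other operation")
    return problems


ops = [
    {"column_name": "invoice", "column_type": "assets",
     "transform_type": "parse", "task_description": "N/A"},
    {"column_name": "invoice_amount", "column_type": "numeric",
     "transform_type": "extraction",
     "task_description": "Extract the invoice amount from {{invoice}}"},
    {"column_name": "vendor_name", "column_type": "text",
     "transform_type": "extraction",
     "task_description": "Extract the vendor name"},
]
print(check_task_descriptions(ops))
# ['vendor_name: must reference at least one other operation']
```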
Example transform_params object
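The Python dictionary below is an illustrative transform_params object for extracting data from invoices, written to follow the rules in this guide. All column names and task descriptions are hypothetical, and the phrasing of the classification task (listing the allowed categories in the description) is an assumption rather than a documented Trellis convention.

```python
import json

# Illustrative transform_params for invoices; all column names and task
# descriptions are hypothetical.
transform_params = {
    "model": "trellis-vertix",
    "mode": "document",
    "operations": [
        # Required: parse the uploaded files into text other columns can reference.
        {
            "column_name": "invoice",
            "column_type": "assets",
            "transform_type": "parse",
            "task_description": "N/A",
        },
        {
            "column_name": "invoice_date",
            "column_type": "date",
            "transform_type": "extraction",
            "task_description": "Extract the invoice date from {{invoice}}",
        },
        {
            "column_name": "invoice_amount",
            "column_type": "numeric",
            "transform_type": "extraction",
            "task_description": "Extract the total invoice amount from {{invoice}}",
        },
        {
            "column_name": "line_items",
            "column_type": "text[]",
            "transform_type": "extraction",
            "task_description": "List the description of each line item in {{invoice}}",
        },
        # Assumed phrasing for a classification task over an extracted column.
        {
            "column_name": "expense_category",
            "column_type": "text",
            "transform_type": "classification",
            "task_description": "Classify {{line_items}} as travel, software, or office supplies",
        },
    ],
}

# Serialize to JSON, e.g. to inspect the object before sending it to the API.
print(json.dumps(transform_params, indent=2))
```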
When you’re done defining the transformation, go to create transforms to kick off the transformation run.