Defining Transformation
How to define and run a transformation
Transformation parameters (transform_params) are how you define the set of transformations, LLM-based or otherwise, that you want to run on your data.
transform_params Overview
The transform_params object contains configuration settings for performing data transformations using our AI models. The parameters included in transform_params are detailed below:
model
- Type: string
- Description: Specifies the LLM engine to use for the transformation. Options include "trellis-premium", "trellis-vertix", and "trellis-scale", each offering a different level of speed and accuracy. We recommend "trellis-premium".
mode
- Type: string
- Description: Method of processing the data. Use "document" by default; use "table" if you're parsing tables.
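As a minimal sketch, the two top-level settings described above could be checked before a request is built. The helper name check_model_and_mode and the two constants are hypothetical and not part of any official client; the sets contain only the values named in this page, and the service may accept others.

```python
# Values copied from the parameter descriptions above (assumed to be complete).
KNOWN_MODELS = {"trellis-premium", "trellis-vertix", "trellis-scale"}
KNOWN_MODES = {"document", "table"}


def check_model_and_mode(model: str = "trellis-premium", mode: str = "document") -> dict:
    """Return the top-level settings, or raise ValueError on an unknown value.

    Hypothetical helper for illustration only.
    """
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if mode not in KNOWN_MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    return {"model": model, "mode": mode}


settings = check_model_and_mode(mode="table")
print(settings)
```

Failing early on a misspelled model or mode name is cheaper than discovering it after a transformation run has been submitted.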
operations
- Type: list of operation objects
- Description: Each operation is a data field to extract from your data. An operation specifies the target column, the data type of that column, the type of transformation to apply, and a description of the task.
Each object within operations has the following parameters:
a. column_name
- Type: string
- Description: The name of the column in the dataset that the operation targets. It identifies the specific data point to be transformed or extracted. Must be in snake case (for example, names_list).
b. column_type
- Type: string
- Description: The data type of the target column. It must be a PostgreSQL data type, as listed in the PostgreSQL documentation (https://www.postgresql.org/docs/current/datatype.html). Valid types include, but are not limited to, text for string data, text[] for arrays of text, numeric for numerical data, and date for date values.
c. transform_type
- Type: string
- Description: The transformation or extraction method to apply to the data in the target column. "extraction" means the operation retrieves specific pieces of data from the column. Other types include "classification" and "generation".
d. task_description
- Type: string
- Description: A clear, human-readable explanation of what the operation should achieve. For example, for an operation that extracts URLs from text data, the description would outline the purpose of extracting them.
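The four fields above can be assembled with a small helper, sketched here in Python. The function make_operation is hypothetical, the regular expression encodes one common reading of "snake case" (lowercase words joined by single underscores), and the set of transform types contains only the three named in this page; the service itself may be more permissive.

```python
import re

# Assumed definition of snake case: lowercase letters and digits,
# words separated by single underscores, starting with a letter.
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")

# Only the transform types named in this page; the real list may be longer.
TRANSFORM_TYPES = {"extraction", "classification", "generation"}


def make_operation(column_name: str, column_type: str,
                   transform_type: str, task_description: str) -> dict:
    """Build one entry for the operations list (hypothetical helper)."""
    if not SNAKE_CASE.fullmatch(column_name):
        raise ValueError(f"column_name must be snake case: {column_name!r}")
    if transform_type not in TRANSFORM_TYPES:
        raise ValueError(f"unknown transform_type: {transform_type!r}")
    # column_type is passed through unchecked: any PostgreSQL type name
    # is allowed, so a fixed allow-list would be too strict.
    return {
        "column_name": column_name,
        "column_type": column_type,
        "transform_type": transform_type,
        "task_description": task_description,
    }


operation = make_operation(
    "names_list", "text[]", "extraction",
    "List of person names mentioned in the email",
)
print(operation)
```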
Here is an example of transform_params:
{
  "model": "trellis-premium",
  "mode": "document",
  "operations": [
    {
      "column_name": "names_list",
      "column_type": "text[]",
      "transform_type": "extraction",
      "task_description": "List of person names mentioned in the email"
    },
    {
      "column_name": "is_investor",
      "column_type": "text",
      "transform_type": "classification",
      "task_description": "Is the email from an investor?",
      "output_values": {"yes": "this email is from investor", "no": "this email is not from investor"}
    }
  ]
}
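If you build transform_params in code rather than by hand, a serializer takes care of the JSON syntax for you; strict JSON parsers reject details such as a trailing comma after the last list element. The sketch below builds the same example as a Python dictionary and serializes it with the standard json module. How the resulting payload is submitted is not covered here.

```python
import json

# The same example as above, written as a Python dictionary.
transform_params = {
    "model": "trellis-premium",
    "mode": "document",
    "operations": [
        {
            "column_name": "names_list",
            "column_type": "text[]",
            "transform_type": "extraction",
            "task_description": "List of person names mentioned in the email",
        },
        {
            "column_name": "is_investor",
            "column_type": "text",
            "transform_type": "classification",
            "task_description": "Is the email from an investor?",
            "output_values": {
                "yes": "this email is from investor",
                "no": "this email is not from investor",
            },
        },
    ],
}

# json.dumps emits strict JSON: double-quoted strings and no trailing commas.
payload = json.dumps(transform_params, indent=2)

# Round-trip check: parsing the payload gives back the same structure.
assert json.loads(payload) == transform_params
print(payload)
```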
When you're done defining the transformation, you can go to create transforms to kick off the transformation run.