Defining Transformations

How to define and run a transformation

Transformation parameters (transform_params) define the set of transformations, LLM-based or otherwise, that you want to apply to your data.

transform_params Overview

The transform_params object holds the configuration for running data transformations with our AI models. It contains the following parameters:

model

  • Type: string
  • Description: Specifies the LLM engine used for the transformation. Options are "trellis-premium", "trellis-vertix", and "trellis-scale"; each offers a different balance of speed and accuracy. We recommend "trellis-premium".

mode

  • Type: string
  • Description: How the data is processed. Use "document" by default, or "table" if you are parsing tables.
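A minimal Python sketch of assembling the top-level settings with the recommended defaults. The helper name base_transform_params and the local allow-lists are illustrative assumptions, not part of any Trellis SDK; the allowed values are only the ones listed above, and the service may accept others.

```python
# Values listed in this section; the service may accept others.
MODELS = {"trellis-premium", "trellis-vertix", "trellis-scale"}
MODES = {"document", "table"}


def base_transform_params(model: str = "trellis-premium", mode: str = "document") -> dict:
    """Return the top-level transform_params fields (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    # operations is filled in later, one entry per target column
    return {"model": model, "mode": mode, "operations": []}


# Example: parse tables instead of documents, keeping the recommended model.
params = base_transform_params(mode="table")
print(params)
```

Defaulting to "trellis-premium" and "document" follows the recommendations above, so the common case needs no arguments.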

operations

  • Type: list of operation objects
  • Description: Each operation defines one data field to derive from your data. It specifies the target column, that column's data type, the type of transformation to apply, and a description of the task.

Each object in operations has the following parameters:

a. column_name

  • Type: string
  • Description: Name of the target column for the operation, identifying the specific data point to be extracted or transformed. Must be in snake_case (for example, names_list).
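The snake_case requirement can be checked locally before submitting. This Python sketch uses a common snake_case convention (lowercase letters and digits separated by single underscores, starting with a letter); that exact pattern is an assumption, not a published Trellis rule.

```python
import re

# Assumed convention: starts with a lowercase letter, then lowercase letters
# or digits, with single underscores between words (no leading/trailing "_").
SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_snake_case(name: str) -> bool:
    """Return True if name looks like a snake_case column name."""
    return SNAKE_CASE.fullmatch(name) is not None


for name in ["names_list", "is_investor", "NamesList", "names-list", "_tmp"]:
    print(name, is_snake_case(name))
```

Using fullmatch rather than match ensures the whole name conforms, not just a prefix.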

b. column_type

  • Type: string
  • Description: Data type of the target column. Any PostgreSQL data type is accepted, as documented in the PostgreSQL documentation (https://www.postgresql.org/docs/current/datatype.html). Common choices include text for strings, text[] for arrays of text, numeric for numbers, and date for dates.
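Array types are written by appending [] to a base type, as in text[]. The Python sketch below is a hypothetical local sanity check against a small, hand-picked subset of PostgreSQL type names; it is not exhaustive, and the service itself is documented as accepting any PostgreSQL type.

```python
# Non-exhaustive subset of PostgreSQL base types, for a local sanity check only.
COMMON_TYPES = {"text", "numeric", "integer", "boolean", "date", "timestamp"}


def is_common_type(column_type: str) -> bool:
    """True for a common base type, optionally as a one-dimensional array (T[])."""
    base = column_type.strip().lower()
    if base.endswith("[]"):
        base = base[:-2]  # strip the array suffix, e.g. "text[]" -> "text"
    return base in COMMON_TYPES


for t in ["text", "text[]", "numeric", "date", "varchar(20)"]:
    print(t, is_common_type(t))
```

Because varchar(20) is a valid PostgreSQL type outside this subset, a False result here should prompt a double-check rather than be treated as an error.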

c. transform_type

  • Type: string
  • Description: The kind of operation to apply to the data. "extraction" retrieves specific pieces of information from the data; other types include "classification" and "generation". A classification operation can also specify output_values, an object mapping each allowed label to its description, as in the example below.
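A hypothetical client-side check for transform_type, written in Python. The accepted set contains only the three values named in this section, and the output_values check assumes the label-to-description shape used by the classification operation in the example further down; neither is an official validation rule.

```python
# Values named in this section; the service may support more.
TRANSFORM_TYPES = {"extraction", "classification", "generation"}


def check_operation(op: dict) -> list:
    """Return a list of problems found in one operation (empty if none)."""
    problems = []
    transform_type = op.get("transform_type")
    if transform_type not in TRANSFORM_TYPES:
        problems.append(f"unexpected transform_type: {transform_type!r}")
    labels = op.get("output_values")
    if labels is not None:
        # Assumed shape, inferred from the example: {label: description}, all strings.
        valid = isinstance(labels, dict) and all(
            isinstance(k, str) and isinstance(v, str) for k, v in labels.items()
        )
        if not valid:
            problems.append("output_values should map label strings to descriptions")
    return problems


op = {
    "column_name": "is_investor",
    "column_type": "text",
    "transform_type": "classification",
    "task_description": "Is the email from an investor?",
    "output_values": {"yes": "this email is from an investor",
                      "no": "this email is not from an investor"},
}
print(check_operation(op))
print(check_operation({"transform_type": "summarize"}))
```

Returning a list of problems instead of raising on the first one lets you report every issue in an operation at once.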

d. task_description

  • Type: string
  • Description: A clear, human-readable explanation of what the operation should achieve. For example, for an operation that extracts URLs from text, the description should state what is being extracted and why.
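One way to keep operations well-formed in client code is to model each one as a small typed record and serialize the list into transform_params. This Python sketch uses a hypothetical Operation dataclass that mirrors the four fields described above; it is not an official SDK type, and optional fields such as output_values are left out for brevity.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Operation:
    """One entry in transform_params["operations"] (illustrative model)."""
    column_name: str       # snake_case name of the target column
    column_type: str       # PostgreSQL type, e.g. "text" or "text[]"
    transform_type: str    # e.g. "extraction", "classification", "generation"
    task_description: str  # plain-language description of the task


ops = [
    Operation("names_list", "text[]", "extraction",
              "List of person names mentioned in the email"),
]

params = {
    "model": "trellis-premium",
    "mode": "document",
    # asdict() turns each dataclass into a plain dict, preserving field order
    "operations": [asdict(op) for op in ops],
}
print(json.dumps(params, indent=2))
```

Building the JSON from typed objects catches missing fields at construction time instead of at submission time.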

Here is an example of transform_params:

{
  "model": "trellis-premium",
  "mode": "document",
  "operations": [
    {
      "column_name": "names_list",
      "column_type": "text[]",
      "transform_type": "extraction",
      "task_description": "List of person names mentioned in the email"
    },
    {
      "column_name": "is_investor",
      "column_type": "text",
      "transform_type": "classification",
      "task_description": "Is the email from an investor?",
      "output_values": {
        "yes": "this email is from an investor",
        "no": "this email is not from an investor"
      }
    }
  ]
}
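To sanity-check a configuration like this before submitting it, you can parse it locally. A minimal Python sketch using only the standard json module; the embedded string is a copy of the example above. Note that strict JSON parsers, including Python's, reject a trailing comma after the last element of an array.

```python
import json

# A copy of the example configuration, e.g. as read from a file.
RAW = """
{
  "model": "trellis-premium",
  "mode": "document",
  "operations": [
    {
      "column_name": "names_list",
      "column_type": "text[]",
      "transform_type": "extraction",
      "task_description": "List of person names mentioned in the email"
    },
    {
      "column_name": "is_investor",
      "column_type": "text",
      "transform_type": "classification",
      "task_description": "Is the email from an investor?",
      "output_values": {"yes": "this email is from an investor",
                        "no": "this email is not from an investor"}
    }
  ]
}
"""

params = json.loads(RAW)  # raises json.JSONDecodeError on malformed JSON
for op in params["operations"]:
    print(op["column_name"], op["column_type"], op["transform_type"])

# A trailing comma after the last array element is not valid JSON.
try:
    json.loads('{"operations": [{"column_name": "a"},]}')
except json.JSONDecodeError as err:
    print("rejected:", err.msg)
```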

When you're done defining the transformation, go to create transforms to kick off the transformation run.