Goals

We will analyze a dataset of emails from the Enron Corporation that was made public during a compliance investigation. In this quickstart, we will use Trellis to set up extraction and classification for all the emails.

1. Get your API key

First, go to https://dashboard.runtrellis.com/sign-up and create your account. Then click Settings in the lower right, or visit the settings page, and copy your API key.

2. Set your API key

Python
import requests
YOUR_API_KEY = "YOUR_API_KEY" # add your api key
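Hard-coding the key in a script makes it easy to leak, so one option is to read it from the environment and build the request headers once. The sketch below assumes an environment variable named TRELLIS_API_KEY and a helper named build_headers; both names are hypothetical and chosen for this example, not part of the Trellis API. The header shape is the one used in the steps of this quickstart.

```python
import os


def build_headers(api_key: str) -> dict:
    """Return the request headers used throughout this quickstart."""
    if not api_key:
        raise ValueError("API key is empty; set it before calling the API")
    return {
        "Authorization": api_key,
        "Content-Type": "application/json",
    }


# TRELLIS_API_KEY is a hypothetical variable name chosen for this sketch.
# Fall back to the placeholder so the script still loads without it.
YOUR_API_KEY = os.environ.get("TRELLIS_API_KEY", "YOUR_API_KEY")
headers = build_headers(YOUR_API_KEY)
```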

If you have already created the project and transformation in the dashboard, you can skip directly to step 6, "Upload the data to the project".

3. Create a project for your data

Each project has a unique ID. You can use this ID to refer to the project in the data upload process.

Python
YOUR_PROJ_NAME = "YOUR_PROJECT_NAME"
url = "https://api.runtrellis.com/v1/projects/create"

payload = {"name": YOUR_PROJ_NAME}
headers = {
    "Authorization": YOUR_API_KEY,
    "Content-Type": "application/json"
}
response = requests.request("POST", url, json=payload, headers=headers)
proj_id = response.json()["data"]["proj_id"]
print(proj_id)
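The snippet above indexes straight into the response, so a failed request (for example, a wrong API key) surfaces as a confusing KeyError. A minimal sketch of a more defensive read follows. The helper name extract_data_field is hypothetical; the only thing assumed about the response is the {"data": {...}} shape already used in this step.

```python
def extract_data_field(body: dict, field: str):
    """Return body["data"][field], or raise an error that shows the body."""
    data = body.get("data")
    if not isinstance(data, dict) or field not in data:
        raise RuntimeError(f"response has no data.{field}: {body!r}")
    return data[field]


# Usage with the request from this step:
# response.raise_for_status()  # surface HTTP errors before parsing
# proj_id = extract_data_field(response.json(), "proj_id")
```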

4. Define the transformation you want

A transformation is a set of operations that runs on all the assets in the project and turns them into the format you want. In this example, we want to extract who the email is from and the people mentioned, classify whether the email is a compliance risk, generate a one-line summary, and classify the topic of the email based on our defined taxonomy. See the Trellis documentation for more on how to define transform_params.

Python

url = "https://api.runtrellis.com/v1/transforms/create"

payload = {
    "proj_id": proj_id,
    "transform_name": "email_analysis",
    "transform_params": {
        "model": "trellis-premium",
        "mode": "document",
        "operations": [
            {
                "column_name": "email_from",
                "column_type": "text",
                "transform_type": "extraction",
                "task_description": "extract who sent the email. This should be in From"
            },
            {
                "column_name": "people_mentioned",
                "column_type": "text[]",
                "transform_type": "extraction",
                "task_description": "Extract a list of people mentioned in the email. Return empty list if no one is being mentioned."
            },
            {
                "column_name": "compliance_risk",
                "column_type": "text",
                "transform_type": "classification",
                "task_description": "Classify whether the email contains information that's potential compliance violation",
                "output_values": {
                    "No": "the email does not contain potential compliance violation",
                    "Yes": "the email contains potential compliance violation"
                }
            },
            {
                "column_name": "one_line_summary",
                "column_type": "text",
                "transform_type": "generation",
                "task_description": "Summarize the email in one line"
            },
            {
                "column_name": "genre",
                "column_type": "text",
                "transform_type": "classification",
                "task_description": "Classify the genre of the emails.",
                "output_values": {
                    "employment": "topics related to job seeking, hiring, recommendations, etc",
                    "empty_message": "no information in the text",
                    "document_review": "collaborating on document, editing",
                    "personal": "personal chat unrelated to work",
                    "company_business": "related to company business",
                    "logistics_arrangement": "meeting scheduling, technical support, etc",
                    "other": "Other email"
                }
            }
        ]
    }
}
response = requests.request("POST", url, json=payload, headers=headers)
transform_id = response.json()["data"]["transform_id"]
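When the list of operations grows, writing each dictionary by hand gets repetitive. The sketch below wraps the two most common shapes from the payload above in small builder functions. The function names extraction_op and classification_op are hypothetical helpers for this example; the keys and values they emit are copied from the payload in this step, and nothing beyond that is assumed about what the API accepts.

```python
def extraction_op(column_name, task_description, column_type="text"):
    """Build an extraction operation in the shape used in this quickstart."""
    return {
        "column_name": column_name,
        "column_type": column_type,
        "transform_type": "extraction",
        "task_description": task_description,
    }


def classification_op(column_name, task_description, output_values):
    """Build a classification operation; output_values maps label -> meaning."""
    if not output_values:
        raise ValueError("a classification needs at least one output value")
    return {
        "column_name": column_name,
        "column_type": "text",
        "transform_type": "classification",
        "task_description": task_description,
        "output_values": dict(output_values),
    }


operations = [
    extraction_op("email_from", "extract who sent the email. This should be in From"),
    classification_op(
        "compliance_risk",
        "Classify whether the email contains a potential compliance violation",
        {"No": "no potential violation", "Yes": "potential violation"},
    ),
]
```

The resulting list can be placed under "operations" in transform_params in place of the hand-written dictionaries.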

5. Create event triggers to automatically run the extraction and transformation

Python

url = "https://api.runtrellis.com/v1/events/subscriptions/actions/bulk"
payload = { "events_with_actions": [
        {
            "event_type": "asset_uploaded",
            "proj_id": proj_id, 
            "actions": [
                {
                    "type": "run_extraction",
                    "proj_id": proj_id
                }
            ],
        },
        {
            "event_type": "asset_extracted",
            "proj_id": proj_id,
            "actions": [
                {
                    "type": "refresh_transform",
                    "transform_id": transform_id
                }
            ]
        }
    ] }
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": YOUR_API_KEY  # same key variable defined in step 2
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)


6. Upload the data to the project

Since we created the event triggers in the earlier step, the extraction and transformation will automatically run on the data we upload. If you set up the project and transformation in the dashboard, the event triggers should already be set up for you.

If your files are stored locally rather than at URLs, you can use the create presigned URLs endpoint to generate presigned URLs for uploading them to Trellis.

Python
url = "https://api.runtrellis.com/v1/assets/upload"

payload = {
    "proj_id": proj_id,
    "urls": [
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/enron_clay_johnson_email.txt",
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/enron_memorial_day_plan.txt",
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/enron_mx_secretary_energy.txt"
    ]
}
response = requests.request("POST", url, json=payload, headers=headers)
asset_ids = [data["asset_id"] for data in response.json()["data"]]
print(response.text)

7. Get results

Python
url = f"https://api.runtrellis.com/v1/transforms/{transform_id}/results"
payload = {"filters": {}, "asset_ids": asset_ids}
response = requests.request("POST", url, json=payload, headers=headers)
print(response.text)
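Because extraction and transformation are started by event triggers after the upload, results may not be available the moment the upload call returns. One way to handle this is to retry the results request until it returns something useful. The sketch below is a generic polling helper and not part of the Trellis API; the name poll_until is hypothetical, and the readiness check shown in the usage comment is an assumption, since the shape of the results response is not described in this quickstart.

```python
import time


def poll_until(fetch, is_ready, interval_s=2.0, timeout_s=60.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call fetch() until is_ready(result) is true or timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if clock() >= deadline:
            raise TimeoutError(f"no ready result after {timeout_s} seconds")
        sleep(interval_s)


# Usage with the request from this step. The is_ready rule is an assumption:
# adjust it to whatever the results response actually contains.
# body = poll_until(
#     fetch=lambda: requests.post(url, json=payload, headers=headers).json(),
#     is_ready=lambda body: bool(body.get("data")),
# )
```

The sleep and clock parameters exist only so the helper can be exercised without real waiting.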