Goals

We will be analyzing a dataset of emails from the Enron Corporation, which was made public during a compliance investigation. In this quickstart, we will use Trellis to set up extraction and classification for all of the emails.

1. Get your API key

First, go to https://dashboard.runtrellis.com/sign-up and create your account. Then click Settings in the lower right (or visit the settings page) and copy your API key.

2. Set your API key

Python
import requests
YOUR_API_KEY = "YOUR_API_KEY"  # replace with your API key
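Hard-coding the key in a script makes it easy to leak by accident. A minimal sketch, assuming you export the key in an environment variable; the name `TRELLIS_API_KEY` and the helper `load_api_key` are hypothetical conventions, not something Trellis requires:

```python
import os


def load_api_key(env_var: str = "TRELLIS_API_KEY") -> str:
    """Read the API key from an environment variable.

    TRELLIS_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Fail early with a readable message instead of sending an empty key
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key


# Usage: YOUR_API_KEY = load_api_key()
```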

If you have already created the project and transformation in the dashboard, you can skip directly to step 6 (upload the data); just set proj_id and transform_id to the IDs of your existing project and transformation.

3. Create a project to hold your data

Each project has a unique ID, which you use to refer to the project when uploading data.

Python
YOUR_PROJ_NAME = "YOUR_PROJECT_NAME"  # replace with your project name
url = "https://api.runtrellis.com/v1/projects/create"

payload = {"name": YOUR_PROJ_NAME}
headers = {
    "Authorization": YOUR_API_KEY,
    "Content-Type": "application/json"
}
response = requests.request("POST", url, json=payload, headers=headers)
# The new project's ID is returned under data.proj_id
proj_id = response.json()["data"]["proj_id"]
print(proj_id)
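Indexing `response.json()["data"]["proj_id"]` directly raises a bare `KeyError` if the request failed (for example, with an invalid key). A small sketch of a more forgiving lookup, assuming the `{"data": {...}}` response shape this quickstart relies on; `get_data_field` is a hypothetical helper, not part of any Trellis client:

```python
from typing import Any


def get_data_field(body: dict[str, Any], field: str) -> Any:
    """Return body["data"][field], or raise a KeyError that shows the full body.

    Assumes the response shape used in this quickstart, e.g.
    {"data": {"proj_id": "..."}}.
    """
    data = body.get("data")
    if not isinstance(data, dict) or field not in data:
        # Include the whole body so API error messages are visible
        raise KeyError(f"'{field}' not found in response: {body!r}")
    return data[field]


# Usage: proj_id = get_data_field(response.json(), "proj_id")
```

The same helper works for `transform_id` in the next step. Calling `response.raise_for_status()` (a standard `requests` method) before parsing is another way to surface HTTP errors early.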

4. Define the transformation you want

A transformation is a set of operations that runs on every asset in the project and turns it into the format you want. In this example, we extract each email's sender, recipients, and the people mentioned; generate a one-line summary; and classify the email's compliance risk, genre, primary topic (based on our own taxonomy), and emotional tone. See the Trellis documentation for more on how to define transform_params.

Python

url = "https://api.runtrellis.com/v1/transforms/create"

payload = {
    "proj_id": proj_id,
    "transform_name": "email_analysis",
    "transform_params": {
        "model": "trellis-premium",
        "mode": "document",
        "operations": [
            {
                "column_name": "email_from",
                "column_type": "text",
                "transform_type": "extraction",
                "task_description": "Extract who sent the email. This should be in the From field."
            },
            {
                "column_name": "email_to",
                "column_type": "text[]",
                "transform_type": "extraction",
                "task_description": "Extract a list of emails in the To section"
            },
            {
                "column_name": "people_mentioned",
                "column_type": "text[]",
                "transform_type": "extraction",
                "task_description": "Extract a list of people mentioned in the email. Return empty list if no one is being mentioned."
            },
            {
                "column_name": "compliance_risk",
                "column_type": "text",
                "transform_type": "classification",
                "task_description": "Classify whether the email contains information that is a potential compliance violation",
                "output_values": {
                    "No": "the email does not contain a potential compliance violation",
                    "Yes": "the email contains a potential compliance violation"
                }
            },
            {
                "column_name": "one_line_summary",
                "column_type": "text",
                "transform_type": "generation",
                "task_description": "Summarize the email in one line"
            },
            {
                "column_name": "genre",
                "column_type": "text",
                "transform_type": "classification",
                "task_description": "Classify the genre of the emails.",
                "output_values": {
                    "employment": "topics related to job seeking, hiring, recommendations, etc",
                    "empty_message": "no information in the text",
                    "document_review": "collaborating on document, editing",
                    "purely_personal": "personal chat unrelated to work",
                    "company_business": "related to company business",
                    "logistics_arrangement": "meeting scheduling, technical support, etc",
                    "personal_professional": "Personal but in professional context (e.g., it was good working with you)"
                }
            },
            {
                "column_name": "primary_topics",
                "column_type": "text",
                "transform_type": "classification",
                "task_description": "Classify the specific topics of conversation",
                "output_values": {
                    "legal": "Topics around legal advice or involve legal matters",
                    "other": "Other topics not included in the existing categories",
                    "political": "Topics related to political influence / contributions / contacts",
                    "regulation": "Topics around regulations and regulators (includes price caps)",
                    "company_image": "Topics around company image",
                    "energy_crisis": "Topics related to the California energy crisis / California politics",
                    "internal_project": "Topics around internal projects -- progress and strategy",
                    "internal_operations": "Topics around internal operations"
                }
            },
            {
                "column_name": "emotional_tone",
                "column_type": "text",
                "transform_type": "classification",
                "task_description": "Classify the tone and intent of the message.",
                "output_values": {
                    "anger": "The email has an angry, aggressive, or agitated tone.",
                    "humor": "The email is funny or has a humorous tone",
                    "secret": "The email has a secretive / confidential tone or contains confidential information.",
                    "concern": "The email seems concerned, worried, or anxious",
                    "neutral": "The email is neutral",
                    "gratitude": "The email has a tone of gratitude or admiration"
                }
            }
        ]
    }
}
response = requests.request("POST", url, json=payload, headers=headers)
transform_id = response.json()["data"]["transform_id"]
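Most operations above are classifications that share the same fixed fields. A sketch of a hypothetical builder that assembles one such operation dict in the shape shown above, plus a check that no two operations reuse a `column_name`; the uniqueness check is our own precaution (each operation appears to produce its own result column), not a documented Trellis rule:

```python
from typing import Any


def classification_op(column_name: str, task_description: str,
                      output_values: dict[str, str]) -> dict[str, Any]:
    """Build one classification operation mirroring the transform_params example.

    Hypothetical helper: it only assembles a dict and calls no API.
    """
    if not output_values:
        raise ValueError("a classification needs at least one output value")
    return {
        "column_name": column_name,
        "column_type": "text",
        "transform_type": "classification",
        "task_description": task_description,
        "output_values": dict(output_values),  # copy so later edits don't leak in
    }


def check_unique_columns(operations: list[dict[str, Any]]) -> None:
    """Raise ValueError if two operations share a column_name (assumed precaution)."""
    seen: set[str] = set()
    for op in operations:
        name = op["column_name"]
        if name in seen:
            raise ValueError(f"duplicate column_name: {name}")
        seen.add(name)
```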

5. Create event triggers to automatically run the extraction and transformation

Python

url = "https://api.runtrellis.com/v1/events/subscriptions/actions/bulk"
payload = {
    "events_with_actions": [
        {
            # When an asset is uploaded, run extraction on it
            "event_type": "asset_uploaded",
            "proj_id": proj_id,
            "actions": [
                {
                    "type": "run_extraction",
                    "proj_id": proj_id
                }
            ]
        },
        {
            # When extraction finishes, refresh the transformation
            "event_type": "asset_extracted",
            "proj_id": proj_id,
            "actions": [
                {
                    "type": "refresh_transform",
                    "transform_id": transform_id
                }
            ]
        }
    ]
}
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": YOUR_API_KEY
}

response = requests.post(url, json=payload, headers=headers)
print(response.text)
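The two triggers form a chain: an upload fires `asset_uploaded`, which runs extraction; finished extraction fires `asset_extracted`, which refreshes the transformation. The local simulation below only illustrates that chain; the `EMITS` table (which event an action produces when it completes) is our assumption, modeled on the event names above, and nothing here calls the API:

```python
from typing import Any

# Hypothetical: the event each action emits when it finishes.
EMITS = {"run_extraction": "asset_extracted"}


def actions_for(event_type: str, events_with_actions: list[dict[str, Any]]) -> list[str]:
    """List the action types registered for one event type."""
    return [
        action["type"]
        for entry in events_with_actions
        if entry["event_type"] == event_type
        for action in entry["actions"]
    ]


def simulate(start_event: str, events_with_actions: list[dict[str, Any]]) -> list[str]:
    """Follow the trigger chain from one event and return every action that runs."""
    ran: list[str] = []
    pending = [start_event]
    while pending:
        event = pending.pop(0)
        for action in actions_for(event, events_with_actions):
            ran.append(action)
            if action in EMITS:
                pending.append(EMITS[action])  # the finished action fires its event
    return ran
```

With the payload above, `simulate("asset_uploaded", payload["events_with_actions"])` should return `["run_extraction", "refresh_transform"]`.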


6. Upload the data to the project

Because we created the event triggers in step 5, extraction and transformation will run automatically on the data we upload. If you set up the project and transformation in the dashboard, the event triggers should already be set up for you.

Python
url = "https://api.runtrellis.com/v1/assets/upload"

payload = {
    "proj_id": proj_id,
    "urls": [
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/enron_clay_johnson_email.txt",
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/enron_memorial_day_plan.txt",
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/enron_mx_secretary_energy.txt"
    ]
}
response = requests.request("POST", url, json=payload, headers=headers)
# Keep the asset IDs so we can fetch their results in step 7
asset_ids = [data["asset_id"] for data in response.json()["data"]]
print(response.text)
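This example uploads three files, but analyzing the full email dataset means many more URLs. A sketch of splitting them into several upload payloads of the same shape; `upload_payloads` is a hypothetical helper, and `batch_size=50` is an arbitrary illustrative value, not a documented Trellis limit:

```python
from collections.abc import Iterator
from typing import Any


def upload_payloads(proj_id: str, urls: list[str],
                    batch_size: int = 50) -> Iterator[dict[str, Any]]:
    """Yield upload payloads containing at most batch_size URLs each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(urls), batch_size):
        yield {"proj_id": proj_id, "urls": urls[start:start + batch_size]}


# Usage (all_urls is your own list of file URLs):
# asset_ids = []
# for payload in upload_payloads(proj_id, all_urls):
#     response = requests.post(url, json=payload, headers=headers)
#     asset_ids += [item["asset_id"] for item in response.json()["data"]]
```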

7. Get results

Python
url = f"https://api.runtrellis.com/v1/transforms/{transform_id}/results"
payload = {"filters": {}, "asset_ids": asset_ids}
response = requests.request("POST", url, json=payload, headers=headers)
print(response.text)
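Because extraction and transformation are started by event triggers after the upload, results may not be ready the moment the upload call returns. A generic polling sketch: `poll` is a hypothetical helper, and since this guide does not describe the results response, the readiness check `is_ready` is left for you to supply:

```python
import time
from typing import Any, Callable


def poll(fetch: Callable[[], Any], is_ready: Callable[[Any], bool],
         interval_s: float = 5.0, max_attempts: int = 12) -> Any:
    """Call fetch() until is_ready(result) is true, sleeping between attempts."""
    for attempt in range(1, max_attempts + 1):
        result = fetch()
        if is_ready(result):
            return result
        if attempt < max_attempts:
            time.sleep(interval_s)  # no sleep after the final attempt
    raise TimeoutError(f"results not ready after {max_attempts} attempts")


# Usage (is_ready is an assumption about the response contents):
# results = poll(
#     lambda: requests.post(url, json=payload, headers=headers).json(),
#     is_ready=lambda body: bool(body.get("data")),
# )
```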