✅ Goal

In this guide, we analyze a dataset of dummy pathology reports. We upload the reports to Trellis, then use Trellis’s extraction and classification operations to process them.

🏁 Alright, let’s get started!

1. Get your API key

First, go to https://dashboard.runtrellis.com/sign-up and create your account. Then click Settings in the lower right (or open the Settings page) and copy your API key.

2. Set your API key

Python
import requests
YOUR_API_KEY = "YOUR_API_KEY" # add your api key
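
Hard-coding the key is fine for a quick test, but a common alternative is to read it from an environment variable. The sketch below assumes a variable named `TRELLIS_API_KEY`; that name and the `load_api_key` helper are hypothetical, chosen for illustration rather than defined by Trellis.

```python
import os

def load_api_key(env_var="TRELLIS_API_KEY", environ=None):
    """Return the API key stored in the given environment variable.

    TRELLIS_API_KEY is a hypothetical variable name used for this example.
    """
    environ = os.environ if environ is None else environ
    key = environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key
```

With this in place, `YOUR_API_KEY = load_api_key()` would replace the hard-coded string above.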

If you have already created the project and transformation in the dashboard, you can skip directly to uploading assets (step 6).

3. Create a Project to put your data

Each project has a unique ID. You can use this ID to refer to the project in the data upload process.

Python
YOUR_PROJ_NAME = "YOUR_PROJECT_NAME" 
url = "https://api.runtrellis.com/v1/projects/create"

payload = {"name": YOUR_PROJ_NAME}
headers = {
    "Authorization": YOUR_API_KEY,
    "Content-Type": "application/json"
}
response = requests.request("POST", url, json=payload, headers=headers)
proj_id = response.json()["data"]["proj_id"]
print(proj_id)
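
If the request fails (for example, because of an invalid key), indexing straight into `response.json()["data"]` raises an unhelpful `KeyError`. The sketch below shows one way to surface a clearer message. `read_data_field` is a hypothetical helper, and the `{"data": {...}}` response shape is taken from the example above.

```python
def read_data_field(body, field):
    """Return body["data"][field], raising a readable error if it is missing."""
    data = body.get("data") if isinstance(body, dict) else None
    if not isinstance(data, dict) or field not in data:
        raise ValueError(f"Response has no data.{field}: {body!r}")
    return data[field]

# Possible usage with the request above:
# response.raise_for_status()  # raises requests.HTTPError on 4xx/5xx responses
# proj_id = read_data_field(response.json(), "proj_id")
```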

4. Define the transformation you want

A transformation is a set of operations that define a custom data extraction pipeline for your assets. In this example, we want to extract the following key information:

  1. subject: Who is the subject of the report?

  2. gender: What is the gender of the subject?

  3. diagnosis: What is the ultimate diagnosis?

  4. original_company: The healthcare company issuing the report.

  5. comments: A list of extra comments, if they exist.

Define your first Transformation!

To learn all about what Trellis Transformations have to offer, read our guide here!

Python
url = "https://api.runtrellis.com/v1/transforms/create"

payload = {
  "transform_name": "Medical Example",
  "proj_id": proj_id,
  "params": {
    "transform_params": {
      "mode": "document",
      "model": "trellis-vertix",
      "operations": [
        {
          "column_name": "report",
          "column_type": "assets",
          "transform_type": "parse",
          "task_description": "N/A"
        },
        {
          "column_name": "gender",
          "column_type": "text",
          "transform_type": "classification",
          "task_description": "The gender of the {{subject}} in the {{report}}",
          "output_values": {
            "Male": "Male",
            "Female": "Female",
            "Unknown": "Unknown"
          },
          "has_default": True,
          "default_value": {
            "value": "Unknown"
          }
        },
        {
          "column_name": "subject",
          "column_type": "text",
          "transform_type": "extraction",
          "task_description": "The person who is the subject of the {{report}}, if given"
        },
        {
          "column_name": "diagnosis",
          "column_type": "text",
          "transform_type": "extraction",
          "task_description": "Summarize the diagnosis in the {{report}} "
        },
        {
          "column_name": "original_company",
          "column_type": "text",
          "transform_type": "extraction",
          "task_description": "The company issuing the {{report}} "
        },
        {
          "column_name": "comments",
          "column_type": "list",
          "task_description": "comments in the {{report}} ",
          "transform_type": "extraction",
          "operations": [
            {
              "column_name": "comment",
              "column_type": "text",
              "task_description": "The comment.",
              "transform_type": "extraction"
            }
          ]
        }
      ]
    }
  }
}
response = requests.request("POST", url, json=payload, headers=headers)
transform_id = response.json()["data"]["transform_id"]
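
In the payload above, each `task_description` refers to other columns with `{{column_name}}` placeholders. Assuming every placeholder should match the `column_name` of an operation in the same transformation (which is how the example reads), a quick local check can catch typos before the request is sent. `undefined_references` is a hypothetical helper and inspects only top-level operations, not nested ones.

```python
import re

# Matches {{name}} with optional spaces inside the braces.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")

def undefined_references(operations):
    """Return (column, reference) pairs whose {{reference}} matches no column_name."""
    defined = {op["column_name"] for op in operations}
    missing = []
    for op in operations:
        for ref in PLACEHOLDER.findall(op.get("task_description", "")):
            if ref not in defined:
                missing.append((op["column_name"], ref))
    return missing

# Example with a deliberately missing "subject" column:
ops = [
    {"column_name": "report", "task_description": "N/A"},
    {"column_name": "gender",
     "task_description": "The gender of the {{subject}} in the {{report}}"},
]
print(undefined_references(ops))  # [('gender', 'subject')]
```

For the full payload, `undefined_references(payload["params"]["transform_params"]["operations"])` would be expected to return an empty list.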

5. Create event triggers to automatically run the extraction and transformation

Python

url = "https://api.runtrellis.com/v1/events/subscriptions/actions/bulk"
payload = { "events_with_actions": [
        {
            "event_type": "asset_uploaded",
            "transform_id": transform_id, 
            "actions": [
                {
                    "type": "run_extraction",
                    "transform_id": transform_id
                }
            ],
        },
        {
            "event_type": "asset_extracted",
            "transform_id": transform_id,
            "actions": [
                {
                    "type": "refresh_transform",
                    "transform_id": transform_id
                }
            ]
        }
    ] }
headers = {
    "accept": "application/json",
    "content-type": "application/json",
    "Authorization": YOUR_API_KEY  # same variable defined in step 2
}

response = requests.post(url, json=payload, headers=headers)

print(response.text)
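
The two subscriptions form a chain: an uploaded asset triggers extraction, and a finished extraction triggers a refresh of the transformation. As a sketch, the same payload can be generated from a list of (event, action) pairs. `build_trigger_payload` is a hypothetical helper; the event and action names are the ones used in the payload above.

```python
def build_trigger_payload(transform_id):
    """Build the bulk-subscription body that chains upload -> extraction -> refresh."""
    # Each pair is (event that fires, action to run in response).
    chain = [
        ("asset_uploaded", "run_extraction"),
        ("asset_extracted", "refresh_transform"),
    ]
    return {
        "events_with_actions": [
            {
                "event_type": event_type,
                "transform_id": transform_id,
                "actions": [{"type": action_type, "transform_id": transform_id}],
            }
            for event_type, action_type in chain
        ]
    }
```

Calling `build_trigger_payload(transform_id)` produces a dictionary equivalent to the hand-written `payload` above.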


6. Upload the data to the project

Since we created the event triggers in the earlier step, the extraction and transformation will automatically run on the data we upload.

Note: If you set up the project and transformation in the dashboard, the event triggers should already be set up for you.

Now it’s time to upload the medical reports to our transformation. For illustration, we upload only three documents.

Tip: If you have local files instead of presigned URLs, you can use the create presigned URLs endpoint to generate presigned URLs for uploading to Trellis.

Python
url = "https://api.runtrellis.com/v1/assets/upload"

payload = {
    "transform_id": transform_id,
    "urls": [
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/medical_example/cblpath.pdf",
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/medical_example/faulkner.pdf",
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/medical_example/rml.pdf"
    ]
}
response = requests.request("POST", url, json=payload, headers=headers)
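
Processing triggered by an upload may take some time, so results fetched immediately afterwards might be incomplete. Below is a minimal polling sketch, assuming you supply your own readiness check (for instance, that three rows have come back). `wait_for` and its parameters are hypothetical and not part of the Trellis API.

```python
import time

def wait_for(fetch, is_ready, attempts=10, delay_seconds=5.0, sleep=time.sleep):
    """Call fetch() until is_ready(result) is true or the attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if is_ready(result):
            return result
        # Pause between attempts, but not after the final one.
        if attempt < attempts - 1:
            sleep(delay_seconds)
    raise TimeoutError(f"Not ready after {attempts} attempts")
```

Here `fetch` could be a function that requests the results and returns the parsed rows, and `is_ready` could be `lambda rows: len(rows) == 3`.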

7. Get results!

Python
url = f"https://api.runtrellis.com/v1/transforms/{transform_id}/results"
response = requests.request("POST", url, headers=headers)

response_data = response.json()

op_id_to_name = {
    col["id"]: col["name"] 
    for col in response_data["metadata"]["column_definitions"]
}

mapped_data = [
    {op_id_to_name.get(k, k): v for k, v in entry.items()}
    for entry in response_data["data"]
]

DEFAULT_COLUMNS = [
    "asset_id",
    "result_id",
    "ext_file_name",
    "ext_file_id",
    "metadata",
    "asset_status",
]

for row in mapped_data:
    print(f"Row ID: {row['result_id']}\n" + "-" * 50)

    for key, value in row.items():
        if key in DEFAULT_COLUMNS:
            continue

        formatted_value = value

        if isinstance(value, list):
            formatted_value = f"{len(value)} comment(s)"

        if isinstance(value, str) and len(value) > 100:
            formatted_value = value[:100] + "..."

        print(f"{key.replace('_', ' ').title()}: {formatted_value}")

    print("=" * 50)

Example output:

Row ID: row_2tpUzpuwGw5kAVhtq3wCDlJkAHz
--------------------------------------------------
Original Company: CBLPath
Comments: 13 comment(s)
Diagnosis: Right breast poorly differentiated invasive ductal carcinoma measuring 1.9 cm with negative ER, PR, ...
Subject: Jane Doe
Gender: Female
==================================================
Row ID: row_2tpUzpwTN4OmILc8Ej8ga2THYc0
--------------------------------------------------
Original Company: Faulkner Hospital
Comments: 7 comment(s)
Diagnosis: Papillary carcinoma with two foci: follicular variant (1.7 cm) and classical variant (0.2 cm), confi...
Subject: None
Gender: Female
==================================================
Row ID: row_2tpUzpxywobkJJQslppTQvP8DoX
--------------------------------------------------
Original Company: REGIONAL MEDICAL LABORATORY
Comments: 1 comment(s)
Diagnosis: Skin, chest, punch biopsy - Grover's disease
Subject: TEST, DAVID1
Gender: Male
==================================================
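
The results code in step 7 replaces column IDs with column names before printing. To see that step in isolation, the sketch below runs the same mapping on a made-up response fragment. The IDs and values in `sample` are illustrative only, and `map_column_ids` is a hypothetical wrapper around the dictionary comprehension used in step 7.

```python
def map_column_ids(response_data):
    """Replace column IDs in each result row with the readable column names."""
    id_to_name = {
        col["id"]: col["name"]
        for col in response_data["metadata"]["column_definitions"]
    }
    # Keys without a matching definition (such as result_id) are kept unchanged.
    return [
        {id_to_name.get(key, key): value for key, value in row.items()}
        for row in response_data["data"]
    ]

# Made-up response fragment; not real Trellis output.
sample = {
    "metadata": {"column_definitions": [{"id": "op_1", "name": "gender"}]},
    "data": [{"result_id": "row_1", "op_1": "Female"}],
}
print(map_column_ids(sample))  # [{'result_id': 'row_1', 'gender': 'Female'}]
```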