We will analyze a dataset of dummy pathology reports. We'll upload the reports to Trellis and use Trellis's extraction and classification operations to process them.
A transformation is a set of operations that define a custom data extraction pipeline for your assets.
In this example, we want to extract the following key information:
subject: Who is the subject of the report?
gender: What is the gender of the subject?
diagnosis: What is the ultimate diagnosis?
original_company: The healthcare company issuing the report.
comments: A list of extra comments, if they exist.
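It can help to picture the five fields above as the shape of one output row. The sketch below makes a few assumptions. The Python types are our own guess, and this is not Trellis's transform-definition format. The helper `missing_fields` is hypothetical. It reports which fields came back empty, which matters because not every report names a subject.

```python
from typing import List, Optional, TypedDict


class PathologyReportRow(TypedDict):
    """Assumed shape of one extracted row; the types are illustrative guesses."""

    subject: Optional[str]
    gender: Optional[str]
    diagnosis: Optional[str]
    original_company: Optional[str]
    comments: List[str]


# Field names taken from the extraction list, in the same order.
EXPECTED_FIELDS = list(PathologyReportRow.__annotations__)


def missing_fields(row: dict) -> List[str]:
    """Hypothetical helper: return expected fields that are absent, None, or empty."""
    return [name for name in EXPECTED_FIELDS if row.get(name) in (None, "", [])]


# Illustrative values only: a report where no subject could be extracted.
example_row = {
    "subject": None,
    "gender": "Female",
    "diagnosis": "Papillary carcinoma",
    "original_company": "Faulkner Hospital",
    "comments": ["..."],
}
print(missing_fields(example_row))  # ['subject']
```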
Since we created the event triggers in the earlier step, the extraction and transformation will automatically run on the data we upload.
Note: If you set up the project and transformation in the dashboard, the event triggers should already be set up for you.
Now, it’s time to upload the medical reports to our transformation. For illustrative purposes, we will only upload 3 documents.
Tip: If you have local files instead of presigned URLs, you can use the create presigned URLs endpoint to generate presigned URLs for uploading them to Trellis.
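The triggers run the transformation automatically, but extraction may not have finished by the time the upload call returns. Below is a minimal polling sketch. It is not a Trellis SDK function. It assumes you wrap the results request in your own `fetch_rows` callable and supply your own `is_done` check. Both names are hypothetical. The status strings in the simulated fetcher (`"processing"`, `"done"`) are also made up, because the actual `asset_status` values Trellis reports are not covered here.

```python
import time
from typing import Callable, List


def wait_for_rows(
    fetch_rows: Callable[[], List[dict]],
    is_done: Callable[[dict], bool],
    expected_count: int,
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
) -> List[dict]:
    """Poll fetch_rows() until expected_count rows exist and all satisfy is_done.

    fetch_rows and is_done are hypothetical callables supplied by the caller.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        rows = fetch_rows()
        if len(rows) >= expected_count and all(is_done(r) for r in rows):
            return rows
        if time.monotonic() >= deadline:
            finished = sum(1 for r in rows if is_done(r))
            raise TimeoutError(f"only {finished}/{expected_count} rows finished")
        time.sleep(interval_s)


# Simulated fetcher: rows report a made-up "done" status from the third call on.
calls = {"n": 0}


def fake_fetch() -> List[dict]:
    calls["n"] += 1
    status = "done" if calls["n"] >= 3 else "processing"
    return [{"asset_status": status} for _ in range(3)]


rows = wait_for_rows(
    fake_fetch, lambda r: r["asset_status"] == "done", expected_count=3, interval_s=0
)
print(len(rows), calls["n"])  # 3 3
```

The check is injected as a callable, so the same loop works whatever "finished" turns out to mean for your project.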
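Once the documents have been processed, fetch the transformation results. The response identifies each column by its operation ID, so the script maps IDs to column names before printing each row:

import requests

# `transform_id` and `headers` (with your API key) are defined in the earlier setup steps.
url = f"https://api.runtrellis.com/v1/transforms/{transform_id}/results"
response = requests.request("POST", url, headers=headers)
response.raise_for_status()  # fail loudly on HTTP errors
response_data = response.json()

# Map each operation ID to its human-readable column name.
op_id_to_name = {
    col["id"]: col["name"]
    for col in response_data["metadata"]["column_definitions"]
}

# Rename operation-ID keys in every row; other keys (e.g. result_id) are kept as-is.
mapped_data = [
    {op_id_to_name.get(k, k): v for k, v in entry.items()}
    for entry in response_data["data"]
]

# Built-in columns to skip when printing.
DEFAULT_COLUMNS = [
    "asset_id",
    "result_id",
    "ext_file_name",
    "ext_file_id",
    "metadata",
    "asset_status",
]

for row in mapped_data:
    print(f"Row ID: {row['result_id']}\n" + "-" * 50)
    for key, value in row.items():
        if key in DEFAULT_COLUMNS:
            continue
        formatted_value = value
        # Summarize list values (the comments field) by their length.
        if isinstance(value, list):
            formatted_value = f"{len(value)} comment(s)"
        # Truncate long strings for readability.
        if isinstance(value, str) and len(value) > 100:
            formatted_value = value[:100] + "..."
        print(f"{key.replace('_', ' ').title()}: {formatted_value}")
    print("=" * 50)

This prints one block per report: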
Row ID: row_2tpUzpuwGw5kAVhtq3wCDlJkAHz
--------------------------------------------------
Original Company: CBLPath
Comments: 13 comment(s)
Diagnosis: Right breast poorly differentiated invasive ductal carcinoma measuring 1.9 cm with negative ER, PR, ...
Subject: Jane Doe
Gender: Female
==================================================
Row ID: row_2tpUzpwTN4OmILc8Ej8ga2THYc0
--------------------------------------------------
Original Company: Faulkner Hospital
Comments: 7 comment(s)
Diagnosis: Papillary carcinoma with two foci: follicular variant (1.7 cm) and classical variant (0.2 cm), confi...
Subject: None
Gender: Female
==================================================
Row ID: row_2tpUzpxywobkJJQslppTQvP8DoX
--------------------------------------------------
Original Company: REGIONAL MEDICAL LABORATORY
Comments: 1 comment(s)
Diagnosis: Skin, chest, punch biopsy - Grover's disease
Subject: TEST, DAVID1
Gender: Male
==================================================
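The ID-to-name mapping in the results script can be factored into a small pure function. That makes it easy to test without calling the API. This is a sketch that assumes the response shape the script relies on: `metadata.column_definitions` entries with `id` and `name`, and rows under `data`. The function name `map_result_columns` and the operation IDs in the sample payload are hypothetical.

```python
from typing import Any, Dict, List


def map_result_columns(response_data: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Rename operation-ID keys in each result row to their column names.

    Keys without a matching column definition (e.g. result_id) are kept as-is.
    """
    op_id_to_name = {
        col["id"]: col["name"]
        for col in response_data["metadata"]["column_definitions"]
    }
    return [
        {op_id_to_name.get(key, key): value for key, value in row.items()}
        for row in response_data["data"]
    ]


# Hypothetical payload mimicking the shape the script reads; the IDs are invented.
sample = {
    "metadata": {
        "column_definitions": [
            {"id": "op_1", "name": "subject"},
            {"id": "op_2", "name": "gender"},
        ]
    },
    "data": [{"result_id": "row_a", "op_1": "Jane Doe", "op_2": "Female"}],
}
print(map_result_columns(sample))
# [{'result_id': 'row_a', 'subject': 'Jane Doe', 'gender': 'Female'}]
```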