✅ Goal

We will take a public, filled-out W2 form, upload it to Trellis, and use the advanced-markdown parse strategy to convert the PDF to markdown.

Alright, let's get started!

1. Set your API key

Create your account here [https://dashboard.runtrellis.com/sign-up]. Then click Settings on the lower right, or visit the settings page, and copy your API key.

Python
import requests
YOUR_API_KEY = "YOUR_API_KEY" # add your api key
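
Hard-coding the key is fine for a quick test, but it is easy to commit it by accident. A minimal sketch that reads the key from an environment variable instead; the variable name `TRELLIS_API_KEY` and the helper `load_api_key` are assumptions made for illustration, not something Trellis requires.

```python
import os


def load_api_key(env_var="TRELLIS_API_KEY"):
    """Return the API key stored in the given environment variable.

    The variable name is a hypothetical convention, not part of the Trellis API.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your API key")
    return key
```

With this in place, `YOUR_API_KEY = load_api_key()` can stand in for the hard-coded string.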

2. Create a project to hold your data

Each project has a unique ID. You can use this ID to refer to the project in the data upload process.

Python
YOUR_PROJ_NAME = "YOUR_PROJECT_NAME"
url = "https://api.runtrellis.com/v1/projects/create"

payload = {"name": YOUR_PROJ_NAME}
headers = {
    "Authorization": YOUR_API_KEY,
    "Content-Type": "application/json"
}
response = requests.request("POST", url, json=payload, headers=headers)
proj_id = response.json()["data"]["proj_id"]
print(proj_id)
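
If the request fails (for example because of a wrong API key), indexing straight into `response.json()["data"]["proj_id"]` raises a bare `KeyError` that hides the real cause. A small sketch of a more defensive lookup; `extract_proj_id` is a hypothetical helper, and the only response shape it assumes is the `data.proj_id` path used above.

```python
def extract_proj_id(body):
    """Pull data.proj_id out of a decoded JSON response body (a dict).

    Raises ValueError that includes the whole body when the expected keys
    are missing, so the server's error message stays visible.
    """
    try:
        return body["data"]["proj_id"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected response body: {body!r}") from exc
```

A typical use would be to call `response.raise_for_status()` first and then `proj_id = extract_proj_id(response.json())`.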

3. Upload the data to the project

In this example, we will use sample W2 form data.

Python
url = "https://api.runtrellis.com/v1/assets/upload"

payload = {
    "proj_id": proj_id,
    "urls": [
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/pdf_w2_clean/W2_XL_input_clean_1.pdf"
    ]
}
response = requests.request("POST", url, json=payload, headers=headers)
asset_ids = [data["asset_id"] for data in response.json()["data"]]
print(response.text)
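
The `urls` field is a list, so the same request shape should work for several documents at once. Below is a sketch of a helper that builds that payload from any collection of URLs, rejecting entries that are not http(s) links and dropping duplicates. `build_upload_payload` is a hypothetical name, and the validation is a local convenience rather than anything the API is known to enforce.

```python
from urllib.parse import urlparse


def build_upload_payload(proj_id, urls):
    """Build the JSON body for the upload request: {"proj_id": ..., "urls": [...]}."""
    cleaned = []
    for url in urls:
        url = url.strip()
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"Not an http(s) URL: {url!r}")
        if url not in cleaned:  # drop duplicates while keeping the original order
            cleaned.append(url)
    return {"proj_id": proj_id, "urls": cleaned}
```

The result can be passed as `json=build_upload_payload(proj_id, my_urls)` in the `requests` call.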

4. Start the PDF to markdown extraction

Python

url = "https://api.runtrellis.com/v1/assets/extract"

payload = {
    "asset_ids": asset_ids,
    "proj_id": proj_id,
    "parse_strategy": "advanced-markdown",
    "run_on_all_assets": True
}


response = requests.request("POST", url, json=payload, headers=headers)

print(response.text)

5. Fetch the markdown results

Python
url = f"https://api.runtrellis.com/v1/assets/{asset_ids[0]}/extract"
  
response = requests.request("GET", url, headers=headers)
data = response.json()["data"]
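
Extraction may still be running when the GET request is made, in which case the result will not be there yet. The sketch below retries until the `extraction` field is populated. Treating a missing or empty `extraction` as "not ready" is an assumption, as are the helper name `wait_for_extraction` and its default retry settings; check the API reference for an explicit status field before relying on it.

```python
import time


def wait_for_extraction(fetch, attempts=10, delay=2.0, sleep=time.sleep):
    """Call fetch() until the returned dict has a non-empty "extraction" value.

    fetch is any zero-argument callable that returns the "data" dict.
    Assumption: an empty or missing "extraction" means the job is not done.
    """
    for attempt in range(attempts):
        data = fetch()
        if data.get("extraction"):
            return data
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next try, but not after the last one
    raise TimeoutError(f"No extraction after {attempts} attempts")
```

For example: `data = wait_for_extraction(lambda: requests.get(url, headers=headers).json()["data"])`.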

6. Print the markdown

Python
import json

extraction = json.loads(data["extraction"])
for key, value in extraction.items():
    print(value)

Output
# W-2 Wage and Tax Statement 2010 (REISSUED STATEMENT)

## Employee Information
| Field | Value |
|-------|--------|
| SSN | 522-49-1342 |
| Name | Phyllis Castro |
| Address | 40553 Lewis Glen |
| City, State ZIP | East Jeremy, ND 81378-8062 |

## Employer Information
| Field | Value |
|-------|--------|
| EIN | 42-6960984 |
| Name | Newton Ltd Group |
| Address | 0423 Jacob Rest |
| City, State ZIP | Davidborough, VA 60964-5066 |
| Control Number | 3262362 |

## Wage and Tax Information
| Box | Description | Amount |
|-----|-------------|---------|
| 1 | Wages, tips, other compensation | 135905.73 |
| 2 | Federal income tax withheld | 38637.8 |
| 3 | Social security wages | 110799.84 |
| 4 | Social security tax withheld | 8476.19 |
| 5 | Medicare wages and tips | 107411.6 |
| 6 | Medicare tax withheld | 3114.94 |
| 7 | Social security tips | 110799.84 |
| 8 | Allocated tips | 107411.6 |
| 9 | Advance EIC payment | |
| 10 | Dependent care benefits | 288 |
| 11 | Nonqualified plans | 125 |
| 12a | See instructions for box 12 | P 7288 |
| 12b | | D 642 |
| 12c | | 985 |
| 12d | | 305 |

## State and Local Tax Information
| State | State ID Number | State Wages | State Tax | Local Wages | Local Tax | Locality Name |
|-------|-----------------|-------------|------------|--------------|------------|---------------|
| VT | 658-42-659 | 65022.49 | 4605.02 | 143189.23 | 19935.39 | Williams Extension |
| WV | 031-16-217 | 73141.9 | 3714.53 | 123497.08 | 17867.98 | Tracy Roads |
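
Since the output is made of plain pipe tables under `##` headings, it is straightforward to turn it back into structured data. Here is a minimal sketch, assuming tables shaped like the ones above (a header row, a dash separator row, then data rows); `parse_markdown_tables` is a hypothetical helper and not part of Trellis.

```python
def parse_markdown_tables(markdown):
    """Return {section heading: [row dict, ...]} for simple pipe tables."""
    tables = {}
    heading, header = None, None
    for line in markdown.splitlines():
        line = line.strip()
        if line.startswith("#"):
            heading, header = line.lstrip("#").strip(), None
        elif line.startswith("|"):
            cells = [c.strip() for c in line.strip("|").split("|")]
            if header is None:
                header = cells  # first row of a table names the columns
                tables.setdefault(heading, [])
            elif all(c and set(c) <= set("-:") for c in cells):
                continue  # separator row such as |-----|-----|
            else:
                tables[heading].append(dict(zip(header, cells)))
        else:
            header = None  # a blank or prose line ends the current table
    return tables
```

Every cell comes back as a string, so amounts such as `135905.73` still need an explicit `float(...)` conversion, and empty cells (box 9 above) come back as `""`.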