Goal
We will take a public, filled-out W-2 form, upload it to Trellis, and use Trellis's advanced-markdown parse strategy to convert the PDF to markdown.
Alright, let's get started!
1. Set your API key
Create an account at https://dashboard.runtrellis.com/sign-up. Then click Settings in the lower right corner (or go to the Settings page directly) and copy your API key.
import requests
YOUR_API_KEY = "YOUR_API_KEY" # add your api key
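Hard-coding the key works for a quick test, but a common alternative is to read it from an environment variable so it never ends up in source control. The sketch below assumes a variable named `TRELLIS_API_KEY` (a hypothetical name chosen here, not something Trellis requires). It also builds the same headers dictionary that the requests in this guide use.

```python
import os


def load_api_key(env_var: str = "TRELLIS_API_KEY") -> str:
    """Read the Trellis API key from an environment variable.

    TRELLIS_API_KEY is a hypothetical variable name used for this sketch.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} to your Trellis API key")
    return key


def build_headers(api_key: str) -> dict:
    # Same layout as the headers used later in this guide:
    # the raw key in Authorization plus a JSON content type.
    return {"Authorization": api_key, "Content-Type": "application/json"}
```

Usage: `headers = build_headers(load_api_key())`.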
2. Create a project to hold your data
Each project has a unique ID. You will use this ID to refer to the project when uploading data.
YOUR_PROJ_NAME = "YOUR_PROJECT_NAME"  # choose any name for the project

url = "https://api.runtrellis.com/v1/projects/create"
payload = {"name": YOUR_PROJ_NAME}
# These headers are reused for every request in this guide
headers = {
    "Authorization": YOUR_API_KEY,
    "Content-Type": "application/json"
}

response = requests.request("POST", url, json=payload, headers=headers)
proj_id = response.json()["data"]["proj_id"]  # unique ID of the new project
print(proj_id)
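If you would rather not depend on `requests`, the same project-creation call can be made with Python's standard library. This is a sketch that assumes the endpoint, header layout, and `{"data": {"proj_id": ...}}` response shape shown above; `build_create_project_request`, `parse_proj_id`, and `create_project` are hypothetical helper names.

```python
import json
import urllib.request

CREATE_PROJECT_URL = "https://api.runtrellis.com/v1/projects/create"


def build_create_project_request(name: str, api_key: str) -> urllib.request.Request:
    # POST a JSON body {"name": ...}, mirroring the requests call above.
    body = json.dumps({"name": name}).encode("utf-8")
    return urllib.request.Request(
        CREATE_PROJECT_URL,
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def parse_proj_id(raw: bytes) -> str:
    # Pull data.proj_id out of the JSON response; fail loudly if it is missing.
    payload = json.loads(raw)
    try:
        return payload["data"]["proj_id"]
    except (KeyError, TypeError) as exc:
        raise ValueError(f"Unexpected response: {payload!r}") from exc


def create_project(name: str, api_key: str) -> str:
    # Sends the request over the network and returns the new project ID.
    with urllib.request.urlopen(build_create_project_request(name, api_key)) as resp:
        return parse_proj_id(resp.read())
```

Splitting request construction from parsing keeps both halves testable without a network connection; only `create_project` actually talks to the API.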
3. Upload the data to the project
In this example, we upload a sample W-2 form PDF by its public URL.
url = "https://api.runtrellis.com/v1/assets/upload"
payload = {
    "proj_id": proj_id,
    # Public URL of the sample W-2 PDF
    "urls": [
        "https://trellis-ai-public.s3.us-west-2.amazonaws.com/pdf_w2_clean/W2_XL_input_clean_1.pdf"
    ]
}
response = requests.request("POST", url, json=payload, headers=headers)
# Collect the ID of each uploaded asset for the extraction step
asset_ids = [data["asset_id"] for data in response.json()["data"]]
print(response.text)
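The list comprehension above assumes every entry in `data` carries an `asset_id`. A slightly more defensive sketch checks that one asset came back per uploaded URL before moving on. The one-asset-per-URL expectation is an assumption based on this example, not documented API behavior, and `asset_ids_from_upload` is a hypothetical helper.

```python
def asset_ids_from_upload(body: dict, urls: list) -> list:
    """Return the asset IDs from an upload response body.

    Assumes the response shape used in this guide:
    {"data": [{"asset_id": ...}, ...]}
    """
    entries = body.get("data") or []
    ids = [entry["asset_id"] for entry in entries if "asset_id" in entry]
    # Assumption: the upload creates exactly one asset per URL.
    if len(ids) != len(urls):
        raise ValueError(f"Expected {len(urls)} asset IDs, got {len(ids)}: {body!r}")
    return ids
```

Usage: `asset_ids = asset_ids_from_upload(response.json(), payload["urls"])`.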
4. Run the advanced-markdown extraction
url = "https://api.runtrellis.com/v1/assets/extract"
payload = {
    "asset_ids": asset_ids,
    "proj_id": proj_id,
    "parse_strategy": "advanced-markdown",  # convert the PDF to markdown
    "run_on_all_assets": True
}
response = requests.request("POST", url, json=payload, headers=headers)
print(response.text)
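Because the extract call is driven entirely by these four JSON fields, one way to avoid typos in field names is to build the payload from a small dataclass. This is a sketch: `ExtractRequest` is a hypothetical helper, and its field names and defaults simply mirror the payload above.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class ExtractRequest:
    """Payload for POST /v1/assets/extract, mirroring the example above."""

    proj_id: str
    asset_ids: list = field(default_factory=list)
    parse_strategy: str = "advanced-markdown"
    run_on_all_assets: bool = True

    def to_payload(self) -> dict:
        # asdict() converts the dataclass into a plain dict whose keys
        # match the JSON field names sent to the API.
        return asdict(self)
```

Usage: `requests.request("POST", url, json=ExtractRequest(proj_id, asset_ids).to_payload(), headers=headers)`.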
5. Fetch the markdown results
url = f"https://api.runtrellis.com/v1/assets/{asset_ids[0]}/extract"
response = requests.request("GET", url, headers=headers)
data = response.json()["data"]
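Extraction may take some time to finish, so fetching right after step 4 could return before the results are ready. A generic polling loop is one way to handle this. The sketch below is library-agnostic: `fetch` would wrap the GET request above, and `is_ready` is a predicate you supply. What a "ready" response looks like is an assumption here, not something this guide specifies.

```python
import time
from typing import Any, Callable


def poll(
    fetch: Callable[[], Any],
    is_ready: Callable[[Any], bool],
    interval: float = 2.0,
    timeout: float = 120.0,
) -> Any:
    """Call fetch() until is_ready(result) is true or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        result = fetch()
        if is_ready(result):
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Result not ready after {timeout} seconds")
        time.sleep(interval)
```

For example, `fetch` could be `lambda: requests.request("GET", url, headers=headers).json()["data"]` and, as a purely hypothetical readiness check, `is_ready` could be `lambda d: bool(d.get("extraction"))`.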
6. Print the markdown
import json

# The extraction field is a JSON-encoded string; decode it into a dict
extraction = json.loads(data["extraction"])
for value in extraction.values():
    print(value)
Output:
# W-2 Wage and Tax Statement 2010 (REISSUED STATEMENT)
## Employee Information
| Field | Value |
|-------|--------|
| SSN | 522-49-1342 |
| Name | Phyllis Castro |
| Address | 40553 Lewis Glen |
| City, State ZIP | East Jeremy, ND 81378-8062 |
## Employer Information
| Field | Value |
|-------|--------|
| EIN | 42-6960984 |
| Name | Newton Ltd Group |
| Address | 0423 Jacob Rest |
| City, State ZIP | Davidborough, VA 60964-5066 |
| Control Number | 3262362 |
## Wage and Tax Information
| Box | Description | Amount |
|-----|-------------|---------|
| 1 | Wages, tips, other compensation | 135905.73 |
| 2 | Federal income tax withheld | 38637.8 |
| 3 | Social security wages | 110799.84 |
| 4 | Social security tax withheld | 8476.19 |
| 5 | Medicare wages and tips | 107411.6 |
| 6 | Medicare tax withheld | 3114.94 |
| 7 | Social security tips | 110799.84 |
| 8 | Allocated tips | 107411.6 |
| 9 | Advance EIC payment | |
| 10 | Dependent care benefits | 288 |
| 11 | Nonqualified plans | 125 |
| 12a | See instructions for box 12 | P 7288 |
| 12b | | D 642 |
| 12c | | 985 |
| 12d | | 305 |
## State and Local Tax Information
| State | State ID Number | State Wages | State Tax | Local Wages | Local Tax | Locality Name |
|-------|-----------------|-------------|------------|--------------|------------|---------------|
| VT | 658-42-659 | 65022.49 | 4605.02 | 143189.23 | 19935.39 | Williams Extension |
| WV | 031-16-217 | 73141.9 | 3714.53 | 123497.08 | 17867.98 | Tracy Roads |
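The extraction comes back as markdown, so to use individual values (for example, the amount in box 1) you need to parse the tables. The sketch below handles simple pipe tables like the ones above: it takes the header row, skips the `|---|` separator row, and returns one dict per data row. It is minimal by design and does not handle escaped pipes or a trailing empty cell written as `||`. `parse_pipe_table` is a hypothetical helper name.

```python
def parse_pipe_table(markdown: str) -> list:
    """Parse the first simple markdown pipe table in `markdown` into row dicts."""
    rows = []
    header = None
    for line in markdown.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            if header is not None:
                break  # the table has ended
            continue  # skip headings or text before the table
        # Drop the outer pipes, then split into trimmed cells.
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells
        elif all(set(cell) <= set("-: ") and cell for cell in cells):
            continue  # separator row such as |-----|-----|
        else:
            rows.append(dict(zip(header, cells)))
    return rows
```

Because it returns only the first table it finds, pass it the text of the section you want, for example everything from `## Wage and Tax Information` onward, and then build a lookup such as `{row["Box"]: row["Amount"] for row in rows}`.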