Using Python to Access the CKAN DataStore API
Python is a convenient way to interact with CKAN's DataStore API.
Instead of downloading entire files, you can query, filter, and retrieve only the data you need.
Basic Approaches
Two common libraries for making HTTP requests in Python are:
- requests: simple, widely used, integrates well with pandas
- urllib.request: part of Python's standard library, no extra installation needed
Examples
import pandas as pd
import requests
# API endpoint
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2"
# Make a GET request
response = requests.get(url)
# Convert response to JSON dictionary
response_data = response.json()
# Extract records
data = response_data["result"]["records"]
# Inspect column headers (empty list if no records were returned)
col_headers = list(data[0].keys()) if data else []
# Filter rows where Cast == "3"
filtered_data = [row for row in data if row["Cast"] == "3"]
# Convert to DataFrames
df_all = pd.DataFrame(data)
df_filtered = pd.DataFrame(filtered_data)
# Save to CSV
df_all.to_csv("output.csv", index=False)
# Second example: the same workflow using urllib.request from the standard library
import urllib.request
import json
import pandas as pd
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=ea474f80-dcbe-4647-a28d-7fdce1293e09"
# Make request
http_response = urllib.request.urlopen(url)
# Read raw bytes
raw_data = http_response.read()
# Decode using the response's declared encoding (fall back to UTF-8 if none is declared)
encoding = http_response.info().get_content_charset() or "utf-8"
data_dict = json.loads(raw_data.decode(encoding))
# Extract records
data = data_dict["result"]["records"]
# Convert to DataFrame
df = pd.DataFrame(data)
# Save to CSV
df.to_csv("output.csv", index=False)
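Both examples above index straight into the `result` and `records` keys, which raises a `KeyError` if the API reports a failure. The following is a minimal sketch of a more defensive extraction step. The helper name `extract_records` is hypothetical, and it assumes the usual CKAN response envelope with `success`, `result`, and `error` keys; the sample payload is hand-written, not real data.

```python
def extract_records(payload):
    """Return the list of records from a datastore_search response dict.

    Hypothetical helper: assumes the usual CKAN envelope
    {"success": bool, "result": {"records": [...]}, "error": {...}}.
    """
    if not payload.get("success", False):
        # On failure, CKAN describes the problem in an "error" object
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return payload["result"].get("records", [])


# Hand-written payload for illustration only (not real data)
sample = {"success": True, "result": {"records": [{"Cast": "3"}]}}
records = extract_records(sample)
```

In either example, the parsed JSON dictionary (`response_data` or `data_dict`) could be passed to this helper in place of the direct key lookup.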
Adding Parameters with datastore_search
You can refine queries by appending parameters to the URL after the resource_id.
Use & between each key=value pair.
- Limit results: &limit=2
- Filter records: &filters={"key":"value"}
See the full parameter list in the DataStore API reference.
Examples
Format: &filters={"key":"value"}
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2&filters={\"Cast\":\"3\",\"sample_date\":\"2016-06-09T00:00:00\"}"
response = requests.get(url)
data = response.json()["result"]["records"]
df = pd.DataFrame(data)
df
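Hand-writing the query string with escaped quotes is easy to get wrong. One alternative, sketched here under some assumptions, is to build the parameters as a dictionary and serialize the filters with `json.dumps`. The helper name `build_search_params` and the variable `base_url` are illustrative and do not appear in the original examples.

```python
import json


def build_search_params(resource_id, limit=None, filters=None):
    """Build a query-parameter dict for datastore_search (hypothetical helper)."""
    params = {"resource_id": resource_id}
    if limit is not None:
        params["limit"] = limit
    if filters:
        # The filters parameter is a JSON object, so serialize the dict
        params["filters"] = json.dumps(filters)
    return params


params = build_search_params(
    "c5c16064-e2b3-4618-9b27-0dbf5c1388c2",
    limit=2,
    filters={"Cast": "3"},
)

# With requests, pass the dict and let the library URL-encode it.
# base_url is assumed to be the endpoint without a query string:
# base_url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search"
# response = requests.get(base_url, params=params)
```

Passing `params=` to `requests.get` handles URL encoding of the braces and quotes, so no backslash escaping is needed.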
Authentication
Some endpoints require an API key.
Include it in your request headers:
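A minimal sketch of what this might look like, assuming the key is sent in the `Authorization` header as CKAN's API documentation describes. The helper name `build_auth_headers` and the placeholder key are illustrative only; check your portal's documentation for the header it expects.

```python
def build_auth_headers(api_key):
    """Return request headers carrying a CKAN API key (hypothetical helper)."""
    # Assumption: the server reads the key from the Authorization header
    return {"Authorization": api_key}


headers = build_auth_headers("YOUR-API-KEY")  # placeholder, not a real key

# Pass the headers along with the request, for example:
# response = requests.get(url, headers=headers)
```

Keeping the key in an environment variable or a local config file, instead of hard-coding it in a script, avoids sharing it by accident.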