
🐍 Using Python to Access the CKAN DataStore API

Python is a powerful way to interact with CKAN’s DataStore API.
Instead of downloading entire files, you can query, filter, and retrieve only the data you need.


📥 Basic Approaches

Two common libraries for making HTTP requests in Python are:

  • requests → simple, widely used, integrates well with pandas
  • urllib.request → part of Python’s standard library, no extra installation needed

Examples

Using requests:

import pandas as pd
import requests

# API endpoint
url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2"

# Make a GET request and stop early on an HTTP error status
response = requests.get(url)
response.raise_for_status()

# Convert the JSON response body to a dictionary
response_data = response.json()

# Extract records
data = response_data["result"]["records"]

# Inspect column headers (empty list if the query returned no records)
col_headers = list(data[0].keys()) if data else []

# Filter rows where Cast == "3"
filtered_data = [row for row in data if row["Cast"] == "3"]

# Convert to DataFrames
df_all = pd.DataFrame(data)
df_filtered = pd.DataFrame(filtered_data)

# Save to CSV
df_all.to_csv("output.csv", index=False)

Using urllib.request:

import urllib.request
import json
import pandas as pd

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=ea474f80-dcbe-4647-a28d-7fdce1293e09"

# Make request
http_response = urllib.request.urlopen(url)

# Read raw bytes
raw_data = http_response.read()

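# Decode using the charset declared by the response, falling back to UTF-8
# when the Content-Type header does not declare one
encoding = http_response.info().get_content_charset() or "utf-8"
data_dict = json.loads(raw_data.decode(encoding))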

# Extract records
data = data_dict["result"]["records"]

# Convert to DataFrame
df = pd.DataFrame(data)

# Save to CSV
df.to_csv("output.csv", index=False)
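
Both examples above index straight into `result` → `records`. CKAN action responses also carry a `success` flag, and a failed call returns an `error` object instead of a `result`. The sketch below shows one way to check that flag before extracting records. The helper name `extract_records` is hypothetical, not part of CKAN or either library.

```python
def extract_records(payload):
    """Return the records from a decoded datastore_search response.

    `payload` is the dictionary produced by response.json() or json.loads().
    Raises RuntimeError if the API reports a failure.
    """
    if not payload.get("success", False):
        # On failure CKAN describes the problem under the "error" key
        raise RuntimeError(f"CKAN request failed: {payload.get('error')}")
    return payload["result"]["records"]
```

With requests this would be called as `extract_records(response.json())`, and with urllib.request as `extract_records(data_dict)`.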

You can refine queries by appending parameters to the URL after the resource_id.
Use & between each key=value pair.

  • Limit results → &limit=2
  • Filter records → &filters={"key":"value"}

See the full parameter list in the DataStore API reference.
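
Writing the query string by hand means escaping the quotes and braces in `filters` yourself. As a rough alternative, the standard library can encode the parameters for you. The helper name `build_search_url` and the `BASE_URL` constant are illustrative assumptions. Only `resource_id`, `filters`, and `limit` come from the parameters described above.

```python
import json
from urllib.parse import urlencode

# Endpoint used throughout this page
BASE_URL = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search"

def build_search_url(resource_id, filters=None, limit=None):
    """Return a datastore_search URL with percent-encoded query parameters."""
    params = {"resource_id": resource_id}
    if filters is not None:
        # filters is sent as a JSON object serialized to a string
        params["filters"] = json.dumps(filters)
    if limit is not None:
        params["limit"] = limit
    return f"{BASE_URL}?{urlencode(params)}"

url = build_search_url(
    "c5c16064-e2b3-4618-9b27-0dbf5c1388c2",
    filters={"Cast": "3"},
    limit=2,
)
print(url)
```

The resulting `url` can be passed to `requests.get` or `urllib.request.urlopen` unchanged. With requests, passing the same dictionary through the `params=` argument of `requests.get` achieves the same encoding.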

Examples

Format: &filters={"key":"value"}

import pandas as pd
import requests

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2&filters={\"Cast\":\"3\",\"sample_date\":\"2016-06-09T00:00:00\"}"

response = requests.get(url)
data = response.json()["result"]["records"]

df = pd.DataFrame(data)
df

Format: &limit=2

import pandas as pd
import requests

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search?resource_id=c5c16064-e2b3-4618-9b27-0dbf5c1388c2&limit=2"

response = requests.get(url)
data = response.json()["result"]["records"]

df = pd.DataFrame(data)
df
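
`limit` also matters when you want everything: unless the site is configured otherwise, `datastore_search` returns only the first 100 rows by default. A minimal sketch of paging through a resource with `limit` and the API's `offset` parameter follows. The function names `fetch_page` and `fetch_all_records` are hypothetical, and the loop assumes the server honours the requested page size (a site-level maximum below `page_size` would end the loop early).

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/datastore_search"

def fetch_page(resource_id, limit, offset):
    """Fetch one page of records (performs a network request)."""
    query = urlencode({"resource_id": resource_id, "limit": limit, "offset": offset})
    with urllib.request.urlopen(f"{BASE_URL}?{query}") as http_response:
        return json.load(http_response)["result"]["records"]

def fetch_all_records(resource_id, page_size=100, fetch=fetch_page):
    """Request successive pages until one comes back shorter than page_size."""
    records = []
    offset = 0
    while True:
        page = fetch(resource_id, page_size, offset)
        records.extend(page)
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
    return records
```

Calling `fetch_all_records("c5c16064-e2b3-4618-9b27-0dbf5c1388c2")` would return a list that can be handed to `pd.DataFrame` as in the examples above. The `fetch` argument exists only so the loop can be exercised without network access.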


🔑 Authentication

Some endpoints require an API key.
Include it in your request headers:

import requests

url = "https://canwin-datahub.ad.umanitoba.ca/data/api/3/action/package_list"
headers = {"Authorization": "YOUR-API-KEY"}

response = requests.get(url, headers=headers)
print(response.json())
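
To keep the key out of source files, one option is to read it from an environment variable. This is a sketch only: the variable name `CKAN_API_KEY` and the helper `build_auth_headers` are assumptions made for illustration, not names defined by CKAN.

```python
import os

def build_auth_headers(env_var="CKAN_API_KEY"):
    """Return request headers carrying the API key, or {} if none is set."""
    api_key = os.environ.get(env_var)
    # Send the Authorization header only when a key is actually available
    return {"Authorization": api_key} if api_key else {}
```

The returned dictionary can be passed as `headers=` to `requests.get`, exactly as `headers` is in the example above.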