Harvesting
This page provides information on how to harvest data from the ARTESP Open Data Portal. Harvesting allows you to automatically collect and synchronize datasets from our portal to your own systems. We offer multiple methods for harvesting data, making it easy to integrate our open datasets into your applications, analysis tools, or other data platforms.
What is Harvesting?
Harvesting is the process of automatically collecting metadata and data from one data portal to another. It allows organizations and individuals to keep a local copy of datasets that are synchronized with the original source. This is particularly useful for:
- Creating federated or aggregated data catalogs
- Building applications that need regular data updates
- Integrating open data into your own systems
- Performing analysis across multiple datasets
Available Harvesting Methods
DCAT RDF Endpoints
Our portal supports the Data Catalog Vocabulary (DCAT) standard, which provides a framework for describing datasets in a catalog. We offer the following DCAT endpoints:
Catalog Endpoint
Access all datasets in our catalog through:
https://dadosabertos.artesp.sp.gov.br/catalog.{format}
where {format} can be xml, ttl, n3, or jsonld
Parameters:
- page={number}: For pagination (default: 1)
- modified_since={ISO-date}: Filter datasets modified since a specific date
- q={query}: Search query to filter datasets
Example: https://dadosabertos.artesp.sp.gov.br/catalog.xml?page=2&modified_since=2023-01-01
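The catalog URL pattern and its parameters can be assembled programmatically. The following is a minimal sketch using only the Python standard library; the function name `build_catalog_url` and the constants are illustrative and not part of the portal.

```python
from urllib.parse import urlencode

BASE_URL = "https://dadosabertos.artesp.sp.gov.br"
SUPPORTED_FORMATS = {"xml", "ttl", "n3", "jsonld"}


def build_catalog_url(fmt="xml", page=None, modified_since=None, q=None):
    """Return the catalog endpoint URL for the given format and filters."""
    if fmt not in SUPPORTED_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Keep only the parameters that were actually supplied, in a fixed order.
    candidates = (("page", page), ("modified_since", modified_since), ("q", q))
    params = [(name, value) for name, value in candidates if value is not None]
    url = f"{BASE_URL}/catalog.{fmt}"
    return f"{url}?{urlencode(params)}" if params else url


# Reproduces the example URL given above.
print(build_catalog_url("xml", page=2, modified_since="2023-01-01"))
```

Incrementing `page` until a page returns no datasets is one way to walk the whole catalog.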
Individual Dataset Endpoints
Access metadata for a specific dataset:
https://dadosabertos.artesp.sp.gov.br/dataset/{dataset-id}.{format}
where {format} can be xml, ttl, n3, or jsonld
Example: https://dadosabertos.artesp.sp.gov.br/dataset/acidentes.xml
Content Negotiation
Our portal also supports content negotiation, allowing clients to request specific formats using HTTP Accept headers:
- application/rdf+xml: for RDF/XML format
- text/turtle: for Turtle format
- text/n3: for N3 format
- application/ld+json: for JSON-LD format
Example using curl: curl -H "Accept: text/turtle" https://dadosabertos.artesp.sp.gov.br/dataset/rodovias-concedidas
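The same negotiation can be done from Python. This sketch only builds the request with the standard library and leaves sending it to the caller; the helper name `build_negotiated_request` and the `ACCEPT_TYPES` mapping are illustrative.

```python
import urllib.request

# Short format names mapped to the Accept header values listed above.
ACCEPT_TYPES = {
    "xml": "application/rdf+xml",
    "ttl": "text/turtle",
    "n3": "text/n3",
    "jsonld": "application/ld+json",
}


def build_negotiated_request(url, fmt):
    """Build (but do not send) a GET request asking for one RDF serialization."""
    return urllib.request.Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


request = build_negotiated_request(
    "https://dadosabertos.artesp.sp.gov.br/dataset/rodovias-concedidas", "ttl"
)
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request, timeout=30) as response:
#       turtle_text = response.read().decode("utf-8")
```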
DCAT Configuration
Our DCAT implementation is configured with the following settings:
- RDF profile: DCAT-AP 3.0
- RDF endpoints enabled
- Content negotiation enabled
- Page size: 100 datasets per page
Setting Up a Harvester in CKAN
If you are using CKAN to harvest data from our portal, you can use the CKAN Harvester extension. Here are the basic steps:
- Install the ckanext-harvest extension in your CKAN instance
- Configure the harvester to use either the CKAN harvester (for CKAN-to-CKAN harvesting) or the DCAT RDF harvester (for harvesting via our DCAT endpoints)
- Create a new harvest source pointing to our portal URL
- Configure the harvester with appropriate options (frequency, filters, etc.)
- Start the harvesting process
Example Configuration for DCAT RDF Harvester
When setting up a DCAT RDF harvester, you can use this configuration:
{
  "rdf_format": "xml",
  "profiles": ["euro_dcat_ap_3"],
  "default_extras": {
    "harvest_source_title": "ARTESP Open Data Portal",
    "harvest_source_url": "https://dadosabertos.artesp.sp.gov.br/"
  }
}
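If you generate harvest sources from scripts, building the configuration as a Python dictionary and serializing it helps avoid quoting mistakes. This is a small sketch assuming the configuration is ultimately supplied as a JSON string; the variable names are illustrative.

```python
import json

# Same settings as the configuration shown above.
harvest_config = {
    "rdf_format": "xml",
    "profiles": ["euro_dcat_ap_3"],
    "default_extras": {
        "harvest_source_title": "ARTESP Open Data Portal",
        "harvest_source_url": "https://dadosabertos.artesp.sp.gov.br/",
    },
}

# Serialize to the JSON text that would be supplied as the harvester configuration.
config_text = json.dumps(harvest_config, indent=2)
print(config_text)
```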
Using the CKAN API for Data Access
Beyond DCAT harvesting, the CKAN Action API offers a powerful RPC-style interface to interact with the portal programmatically. You can retrieve dataset information, search for data, and much more using HTTP requests with JSON payloads. This method provides fine-grained control over data access.
Introduction to the CKAN API
The API allows you to perform most actions available through the web interface.
API Base URL: https://dadosabertos.artesp.sp.gov.br/api/3/action/
For example, to list datasets, the action name is `package_list`, and the full URL would be: https://dadosabertos.artesp.sp.gov.br/api/3/action/package_list
JSON Response Structure
A typical API response is a JSON object with the following structure:
{
  "help": "Help text for the called action...",
  "success": true, // or false in case of error
  "result": [ /* ...data returned by the action... */ ],
  "error": { // Present only if "success" is false
    "message": "Error message",
    "__type": "Error Type"
  }
}
- `help`: A documentation string for the API function you called.
- `success`: A boolean indicating if the call was successful (true) or not (false). Always check this field.
- `result`: The data returned by the function. Its structure depends on the specific action.
- `error`: If `success` is false, this object contains details about the error.
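A client can centralize the `success` check in one place. The sketch below takes an already-parsed response and either returns `result` or raises an exception; `CkanApiError` and `unwrap_result` are hypothetical names introduced for illustration.

```python
class CkanApiError(Exception):
    """Raised when a response reports success == false."""


def unwrap_result(payload):
    """Return payload["result"] on success, otherwise raise CkanApiError."""
    if payload.get("success") is True:
        return payload["result"]
    # The error object carries "__type" and "message" as described above.
    error = payload.get("error") or {}
    error_type = error.get("__type", "Unknown error")
    message = error.get("message", "no message provided")
    raise CkanApiError(f"{error_type}: {message}")


# Usage with hand-written sample payloads.
print(unwrap_result({"success": True, "result": ["dataset-a", "dataset-b"]}))
```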
API Version
The current and recommended API version is v3. It is good practice to include `/api/3/` in your request URLs to ensure compatibility.
Authentication
Most read actions on public data do not require authentication. However, actions that modify data (create, update, delete) or access private datasets require an API Key. This key should be included in the `Authorization` HTTP header.
Example header: Authorization: YOUR_API_KEY_HERE
You can usually find your API key on your user profile page on the CKAN site.
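As a sketch, the header can be attached to a request with the Python standard library. The environment variable name `CKAN_API_KEY` and the helper `build_authenticated_request` are assumptions made for this example; reading the key from the environment simply keeps it out of source code.

```python
import os
import urllib.request

API_BASE = "https://dadosabertos.artesp.sp.gov.br/api/3/action/"


def build_authenticated_request(action, api_key):
    """Build (but do not send) a request carrying the API key."""
    request = urllib.request.Request(API_BASE + action)
    request.add_header("Authorization", api_key)
    return request


# CKAN_API_KEY is a hypothetical variable name chosen for this example.
api_key = os.environ.get("CKAN_API_KEY", "YOUR_API_KEY_HERE")
request = build_authenticated_request("package_list", api_key)
```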
Common API Actions for Data Retrieval
Here are some common read-only actions useful for accessing data and metadata:
List Datasets (package_list)
Returns a list of the names (IDs) of all public datasets.
Example using cURL:
curl -X GET "https://dadosabertos.artesp.sp.gov.br/api/3/action/package_list"
Show Dataset Details (package_show)
Returns complete information about a specific dataset, including its resources.
Parameters:
- `id` (string): The name (ID) or UUID of the dataset.
Example using cURL (replace `your-dataset-id` with an actual ID):
curl -X GET "https://dadosabertos.artesp.sp.gov.br/api/3/action/package_show?id=your-dataset-id"
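For harvesting, the most useful part of a `package_show` result is usually the `resources` list, where each entry points to a downloadable file or link. The sketch below extracts that information from an already-parsed result; the sample data is made up and `list_resources` is an illustrative name.

```python
def list_resources(dataset):
    """Return (name, format, url) tuples for each resource of a dataset."""
    return [
        (resource.get("name"), resource.get("format"), resource.get("url"))
        for resource in dataset.get("resources", [])
    ]


# Trimmed-down, made-up example of a package_show "result" object.
sample_dataset = {
    "name": "your-dataset-id",
    "title": "Example dataset",
    "resources": [
        {"name": "Example CSV", "format": "CSV", "url": "https://example.org/data.csv"},
        {"name": "Example PDF", "format": "PDF", "url": "https://example.org/notes.pdf"},
    ],
}

for name, fmt, url in list_resources(sample_dataset):
    print(f"{name} [{fmt}]: {url}")
```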
Search Datasets (package_search)
Allows searching for datasets based on various criteria.
Common Parameters:
- `q` (string): Search term (e.g., `q=transport`).
- `fq` (string): Filter query using Solr syntax (e.g., `fq=tags:economy organization:artesp`).
- `rows` (int): Number of results per page (default 10).
- `start` (int): Offset for pagination.
- `sort` (string): Sorting criteria (e.g., `sort=score desc, metadata_modified desc`).
Example using cURL (searching for "rodovias" and limiting to 5 results):
curl -X GET "https://dadosabertos.artesp.sp.gov.br/api/3/action/package_search?q=rodovias&rows=5"
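To retrieve every match rather than a single page, a client can advance `start` until it reaches the `count` reported in the result. The following is a sketch using the standard library; `iter_search_results` and `fetch_json` are illustrative names, and the `fetch` argument exists so the network call can be replaced in tests.

```python
import json
import urllib.request
from urllib.parse import urlencode

SEARCH_URL = "https://dadosabertos.artesp.sp.gov.br/api/3/action/package_search"


def fetch_json(url):
    """Download a URL and parse the body as JSON."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)


def iter_search_results(query, rows=100, fetch=fetch_json):
    """Yield every dataset matching `query`, requesting one page at a time."""
    start = 0
    while True:
        params = urlencode({"q": query, "rows": rows, "start": start})
        payload = fetch(f"{SEARCH_URL}?{params}")
        if not payload.get("success"):
            raise RuntimeError(f"package_search failed: {payload.get('error')}")
        result = payload["result"]
        datasets = result["results"]
        yield from datasets
        start += len(datasets)
        # Stop on an empty page or once all reported matches were seen.
        if not datasets or start >= result["count"]:
            break


# Example (performs real HTTP requests when uncommented):
#   for dataset in iter_search_results("rodovias", rows=50):
#       print(dataset["name"])
```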
Other Listing and Show Actions
Similar actions are available for other CKAN entities:
- `organization_list` / `organization_show`: For organizations.
- `group_list` / `group_show`: For groups.
- `tag_list` / `tag_show`: For tags.
- `resource_show`: To get details of a specific resource (file/link within a dataset). Requires resource ID.
API Actions Cheat Sheet
The following table provides a quick summary of common API actions:
Action Name | Description | HTTP Method | Key Parameters (in JSON body or URL query) |
---|---|---|---|
package_list | Returns a list of the names (IDs) of all public datasets. | GET or POST | limit (int, optional), offset (int, optional) |
package_show | Returns detailed metadata for a specific dataset. | GET or POST | id (string, required: dataset ID or name) |
package_search | Searches datasets based on various criteria. | GET or POST | q (string, optional: search term), fq (string, optional: filter query), rows (int, optional), start (int, optional) |
resource_show | Returns detailed metadata for a specific resource. | GET or POST | id (string, required: resource ID) |
organization_list | Returns a list of names (IDs) of all public organizations. | GET or POST | limit (int, optional), offset (int, optional), all_fields (boolean, optional) |
organization_show | Returns detailed metadata for a specific organization. | GET or POST | id (string, required: organization ID or name), include_datasets (boolean, optional) |
group_list | Returns a list of names (IDs) of all public groups. | GET or POST | limit (int, optional), offset (int, optional), all_fields (boolean, optional) |
tag_list | Returns a list of all tag names. | GET or POST | query (string, optional), vocabulary_id (string, optional) |
package_create | Creates a new dataset. (Requires Auth) | POST | name (string, required), owner_org (string, required: organization ID), title (string, optional), resources (list, optional) |
resource_create | Adds a new resource to a dataset. (Requires Auth) | POST | package_id (string, required), url (string, if not uploading) or upload (file, for direct upload), name (string, optional) |
package_update / package_patch | Updates an existing dataset (fully or partially). (Requires Auth) | POST | id or name (string, required), other dataset fields to modify. |
resource_update / resource_patch | Updates an existing resource (fully or partially). (Requires Auth) | POST | id (string, required), other resource fields to modify. |
package_delete | Marks a dataset as deleted. (Requires Auth) | POST | id (string, required: dataset ID or name) |
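The table notes that parameters may be sent in a JSON body instead of the URL query. As a sketch, a POST request for `package_search` could be built like this with the standard library; `build_post_request` is an illustrative helper, and the request is constructed but not sent.

```python
import json
import urllib.request

API_BASE = "https://dadosabertos.artesp.sp.gov.br/api/3/action/"


def build_post_request(action, params):
    """Build (but do not send) a POST request with the parameters as a JSON body."""
    body = json.dumps(params).encode("utf-8")
    return urllib.request.Request(
        API_BASE + action,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_post_request("package_search", {"q": "rodovias", "rows": 5})
# Sending is left to the caller, for example:
#   with urllib.request.urlopen(request, timeout=30) as response:
#       payload = json.load(response)
```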
Using the `ckanapi` Python Client and CLI
For Python users and system administrators, the `ckanapi` library offers a convenient way to interact with the CKAN API, both as a Python module and a command-line interface (CLI) tool.
Installation:
pip install ckanapi
CLI Examples:
- List datasets:
ckanapi action package_list -r https://dadosabertos.artesp.sp.gov.br
- Show dataset details (replace `your-dataset-id`):
ckanapi action package_show id=your-dataset-id -r https://dadosabertos.artesp.sp.gov.br
The `ckanapi` library is highly recommended for scripting interactions with the API.
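When used as a Python module, `ckanapi` exposes actions as methods on `client.action`, returns the `result` value directly, and raises an exception on failure. The sketch below keeps the client as a parameter so the logic can be exercised without network access; `summarize_datasets` is an illustrative helper, not part of `ckanapi`.

```python
def summarize_datasets(client, limit=5):
    """Return (name, title, resource_count) for the first `limit` datasets.

    `client` is expected to behave like ckanapi.RemoteCKAN: action calls
    return the "result" value directly.
    """
    summaries = []
    for name in client.action.package_list()[:limit]:
        dataset = client.action.package_show(id=name)
        resource_count = len(dataset.get("resources", []))
        summaries.append((name, dataset.get("title"), resource_count))
    return summaries


# With ckanapi installed, a real client would be created like this:
#   from ckanapi import RemoteCKAN
#   client = RemoteCKAN("https://dadosabertos.artesp.sp.gov.br")
#   for row in summarize_datasets(client):
#       print(row)
```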
API Usage Tips and Best Practices
- Check `success` Field: Always verify the `success` field in the API response, not just the HTTP status code, to confirm the action was successful.
- Error Handling: Implement robust error handling by parsing the `error` object when `success` is `false`.
- Pagination: For actions that return lists, page through the results instead of requesting everything at once: `package_search` takes `rows` and `start`, while `package_list` takes `limit` and `offset`.
- Rate Limiting: Be aware that the API might have rate limits. Design your applications to handle potential throttling gracefully.
- Full Documentation: For a comprehensive list of API actions, their parameters, and more detailed examples, refer to the official CKAN API guide (published at docs.ckan.org) or specific guides provided by this portal.
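One common way to handle throttling gracefully is to retry with exponentially increasing delays. The sketch below assumes throttled requests surface as HTTP 429 or 503 responses, which this page does not confirm for the portal; `call_with_backoff` and its parameters are illustrative.

```python
import time
import urllib.error


def call_with_backoff(func, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on HTTP 429/503 with exponentially growing delays."""
    for attempt in range(retries + 1):
        try:
            return func()
        except urllib.error.HTTPError as exc:
            # Assumption: 429 and 503 signal throttling; other errors are not retried.
            if exc.code not in (429, 503) or attempt == retries:
                raise
            sleep(base_delay * 2 ** attempt)


# Example (performs a real HTTP request when uncommented):
#   import urllib.request
#   url = "https://dadosabertos.artesp.sp.gov.br/api/3/action/package_list"
#   body = call_with_backoff(lambda: urllib.request.urlopen(url, timeout=30).read())
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.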
Last updated: June 2025