Harvesting
This page explains how to harvest data from the ARTESP Open Data Portal. Harvesting lets you automatically collect datasets from our portal and keep them synchronized in your own systems. We offer several harvesting methods, so you can integrate our open datasets into your applications, analysis tools, or other data platforms.
What is Harvesting?
Harvesting is the process of automatically collecting metadata and data from one data portal to another. It allows organizations and individuals to keep a local copy of datasets that are synchronized with the original source. This is particularly useful for:
- Creating federated or aggregated data catalogs
- Building applications that need regular data updates
- Integrating open data into your own systems
- Performing analysis across multiple datasets
Available Harvesting Methods
DCAT RDF Endpoints
Our portal supports the Data Catalog Vocabulary (DCAT) standard, which provides a framework for describing datasets in a catalog. We offer the following DCAT endpoints:
Catalog Endpoint
Access all datasets in our catalog through:
https://dadosabertos.artesp.sp.gov.br/catalog.{format}
where {format} can be xml, ttl, n3, or jsonld
Parameters:
- page={number}: For pagination (default: 1)
- modified_since={ISO-date}: Filter datasets modified since a specific date
- q={query}: Search query to filter datasets
Example: https://dadosabertos.artesp.sp.gov.br/catalog.xml?page=2&modified_since=2023-01-01
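As a minimal sketch of how these parameters combine, the Python snippet below assembles a catalog URL from the documented format suffixes and query parameters. The helper name catalog_url and the constants are illustrative only and are not part of the portal; the snippet builds the URL and does not send any request.

```python
from urllib.parse import urlencode

BASE_URL = "https://dadosabertos.artesp.sp.gov.br"
FORMATS = {"xml", "ttl", "n3", "jsonld"}  # suffixes listed on this page


def catalog_url(fmt="xml", page=None, modified_since=None, q=None):
    """Build a catalog endpoint URL (hypothetical helper, not a portal API)."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Keep only the parameters that were actually supplied.
    candidates = (("page", page), ("modified_since", modified_since), ("q", q))
    params = {name: value for name, value in candidates if value is not None}
    url = f"{BASE_URL}/catalog.{fmt}"
    return f"{url}?{urlencode(params)}" if params else url


print(catalog_url("xml", page=2, modified_since="2023-01-01"))
# prints https://dadosabertos.artesp.sp.gov.br/catalog.xml?page=2&modified_since=2023-01-01
```

Omitted parameters are left out of the query string, so the portal's defaults (such as page 1) apply.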
Individual Dataset Endpoints
Access metadata for a specific dataset:
https://dadosabertos.artesp.sp.gov.br/dataset/{dataset-id}.{format}
where {format} can be xml, ttl, n3, or jsonld
Example: https://dadosabertos.artesp.sp.gov.br/dataset/acidentes.xml
Content Negotiation
Our portal also supports content negotiation, allowing clients to request specific formats using HTTP Accept headers:
- application/rdf+xml for RDF/XML format
- text/turtle for Turtle format
- text/n3 for N3 format
- application/ld+json for JSON-LD format
Example using curl: curl -H "Accept: text/turtle" https://dadosabertos.artesp.sp.gov.br/dataset/rodovias-concedidas
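The same negotiation can be sketched in Python with the standard library. The mapping below pairs each URL suffix from this page with its media type; the function name negotiated_request is an illustrative assumption. The snippet only prepares the request, and sending it is left as a comment.

```python
from urllib.request import Request

# Media types listed on this page, keyed by the matching URL suffix.
ACCEPT_TYPES = {
    "xml": "application/rdf+xml",
    "ttl": "text/turtle",
    "n3": "text/n3",
    "jsonld": "application/ld+json",
}


def negotiated_request(url, fmt):
    """Return a Request asking for `url` in the given RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


req = negotiated_request(
    "https://dadosabertos.artesp.sp.gov.br/dataset/rodovias-concedidas", "ttl"
)
print(req.get_header("Accept"))  # prints text/turtle
# To actually fetch the metadata: urllib.request.urlopen(req).read()
```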
DCAT Configuration
Our DCAT implementation is configured with the following settings:
- RDF profile: DCAT-AP 3.0
- RDF endpoints enabled
- Content negotiation enabled
- Page size: 100 datasets per page
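Because the catalog returns 100 datasets per page, a client needs to walk the pages to collect everything. The Python sketch below shows one possible loop. It assumes that a page with fewer than 100 datasets is the last one, and it takes a hypothetical fetch_page callable that downloads and parses a single catalog page. If the catalog response carries explicit next-page links, following those is likely to be more robust than this stopping rule.

```python
PAGE_SIZE = 100  # datasets per page, as stated in the configuration above


def iter_datasets(fetch_page, page_size=PAGE_SIZE):
    """Yield datasets page by page until a short (or empty) page is seen.

    `fetch_page(page_number)` is a hypothetical callable that downloads one
    catalog page and returns its datasets as a list.
    """
    page = 1
    while True:
        datasets = fetch_page(page)
        yield from datasets
        if len(datasets) < page_size:
            break  # assumption: a page that is not full is the last one
        page += 1


# Stand-in fetcher that simulates a catalog of 250 datasets (no network).
def fake_fetch(page):
    start = (page - 1) * PAGE_SIZE
    return list(range(250))[start:start + PAGE_SIZE]


print(len(list(iter_datasets(fake_fetch))))  # prints 250
```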
Setting Up a Harvester in CKAN
If you are using CKAN to harvest data from our portal, you can use the CKAN Harvester extension. Here are the basic steps:
- Install the ckanext-harvest extension in your CKAN instance
- Choose the harvester type: either the CKAN harvester (for CKAN-to-CKAN harvesting) or the DCAT RDF harvester (for harvesting via our DCAT endpoints; this harvester is provided by the ckanext-dcat extension)
- Create a new harvest source pointing to our portal URL
- Configure the harvester with appropriate options (frequency, filters, etc.)
- Start the harvesting process
Example Configuration for DCAT RDF Harvester
When setting up a DCAT RDF harvester, you can use this configuration:
{
  "rdf_format": "xml",
  "profiles": ["euro_dcat_ap_3"],
  "default_extras": {
    "harvest_source_title": "ARTESP Open Data Portal",
    "harvest_source_url": "https://dadosabertos.artesp.sp.gov.br/"
  }
}
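If you create harvest sources programmatically rather than through the CKAN web form, the configuration is normally passed as a JSON string. The Python sketch below builds such a payload. The field names (name, title, url, source_type, frequency, config) and the values "dcat_rdf" and "WEEKLY" are assumptions based on the ckanext-harvest harvest_source_create action and should be checked against the version you have installed. The source name artesp-open-data is a placeholder.

```python
import json

# Harvester configuration, copied from the example above.
harvester_config = {
    "rdf_format": "xml",
    "profiles": ["euro_dcat_ap_3"],
    "default_extras": {
        "harvest_source_title": "ARTESP Open Data Portal",
        "harvest_source_url": "https://dadosabertos.artesp.sp.gov.br/",
    },
}


def harvest_source_payload(name, url, frequency="WEEKLY"):
    """Build a payload for creating a harvest source (assumed field names)."""
    return {
        "name": name,
        "title": "ARTESP Open Data Portal",
        "url": url,
        "source_type": "dcat_rdf",  # assumed identifier of the DCAT RDF harvester
        "frequency": frequency,  # assumed value; verify against your CKAN setup
        "config": json.dumps(harvester_config),  # config travels as a JSON string
    }


payload = harvest_source_payload(
    "artesp-open-data",  # placeholder source name
    "https://dadosabertos.artesp.sp.gov.br/catalog.xml",
)
print(json.dumps(payload, indent=2))
```

Serializing the configuration with json.dumps also acts as a quick syntax check before the source is submitted.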
Need Help?
If you encounter any issues while setting up harvesting from our portal, please contact us for assistance.
Last updated: June 2025