Harvesting

This page explains how to harvest data from the ARTESP Open Data Portal. Harvesting lets you automatically collect datasets from our portal and keep them synchronized in your own systems. We offer several harvesting methods so that you can integrate our open datasets into your applications, analysis tools, or other data platforms.

What is Harvesting?

Harvesting is the process of automatically copying metadata and data from one data portal into another. It allows organizations and individuals to keep a local copy of datasets that stays synchronized with the original source. This is particularly useful for:

  • Creating federated or aggregated data catalogs
  • Building applications that need regular data updates
  • Integrating open data into your own systems
  • Performing analysis across multiple datasets

Available Harvesting Methods

DCAT RDF Endpoints

Our portal supports the Data Catalog Vocabulary (DCAT) standard, which provides a framework for describing datasets in a catalog. We offer the following DCAT endpoints:

Catalog Endpoint

Access all datasets in our catalog through:

  • https://dadosabertos.artesp.sp.gov.br/catalog.{format} where {format} can be xml, ttl, n3, or jsonld

Parameters:

  • page={number} - For pagination (default: 1)
  • modified_since={ISO-date} - Return only datasets modified since the given ISO 8601 date (for example, 2023-01-01)
  • q={query} - Search query to filter datasets

Example: https://dadosabertos.artesp.sp.gov.br/catalog.xml?page=2&modified_since=2023-01-01
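The parameters above can be combined programmatically. The following is a minimal Python sketch that only builds the catalog URL; the helper name catalog_url and the constants BASE_URL and FORMATS are illustrative and not part of the portal's API.

```python
from urllib.parse import urlencode

BASE_URL = "https://dadosabertos.artesp.sp.gov.br"
FORMATS = {"xml", "ttl", "n3", "jsonld"}


def catalog_url(fmt="xml", page=None, modified_since=None, q=None):
    """Build a catalog endpoint URL from the documented parameters."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Keep only the parameters that were actually supplied.
    params = {"page": page, "modified_since": modified_since, "q": q}
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = f"{BASE_URL}/catalog.{fmt}"
    return f"{url}?{query}" if query else url


# Reproduces the example URL shown above.
print(catalog_url("xml", page=2, modified_since="2023-01-01"))
```

Omitting every optional argument yields the plain catalog URL, which corresponds to the first page.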

Individual Dataset Endpoints

Access metadata for a specific dataset:

  • https://dadosabertos.artesp.sp.gov.br/dataset/{dataset-id}.{format} where {format} can be xml, ttl, n3, or jsonld

Example: https://dadosabertos.artesp.sp.gov.br/dataset/acidentes.xml

Content Negotiation

Our portal also supports content negotiation, allowing clients to request specific formats using HTTP Accept headers:

  • application/rdf+xml for RDF/XML format
  • text/turtle for Turtle format
  • text/n3 for N3 format
  • application/ld+json for JSON-LD format

Example using curl: curl -H "Accept: text/turtle" https://dadosabertos.artesp.sp.gov.br/dataset/rodovias-concedidas
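The same request can be made from Python with the standard library. This is a sketch rather than an official client: the names ACCEPT_TYPES, negotiated_request, and fetch are hypothetical, and the mapping simply mirrors the media types listed above.

```python
from urllib.request import Request, urlopen

# Short format names mapped to the media types listed above.
ACCEPT_TYPES = {
    "xml": "application/rdf+xml",
    "ttl": "text/turtle",
    "n3": "text/n3",
    "jsonld": "application/ld+json",
}


def negotiated_request(url, fmt="ttl"):
    """Return a Request that asks for `url` in the given RDF serialization."""
    return Request(url, headers={"Accept": ACCEPT_TYPES[fmt]})


def fetch(url, fmt="ttl", timeout=30):
    """Send the request and return the response body as text."""
    with urlopen(negotiated_request(url, fmt), timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Calling fetch("https://dadosabertos.artesp.sp.gov.br/dataset/rodovias-concedidas") should be equivalent to the curl command above, assuming the portal is reachable.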

DCAT Configuration

Our DCAT implementation is configured with the following settings:

  • RDF profile: DCAT-AP 3.0
  • RDF endpoints enabled
  • Content negotiation enabled
  • 100 datasets per page
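Because the catalog is served 100 datasets at a time, a client has to walk the pages. The sketch below shows one possible loop in Python. It assumes that pages are requested as JSON-LD, that each dataset appears as a node typed dcat:Dataset, and that a page with fewer than 100 datasets is the last one. The function names and the fetch_page callable (which takes a page number and returns the page body) are hypothetical.

```python
import json

DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"


def count_datasets(jsonld_text):
    """Count nodes typed dcat:Dataset in a JSON-LD page (assumed layout)."""
    doc = json.loads(jsonld_text)
    # Accept either a bare list of nodes or an object with "@graph".
    nodes = doc.get("@graph", [doc]) if isinstance(doc, dict) else doc
    total = 0
    for node in nodes:
        if not isinstance(node, dict):
            continue
        types = node.get("@type", [])
        if isinstance(types, str):
            types = [types]
        if DCAT_DATASET in types or "dcat:Dataset" in types:
            total += 1
    return total


def iter_pages(fetch_page, page_size=100, max_pages=1000):
    """Yield (page_number, body) until a page is empty or not full."""
    for page in range(1, max_pages + 1):
        body = fetch_page(page)
        found = count_datasets(body)
        if found == 0:
            break
        yield page, body
        if found < page_size:
            break
```

The max_pages limit is only a safety net against an endless loop if the stop condition never triggers.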

Setting Up a Harvester in CKAN

If you are using CKAN to harvest data from our portal, you can use the CKAN Harvester extension. Here are the basic steps:

  1. Install the ckanext-harvest extension in your CKAN instance (the DCAT RDF harvester additionally requires ckanext-dcat)
  2. Configure the harvester to use either the CKAN harvester (for CKAN-to-CKAN harvesting) or the DCAT RDF harvester (for harvesting via our DCAT endpoints)
  3. Create a new harvest source pointing to our portal URL
  4. Configure the harvester with appropriate options (frequency, filters, etc.)
  5. Start the harvesting process

Example Configuration for DCAT RDF Harvester

When setting up a DCAT RDF harvester, you can use this configuration:

{
  "rdf_format": "xml",
  "profiles": ["euro_dcat_ap_3"],
  "default_extras": {
    "harvest_source_title": "ARTESP Open Data Portal",
    "harvest_source_url": "https://dadosabertos.artesp.sp.gov.br/"
  }
}
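If you create harvest sources through the CKAN API rather than the web interface, the configuration above is usually passed as a JSON string. The Python sketch below only builds such a payload and does not send it. The field names, the "dcat_rdf" source type, and the "WEEKLY" frequency are assumptions about the harvest_source_create action in ckanext-harvest, and the source name and catalog URL are illustrative, so check them against your installed versions.

```python
import json

# The configuration shown above, as a Python dictionary.
harvester_config = {
    "rdf_format": "xml",
    "profiles": ["euro_dcat_ap_3"],
    "default_extras": {
        "harvest_source_title": "ARTESP Open Data Portal",
        "harvest_source_url": "https://dadosabertos.artesp.sp.gov.br/",
    },
}


def harvest_source_payload(name, url, config, frequency="WEEKLY"):
    """Build a request body for a harvest source (assumed field names)."""
    return {
        "name": name,
        "title": config["default_extras"]["harvest_source_title"],
        "url": url,
        "source_type": "dcat_rdf",  # assumed identifier of the DCAT RDF harvester
        "frequency": frequency,
        "config": json.dumps(config),  # the config travels as a JSON string
    }


payload = harvest_source_payload(
    "artesp-open-data",  # hypothetical source name
    "https://dadosabertos.artesp.sp.gov.br/catalog.xml",
    harvester_config,
)
print(json.dumps(payload, indent=2))
```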
  

Need Help?

If you encounter any issues while setting up harvesting from our portal, please contact us for assistance.

Last updated: June 2025