Ingesting Datasets

Enabling Dataset Ingestion

As described in Exploring Data Sources, users can configure automated ingestion of datasets into the Snowflake database from the Dataset Information view. The INGESTION toggle and interval selector at the top of the view provide controls for managing dataset ingestion tasks.

To enable ingestion for a dataset, switch the INGESTION toggle to the ON position and select the desired update frequency—daily, weekly, or monthly. This sets up automated updates of the dataset into a Snowflake table.

If the data source connection has access credentials configured (see Managing Connections), the ingestion task automatically inherits those settings.

Ingestion Duration

Ingestion runs as an asynchronous task and typically completes within a few minutes to several tens of minutes. The duration depends on two main factors:

  1. Data source response times: Many geospatial data sources respond slowly or limit the volume of data returned per request.

  2. Dataset size: Even datasets with few rows can contain complex geometry data. Large row counts further increase processing time.

The process is fully automated and requires no intervention once configured.

Note: Ingestion tasks have a maximum duration of 12 hours. If an ingestion run does not complete within this time, it is automatically stopped and the task is marked as failed. This limit accommodates even the largest datasets and slowest data sources. If a task consistently fails due to timeouts, contact support via the Geo Data Connector listing on Snowflake Marketplace.

Monitoring Ingestion Tasks

Select Tasks from the main menu to open the Ingestion Tasks view, where you can monitor and manage all configured ingestion tasks.

Ingestion Tasks view showing all configured tasks with status and schedule

Task Details

| Column | Description |
| --- | --- |
| Task Name | A unique technical identifier of the ingestion task |
| Type | The type of the data source (e.g., WFS) |
| Health | A colored indicator showing the current connectivity status of the data source used by this task. See Managing Connections for status meanings. |
| Data Source | The title of the geospatial data source. For tasks created before this feature was introduced, the source URL is shown instead. |
| Dataset | The title of the dataset being ingested. For older tasks, the technical layer name is shown instead. |
| Target Table | The fully qualified name of the target table in Snowflake, in the format <database>.<schema>.<table>. The database is the ingestion target database configured in the application settings. |
| Status | The status of the most recent ingestion run. While ingestion is running, shows the current phase (e.g., "IN PROGRESS - 2/6 Retrieving"). When a run has failed, a tooltip icon shows the reason for the failure. Possible status values: N/A, IN PROGRESS, SUCCESS, FAILED. |
| Next ETA | The estimated time when the next ingestion run is expected to complete (GMT) |
| Last Run | The date and time of the most recent ingestion attempt |
| Created | The date and time when the ingestion task was created (GMT) |
| On | Whether the task is active (ON) or paused (OFF). Click the toggle to switch between states. |
| Schedule | The update frequency selected for the task (Daily, Weekly, or Monthly) |
| Actions | Action icons for triggering an immediate ingestion and for deleting the ingestion task |

Actions

| Action | Description |
| --- | --- |
| Immediate ingestion | Triggers ingestion immediately, regardless of the configured schedule. The status changes to IN PROGRESS until complete. |
| Delete | Removes the ingestion task |

Task Details and Run History

Click any task row in the Ingestion Tasks view to open the Task Details view. This view shows the task's metadata cards at the top and a full history of ingestion runs below.

Task Details view with metadata cards and ingestion run history

The metadata cards summarize the task configuration and current state:

| Card | Description |
| --- | --- |
| Source Type | Data source type (e.g., WFS, WMS) |
| Data Source | Title of the data source |
| Dataset | Title of the dataset being ingested |
| Target Table | Fully qualified Snowflake target table |
| Status | Current ingestion status with color-coded indicator |
| Next Ingestion | Estimated completion time for the next or current run |
| Last Ingestion | Completion time of the most recent run |
| Created | When the ingestion task was created |
| State | Whether the task is Enabled or Disabled |
| Schedule | Update frequency (Daily, Weekly, or Monthly) |

The Ingestion Task Runs table below shows the history of all ingestion runs for the task:

| Column | Description |
| --- | --- |
| Name | Run identifier |
| Start Time | When the run started |
| End Time | When the run completed (N/A if still running) |
| Status | Run outcome. While running, shows the current phase (e.g., "RUNNING - 2/6 Retrieving"). When a run has failed, a tooltip icon shows the error reason. Possible values: RUNNING, SUCCEEDED, FAILED. |
| Duration | How long the run took |
| Trigger | What started the run (scheduled or manual) |
| Rows Loaded | Number of rows written to the target table |

Note: The run history status values (RUNNING, SUCCEEDED, FAILED) reflect the outcome of each individual ingestion run. These are distinct from the task-level statuses (such as IN PROGRESS, SUCCESS, RETRYING, FAILED) shown in the task list, which reflect the overall state of the task.

Click Refresh to update the run history with the latest data.

Target Table Naming

Each ingested dataset is written to a fully qualified target table in Snowflake, identified by the format:

<database>.<schema>.<table>

| Component | Description |
| --- | --- |
| Database | The Connector-wide ingestion target database, configured in the Application Settings. All ingested data is stored within this database. |
| Schema | Generated automatically from the domain of the data source URL (e.g., example.org → org_example). Each unique domain results in a separate schema, organizing datasets by their source. |
| Table | Derived from the data source type (e.g., WFS) and the dataset name. To ensure uniqueness, especially when multiple datasets have similar names or come from the same source, a randomized suffix may be appended to the table name, depending on the naming context and source type. |
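
The naming scheme can be illustrated with a minimal Python sketch. It is not the Connector's implementation: the exact sanitization rules, the handling of subdomains, and the suffix format are not documented here, so the functions below are hypothetical and only mirror the two examples given in this document (example.org → org_example, and a table name shaped like WMS__MY_LAYER_A1B2C3D4).

```python
import re
import secrets
from urllib.parse import urlparse


def schema_from_url(source_url: str) -> str:
    """Reverse the host's domain labels and join them with underscores."""
    host = urlparse(source_url).hostname or ""
    labels = [label for label in host.split(".") if label]
    # Assumption: every label of the host is kept, including subdomains.
    return "_".join(reversed(labels))


def table_name(source_type: str, dataset_name: str, with_suffix: bool = False) -> str:
    """Combine the source type and dataset name, optionally with a random suffix."""
    # Assumption: runs of non-alphanumeric characters collapse to one underscore.
    cleaned = re.sub(r"[^A-Za-z0-9]+", "_", dataset_name).strip("_").upper()
    name = f"{source_type.upper()}__{cleaned}"
    if with_suffix:
        # Assumption: the suffix is 8 random hexadecimal characters.
        name += "_" + secrets.token_hex(4).upper()
    return name


def qualified_name(database: str, source_url: str, source_type: str, dataset_name: str) -> str:
    """Build <database>.<schema>.<table> without a random suffix."""
    return ".".join(
        [database, schema_from_url(source_url), table_name(source_type, dataset_name)]
    )
```

For example, `qualified_name("GEO_DB", "https://example.org/wfs", "WFS", "roads")` yields `GEO_DB.org_example.WFS__ROADS`, where `GEO_DB` stands in for whatever target database is configured. The authoritative name is always the one shown in the Target Table column.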

Ingestion Task Statuses

| Status | Description |
| --- | --- |
| IN PROGRESS | Ingestion is currently underway. The status column shows the current execution phase (e.g., "IN PROGRESS - 2/6 Retrieving") so you can track progress through the six phases: Triggering, Retrieving, Processing, Loading, Transforming, and Finalizing. |
| SUCCESS | The most recent ingestion completed successfully. Data from the source was ingested into the target Snowflake table without errors. |
| RETRYING | The most recent ingestion attempt failed, but the Connector is automatically retrying. The task will be retried up to a configured limit before being marked as FAILED. |
| INACTIVE | The ingestion task exists but has been disabled. |
| FAILED | The most recent ingestion attempt did not complete successfully. Hover over the information icon next to the status to see the failure reason, including which phase the error occurred in (e.g., "Failed in phase 2/6 Retrieving: Data retrieval took too long and was stopped."). |
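
If status strings are copied out of the view for reporting, the phase indicator can be decoded with a small helper. The Python sketch below is illustrative only: it assumes the phases are numbered in the order they are listed (so Retrieving is phase 2 of 6, as in the failure message example) and that the text always has the shape "STATUS - n/6 Phase". The function names are hypothetical.

```python
from typing import Optional, Tuple

# Phase order as listed in the status description.
PHASES = ["Triggering", "Retrieving", "Processing", "Loading", "Transforming", "Finalizing"]


def format_status(phase: str, prefix: str = "IN PROGRESS") -> str:
    """Render a status string such as 'IN PROGRESS - 2/6 Retrieving'."""
    index = PHASES.index(phase) + 1  # raises ValueError for unknown phases
    return f"{prefix} - {index}/{len(PHASES)} {phase}"


def parse_status(text: str) -> Tuple[str, Optional[int], Optional[str]]:
    """Split a status string into (status, phase number, phase name)."""
    status, separator, detail = text.partition(" - ")
    if not separator:
        # Plain statuses such as SUCCESS or FAILED carry no phase detail.
        return text.strip(), None, None
    position, _, phase = detail.partition(" ")
    number = int(position.split("/")[0])
    return status.strip(), number, phase.strip()
```

For instance, `parse_status("IN PROGRESS - 2/6 Retrieving")` returns `("IN PROGRESS", 2, "Retrieving")`, while `parse_status("SUCCESS")` returns `("SUCCESS", None, None)`.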

Failed Task Handling

Failed tasks are automatically disabled to prevent repeated unsuccessful attempts. To resume, manually re-enable the task from the Ingestion Tasks view or the Dataset Information view.

Tasks include built-in retry logic — a task is retried multiple times before being marked as failed. Failures are often caused by temporary outages of the data source service, so re-enabling a failed task usually succeeds on the next run.

Tip: Enable Task Failure email notifications in Application Settings to receive an alert when an ingestion task fails. This way you can respond promptly without needing to check the Tasks view manually.

In some cases, a data source may advertise datasets that cannot actually be retrieved — for example, when the service lists a dataset but the server rejects the data request, requires non-standard query parameters, or has a server-side misconfiguration. The Connector validates data accessibility during ingestion and marks the task as failed if the data cannot be downloaded, rather than leaving the task in a perpetual processing state.

Understanding Failure Patterns

The Task Details view provides run history that can help determine whether a failure is temporary or persistent:

  • A task that previously succeeded but now fails is typically experiencing a temporary data source issue — re-enabling it is usually sufficient.
  • A task that has never succeeded may indicate a persistent problem such as a dataset size limit, incompatible data format, or non-standard service behavior.
  • The Rows Loaded column in the run history indicates whether any data was retrieved before the failure occurred — a value of zero suggests the failure happened early (connectivity, permissions, or size limit), while a non-zero value suggests a mid-transfer issue.
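
The three heuristics above can be written down as a small decision function. This is a sketch for reasoning about an exported run history, not a feature of the Connector; the `Run` type and `diagnose` function are hypothetical, and the status values are the run-level ones shown in the Ingestion Task Runs table.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Run:
    status: str       # "RUNNING", "SUCCEEDED", or "FAILED"
    rows_loaded: int  # value of the Rows Loaded column


def diagnose(runs: List[Run]) -> Optional[Tuple[str, str]]:
    """Classify the latest failure; `runs` is ordered oldest to newest.

    Returns None when the latest run did not fail, otherwise a pair
    (pattern, stage) following the heuristics in the list above.
    """
    if not runs or runs[-1].status != "FAILED":
        return None
    ever_succeeded = any(run.status == "SUCCEEDED" for run in runs)
    # A task that succeeded before is probably hitting a temporary source issue.
    pattern = "temporary" if ever_succeeded else "persistent"
    # Zero rows suggests an early failure (connectivity, permissions, size limit).
    stage = "early" if runs[-1].rows_loaded == 0 else "mid-transfer"
    return pattern, stage
```

A history of one successful run followed by a failed run with zero rows gives `("temporary", "early")`, which points to re-enabling the task as the first step.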

For detailed troubleshooting guidance, see Troubleshooting: Ingestion Task Failed.

Managing Ingestion Tasks

Users can manage ingestion tasks from both the Ingestion Tasks view and the Dataset Information view.

| View | Available Actions |
| --- | --- |
| Ingestion Tasks view | Disable, re-enable, and delete ingestion tasks |
| Dataset Information view | Disable, re-enable, and reschedule ingestion tasks |

Disable an Ingestion Task

Switch the INGESTION toggle to the OFF position either on the task's row in the ingestion task list or on the corresponding dataset's Dataset Information view.

Note: Disabling a task stops further updates to the dataset. Data already written to the Snowflake target table by earlier successful runs is not removed.

Re-enable an Ingestion Task

Switch the INGESTION toggle to the ON position either on the task's row in the ingestion task list or on the corresponding dataset's Dataset Information view. The Next ETA column shows the estimated completion time of the next scheduled ingestion.

Delete an Ingestion Task

Click the delete icon in the Actions column of the corresponding task's row on the Ingestion Tasks view.

Note: Deleting a task stops further updates and permanently removes the ingestion task. Data already written to the Snowflake target table by earlier successful runs is not deleted.

Reschedule an Ingestion Task

Change the update frequency using the schedule selector on the Dataset Information view.

Data Source Type Differences

While the ingestion workflow is consistent across all data source types, the output format varies depending on the type of data being ingested. Each traditional OGC service type has a modern OGC API equivalent that produces the same output format.

Vector Data Sources (WFS, OGC API Features)

WFS (Web Feature Service) and OGC API Features provide vector data—features with geometry and attributes. When ingested:

  • Output format: GeoJSON stored in a Snowflake table
  • Schema: Columns match the feature attributes from the source
  • Geometry: Stored as GeoJSON in a dedicated geometry column
  • Querying: Use standard SQL with Snowflake's geospatial functions (see Working with Ingested Data)
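
As an illustration of the querying step, the Python sketch below builds a preview query for a vector target table. It is hedged in several ways: the geometry column name `GEOMETRY` is a placeholder (check the actual column name in your table), the helper `build_preview_query` is hypothetical, and the only Snowflake function used is `TO_GEOGRAPHY`, which accepts GeoJSON input. The function only builds the SQL text; running it requires a Snowflake session of your own.

```python
import re

# Plain unquoted identifiers only, so names can be interpolated safely.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def build_preview_query(table: str, geometry_column: str = "GEOMETRY", limit: int = 10) -> str:
    """Return SQL that previews rows and converts the GeoJSON geometry to GEOGRAPHY."""
    parts = table.split(".")
    if len(parts) != 3 or not all(_IDENTIFIER.match(part) for part in parts):
        raise ValueError(f"expected <database>.<schema>.<table>, got {table!r}")
    if not _IDENTIFIER.match(geometry_column):
        raise ValueError(f"invalid column name: {geometry_column!r}")
    return (
        f"SELECT *, TO_GEOGRAPHY({geometry_column}) AS GEOG\n"
        f"FROM {table}\n"
        f"LIMIT {int(limit)}"
    )
```

Calling `build_preview_query("GEO_DB.org_example.WFS__ROADS")` produces a three-line statement selecting every column plus a `GEOG` column; the table name here is made up for the example.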

Raster/Image Data Sources (WMS, WMTS, WCS, OGC API Maps, OGC API Coverages)

WMS, WMTS, WCS, and their OGC API equivalents (OGC API Maps, OGC API Coverages) provide raster data—map images or coverage data. When ingested, two artifacts are always created:

  1. Pixel data table (primary target table): Contains per-pixel data with spatial coordinates, band values, and GeoJSON geometry. This is the table shown in the Target Table column in the Ingestion Tasks view.
  2. Metadata table: A companion table named with the __metadata suffix (e.g., WMS__MY_LAYER_A1B2C3D4__metadata) containing image properties such as dimensions, coordinate reference system, band information, and geographic extent.

WMS, WMTS, and their OGC API equivalents (OGC API Maps) also produce a third artifact:

  3. Image stage: A Snowflake stage containing the georeferenced image file(s) for direct download.

WCS and OGC API Coverages deliver raw raster data (elevation, temperature, satellite bands) and do not produce an image stage.

For details on querying these tables and accessing image files, see Working with Ingested Data.
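
The artifact rules in this section can be summarized as a lookup. The Python sketch below is a reading aid rather than Connector code: `expected_artifacts` is a hypothetical helper, the image stage is represented by a descriptive label because its actual name is not documented here, and service types not named in this section raise an error.

```python
from typing import List

VECTOR = {"WFS", "OGC API Features"}
RASTER_WITH_IMAGE = {"WMS", "WMTS", "OGC API Maps"}
RASTER_WITHOUT_IMAGE = {"WCS", "OGC API Coverages"}


def expected_artifacts(source_type: str, target_table: str) -> List[str]:
    """List the artifacts an ingestion is expected to create for a source type."""
    if source_type in VECTOR:
        # Vector sources produce a single GeoJSON table.
        return [target_table]
    if source_type in RASTER_WITH_IMAGE or source_type in RASTER_WITHOUT_IMAGE:
        # Raster sources always get a pixel table and a companion metadata table.
        artifacts = [target_table, target_table + "__metadata"]
        if source_type in RASTER_WITH_IMAGE:
            artifacts.append("image stage")  # placeholder label, not a real stage name
        return artifacts
    raise ValueError(f"source type not covered by this sketch: {source_type!r}")
```

For example, `expected_artifacts("WMS", "WMS__MY_LAYER_A1B2C3D4")` returns the pixel table, `WMS__MY_LAYER_A1B2C3D4__metadata`, and the image stage label, while the same call with `"WCS"` omits the image stage.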

WMS (Web Map Service)

WMS delivers rendered map images. Key characteristics:

  • Layer selection: Choose specific map layers to ingest from the dataset list
  • Image format: The Connector requests the best available format (GeoTIFF preferred, then PNG)
  • Coordinate system: Images are georeferenced using the layer's native coordinate reference system
  • Use case: Ideal for basemaps, thematic maps, and visualizations where you need the rendered appearance

WMTS (Web Map Tile Service)

WMTS provides pre-rendered map tiles. Key characteristics:

  • Tiled delivery: Maps are delivered as a grid of tiles at multiple zoom levels
  • Efficient access: Tiles are optimized for fast web display
  • Merged output: Tiles are merged into a single image during ingestion
  • Use case: Ideal for basemaps and services designed for web mapping applications

WCS (Web Coverage Service)

WCS provides raw raster coverage data. Key characteristics:

  • Raw data: Delivers actual raster values (e.g., elevation, temperature, satellite bands)
  • Multi-band support: Can include multiple data bands per coverage
  • Large dataset handling: Datasets exceeding 500 MB are automatically split into smaller parts for reliable processing
  • Maximum dataset size: Datasets exceeding the current maximum size limit cannot be ingested. If a coverage exceeds this limit, the ingestion task will fail immediately. This limit is expected to be raised in a future release. Contact support for assistance with bbox subsetting to ingest a specific geographic region of the coverage.
  • Use case: Ideal for scientific analysis, remote sensing data, and raster analytics

Note: Large WCS and OGC API Coverages datasets may take longer to ingest due to automatic tiling and processing. The ingestion duration shown in the Task Details view accounts for this additional processing time.

OGC API Equivalents

The modern OGC API standards provide the same data types as their traditional counterparts, and produce identical output when ingested:

| OGC API Service | Traditional Equivalent | Output Type |
| --- | --- | --- |
| OGC API Features | WFS | Vector (GeoJSON table) |
| OGC API Maps | WMS | Raster (pixel table + metadata + image) |
| OGC API Tiles | WMTS | Vector (GeoJSON table) |
| OGC API Coverages | WCS | Raster (pixel table + metadata) |

The ingestion workflow, target table format, and all features described above apply equally to both traditional and OGC API data sources.

© 2016-2026 Smart Data Hub Ltd.