
AI-Powered Batch File Processing API

Automate and integrate AI-driven data validation and data enrichment capabilities via an API call for an entire dataset or file

Overview

This guide explains how to leverage our high-performance, parallel-processing cloud architecture to run, and optionally automate, data validation and enrichment jobs for entire datasets and files with a single API call. Beyond validating and enriching data quickly and easily, you can schedule processing, embed jobs in business processes, enrich multiple datasets, and integrate with ETL/ELT pipelines.

Automation

Schedule and integrate validation and enrichment jobs via API into your ETL/ELT processes, workflows, DevOps pipelines, or other operations, continuously gauging levels of data quality and data value across all of your data assets.

Multiple Data Sources

Support for various dataset formats, including local files, cloud files, and other data sources.

Single Command

Execute or schedule powerful, highly-performant, API-driven data enrichment and validation capabilities with a single HTTP API request.

How It Works

Data processing is initiated via an HTTP request query string, which can be embedded into any process, batch file, scheduler, or scripted series of commands. The call can be made from a browser address bar, from a command line using cURL, or via any other method that can issue an HTTP request.
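Because the entire job is expressed as a query string, the request URL can be assembled programmatically before being handed to cURL or any HTTP client. A minimal sketch in Python, where the API key and file location are placeholders; `urlencode` percent-encodes the embedded file URL so it travels safely inside the outer query string:

```python
from urllib.parse import urlencode

# Job parameters for the batch run; apikey and connection are placeholders.
params = {
    "function": "email-info",
    "apikey": "your-api-key",
    "source": "csv",
    "connection": "https://your-file-location/myfile.csv",
    "table": "csv",
    "column": 1,
}

# urlencode percent-encodes the embedded file URL so it survives
# inside the outer query string.
url = "https://connect.interzoid.com/run?" + urlencode(params)
print(url)
```

The resulting string can then be passed to cURL, a scheduler, or any HTTP library.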

CSV Data Source Example: Validating and Enriching Email Addresses
Description:

The API provides validation information for email addresses to aid in deliverability and to prevent sending email to bad addresses. It also offers additional demographic and descriptive data useful for marketing, personalization, and segmentation purposes. Examples include descriptions, revenue, number of employees, Twitter/X handles, location data, generic/disposable email indicators, and more. The source file is referenced by URL, as the sample file is stored on AWS S3.

Example API Call:
Try it out with cURL from the command line.

curl "https://connect.interzoid.com/run?function=email-info&apikey=your-api-key&source=csv&connection=https://your-file-location&table=csv&column=1"

API Parameters

Use these parameters in your HTTP query string/API call.

function (Required)
Which function (API) to call for the processing. Options:
  • email-info: Validate and enrich email addresses
  • city-standard: Standardize city names, including globally
  • state-standard: Standardize US state and Canadian province names
  • country-standard: Standardize country names
  • country-standard-info: Standardize country names with additional enrichment data
  • phone-info: Append and enrich global phone numbers with location data
  • entity-type: Determine/label the entity type of data (company, person, location, email, etc.)
  • name-origin: Determine likely country of origin based on a name
  • gender: Determine likely gender based on a name
  • translate-to-english: Translate column data from any language to English
  • translate-to-any: Translate column data from any language to any target language

apikey (Required)
Your API key. Log in to www.interzoid.com to obtain one. New users can register at www.interzoid.com/register-api-account

source (Required)
Source of the delimited data file, either 'CSV' or 'TSV'.

connection (Required)
Location of the CSV or TSV file: the full URL of the raw file location (S3, Azure Storage, Google Storage, GitHub, etc.).

table (Required)
Table name to access the source data. Use "CSV", "TSV", etc. for delimited text files.

column (Required)
Column number for CSV or TSV files, starting with 1 for the leftmost column (or only column) of the file.

reference (Optional)
An additional column from the source file to display in the output results, such as a primary key. This is also a column number.

target (Optional)
The delimited text format for the output file, such as "CSV" (comma-delimited) or "TSV" (tab-delimited). Default is CSV.

showall (Optional)
Set to true (&showall=true) to output all source columns with the new columns appended to the right.
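The Required/Optional split above can be checked before a job is submitted. A small sketch of such a pre-flight check (the parameter names come from the table; the validation helper itself is an illustration, not part of the API):

```python
# Parameters the table above marks as Required; the rest are Optional.
REQUIRED = {"function", "apikey", "source", "connection", "table", "column"}

def missing_params(params: dict) -> list:
    """Return the sorted list of required parameters absent from a job spec."""
    return sorted(REQUIRED - params.keys())

job = {
    "function": "phone-info",
    "apikey": "your-api-key",
    "source": "tsv",
    "connection": "https://your-file-location.tsv",
    "table": "tsv",
    # "column" accidentally omitted
    "showall": "true",
}
print(missing_params(job))  # reports the missing 'column' parameter
```

Catching a missing required parameter locally avoids submitting a job that the service would reject.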

Supported Data Sources: Connection Strings

Values to use for the API source and connection parameters

Source value: csv
Description: URL path of a CSV file
Example connection string: https://www.mywebaddress.com/files/myfile.csv

Source value: tsv
Description: URL path of a TSV file
Example connection string: https://www.mywebaddress.com/files/myfile.tsv

Running with cURL Example

You can run the command from a Linux, Windows, or macOS command line using cURL:

Linux & Mac

curl 'https://connect.interzoid.com/run?function=email-info&apikey=your-api-key&source=csv&connection=https://your-file-location&table=csv&column=1'

Windows

curl "https://connect.interzoid.com/run?function=email-info&apikey=your-api-key&source=csv&connection=https://your-file-location&table=csv&column=1"

Redirecting Output

Output from these cURL commands can be redirected to files for further processing using the greater-than symbol (>) on Linux, macOS, and Windows.

Linux & Mac

$ curl '[HTTP query string]' > output.csv

Windows

curl "[HTTP query string]" > output.csv
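Once redirected to output.csv, the results can be consumed by any standard CSV tooling. A minimal downstream sketch using Python's csv module, with an inline sample standing in for a real results file (the column names and status values shown are illustrative, not the API's exact output schema):

```python
import csv
import io

# Inline stand-in for the output.csv produced by a redirected cURL call.
sample = (
    "email,status\n"
    "jane@example.com,valid\n"
    "bad@nowhere.invalid,undeliverable\n"
)

with io.StringIO(sample) as f:
    rows = list(csv.DictReader(f))

# Keep only the rows flagged as deliverable.
valid = [r["email"] for r in rows if r["status"] == "valid"]
print(valid)
```

In a real pipeline, `io.StringIO(sample)` would be replaced by `open("output.csv")`.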

Examples

Here is an additional example demonstrating a batch API call and mass data processing from file sources.

TSV Data Source Example: Appending Phone Geographic data
Description:

This API analyzes international phone numbers and returns corresponding geographic information. The call uses the fourth column of the tab-delimited file as the global phone number to analyze. Note that showall is set to true, so all columns in the input source file are included in the output results file, with the new columns appended to the right.

API Call:

curl "https://connect.interzoid.com/run?function=phone-info&apikey=your-api-key&source=tsv&connection=https://your-file-location.tsv&table=tsv&column=4&showall=true"
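The effect of showall=true can be illustrated with plain row manipulation: every input column is carried through unchanged, and the enrichment columns are appended on the right. A sketch with made-up values (the enrichment column values are hypothetical, not actual phone-info output):

```python
# A tab-delimited input row; column 4 (1-based) holds the phone number.
input_row = ["1001", "Acme Ltd", "UK", "+44 20 7946 0958"]

# Hypothetical geographic values a phone-info lookup might append.
enrichment = ["London", "England", "GB"]

# With showall=true, the output row is the full input row plus the
# new columns appended to the right.
output_row = input_row + enrichment
print("\t".join(output_row))
```

Without showall, only the analyzed column (and any reference column) would appear alongside the appended data.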

Need help with your query/API call or other batch/bulk processing? Contact us at support@interzoid.com