Introducing our Snowflake Data Cloud Native Application: AI-Driven Data Quality built into SQL statements! Learn More

Full Dataset Matching API

Automate and integrate AI-driven matching capabilities via an API call for an entire dataset, file, or database table

Overview

This explains how to leverage our high-performance, parallel-processing cloud architecture to run and/or automate the running of matching jobs for datasets, files, or database tables, all with a single API call. Not only can you quickly identify and matching inconsistent and duplicate data quickly and easily, you can also schedule processing, add to business processes, match & merge multiple datasets, and integrate into ETL/ELT processes.

Automation

Schedule and integrate matching jobs via API into your ETL/ELT processes, workflows, devops, or other operations, constantly gauging levels of data quality across all of your data assets.

Multiple Data Sources

Support for various dataset formats, including local files, cloud files, the most popular database platforms, and othe data sources.

Single Command

Execute or schedule powerful, highly-performant, API-driven matching capabilities with a single HTTP API request.

How It Works

The matching process is initiated via API using an HTTP request "query string", which can be embedded into any process, batch file, scheduler, or scripted series of commands, including from a browser address bar, a command line, using cURL, and any other method that enables an API call.

CSV Data Source Example: Matching Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the first column in a CSV file. The similar organization names are clustered together in groups in the report. The source file is a URL as the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary. It can be run as-is by cutting and pasting into a URL address bar in your browser. This same call can be used with your own data in CSV file format.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company&html=true

API Parameters

Use these parameters in your HTTP query string to configure your API call.

Parameter Description Required
function Use 'match' for performing matching capabilities. Required
process Processing type for match processing. Options:
  • matchreport (default): Generate a Match Report
  • keysonly: Generate a similarity key for every value in the dataset
  • gensql: Generate SQL Insert statements for inserting similarity keys into a database
  • createtable: Create a new table in an existing database containing similarity keys
Required
category Indicates which set of AI-driven matching algorithms to use for the target data to be analyzed. Options:
  • company: For matching company names
  • individual: For matching individual names
  • address: For matching addresses
Required
apikey Your API Key. Login to www.interzoid.com to obtain one. New users can register at www.interzoid.com/register-api-account Required
source Source of data, such as 'CSV', 'Snowflake', 'Postgres', etc. See the Supported Data Sources section for the full list. Required
connection Connection string to access database, or for CSV/TSV files, the full URL of the file location. Required
table Table name to access the source data. Use "CSV" or "TSV" for delimited text files. Required
column Column name within the table to access the source data. For CSV or TSV files, use a number starting from 1 (leftmost column). Required
reference An additional column from the source table to display in the output results, such as a primary key. For text files, this is also a number. Optional
newtable The name of the new table if the output results are written to a new SQL table. Optional
json Set to true (&json=true) to display the output formatted as JSON. Optional
html Set to true (&html=true) to pad line breaks into the output results for better readability in a browser when run from the address bar. Optional

Data Source Connection Strings

Values to use for the API source and connection parameters

Source Value Description Connection String Value Example
snowflake Account/Warehouse connection
user:password@zwa55555/database/schema
postgres PostgreSQL, AWS/RDS/Aurora Postgres, Google Cloud SQL, ElephantSQL, CockroachDB, etc.
postgres://user:password@domain/database?sslmode=disable
mysql MySQL, MariaDB, SkySQL, AWS/RDS/Aurora MySQL, Google Cloud SQL, etc.
root:password@tcp(domain)/database
databricks SQL Warehouse example
token:dapi1ab2c34defabc567890123d4efa56789@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/endpoints/a1b234c5678901d2
sqlserver Microsoft, Azure, and AWS SQL Server
server={servername}.database.windows.net;user id={youruserid};password={yourpassword};port=1433;database=mysample;
csv URL path of CSV file
https://www.mywebaddress.com/files/myfile.csv
tsv URL path of TSV file
https://www.mywebaddress.com/files/myfile.tsv
excel URL path of Excel file
https://www.mywebaddress.com/files/myfile.xlsx
parquet URL path of Parquet file
https://www.mywebaddress.com/files/myfile.parquet

Note: to process and analyze local files, use our browser-based wizard.

Running with cURL Example

You can run the command from a Linux, Windows, or macOS command line using cURL:

Linux & Mac

$ curl 'https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company'

Windows

curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company"

Redirecting Output

Output from these curl commands can be redirected to output files for further processing using the greater-than symbol in both Linux & Windows.

Linux & Mac

$ curl '[HTTP query string]' > output.csv

Windows

curl "[HTTP query string]" > output.csv

Other File Source Examples

Additional examples where a file is the data source that will be used for data matching.

CSV Data Source Example: Matching Individual Names (Match Report)
Description:

Generates a match report of individual names from the first column in a CSV file. The similar individual names are clustered together in groups in the report. The source file is a URL as the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary. It can be run as-is by cutting and pasting into a URL address bar in your browser. This same call can be used with your own data in CSV file format.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/peoplenames.csv&table=CSV&column=1&category=individual&process=matchreport&html=true

CSV Data Source Example: Matching Street Addresses (Match Report)
Description:

Generates a match report of street addresses from the first column in a CSV file. The similar street addresses are clustered together in groups in the report. The source file is a URL as the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary. It can be run as-is by cutting and pasting into a URL address bar in your browser. This same call can be used with your own data in CSV file format.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/addresses.csv&table=CSV&column=1&category=address&process=matchreport&html=true

CSV Data Source Example: Company/Organization Name Similarity Keys (Append)
Description:

Generates a match report of matching/inconsistent organization names from the first column in a CSV file. The similar organization names are clustered together in groups in the report. The source file is a URL as the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary. It can be run as-is by cutting and pasting into a URL address bar in your browser. This same call can be used with your own data in CSV file format.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=keysonly &category=company&html=true

Connecting to Cloud SQL Data Tables

Snowflake Data Source Example: Matching Company/Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the specified column in a Snowflake database table. The similar organization names are clustered together in groups in the report. Using a native Snowflake driver, a connection is achieved with the provided Snowflake connection string. Upon connecting, the specified Snowflake table will be traversed, generating similarity keys on the fly. After scanning the entire table (or view), the match report will be generated. This is a cURL example using sample Snowflake values.

API Call:

curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=snowflake&connection=username:password@account/database/schema&table=companies&column=company&category=company&process=matchreport"

Snowflake Data Source Example: Company/Organization Name Similarity Keys (Generate SQL)
Description:

Generates a similarity key and Insert SQL command for every record in a specified column in a Snowflake database table. Using a native Snowflake driver, a connection is achieved with the provided Snowflake connection string. Upon connecting, the specified Snowflake table will be traversed, generating similarity keys on the fly. For each of these, an Insert SQL statement is generated. After scanning the entire table (or view), the entire output of Insert SQL statements is available for review, and then can be executed if desired within Snowflake. This is a cURL example using sample Snowflake values.

API Call:

curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=snowflake&connection=username:password@account/database/schema&table=companies&column=company&category=company&process=gensql"

Snowflake Data Source Example: Company/Organization Name Similarity Keys (Create Table & Insert SQL)
Description:

A new Snowflake table is created using the name in the 'newtable' parameter. A similarity key is generated and an Insert SQL command issued for every record in a specified column in a Snowflake database table. Using a native Snowflake driver, a connection is achieved with the provided Snowflake connection string. Upon connecting, the specified Snowflake table will be traversed, generating and inserting similarity keys on the fly within the newly created Snowflake table for use as desired within Snowflake. This is a cURL example using sample Snowflake values.

API Call:

curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=snowflake&connection=username:password@account/database/schema&table=companies&column=company&category=company&process=createtable&newtable=mytablename"

Azure SQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the specified column in an Azure SQL/SQL Server database table. The similar organization names are clustered together in groups in the report. Using a native SQL Server driver, a connection is achieved with the provided SQL Server connection string. Upon connecting, the specified Azure SQL table will be traversed, generating similarity keys on the fly. After scanning the entire table (or view), the match report will be generated.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=sqlserver&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company

AWS RDS Data Source Example: Matching Company/Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the specified column in an AWS RDS/Aurora database table using Postgres. The similar organization names are clustered together in groups in the report. Using a native Postgres driver, a connection is achieved with the provided SQL Server connection string. Upon connecting, the specified AWS Postgres table will be traversed, generating similarity keys on the fly. After scanning the entire table (or view), the match report will be generated.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company

Google Cloud SQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the specified column in a Google Cloud SQL database table using Postgres (you can also use MySQL). The similar organization names are clustered together in groups in the report. Using a native Postgres driver, a connection is achieved with the provided Google Cloud SQL Postgres connection string. Upon connecting, the specified Google Cloud SQL table will be traversed, generating similarity keys on the fly. After scanning the entire table (or view), the match report will be generated.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company

PostgreSQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the specified column in a PostgreSQL database table. The similar organization names are clustered together in groups in the report. Using a native Postgres driver, a connection is achieved with the provided PostgreSQL connection string. Upon connecting, the specified PostgreSQL table will be traversed, generating similarity keys on the fly. After scanning the entire table (or view), the match report will be generated.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company

MySQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:

Generates a match report of matching/inconsistent organization names from the specified column in a MySQL database table. The similar organization names are clustered together in groups in the report. Using a native MySQL driver, a connection is achieved with the provided MySQL connection string. Upon connecting, the specified MySQL table will be traversed, generating similarity keys on the fly. After scanning the entire table (or view), the match report will be generated.

API Call:

https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=mysql&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company