Overview
This page explains how to use our high-performance, parallel-processing cloud architecture to run, or automate the running of, matching jobs against datasets, files, or database tables, each with a single API call. Beyond quickly identifying and matching inconsistent and duplicate data, you can schedule processing, add matching to business processes, match and merge multiple datasets, and integrate matching into ETL/ELT processes.
Automation
Schedule matching jobs and integrate them via API into your ETL/ELT processes, workflows, DevOps pipelines, or other operations to continuously gauge data quality across all of your data assets.
Multiple Data Sources
Support for various data source formats, including local files, cloud files, the most popular database platforms, and other data sources.
Single Command
Execute or schedule powerful, high-performance, API-driven matching with a single HTTP API request.
How It Works
The matching process is initiated by an HTTP request whose parameters are passed in the URL query string. The request can be embedded into any process, batch file, scheduler, or script, and can be issued from a browser address bar, from a command line with cURL, or by any other method capable of making an API call.
CSV Data Source Example: Matching Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the first column of a CSV file, with similar organization names clustered together in groups. The source is specified as a URL because the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary, and the call can be run as-is by copying and pasting it into your browser's address bar. The same call can be used with your own data in CSV format.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company&html=true
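When you build this call in code rather than pasting it, percent-encode each parameter value so that characters such as `&`, `?`, `=`, or `;` in a file URL or connection string cannot be mistaken for query-string delimiters. The following is a minimal Python sketch using only the standard library; `build_match_url` is a hypothetical helper, and the sketch assumes the service decodes percent-encoded values in the standard way, which this page does not state explicitly.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://connect.interzoid.com/run"

def build_match_url(params: dict) -> str:
    """Return the run URL with every parameter value percent-encoded.

    quote_via=quote encodes spaces as %20 rather than '+'.
    """
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

# Same parameters as the organization-name match report call above.
url = build_match_url({
    "function": "match",
    "apikey": "use-your-own-api-key-here",
    "source": "CSV",
    "connection": "https://dl.interzoid.com/csv/companies.csv",
    "table": "CSV",
    "column": 1,
    "process": "matchreport",
    "category": "company",
    "html": "true",
})
print(url)
```

The generated URL differs from the hand-written one only in that `:` and `/` inside the `connection` value are encoded; both forms decode to the same parameter values.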
API Parameters
Use these parameters in your HTTP query string to configure your API call.
Parameter | Description | Required |
---|---|---|
function | Use 'match' to perform matching. | Required |
process | Type of match processing. Options used in the examples on this page: 'matchreport' (match report), 'keysonly' (similarity keys only), 'gensql' (generate Insert SQL statements), and 'createtable' (create a new table and insert similarity keys). | Required |
category | Indicates which set of AI-driven matching algorithms to use for the target data being analyzed. Options used in the examples on this page: 'company', 'individual', and 'address'. | Required |
apikey | Your API key. Log in to www.interzoid.com to obtain one. New users can register at www.interzoid.com/register-api-account | Required |
source | Source of data, such as 'CSV', 'Snowflake', 'Postgres', etc. See the Data Source Connection Strings section for the full list. | Required |
connection | Connection string for accessing the database or, for CSV/TSV files, the full URL of the file location. | Required |
table | Name of the table containing the source data. Use "CSV" or "TSV" for delimited text files. | Required |
column | Name of the column containing the source data. For CSV or TSV files, use a column number starting from 1 (leftmost column). | Required |
reference | An additional column from the source table to display in the output results, such as a primary key. For text files, this is also a column number. | Optional |
newtable | The name of the new table when the output results are written to a new SQL table. | Optional |
json | Set to true (&json=true) to format the output as JSON. | Optional |
html | Set to true (&html=true) to add line breaks to the output for better readability in a browser when run from the address bar. | Optional |
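The parameter table maps directly onto a simple client-side check. This Python sketch, with a hypothetical `checked_url` helper, rejects a call that omits a required parameter or misspells a parameter name before any request is sent. It validates names only, not the allowed values for `process` or `category`.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://connect.interzoid.com/run"

# Parameter names from the API Parameters table.
REQUIRED = {"function", "process", "category", "apikey",
            "source", "connection", "table", "column"}
OPTIONAL = {"reference", "newtable", "json", "html"}

def checked_url(params: dict) -> str:
    """Build a run URL, failing fast on missing or unrecognized parameters."""
    names = set(params)
    missing = REQUIRED - names
    if missing:
        raise ValueError(f"missing required parameters: {sorted(missing)}")
    unknown = names - REQUIRED - OPTIONAL
    if unknown:
        raise ValueError(f"unrecognized parameters: {sorted(unknown)}")
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

# Hypothetical usage: a CSV match report returned as JSON.
print(checked_url({
    "function": "match", "apikey": "use-your-own-api-key-here",
    "source": "CSV", "connection": "https://dl.interzoid.com/csv/companies.csv",
    "table": "CSV", "column": 1, "process": "matchreport",
    "category": "company", "json": "true",
}))
```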
Data Source Connection Strings
Values to use for the API source and connection parameters
Source Value | Description | Connection String Value Example |
---|---|---|
snowflake | Account/Warehouse connection | user:password@zwa55555/database/schema |
postgres | PostgreSQL, AWS RDS/Aurora Postgres, Google Cloud SQL, ElephantSQL, CockroachDB, etc. | postgres://user:password@domain/database?sslmode=disable |
mysql | MySQL, MariaDB, SkySQL, AWS RDS/Aurora MySQL, Google Cloud SQL, etc. | root:password@tcp(domain)/database |
databricks | SQL Warehouse example | token:dapi1ab2c34defabc567890123d4efa56789@dbc-a1b2345c-d6e7.cloud.databricks.com:443/sql/1.0/endpoints/a1b234c5678901d2 |
sqlserver | Microsoft, Azure, and AWS SQL Server | server={servername}.database.windows.net;user id={youruserid};password={yourpassword};port=1433;database=mysample; |
csv | URL path of CSV file | https://www.mywebaddress.com/files/myfile.csv |
tsv | URL path of TSV file | https://www.mywebaddress.com/files/myfile.tsv |
excel | URL path of Excel file | https://www.mywebaddress.com/files/myfile.xlsx |
parquet | URL path of Parquet file | https://www.mywebaddress.com/files/myfile.parquet |
Note: to process and analyze local files, use our browser-based wizard.
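Several connection string formats in the table contain characters that are also query-string syntax: `?` and `&` in Postgres URLs, `;` and `=` in SQL Server strings, and `@` and `/` in most of them. The Python sketch below uses `parse_qs` as a stand-in for a server-side query parser (an assumption; the service's actual parser is not documented here) to show how an unencoded `&` truncates a connection value while a percent-encoded one arrives intact.

```python
from urllib.parse import parse_qs, quote, urlencode

# Illustrative Postgres connection string with two options; connect_timeout
# is added here only to demonstrate the problem and is not taken from this page.
conn = "postgres://user:password@domain/database?sslmode=disable&connect_timeout=10"

# Inserted raw, the '&' inside the connection string ends the parameter early.
naive = f"source=postgres&connection={conn}&table=companies"
print(parse_qs(naive)["connection"])  # value is cut off after sslmode=disable

# Percent-encoded, the full connection string survives as a single value.
safe = urlencode({"source": "postgres", "connection": conn, "table": "companies"},
                 quote_via=quote)
print(parse_qs(safe)["connection"])
```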
Running with cURL Example
You can run the command from a Linux, Windows, or macOS command line using cURL:
Linux & Mac
$ curl 'https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company'
Windows
curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company"
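The quotes in both commands matter because `&` is special to the shell: a command separator in Windows cmd.exe and the background operator in POSIX shells, so an unquoted URL would be cut off at its first `&`. If you generate these commands from a script, a Python sketch like the following can produce them. `shlex.quote` applies POSIX single-quote rules, while the Windows form simply wraps the URL in double quotes as in the example above.

```python
import shlex

# The organization-name match report URL from the cURL examples above.
url = ("https://connect.interzoid.com/run?function=match"
       "&apikey=use-your-own-api-key-here&source=CSV"
       "&connection=https://dl.interzoid.com/csv/companies.csv"
       "&table=CSV&column=1&process=matchreport&category=company")

# Linux & Mac: shlex.quote wraps the URL in single quotes because it
# contains shell metacharacters such as '&' and '?'.
posix_cmd = f"curl {shlex.quote(url)}"

# Windows (cmd.exe): double quotes keep '&' from splitting the command.
windows_cmd = f'curl "{url}"'

print(posix_cmd)
print(windows_cmd)
```

If you place a percent-encoded URL in a Windows batch file, double each `%` (`%%`), because cmd.exe treats `%` in batch files as the start of a variable or argument reference.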
Redirecting Output
Output from these cURL commands can be redirected to a file for further processing with the greater-than symbol (>), on Linux and macOS as well as Windows.
Linux & Mac
$ curl '[HTTP query string]' > output.csv
Windows
curl "[HTTP query string]" > output.csv
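Inside a Python-based pipeline, the same "run and save" step can be done without shelling out to cURL. In this sketch, `run_to_file` and `fetch` are hypothetical helpers built on `urllib.request`; the fetcher is a parameter so the file-handling logic can be exercised without network access. Unlike plain `curl`, `urlopen` raises `urllib.error.HTTPError` on 4xx/5xx status codes instead of saving the error body to the file.

```python
import urllib.request
from typing import Callable

def fetch(url: str) -> bytes:
    """Download the response body (requires network access)."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()

def run_to_file(url: str, path: str,
                fetcher: Callable[[str], bytes] = fetch) -> int:
    """Rough equivalent of: curl "<url>" > path. Returns the bytes written."""
    body = fetcher(url)
    with open(path, "wb") as out:
        out.write(body)
    return len(body)

# Live usage (performs a real request):
# run_to_file("[HTTP query string]", "output.csv")
```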
Other File Source Examples
Additional examples that use a file as the data source for matching.
CSV Data Source Example: Matching Individual Names (Match Report)
Description:
Generates a match report of individual names from the first column of a CSV file, with similar individual names clustered together in groups. The source is specified as a URL because the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary, and the call can be run as-is by copying and pasting it into your browser's address bar. The same call can be used with your own data in CSV format.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/peoplenames.csv&table=CSV&column=1&category=individual&process=matchreport&html=true
CSV Data Source Example: Matching Street Addresses (Match Report)
Description:
Generates a match report of street addresses from the first column of a CSV file, with similar street addresses clustered together in groups. The source is specified as a URL because the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary, and the call can be run as-is by copying and pasting it into your browser's address bar. The same call can be used with your own data in CSV format.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/addresses.csv&table=CSV&column=1&category=address&process=matchreport&html=true
CSV Data Source Example: Company/Organization Name Similarity Keys (Append)
Description:
Generates a similarity key for every organization name in the first column of a CSV file (process=keysonly), for appending to your data rather than producing a grouped match report. The keys can then be used to identify matching records in other tools. The source is specified as a URL because the sample file is stored on AWS S3. Since it is a recognized sample file, no API key is necessary, and the call can be run as-is by copying and pasting it into your browser's address bar. The same call can be used with your own data in CSV format.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=keysonly&category=company&html=true
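Assuming, as the grouped match reports suggest, that similar values receive the same similarity key, records can be clustered in your own tools by grouping on that key. The Python sketch below assumes you have already parsed the `keysonly` output into (value, key) pairs; that output layout is not specified on this page, and the keys shown are placeholders, not real API output. `group_by_key` is a hypothetical helper.

```python
from collections import defaultdict

def group_by_key(rows):
    """Cluster (value, similarity_key) pairs; values sharing a key are candidate matches."""
    groups = defaultdict(list)
    for value, key in rows:
        groups[key].append(value)
    # Keep only keys shared by more than one value, i.e. likely duplicates.
    return {key: values for key, values in groups.items() if len(values) > 1}

# Hypothetical rows with placeholder keys, not real API output.
rows = [
    ("Acme Corp", "KEY-A"),
    ("ACME Corporation", "KEY-A"),
    ("Globex Inc", "KEY-B"),
]
print(group_by_key(rows))  # {'KEY-A': ['Acme Corp', 'ACME Corporation']}
```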
Connecting to Cloud SQL Data Tables
Snowflake Data Source Example: Matching Company/Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the specified column in a Snowflake database table. The similar organization names are clustered together in groups in the report. A connection is made through a native Snowflake driver using the provided Snowflake connection string. Once connected, the specified Snowflake table is traversed, generating similarity keys on the fly. After the entire table (or view) has been scanned, the match report is generated. This is a cURL example using sample Snowflake values.
API Call:
curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=snowflake&connection=username:password@account/database/schema&table=companies&column=company&category=company&process=matchreport"
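Assembling the Snowflake connection value from its parts, then encoding it as a query parameter, avoids hand-editing credentials into a long URL. This Python sketch follows the user:password@account/database/schema form used above; `snowflake_connection` and `snowflake_match_url` are hypothetical helpers. It does not escape special characters within the password itself, since how the Snowflake driver expects those to be handled is not covered on this page.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://connect.interzoid.com/run"

def snowflake_connection(user: str, password: str, account: str,
                         database: str, schema: str) -> str:
    """Assemble user:password@account/database/schema, as in the example above."""
    return f"{user}:{password}@{account}/{database}/{schema}"

def snowflake_match_url(apikey: str, connection: str, table: str, column: str,
                        category: str = "company",
                        process: str = "matchreport") -> str:
    """Build the match report call for a Snowflake table."""
    params = {"function": "match", "apikey": apikey, "source": "snowflake",
              "connection": connection, "table": table, "column": column,
              "category": category, "process": process}
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

conn = snowflake_connection("username", "password", "account", "database", "schema")
url = snowflake_match_url("use-your-own-api-key-here", conn, "companies", "company")
print(url)
```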
Snowflake Data Source Example: Company/Organization Name Similarity Keys (Generate SQL)
Description:
Generates a similarity key and an Insert SQL statement for every record in the specified column of a Snowflake database table. A connection is made through a native Snowflake driver using the provided Snowflake connection string. Once connected, the specified Snowflake table is traversed, generating a similarity key on the fly and an Insert SQL statement for each record. After the entire table (or view) has been scanned, the complete set of Insert SQL statements is available for review and can then be executed within Snowflake if desired. This is a cURL example using sample Snowflake values.
API Call:
curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=snowflake&connection=username:password@account/database/schema&table=companies&column=company&category=company&process=gensql"
Snowflake Data Source Example: Company/Organization Name Similarity Keys (Create Table & Insert SQL)
Description:
A new Snowflake table is created with the name given in the 'newtable' parameter. A connection is made through a native Snowflake driver using the provided Snowflake connection string. Once connected, the specified Snowflake table is traversed: for every record in the specified column, a similarity key is generated on the fly and inserted into the newly created table, where it can be used as desired within Snowflake. This is a cURL example using sample Snowflake values.
API Call:
curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=snowflake&connection=username:password@account/database/schema&table=companies&column=company&category=company&process=createtable&newtable=mytablename"
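The three Snowflake examples differ only in `process`, plus `newtable` for `createtable`, so a script that switches between them can enforce that pairing. This is a hypothetical Python helper, assuming (from the parameter table) that `newtable` is only meaningful when output is written to a new table, i.e. with `createtable`.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://connect.interzoid.com/run"
# process values used with Snowflake tables in the examples above
TABLE_PROCESSES = {"matchreport", "gensql", "createtable"}

def table_job_url(base: dict, process: str, newtable: Optional[str] = None) -> str:
    """Build a run URL for one of the table-based processes."""
    if process not in TABLE_PROCESSES:
        raise ValueError(f"unsupported process: {process}")
    if process == "createtable" and not newtable:
        raise ValueError("process=createtable requires a newtable name")
    params = {**base, "process": process}
    if process == "createtable":
        # newtable is only sent when a new table is being created
        params["newtable"] = newtable
    return f"{BASE_URL}?{urlencode(params, quote_via=quote)}"

base = {"function": "match", "apikey": "use-your-own-api-key-here",
        "source": "snowflake",
        "connection": "username:password@account/database/schema",
        "table": "companies", "column": "company", "category": "company"}
print(table_job_url(base, "createtable", newtable="mytablename"))
```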
Azure SQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the specified column in an Azure SQL/SQL Server database table. The similar organization names are clustered together in groups in the report. A connection is made through a native SQL Server driver using the provided SQL Server connection string. Once connected, the specified Azure SQL table is traversed, generating similarity keys on the fly. After the entire table (or view) has been scanned, the match report is generated.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=sqlserver&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
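The SQL Server connection string uses `key=value;` pairs, and its `;`, `=`, and the space in `user id` all need percent-encoding before the string goes into the `connection` parameter. This Python sketch builds the string in the order shown in the Data Source Connection Strings table; `sqlserver_connection` and the server, user, and password values are hypothetical placeholders, and values that themselves contain `;` are not handled.

```python
from urllib.parse import quote, urlencode

def sqlserver_connection(server: str, user: str, password: str,
                         database: str, port: int = 1433) -> str:
    """Build a key=value; string in the order used by the sqlserver example."""
    parts = {"server": server, "user id": user, "password": password,
             "port": port, "database": database}
    return "".join(f"{key}={value};" for key, value in parts.items())

conn = sqlserver_connection("myserver.database.windows.net", "myuser",
                            "mypassword", "mysample")
print(conn)

# ';', '=' and the space in "user id" are percent-encoded for the query string.
encoded = urlencode({"source": "sqlserver", "connection": conn}, quote_via=quote)
print(encoded)
```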
AWS RDS Data Source Example: Matching Company/Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the specified column in an AWS RDS/Aurora Postgres database table. The similar organization names are clustered together in groups in the report. A connection is made through a native Postgres driver using the provided Postgres connection string. Once connected, the specified AWS Postgres table is traversed, generating similarity keys on the fly. After the entire table (or view) has been scanned, the match report is generated.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
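Database passwords often contain characters such as `@` or `:` that would break the user:password@host part of a Postgres URL. The PostgreSQL connection URI format allows percent-encoding within its components, so this Python sketch encodes the credentials first and then encodes the whole connection string again as the `connection` query parameter. `postgres_connection`, the host, and the credentials are hypothetical; `sslmode` is left to the caller because the right value depends on your server (the table's example uses `disable`).

```python
from urllib.parse import quote, urlencode

def postgres_connection(user: str, password: str, host: str,
                        database: str, sslmode: str) -> str:
    """postgres://user:password@host/database?sslmode=..., credentials percent-encoded."""
    return (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
            f"@{host}/{database}?{urlencode({'sslmode': sslmode})}")

conn = postgres_connection("app", "p@ss:word", "db.example.com", "sales", "require")
print(conn)

# Encoding again for the query string turns each '%' into '%25'; one round of
# decoding on the server side yields the connection string above.
run_url = "https://connect.interzoid.com/run?" + urlencode(
    {"function": "match", "apikey": "use-your-own-api-key-here",
     "source": "postgres", "connection": conn, "table": "companies",
     "column": "company", "process": "matchreport", "category": "company"},
    quote_via=quote)
print(run_url)
```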
Google Cloud SQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the specified column in a Google Cloud SQL database table using Postgres (MySQL can also be used). The similar organization names are clustered together in groups in the report. A connection is made through a native Postgres driver using the provided Google Cloud SQL Postgres connection string. Once connected, the specified Google Cloud SQL table is traversed, generating similarity keys on the fly. After the entire table (or view) has been scanned, the match report is generated.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
PostgreSQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the specified column in a PostgreSQL database table. The similar organization names are clustered together in groups in the report. A connection is made through a native Postgres driver using the provided PostgreSQL connection string. Once connected, the specified PostgreSQL table is traversed, generating similarity keys on the fly. After the entire table (or view) has been scanned, the match report is generated.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
MySQL Data Source Example: Matching Company/Organization Names (Match Report)
Description:
Generates a match report of matching/inconsistent organization names from the specified column in a MySQL database table. The similar organization names are clustered together in groups in the report. A connection is made through a native MySQL driver using the provided MySQL connection string. Once connected, the specified MySQL table is traversed, generating similarity keys on the fly. After the entire table (or view) has been scanned, the match report is generated.
API Call:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=mysql&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
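The MySQL connection value follows the root:password@tcp(domain)/database pattern from the Data Source Connection Strings table. This Python sketch builds it, optionally with a host:port address inside the parentheses (the form used by common Go MySQL drivers; whether this service accepts a port there is an assumption), and then percent-encodes it into the run URL. `mysql_connection` and the host name are hypothetical.

```python
from typing import Optional
from urllib.parse import quote, urlencode

BASE_URL = "https://connect.interzoid.com/run"

def mysql_connection(user: str, password: str, host: str, database: str,
                     port: Optional[int] = None) -> str:
    """user:password@tcp(host[:port])/database, following the mysql example."""
    address = host if port is None else f"{host}:{port}"
    return f"{user}:{password}@tcp({address})/{database}"

conn = mysql_connection("root", "password", "db.example.com", "database", port=3306)
url = f"{BASE_URL}?" + urlencode(
    {"function": "match", "apikey": "use-your-own-api-key-here",
     "source": "mysql", "connection": conn, "table": "companies",
     "column": "company", "process": "matchreport", "category": "company"},
    quote_via=quote)
print(url)
```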