In addition to interactively identifying inconsistent, duplicate, or redundant data in a text file or database table with the Cloud Connect application, you can now also automate these matching jobs. They can then be scheduled, added to any business process or workflow, used to match and merge multiple datasets, or run as part of an ETL/ELT data pipeline, all with a single command.
This is done with an HTTP request whose parameters are passed in the URL's "query string". The request can then be embedded directly in any process, batch file, scheduler, or series of commands.
For example, the following match process can be tested against our demo company-name file (CSV source, no credits used). Paste the URL into your browser's address bar and press Enter. You will see CSV output with a column of company names, clustered and sorted by algorithmically generated similarity keys:
https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company&html=true
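Because each parameter is a name=value pair joined by &, the same URL can be assembled in a script. The sketch below assumes a POSIX shell; urlencode and build_match_url are hypothetical helper names, not part of any Interzoid tool. Each value is percent-encoded (a space becomes %20, & becomes %26) so that values such as connection strings cannot split the query string, assuming the service decodes percent-encoded values in the standard way. The helper is intended for ASCII input.

```sh
#!/bin/sh
# Hypothetical helper: percent-encode one query-string value (ASCII input assumed).
# Unreserved characters (A-Z a-z 0-9 . ~ _ -) are kept; everything else becomes %XX.
# Uses global variables, so call it inside $(...) as build_match_url does.
urlencode() {
  s=$1
  out=
  while [ -n "$s" ]; do
    rest=${s#?}              # the string without its first character
    c=${s%"$rest"}           # the first character
    case $c in
      [A-Za-z0-9.~_-]) out="$out$c" ;;
      *) out="$out$(printf '%%%02X' "'$c")" ;;  # character code as two hex digits
    esac
    s=$rest
  done
  printf '%s' "$out"
}

# Hypothetical helper: build a match URL from the required parameters.
# Arguments: apikey source connection table column process category
build_match_url() {
  printf 'https://connect.interzoid.com/run?function=match&apikey=%s&source=%s&connection=%s&table=%s&column=%s&process=%s&category=%s' \
    "$(urlencode "$1")" "$(urlencode "$2")" "$(urlencode "$3")" \
    "$(urlencode "$4")" "$(urlencode "$5")" "$(urlencode "$6")" \
    "$(urlencode "$7")"
}
```

For example, curl "$(build_match_url "$INTERZOID_API_KEY" CSV https://dl.interzoid.com/csv/companies.csv CSV 1 matchreport company)" would run the demo request, where INTERZOID_API_KEY is an assumed environment variable holding your API key.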
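You can also run this request from a Linux, Windows, or macOS command line with curl (also known as cURL), a command-line HTTP client that is available by default on most computers. Enclose the URL in quotes so the shell does not interpret the & characters: single quotes on Linux and macOS, double quotes on Windows. (In Windows PowerShell, type curl.exe rather than curl, because curl is an alias for Invoke-WebRequest there.) The html parameter is omitted because it only improves readability in a browser: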
Linux & Mac
$ curl 'https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company'
Windows
> curl "https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=CSV&connection=https://dl.interzoid.com/csv/companies.csv&table=CSV&column=1&process=matchreport&category=company"
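Rather than percent-encoding parameter values by hand, you can let curl do it: --get (-G) appends every --data-urlencode value to the URL as a query string, encoding the part after the first =. The sketch below wraps this in a hypothetical interzoid_match shell function. It also passes --fail, so an HTTP error status produces a non-zero exit code that a batch file or scheduler can detect. The CURL variable is an illustrative hook (not a curl feature) that allows a dry run with CURL=echo.

```sh
#!/bin/sh
# Hypothetical wrapper: run a match job, letting curl URL-encode each value.
# Arguments: apikey source connection table column process category
# Set CURL=echo to print the arguments instead of calling the service.
interzoid_match() {
  "${CURL:-curl}" --get --fail --silent --show-error \
    --data-urlencode "function=match" \
    --data-urlencode "apikey=$1" \
    --data-urlencode "source=$2" \
    --data-urlencode "connection=$3" \
    --data-urlencode "table=$4" \
    --data-urlencode "column=$5" \
    --data-urlencode "process=$6" \
    --data-urlencode "category=$7" \
    "https://connect.interzoid.com/run"
}
```

For example, interzoid_match "$INTERZOID_API_KEY" CSV https://dl.interzoid.com/csv/companies.csv CSV 1 matchreport company > output.csv would save the demo report. On Windows, curl.exe accepts the same options, but the function itself requires a POSIX shell.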
The output of these curl commands can be redirected to a file for further processing with the greater-than symbol (>) on Linux, macOS, and Windows:
Linux & Mac
$ curl '[HTTP query string]' > output.csv
Windows
> curl "[HTTP query string]" > output.csv
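When the command runs unattended from a scheduler, a plain > redirect can leave a partial or error file behind. A minimal sketch, assuming a POSIX shell with mktemp: the hypothetical save_report function downloads into a temporary file next to the destination and renames it only if curl succeeds, so downstream steps see either the complete output or no new file. The CURL variable is an illustrative hook for dry runs and testing.

```sh
#!/bin/sh
# Hypothetical helper: save_report URL OUTPUT_FILE
# Downloads URL into a temporary file and moves it to OUTPUT_FILE only on success.
save_report() {
  url=$1
  dest=$2
  tmp=$(mktemp "$dest.XXXXXX") || return 1
  # --fail makes curl exit non-zero on HTTP error statuses (400 and above).
  if "${CURL:-curl}" --fail --silent --show-error --output "$tmp" "$url"; then
    mv "$tmp" "$dest"
  else
    rc=$?            # exit status of the failed curl command
    rm -f "$tmp"
    return "$rc"
  fi
}
```

A crontab entry such as 0 2 * * * /path/to/nightly-match.sh (a hypothetical script that calls save_report) would run the job every day at 02:00.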
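The same HTTP query string can also match an entire database table of company names, as in the examples below. Replace your-specific-connection-string with the connection string for your database (see more about connection strings). Parameter values that contain spaces, such as azure sql, or reserved characters such as & should be URL-encoded (for example, azure%20sql) when the request is sent outside a browser, such as with curl.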
(Snowflake example) https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=Snowflake&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
(Azure SQL example) https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=azure sql&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
(AWS RDS example) https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=aws rds postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
(Google Cloud SQL example) https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
(Postgres example) https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=postgres&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
(MySQL example) https://connect.interzoid.com/run?function=match&apikey=use-your-own-api-key-here&source=mysql&connection=your-specific-connection-string&table=companies&column=company&process=matchreport&category=company
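For database sources the connection string usually contains credentials, so it is safer to read it from an environment variable than to hard-code it in a batch file, and it must be URL-encoded because it may contain characters such as &, =, or @. A minimal sketch, assuming a POSIX shell and curl: INTERZOID_API_KEY, INTERZOID_CONN, run_batch, and the table:column pairs are hypothetical placeholders, and each job writes its match report to a file named after its table.

```sh
#!/bin/sh
# Hypothetical batch runner: run_batch SOURCE TABLE:COLUMN [TABLE:COLUMN ...]
# Runs one company matchreport per table:column pair.
# Credentials come from the environment (INTERZOID_API_KEY, INTERZOID_CONN);
# ${VAR:?msg} aborts the script if a variable is unset or empty.
# Set CURL=echo to print the curl arguments instead of calling the service.
run_batch() {
  source_type=$1
  shift
  for pair in "$@"; do
    table=${pair%%:*}      # text before the first colon
    column=${pair#*:}      # text after the first colon
    "${CURL:-curl}" --get --fail --silent --show-error \
      --data-urlencode "function=match" \
      --data-urlencode "apikey=${INTERZOID_API_KEY:?set INTERZOID_API_KEY}" \
      --data-urlencode "source=$source_type" \
      --data-urlencode "connection=${INTERZOID_CONN:?set INTERZOID_CONN}" \
      --data-urlencode "table=$table" \
      --data-urlencode "column=$column" \
      --data-urlencode "process=matchreport" \
      --data-urlencode "category=company" \
      --output "$table-matches.csv" \
      "https://connect.interzoid.com/run" || return 1
  done
}
```

For example, run_batch postgres companies:company vendors:vendor_name would produce companies-matches.csv and vendors-matches.csv, stopping at the first failed request.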
Parameters specific to Data Matching to be set as part of the HTTP query string:
function Required. Use 'match' for data matching.
process Required. The process defines the report or action performed on the dataset. The available process
types are 'matchreport', 'keysonly', 'gensql', and 'createtable'. 'matchreport' generates a report of all
clusters of similar data that were found. 'keysonly' outputs a generated similarity key for every record
in the dataset. 'gensql' is similar, but it generates SQL INSERT statements that store the similarity keys
in a database. 'createtable' creates a new table in the source database containing the similarity key for
each record in the source table, so the keys can be used in additional queries.
category Required. Indicates which set of machine learning and matching algorithms to use, based on the
type of data content. Use 'company', 'individual', or 'address'.
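Because process and category accept only the values listed above, a script can reject a typo before sending a request. A minimal sketch in POSIX shell; check_match_params is a hypothetical helper, and its accepted values are copied from the lists above, so it would need updating if the service adds new options.

```sh
#!/bin/sh
# Hypothetical helper: check_match_params PROCESS CATEGORY
# Returns 0 if both values are among those documented above;
# otherwise prints an error to stderr and returns 1.
check_match_params() {
  case $1 in
    matchreport|keysonly|gensql|createtable) ;;
    *) echo "unknown process: $1" >&2; return 1 ;;
  esac
  case $2 in
    company|individual|address) ;;
    *) echo "unknown category: $2" >&2; return 1 ;;
  esac
}
```

In a batch file, a line such as check_match_params "$process" "$category" || exit 1 could precede the curl call.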
Additional parameters that can be set as part of the HTTP query string:
apikey Required. Log in to www.interzoid.com to obtain your API key, which is used to track and manage usage.
If you do not yet have one, register at www.interzoid.com/register-api-account.
source Required. Source of the data, such as 'CSV', 'Snowflake', or 'Postgres'.
See the source list on the interactive page for the complete list.
connection Required. Connection string used to access the database or, for a CSV or TSV file,
the full URL of the file's location.
table Required. Table name to access the source data. Use "CSV" or "TSV" for delimited text files.
column Required. Column name within the table to access the source data. For CSV or TSV files, this is a column
number, starting at 1 for the leftmost column.
reference An additional column from the source table to display in the output results, such as a primary key.
newtable The name of the new table if the output results are written to a new table.
json Set to true (&json=true) to display the output formatted as JSON.
html Set to true (&html=true) to insert line breaks into the output results for better readability in
a browser when run from the address bar.
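Optional parameters are simply appended to the query string when they are needed. A minimal sketch, assuming a POSIX shell: the hypothetical add_optional_params function takes a base URL such as the ones above and appends reference, newtable, and json only when the (equally hypothetical) shell variables REFERENCE, NEWTABLE, and JSON_OUTPUT are set. The values are assumed to be plain column or table names that need no URL encoding.

```sh
#!/bin/sh
# Hypothetical helper: add_optional_params BASE_URL
# Appends optional parameters from REFERENCE, NEWTABLE and JSON_OUTPUT,
# skipping any that are unset or empty. Values are NOT URL-encoded here.
add_optional_params() {
  url=$1
  [ -n "$REFERENCE" ] && url="$url&reference=$REFERENCE"
  [ -n "$NEWTABLE" ] && url="$url&newtable=$NEWTABLE"
  [ "$JSON_OUTPUT" = true ] && url="$url&json=true"
  printf '%s\n' "$url"
}
```

Reading the options from variables keeps one script reusable: the same job can produce a CSV report by default and JSON when JSON_OUTPUT=true is set by the calling scheduler.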
Also see our quick and easy Data Matching Tutorial.
Questions? Contact support@interzoid.com - we are happy to help.
All content (c) 2018-2023 Interzoid Incorporated.
201 Spear Street, Suite 1100, San Francisco, CA 94105-6164