Export Data With Data Cloud Ship
NinjaCat’s Data Cloud Ship provides two methods for accessing data collected through NinjaCat's Data Cloud:
BI Connect shares the database directly with you, fully replicated and synchronized. Whether you use a reader account or your own pre-existing Snowflake account, you get full, read-only access to a database that is always up to date.
Data Export sends data to you on an interval that you control. Unlike BI Connect, NinjaCat does not replicate or manage the data once it is delivered; you are responsible for managing and monitoring your data.
Accessing Data Cloud Ship
1. After logging in to NinjaCat and creating at least one dataset, click on the "Ship" option under "Data Cloud" in the main navigation
BI Connect
Using BI Connect creates a new Snowflake reader account and stores all data streams for external use by a NinjaCat client.
1. After Accessing Data Cloud Ship click on BI Connect in the top left under "Ship Data"
2. Click the Setup BI Connect button near the center
NinjaNote: After the button is clicked, the BI Connect page displays a waiting state. Behind the scenes, NinjaCat provisions storage for the new database, creates the database, and creates a database access account. An estimated wait time is displayed on the page.
3. Once setup is complete the basic Connection Details will display on the page
4. Click the link labeled "Set Password" near the center to create a password for the new Snowflake user account
5. Enter the desired password into the fields; once the password is set you will be granted access to the Snowflake web interface
6. After gaining access to the Snowflake Reader Account, the Connection Details on the NinjaCat BI Connect page will be more detailed. The full Connection Details are:
- Account - Account Name/ID
- Security Admin - User Name
- Warehouse - Snowflake virtual warehouse (compute) name
- Database - Snowflake database name
- Schema - Snowflake database schema name
Also on the Connection Details page is a link to access the Snowflake web interface and a link to reset the Security Admin user password.
NinjaNote: The user created by default is a Security Admin. This type of user possesses the permissions necessary for account management, so it is not recommended to use the Security Admin user to access data. Best practice is to create dedicated users for different purposes. Information on Snowflake user roles is linked here
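For reference, the Connection Details above map directly to the parameters most Snowflake client libraries expect. The following is a minimal sketch using the snowflake-connector-python package; every value shown is a placeholder to be replaced with the Account, user, Warehouse, Database, and Schema from your BI Connect page (ideally using a dedicated read-only user rather than the Security Admin).
import snowflake.connector

# Minimal connectivity check against the BI Connect database.
# All values below are placeholders taken from the Connection Details page.
conn = snowflake.connector.connect(
    account="YOUR_ACCOUNT_ID",     # Account
    user="YOUR_USER",              # ideally a dedicated read-only user
    password="YOUR_PASSWORD",
    warehouse="YOUR_WAREHOUSE",    # Warehouse
    database="YOUR_DATABASE",      # Database
    schema="YOUR_SCHEMA",          # Schema
)
try:
    cur = conn.cursor()
    cur.execute("SELECT CURRENT_VERSION()")  # simple read-only query
    print(cur.fetchone())
finally:
    conn.close()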
Creating Amazon & Google Credentials
AWS
Data Cloud Ship can be configured to send CSV files of your datasets to an Amazon S3 bucket. Before you can set up the export in NinjaCat the following steps must be taken:
Create the AWS Bucket - The S3 bucket must be created in AWS before configuration. Any managed folders may be pre-created if desired, but it is not required.
Create an IAM User - Go to the IAM section of the AWS Console and create a new user that does not have console login permissions but does have a policy attached. Specify permissions by attaching one of the following:
1. The AWS managed policy AmazonS3FullAccess (not recommended), or a custom policy containing, at minimum:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::target-bucket-name",
        "arn:aws:s3:::target-bucket-name/*"
      ]
    }
  ]
}
2. Review and create the policy
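In the policy above, replace target-bucket-name with the name of the bucket you created. Both resource entries are needed: the bucket ARN covers s3:ListBucket, while the bucket/* ARN covers s3:PutObject and s3:GetObject on the objects inside it.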
Create an Access Key
1. Once the user has been created, and the policy attached, go back to the main user screen and select the user
2. Then go to the "Security Credentials" tab and create a new access key
3. Select the "other" use case, and then set a name for the access key
Access Credentials
Once the access key has been created, you can view the Access Key ID and Secret Access Key to add to your NinjaCat Data Cloud Ship credential record.
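Before entering the key in NinjaCat, you may want to confirm it can reach the bucket. The following is a minimal sketch using boto3; the region, bucket name (target-bucket-name, matching the policy above), and key values are placeholders.
import boto3

# Placeholder credentials and bucket; substitute your own values.
s3 = boto3.client(
    "s3",
    region_name="us-east-1",
    aws_access_key_id="YOUR_ACCESS_KEY_ID",
    aws_secret_access_key="YOUR_SECRET_ACCESS_KEY",
)

# The policy grants s3:PutObject, s3:GetObject, and s3:ListBucket,
# so all three calls below should succeed.
s3.put_object(Bucket="target-bucket-name", Key="ninjacat-test.txt", Body=b"ok")
print(s3.list_objects_v2(Bucket="target-bucket-name").get("KeyCount"))
print(s3.get_object(Bucket="target-bucket-name", Key="ninjacat-test.txt")["Body"].read())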
Google Cloud Storage
Data Cloud Ship can be configured to send CSV files of your datasets to a Google Cloud Storage bucket. Before you can set up the export in NinjaCat the following steps must be taken:
Create the GCS Bucket - The cloud storage bucket must be created in GCP before configuration. Any managed folders may be pre-created if desired, but it is not required.
Create the Service Account - Go to the IAM section of the Google Cloud Console and create a new Service Account with (minimally) one of the following roles against the bucket:
* storage.objectCreator, or
* storage.admin
Then create a new access key for the service account in JSON format and download it.
Once created, you can copy the contents of the JSON access key to add to your NinjaCat Data Cloud Ship credential record.
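Before entering the key in NinjaCat, you may want to confirm the service account can write to the bucket. The following is a minimal sketch using the google-cloud-storage package; the key filename and bucket name are placeholders.
from google.cloud import storage

# Placeholder key path and bucket name; substitute your own values.
client = storage.Client.from_service_account_json("service-account-key.json")
bucket = client.bucket("your-gcs-bucket")

# storage.objectCreator is enough for an upload like this one; listing or
# overwriting existing objects requires a broader role such as storage.admin.
bucket.blob("ninjacat-test.txt").upload_from_string("ok")
print("upload succeeded")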
Google BigQuery
Data Cloud Ship can be configured to transfer your designated datasets to tables in Google BigQuery. The CSV exports are staged in a Google Cloud Storage bucket and then merged into tables in BigQuery. Before you can set up the export in NinjaCat the following steps must be taken:
Create the Google Cloud Storage Bucket - The cloud storage bucket must be created in the Google Cloud Platform before configuration. Any managed folders may be pre-created if desired, but it is not required.
Create the Service Account
Go to the IAM -> Service Accounts section of the Google Cloud Console and create a new Service Account with (minimally) the following roles against the bucket:
* BigQuery Data Editor
* BigQuery Job User
* Storage Admin (`storage.admin`) - or, minimally, Storage Object User
NinjaNote: To be completely clear: the minimum storage role assigned to the service account being created to access this bucket must be "Storage Object User"
Then create a new access key for the service account in JSON format and download it.
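Before entering the key in NinjaCat, you may want to confirm the service account can reach both the staging bucket and BigQuery. The following is a minimal sketch using the google-cloud-storage and google-cloud-bigquery packages; the key filename, bucket name, and dataset name are placeholders.
from google.cloud import bigquery, storage

KEY_PATH = "service-account-key.json"  # placeholder path to the downloaded key

# Storage role check: the staging bucket must be writable.
gcs = storage.Client.from_service_account_json(KEY_PATH)
gcs.bucket("your-staging-bucket").blob("ninjacat-test.txt").upload_from_string("ok")

# BigQuery role check: the target dataset must be visible to the service account.
bq = bigquery.Client.from_service_account_json(KEY_PATH)
print(bq.get_dataset("your_bigquery_dataset").dataset_id)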
Add To Data Cloud Ship Credential Record
1. Access Data Cloud Ship
2. Click on "Credentials" under "Ship Data"
3. Click on the button labeled "Add Credential" near the top right
4. Select the desired storage destination (AWS or Google Cloud Platform)
5. Click the button labeled "Select and Continue"
6. Complete the appropriate fields to give the credential record a recognizable label
7. Complete the appropriate fields to input the credential information into the record (Access Key ID and Secret Key for AWS, or the Service Account JSON for Google Cloud Platform)
8. Click the button labeled "Save" near the bottom right
Create Data Cloud Ship Export
AWS
Data Cloud Ship can be configured to send CSV files of your datasets to an Amazon S3 bucket.
After creating AWS credentials, adding them to a credential record, and accessing Data Cloud Ship:
1. Click on the button labeled "New Data Export" in the top right
2. Click on "Amazon S3" then the "Select and Continue" button in the bottom right
3. Fill in the fields on the "Data" tab to customize the details of the data export
- Choose Data
- Dataset - Select the desired dataset from the drop-down menu
- Columns - Select to export all columns or select only the desired columns
- Surrogate Key - Check the box to add a "hash_key" column to the export with a calculated surrogate key. When enabled, if any duplicate keys are found, the export will fail. Changes to the export will require a full refresh.
- Filter Data
- Accounts - Select to export the data for all advertisers/accounts or select a single or combination of desired advertiser(s)/account(s)
- Date Range - Select not to restrict the data by date, or restrict the data in the export to a date range framed by a reference point using these two fields:
- Export - Select a time window of a number of days, weeks, or months, or "Since Last Export"
- of data based on - Day, Metric Date, Executed At, Created At, Updated At, Expired At - NinjaNote: For a conventional, incremental export we suggest the "Export" field be set to "Since Last Export" and the "of data based on" field be set to "_Executed at_". This constitutes a sync statement of "Export the data collected SINCE LAST EXPORT, based on the date the last export was EXECUTED AT."
4. Fill in the fields on the "Destination" tab to designate the desired Amazon S3 bucket to receive the export
- Choose Credentials
- Credentials - Select the credentials for the desired export destination from the drop-down menu
- Configure Destination
- AWS Region
- S3 Bucket Name
- S3 Bucket Path (optional)
- Filename Prefix (optional)
- Enable gzip compression
- Export Schedule
- Recurrence cron (an example expression follows these steps)
- Timezone
5. Click the "Save" button in the bottom right
Google Cloud Storage
Data Cloud Ship can be configured to send CSV files of your datasets to a Google Cloud Storage bucket.
After creating Google Cloud Storage credentials, adding them to a credential record, and accessing Data Cloud Ship:
1. Click on the button labeled "New Data Export" in the top right
2. Click on "Google Cloud Storage" then the "Select and Continue" button in the bottom right
3. Fill in the fields on the "Data" tab to customize the details of the data export
- Choose Data
- Dataset - Select the desired dataset from the drop-down menu
- Columns - Select to export all columns or select only the desired columns
- Surrogate Key - Check the box to add a "hash_key" column to the export with a calculated surrogate key. When enabled, if any duplicate keys are found, the export will fail. Changes to the export will require a full refresh.
- Filter Data
- Accounts - Select to export the data for all advertisers/accounts or select a single or combination of desired advertiser(s)/account(s)
- Date Range - Select not to restrict the data by date, or restrict the data in the export to a date range framed by a reference point using these two fields:
- Export - Select a time window of a number of days, weeks, or months, or "Since Last Export"
- of data based on - Day, Metric Date, Executed At, Created At, Updated At, Expired At - NinjaNote: For a conventional, incremental export we suggest the "Export" field be set to "Since Last Export" and the "of data based on" field be set to "_Executed at_". This constitutes a sync statement of "Export the data collected SINCE LAST EXPORT, based on the date the last export was EXECUTED AT."
4. Fill in the fields on the "Destination" tab to designate the desired GCS bucket to receive the export
- Choose Credentials
- Credentials - Select the credentials for the desired export destination from the drop-down menu
- Configure Destination
- GCP Bucket Name
- GCP Bucket Path (optional)
- Filename Prefix (optional)
- Enable gzip compression
- Export Schedule
- Recurrence cron
- Timezone
5. Click the "Save" button in the bottom right
Google BigQuery
Data Cloud Ship can be configured to transfer your designated datasets to tables in Google BigQuery. The CSV exports are staged in a Google Cloud Storage bucket and then merged into tables in BigQuery.
NinjaNote: Two things to know about the export to Big Query:
- Because the data is merged, the generation of a Surrogate Key is required. A Surrogate Key is a hash: a mathematical calculation based on specific columns of the data that prevents duplicate data from being merged. Any change to the export - whether to the Surrogate Key or to the exported columns (or their headers) - will result in the export FAILING on the next run. Changes will require you to rename the old table (or delete it) and run a full export of all rows to recreate and resync the table.
- In the BigQuery destination a temp_xxx table is created to stage the loaded data. Those tables are set to expire in 2 hours, but you may see them.
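To illustrate why those changes break the merge: a surrogate key is a deterministic hash over the selected columns, so adding, removing, or renaming a column produces different keys for the same logical rows. The sketch below is conceptual only; NinjaCat's actual hashing algorithm and column handling may differ.
import hashlib

# Conceptual surrogate key: a deterministic hash over selected key columns.
# NinjaCat's actual algorithm may differ.
def surrogate_key(row: dict, key_columns: list) -> str:
    joined = "|".join(str(row[column]) for column in key_columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()

row = {"metric_date": "2024-01-01", "campaign_id": "123", "clicks": 42}
print(surrogate_key(row, ["metric_date", "campaign_id"]))
# Changing the key columns (or their names or order) yields a different hash
# for the same row, which is why such changes require a full refresh.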
After creating Google BigQuery credentials, adding them to a credential record, and accessing Data Cloud Ship:
1. Click on the button labeled "New Data Export" in the top right
2. Click on "Google BigQuery" then the "Select and Continue" button in the bottom right
3. Fill in the fields on the "Data" tab to customize the details of the data export - NinjaNote: BigQuery differs from Amazon S3 and Google Cloud Storage in that when a new BigQuery export is created the Columns setting defaults to "Selected Columns" instead of "All Columns", the Date Range setting defaults to "Since Last Export" and "_Executed at_", and the Sync Mode defaults to "Upsert Mode". If "All Columns" is selected and the source data changes, the mismatch will cause the merge into BigQuery to fail. Selecting your columns makes you more aware of changes to the organization of your dataset, so when your data structure changes you can prepare for a fresh export to BigQuery.
- Choose Data
- Dataset - Select the desired dataset from the drop-down menu
- Columns - Select to export all columns or select only the desired columns
- Surrogate Key - Check the box to add a "hash_key" column to the export with a calculated surrogate key. When enabled, if any duplicate keys are found, the export will fail. Changes to the export will require a full refresh.
- Filter Data
- Accounts - Select to export the data for all advertisers/accounts or select a single or combination of desired advertiser(s)/account(s)
- Date Range - Select not to restrict the data by date, or restrict the data in the export to a date range framed by a reference point using these two fields:
- Export - Select a time window of a number of days, weeks, or months, or "Since Last Export"
- of data based on - Day, Metric Date, Executed At, Created At, Updated At, Expired At - NinjaNote: For a conventional, incremental export we suggest the "Export" field be set to "Since Last Export" and the "of data based on" field be set to "_Executed at_". This constitutes a sync statement of "Export the data collected SINCE LAST EXPORT, based on the date the last export was EXECUTED AT."
- Sync Mode
- Use the drop-down to select between:
- Upsert Mode - Exported rows are updated or inserted in the destination based on the surrogate key
- Replace Mode - All rows for the selected date range are first deleted from the destination and then the exported rows are inserted - NinjaNote: Select "Upsert Mode" for an incremental export. Select "Replace Mode" if you export data from a provider that restates data for older entries, adding more detailed data for already exported rows. (A conceptual sketch of the upsert merge pattern follows these steps.)
4. Fill in the fields on the "Destination" tab to designate the desired GCP bucket to receive the export and BigQuery details
- Choose Credentials
- Credentials - Select the credentials for the desired export destination from the drop-down menu
- Configure Destination
- GCP Bucket Name
- GCP Bucket Path (optional)
- Filename Prefix (optional)
- BigQuery Dataset Name - The dataset should exist in the same region and project as the service account credentials
- BigQuery Table Name - Table will be created on initial export. Configuration changes may require a delete and full refresh.
- Enable gzip compression
- Export Schedule
- Recurrence cron
- Timezone
5. Click the "Save" button in the bottom right