Structured Data
Canopy’s advanced processing prepares structured data to be easily added to the entity database.
During processing, databases are restored and database tables are presented to the user in a mappable form. Reviewers can select a header row, import from a selected row, and view the total number of rows in each document. Clicking the “Map” button initiates mapping to entities. The following database file types are supported:
Extensions | File type |
---|---|
.accdb, .mdb | Access DB |
.bak, .mdf | MS SQL |
Binary file formats that contain structured data include the following:
Extensions | File type |
---|---|
.sas7bdat | A database storage file created by Statistical Analysis System (SAS) software. It contains a binary encoded dataset used for advanced analytics, business intelligence, data management, predictive analytics, and more. |
.dcm | Digital Imaging and Communications in Medicine (DICOM). See more information on DICOM files here. |
During processing, Canopy attempts to detect delimited structured data and automatically create a modified, mappable version of the file. Specifically, processing searches for comma (,), tab (\t), pipe (|), colon (:), and semicolon (;) delimiters in any file whose MIME/media type is “text.”
This conversion speeds up the review process because the user can simply click the modified version of the file to map it.
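To make the idea of delimiter detection concrete, here is a minimal Python sketch. It is not Canopy's implementation, which is not described in this document. The function name `detect_delimiter`, the 20-line sample size, and the "same field count on every line" heuristic are all assumptions chosen for illustration; only the list of candidate delimiters comes from the text above.

```python
import csv

# Candidate delimiters named in the documentation:
# comma, tab, pipe, colon, and semicolon.
CANDIDATES = [",", "\t", "|", ":", ";"]


def detect_delimiter(text, candidates=CANDIDATES, sample_lines=20):
    """Guess the delimiter of a text sample (hypothetical heuristic).

    A candidate qualifies if it splits every sampled line into the same
    number of fields and that number is greater than one. Among qualifying
    candidates, the one producing the most fields wins. Returns None when
    no candidate qualifies.
    """
    # Sample the first non-blank lines of the text.
    lines = [ln for ln in text.splitlines() if ln.strip()][:sample_lines]
    if len(lines) < 2:
        return None

    best, best_fields = None, 1
    for delim in candidates:
        rows = list(csv.reader(lines, delimiter=delim))
        widths = {len(row) for row in rows}
        # Consistent width across all sampled lines suggests a real table.
        if len(widths) == 1:
            width = widths.pop()
            if width > best_fields:
                best, best_fields = delim, width
    return best


if __name__ == "__main__":
    sample = (
        "name|email|id\n"
        "Ada|ada@example.com|123\n"
        "Bob|bob@example.com|456\n"
    )
    print(repr(detect_delimiter(sample)))
```

Requiring a consistent field count helps avoid false positives such as a colon inside a time value like 10:30, which would otherwise make a comma-separated file look colon-delimited. A production detector would also need to handle quoted fields that span lines, which this sketch does not.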
Although delimited text can appear in a virtually unlimited range of file types, we test against the following:
Extensions | File type |
---|---|
.csv | CSV (Comma-Separated Values): CSV files use commas as delimiters to separate data fields. These files are widely used for storing and exchanging structured data. |
.tsv, .tab | TSV / TAB (Tab-Separated Values): TSV files are similar to CSV files but use tabs instead of commas to separate values. |
.psv | PSV (Pipe-Separated Values): PSV files use the pipe character (\|) as the delimiter to separate data fields. These files are less common than CSV and TSV, but are still used in some applications. |
.txt | TXT (Text files): Plain text files may contain any type of delimiter to separate data fields. Canopy can detect delimiters included in these files. |
.dat | DAT (DAT files): DAT files are another form of CSV-type file. |