1.58
We are centralizing export activities for a more consistent and secure user experience. The improvements include:
- Changing the UI location of export management
- Including activity exports on the export management page
- Adding a column containing the compressed export size
- Enhancing the date range filter
- Ensuring consistent secure actions for all background export tasks
- Adding activity history to track export downloads
- Creating a standard file naming convention for exports
We are creating a single location within Project → Settings → Exports to find and manage all exports on a project. The Exports page link from the “Data” tab in the nav bar has been removed.
When viewing the Activity History, users can press Export and view the Export History from the Export modal.
In line with the goal of centralizing Export Management, the Exported Files tab has been moved to a tab on the Project → Settings → Exports page.
The Export Management data grid now supports Canopy’s standard date filter.
DBR-8185 Save and Display Compressed Export Size (Also DBR-8186, DBR-8187, DBR-8188, DBR-8189, DBR-8190)
We will now calculate and display the compressed export size after we create the export. This feature allows users to gauge how much local storage they will need and how long it might take to download the export. We will support this feature for all exports in Export Management, including document, metadata, entity, and activity log exports.
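As a rough illustration of how a compressed size helps with planning, the sketch below formats a byte count for display and estimates download time. The function names and the connection speed are assumptions made for this example; they are not part of Canopy.

```python
def format_size(num_bytes: int) -> str:
    """Render a byte count in binary units (1 KB = 1024 bytes in this sketch)."""
    size = float(num_bytes)
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if size < 1024 or unit == "TB":
            return f"{size:.1f} {unit}"
        size /= 1024
    return f"{size:.1f} TB"  # not reached; keeps the return type explicit


def estimated_download_seconds(num_bytes: int, megabits_per_second: float) -> float:
    """Estimate transfer time, ignoring protocol overhead and latency."""
    bits = num_bytes * 8
    return bits / (megabits_per_second * 1_000_000)


if __name__ == "__main__":
    compressed_size = 125_000_000  # hypothetical compressed export size in bytes
    print(format_size(compressed_size))
    print(estimated_download_seconds(compressed_size, 100))
```

Actual download time depends on network conditions, so a figure like this is only a lower bound.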
With Export Management, you have full visibility into the Create Export Link and Download Export From Link user actions, all logged in the activity history.
You can filter on the “Create Export Link” activity to see its activity history.
You can filter on the “Downloaded Export From Link” activity to see its activity history, including the IP address of the browser that downloaded the export.
All exports from the Export Management interface will be named using the following standard naming convention:
{{name}}_{{type}}_{{export_creation_datetime}}.zip
Where:
- The filename is converted to lowercase, except for the ‘T’ in the time string.
- export_creation_datetime = yyyy-MM-dd’T’HH-mm-ss
- The timezone for the datetime is the same timezone used to display the export creation datetime on the Export Management page.
Example:
test_activities_2000-10-31T01-30-00.zip
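A minimal sketch of how a name following this convention could be assembled, assuming the stated yyyy-MM-dd’T’HH-mm-ss pattern. The function `export_filename` is hypothetical. It expects a datetime already converted to the timezone shown on the Export Management page, and it does not sanitize spaces or other special characters in the name.

```python
from datetime import datetime


def export_filename(name: str, export_type: str, created: datetime) -> str:
    """Build {{name}}_{{type}}_{{export_creation_datetime}}.zip."""
    # yyyy-MM-dd'T'HH-mm-ss expressed in strftime notation.
    stamp = created.strftime("%Y-%m-%dT%H-%M-%S")
    # Only the name and type are lowercased, so the 'T' separator stays uppercase.
    return f"{name.lower()}_{export_type.lower()}_{stamp}.zip"


if __name__ == "__main__":
    print(export_filename("Test", "Activities", datetime(2000, 10, 31, 1, 30, 0)))
```

Using hyphens in the time portion keeps the name valid on file systems that reject colons.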
Canopy now supports the Moving Picture Experts Group’s (MPEG) High Efficiency Image File Format (HEIF), a container format for storing individual digital images and image sequences. The HEIF standard format covers multimedia files that can also include other media streams, such as timed text, audio, and video. Apple Inc.’s standard image format is HEIC.
HEIF/HEIC files are containers that hold one or more images. Canopy extracts the images from a container file and converts them into JPG for viewing, classification, and OCR.
MIME Type
During processing, we will identify these files by MIME type only, ignoring the extension:
- image/heic
- image/heif
- image/heic-sequence
- image/heif-sequence
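The MIME-only check above can be sketched as a simple set lookup. This assumes the MIME type has already been detected from the file contents by some other component. The helper names are illustrative, not Canopy's API.

```python
HEIF_MIME_TYPES = frozenset(
    {
        "image/heic",
        "image/heif",
        "image/heic-sequence",
        "image/heif-sequence",
    }
)


def is_heif(mime_type: str) -> bool:
    """Return True when the detected MIME type is one of the four HEIF types."""
    return mime_type.strip().lower() in HEIF_MIME_TYPES


def is_heif_sequence(mime_type: str) -> bool:
    """Return True for the two sequence variants."""
    return is_heif(mime_type) and mime_type.strip().lower().endswith("-sequence")


if __name__ == "__main__":
    # The file extension is never consulted, only the detected MIME type.
    print(is_heif("image/heic"), is_heif_sequence("image/heif-sequence"))
```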
Metadata
The following metadata is collected:
Common Name | Meta Field | Elastic Field | Supported in UI |
---|---|---|---|
File Name | FileName | short_name | ✔️ |
Mime Type | MIMEType | type | ✔️ |
Extension | FileTypeExtension | extension | ✔️ |
Path (Where) | SourceFile | meta.full_name | ✔️ |
File Size | FileSize | size | ✔️ |
Created Date | CreateDate | meta.metadata_created_datetime | ✔️ |
Modified Date | FileModifyDate | meta.metadata_modified_datetime | ✔️ |
Dimensions | ImageSize | dimension | ✔️ |
Latitude | GPSLatitude | meta.image_location (example: {’lat’: 111, ’lon’: 111}) | |
Longitude | GPSLongitude | meta.image_location (example: {’lat’: 111, ’lon’: 111}) | |
Classification
Individual images are classified per Canopy’s normal classification methods.
Document View Filter Panel
The HEIF extensions are supported both under the Images category filter and as their own extensions.
- Include in Filter → File Type → Images
- Include in Filter → File Type → Videos
Image Sequences
As of this release, we only process the first image in a HEIC file sequence:
- image/heic-sequence
- image/heif-sequence
These sequences are likely “live” pictures or movies. At this time, you may export them and review them externally to the application.
Look to future releases for additional support for image sequences.
PII detection is being refactored in order to break up the pipeline codebase into smaller, more manageable parts.
PII elements have been normalized, on the backend, to improve speed and accuracy of matching fields while propagating and suggesting entities. The normalization will increase the success of propagation and suggestions, but is otherwise imperceptible to the user.
- Date fields (“dateofbirth”, “dateofdeath”, “patient_dates”)
  - Normalized to %Y-%m-%d format.
- Address fields (“address”, “militaryaddress”)
  - Normalized by removing \n and extra spaces, and by stripping single and double quotes.
- Name
  - Normalized by ignoring all symbols except period and comma, keeping only alphabetic characters, and removing extra spaces, newlines, and capitalization of the first letter.
- Other fields
  - Normalized by ignoring all symbols, keeping only letters and numbers, ignoring case, and removing extra spaces.
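The normalization rules above could be sketched roughly as follows. This is one illustrative reading of the rules, not Canopy's implementation. The accepted input date formats, the "name" field key, the ASCII-only letter matching, and the choice to lowercase names are all assumptions made for the example.

```python
import re
from datetime import datetime

DATE_FIELDS = {"dateofbirth", "dateofdeath", "patient_dates"}
ADDRESS_FIELDS = {"address", "militaryaddress"}
# Assumption: these candidate input formats are illustrative only.
ASSUMED_INPUT_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")


def collapse_whitespace(text: str) -> str:
    # split() with no arguments splits on any whitespace run, newlines included.
    return " ".join(text.split())


def normalize_date(value: str) -> str:
    cleaned = value.strip()
    for fmt in ASSUMED_INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return cleaned  # unparseable values are left unchanged in this sketch


def normalize_address(value: str) -> str:
    # Remove newlines and extra spaces, then strip surrounding quotes.
    return collapse_whitespace(value).strip("'\"").strip()


def normalize_name(value: str) -> str:
    # Keep ASCII letters, periods, and commas; drop every other symbol.
    letters_only = re.sub(r"[^A-Za-z.,\s]", "", value)
    return collapse_whitespace(letters_only).lower()


def normalize_other(value: str) -> str:
    # Keep ASCII letters and digits only, and ignore case.
    alphanumeric = re.sub(r"[^A-Za-z0-9\s]", "", value)
    return collapse_whitespace(alphanumeric).lower()


def normalize(field: str, value: str) -> str:
    if field in DATE_FIELDS:
        return normalize_date(value)
    if field in ADDRESS_FIELDS:
        return normalize_address(value)
    if field == "name":  # assumed field key
        return normalize_name(value)
    return normalize_other(value)
```

Because both sides of a comparison pass through the same function, two differently formatted values for the same person can match during propagation.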
Automatically mapping tables with a large number of columns slows down the rendering of an Excel file or a mappable table. For tables with more than 50 columns, the system will first render the table and then present a button that lets the user choose to auto-map columns.
This change improves the performance of rendering the document view.
We have improved our HTML sanitization to prevent XSS attacks caused by HTML in processed files.
The previous technique changed unsafe characters to safer representations; for instance, & would be converted to &amp;, < would be converted to &lt;, etc. The new technique removes unsafe data while preserving readable content. For instance, a comment containing the phrase Sticks & Stones will now be saved and displayed as Sticks & Stones, rather than Sticks &amp; Stones.
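The difference between the two techniques can be sketched with Python's standard library. This is only an illustration under simplifying assumptions: `escape_everything` mimics entity escaping, and `strip_unsafe` is a hypothetical helper that drops all tags plus script and style bodies while keeping text. A production sanitizer would normally rely on a vetted library and an allow-list.

```python
import html
from html.parser import HTMLParser


class _TextKeeper(HTMLParser):
    """Keeps readable text; drops tags and the bodies of <script>/<style>."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def escape_everything(raw: str) -> str:
    """Earlier style: replace unsafe characters with HTML entities."""
    return html.escape(raw)


def strip_unsafe(raw: str) -> str:
    """Newer style (sketch): remove markup, keep the readable content."""
    parser = _TextKeeper()
    parser.feed(raw)
    parser.close()
    return parser.text()


if __name__ == "__main__":
    print(escape_everything("Sticks & Stones"))
    print(strip_unsafe("Sticks & Stones<script>alert(1)</script>"))
```

The returned value is plain text, so the display layer still has to render it as text rather than as markup.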
User Interface Prior to Release 1.58.0
User Interface On and After Release 1.58.0
We updated processing to first process Excel files under our standard processing and, if they fail, automatically retry processing them according to DBR-7943, released in 1.57.0.
We created a new document view smart filter category for video files. The impact assessment report will also include a new category called “Videos.”
This category includes the following video extensions:
Ext | Description |
---|---|
‘.avi’ | Audio Video Interleave (AVI) |
‘.mov’ | QuickTime File Format (MOV) |
‘.wmv’ | Windows Media Video (WMV) |
‘.mkv’ | Matroska Multimedia Container (MKV) |
‘.flv’ | Flash Video (FLV) |
‘.mpeg’, ‘.mpg’, ‘.mpe’ | Moving Picture Experts Group (MPEG) |
‘.mp4’, ‘.m4v’ | MPEG-4 Part 14 (MP4) |
‘.3gp’, ‘.3g2’ | Third Generation Partnership Project (3GP) |
‘.heic’, ‘.heif’ | High Efficiency Image File Format (HEIF) sequence files |
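As an illustration, an extension-based check for the Videos category could look like the sketch below. The set mirrors the table above. The helper `is_video_extension` is hypothetical, and it looks only at the extension, so it cannot tell a HEIF sequence from a single HEIF image.

```python
from pathlib import PurePosixPath

VIDEO_EXTENSIONS = frozenset(
    {
        ".avi", ".mov", ".wmv", ".mkv", ".flv",
        ".mpeg", ".mpg", ".mpe",
        ".mp4", ".m4v",
        ".3gp", ".3g2",
        ".heic", ".heif",
    }
)


def is_video_extension(filename: str) -> bool:
    """Return True when the final extension belongs to the Videos category."""
    # suffix returns only the last extension, e.g. ".gz" for "a.tar.gz".
    return PurePosixPath(filename).suffix.lower() in VIDEO_EXTENSIONS


if __name__ == "__main__":
    print(is_video_extension("clip.MOV"), is_video_extension("notes.txt"))
```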
When processing data, Canopy skips resource fork files, which it identifies by the MIME type multipart/appledouble. These files typically contain application-specific metadata that is not user created. These files can be found using the Skipped filter in the processing interface.
Details:
An AppleDouble file typically refers to a file format used by macOS for storing metadata and resource fork information.
- Resource Fork and Data Fork: Classic Mac files often consisted of two parts, the data fork (the main data content) and the resource fork (metadata like icons, window positions, etc.). Most modern file systems, like those used by Windows or UNIX-based systems, don’t support resource forks directly.
- AppleDouble: To maintain compatibility when transferring files between systems that do not support resource forks, macOS uses the AppleDouble format. This format splits the resource fork and the data fork into two separate files:
  - Data Fork: The primary content of the file.
  - Resource Fork: The metadata, stored in a separate file.
File Extensions and Naming Conventions
- .AppleDouble: The resource fork is often stored in a separate file with a name starting with “._” followed by the original filename. For instance, for a file named example.txt, the resource fork would be ._example.txt.
- File Handling: When processing data, Canopy skips the resource fork using the MIME type multipart/appledouble. These files can be found using the Skipped filter in the processing interface.
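The naming convention can be sketched as below. Canopy itself identifies these files by MIME type, so this is only an illustration of the “._” pattern, with hypothetical helper names. The magic-number check relies on the AppleDouble header beginning with the bytes 00 05 16 07, which is background knowledge about the format rather than something stated in these notes.

```python
from pathlib import PurePosixPath

# First four bytes of an AppleDouble header (format background, not from these notes).
APPLEDOUBLE_MAGIC = bytes.fromhex("00051607")


def resource_fork_name(path: str) -> str:
    """Return the companion '._' name for a data fork path."""
    p = PurePosixPath(path)
    return str(p.with_name("._" + p.name))


def looks_like_resource_fork(path: str) -> bool:
    """Name-based check: the file name starts with '._'."""
    return PurePosixPath(path).name.startswith("._")


def has_appledouble_magic(header: bytes) -> bool:
    """Content-based check on the first bytes of a file."""
    return header[:4] == APPLEDOUBLE_MAGIC


if __name__ == "__main__":
    print(resource_fork_name("docs/example.txt"))
    print(looks_like_resource_fork("docs/._example.txt"))
```

A content-based check is more reliable than the name alone, since any file can be given a name that starts with “._”.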
Upgraded AWS Instance Metadata Service Version 1 (IMDSv1) to IMDSv2.
When thousands of empty columns are present, Canopy takes extra processing steps to ensure the correct number of columns is rendered.
We have addressed a critical bug where an error was thrown when the server attempted to set headers after the response had already been sent to the client. This fix ensures that headers are only set when appropriate, preventing this server-side error.
Previously, Advanced Search cleared the search query but not the results before running the next advanced search query, so the results combined the current and previous results. Now Advanced Search is populated with the active query, and the query is not cleared until the user does so deliberately.
Canopy fixed the blank Excel sheet scenario so that a blank sheet no longer produces a blank browser page.