1.58 Release notes - Data Breach Response - 1.58.0

Epic

DBR-7531 Manage Exports

We are centralizing export activities for a more consistent and secure user experience. The improvements include:

Changing the UI location of export management
Including activity exports on the export management page
Adding column containing compressed export size
Enhancing date range filter
Ensuring consistent secure actions for all background export tasks
Adding activity history to track export downloads
Creating standard file naming convention for exports

DBR-7532 Change UI Location of Export Management Page

We are creating a single location within Project → Settings → Exports to find and manage all exports on a project. The Exports page link from the “Data” tab in the nav bar has been removed.

Screen Shot of Project Settings

DBR-7533 Move Activity Log Export History to the Export Page

Previously, when viewing the Activity History, users could press Export and view the Export History from the Export modal.

In line with the goal of centralizing Export Management, the Exported Files tab has been moved to a tab on the Project → Settings → Exports page.

DBR-7534 Improved Date Filter for Exports

The Export Management data grid now supports Canopy’s standard date filter.

DBR-8185 Save and Display Compressed Export Size (Also DBR-8186, DBR-8187, DBR-8188, DBR-8189, DBR-8190)

We will now calculate and display the compressed export size after we create the export. This feature allows users to gauge how much local storage they will need and how long it might take to download the export. We will support this feature for all exports in Export Management, including document, metadata, entity, and activity log exports.

DBR-7593 Track Export Downloads in Activity History

With Export Management, you have full visibility into the Create Export Link and Download Export From Link user actions, all logged in the activity history.

You can filter on the “Create Export Link” activity to see its activity history.

You can filter on the “Downloaded Export From Link” activity to see its activity history, including the IP address of the browser that downloaded the export.

Standardized Export Naming Convention

All exports from the Export Management interface will be named using the following standard naming convention:

{{name}}_{{type}}_{{export_creation_datetime}}.zip

Where the:

filename will be changed to lowercase except for ‘T’ in the time string.
export_creation_datetime = yyyy-MM-dd'T'HH-mm-ss
timezone for the datetime is the same timezone used to display the export creation datetime on the Export Management page.

Example: test_activities_2000-10-31T01-30-00.zip
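
The naming convention above can be sketched in a few lines of Python. This is an illustration only, not Canopy's implementation: the function name export_filename and its parameters are hypothetical, and the creation datetime is assumed to already be in the timezone shown on the Export Management page.

```python
from datetime import datetime


def export_filename(name: str, export_type: str, created: datetime) -> str:
    """Build {{name}}_{{type}}_{{export_creation_datetime}}.zip.

    Hypothetical helper: `created` is assumed to be expressed in the
    display timezone already.
    """
    # yyyy-MM-dd'T'HH-mm-ss: hyphens replace colons in the time part.
    stamp = created.strftime("%Y-%m-%dT%H-%M-%S")
    # Lowercase the name and type only, so the 'T' separator stays uppercase.
    prefix = f"{name}_{export_type}".lower()
    return f"{prefix}_{stamp}.zip"


print(export_filename("Test", "Activities", datetime(2000, 10, 31, 1, 30, 0)))
# prints: test_activities_2000-10-31T01-30-00.zip
```

Lowercasing only the name and type portions keeps the 'T' in the datetime intact without any special casing.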

DBR-8490 Support for High Efficiency File Format (HEIF) Family

Canopy now supports the Moving Picture Experts Group’s (MPEG) High Efficiency Image File Format (HEIF), a container format for storing individual digital images and image sequences. The HEIF standard format covers multimedia files that can also include other media streams, such as timed text, audio, and video. Apple Inc.’s standard image format is HEIC.

DBR-8417, DBR-8420, DBR-8452 Support HEIF/HEIC Files

HEIF/HEIC files are containers containing one or more images. Canopy extracts the images from a container file and converts them into JPG for viewing, classification, and OCR.

MIME Type

During processing, we will identify these files by MIME type only, ignoring the extension:

image/heic
image/heif
image/heic-sequence
image/heif-sequence
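
As a rough sketch of what identification by MIME type only might look like, the Python snippet below checks a detected MIME type against the four values listed above. The function names are hypothetical, and how the MIME type is detected from file contents is assumed to happen elsewhere.

```python
# MIME types taken from the list above.
HEIF_MIME_TYPES = frozenset({
    "image/heic",
    "image/heif",
    "image/heic-sequence",
    "image/heif-sequence",
})


def is_heif(detected_mime: str) -> bool:
    """Return True for any HEIF/HEIC MIME type; the file extension is never consulted."""
    return detected_mime.strip().lower() in HEIF_MIME_TYPES


def is_heif_sequence(detected_mime: str) -> bool:
    """Return True only for the image-sequence variants."""
    mime = detected_mime.strip().lower()
    return mime in HEIF_MIME_TYPES and mime.endswith("-sequence")


# A file named photo.jpg whose content is detected as image/heic still qualifies.
print(is_heif("image/heic"), is_heif_sequence("image/heic"))
```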

Metadata

The following metadata is collected:

Common Name	Meta Field	Elastic Field	Supported in UI
File Name	FileName	short_name	✔️
Mime Type	MIMEType	type	✔️
Extension	FileTypeExtension	extension	✔️
Path (Where)	SourceFile	meta.full_name	✔️
File Size	FileSize	size	✔️
Created Date	CreateDate	meta.metadata_created_datetime	✔️
Modified Date	FileModifyDate	meta.metadata_modified_datetime	✔️
Dimensions	ImageSize	dimension	✔️
Latitude	GPSLatitude	meta.image_location (example: {'lat': 111, 'lon': 111})
Longitude	GPSLongitude	meta.image_location (example: {'lat': 111, 'lon': 111})

Classification

Individual images are classified per Canopy’s normal classification methods.

Document View Filter Panel

The HEIF extensions are supported both under the Images category filter and as their own extension filter.

Include in Filter → File Type → Images

Include in Filter → File Type → Videos

Image Sequences

As of this release, we only process the first image in a HEIC file sequence:

image/heic-sequence
image/heif-sequence

These sequences are likely “live” pictures or movies. For now, you can export them and review them outside the application.

Look to future releases for additional support for image sequences.

Story

DBR-4800 Refactor PII Detection

PII detection is being refactored in order to break up the pipeline codebase into smaller, more manageable parts.

DBR-7396 Create Normalized Versions of PII Elements to Improve Speed and Accuracy of Matching

PII elements have been normalized, on the backend, to improve speed and accuracy of matching fields while propagating and suggesting entities. The normalization will increase the success of propagation and suggestions, but is otherwise imperceptible to the user.

Date fields (“dateofbirth”, “dateofdeath”, “patient_dates”): Normalized to %Y-%m-%d format.
Address fields (“address”, “militaryaddress”): Normalized by removing newline characters (\n), extra spaces, and single and double quotes.
Name: Normalized by ignoring all symbols except period and comma, keeping only letters, and removing extra spaces, newlines, and capitalization of the first letter.
Other fields: Normalized by ignoring all symbols, keeping only letters and numbers, ignoring case, and removing extra spaces.
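
The rules above for date, address, and other fields could look roughly like the Python sketch below. It is not Canopy's backend code. The function names and the accepted input date formats are assumptions, quotes are assumed to be removed everywhere in an address rather than only at its ends, and name normalization is left out.

```python
import re
from datetime import datetime

DATE_FIELDS = {"dateofbirth", "dateofdeath", "patient_dates"}
ADDRESS_FIELDS = {"address", "militaryaddress"}

# Assumption: the raw formats that may appear are not documented.
ASSUMED_INPUT_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%Y/%m/%d")


def collapse_spaces(text: str) -> str:
    """Replace runs of whitespace with one space and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def normalize_date(value: str) -> str:
    """Return the date as %Y-%m-%d, or the trimmed input if no format matches."""
    for fmt in ASSUMED_INPUT_DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    return value.strip()


def normalize_address(value: str) -> str:
    """Drop newlines, single and double quotes, and extra spaces."""
    without_newlines = value.replace("\n", " ")
    without_quotes = re.sub(r"[\"']", "", without_newlines)
    return collapse_spaces(without_quotes)


def normalize_other(value: str) -> str:
    """Keep only letters, digits, and single spaces; ignore case."""
    kept = re.sub(r"[^A-Za-z0-9\s]", "", value)
    return collapse_spaces(kept).lower()


def normalize(field: str, value: str) -> str:
    """Dispatch to the rule for the given field name."""
    if field in DATE_FIELDS:
        return normalize_date(value)
    if field in ADDRESS_FIELDS:
        return normalize_address(value)
    return normalize_other(value)


print(normalize("dateofbirth", "10/31/2000"))
print(normalize("address", '"12 Main St"\nApt  4'))
print(normalize("ssn", "123-45-6789"))
```

Comparing normalized values instead of raw strings is what lets two differently formatted occurrences of the same element match during propagation and suggestion.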

DBR-8063 Add Button to Show Auto Mapped Columns in Smartmap View

Automatically mapping tables with a large number of columns slows down the rendering of an Excel file or other mappable table. For tables with more than 50 columns, the system now renders the table first and then presents a button that lets the user choose to auto map the columns.

This change improves the performance of rendering the document view.

DBR-8502 Upgrade to HTML Sanitization

We have improved our HTML sanitization to prevent XSS attacks through HTML contained in processed files. The previous technique changed unsafe characters to safer representations, for instance, & would be converted to &amp;, < would be converted to &lt;, etc. The new technique removes unsafe data while preserving readable content. For instance, a comment containing the phrase Sticks & Stones will now be saved and displayed as Sticks & Stones rather than Sticks &amp; Stones.

User Interface Prior to Release 1.58.0

Old User Interface

User Interface On and After Release 1.58.0

New User Interface
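
The difference between the two sanitization approaches in DBR-8502 can be illustrated with Python's standard library. This is a teaching sketch, not Canopy's sanitizer and not a production-grade XSS defense: the class and function names are hypothetical, and the choice to drop only script and style content is an assumption.

```python
import html
from html.parser import HTMLParser


def escape_everything(text: str) -> str:
    """Escaping approach: every unsafe character becomes an entity."""
    return html.escape(text, quote=False)


class TextKeepingSanitizer(HTMLParser):
    """Removal approach: drop tags and script/style bodies, keep readable text."""

    DROPPED = {"script", "style"}  # assumption: illustrative list only

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.DROPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.DROPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Text inside a dropped element is discarded; all other text is kept.
        if not self._skip_depth:
            self._parts.append(data)

    def text(self) -> str:
        return "".join(self._parts)


def strip_unsafe(text: str) -> str:
    parser = TextKeepingSanitizer()
    parser.feed(text)
    parser.close()
    return parser.text()


print(escape_everything("Sticks & Stones"))  # prints: Sticks &amp; Stones
print(strip_unsafe("<script>alert(1)</script>Sticks & Stones"))  # prints: Sticks & Stones
```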

DBR-8684 Automatically Retry Failed Excel Files Using Alternate Processing Method

Excel files are now processed with our standard processing first. If that fails, Canopy automatically retries them using the alternate method from DBR-7943, released in 1.57.0.

DBR-8822 Add Smart Filter for Video Files

We created a new document view smart filter category for video files. The impact assessment report will also include a new category called “Videos.”

This category includes the following video extensions:

Ext	Description
‘.avi’	Audio Video Interleave (AVI)
‘.mov’	QuickTime File Format (MOV)
‘.wmv’	Windows Media Video (WMV)
‘.mkv’	Matroska Multimedia Container (MKV)
‘.flv’	Flash Video (FLV)
‘.mpeg’, ‘.mpg’, ‘.mpe’	Moving Picture Experts Group (MPEG)
‘.mp4’, ‘.m4v’	MPEG-4 Part 14 (MP4)
‘.3gp’, ‘.3g2’	Third Generation Partnership Project (3GP)
‘.heic’, ‘.heif’	High Efficiency Image File Format (HEIF) sequence files
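
For readers scripting against exported file lists, the table above translates into a simple extension lookup. The Python sketch below is an illustration with hypothetical names. It checks extensions only, so it cannot tell a HEIF image sequence apart from a single-image HEIF file.

```python
from pathlib import PurePath

# Extensions copied from the table above.
VIDEO_EXTENSIONS = frozenset({
    ".avi", ".mov", ".wmv", ".mkv", ".flv",
    ".mpeg", ".mpg", ".mpe",
    ".mp4", ".m4v",
    ".3gp", ".3g2",
    ".heic", ".heif",
})


def in_videos_category(filename: str) -> bool:
    """Return True if the file extension belongs to the Videos category."""
    return PurePath(filename).suffix.lower() in VIDEO_EXTENSIONS


print(in_videos_category("Interview.MP4"), in_videos_category("notes.txt"))
# prints: True False
```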

DBR-8604 Support for Apple “AppleDouble” Files

When processing data, Canopy identifies resource fork files by the MIME type multipart/appledouble and skips them. These files typically contain application-specific metadata that is not user created. These files can be found using the Skipped filter in the processing interface.

Details:

AppleDouble is a file format used by macOS for storing metadata and resource fork information.

Resource Fork and Data Fork: Classic Mac files often consisted of two parts, the data fork (the main data content) and the resource fork (metadata like icons, window positions, etc.). Most modern file systems, like those used by Windows or UNIX-based systems, don’t support resource forks directly.
AppleDouble: To maintain compatibility when transferring files between systems that do not support resource forks, macOS uses the AppleDouble format. This format splits the resource fork and the data fork into two separate files:
- Data Fork: The primary content of the file.
- Resource Fork: The metadata, stored in a separate file.

File Extensions and Naming Conventions

.AppleDouble: The resource fork is often stored in a separate file with a name starting with “._” followed by the original filename. For instance, for a file named example.txt, the resource fork would be ._example.txt.
File Handling: When processing data, Canopy skips the resource fork using the MIME type multipart/appledouble. These files can be found using the Skipped filter in the processing interface.
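
The “._” naming convention described above can be checked with a short Python sketch. Note that Canopy itself identifies these files by MIME type, not by name. The helpers below are hypothetical and only illustrate how a resource fork file relates to its data fork by filename.

```python
from pathlib import PurePosixPath
from typing import Optional


def looks_like_appledouble(path: str) -> bool:
    """Return True if the file name follows the '._' resource fork convention."""
    return PurePosixPath(path).name.startswith("._")


def data_fork_path(path: str) -> Optional[str]:
    """Return the path of the companion data fork, or None if not applicable."""
    p = PurePosixPath(path)
    if not p.name.startswith("._") or len(p.name) == 2:
        return None
    # Drop the '._' prefix to recover the original file name.
    return str(p.with_name(p.name[2:]))


print(looks_like_appledouble("docs/._example.txt"))  # prints: True
print(data_fork_path("docs/._example.txt"))  # prints: docs/example.txt
```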

DBR-8505 Upgrade AWS Instance Metadata Service Version

Upgraded from AWS Instance Metadata Service Version 1 (IMDSv1) to Version 2 (IMDSv2).

DBR-8832 Better Handling of Excels with a Large Number of Empty Columns

When thousands of empty columns are present, Canopy takes extra processing steps to ensure the correct number of columns is rendered.

Bug

DBR-7784 Cannot Set Headers After Sending to Client

We have addressed a critical bug where an error was thrown when the server attempted to set headers after the response had already been sent to the client. This fix ensures that headers are only set when appropriate, preventing this server-side error.

DBR-8076 Advanced Search for Document ID Returning Inaccurate Result

Previously, Advanced Search cleared the search query but not the results before running the next advanced search query, so the results shown combined the current and previous results. Now Advanced Search stays populated with the active query, and the query is not cleared until the user does so deliberately.

DBR-8846 Clicking on Blank Excel Sheets Makes Page Go Blank

Clicking on a blank Excel sheet no longer causes the browser page to go blank.