4.0 Release notes - Data Breach Response - 4.0.0

Epic

Version 4.0.0 & Precision Search Launch

We are excited to introduce Canopy 4.0.0.

This release features significant enhancements to our core search functionality, providing you with more control and better results. Our advanced search capabilities allow you to tailor your search strategy to your specific needs by leveraging two new modes:

Precision Search provides exact, literal search results, ideal for users who need to find specific terms that contain special characters and stopwords.

Recall Search offers a broader search approach, capturing variations of words and phrases to ensure comprehensive results.

Below are the key differences between Precision Search and Recall Search:

Precision Search	Recall Search	Search Behavior
Index text by splitting on spaces, tabs, and newlines	Index text by applying linguistic rules that normalize English words
Case insensitive	Case insensitive	`Apple`, `APPLE` or `AppLE` will match `apple` for both search modes
Can search on Stopwords	Cannot search on Stopwords	Precision Search: A search for `"Account No"` will return documents containing the exact phrase `"Account No"`. Recall Search: A search for `"Account No"` will not return documents containing the exact phrase `"Account No"` because `No` is a stopword, and it is removed when indexing.
No English stemming	English stemming	Precision Search: A search for `Account` will return documents containing the exact word `Account`. Recall Search: A search for `Account` will return documents containing variations of the word `Account`, e.g., `Accounts`, `Accounting`, etc., due to stemming.
Can search on possessives and contractions (’)	Cannot search on possessives and contractions (’)	Precision Search: A search for `Account’s` will return documents containing the exact word `Account’s`. Recall Search: A search for `Account’s` will not return documents containing `Account’s`. Instead, it will return documents with a variation of the word `Account`.
Can search on Symbols and Punctuation	Cannot search on Symbols and Punctuation	Precision Search: A search for `"Account #"` will return documents containing the exact phrase `"Account #"`. Recall Search: A search for `"Account #"` will not return documents containing the exact phrase `"Account #"` because `#` is removed when indexing.
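
To make the indexing differences in the table concrete, here is a toy Python sketch. It is not Canopy's implementation: the stopword list, the suffix-stripping rule, and the function names are all assumptions chosen only to reproduce the examples in the table.

```python
import re

# Hypothetical stopword list; Canopy's actual list is not given in these notes.
STOPWORDS = {"no", "the", "a", "an", "of", "and"}


def precision_tokens(text: str) -> list[str]:
    """Lowercase and split on whitespace only; symbols and stopwords survive."""
    return text.lower().split()


def naive_stem(word: str) -> str:
    """Rough stand-in for English stemming: strip a couple of common suffixes."""
    for suffix in ("ing", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def recall_tokens(text: str) -> list[str]:
    """Lowercase, drop possessives and punctuation, drop stopwords, then stem."""
    text = re.sub(r"[’']s\b", "", text.lower())
    words = re.findall(r"[a-z0-9]+", text)
    return [naive_stem(w) for w in words if w not in STOPWORDS]


print(precision_tokens("Account #"))  # the "#" is kept as its own token
print(recall_tokens("Account No"))    # "no" is dropped as a stopword
```

Under these assumptions, `Accounts`, `Accounting`, and `Account’s` all reduce to the same Recall token, while Precision keeps each form distinct.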

By default, all searches use Precision Search. To switch to Recall Search, add the recall: prefix to your search query. The precision: prefix is optional, but it’s available if you want to be explicit about using Precision Search. For example:

report returns Precision Search results for report
recall:report returns Recall Search results for report
precision:report returns Precision Search results for report
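
The prefix rule can be sketched in a few lines of Python. This is an illustration only, assuming the prefix appears at the very start of the query; the function name `split_search_mode` is hypothetical and does not reflect how Canopy parses queries internally.

```python
def split_search_mode(query: str) -> tuple[str, str]:
    """Return (mode, terms); queries without a prefix default to Precision Search."""
    for mode in ("recall", "precision"):
        prefix = mode + ":"
        if query.startswith(prefix):
            return mode, query[len(prefix):]
    return "precision", query


for q in ("report", "recall:report", "precision:report"):
    print(q, "->", split_search_mode(q))
```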

For detailed information on using Canopy’s search functionality, please refer to our Search Guide.

Major Release Indicator

This new search functionality is available only for projects created in Canopy 4.0.0 or later.

Version 4.0.0 and later: Projects created in this version and forward have access to both Precision Search and Recall Search.
Version 3.0.0 and earlier: Projects created in these versions will continue to use Recall Search exclusively.

To help users distinguish between projects’ release versions, we’ve labeled the projects and added a Version column to the View All Projects page.

DBR-12523: Smarter Document Review with Review Fields

To streamline your document review workflows, Canopy has introduced Review Fields in the Document View.

This new feature places key email and document metadata at the top of the document view, allowing reviewers to access critical information quickly and easily.

Additionally, since these review fields often contain Personally Identifiable Information (PII), we’ve incorporated automatic PII detection and highlighting, making it faster and easier to identify and flag PII during your review.

The Review Fields are treated as text, which means they can be searched using the search bar.

Enhanced Metadata Search

We’ve added 13 new processing fields and updated two fields in the Document List column of the Review module. These fields provide deeper insight into your documents, making it easier to understand your data at a glance and perform more precise searches. The new and updated fields include:

Author: The original author of the document or sender of the email/message, extracted from the file’s metadata.

Search syntax for this field is author:"author name"; e.g., author:"John Doe" searches for documents authored by “John Doe”.

Batch: The batch name within a batch set.

Search syntax for this field is batch:"batch name", e.g., batch:"prefix_001".

Batch Set: User-created batch set, available in the Batches module.

Search syntax for this field is batch_set:"batch set", e.g., batch_set:"SSN Documents".

Batch Type: Canopy’s supported batch types, including “Review”, “QA”, and “Alt Workflow” batches.

Search syntax for this field is batch_type:"batch type", e.g., batch_type:"review".

Email Thread Index: Email’s thread index, extracted from the email header.

Search syntax for this field is email_thread_index:"thread index".

Confirmed Entity Count: The total number of confirmed entities in the document.

Search syntax for this field is confirmed_entity_count:n, where n is the number of confirmed entities; e.g., confirmed_entity_count:[2 TO 10].

File Name: The name of the file as it was originally named when uploaded to Canopy.

Search syntax for this field is file_name:"file name", e.g., file_name:"invoice.pdf" searches for documents named “invoice.pdf”.

Has Attachments: Boolean field indicating whether the email has attachments or not.

Search syntax for this field is has_attachments:<boolean value>; e.g., has_attachments:false searches for documents without attachments.

Image Dimensions: The dimensions of the image in pixels (width × height).

Search syntax for this field uses the width and height subfields, e.g., image_dimensions.width:>200 image_dimensions.height:<5000.

Recipient Domain: The domain of the email recipient/recipients.

Search syntax for this field is recipient_domain:"domain", e.g., recipient_domain:"example.com".

Sender Domain: The domain of the email sender.

Search syntax for this field is sender_domain:"domain", e.g., sender_domain:"example.com".

Suggested Entity Count: The total number of entities that Canopy’s system has automatically identified.

Search syntax for this field is suggested_entity_count:n, where n is the number of suggested entities; e.g., suggested_entity_count:>2.

Subject: The subject line of emails or documents, automatically extracted from their metadata.

Search syntax for this field is subject:"subject", e.g., subject:"Annual Report" searches for documents with the subject “Annual Report”.

Text Source: The method by which the text content was obtained from a file, during or after Processing, to make it ready for review.

Search syntax for this field is text_source:"text source", e.g., text_source:"ocr text".

Title: The title of the document, extracted from the file’s metadata.

Search syntax for this field is title:"title", e.g., title:"Annual Report" searches for documents with the title “Annual Report”.

This update is part of our ongoing effort to make data analysis and document review faster and more efficient.

For more information, please refer to the Fields and Field Search documentation.
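
If you assemble field searches programmatically before pasting them into the search bar, the syntax patterns listed above can be sketched as small string helpers. The helper names are hypothetical, values are not escaped, and only the three query shapes shown in the examples above are covered.

```python
def field_term(field: str, value: str) -> str:
    """Build field:"value", e.g. author:"John Doe"."""
    return f'{field}:"{value}"'


def field_range(field: str, low: int, high: int) -> str:
    """Build a range query, e.g. confirmed_entity_count:[2 TO 10]."""
    return f"{field}:[{low} TO {high}]"


def field_compare(field: str, op: str, number: int) -> str:
    """Build a comparison, e.g. suggested_entity_count:>2."""
    if op not in (">", "<"):
        raise ValueError("only > and < appear in the documented examples")
    return f"{field}:{op}{number}"


# Space-separated clauses, as in the Image Dimensions example.
query = " ".join([
    field_compare("image_dimensions.width", ">", 200),
    field_compare("image_dimensions.height", "<", 5000),
])
print(field_term("author", "John Doe"))
print(field_range("confirmed_entity_count", 2, 10))
print(query)
```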

Processing Improvements

DBR-13147: Optimized Processing during PII Detection

We’ve refactored our PII detection engine to significantly improve Processing speed and overall performance. The system now stops detection early on documents where PII detection fails on three consecutive attempts.

Two new tags give you a clearer status:

pii_detection_failed for documents that failed during the first 3 PII detections
pii_detection_incomplete for partial results where some PII was found, but the process timed out on a total of 3 PII detections.

DBR-13383: Processing Enhancement - Processing Check Service

Canopy now includes a Processing Check Service that checks the status of ongoing processing tasks every 10 minutes. This service ensures that any interruptions or failures during processing are promptly detected and addressed, eliminating processing stalls and improving overall reliability.

Users can track the status of their processing check in real time in the Activity History under the Processing Check class.

DBR-14002: Processing Enhancement - Faster and Better Performance

Canopy has made significant improvements to the overall processing experience, resulting in faster performance, enhanced capabilities, and greater stability and reliability. Key enhancements include:

Enhanced Speed and Throughput:
- Triple Bulk Processing Capability: We’ve increased the threshold for bulk processing threefold to improve overall throughput.
- Maximized CPU Usage: We’ve increased CPU count to boost processing speed for spreadsheets and PDFs.
- Efficient Data Distribution: Data-intensive spreadsheets and CSVs are randomly distributed across the population to prevent resource exhaustion and eliminate bottlenecks.
- Increased Worker Capacity: We’ve nearly doubled the number of processing workers across all regions.
- Faster Thumbnail Generation: We’ve sped up thumbnail creation for Gallery View and implemented logic to avoid retries when generation fails.
- Parallel Ingestion: We’ve parallelized the ingestion of each spreadsheet into Elasticsearch.
Stability and Reliability:
- Optimized Entity Accuracy: We’ve improved processing speed and entity accuracy by deprecating suggested entities on known PDF forms.
- Reduced Unnecessary Retrying: We’ve eliminated retries for specific “unknown class type” PST messages when the user’s initial retry fails.
- Better Quality Control: We’ve enhanced processing quality and handling of blank or empty files.
- Conversion Process Bug Fix: We’ve resolved a bug in the document conversion process that sometimes marked successfully normalized document structures as failures.

Story

DBR-13627: Use Regular Expression with Proximity Search

You can now combine Regular Expressions (RegEx) with Proximity Search in Canopy’s search bar to create more advanced and flexible search queries.

For example: "ssn [0-9]{3}-[0-9]{2}-[0-9]{4}"~~5 or "ssn \d{3}-\d{2}-\d{4}"~~5 returns documents where the word “ssn” appears within 5 words before a pattern that matches a Social Security Number (e.g., 123-45-6789).

For more details, refer to our Search Guide documentation.

Canopy’s Proximity Search with RegEx requires the double tilde (~~) operator followed by a number (e.g., ~~5), which preserves word order. The single tilde (~) operator is not supported in this context.
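
As a rough mental model of what such a query matches, the following Python sketch checks a piece of text against two regular expressions in order. It is an approximation, not Canopy's search engine: it tokenizes on whitespace, matches each pattern against a whole token, and assumes that "within 5 words" means at most 5 words between the two matches. The function name is hypothetical.

```python
import re


def ordered_regex_near(text: str, first: str, second: str, max_gap: int) -> bool:
    """True if a token matching `first` is followed by a token matching
    `second` with at most `max_gap` words between them."""
    tokens = text.lower().split()
    first_re, second_re = re.compile(first), re.compile(second)
    for i, token in enumerate(tokens):
        if not first_re.fullmatch(token):
            continue
        # Candidate positions i+1 .. i+1+max_gap leave at most max_gap words between.
        window = tokens[i + 1 : i + 2 + max_gap]
        if any(second_re.fullmatch(t) for t in window):
            return True
    return False


SSN = r"\d{3}-\d{2}-\d{4}"
print(ordered_regex_near("The SSN on file is 123-45-6789", "ssn", SSN, 5))
print(ordered_regex_near("123-45-6789 is the SSN on file", "ssn", SSN, 5))
```

The second call returns False because the number comes before the word “ssn”, which is the order-preserving behavior the ~~ operator is described as enforcing.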

DBR-12835: Proximity Search - Preserve Order of Words

We’ve enhanced Proximity Search to give you more precise control over your search results.

You can now use the order-preserving proximity search to ensure keywords appear in the exact order you specify in your search query. Use the ~~ operator.

For example: "data breach"~~5 returns documents where “data” appears before “breach”, with no more than 5 words between them.
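
The effect of preserving word order can be illustrated with a small Python sketch. The `ordered=False` branch is an assumption about how a proximity search that ignores word order would behave; these notes only describe the ~~ operator. The function name and the whitespace tokenization are likewise illustrative.

```python
def words_near(text: str, first: str, second: str, max_gap: int, ordered: bool) -> bool:
    """True if `first` and `second` occur with at most `max_gap` words between
    them; when `ordered` is set, `first` must also come before `second`."""
    tokens = text.lower().split()
    firsts = [i for i, t in enumerate(tokens) if t == first.lower()]
    seconds = [i for i, t in enumerate(tokens) if t == second.lower()]
    for i in firsts:
        for j in seconds:
            if i == j:
                continue
            gap = abs(j - i) - 1  # number of words strictly between the two terms
            if gap <= max_gap and (j > i or not ordered):
                return True
    return False


text = "breach of customer data"
print(words_near(text, "data", "breach", 5, ordered=True))   # False: wrong order
print(words_near(text, "data", "breach", 5, ordered=False))  # True: order ignored
```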

For more information on Proximity Search, refer to our Search Guide documentation.

DBR-12809: Enhancement to Document Page - Column Filters, Column Selector, and Export Interface

To enhance user experience and streamline your workflow, we’ve made several improvements to the Document Page, focusing on better order, consistency, and ease of use.

Column Filters: All pre-populated dropdown menus are now sorted in alphabetical order, making it easier to find and select options quickly.
Column Selector: The Column Selector interface now exactly mirrors the order of the columns shown in the document list. Unselected fields remain available at the bottom of the list, in alphabetical order. You can select, deselect, and rearrange columns as needed.
Export Interface: The Export Interface now reflects the order of columns shown in your document list. When you initiate an export, all currently visible fields are pre-selected and ordered for you, ensuring that your exported data matches your document list view exactly. You can still customize the field selection before running the export.

DBR-12985: Delete Tenant Tags

Users can now delete tenant tags from the Manage Tags page in the Tenant Settings. Please note that documents already tagged with a deleted tag will continue to display that tag until users manually remove it from the documents.

Navigate to Tenant Settings > Templates > Manage Tags.
Either:
- Click the trash icon next to the tag you want to delete, or
- Select multiple tags using the checkboxes and click Delete at the top of the list.
Confirm the deletion in the pop-up dialog.

DBR-13095: Document Batching as a Background Task

We’ve enhanced our Batching processes to run as background tasks, allowing users to continue working uninterrupted in Canopy while the tasks run. Users will receive a notification once the task is complete.

This change results in a smoother and more responsive experience. The following tasks are now processed in the background:

Batching Documents
Deleting Batches
Splitting Batches
Removing Documents from Batches

DBR-14052: Greater Entity Transparency with New Cross-Reference IDs

We’ve made significant upgrades to our entity exports to enhance transparency and trackability, and to simplify quality control.

New cross-reference ID columns have been added to both the Consolidated and Raw Entity Exports, making it easier than ever to track and link every Raw Entity to the Master Entity it merged into, as well as its grouping status, directly within the Entity Export CSV files.

The new cross-reference ID columns in the CSV exports are:

Consolidated Entity Export:

Group ID: A shared, unique ID assigned to all raw entities that are part of the same cluster.
Merged Entities: A semicolon-separated list of all individual raw entity IDs that were merged into the Master Entity, including the Master Entity ID itself.
Grouped: A boolean field indicating whether an entity has been clustered into a group (True/False).

Raw Entity Export:

Group ID: A shared, unique ID assigned to all raw entities that are part of the same cluster.
Grouped: A boolean field indicating whether an entity has been clustered into a group (True/False).
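
As one way to use the new columns, the sketch below reads a Consolidated Entity Export and maps every raw entity ID to the Master Entity it was merged into. The column names Group ID, Merged Entities, and Grouped come from the list above; the `Entity ID` column name, the sample rows, and the function name are assumptions, so adjust them to match your actual export.

```python
import csv
import io


def raw_to_master(consolidated_csv: str, id_column: str = "Entity ID") -> dict[str, str]:
    """Map each raw entity ID to the Master Entity row it was merged into."""
    mapping: dict[str, str] = {}
    for row in csv.DictReader(io.StringIO(consolidated_csv)):
        master_id = row[id_column]  # hypothetical column holding the Master Entity ID
        for raw_id in row["Merged Entities"].split(";"):
            raw_id = raw_id.strip()
            if raw_id:
                mapping[raw_id] = master_id
    return mapping


# Made-up sample data, for illustration only.
sample = (
    "Entity ID,Group ID,Merged Entities,Grouped\n"
    "E1,G1,E1;E7;E9,True\n"
    "E2,,E2,False\n"
)
print(raw_to_master(sample))
```

Because the Merged Entities list includes the Master Entity ID itself, each master entity maps to itself in the result.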

DBR-13209: New Email Destination Category - “Indeterminate” Filter

We’ve added a new Indeterminate filter within the Email Destination filter category of the Filter Panel in the Documents Module.

Indeterminate reflects the status of emails that could not be classified as either internal or external. This typically includes drafted emails that do not yet have recipients.

DBR-13417: Updated Task Notification

Canopy now sends task completion notifications only to the user who initiated the task. This update helps minimize unnecessary alerts for users who are not directly involved.

DBR-12833: Complete “File Name” in Document View

We’ve updated the Document View to display the complete file name of documents, preventing truncation. This enhancement improves clarity and helps users quickly identify files.

Bug

DBR-13008: Review Metrics Report - Fixed Number Value Formatting

We’ve fixed an issue in the Review Metrics Report where numerical data was not displayed correctly. Now, all Number values in the report are formatted as Numbers, allowing for accurate sorting, copying, and analysis.

DBR-13009: Review Metrics Report - Fixed Review Rate metrics to document/man-hour calculation

We have corrected the calculation for all Review Rate metrics in the Review Metrics Report. The values now accurately reflect the number of documents reviewed per man-hour, providing more reliable insights into review productivity.