Product Documentation

Fields


This guide provides an overview of the Fields available in Canopy Processing. These fields are automatically extracted from documents and mapped during the processing stage. Users can view, search, filter, and report on these fields throughout the review workflow.

Field Name Type of Field Description Example Value
Alt Workflow Reviewer Keyword Names of reviewers who review documents in the Alt Workflow batch type. Jane Doe; John Kim
Alt Workflow Reviewer Email Keyword The email addresses of the reviewers who review documents in the Alt Workflow batch type. Gareth Keenan keenan.gareth@gmail.com
Alt Workflow Tags Keyword Tags created by users and applied to batches in the Alt Workflow batch type. Large Document
Audio Duration Long The length of audio or video files in seconds. 120
Author Keyword The original author of the document or sender of the email/message, extracted from the file’s metadata. Jane Doe; John Kim
Batch Keyword An individual batch name within a batch set. prefix-1, TXT-1, A01-2
Batch Set Keyword User-created batch set, available in the Batches module. <user_created_name>
Batch Type Keyword The batch types supported by Canopy: “Review,” “QA,” and “Alt Workflow.” Review; QA; Alt Workflow
BCC Keyword The names, when available, and email addresses of the Blind Carbon Copy recipients of an email message. Gareth Keenan keenan.gareth@gmail.com
Classification Keyword The text label identifying the image type. Social Security Cards
CC Keyword The names, when available, and email addresses of the Carbon Copy recipients of an email message. Gareth Keenan keenan.gareth@gmail.com
Confirmed Entity Count Long The total number of confirmed entities in the document. 10
Custodian Keyword All custodians, de-duplicated and primary, associated with a document. Jane Doe
Custom PII Tags Keyword Labels or Tags for each custom detection rule that returns a hit on the document. US_SSN_VLC; US_Passport_LC
Email Client Submit Date/Time Date The timestamp recorded by the sender’s email client (e.g., Outlook, Gmail in a web browser, Apple Mail) at the exact moment the sender hits the “Send” button. 2019-12-13T19:53:10Z
Email Conversation Index Keyword The conversation index assigned by the email system. This is a hidden metadata field in an email that identifies its thread, found especially in Microsoft Outlook or Exchange Server environments. AQHc5fUAEuRWmZ2a2k6c7FyCkdK6R6kB
Email Created Date/Time Date The date/time at which an email was created by the user. 2019-12-13T19:53:10Z
Email Message ID Keyword The message number created by an email application and extracted from the email’s metadata. 1ee10ea6-d9c0-aab2-1940-f05f0deef8d8@cu.edu
Email Modified Date/Time Date The date/time an email was last modified. 2019-12-13T19:53:10Z
Email Provider Submit Date/Time Date The date/time the email server sent the email. 2019-12-13T19:53:10Z
Email Delivery Date/Time Date The timestamp that a recipient’s mail server records when it successfully accepts an email from the previous mail server in the delivery chain. 2019-12-13T19:53:10Z
Email Report Date/Time Date The date/time that the recipient’s mail server reported the user likely opened the email. 2019-12-13T19:53:10Z
Email Thread Index Keyword Email’s thread index, extracted from the email header. AcvXMOs3E1WvsR6hBkapQSV6HVwCHQ==
Family ID Keyword The search ID of the first file in the file family: email or loose file (word, ppt, pdf, etc.). This file will never be a container file. 2FG2G55FGF
File Created Date/Time Date The date/time the file was created. 2019-12-13T19:53:10Z
File Modified Date/Time Date The date/time the file was last saved. 2019-12-13T19:53:10Z
File Name Text The file name (file_name), or in the case of emails, filename.eml. “Project_Update.pdf”; “Project_Update_from_Jane_Doe.eml”
File Size Long The size of the file. 10.92 KB, 853 Bytes
File Type Keyword The text extension of the file. .doc, .pdf
From Keyword The name, when available, and email address of the sender of an email message. Gareth Keenan keenan.gareth@gmail.com
Has Attachment Boolean Indicates whether the email has an attachment. True; False
ID Keyword Canopy’s unique Search ID associated with a document. 2FG2G55FGF
Image Dimension Long The dimensions of the image in pixels (Width x Height). 1000 x 1000
Language Keyword The predominant language contained in documents. English, French
Language Confidence (in %) Long The confidence level of the language detection, as a percentage. 80
MD5 Hash Keyword The MD5 hash value of the file.
NOTE: The Canopy application calculates and uses the SHA256 hash, which is our recommended standard for data integrity. For compatibility with some client tools and processes, Canopy also provides MD5 and SHA1 hashes.
WARNING: MD5 and SHA1 are cryptographically broken and should not be used for security-sensitive purposes. They are highly susceptible to collision vulnerabilities, meaning an attacker can create two entirely different files that produce the exact same hash. Relying on these hashes can expose you to significant security risks, including data tampering and impersonation.
5d41402abc4b2a76b9719d911017c592
Master Created Date/Time Date The Master Created Date/Time is derived from the other date fields collected from the document. It is populated by the first date present in this prioritized list:
1. meta.eml_CreationTime
2. earliest eml date/time field from all available
3. meta.metadata_created_datetime
4. parent file Master Created Date/Time
5. meta.archive_created_datetime (date stored for file inside the archive)
6. meta.uploaded
2019-12-13T19:53:10Z
Master Created Date/Time Source Keyword The name of the field used to populate the Master Created Date/Time. Email Created Date/Time
Master Created Date/Time Source ID Keyword The document search ID associated with the Master Created Date/Time Source. 2FG2G55FGF
Master Modified Date/Time Date The Master Modified Date/Time is derived from the other date fields collected from the document. It is populated by the first date present in this prioritized list:
1. meta.eml_ClientSubmitTime
2. meta.eml_LastModificationTime
3. oldest eml date/time field from all available
4. meta.metadata_modified_datetime
5. parent file Master Modified Date/Time
6. meta.archive_lastmodified_datetime (date stored for file inside the archive)
7. meta.uploaded
2019-12-13T19:53:10Z
Master Modified Date/Time Source Keyword The name of the field used to populate the Master Modified Date/Time. Email Modified Date/Time
Master Modified Date/Time Source ID Keyword The document search ID associated with the Master Modified Date/Time Source Field. 2FG2G55FGF
Name Text The file name (file_name), or in the case of emails, the email subject (subject). “Team_Meeting_Report.pdf”
Page Count Long The number of pages contained within the document. 10
Parent ID Keyword The search ID of the file from which a file was extracted. This can be an attachment, an embedding, or contained in a container file. 2FG2G55FGF
PII Tags Keyword A list of PII element types detected in the file. Name; Phone Number; SSN
Preserved Created Date/Time Date The file created date/time as recorded by the file system and preserved within the Zip or archive container. 2019-12-13T19:53:10Z
Preserved Modified Date/Time Date The file last modified date/time as recorded by the file system and preserved within the Zip or archive container. 2019-12-13T19:53:10Z
Processing Status Keyword Processing’s final status in the Review module. Done; Extraction Incomplete
QA Change Reason Keyword The applied change reason tagged by QA.
QA Reviewer Keyword Names of reviewers who review documents in the QA batch type. Jane Doe
QA Reviewer Email Keyword The email addresses of the reviewers who review documents in the QA batch type. Gareth Keenan keenan.gareth@gmail.com
QA Status Keyword The QA status of the document. The status is one of “QA Accepted,” “QA Pending Review,” or “QA Reject.” QA Accepted, QA Pending Review, QA Reject
Recipient Count Long The number of recipients in an email. 4
Recipient Domain Keyword The email domain/domains of the email recipients. gmail.com; school.edu; govagency.gov; organization.org
Review Status Keyword The review status of the document. The status is one of “Reviewed,” “Pending Review,” or “Not Batched.” Reviewed, Pending Review, Not Batched
Reviewer Keyword Names of reviewers who review documents in the Review batch type. Jane Doe
Reviewer Email Keyword The email addresses of the reviewers who review documents in the Review batch type. Gareth Keenan keenan.gareth@gmail.com
Sender Domain Keyword The email domain of the email sender. gmail.com; school.edu; govagency.gov; organization.org
SHA1 Hash Keyword The SHA1 hash value of the file.
NOTE: The Canopy application calculates and uses the SHA256 hash, which is our recommended standard for data integrity. For compatibility with some client tools and processes, Canopy also provides MD5 and SHA1 hashes.
WARNING: MD5 and SHA1 are cryptographically broken and should not be used for security-sensitive purposes. They are highly susceptible to collision vulnerabilities, meaning an attacker can create two entirely different files that produce the exact same hash. Relying on these hashes can expose you to significant security risks, including data tampering and impersonation.
aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d
SHA256 Hash Keyword The SHA256 hash value of the file. c604a6840d44c89df5ff8b5a5c5e943be565735f4bbeb3ddb692ff58bbf6993c
Source Container Keyword The original source container of the file, as uploaded from the UI. Master Demo.zip
Source Path Keyword The full directory path of the file within the container. “Source Path” does not include the file’s Name in the path. Master Demo.zip/Master Demo/Demo Files/Long Thread/Threading.pst/Top of Outlook data file/Inbox
Subject Keyword The subject line of emails or documents, automatically extracted from their metadata. “Finance Report”
Suggested Entity Count Long The total count of entities that Canopy’s system has automatically identified. 10
Tags Keyword The list of user-created tags applied to the file. Sensitive; Public; Private
Text Length Long The length of the document’s text, in characters. 989
Text Source Keyword The method by which the file’s text content was obtained, during or after Processing, for use in review. OCR Text, Transcription, Extraction
Title Keyword The descriptive name of the document, automatically extracted from the file’s metadata. “Finance Report”
To Keyword The name, when available, and email address of the recipient/recipients of an email message. Gareth Keenan keenan.gareth@gmail.com
Total PII Count Long The total number of PII elements detected in a document. 10
Uploaded End Date/Time Date The date/time when the document upload is completed in Canopy. 2020-01-30 00:00:00.000
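
The prioritized list under Master Created Date/Time amounts to a "first populated value wins" fallback. The Python sketch below illustrates that logic only; it is not Canopy's implementation. It assumes the metadata arrives as a flat dictionary keyed by the names in the list, that timestamps follow the 2019-12-13T19:53:10Z format used in the examples, and that the "all available" email date fields are the three eml keys named in this table. The function name, the EML_DATE_KEYS tuple, and the "parent" source label are hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, Optional, Tuple

# Email date keys named in this document. The real "all available" set may be
# larger; this tuple is an assumption made for illustration.
EML_DATE_KEYS = (
    "meta.eml_CreationTime",
    "meta.eml_ClientSubmitTime",
    "meta.eml_LastModificationTime",
)


def parse_utc(value: str) -> datetime:
    """Parse a timestamp shaped like the table's examples (2019-12-13T19:53:10Z)."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def resolve_master_created(
    meta: Dict[str, str],
    parent_master_created: Optional[str] = None,
) -> Tuple[Optional[str], Optional[str]]:
    """Return (value, source) for the first populated entry in the priority list."""
    # 1. The email creation time wins whenever it is present.
    if meta.get("meta.eml_CreationTime"):
        return meta["meta.eml_CreationTime"], "meta.eml_CreationTime"

    # 2. Otherwise take the earliest of the remaining email date fields.
    eml_dates = {k: meta[k] for k in EML_DATE_KEYS if meta.get(k)}
    if eml_dates:
        earliest = min(eml_dates, key=lambda k: parse_utc(eml_dates[k]))
        return eml_dates[earliest], earliest

    # 3. Created date from the file's own metadata.
    if meta.get("meta.metadata_created_datetime"):
        return meta["meta.metadata_created_datetime"], "meta.metadata_created_datetime"

    # 4. Inherit from the parent file (the label "parent" is hypothetical).
    if parent_master_created:
        return parent_master_created, "parent"

    # 5 and 6. Archive-stored date, then the upload time as a last resort.
    for key in ("meta.archive_created_datetime", "meta.uploaded"):
        if meta.get(key):
            return meta[key], key

    return None, None


example = {
    "meta.eml_ClientSubmitTime": "2019-12-13T19:53:10Z",
    "meta.eml_LastModificationTime": "2019-12-14T08:00:00Z",
    "meta.uploaded": "2020-01-30T00:00:00Z",
}
print(resolve_master_created(example))
```

For the sample dictionary, the call returns the client submit time together with its key, because no creation time is present and the submit time is the earlier of the two email dates. Returning the source alongside the value mirrors how the Master Created Date/Time Source field records which field supplied the date.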