Fields
This guide provides an overview of the Fields available in Canopy Processing. These fields are automatically extracted from documents and mapped during the processing stage. Users can view, search, filter, and report on these fields throughout the review workflow.
| Field Name | Type of Field | Description | Example Value |
|---|---|---|---|
| Alt Workflow Reviewer | Keyword | Names of reviewers who review documents in the Alt Workflow batch type. | Jane Doe; John Kim |
| Alt Workflow Reviewer Email | Keyword | The email addresses of the reviewers who review documents in the Alt Workflow batch type. | Gareth Keenan keenan.gareth@gmail.com |
| Alt Workflow Tags | Keyword | Tags created by users and applied to batches in the Alt Workflow batch type. | Large Document |
| Audio Duration | Long | The length of audio or video files in seconds. | 120 |
| Author | Keyword | The original author of the document or sender of the email/message, extracted from the file’s metadata. | Jane Doe; John Kim |
| Batch | Keyword | An individual batch name within a batch set. | prefix-1, TXT-1, A01-2 |
| Batch Set | Keyword | User-created batch set, available in the Batches module. | <user_created_name> |
| Batch Type | Keyword | The type of batch. Canopy supports “Review,” “QA,” and “Alt Workflow” batches. | Review; QA; Alt Workflow |
| BCC | Keyword | The names, when available, and email addresses of the Blind Carbon Copy recipients of an email message. | Gareth Keenan keenan.gareth@gmail.com |
| Classification | Keyword | The text label identifying the image type. | Social Security Cards |
| CC | Keyword | The names, when available, and email addresses of the Carbon Copy recipients of an email message. | Gareth Keenan keenan.gareth@gmail.com |
| Confirmed Entity Count | Long | The total number of confirmed entities in the document. | 10 |
| Custodian | Keyword | All custodians, de-duplicated and primary, associated with a document. | Jane Doe |
| Custom PII Tags | Keyword | Labels or Tags for each custom detection rule that returns a hit on the document. | US_SSN_VLC; US_Passport_LC |
| Email Client Submit Date/Time | Date | The timestamp recorded by the sender’s email client (e.g., Outlook, Gmail in a web browser, Apple Mail) at the exact moment the sender hits the “Send” button. | 2019-12-13T19:53:10Z |
| Email Conversation Index | Keyword | The thread identifier created by the email system. It is a hidden metadata field in an email, used especially in Microsoft Outlook and Exchange Server environments. | AQHc5fUAEuRWmZ2a2k6c7FyCkdK6R6kB |
| Email Created Date/Time | Date | The date/time at which an email was created by the user. | 2019-12-13T19:53:10Z |
| Email Message ID | Keyword | The message number created by an email application and extracted from the email’s metadata. | 1ee10ea6-d9c0-aab2-1940-f05f0deef8d8@cu.edu |
| Email Modified Date/Time | Date | The date/time an email was last modified. | 2019-12-13T19:53:10Z |
| Email Provider Submit Date/Time | Date | The date/time the email server sent the email. | 2019-12-13T19:53:10Z |
| Email Delivery Date/Time | Date | The timestamp that a recipient’s mail server records when it successfully accepts an email from the previous mail server in the delivery chain. | 2019-12-13T19:53:10Z |
| Email Report Date/Time | Date | The date/time that the recipient’s mail server reported the user likely opened the email. | 2019-12-13T19:53:10Z |
| Email Thread Index | Keyword | Email’s thread index, extracted from the email header. | AcvXMOs3E1WvsR6hBkapQSV6HVwCHQ== |
| Family ID | Keyword | The search ID of the first file in the file family: an email or a loose file (Word, PowerPoint, PDF, etc.). This file will never be a container file. | 2FG2G55FGF |
| File Created Date/Time | Date | The date/time the file was created. | 2019-12-13T19:53:10Z |
| File Modified Date/Time | Date | The date/time the file was last saved. | 2019-12-13T19:53:10Z |
| File Name | Text | The file name (file_name), or in the case of emails, filename.eml. | “Project_Update.pdf”; “Project_Update_from_Jane_Doe.eml” |
| File Size | Long | The size of the file. | 10.92 KB, 853 Bytes |
| File Type | Keyword | The text extension of the file. | .doc, .pdf |
| From | Keyword | The name, when available, and email address of the sender of an email message. | Gareth Keenan keenan.gareth@gmail.com |
| Has Attachment | Boolean | Indicates whether the email has an attachment. | True; False |
| ID | Keyword | Canopy’s unique Search ID associated with a document. | 2FG2G55FGF |
| Image Dimension | Long | The dimensions of the image in pixels (width x height). | 1000 x 1000 |
| Language | Keyword | The predominant language of the document. | English, French |
| Language Confidence (in %) | Long | The confidence level of the language detection, as a percentage. | 80 |
| MD5 Hash | Keyword | The MD5 hash value of the file. NOTE: The Canopy application calculates and uses the SHA256 hash, which is our recommended standard for data integrity. For compatibility with some client tools and processes, Canopy also provides MD5 and SHA1 hashes. WARNING: MD5 and SHA1 are cryptographically broken and should not be used for security-sensitive purposes. They are highly susceptible to collision vulnerabilities, meaning an attacker can create two entirely different files that produce the exact same hash. Relying on these hashes can expose you to significant security risks, including data tampering and impersonation. | 5d41402abc4b2a76b9719d911017c592 |
| Master Created Date/Time | Date | The Master Created Date/Time is derived from the other date fields collected from the document. It is populated by the first date present in this prioritized list: 1. meta.eml_CreationTime 2. earliest eml date/time field from all available 3. meta.metadata_created_datetime 4. parent file Master Created Date/Time 5. meta.archive_created_datetime (date stored for the file inside the archive) 6. meta.uploaded | 2019-12-13T19:53:10Z |
| Master Created Date/Time Source | Keyword | The name of the field used to populate the Master Created Date/Time. | Email Created Date/Time |
| Master Created Date/Time Source ID | Keyword | The document search ID associated with the Master Created Date/Time Source. | 2FG2G55FGF |
| Master Modified Date/Time | Date | The Master Modified Date/Time is derived from the other date fields collected from the document. It is populated by the first date present in this prioritized list: 1. meta.eml_ClientSubmitTime 2. meta.eml_LastModificationTime 3. oldest eml date/time field from all available 4. meta.metadata_modified_datetime 5. parent file Master Modified Date/Time 6. meta.archive_lastmodified_datetime (date stored for the file inside the archive) 7. meta.uploaded | 2019-12-13T19:53:10Z |
| Master Modified Date/Time Source | Keyword | The name of the field used to populate the Master Modified Date/Time. | Email Modified Date/Time |
| Master Modified Date/Time Source ID | Keyword | The document search ID associated with the Master Modified Date/Time Source Field. | 2FG2G55FGF |
| Name | Text | The file name (file_name), or in the case of emails, the email subject (subject). | “Team_Meeting_Report.pdf” |
| Page Count | Long | The number of pages contained within the document. | 10 |
| Parent ID | Keyword | The search ID of the file from which a file was extracted. The extracted file can be an attachment, an embedded file, or a file inside a container file. | 2FG2G55FGF |
| PII Tags | Keyword | A list of PII element types detected in the file. | Name; Phone Number; SSN |
| Preserved Created Date/Time | Date | The file created date/time as recorded by the file system and preserved within the Zip or archive container. | 2019-12-13T19:53:10Z |
| Preserved Modified Date/Time | Date | The file last modified date/time as recorded by the file system and preserved within the Zip or archive container. | 2019-12-13T19:53:10Z |
| Processing Status | Keyword | The final processing status of the document, as shown in the Review module. | Done; Extraction Incomplete |
| QA Change Reason | Keyword | The applied change reason tagged by QA. | |
| QA Reviewer | Keyword | Names of reviewers who review documents in the QA batch type. | Jane Doe |
| QA Reviewer Email | Keyword | The email addresses of the reviewers who review documents in the QA batch type. | Gareth Keenan keenan.gareth@gmail.com |
| QA Status | Keyword | The QA status of the document. The status is one of “QA Accepted,” “QA Pending Review,” or “QA Reject.” | QA Accepted, QA Pending Review, QA Reject |
| Recipient Count | Long | The number of recipients in an email. | 4 |
| Recipient Domain | Keyword | The email domain/domains of the email recipients. | gmail.com; school.edu; govagency.gov; organization.org |
| Review Status | Keyword | The review status of the document. The status is one of “Reviewed,” “Pending Review,” or “Not Batched.” | Reviewed, Pending Review, Not Batched |
| Reviewer | Keyword | Names of reviewers who review documents in the Review batch type. | Jane Doe |
| Reviewer Email | Keyword | The email addresses of the reviewers who review documents in the Review batch type. | Gareth Keenan keenan.gareth@gmail.com |
| Sender Domain | Keyword | The email domain of the email sender. | gmail.com; school.edu; govagency.gov; organization.org |
| SHA1 Hash | Keyword | The SHA1 hash value of the file. NOTE: The Canopy application calculates and uses the SHA256 hash, which is our recommended standard for data integrity. For compatibility with some client tools and processes, Canopy also provides MD5 and SHA1 hashes. WARNING: MD5 and SHA1 are cryptographically broken and should not be used for security-sensitive purposes. They are highly susceptible to collision vulnerabilities, meaning an attacker can create two entirely different files that produce the exact same hash. Relying on these hashes can expose you to significant security risks, including data tampering and impersonation. | aaf4c61ddcc5e8a2dabede0f3b482cd9aea9434d |
| SHA256 Hash | Keyword | The SHA256 hash value of the file. | c604a6840d44c89df5ff8b5a5c5e943be565735f4bbeb3ddb692ff58bbf6993c |
| Source Container | Keyword | The original source container of the file, as uploaded from the UI. | Master Demo.zip |
| Source Path | Keyword | The full directory path of the file within the container. “Source Path” does not include the file’s Name in the path. | Master Demo.zip/Master Demo/Demo Files/Long Thread/Threading.pst/Top of Outlook data file/Inbox |
| Subject | Keyword | The subject line of emails or documents, automatically extracted from their metadata. | “Finance Report” |
| Suggested Entity Count | Long | The total count of entities that Canopy’s system has automatically identified. | 10 |
| Tags | Keyword | The list of user-created tags applied to the file. | Sensitive; Public; Private |
| Text Length | Long | The length of the document’s text, in characters. | 989 |
| Text Source | Keyword | The method by which the text content that is ready for review was obtained from the file during or after Processing. | OCR Text, Transcription, Extraction |
| Title | Keyword | The descriptive name of the document, automatically extracted from the file’s metadata. | “Finance Report” |
| To | Keyword | The name, when available, and email address of the recipient/recipients of an email message. | Gareth Keenan keenan.gareth@gmail.com |
| Total PII Count | Long | The total number of PII elements detected in a document. | 10 |
| Uploaded End Date/Time | Date | The date/time when the document upload completed in Canopy. | 2020-01-30 00:00:00.000 |
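
The prioritized list in the Master Created Date/Time row describes a first-match fallback: each candidate is checked in order and the first populated one wins. The Python sketch below illustrates that selection logic only; it is not Canopy's implementation. It assumes the metadata arrives as a plain dictionary keyed without the `meta.` prefix, that every key starting with `eml_` holds an email date/time, and that timestamps use the `2019-12-13T19:53:10Z` form shown in the table. The function names `parse_utc` and `master_created` are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def parse_utc(value: str) -> datetime:
    """Parse a timestamp such as 2019-12-13T19:53:10Z into an aware datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def master_created(
    meta: dict, parent_master_created: Optional[str] = None
) -> Optional[Tuple[str, str]]:
    """Return (value, source) for the first populated candidate, or None."""
    # Assumption: any key starting with "eml_" is an email date/time field.
    eml_dates = [v for k, v in meta.items() if k.startswith("eml_") and v]
    earliest_eml = min(eml_dates, key=parse_utc) if eml_dates else None

    # Candidates in the priority order given in the table.
    candidates = [
        ("meta.eml_CreationTime", meta.get("eml_CreationTime")),
        ("earliest eml date/time field", earliest_eml),
        ("meta.metadata_created_datetime", meta.get("metadata_created_datetime")),
        ("parent file Master Created Date/Time", parent_master_created),
        ("meta.archive_created_datetime", meta.get("archive_created_datetime")),
        ("meta.uploaded", meta.get("uploaded")),
    ]
    for source, value in candidates:
        if value:
            return value, source
    return None


# A loose file with no email dates falls through to its own metadata.
loose_file = {
    "metadata_created_datetime": "2019-12-13T19:53:10Z",
    "uploaded": "2020-01-30T00:00:00Z",
}
print(master_created(loose_file))
# ('2019-12-13T19:53:10Z', 'meta.metadata_created_datetime')

# An email without eml_CreationTime uses its earliest available eml date.
email = {
    "eml_ClientSubmitTime": "2019-12-13T19:53:10Z",
    "eml_LastModificationTime": "2019-12-14T08:00:00Z",
}
print(master_created(email))
# ('2019-12-13T19:53:10Z', 'earliest eml date/time field')
```

Returning the source label alongside the value mirrors how the table pairs Master Created Date/Time with its Master Created Date/Time Source field. The same pattern would apply to Master Modified Date/Time with its own candidate list.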