
iContentCollector inspects file shares, SharePoint, Documentum and many other sources to collect and analyze content, identify and remove duplicates, and recursively extract compound content types such as PST, MSG and embedded files. It can optionally exclude and filter content based on different criteria. Together, these capabilities reduce the dependency on manual labor and lower overall overhead costs.
iContentCollector can perform the following:
- Deduplication – SHA-1 hashes are calculated from the file itself. Files with matching hash values are treated as exact duplicates and can be deleted and/or excluded from a collection.
- Logging – Every file inspected has a log entry. Each entry records key content statistics: the original source path, the hash, whether the file is an excluded file, a filtered type or a duplicate, the file size, the creation, modified and last-accessed dates, the file owner and the file extension.
- Extraction – Compressed files, PSTs, MSGs, and embedded files are extracted during post-collection processing. Extraction is recursive, because an extracted file may itself contain embedded files that need further extraction.
- Exclusion – Content filters can be preloaded. They may be general, such as “exclude all drawing files”, or specific, as in a delta collection: “exclude files that were previously collected”.
- Filtering – The ability to set file types, subfolders, date ranges and file sizes to include/exclude.
- Progress reporting – Constant status indications while the collection is running.
- Security – The collector discovers files using either a configured user account or the currently logged-on user.
- Encryption – All files are collected, compressed and encrypted.
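
To make the deduplication, filtering and logging steps above more concrete, here is a minimal Python sketch. It is not iContentCollector's implementation or API: the function names `sha1_of_file` and `collect`, the record fields, and the choice to log filtered files without hashing them are all assumptions made for illustration.

```python
import hashlib
import os
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def collect(root, excluded_extensions=frozenset()):
    """Walk `root` and return one log record per inspected file.

    Hypothetical record layout: path, hash, status, size, extension.
    Status is "filtered", "duplicate" or "collected".
    """
    seen_hashes = set()
    records = []
    for folder, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subfolders in a predictable order
        for name in sorted(filenames):
            path = Path(folder) / name
            extension = path.suffix.lower()
            record = {
                "path": str(path),
                "hash": None,
                "size": path.stat().st_size,
                "extension": extension,
            }
            if extension in excluded_extensions:
                # Assumption: filtered files are logged but not hashed.
                record["status"] = "filtered"
            else:
                record["hash"] = sha1_of_file(path)
                if record["hash"] in seen_hashes:
                    record["status"] = "duplicate"
                else:
                    seen_hashes.add(record["hash"])
                    record["status"] = "collected"
            records.append(record)
    return records
```

Calling `collect("/some/share", excluded_extensions={".dwg"})` on a hypothetical folder would return one record per file, marking the first file with a given hash as collected and any later file with the same hash as a duplicate.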