Fast Tip Friday – Identify Unsearchable PDFs Within Folders


This fast tip demonstrates how to use a free tool called Count Anything to identify which PDF files within a set of folders are unsearchable (i.e., contain no extractable text).

Sean O'Shea shared this tip in his article entitled "Where Are My Unsearchable PDFs."


Source: Sean O'Shea

    I am very passionate about helping legal professionals succeed. I even quit my day job to devote more time to mentoring! I want to encourage you to subscribe and join the LitSuppGuru community. I share humorous, informative, and time-sensitive emails above and beyond what appears on this site.


    • James Bell

      That is a pretty neat tool! Thank you for sharing.

    • mgolab

      Good one. It has a command-line facility as well, which means that you could have a kind of automated process (for identifying and then OCRing) if you wanted to.

      • Hey Matthew – I noticed that too. I wouldn’t be surprised if Sean tried something like that. He’s a smart cookie.

      • Eliot

        This command-line idea has instant utility! I would be interested in seeing that work. Would it depend on a command prompt script? First, a script calls the Count Anything app to identify documents in a selection, then saves a text-delimited version to a specified location. Next we set up the Excel spreadsheet and identify the ones we want to OCR.

        Another command prompt script would identify the selection of filenames, then OCR the documents in the background. Once the filenames are in Excel, I could theoretically use VBA to do this as well. This solves a major issue I’m having with OCRing PDFs – the problem of OCRing each one individually.

        I’d like to discover a way to OCR all the PDFs in the background. Right now, my database identifies the documents that need OCR, but it occupies my machine’s memory/processes to do so.

        Thanks for sharing Sean’s find.
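Eliot's parsing step – flagging the zero-word-count files from a delimited export before handing them off for OCR – can be sketched in Python as a stand-in for the Excel/VBA step. The column names below are assumptions, since the actual Count Anything export layout isn't shown in this thread:

```python
import csv
import io

def files_needing_ocr(report_text):
    """Return filenames whose word count is zero in a delimited report.

    Assumes a CSV with 'File' and 'Words' columns -- a hypothetical
    layout, since the real export format isn't shown here.
    """
    reader = csv.DictReader(io.StringIO(report_text))
    return [row["File"] for row in reader if int(row["Words"]) == 0]

# Invented sample report for illustration
report = """File,Words
scan001.pdf,0
contract.pdf,1523
scan002.pdf,0
"""
print(files_needing_ocr(report))  # -> ['scan001.pdf', 'scan002.pdf']
```

The same zero-count filter could just as easily live in a VBA loop over an Excel column; the logic is a one-line comparison either way.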

        • mgolab

          What I was thinking was:
          1) you have a tranche of PDFs, and you make a file/folder listing
           2) manipulate the file/folder listing so that you call up Count Anything for each file – say, for example, you have 2GB or 1,000 files; I would sort this into, say, 4x 500MB chunks (your file listing would need to include each file's size, and you'd want an Excel formula or something similar to work out a cumulative size)
           3) initiate the 4 command-line scripts (i.e. batch files) simultaneously – with each one writing its output to a uniquely named file
           4) parse the output – I’m not a VBA wiz, so this would be a manual step; however, I think there is a way to parse a text file for a specific string where you want to know which files don’t contain text – i.e. the count is zero, or whatever the output is
          5) copy those [naughty] files that require OCRing to a ‘hot folder’ or somewhere
          6) if you have something like ABBYY or an equivalent which monitors a hot folder then it would automatically OCR the contents of the hot folder
           7) alternatively, use your OCR weapon of choice – I’d also look at load balancing by cumulative file size, as otherwise one machine completes its task quicker than the others
          8) go get a coffee and reflect on the bad old days of how miserable you were with OCRing manually

          We have a fleet of virtual machines that are our workers to do things like this, not yet optimised for load balancing and farming out jobs, but still pretty good as we get to save significant time by doing lots of things in parallel.
          Good luck.
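The chunking in steps 2–3 above – splitting the listing into roughly equal-size batches so the parallel instances finish at about the same time – can be sketched as a greedy pass in Python. The file names and sizes below are invented for illustration:

```python
def balance_by_size(files, n_chunks=4):
    """Split (name, size) pairs into n_chunks lists of roughly equal
    total size: sort largest-first, then always add to the lightest chunk."""
    chunks = [[] for _ in range(n_chunks)]
    totals = [0] * n_chunks
    for name, size in sorted(files, key=lambda f: f[1], reverse=True):
        i = totals.index(min(totals))  # index of the lightest chunk so far
        chunks[i].append(name)
        totals[i] += size
    return chunks, totals

# Hypothetical file listing: (filename, size in MB)
listing = [("a.pdf", 900), ("b.pdf", 500), ("c.pdf", 400),
           ("d.pdf", 120), ("e.pdf", 60), ("f.pdf", 20)]
chunks, totals = balance_by_size(listing, n_chunks=2)
print(totals)  # -> [1020, 980] -- the two batches end up close in size
```

Each resulting chunk would then become the input list for one batch file (or one worker VM), so no single instance is stuck with all the large files.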

    • Pingback: Finding the Right Resources: Terminology, Tips, and Tricks of the Trade - The Chronicle of eDiscovery