Hardcopy Discovery – Unitization

Documents collected in a litigation case can include electronic documents as well as hardcopy documents. Most of the time, the hardcopy documents will be sent to a vendor to be scanned into TIFF format so they can easily be added to a document review database.

The hardcopy documents will most likely include staples, paper clips, binder clips and rubber bands in addition to folders, redwelds and binders. The vendor will ask a bunch of questions about how they should organize the documents during the scanning process.

One of the primary questions relates to “unitization,” which can be described as “defining where a document begins and ends.” Every new document equates to a new record in the database. The goal is to create a database that reflects how the hardcopy documents are organized. For instance, if a cover letter has several attachments, that relationship should be reflected in the database. Likewise, if a binder contains different documents separated by tabs, the database should show that these documents are related, yet separate.

There are 4 database fields that contain the pertinent unitization information. They are BegDoc, EndDoc, BegAttach and EndAttach. The BegAttach and EndAttach fields describe what we refer to as an “attachment range”. An attachment range is used to reflect when there are related documents.

Let's imagine that a one-page cover letter is numbered ABC000001 and that it has two attachments. The first attachment, two pages long, is numbered ABC000002 – ABC000003. The second attachment, one page long, is numbered ABC000004.

In the database, there will be 3 documents, each with a BegDoc and EndDoc as described above. In addition, for each of the 3 documents, the BegAttach and EndAttach fields will be populated with the first page of the first document (ABC000001) and the last page of the last attachment (ABC000004). As shown in the image below, the BegAttach and EndAttach fields make the relationship between the 3 documents easy to see. They imply that these 3 documents were somehow related to each other, either physically or logically.
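The cover letter example can also be written out as a small Python sketch. This is only an illustration, not how any particular review platform stores its data: the `Record` class, the `bates` helper and the `build_family` function are hypothetical names, and the six-digit number width is taken from the ABC000001 example above.

```python
from dataclasses import dataclass


@dataclass
class Record:
    """One database record, i.e. one unitized document (hypothetical structure)."""
    beg_doc: str
    end_doc: str
    beg_attach: str = ""
    end_attach: str = ""


def bates(prefix: str, number: int, width: int = 6) -> str:
    """Format a page number such as ABC000001 (width is an assumption)."""
    return f"{prefix}{number:0{width}d}"


def build_family(prefix: str, first_page: int, page_counts: list[int]) -> list[Record]:
    """Number each document in order, then stamp the shared attachment range."""
    if not page_counts:
        return []
    records = []
    page = first_page
    for count in page_counts:
        records.append(Record(bates(prefix, page), bates(prefix, page + count - 1)))
        page += count
    # Every record in the group gets the same BegAttach and EndAttach:
    # first page of the first document, last page of the last attachment.
    beg, end = records[0].beg_doc, records[-1].end_doc
    for record in records:
        record.beg_attach, record.end_attach = beg, end
    return records


# One-page cover letter, two-page attachment, one-page attachment.
family = build_family("ABC", 1, [1, 2, 1])
for r in family:
    print(r.beg_doc, r.end_doc, r.beg_attach, r.end_attach)
```

Each of the three printed rows ends with the same pair, ABC000001 and ABC000004, which is what ties the records together as one attachment range.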

We refer to physical unitization when we define the beginning and ending of a document based on how the hardcopy documents are organized via physical objects like staples, paper clips, binder clips, rubber bands, folders and binders. We refer to logical unitization when we define a document by reading through the documents and determining the document relationships based on dates, authors, references within the text, etc. Logical unitization is also referred to as LDD, or Logical Document Determination.

The vendor will charge an additional fee to perform logical unitization. The task is performed after the documents are scanned, so the turnaround time will be longer. Another option is to have a paralegal make decisions about unitization prior to sending the hardcopy documents to the vendor. Colored paper (slip-sheets) can be inserted between the hardcopies to show where a document break should occur. Using two different colored slip-sheets, we can tell the scanning vendor where a new document should begin (one color) and where a new attachment range should begin (green slip-sheet).
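To make the slip-sheet idea concrete, here is a rough Python sketch of how a stack of pages with two kinds of slip-sheets could be turned into documents and attachment ranges. It rests on several assumptions that are not from any vendor's actual process: the `DOC_BREAK` and `FAMILY_BREAK` markers and the `unitize` function are hypothetical, a green slip-sheet is treated as starting both a new attachment range and a new document, and the slip-sheets themselves are not kept as pages.

```python
DOC_BREAK = "DOC"        # hypothetical marker: slip-sheet that starts a new document
FAMILY_BREAK = "FAMILY"  # hypothetical marker: green slip-sheet, new attachment range


def unitize(stream):
    """Group pages into attachment ranges (families) of documents.

    Returns a list of families; each family is a list of documents;
    each document is a list of page labels.
    """
    families = []
    family = []
    doc = []

    def close_doc():
        nonlocal doc
        if doc:  # ignore back-to-back slip-sheets with no pages between them
            family.append(doc)
            doc = []

    def close_family():
        nonlocal family
        close_doc()
        if family:
            families.append(family)
            family = []

    for item in stream:
        if item == FAMILY_BREAK:
            close_family()
        elif item == DOC_BREAK:
            close_doc()
        else:
            doc.append(item)  # an ordinary scanned page
    close_family()  # flush whatever is left at the end of the stack
    return families


stack = [FAMILY_BREAK, "p1", DOC_BREAK, "p2", "p3", DOC_BREAK, "p4",
         FAMILY_BREAK, "p5"]
result = unitize(stack)
print(result)
```

With this stack, the first attachment range holds three documents (p1, then p2 and p3, then p4) and the second holds a single one-page document (p5).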

Hardcopy unitization can be a little tricky. Decisions are made on the front-end that may or may not align with how the documents are interpreted during a document review or during the discovery phase. For instance, some attorneys will prefer that multiple documents are scanned together and are reflected in one database record, because they consider the documents to be one thing. On the flip side, other attorneys may prefer that documents not be combined together because they want to separate out one document and treat it differently.

In my experience, it is best to err on the side of separating the documents so that they can easily be tagged in different ways within the database and easily produced separately. Keep in mind that it is difficult and time-consuming to separate documents after they have been loaded into a database.


    Amy is a legal industry educator, passionate about helping legal professionals succeed. She even quit her day job to devote more time!


    • mgolab

      Thanks Amy. In the Australian market we refer to Unitization as delimiting. In addition to being quite a challenge with hardcopy documents, it is also very prevalent when a client scans documents or complete folders and sends them to you as a jumbo sized PDF.

      • LitSuppGuru

        Matthew – I just love it when you share the comparisons between our countries. It is very interesting. Yup, large PDFs with a bunch of documents is done here too. I just did a production last week that had a bunch of large PDFs because people at the corporation were creating 500 page PDF files and e-mailing them to each other.

    • Keeping documents separate or combined should not be a matter of personal preference. There should be a standard that requires all documents to be separated into minimal logical entities. Why? To quickly and efficiently identify duplicates and near-duplicates. It becomes nearly impossible if documents are unitized according to somebody’s whim.

    • Heather Townsend

      I am definitely going to print out the picture/diagram as a reference.

      • LitSuppGuru

        Hey Heather — welcome back to DC land. I’m so glad you decided to set down some roots nearby. I look forward to seeing how your career progresses.
