Scan at the Right Resolution (DPI Matters)
Resolution is one of the biggest factors in OCR accuracy. DPI (dots per inch) determines how much detail your scanner captures. Here is what to aim for:
- 1. 300 DPI: the standard. This is the recommended resolution for most text documents. It provides enough detail for OCR to recognize characters accurately without creating unnecessarily large files.
- 2. 400-600 DPI: for small text. If your document has footnotes, fine print, or text smaller than 10 points, increase the resolution. The extra detail helps OCR distinguish between similar characters like 'l' and '1', or 'O' and '0'.
- 3. Below 200 DPI: avoid this. Low-resolution scans produce blurry characters that OCR cannot reliably recognize. If you receive a low-resolution scan from someone else, little can be done to improve it short of rescanning the original.
- 4. Above 600 DPI: diminishing returns. Scanning above 600 DPI creates much larger files but does not significantly improve OCR accuracy for standard printed text. Staying at 300-600 DPI saves storage space and processing time.
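The trade-off between detail and file size in the list above comes down to simple arithmetic. The following is a minimal sketch, not part of any scanner's software: the function names are hypothetical, the page is assumed to be US Letter (8.5 x 11 inches), and file size is estimated as uncompressed 8-bit grayscale.

```python
def scan_pixels(width_in: float, height_in: float, dpi: int) -> tuple[int, int]:
    """Pixel dimensions of a page scanned at the given DPI."""
    return round(width_in * dpi), round(height_in * dpi)


def uncompressed_megabytes(width_in: float, height_in: float, dpi: int,
                           bytes_per_pixel: int = 1) -> float:
    """Raw size in MB (10**6 bytes) before compression.

    One byte per pixel corresponds to 8-bit grayscale (an assumption).
    """
    width_px, height_px = scan_pixels(width_in, height_in, dpi)
    return width_px * height_px * bytes_per_pixel / 1_000_000


def glyph_height_px(point_size: float, dpi: int) -> float:
    """Nominal height of a font's em box in pixels (1 point = 1/72 inch)."""
    return point_size / 72 * dpi


def minimum_dpi(smallest_point_size: float) -> int:
    """Hypothetical helper encoding the guidance above.

    Text under 10 points gets at least 400 DPI; everything else 300 DPI.
    """
    return 400 if smallest_point_size < 10 else 300


if __name__ == "__main__":
    for dpi in (200, 300, 600, 1200):
        width_px, height_px = scan_pixels(8.5, 11, dpi)
        size_mb = uncompressed_megabytes(8.5, 11, dpi)
        em_px = glyph_height_px(10, dpi)
        print(f"{dpi} DPI: {width_px} x {height_px} px, "
              f"{size_mb:.1f} MB raw, 10 pt text is {em_px:.1f} px tall")
```

At 300 DPI a Letter page is 2550 x 3300 pixels, about 8.4 MB uncompressed. Doubling the DPI quadruples the pixel count, which is why files grow so quickly above 600 DPI while 10-point text already has plenty of pixels per character.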
Optimize Lighting and Contrast
Use even, consistent lighting
Uneven lighting creates shadows across the page that confuse OCR. Flatbed scanners provide the most even lighting. For phone scans, use natural daylight and lay the document flat under even illumination; avoid desk lamps, which cast diagonal shadows.
Maximize text-to-background contrast
Black text on white paper gives the best OCR results. If your document has light gray text, a colored background, or a yellowed page, increase the contrast in your scanner settings. Higher contrast makes character edges sharper and easier to recognize.
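If the scanner has no contrast setting, the same idea can be applied in software. The sketch below is one simple technique (linear contrast stretching), shown on a tiny grid of 8-bit grayscale values rather than a real image. The function name and sample values are illustrative assumptions, and a real pipeline would normally use an image library instead.

```python
def stretch_contrast(pixels: list[list[int]]) -> list[list[int]]:
    """Linearly rescale 8-bit grayscale values.

    The darkest pixel maps to 0 (black) and the lightest to 255 (white).
    """
    flat = [value for row in pixels for value in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Uniform image: there is no contrast to stretch, return a copy.
        return [row[:] for row in pixels]
    scale = 255 / (hi - lo)
    return [[round((value - lo) * scale) for value in row] for row in pixels]


if __name__ == "__main__":
    # Gray text (110) on a yellowed page (200): only 90 levels of separation.
    faded = [
        [200, 200, 110, 200],
        [200, 110, 110, 200],
    ]
    for row in stretch_contrast(faded):
        print(row)
```

After stretching, the text pixels sit at 0 and the background at 255, the full range available. A min/max stretch like this is sensitive to outliers (a single dust speck can set the darkest value), so treat it as a starting point only.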
Clean up phone scans first
Phone cameras introduce perspective distortion, shadows, and uneven exposure. Before running OCR, use OmnisPDF's Phone Scan Cleanup tool to automatically correct these issues. The cleaned-up version will produce significantly better OCR results.
Fix Page Orientation and Skew
OCR engines expect text to run in straight horizontal lines. When a page is skewed (slightly rotated) or upside down, accuracy drops dramatically. Here is how to fix common orientation problems:
- ✓ Straighten skewed pages. Even a 2-3 degree skew can cause OCR errors. If your scan looks slightly tilted, use Rotate PDF to correct the orientation before running OCR.
- ✓ Fix upside-down pages. If any pages in your PDF are rotated 180 degrees, OCR will either fail completely or produce gibberish. Rotate them right-side-up first.
- ✓ Handle mixed orientations. Some documents mix portrait and landscape pages. Make sure each page is oriented so the text is upright and reads in its normal direction (left to right, top to bottom for Latin scripts) before processing.
- ✓ Use Phone Scan Cleanup for auto-correction. The Phone Scan Cleanup tool automatically detects and corrects skew in phone-captured documents, saving you the manual effort.
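To see why a tilt of only 2-3 degrees matters, it helps to work out how far a line of text drifts vertically across the page. The sketch below is a back-of-the-envelope calculation under stated assumptions: a 6.5-inch text line, a 300 DPI scan, 10-point text, and line spacing of 1.2 times the font size. The function names are hypothetical.

```python
import math


def baseline_drift_px(line_width_in: float, skew_degrees: float, dpi: int) -> float:
    """Vertical distance a text baseline rises or falls across one line."""
    return line_width_in * dpi * math.tan(math.radians(skew_degrees))


def line_pitch_px(point_size: float, dpi: int, leading: float = 1.2) -> float:
    """Approximate baseline-to-baseline distance in pixels.

    The 1.2x leading factor is an assumed typical value.
    """
    return point_size / 72 * dpi * leading


if __name__ == "__main__":
    pitch = line_pitch_px(10, 300)
    for angle in (0.5, 2.0, 3.0):
        drift = baseline_drift_px(6.5, angle, 300)
        verdict = "crosses into the next line" if drift > pitch else "stays within one line"
        print(f"{angle} degrees: drift {drift:.0f} px vs pitch {pitch:.0f} px, {verdict}")
```

Under these assumptions a 2-degree skew shifts the baseline by roughly 68 pixels from one margin to the other, more than the roughly 50 pixels between lines. An engine that expects horizontal lines would therefore see the end of one line overlapping the start of the next.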
Select the Correct Language
Why Language Selection Matters
OCR engines use language-specific models that include character sets, dictionaries, and grammar rules. When you tell the OCR tool that your document is in English, it knows to look for the Latin alphabet and uses an English dictionary to resolve ambiguous characters. Setting the wrong language forces the engine to use the wrong dictionary and, if the script differs, the wrong character set, which can cause widespread errors.
Multilingual Documents
If your document contains text in multiple languages (for example, an English document with Spanish names or French legal terms), select the primary language. The OCR engine will handle occasional words from other Latin-script languages reasonably well. For documents that are roughly half in each language, you may need to run OCR twice with different language settings.
Non-Latin Scripts
Documents in Chinese, Japanese, Korean, Arabic, Hindi, or other non-Latin scripts require selecting the specific language. The character recognition models for these languages are completely different from Latin-based models, and using the wrong one will produce meaningless output.
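The language rules above can be written down as a small decision procedure. This is a sketch under assumptions, not a description of how any particular tool works: the language codes follow the naming used by the open-source Tesseract engine (the article does not name an engine), the function `plan_ocr_passes` is hypothetical, and the 40% cutoff for "roughly half" is an arbitrary illustrative choice.

```python
# Tesseract-style language codes (assumed engine; the article names none).
LANG_CODES = {
    "english": "eng",
    "spanish": "spa",
    "french": "fra",
    "chinese (simplified)": "chi_sim",
    "japanese": "jpn",
    "korean": "kor",
    "arabic": "ara",
    "hindi": "hin",
}


def plan_ocr_passes(shares: dict[str, float],
                    second_pass_threshold: float = 0.4) -> list[str]:
    """Return one language code per OCR pass.

    `shares` maps a language name to its estimated fraction of the document.
    The primary language always gets a pass. Any other language whose share
    reaches the threshold gets a pass of its own. Occasional foreign words
    below the threshold are left to the primary-language pass.
    """
    if not shares:
        raise ValueError("at least one language is required")
    ranked = sorted(shares.items(), key=lambda item: item[1], reverse=True)
    passes = [LANG_CODES[ranked[0][0]]]
    for name, share in ranked[1:]:
        if share >= second_pass_threshold:
            passes.append(LANG_CODES[name])
    return passes


if __name__ == "__main__":
    print(plan_ocr_passes({"english": 0.95, "spanish": 0.05}))
    print(plan_ocr_passes({"english": 0.5, "french": 0.5}))
    print(plan_ocr_passes({"japanese": 1.0}))
```

An English document with a few Spanish names gets a single English pass, an evenly split English and French document gets two passes, and a Japanese document gets its own script-specific model.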
Prepare Your Document Before Scanning
A few minutes of preparation before scanning can save you from hours of manual correction after OCR. Here are the highest-impact steps:
- ✓ Flatten the page. Wrinkles, folds, and curled edges create shadows and distortion. Place the document flat and use a book or glass to hold it down if needed.
- ✓ Clean the scanner glass. Dust, smudges, and fingerprints on the scanner glass appear as noise in the scan and can be mistaken for characters or punctuation by the OCR engine.
- ✓ Use the best copy available. If you have access to multiple copies of a document (original, photocopy, fax), always scan the one with the sharpest, darkest text.
- ✓ Remove staples and paper clips. These create shadows and can cause the page to sit unevenly on the scanner, producing skewed scans.
- ✓ Consider the output format. If you need to extract data into a spreadsheet after OCR, use PDF to Excel. For editable text, use PDF to Word. For raw text, use PDF to TXT.