
OCR Accuracy Tips (Get Better Text Recognition Results)

OCR is not magic: the quality of your results depends on scan quality, settings, and preparation. The tips below will help you get the most accurate text recognition from your scanned PDFs.

Apply these tips and try OmnisPDF's OCR Scanner (Pro).

Scan at the Right Resolution (DPI Matters)

Resolution is the single biggest factor in OCR accuracy. DPI (dots per inch) determines how much detail your scanner captures. Here is what to aim for:

  • 300 DPI: the standard. This is the recommended resolution for most text documents. It provides enough detail for OCR to recognize characters accurately without creating unnecessarily large files.
  • 400-600 DPI: for small text. If your document has footnotes, fine print, or text smaller than 10 points, increase the resolution. The extra detail helps OCR distinguish between similar characters like 'l' and '1', or 'O' and '0'.
  • Below 200 DPI: avoid this. Low-resolution scans produce blurry characters that OCR cannot reliably recognize. If you receive a low-resolution scan from someone else, there is limited improvement possible without rescanning.
  • Above 600 DPI: diminishing returns. Scanning above 600 DPI creates much larger files but does not significantly improve OCR accuracy for standard printed text. Save storage space and processing time by staying at 300-600 DPI.
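
If you only have an image file and do not know how it was scanned, you can estimate its resolution from the pixel width and the physical page width. The sketch below is a minimal illustration of the guidance above; the function names and the wording of the advice strings are hypothetical and are not part of any OmnisPDF tool.

```python
def effective_dpi(pixel_width: int, page_width_inches: float) -> float:
    """Return the resolution implied by an image's pixel width and the page's physical width."""
    if page_width_inches <= 0:
        raise ValueError("page width must be positive")
    return pixel_width / page_width_inches


def dpi_advice(dpi: float) -> str:
    """Map a resolution to the ranges described in the list above."""
    if dpi < 200:
        return "too low: rescan if possible"
    if dpi < 300:
        return "below the recommended 300 DPI"
    if dpi < 400:
        return "fine for standard text"
    if dpi <= 600:
        return "good for small text"
    return "diminishing returns: larger files, little accuracy gain"


# A US Letter page (8.5 inches wide) stored as an image 2550 pixels wide.
dpi = effective_dpi(2550, 8.5)
print(dpi, "->", dpi_advice(dpi))
```

A page of the same size that is only 1275 pixels wide works out to 150 DPI, which falls in the range to avoid.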

Optimize Lighting and Contrast

1. Use even, consistent lighting

Uneven lighting creates shadows across the page that confuse OCR. Flatbed scanners provide the best lighting. For phone scans, use natural daylight and position the document flat under even illumination — no desk lamps creating diagonal shadows.

2. Maximize text-to-background contrast

Black text on white paper gives the best OCR results. If your document has light gray text, a colored background, or a yellowed page, increase the contrast in your scanner settings. Higher contrast makes character edges sharper and easier to recognize.

3. Clean up phone scans first

Phone cameras introduce perspective distortion, shadows, and uneven exposure. Before running OCR, use OmnisPDF's Phone Scan Cleanup tool to automatically correct these issues. The cleaned-up version will produce significantly better OCR results.
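
To make the contrast advice above concrete, here is a minimal sketch of a linear contrast stretch, one simple way image software can increase contrast. It assumes 8-bit grayscale pixel values in a flat list; the function name is hypothetical, and scanner software or cleanup tools may use different, more sophisticated methods.

```python
def stretch_contrast(pixels: list[int]) -> list[int]:
    """Linearly rescale grayscale values so the darkest becomes 0 and the lightest 255."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return list(pixels)  # flat image: nothing to stretch
    return [round((p - lo) * 255 / (hi - lo)) for p in pixels]


# Gray text (90) on a dull background (180), with one mid-tone pixel (120).
print(stretch_contrast([90, 120, 180]))
```

After stretching, the text pixels sit at pure black and the background at pure white, which is the condition the tip above describes as ideal.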

Fix Page Orientation and Skew

OCR engines expect text to run in straight horizontal lines. When a page is skewed (slightly rotated) or upside down, accuracy drops dramatically. Here is how to fix common orientation problems:

  • Straighten skewed pages. Even a 2-3 degree skew can cause OCR errors. If your scan looks slightly tilted, use Rotate PDF to correct the orientation before running OCR.
  • Fix upside-down pages. If any pages in your PDF are rotated 180 degrees, OCR will either fail completely or produce gibberish. Rotate them right-side-up first.
  • Handle mixed orientations. Some documents mix portrait and landscape pages. Before processing, make sure each page is rotated so its text is upright and reads in its normal direction.
  • Use Phone Scan Cleanup for auto-correction. The Phone Scan Cleanup tool automatically detects and corrects skew in phone-captured documents, saving you the manual effort.
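
One way to see why a small skew matters is to work out how far a line of text drifts vertically across the page. The sketch below is plain trigonometry, not a description of how any particular OCR engine works; the function name and the example page dimensions are assumptions chosen for illustration.

```python
import math


def baseline_drift_px(skew_degrees: float, line_width_inches: float, dpi: int) -> float:
    """Vertical distance a text line rises or falls across its width on a skewed page."""
    line_width_px = line_width_inches * dpi
    return line_width_px * math.tan(math.radians(skew_degrees))


# A 6.5 inch line of text scanned at 300 DPI with a 2 degree skew.
drift = baseline_drift_px(2, 6.5, 300)
print(f"{drift:.1f} px")
```

For this example the drift is about 68 pixels. Text set at 12 points is roughly 50 pixels tall at 300 DPI (12/72 inch × 300), so the end of a line sits more than a full line height away from where a perfectly horizontal line would end.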

Select the Correct Language

Why Language Selection Matters

OCR engines use language-specific models that include character sets, dictionaries, and grammar rules. When you tell the OCR tool that your document is in English, it knows to look for the Latin alphabet and uses an English dictionary to resolve ambiguous characters. Setting the wrong language forces the engine to use the wrong character set, which can cause widespread errors.
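
As a toy illustration of how a dictionary can settle ambiguous characters, the sketch below tries lookalike substitutions and keeps the first variant found in a word list. Real OCR engines rely on far richer language models; `LOOKALIKES`, `resolve_with_dictionary`, and the tiny word sets are hypothetical and do not represent OmnisPDF's implementation.

```python
from itertools import product

# Characters that look alike in print, each mapped to the readings it could have.
LOOKALIKES = {"1": "1l", "l": "l1", "0": "0O", "O": "O0"}


def resolve_with_dictionary(raw: str, dictionary: set[str]) -> str:
    """Return the first lookalike variant of `raw` found in `dictionary`, else `raw` unchanged."""
    options = [LOOKALIKES.get(ch, ch) for ch in raw]
    # Try every combination of readings; fine for short words, too slow for long ones.
    for candidate in product(*options):
        word = "".join(candidate)
        if word.lower() in dictionary:
            return word
    return raw


english = {"hello", "orange"}
print(resolve_with_dictionary("he1lo", english))   # matching word list
print(resolve_with_dictionary("he1lo", {"hola"}))  # word list for another language
```

With the matching word list, the ambiguous reading resolves to a real word. With a word list that does not contain the word, the raw reading comes back unchanged, which is roughly the situation the engine is in when the wrong language is selected.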

Multilingual Documents

If your document contains text in multiple languages (for example, an English document with Spanish names or French legal terms), select the primary language. The OCR engine will handle occasional words from other Latin-script languages reasonably well. For documents split roughly evenly between two languages, you may need to run OCR twice with different language settings.

Non-Latin Scripts

Documents in Chinese, Japanese, Korean, Arabic, Hindi, or other languages written in non-Latin scripts require selecting the specific language. The character recognition models for these scripts are completely different from Latin-script models, and using the wrong one will produce meaningless output.

Prepare Your Document Before Scanning

A few minutes of preparation before scanning can save you from hours of manual correction after OCR. Here are the highest-impact steps:

  • Flatten the page. Wrinkles, folds, and curled edges create shadows and distortion. Place the document flat and use a book or glass to hold it down if needed.
  • Clean the scanner glass. Dust, smudges, and fingerprints on the scanner glass appear as noise in the scan and can be mistaken for characters or punctuation by the OCR engine.
  • Use the best copy available. If you have access to multiple copies of a document (original, photocopy, fax), always scan the one with the sharpest, darkest text.
  • Remove staples and paper clips. These create shadows and can cause the page to sit unevenly on the scanner, producing skewed scans.
  • Consider the output format. If you need to extract data into a spreadsheet after OCR, use PDF to Excel. For editable text, use PDF to Word. For raw text, use PDF to TXT.

Ready to Get Accurate OCR Results?

Apply these tips and upload your scanned PDF to OmnisPDF's OCR Scanner for the best possible text recognition.

Frequently Asked Questions

What resolution should I scan at for OCR?

Scan at 300 DPI for standard text documents. For documents with small fonts (below 10pt), scan at 400-600 DPI. Scanning below 200 DPI will produce noticeably worse OCR results.

Does color vs. grayscale affect OCR accuracy?

For text-only documents, grayscale or black-and-white scans often produce better OCR results because there is more contrast between the text and background. Color scans are better when the document has colored text or colored backgrounds that affect readability.
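
For readers curious what converting a color scan to grayscale or black-and-white involves, here is a minimal sketch. It uses the widely used ITU-R BT.601 luma weights for the grayscale step; the fixed threshold of 128 is an arbitrary assumption for illustration (real tools usually choose thresholds adaptively), and the function names are hypothetical.

```python
def to_grayscale(rgb: tuple[int, int, int]) -> int:
    """Convert one 8-bit RGB pixel to a gray level using BT.601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)


def binarize(gray_pixels: list[int], threshold: int = 128) -> list[int]:
    """Turn gray levels into pure black (0) or pure white (255)."""
    return [0 if p < threshold else 255 for p in gray_pixels]


# Blue ink on a yellowed page: two example pixel colors.
ink, paper = (30, 40, 160), (240, 230, 180)
gray = [to_grayscale(ink), to_grayscale(paper)]
print(gray, "->", binarize(gray))
```

In this example the ink and paper end up far apart in gray level, so thresholding separates them cleanly. When colored text and a colored background land on similar gray levels, that separation is lost, which is the case where a color scan is the better choice.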

Why is my OCR output full of errors?

Common causes include low scan resolution (below 200 DPI), skewed or rotated pages, poor lighting causing shadows, low contrast between text and background, or selecting the wrong language in the OCR settings. Fix these issues and re-run OCR for better results.

Can I improve OCR results on a document I already scanned?

Yes. You can improve an existing scan by adjusting contrast, straightening skewed pages, and removing noise using image editing software or OmnisPDF's Phone Scan Cleanup tool. Then re-run OCR on the improved version.

Does the font type affect OCR accuracy?

Yes. Standard fonts like Arial, Times New Roman, and Calibri produce the highest OCR accuracy. Decorative, script, or very thin fonts are harder to recognize. Handwritten text is the most challenging — see our guide on OCR and handwriting.

How accurate is modern OCR?

On clean, high-resolution scans with standard printed text, modern OCR achieves 95-99% character accuracy. This means on a page of 2,000 characters, you might see 20-100 that need correction. Accuracy drops with poor scan quality, unusual fonts, or complex layouts.
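
The arithmetic behind those figures can be checked with a few lines of code. This is only a worked calculation of the numbers above; the function name is hypothetical.

```python
def expected_corrections(char_count: int, accuracy_percent: float) -> int:
    """Estimate how many characters need fixing at a given character accuracy."""
    if not 0 <= accuracy_percent <= 100:
        raise ValueError("accuracy must be between 0 and 100")
    return round(char_count * (100 - accuracy_percent) / 100)


for accuracy in (95, 99):
    print(accuracy, "% ->", expected_corrections(2000, accuracy), "corrections")
```

At 95% accuracy a 2,000-character page needs about 100 corrections, and at 99% about 20, which matches the range given above.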