PHP Apache Tika is an open source PHP OCR library that lets developers detect and extract metadata, HTML, and structured text content from PDF, DOCX, images (JPEG, PNG), and other documents.