This article explains how to run and use the Java version of the LEADTOOLS OCR demo program.

1. Download and Installation

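You can download the LEADTOOLS trial from the LEADTOOLS website. This article uses the "LEADTOOLS Full-Featured Trial". For installation instructions, see the trial installation guide.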

After installation, go to the LEADTOOLS installation root (LEADTOOLS 19) and open LEADTOOLS 19\Examples\Java\OcrDemo. This is the project directory.

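You also need the license archive you used when installing the trial. Extract the file with the .lic extension from it to a fixed location (preferably inside the project directory).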

2. Procedure

2.1 Open the project in Eclipse

After the project is opened, the workspace looks like this:

2.2 Configure the license

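Open the DemoUtilities.java file shown in the figure. Set LicenseFile to the path of the .lic file you just extracted, and set developerkey to the text content of the file with the .key extension in the license archive (you can view it by opening the file in Notepad).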
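If you would rather not copy the key by hand, the following sketch reads the .key file and prints the two values to paste into DemoUtilities.java. It uses only the Java standard library (Java 7 or later). The class LicenseKeyReader, its methods, and the paths license/LEADTOOLS.lic and license/LEADTOOLS.key are hypothetical placeholders, not part of LEADTOOLS; substitute the actual file names from your license archive.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a LEADTOOLS API): prints the values that
// DemoUtilities.java expects for LicenseFile and developerkey.
public class LicenseKeyReader {

    // Reads the whole .key file as text and strips surrounding whitespace and
    // line breaks, which are easy to copy by accident from a text editor.
    public static String readDeveloperKey(Path keyFile) throws IOException {
        String text = new String(Files.readAllBytes(keyFile), StandardCharsets.UTF_8);
        // Remove a UTF-8 byte-order mark, if present.
        if (text.startsWith("\uFEFF")) {
            text = text.substring(1);
        }
        return text.trim();
    }

    // Checks that the path exists, is a regular file, and has the .lic extension.
    public static boolean isLicenseFile(Path licFile) {
        return Files.isRegularFile(licFile)
                && licFile.getFileName().toString().toLowerCase().endsWith(".lic");
    }

    public static void main(String[] args) throws IOException {
        // Assumed locations inside the project directory; adjust to your own files.
        Path lic = Paths.get("license", "LEADTOOLS.lic");
        Path key = Paths.get("license", "LEADTOOLS.key");
        System.out.println("LicenseFile  = " + lic.toAbsolutePath()
                + " (found: " + isLicenseFile(lic) + ")");
        if (Files.exists(key)) {
            System.out.println("developerkey = " + readDeveloperKey(key));
        }
    }
}
```

Printing the absolute path of the .lic file also helps when the demo is launched from a working directory other than the project directory, since a relative LicenseFile value would then resolve elsewhere.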

2.3 Add the Chinese language pack (optional)

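To recognize Chinese, you need to add the Chinese recognition language pack. If you do not add it here, you can instead specify the language with a run argument.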

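Open OcrDemo.java, search for "startup", and add the statement highlighted in yellow in the figure right after that statement. To recognize other additional languages, add them the same way as needed.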

2.4 Run the program to recognize text

Text recognition requires run arguments. If you run the program without any arguments, it prints a prompt like the one below asking you to supply them.

If you only need to recognize an entire document or image, supplying -i, -o and -f is enough (input file path, output file path, output format).

For example: -i "/home/user/LEADTOOLS19/Images/leadtools.pdf" -o "/home/user/LEADTOOLS19/Images/leadtools.docx" -f DOCX

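To add the arguments in Eclipse, enter them in the Program arguments field on the Arguments tab of the Run Configuration, then click Run to generate the recognized output file.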
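The minimal argument line can also be assembled programmatically. The sketch below is a hypothetical helper, not part of the demo: it builds the -i/-o/-f list (with -o and -f optional, as in the usage message) and wraps any value containing a space in double quotes, mirroring the quoted paths in the example above. This matters for install paths such as "LEADTOOLS 19". The Windows paths in main are assumed examples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the LEADTOOLS demo): assembles the minimal
// OCR demo arguments and formats them as one line for the Program arguments box.
public class OcrDemoArgs {

    // Builds the argument list; output and format may be null to omit -o / -f.
    public static List<String> build(String input, String output, String format) {
        List<String> args = new ArrayList<>();
        args.add("-i");
        args.add(input);
        if (output != null) {
            args.add("-o");
            args.add(output);
        }
        if (format != null) {
            args.add("-f");
            args.add(format);
        }
        return args;
    }

    // Joins the arguments with spaces, quoting any value that contains a space
    // so that a path like "LEADTOOLS 19\..." stays a single argument.
    public static String toCommandLine(List<String> args) {
        StringBuilder sb = new StringBuilder();
        for (String a : args) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(a.contains(" ") ? "\"" + a + "\"" : a);
        }
        return sb.toString();
    }

    public static void main(String[] unused) {
        List<String> args = build("D:\\LEADTOOLS 19\\Images\\leadtools.pdf",
                "D:\\LEADTOOLS 19\\Images\\leadtools.docx", "DOCX");
        System.out.println(toCommandLine(args));
    }
}
```

Running main prints -i "D:\LEADTOOLS 19\Images\leadtools.pdf" -o "D:\LEADTOOLS 19\Images\leadtools.docx" -f DOCX, which can be pasted into the Program arguments field as-is.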

For other requirements, you can add further arguments based on the information in the prompt below.

LEADTOOLS libPath: D:\LEADTOOLS 19 - 副本\Bin\Java\..\CDLLVC10\x64
The demo recognizes input image and saves the recognition results into one of the supported document formats using the below arguments:
-i input_file_path [-o output_file_path] [-f document_format] [-fp first_page] [-lp last_page] [-l language_code] [-z left,top,width,height] [-d engine_directory] [-ca characters_attributes_XML_Path] [-pi processed_image_path]
For example:
-i "/home/user/LEADTOOLS19/Images/leadtools.pdf" -o "/home/user/LEADTOOLS19/Images/leadtools.docx" -f DOCX -fp 1 -lp 3 -l en -z 50,50,150,150 -d "/home/user/LEADTOOLS19/Bin/Common/OcrAdvantageRuntime" -ca "/home/user/LEADTOOLS19/Images/leadtools.xml" -pi "/home/user/LEADTOOLS19/Images/processedimage.tif"

-f document_format, if this argument is not used, the result will be printed out as text into the console:
PDF: PDF with no embedded fonts
PDF_EMBED: PDF with embedded fonts
PDFA: Adobe Portable Document Archive (PDFA) file.
PDF_IMAGE_OVER_TEXT: PDF with image over text and with no embedded fonts.
PDF_EMBED_IMAGE_OVER_TEXT: PDF with image over text and with embedded fonts.
PDFA_IMAGE_OVER_TEXT: Adobe Portable Document Archive (PDFA) file with image over text.
DOCX: Non framed Microsoft Word Document file.
DOCX_FRAMED: Framed Microsoft Word Document file.
RTF: Non framed Rich Text Format file.
RTF_FRAMED: Framed Rich Text Format file.
TEXT: Non Formatted UTF-8 Text file.
TEXT_FORMATTED: Formatted UTF-8 Text file.
SVG: Scalable Vector Graphics file.
ALTO_XML: Analyzed Layout and Text Object XML file.

-l language_code:
This optional argument must be one of the supported ISO 639-1 language codes. For example:
en: English
fr: French

-z left,top,width,height:
This optional argument sets zone bounds, which will be used only on the first page of the document. All other pages (if any) will be auto-zoned.

-ca:
This optional argument sets an XML path for the recognized data with characters attributes mode. The XML will be saved for the first page only.

-pi:
This optional argument sets a path for the returned processed image. The image will be saved for the first page only.
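To catch mistakes in the arguments before launching the demo, the rules printed in the usage message above can be encoded in a small check. This is a hypothetical sketch, not the parsing code in OcrDemo.java: the class OcrArgsCheck and its methods are illustrative, the format list is copied from the -f description, and the two-lowercase-letter test for -l is a simplification of "one of the supported ISO 639-1 language codes" (it cannot know which languages the engine actually has installed).

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical pre-flight check for the OCR demo arguments. It only encodes
// the rules from the usage message; it does not reproduce OcrDemo.java.
public class OcrArgsCheck {

    // Document formats listed in the usage message for -f.
    public static final Set<String> FORMATS = new HashSet<>(Arrays.asList(
            "PDF", "PDF_EMBED", "PDFA", "PDF_IMAGE_OVER_TEXT", "PDF_EMBED_IMAGE_OVER_TEXT",
            "PDFA_IMAGE_OVER_TEXT", "DOCX", "DOCX_FRAMED", "RTF", "RTF_FRAMED",
            "TEXT", "TEXT_FORMATTED", "SVG", "ALTO_XML"));

    // Splits "-flag value" pairs into a map and validates -i, -f and -l.
    public static Map<String, String> parse(String[] args) {
        Map<String, String> opts = new HashMap<>();
        for (int i = 0; i < args.length; i += 2) {
            if (!args[i].startsWith("-") || i + 1 >= args.length) {
                throw new IllegalArgumentException("Expected '-flag value' at: " + args[i]);
            }
            opts.put(args[i], args[i + 1]);
        }
        if (!opts.containsKey("-i")) {
            throw new IllegalArgumentException("-i input_file_path is required");
        }
        String f = opts.get("-f");
        if (f != null && !FORMATS.contains(f)) {
            throw new IllegalArgumentException("Unsupported document format: " + f);
        }
        // Simplified ISO 639-1 check: exactly two lowercase letters, e.g. "en", "fr".
        String l = opts.get("-l");
        if (l != null && !l.matches("[a-z]{2}")) {
            throw new IllegalArgumentException("Not an ISO 639-1 code: " + l);
        }
        return opts;
    }

    // Parses the -z value "left,top,width,height" into four integers.
    public static int[] parseZone(String z) {
        String[] parts = z.split(",");
        if (parts.length != 4) {
            throw new IllegalArgumentException("Zone must be left,top,width,height: " + z);
        }
        int[] zone = new int[4];
        for (int k = 0; k < 4; k++) {
            zone[k] = Integer.parseInt(parts[k].trim());
        }
        if (zone[2] <= 0 || zone[3] <= 0) {
            throw new IllegalArgumentException("Zone width and height must be positive: " + z);
        }
        return zone;
    }

    public static void main(String[] args) {
        Map<String, String> opts = parse(new String[] {
                "-i", "leadtools.pdf", "-f", "DOCX", "-l", "en", "-z", "50,50,150,150"});
        System.out.println(opts.get("-f") + " " + Arrays.toString(parseZone(opts.get("-z"))));
    }
}
```

For the sample arguments in main it prints DOCX [50, 50, 150, 150]. Keeping the check separate from the demo means the original OcrDemo.java does not need to be modified.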

3. Notes

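The Java version of the OCR program can only use the Advantage recognition engine, whose accuracy on Chinese text is relatively low. If the results are not satisfactory, consider the Professional recognition engine, which supports .NET development and recognizes Chinese content well.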

Product support forum:

If you have any questions while using the product, you can log in to the GrapeCity developer community to discuss them.

Learn more about LEADTOOLS product features:

http://leadtools.grapecity.com.cn/

Download the product to try its features:

http://leadtools.grapecity.com.cn/downloads/

About GrapeCity

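GrapeCity is a professional provider of software development technologies and low-code platforms. With the mission of "empowering developers", it aims to meet developers' needs in one place through software development tools and services such as spreadsheet controls, low-code and BI, helping enterprises improve development efficiency and innovate their development models. GrapeCity's development technology dates back to 1980, and for more than 40 years the company has focused on software development technology, building deep technical expertise and a rich product line. It is one of the companies in the industry able to empower both software development and low-code development. With strong products, an active user community and a rich partner ecosystem, it works closely with more than 3,000 partners, and its products are widely used in key industries such as information and software services, manufacturing, transportation, construction, finance, energy, education and public administration.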