Pattern Recognition and Machine Learning

Despite the large number of papers written on pattern recognition and machine learning, and numerous experiments purporting to show that one method is superior to another, the choice and application of these algorithms remains a black art. The goal of our projects in pattern recognition is to bring sound mathematical and scientific principles to bear on evaluating and comparing pattern recognition and machine learning algorithms, on characterizing data sets, and on creating tools that automate the selection, deployment, and testing of pattern recognition methods.

OCR, Information Retrieval, Digital Libraries

Document analysis deals with the visual and geometric analysis of document images. The goal is to recover textual content, geometric structure, and logical structure. With the recent resurgence of interest in digital libraries and large-scale scanning operations by organizations like Google, Microsoft, and the Internet Archive, document analysis has again become an important real-world problem. We are addressing document analysis at all levels: camera-based document and book capture, OCR and handwriting recognition, document retrieval, and document enhancement. In addition to its practical applications, document analysis is also an important test case for more general computer vision and machine learning algorithms, because large amounts of correctly ground-truthed data are available.
OCR and Layout Analysis: OCRopus project; OCRopus demo; page layout analysis demo
Camera-Based Document Capture: OSCAR camera-based document capture demo; document dewarping demo
Content Analysis and Information Extraction: appearance-based document retrieval demo; bibliographic reference recognition demo
Computing for the Humanities: historical document analysis/comparison demo

Additional demos are listed on the IPeT Demo Page. For a general overview, please see The OCRopus Open Source OCR System.

Document Image Security and Document Forensics

Paper-based documents are widely used for identification, authentication, and legal purposes. Forgery of such documents is a major component of insurance fraud, immigration fraud, tax evasion, and other white-collar crime. Although optical security measures like holograms and special paper are partial solutions in areas such as currencies and passports, they are expensive, and they do not apply when the creation of the document is not under the control of the organization that needs to verify it. We are developing techniques that allow the authenticity of ordinary paper documents to be verified using optical techniques.
Image and Video Content Analysis

Large amounts of images and videos are captured in numerous contexts: consumer digital imaging, surveillance, industrial inspection, satellite imagery, astronomy, and many other areas. We are applying image processing, pattern recognition, and machine learning techniques to problems such as the detection of anomalous behaviors, defect detection in industrial inspection, quantitative analysis of large amounts of astronomical image data, and media asset management.

Intelligent Network Security

Today, network security relies largely on systems techniques like
secure protocols and rule/pattern-based methods.