Document Upload

Intended audience

This document is intended for readers who want to understand how the Dithaka-Base and Dithaka-Tapestry libraries work together to upload documents.

Introduction

Dithaka-Base provides the functionality to upload documents to the server and to index their content for faster searching. Dithaka-Tapestry provides components and pages for selecting the files to upload, entering a description, and then uploading the files using the functionality in Dithaka-Base.

Implementation

The files are uploaded using the IUploadFile interface in Tapestry. For each uploaded file we create a Document object, which stores information about the file. The content of each uploaded file is also indexed: as the file is uploaded, PDFBox and Apache POI are used to extract the text it contains, and Apache Lucene is used to index that text. The index is then used for searching. The following collaboration diagram shows the main classes involved in uploading documents:
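
The flow described above (record information about each uploaded file, then index its text so it can be searched) can be illustrated with a minimal, dependency-free sketch. Everything in it is an assumption made for illustration: UploadIndexSketch, DocumentInfo, upload and search are hypothetical names, not the Dithaka-Base API, and the in-memory term map is only a toy stand-in for a Lucene index.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical stand-in for the upload-and-index flow; not the Dithaka-Base API. */
final class UploadIndexSketch {

    /** Hypothetical metadata holder; the real Document class may store different fields. */
    static final class DocumentInfo {
        final String fileName;
        final String description;

        DocumentInfo(String fileName, String description) {
            this.fileName = fileName;
            this.description = description;
        }
    }

    private final List<DocumentInfo> documents = new ArrayList<>();

    // term -> ids of the documents containing it (toy replacement for a Lucene index)
    private final Map<String, Set<Integer>> index = new HashMap<>();

    /** Stores information about one uploaded file and indexes its already-extracted text. */
    int upload(String fileName, String description, String extractedText) {
        int id = documents.size();
        documents.add(new DocumentInfo(fileName, description));
        // Split on runs of non-word characters (ASCII word characters only).
        for (String term : extractedText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!term.isEmpty()) {
                index.computeIfAbsent(term, k -> new TreeSet<>()).add(id);
            }
        }
        return id;
    }

    /** Returns the file names of documents whose text contains the term, in upload order. */
    List<String> search(String term) {
        Set<Integer> ids = index.getOrDefault(
                term.toLowerCase(Locale.ROOT), Collections.<Integer>emptySet());
        List<String> hits = new ArrayList<>();
        for (int id : ids) {
            hits.add(documents.get(id).fileName);
        }
        return hits;
    }
}
```

In this sketch, a search touches only the term map and never re-reads the uploaded files, which is the reason for building the index at upload time.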

The core classes:

  • Document: This class stores information about an uploaded file
  • PDFExtractor: This class extracts the text from PDF documents; the text is used to create an index of the content
  • WordExtractor: This class extracts the text from Word documents; the text is used to create an index of the content
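
One way the two extractor classes could be selected for an uploaded file is sketched below. This is an illustration under stated assumptions, not the actual design: the source does not say that the extractors share an interface or that they are chosen by file extension. TextExtractor, ExtractorRegistry, register and forFile are hypothetical names, and a real extractor would delegate to PDFBox or POI rather than to the stub lambdas used in the usage example.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical common contract; the real extractors may not share an interface. */
interface TextExtractor {
    String extractText(byte[] fileContent);
}

/** Hypothetical registry that picks an extractor from the file extension. */
final class ExtractorRegistry {

    private final Map<String, TextExtractor> byExtension = new HashMap<>();

    /** Associates an extension such as "pdf" or "doc" with an extractor. */
    void register(String extension, TextExtractor extractor) {
        byExtension.put(extension.toLowerCase(Locale.ROOT), extractor);
    }

    /** Returns the extractor for the file, or empty if the type is not supported. */
    Optional<TextExtractor> forFile(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return Optional.empty(); // no extension to decide on
        }
        String extension = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return Optional.ofNullable(byExtension.get(extension));
    }
}
```

A usage example with stub extractors in place of the real ones:

```java
ExtractorRegistry registry = new ExtractorRegistry();
registry.register("pdf", content -> "text extracted from a PDF file");
registry.register("doc", content -> "text extracted from a Word file");

// Files with an unregistered extension yield an empty Optional and are not indexed here.
registry.forFile("report.pdf")
        .map(extractor -> extractor.extractText(new byte[0]))
        .ifPresent(System.out::println);
```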

Known issues and future enhancements

Some future enhancements:

  • Coming soon...

Some known issues:

  • Coming soon...