Web Extractor

Formats

The Web Extractor fits into your business process by offering several flexible output options. Whether you are pushing data into an index or writing it to file, our extractor delivers structured data in the format of your choice.

These output options can be used individually or simultaneously, so the same extracted data can serve more than one purpose.

 

Indexing

Pushing data directly into an index is as easy as choosing your index type and entering both the select and update URLs. Several indexing controls, such as versioning and optimization, are built into the extractor to keep your index clean and healthy.

Currently our supported indexes are:

  • MarkLogic
  • Lucene/SOLR
  • Autonomy IDOL
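
For readers curious what a push to an update URL can look like, here is a minimal sketch in Python for a Lucene/SOLR-style index. It is an illustration under assumptions, not the extractor's own implementation: the function names, the field names, and the endpoint passed in as update_url are all hypothetical. Only the XML <add> message layout follows the standard Solr update format.

```python
import urllib.request
import xml.etree.ElementTree as ET


def build_add_message(documents):
    """Build a Solr-style XML <add> message from a list of field dicts."""
    add = ET.Element("add")
    for fields in documents:
        doc = ET.SubElement(add, "doc")
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = str(value)
    return ET.tostring(add, encoding="unicode")


def push_to_index(update_url, documents):
    """POST the message to update_url (a hypothetical endpoint you supply)."""
    body = build_add_message(documents).encode("utf-8")
    request = urllib.request.Request(
        update_url,
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Building the message separately from sending it keeps the XML easy to inspect or write to file before anything reaches the index.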
 

Writing to File

Writing the extracted data to file lets you feed it into your own or a third-party solution for processing. The number of documents written per file is configurable, so you can change this batch size to suit your needs.

One of the biggest issues with writing files to disk is storage. We address this often-forgotten issue with an option to delete old files. This simple option ensures that only the most recent data is kept on file and puts to rest any concerns about an ever-growing folder.
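
The batch size and old-file cleanup described above can be pictured with a short Python sketch. This is an assumption-laden illustration rather than the extractor's actual code: the write_batches function, its parameters, and the batch_NNNN.csv naming scheme are hypothetical, and CSV is used only as one of the supported file types.

```python
import csv
from pathlib import Path


def write_batches(records, out_dir, batch_size=100, delete_old=True):
    """Write records to CSV files holding at most batch_size rows each."""
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)

    if delete_old:
        # Remove files from a previous run so only the latest data remains.
        for old_file in out_path.glob("batch_*.csv"):
            old_file.unlink()

    written = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        target = out_path / f"batch_{start // batch_size + 1:04d}.csv"
        with target.open("w", newline="", encoding="utf-8") as handle:
            # Column names are taken from the first record in the batch.
            writer = csv.DictWriter(handle, fieldnames=list(batch[0].keys()))
            writer.writeheader()
            writer.writerows(batch)
        written.append(target)
    return written
```

With five records and a batch size of two, this sketch produces three files; a later run with delete_old enabled clears them before writing the new set.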

Currently our supported file types are:

  • TXT
  • CSV
  • Excel
  • XML (standard)
  • XML (custom)
  • XML for MarkLogic
  • XML for Lucene/SOLR
  • XML for Autonomy IDOL

Creating new, customer-specific file formats is just one of the many services we offer. If you would like more details on which output option is best for your system, contact us directly or fill out the contact form, and we will get back to you soon.
