Internal Extractors


Now I have the data just the way I want it. It still has to get into another system that I can work with.

Loading is the important last step in ETL.


There are two aspects of Loading. The first is where the data should be sent. The second is the format in which it should be delivered.

As for sending the data, it can either be pushed into the target system or written to disk for pickup. The Extractors offer both options and can even do both in the same job. Doing both is useful for testing, and it gives you a backup of extracted data that might not be available later or would be too costly to crawl again.
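As an illustration of delivering one job's output both ways, here is a minimal Python sketch. It is not the Extractors' actual implementation: the names `deliver`, `make_push_sink` and `make_disk_sink` are hypothetical, the push is simulated by appending to a list, and the disk copy is a simple tab-separated text file.

```python
import tempfile
from pathlib import Path
from typing import Callable, Iterable

Record = dict[str, str]
Sink = Callable[[list[Record]], None]


def make_disk_sink(path: Path) -> Sink:
    """Return a sink that writes one tab-separated line per record for later pickup."""
    def sink(records: list[Record]) -> None:
        with path.open("w", encoding="utf-8") as fh:
            for record in records:
                fh.write("\t".join(record.values()) + "\n")
    return sink


def make_push_sink(target: list[Record]) -> Sink:
    """Return a stand-in for a push to a remote system (here: append to a list)."""
    def sink(records: list[Record]) -> None:
        target.extend(records)
    return sink


def deliver(records: Iterable[Record], sinks: list[Sink]) -> None:
    """Send the same batch of records to every configured sink."""
    batch = list(records)
    for sink in sinks:
        sink(batch)


# One job, two destinations: a (simulated) push and a backup file on disk.
received: list[Record] = []
backup_path = Path(tempfile.mkdtemp()) / "backup.txt"
deliver(
    [{"id": "1", "title": "First page"}, {"id": "2", "title": "Second page"}],
    [make_push_sink(received), make_disk_sink(backup_path)],
)
```

Because every sink receives the same batch, the file on disk can later be replayed into a test system without crawling the source again.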


The Extractors can push the data into just about any system. Some of the more common ones are:

  • MarkLogic's XML Server
  • eXist, the open-source XML database
  • Databases of all kinds, e.g. Oracle, MSSQL, Postgres
  • Autonomy's IDOL server
  • Lucene / Solr Search

For these and other systems, the Extractor sends the data directly to the target system over its API for immediate ingestion. This can also be done over a secure connection, such as the encrypted xccs transmission protocol for MarkLogic.
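To give a rough idea of what a push over an API can look like, the Python sketch below builds an HTTP request for Solr's JSON update handler. It is only an example under stated assumptions: the host, the core name `documents` and the field names are hypothetical, the helper `build_solr_update` is not part of the Extractors, and the request is built but not sent.

```python
import json
import urllib.request


def build_solr_update(base_url: str, core: str, docs: list[dict]) -> urllib.request.Request:
    """Build (but do not send) a POST request for Solr's JSON update handler."""
    url = f"{base_url.rstrip('/')}/solr/{core}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_solr_update(
    "https://search.example.com:8983",  # hypothetical host; https gives an encrypted connection
    "documents",                        # hypothetical core name
    [{"id": "1", "title": "First page"}],
)
# Sending it would be a single call: urllib.request.urlopen(request)
```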

As for writing to disk, the data can be written in any specified format, including customer-defined formats. The most common formats are:

  • XML
  • CSV
  • XLS
  • TXT
  • IDX
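Choosing an output format by name can be sketched in Python as follows. This is an assumption-laden illustration, not the Extractors' own code: `serialize`, `to_xml`, `to_csv` and the `SERIALIZERS` table are hypothetical names, only XML and CSV are shown, and field names are assumed to be valid XML element names.

```python
import csv
import io
import xml.etree.ElementTree as ET

Record = dict[str, str]


def to_xml(records: list[Record]) -> str:
    """Serialize records as <records><record>...</record></records>."""
    root = ET.Element("records")
    for record in records:
        item = ET.SubElement(root, "record")
        for field, value in record.items():
            ET.SubElement(item, field).text = value
    return ET.tostring(root, encoding="unicode")


def to_csv(records: list[Record]) -> str:
    """Serialize records as CSV with a header row taken from the first record."""
    if not records:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0].keys()), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Format name -> serializer. A customer-defined format would be one more entry here.
SERIALIZERS = {"xml": to_xml, "csv": to_csv}


def serialize(records: list[Record], fmt: str) -> str:
    """Look up the serializer for the requested format and apply it."""
    try:
        serializer = SERIALIZERS[fmt.lower()]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return serializer(records)


sample = [{"id": "1", "title": "First page"}]
xml_text = serialize(sample, "xml")
csv_text = serialize(sample, "csv")
```

Keeping the formats in a lookup table means a new format only needs one function and one table entry, while the rest of the job stays unchanged.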

Should you have any questions, or want to know whether we support loading into a specific system or delivering a specific format, please contact us directly or just fill out the form on the right.

