Internal Extractors

Meta-Data and Enrichment

Gathering quality meta-data and enriching your documents with named entities, such as the companies, locations, and people mentioned in them, can surface value in your knowledge base that was previously untapped. Two examples are the richness this data brings to the search experience and the connections it can reveal in research.

We specialize in getting the most out of your data so that you can make the most of your internal knowledge base or extracted external content.

Named Entity Recognition

The first step is to gather the meta-data that is readily available, such as a filename, a modification date, or a title stored in document properties or HTML tags. Most systems gather this information with little effort.
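As a rough illustration of this first step, the sketch below reads the filename and modification date from the file system and the title from an HTML `<title>` tag, using only the Python standard library. The function names `html_title` and `basic_metadata` are hypothetical and chosen for this example; they are not part of our Extractors.

```python
import os
from datetime import datetime, timezone
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def html_title(html_text):
    """Return the stripped <title> text, or an empty string if none exists."""
    parser = TitleParser()
    parser.feed(html_text)
    return parser.title.strip()


def basic_metadata(path):
    """Gather filename, modification date (UTC), and HTML title for one file."""
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    with open(path, encoding="utf-8", errors="replace") as handle:
        content = handle.read()
    return {
        "filename": os.path.basename(path),
        "modified": modified.isoformat(),
        "title": html_title(content),
    }
```

Other formats, such as office documents or PDFs, keep their titles in document properties and would need a format-specific reader in place of `html_title`.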

The second step is to find the structure within the chaos. Our Extractors find, recognize, and extract patterns from the source system's structure. The data is gathered from all the different sources in their original formats and converted into a unified structure that normalizes it. For example, dates written as Jan. 1, 2010 or 01/01/10 all end up in the system in the same form.
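The date normalization mentioned above can be sketched in a few lines. This is a minimal example, not our Extractors' implementation: the function name `normalize_date`, the list of accepted formats, the choice of ISO 8601 as the unified form, and the reading of 01/01/10 as month/day/year are all assumptions made for illustration.

```python
from datetime import datetime

# Assumed input formats; a real source system may need more entries.
# "%m/%d/%y" assumes US-style month-first ordering.
KNOWN_FORMATS = [
    "%b. %d, %Y",  # Jan. 1, 2010
    "%b %d, %Y",   # Jan 1, 2010
    "%m/%d/%y",    # 01/01/10
    "%Y-%m-%d",    # 2010-01-01
]


def normalize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    text = raw.strip()
    for fmt in KNOWN_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue  # try the next candidate format
    return None


print(normalize_date("Jan. 1, 2010"))
print(normalize_date("01/01/10"))
```

Both calls print the same value, which is the point of the unified structure: once every source writes dates in one form, they can be sorted, filtered, and compared across systems. Month abbreviations are parsed according to the active locale, so the sketch assumes an English locale.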

The third step is to add meta-data to the document based on the flowing text of the article itself. This is done either by first defining what information is of interest, for example all company names from a particular industry, or by using linguistic technology that extracts and categorizes nouns based on rules. If these steps are neglected, navigation suffers, and with it the end-user experience. When they are done right, users can research information and discover new data efficiently, leaving more time for analysis and adding value.
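The first of the two approaches, defining the information of interest up front, can be sketched as a simple dictionary lookup over the text. The company names and the function name `tag_companies` below are hypothetical placeholders; a rule-based linguistic extractor would replace the fixed list with part-of-speech and pattern rules.

```python
import re

# Hypothetical list of names of interest, e.g. companies from one industry.
COMPANIES_OF_INTEREST = ["Acme Motors", "Globex", "Initech"]


def tag_companies(text, names=COMPANIES_OF_INTEREST):
    """Return the listed names that occur in the text as whole words."""
    found = []
    for name in names:
        # Word boundaries keep "Globex" from matching inside "Globexpress".
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, text, flags=re.IGNORECASE):
            found.append(name)
    return found


article = "Globex announced a partnership with acme motors on Monday."
print(tag_companies(article))
```

The returned names can be stored as meta-data on the document, which is what later drives faceted navigation and entity-based search.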
