MetaCarta Geographic Data Modules (GDMs) make up the core of MetaCarta products and hosted solutions. A GDM is a knowledge base used to identify and disambiguate geographic references, assign latitude/longitude coordinates, and compute confidence scores and relevance rankings. Each MetaCarta GDM contains linguistic statistics, gazetteer data, and natural language processing (NLP) logic.
MetaCarta currently provides a base GDM containing more than 10 million place name variations. In addition, MetaCarta provides several GDMs for language or industry-specific use.
With GDMs, MetaCarta has solved the problem of geographic entity resolution. Entity resolution takes a big step beyond entity extraction by providing an interpretation of the author's intended meaning. For example, within a document, instead of simply saying "this string is a name of a place," entity resolution would say "the author means this specific place" as identified by, say, an additional geographic reference within the same document.
GDM Components
1) Linguistic Statistics
The technical discipline of NLP is an area of computer science dealing with languages developed by humans for communication with other humans. Natural languages tend to be ambiguous, especially compared to machine languages. Geographic meaning resolution is a specific subfield of NLP.
Every MetaCarta GDM contains NLP logic which is used to:
- Recognize the jargon and data types that represent geographic entities; this includes the handling of name variants and contextual rules
- Disambiguate names and establish GeoConfidence
- Establish GeoRelevance
MetaCarta's NLP model relies on linguistic statistics, which are measurements generated from manually tagged documents. MetaCarta's team of manual taggers establishes "ground truth" by annotating collections of documents in many genres and languages with geographic metadata. These carefully checked ground truth documents are used to generate linguistic statistics that form the core of MetaCarta's GDM-based products.
Linguistic statistics allow MetaCarta solutions to go beyond simple entity extraction and move into entity resolution, also known as "disambiguation." People have named a huge number of places on Earth, and even on other planets.
Conservative estimates indicate that hundreds of millions of places have colloquial and formal names. While some of these places are widely known, the vast majority are referenced less often, because fewer people know them. This type of pattern is known as a "long-tailed distribution." In statistics, a long-tailed distribution describes any process where a large number of seemingly rare events occur. To see this property of geographic references in text, one can plot a curve with the number of mentions on the Y-axis and the number of locations with that many references on the X-axis: taken together, the less frequently mentioned places account for a huge number of references. Resolving geographic meanings from the long tail of geography requires large amounts of geo-linguistic data. With so many ways of referring to places, the geographic analogue of the translation table used in statistical machine translation must capture a wide spectrum of contexts.
Instead of aligning a translated text with the original, the statistics relevant to geographic resolution come from counting the co-occurrence of manually identified location references with linguistic and syntactic clues. That is, one takes a document that contains references to places, has a human mark up the document with metadata indicating which substrings refer to particular locations, and then has a training system count how frequently various clues co-occur with these location references. By repeating this process with many manually tagged documents, the training system develops nuanced co-occurrence statistics that embody how real human authors refer to places.
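The counting step described above can be sketched in a few lines. This is a minimal illustration, not MetaCarta's training system: the tagged documents, the choice of "preceding word" as the only clue, and the function names `count_clues` and `location_rate` are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical training data: each token is (text, is_location),
# where is_location is the label a human tagger assigned.
tagged_docs = [
    [("flew", False), ("to", False), ("Madison", True), ("yesterday", False)],
    [("President", False), ("Madison", False), ("signed", False)],
    [("moved", False), ("to", False), ("Paris", True)],
]

def count_clues(docs):
    """Count how often each preceding word is followed by a location or a non-location."""
    loc, non_loc = Counter(), Counter()
    for doc in docs:
        for i in range(1, len(doc)):
            clue = doc[i - 1][0].lower()   # the clue is the word just before the token
            _, is_location = doc[i]
            (loc if is_location else non_loc)[clue] += 1
    return loc, non_loc

def location_rate(clue, loc, non_loc):
    """Fraction of occurrences in which the clue preceded a tagged location (None if unseen)."""
    total = loc[clue] + non_loc[clue]
    return loc[clue] / total if total else None

loc_counts, non_loc_counts = count_clues(tagged_docs)
print(location_rate("to", loc_counts, non_loc_counts))         # clue seen before locations
print(location_rate("president", loc_counts, non_loc_counts))  # clue seen before a person name
```

With more tagged documents and richer clues (following words, capitalization, nearby place names), the same counting scheme would yield the kind of co-occurrence statistics the text describes.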
MetaCarta generates linguistic statistics from documents that manual taggers (humans) have marked up with geotags. These geo-linguistic statistics are the foundation of MetaCarta's products.
Human beings have a remarkable ability to derive useful information from ambiguous and under-specified references using real-world knowledge and experience. They know, for example, that a reference to a place called “Madison”, in the absence of a state, is more likely to refer to “Madison, Wisconsin” than the smaller “Madison, Iowa”; and they know that James Madison and the Madison family do not refer to places at all.
MetaCarta imitates this human process using a combination of heuristics and data mining. We begin with the gazetteer described below and the enclosure relationships between regions and points. A given name may refer to several points or regions, or to a non-geographic concept. To deal with this ambiguity, for every potential reference of a name to a location point, we estimate the confidence that the written name really refers to that specific point. The relevance of the document to each mentioned location must also be determined, in order to present the results that best satisfy the need for both correctness and relevance to a query.
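As a rough illustration of per-candidate confidence, the sketch below scores the "Madison" example. It is not MetaCarta's algorithm: the `Candidate` class, the prior weights, and the boost factor are hypothetical values chosen only to show how a prior can be combined with an enclosing-region clue found in the same document.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    label: str       # one possible interpretation of the name
    container: str   # enclosing region; empty for a non-geographic sense
    prior: float     # hypothetical prior weight, e.g. from tagged-corpus frequency

def score_candidates(candidates, doc_text, boost=10.0):
    """Return {label: confidence}, boosting candidates whose container appears in the text."""
    text = doc_text.lower()
    weights = {}
    for c in candidates:
        w = c.prior
        if c.container and c.container.lower() in text:
            w *= boost   # the document mentions the enclosing region
        weights[c.label] = w
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}

madison = [
    Candidate("Madison, Wisconsin", "Wisconsin", 0.6),
    Candidate("Madison, Iowa", "Iowa", 0.1),
    Candidate("non-geographic (person name)", "", 0.3),
]

no_context = score_candidates(madison, "The conference was held in Madison.")
iowa_context = score_candidates(madison, "A farm near Madison in eastern Iowa.")
print(max(no_context, key=no_context.get))
print(max(iowa_context, key=iowa_context.get))
```

Without other clues the larger city wins on its prior; once the enclosing state appears in the document, the smaller place becomes the most confident reading.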
2) Gazetteer
People have named hundreds of millions of locations. So far, civilization has only gathered a fraction of these names in digital collections called gazetteers. MetaCarta continuously gathers additional gazetteer data, because information about less commonly known locations tends to create additional insight for knowledge workers.
The MetaCarta gazetteer is a dictionary of geographic placenames and associated data about the placenames. Placenames can include any natural or manmade object that has a known location, such as continents, oceans, countries, states, provinces, regions, counties, cities, towns, landmarks, buildings, and roads. The MetaCarta gazetteer is one of the largest collections in the world.
The gazetteer within any MetaCarta GDM contains:
- Name, e.g. "New York"
- Name variants, e.g. "Big Apple"
- Latitude & Longitude, e.g. 40.71416855, -74.00639343
- Container information (e.g. county, state, country)
- Polygons (for map drawing)
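The fields listed above map naturally onto a simple record type. The following is a minimal sketch assuming a hypothetical `GazetteerEntry` class and `build_name_index` helper; the actual GDM storage format is not described in this document, and the container values shown are illustrative.

```python
from dataclasses import dataclass, field

@dataclass
class GazetteerEntry:
    name: str
    variants: list[str] = field(default_factory=list)
    lat: float = 0.0
    lon: float = 0.0
    containers: list[str] = field(default_factory=list)               # innermost to outermost
    polygon: list[tuple[float, float]] = field(default_factory=list)  # (lat, lon) vertices

def build_name_index(entries):
    """Map every lowercase name or variant to the entries it may refer to."""
    index = {}
    for entry in entries:
        for n in [entry.name, *entry.variants]:
            index.setdefault(n.lower(), []).append(entry)
    return index

new_york = GazetteerEntry(
    name="New York",
    variants=["Big Apple"],
    lat=40.71416855,
    lon=-74.00639343,
    containers=["New York", "United States"],  # illustrative state and country
)
index = build_name_index([new_york])
print(index["big apple"][0].name)
```

Because a name maps to a list of entries, the same index can hold several places that share a name, which is where disambiguation becomes necessary.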
MetaCarta takes full advantage of gazetteers like the NGA GEOnet Names Server (GNS) and the USGS Geographic Names Information System (GNIS). In addition, MetaCarta leverages other sources that include country gazetteers, lists of schools, hospitals, notable buildings, local landmarks, oil wells, platforms, fields, basins, government facilities, religious sites, and others.
Various Types of MetaCarta GDMs
1) MetaCarta Base GDM
The MetaCarta Base GDM is included with all MetaCarta products and hosted solutions and contains:
- More than 6 million unique geometric entities and 10 million name variations for:
  - Cities and towns
  - Administrative divisions (counties, states/provinces, countries)
  - Regions and continents
  - Natural physical features (mountains, bodies of water, canyons, etc.)
  - Certain man-made features (prominent buildings, roads, tunnels, etc.)
- Global coverage from sources including NGA GNS, USGS GNIS, 10 additional major sources and dozens of smaller sources
- Location information patterned references including:
  - Relative references (20 miles southeast of New Orleans)
  - Military Grid Reference System (MGRS), e.g. “36SWF2248402617” means 36° 10'5''N 33° 15'0''E
  - Universal Transverse Mercator (UTM), e.g. “357973N527260E ZONE 38” means 3° 14'19''N 45° 14'43''E
  - Decimal Degrees (including transcription variants)
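To make the idea of a patterned reference concrete, the sketch below handles the relative reference from the list. It is an assumption-laden illustration rather than the GDM's logic: the regular expression covers only "N miles <direction> of <place>", the anchor coordinates for New Orleans are approximate, and the offset uses a flat-earth approximation that is only reasonable over short distances.

```python
import math
import re

# Compass bearings in degrees clockwise from north (a small illustrative subset).
BEARINGS = {"north": 0, "northeast": 45, "east": 90, "southeast": 135,
            "south": 180, "southwest": 225, "west": 270, "northwest": 315}

# Longest direction names first so "southeast" is tried before "south".
_DIRECTIONS = "|".join(sorted(BEARINGS, key=len, reverse=True))
PATTERN = re.compile(
    r"(?P<dist>\d+(?:\.\d+)?)\s+miles\s+(?P<dir>" + _DIRECTIONS + r")\s+of\s+(?P<place>.+)",
    re.IGNORECASE,
)

def parse_relative(text):
    """Return (miles, direction, place) or None if the text is not a relative reference."""
    m = PATTERN.search(text)
    if not m:
        return None
    return float(m["dist"]), m["dir"].lower(), m["place"].strip()

def offset(lat, lon, miles, bearing_deg):
    """Approximate the destination point with a flat-earth (equirectangular) model."""
    km = miles * 1.609344
    theta = math.radians(bearing_deg)
    dlat = km * math.cos(theta) / 111.32                                  # km per degree of latitude
    dlon = km * math.sin(theta) / (111.32 * math.cos(math.radians(lat)))
    return lat + dlat, lon + dlon

# Hypothetical anchor lookup; coordinates are approximate.
ANCHORS = {"new orleans": (29.95, -90.07)}

miles, direction, place = parse_relative("20 miles southeast of New Orleans")
lat, lon = offset(*ANCHORS[place.lower()], miles, BEARINGS[direction])
print(round(lat, 3), round(lon, 3))
```

The resolved point lies south and east of the anchor, as expected. A production system would use a geodesic calculation and would need the anchor itself to be disambiguated first.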
2) MetaCarta IHS Global Oil and Gas GDM
For oil and gas users, MetaCarta in partnership with IHS has created the MetaCarta IHS Global Oil and Gas GDM containing industry-specific geographic locations. Leveraging the strength of the IHS global data coverage, the new GDM includes wells, blocks, fields, basins and other oil and gas geography.
The MetaCarta IHS Global Oil and Gas GDM layers on top of the MetaCarta Base GDM and adds:
- More than 2.5 million unique geometric entities and an additional 2.7 million name variations
- Global coverage from sources including IHS, US Minerals Management Service and several additional sources
- Location information patterned references including:
  - Lease Blocks
  - Fields
  - Platforms
  - Wells
  - Basins
3) MetaCarta U.S. Street Address GDM
The MetaCarta U.S. Street Address GDM is used to identify and locate United States street addresses. In addition to millions of actual street addresses, the MetaCarta U.S. Street Address GDM can interpolate between known addresses to determine the location of still more possible addresses.
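Interpolating between known addresses can be illustrated with simple linear interpolation along a street segment. This is a minimal sketch under the assumption that house numbers increase evenly between two known points; the function name `interpolate_address` and the sample coordinates are hypothetical, and the document does not say which interpolation method the GDM actually uses.

```python
def interpolate_address(number, low, high):
    """Estimate (lat, lon) for a house number between two known addresses.

    low and high are (house_number, lat, lon) tuples on the same street segment.
    """
    n0, lat0, lon0 = low
    n1, lat1, lon1 = high
    if n0 == n1:
        raise ValueError("known addresses must have different house numbers")
    if not min(n0, n1) <= number <= max(n0, n1):
        raise ValueError("house number lies outside the known range")
    t = (number - n0) / (n1 - n0)   # fraction of the way from low to high
    return lat0 + t * (lat1 - lat0), lon0 + t * (lon1 - lon0)

# Hypothetical known addresses at house numbers 100 and 200.
estimate = interpolate_address(150, (100, 40.000, -74.000), (200, 40.002, -74.001))
print(estimate)
```

Number 150 falls halfway between the two known points, so its estimated position is the midpoint of the segment.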
The MetaCarta U.S. Street Address GDM layers on top of the MetaCarta Base GDM and adds:
- U.S. Street Geocoding for more than 100 million mail delivery points
- Global coverage from various sources including GDT
- Location information patterned references including a wide range of U.S. street address variations
4) MetaCarta Spanish Language GDM
The MetaCarta Spanish Language GDM enables MetaCarta GeoTagger to automatically identify the language and character set of each document and assign latitude and longitude coordinates and country code tags to each place name in the document. This GDM can identify Spanish placenames within all-Spanish documents, or mixed language documents (e.g. an English document containing a native Spanish location reference).
Eptisa is the exclusive value added reseller (VAR) in Spain and Portugal to offer MetaCarta's solutions, including the Spanish GDM.
The MetaCarta Spanish Language GDM can process Spanish-specific linguistic patterns, prepositions, prefixes, and suffixes, and can handle script features such as:
- Tilde over 'n' (Ñ)
- Acute accent over vowels (á é í ó ú)
- Extra punctuation at beginning of questions (¿) and exclamations (¡)
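One reason these script features matter is that the same placename may be written with or without accents. The sketch below shows one possible way to match such variants using Python's standard `unicodedata` module; it is an illustrative assumption, not a description of how the Spanish GDM works. The hypothetical `fold_acute` function drops acute accents but keeps ñ distinct from n, since the two are different letters in Spanish.

```python
import unicodedata

ACUTE = "\u0301"  # combining acute accent

def fold_acute(name):
    """Lowercase a name and drop acute accents (á to a) while keeping ñ distinct from n."""
    decomposed = unicodedata.normalize("NFD", name.lower())  # split letters from their marks
    stripped = decomposed.replace(ACUTE, "")                 # remove only the acute accent
    return unicodedata.normalize("NFC", stripped)            # recompose remaining marks, e.g. ñ

print(fold_acute("Málaga") == fold_acute("Malaga"))    # accented and unaccented forms match
print(fold_acute("Logroño") == fold_acute("Logrono"))  # ñ and n stay distinct
```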
5) MetaCarta Arabic Language GDM
The MetaCarta Arabic Language GDM enables MetaCarta GeoTagger to automatically identify the language and character set of each document and assign latitude and longitude coordinates and country code tags to each place name in the document. This GDM can identify Arabic placenames within all-Arabic documents, or mixed language documents (e.g. an English document containing a native Arabic location reference).
The MetaCarta Arabic Language GDM can process Arabic numerical formats, anthroponyms (personal names), laqabs (nicknames or titles), and nisbas (adjectives).