Pankaj Mathur is the Vice President Sales for the infoUSA Licensing Division. He has been with infoUSA since 2005 and is currently managing the POI data licensing relationships with local search engines, navigation partners and LBS players. He works with the product team at infoUSA and with customers on new technologies and products.
Pankaj has an MBA in finance from the Carlson School of Management at the University of Minnesota. He brings tremendous experience to the table as he has worked in accounting, as a financial analyst and in project management roles in the past.
He attended the prestigious IIT (Indian Institute of Technology) for his Undergraduate Degree. He majored in Naval Architecture and supervised the construction of merchant ships in China, South Korea, and Thailand before deciding to pursue a Masters Degree in finance.
Eric Enge: Can you provide us with some background on yourself?
Pankaj Mathur: I graduated from the Indian Institute of Technology, and came to the US for an MBA (Finance) at the University of Minnesota in 2003. My undergrad major is in Naval Architecture, an engineering discipline focused on the design and construction of ships! I worked as a naval architect for a little over 10 years and supervised the design and construction of over 55 merchant ships. Right around 2001, I started thinking that I wanted to do more, and something different, for the rest of my professional life, and here I am. Although in hindsight, I wish someone had explained to me how cold Minnesota can be!
Eric Enge: And how long have you been with the company?
Pankaj Mathur: I have been with the company since July 2005, and my focus is on LBS clients, particularly local search engines, Internet Yellow Pages, navigation and telematics (the science of receiving, sending, and storing information via telecommunication devices).
Eric Enge: Right. So, the navigation clients mentioned are companies like TomTom?
Pankaj Mathur: Yes. It also includes clients like TCS TeleCommunication Systems, a company based in Oakland. TCS offers a server-based solution because on mobile handsets and connected navigation units there is no need to store all the information (maps, POIs, etc.) on the device. The information can be retrieved in real time from the server as needed. It may sound like a small difference in terms of data storage, but the difference in terms of customer experience is quite drastic.
Think of the difference between an email and a fax. In a printed format the message from email and fax may not look much different. But fax is an image (pixel-based) while email is text (character based).
Eric Enge: Tell me a little bit about the company?
Pankaj Mathur: We have been in business since 1972 and public (Ticker: IUSA) since 1992. In 2008, we did approximately $750 million in revenue. Historically, we have been in the business of mailing and marketing lists. So, if you are a local restaurant in Charleston, South Carolina, and you want to send fliers out to all the businesses in your area advertising your lunch buffet, you come to us.
Back in 1972, our founder Vin Gupta realized that most companies (including Fortune 500) do not really know where their customers and prospects are located. So he started collecting phonebooks in his garage and started building lists.
Over the years, we evolved into a very sophisticated large scale information compilation company. Currently, we maintain a database of about 15M businesses in the US and Canada and a database of about 250 Million consumers in North America.
The company started with a compilation of publicly available phonebooks. Over time, we started getting into more marketing-specific attributes like employee size, sales volume, and corporate hierarchy that are not available in phonebooks. Another drawback of the phonebook is that not every business advertises in it. Even some of the popular consumer-centric categories like gas stations, car washes, small churches, post offices, and DMVs have scant coverage in phonebooks. You also have to compile multiple books for a geography, as each book might have some incremental coverage. This creates a challenge of duplicate records, which I will talk about in more detail.
To improve coverage, we also compile sources such as government filings, county clerk data, utility connections and disconnects, bankruptcy filings, and tourism guides. With each additional source, there are always new compilation challenges. For example, in legal filings or utility data we may come across a listing for John Doe LLC, which is a franchisee of Taco Bell and also owns a gas station with a convenience store. The legal entity (John Doe LLC in this case) may not even be incorporated at the physical location of these businesses.
So, to deal with these types of problems and to remove duplicates, we started calling businesses in 1992, and since then we have been calling EVERY business in North America at least once a year. In 2008, we completed 30 million phone interviews and dialed over 45 million numbers to do so.
Eric Enge: Right, validation.
Pankaj Mathur: Validation. It was primarily done to increase deliverability of mailing lists back in 1991. There were no commercial applications like Google or navigation devices back then. The phone validation helps us with a few key aspects of compilation such as:
- Collect consumer-facing information as opposed to legal entity information
- Completion of name, address, categories
- Removal of duplicates (KFC or Kentucky Fried Chicken)
- Standardization of common terms in names and addresses
- Removal of out-of-business listings (phonebooks are published only once a year)
The example below highlights all of the above points.
Don’t get me wrong, I am not trying to demean the value of yellow pages; they’ve done an amazing job of establishing value in local communities and the industry has been around for several decades. But in a Location Based Services (“LBS”) environment, people have started using data for turn-by-turn directions and for more specific relevancy-based search. The yellow page data is not even close to fulfilling these tasks.
Eric Enge: Yes, absolutely.
Pankaj Mathur: So this is one of the reasons why the phone verification helped us. I mean, KFC and Kentucky Fried Chicken are easy for us to de-dupe. But if there is a Joe’s Pizza and a Joseph’s Italian Pizzeria, non-franchised and in some unknown small town, it is extremely difficult to identify them as the same unique business.
From this perspective, the definition of a business becomes important. Here is a live example: see below for D&B data with a PO Box number and a phone number that is connected to the business owner’s residence. Compare that to InfoUSA data, which captures all four locations of the Mexican restaurant owned by Annas. In an LBS environment, what do you think people would be looking for?
Also, note that even a PO Box number can be geo-coded (to the center of the zip), but the turn-by-turn navigation experience will not be nearly as good, not to mention that users will not get any food served!
Let’s look at another example:
The problem here is that it shows COSTO with a suite number. This is probably a management office, but when people do a local search for COSTO, they are looking for places where they can shop. I hope you are getting a sense of the “definition of business” challenge in an LBS environment.
Let’s take a look at the problem of duplicate results:
Even for a popular category like Pizza, the example above shows duplicates that result from variations in names and addresses. Interestingly, none of the four listings gets the combination right.
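The duplicate problem described here can be sketched in a few lines. This is only an illustration under stated assumptions, not InfoUSA's actual process: the alias table, the abbreviation table, the addresses, and the phone numbers are all made up, and a shared phone number stands in for what a validation call would confirm.

```python
import re

# Hypothetical alias table: known name variants mapped to one canonical name.
ALIASES = {"kfc": "kentucky fried chicken"}

# Hypothetical abbreviations used to standardize common address terms.
ADDRESS_TERMS = {"street": "st", "avenue": "ave", "boulevard": "blvd"}


def normalize_name(name):
    """Lowercase, drop punctuation, collapse spaces, then apply aliases."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    cleaned = " ".join(cleaned.split())
    return ALIASES.get(cleaned, cleaned)


def normalize_address(address):
    """Lowercase, drop punctuation, and abbreviate common street terms."""
    words = re.sub(r"[^a-z0-9 ]", "", address.lower()).split()
    return " ".join(ADDRESS_TERMS.get(word, word) for word in words)


def normalize_phone(phone):
    """Keep digits only so formatting differences do not matter."""
    return re.sub(r"\D", "", phone)


def same_business(a, b):
    """Duplicates if phones match, or if name and address match once normalized."""
    if normalize_phone(a["phone"]) == normalize_phone(b["phone"]):
        return True
    return (normalize_name(a["name"]) == normalize_name(b["name"])
            and normalize_address(a["address"]) == normalize_address(b["address"]))


# Made-up sample listings for illustration only.
kfc_a = {"name": "KFC", "address": "100 Main Street", "phone": "(555) 010-0001"}
kfc_b = {"name": "Kentucky Fried Chicken", "address": "100 Main St.", "phone": "555-010-0002"}
joe_a = {"name": "Joe's Pizza", "address": "12 Elm Street", "phone": "555-010-0003"}
joe_b = {"name": "Joseph's Italian Pizzeria", "address": "12 Elm St", "phone": "(555) 010-0003"}
joe_c = {"name": "Joseph's Italian Pizzeria", "address": "12 Elm St", "phone": "555-010-0009"}
```

With these rules, `kfc_a` and `kfc_b` match through the alias table, `joe_a` and `joe_b` match only because they share a phone number, and `joe_a` and `joe_c` do not match at all. Name rules alone catch the franchise case but not the small local business.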
Eric Enge: Yes. They either don’t have it or they have the 108th in O Street.
Pankaj Mathur: Yes. Let’s look at why this is happening:
These are two actual captions for Valentino’s pizza delivery. Valentino’s is advertising in two different sections of the phonebook, maybe restaurants, and pizza delivery. So, this helps explain some of the errors and duplication in the records, and maybe there are more. I just didn’t go deep enough.
Here is another example for a gas station in Foster City, CA
Compare this to four gas stations that InfoUSA has captured by phone call:
All four gas stations above have been in business for over 10 years. The reason these listings are missing from yellow pages is simple: people do not look up gas stations in the yellow pages. Hence these gas stations are not advertising in phonebooks.
Eric Enge: Right. So, this is the kind of thing that validation does for you, right?
Pankaj Mathur: Exactly. Some businesses may even discontinue their listing once they have established themselves in a local community. We keep calling these businesses year after year, even after they have dropped out of phonebooks.
We are seeing a trend that the coverage of phonebooks is actually dropping. So, back in 2001, we could get almost 98% of our listings from phonebooks, but as of today, that coverage has dropped to the low 70s.
Switching gears, I always get asked about specialty content providers like Zagat and user-generated content. From what I have seen, end users are very passionate about a few selected categories like eating, drinking, personal services, and shopping. Thus people may feel excited about a restaurant, a new IKEA store opening or a hair stylist. But InfoUSA compiles over 10,000 categories, and a large chunk of these are ignored by users for lack of personal interest.
Even across the popular categories where people have a personal interest, the users often end up creating duplicates (just like phonebooks) on account of variations in names and addresses. There is also the problem of who takes ownership of reliably tracking out-of-business listings.
The restaurant below ceased operations in December 2007 but is still being shown in search results as late as February 2009!
Eric Enge: Let’s look at your InfoUSA versus enhanced content example. Who wrote this description under products and services?
Pankaj Mathur: Actually, no one. And, that’s exactly the point I am trying to make. A lot of this so-called enhanced data that you find in the marketplace has been modeled. I am sure you can write at least 10 or 20 attributes about McDonald’s just from being there once.
In the example above, the In-N-Out Burger actual address is 260 Washington Street, Daly City, CA, 94015. The above discrepancy is happening because low-cost data vendors try to backfill information such as city name, state and zip (these are not published in phonebooks) to complete addresses. Thus, in this case, the data provider ended up creating a valid address, which is 260 Washington St, San Francisco, CA, 94111 as Washington St runs right through San Francisco and Daly City. The only problem is that there is no restaurant at the above-published address.
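A minimal sketch of why this kind of backfill goes wrong, assuming a hypothetical street index that maps a street address to every locality where that address is valid. The two "260 Washington St" entries come from the example above; the second index entry is a placeholder, and neither function describes any vendor's real pipeline.

```python
# Hypothetical index: street address -> every (city, state, zip) where it is valid.
STREET_INDEX = {
    "260 washington st": [
        ("San Francisco", "CA", "94111"),
        ("Daly City", "CA", "94015"),
    ],
    # Placeholder entry (made up) to show the unambiguous case.
    "1 example rd": [("Anytown", "CA", "00000")],
}


def naive_backfill(street_address):
    """Pick the first valid locality, which always yields a valid-looking address."""
    candidates = STREET_INDEX.get(street_address.lower().strip(), [])
    return candidates[0] if candidates else None


def safe_backfill(street_address):
    """Backfill only when exactly one locality matches; otherwise return None."""
    candidates = STREET_INDEX.get(street_address.lower().strip(), [])
    if len(candidates) == 1:
        return candidates[0]
    return None  # ambiguous or unknown: needs another source, such as a phone call
```

The naive version returns the San Francisco locality for "260 Washington St", which is a valid address but the wrong one for this restaurant. The safe version declines to guess and leaves the record for verification.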
Here is another example of modeled information on COSTO that does not really exist!
Unfortunately, in the examples of In-N-Out Burger and COSTO, it is always the end user of a local search application who discovers the inaccuracy of the data the hard way. Can you imagine the impact of such a user experience on your brand name? I would probably not use the application ever again.
One important point about all the examples shown so far: every data set will have inaccuracies, and having one or two records wrong does not amount to much statistically. But the examples are good enough to gauge the reliability of the data compilation process. How hard is it to identify a COSTO that does not exist? How difficult is it to remove duplicates of a high-visibility franchise like In-N-Out? Given the reliability level these examples suggest, how reliable is the data source at delivering accurate information on non-franchise and small local businesses?
Eric Enge: So, how often do you revalidate the data?
Pankaj Mathur: We validate every listing at least once a year. But there are some event-based validations that may trigger before the 12-month cycle. If there is a change of address at USPS or a phone disconnect, we will call the business ASAP to validate.
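The schedule described in this answer, an annual cycle plus event-based triggers, can be sketched as a simple rule. The function name, parameters, and the 365-day threshold are assumptions made for illustration, not a description of InfoUSA's systems.

```python
from datetime import date, timedelta

# Assumed length of the regular validation cycle.
ANNUAL_CYCLE = timedelta(days=365)


def needs_validation(last_call, today, address_changed=False, phone_disconnected=False):
    """Return True if a listing is due for a validation call.

    An event (address change or phone disconnect) triggers a call
    immediately; otherwise the listing is due once a year has passed.
    """
    if address_changed or phone_disconnected:
        return True
    return today - last_call >= ANNUAL_CYCLE


# A listing called 8 months ago is not due, unless an event has occurred.
recent = needs_validation(date(2008, 6, 1), date(2009, 2, 1))
event = needs_validation(date(2008, 6, 1), date(2009, 2, 1), phone_disconnected=True)
overdue = needs_validation(date(2008, 1, 1), date(2009, 2, 1))
```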
Eric Enge: So do you use an automated call system to validate the information, or how does that work?
Pankaj Mathur: We have a compilation team of about 800 people based in Omaha, NE. Of these, about 300 are responsible for phone validation. We use a smart-dialer, but the information is collected in a very human form. This is also the reason InfoUSA phone validation is so top notch, as operators are able to alter their tone and questions based on the responses given. The cost of having people make calls is very high, but given the lift in quality and accuracy, we feel that it is worth it. All of our customers are aware of the quality differences between InfoUSA and low-cost alternatives and are willing to pay more for quality.
Eric Enge: So, it’s 14,000,000 businesses?
Pankaj Mathur: Yes.
We focus on three key aspects for the compilation of business information:
- Coverage- to some extent the total number of listings is an indicator of coverage, but there are other sources (D&B, Acxiom, Localeze, etc.) that claim over 17-18 million records. From our perspective, these numbers often include legal entities, duplicates, out-of-business listings and even COSTO stores and In-N-Out Burgers that never existed!
- Accuracy- we manually key in over 5,000 phonebooks every year, which helps remove duplicates like KFC and Kentucky Fried Chicken. Phone validation helps complete information and helps identify more duplicates and redundant listings.
- Relevancy- this is the hardest metric to grasp, and the LBS industry is still grappling with it. To give a simple example, a listing for “California Pizza Kitchen” inside San Francisco airport may be relevant if I am inside the airport and need to eat. The same listing becomes completely irrelevant if I am driving around looking for a place to eat. As another example, a teenager may not consider “PF Changs” a valid result for Chinese food. By the same token, a business traveler may not consider “Panda Express” a relevant result for the same keyword. Thus relevancy is often dictated by the user’s intent, environment, demographics, time of day, etc. Frankly speaking, the LBS industry has done a very sub-standard job of it so far.
The partial list below shows information compiled by InfoUSA to help with relevancy:
- Storefront Photos & Door Step Lat/Long – Currently over 3 Million and collecting 150,000 new photos every month
- Franchise & Chains
- Specialties- for categories such as Attorneys and Physicians
- Make- over 100 makes tracked for categories such as Car Dealers and Tire Dealers
- Type- over 300 types tracked for categories such as Radio Station, Movie Theaters, Malls, Schools, Libraries, etc.
- Work-At-Home Flag- over 1.3 Million records flagged across categories like caterers, contractors, realtors, photographers, etc.
- Landmark Addresses- over 1.4 Million records for additional location information such as airports, malls, corporate campus, or another business
- Shopping Malls- classified over 6,500 malls into categories such as factory outlets, lifestyle malls, power malls, etc.
- Emergency Flag for Hospital – over 4,300 flagged for true emergency locations
- Date of last phone contact
- Hours of Operations
- Credit cards accepted
Eric Enge: Which are the companies currently licensing InfoUSA data?
Pankaj Mathur: Currently we are licensing data to all of the top five search engines (Google, Yahoo!, MSN, AOL, Ask), and we are powering points-of-interest data on over 90% of in-car navigation units (Toyota, Lexus, Honda, GM, Ford, Nissan, etc.). InfoUSA also licenses data to KGB (fka InfoNXX), mobile applications like Telenav and Tellme, and telematics services like OnStar.
One advantage of having such a diversified list of customers is that quality is driven by the most critical applications. Thus we have to compile gas stations with an accuracy that Lexus can use for turn-by-turn directions and compile hospitals with a 24/7 flag that OnStar can use. In the end, it sets the quality bar very high at our end.
Eric Enge: These are the people who are purchasing local business data from you?
Pankaj Mathur: Yes, either directly or indirectly. Sprint and AT&T have Telenav installed on handsets, and their customers will use InfoUSA data through the Telenav application.
So you may ask if it is this easy to key in the phonebooks or call every business then why aren’t our competitors or anyone else doing it?
Eric Enge: Well, I can answer that question; it’s a problem of scale.
Pankaj Mathur: Exactly. The marketing side of the business, which still accounts for over 80% of corporate revenue, helps subsidize the cost of compilation. It is impossible for any company to phone validate information solely based on LBS clients as the market value is not high enough to justify incurring such a high cost.
Eric Enge: Right. Let’s go back to enhanced content a little bit. Do you ever try to capture any data like hours of operation or things like that?
Pankaj Mathur: Oh, yes. In spite of everything that I said about accuracy and phone validation, there is still a lot of other data that I would want to know as an end user. For example, what are the store hours? For a restaurant, I want to know the soup of the day, the price range, or what other users have to say about it.
Thus there is still room left for information like ratings and reviews. But also note that all of this information is either so subjective or so volatile that it is not possible for InfoUSA to capture it using phone validation (even ignoring cost for the time being).
Thus there is no way InfoUSA can ever tell you, as Citysearch does, that this restaurant is four out of five, or that this pizza place sucks, because that is a very subjective opinion.
There are companies that are collecting such data and given below is a partial list.
- Restaurants (CitySearch, BooRah, Zagat, Open Table, wCities)
- Gas Prices (Gas Price Watch, OPIS, Gas Buddy)
- User Reviews (City Search, Yahoo!, BooRah)
- News & Events (Topix, Zevents, etc.)
- Classifieds (LiveDeal, eBay, Craigslist)
- Editorials and Profiles (Merchant Circle, City Search)
- Contractors (Service Magic)
- Taxonomy, coupons, parking, golf courses, live inventory and several more
A lot of this data has consumer appeal, and LBS players have to mix data from multiple sources, which brings another key topic to the forefront: data aggregation.
Look at examples below to understand the challenge an LBS application faces when trying to match data from multiple sources.
To alleviate this problem and help customers leverage the phone validation InfoUSA performs, we appended our record identifiers (InfoUSA IDs) to the data of selected partners (see below). This extended the reliability of the validation provided by InfoUSA to enhanced data partners and also solved the problem of data matching (using InfoUSA 9-digit IDs). Thus City Search can provide real-time feeds on ratings, reviews, prices, etc. to any of our customers, who can then match each record to a unique InfoUSA record without compromising the accuracy of search results or spending any time or resources on matching.
Another important aspect of our 9-digit identifiers is that they are very persistent, and Mike Dobson has blogged in detail about the value of having a persistent identifier. You can read the blog at http://blog.telemapics.com/?p=92.
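To illustrate why a shared persistent identifier removes the matching problem, here is a minimal sketch of merging a partner feed into base records by ID. The IDs, field names, and ratings are all invented for illustration; only the idea of a 9-digit key comes from the discussion above.

```python
# Hypothetical base records keyed by a 9-digit identifier (all values made up).
base_records = {
    "123456789": {"name": "Joe's Pizza", "city": "San Mateo"},
    "987654321": {"name": "Main Street Diner", "city": "Daly City"},
}

# Hypothetical partner feed that carries the same identifier on every row.
partner_feed = [
    {"id": "123456789", "rating": 4.0},
    {"id": "000000000", "rating": 3.0},  # ID not present in the base records
]


def merge_by_id(base, feed):
    """Attach feed attributes to base records with an exact ID lookup.

    No name or address comparison is needed. Rows whose ID is unknown
    are returned separately instead of being guessed at.
    """
    merged = {record_id: dict(record) for record_id, record in base.items()}
    unmatched = []
    for row in feed:
        record_id = row["id"]
        if record_id in merged:
            extras = {key: value for key, value in row.items() if key != "id"}
            merged[record_id].update(extras)
        else:
            unmatched.append(row)
    return merged, unmatched


merged, unmatched = merge_by_id(base_records, partner_feed)
```

Because the lookup is exact, a name variation in the partner feed cannot create a duplicate or attach a review to the wrong business. The merge also leaves `base_records` untouched, since each record is copied before it is updated.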
Eric Enge: So, this relates to another question I had. When I was going through the process of finding how to add records to a search engine, Google and Yahoo were fairly straightforward, but when I got to Microsoft, where I landed was actually a page on the InfoUSA site. So, it’s huge that Microsoft is using you nearly as a sole source.
Pankaj Mathur: Even in cases when a customer of InfoUSA accepts your listing on their own branded page, many times the listing is delivered to us for validation. I cannot disclose the names of clients, as many of them have advertised this as their own service. InfoUSA makes about 4,000-7,000 net corrections every month as a result of such feedback. The reasons why these customers ask us to validate these listing corrections are twofold:
- Objective validation of listing submissions- you cannot assume that a merchant submission is always 100% honest. We find that about 50% of listing correction submissions are genuine or warrant some kind of corrective action. For the remaining 50%, there are cases when a taxi cab may try to list itself under airport, or someone is being mischievous and asking us to change a pizza delivery number to a residential number. There are also cases of a competitor calling and asking that a bar also be classified as an escort club!
- Costs and Data Hygiene- our customers would have to incur the cost of validation, and over time this creates a problem of data hygiene. For example, suppose one of our customers added a listing (after verification) for “BJ Brewery at Mariners Island and Fashion Blvd in San Mateo, CA,” while InfoUSA captured the listing from another source as “BJ’s Breweries at 2206 Bridgepointe Pkwy, San Mateo, CA.” Now you have a duplicate, and you are also stuck with ownership and hygiene of the second listing in perpetuity.
Eric Enge: When a local portal of some sort wants to license data from InfoUSA, how does that work? Let’s say I am SuperPages.com and I want a very large amount of data because I am trying to be complete with what I am doing. How is that structured?
Pankaj Mathur: The deal structures vary according to the needs of our partners. Because they are all in the local search space, there are some commonalities. For instance, all of them would want the physical location address for the merchant, as opposed to direct mail clients who may want PO Box addresses for higher deliverability.
But even within LBS customers, we see several application-specific customizations. In-car navigation customers may want to drop categories like contractors or listings for lawyers, while mobile application clients may focus on a very narrow band of categories like eating, drinking, shopping, travel, and personal services and discard the rest.
Eric Enge: Can you give us a general sense as to what pricing looks like?
Pankaj Mathur: We have usage-based pricing because, as you can see, we are continuously investing more and more into the content to drive a better customer experience. Currently, we have projects running across categories like ATMs, Amtrak stations, fuel grades (electric cars), towing services, taxis, locksmiths, etc. So when it comes to pricing, we want some incentives to be able to continuously invest in improvements. We follow different kinds of usage pricing models based on the application, such as CPM (ad-supported and web-based), per transaction (411), per subscriber (mobile), etc. The usage volumes are also drastically different: Internet players can probably deliver billions of page views, but the same cannot be expected from in-car navigation.
Eric Enge: Thanks Pankaj!
Pankaj Mathur: Thank you Eric!
Eric Enge leads the Digital Marketing practice for Perficient Digital. He designs studies and produces industry-related research to help prove, debunk, or evolve assumptions about digital marketing practices and their value. Eric is a writer, blogger, researcher, teacher, and keynote speaker and panelist at major industry conferences. Partnering with several other experts, Eric served as the lead author of The Art of SEO.