  Data Cleansing
   
 

Data cleansing, also called data scrubbing, is the act of detecting and then correcting or removing corrupted or inaccurate records in a table, record set, or database.

Incorrect data in a database is often referred to as "dirty data". It can consist of out-of-date, incomplete, or duplicate records. Data that was not formatted correctly should be cleaned as well.
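As a rough illustration of two of these categories, the sketch below sorts submitted records into clean, incomplete, and duplicate piles. The field names, the choice of required fields, and the use of a lowercased email address as the duplicate key are all assumptions made for the example, not a description of any particular system.

```python
# Hypothetical required fields for a directory listing (an assumption).
REQUIRED_FIELDS = ("name", "email", "phone")

def split_dirty_records(records):
    """Split records into (clean, incomplete, duplicates).

    A record is incomplete if any required field is missing or blank.
    A record is a duplicate if its email (case-insensitive) was already seen.
    """
    clean, incomplete, duplicates = [], [], []
    seen_emails = set()
    for record in records:
        if any(not (record.get(field) or "").strip() for field in REQUIRED_FIELDS):
            incomplete.append(record)
            continue
        key = record["email"].strip().lower()
        if key in seen_emails:
            duplicates.append(record)
        else:
            seen_emails.add(key)
            clean.append(record)
    return clean, incomplete, duplicates
```

Out-of-date records cannot be spotted from the field values alone; they need some record of when the information was last confirmed.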

In our business, we get a lot of data submitted to us. There are data fields for names, phone numbers, Skype numbers, emails, websites and more. There are many formatting issues involved. Phone numbers need to be formatted in a standard way so that, when you see them on a list, they line up and look consistent. But when you have phone numbers of differing lengths from all over the world, and you don't know which part of the number is the city code or the international code, it's complicated. Some people don't include their international code when submitting phone numbers to us. Therefore, the phone data we receive always needs to be "cleansed", or given a standardized format.
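A minimal sketch of this kind of phone cleanup might look like the following. It assumes that a number without a leading "+" or "00" belongs to one default country, and that a leading zero on such a number is a trunk prefix that can be dropped. Neither assumption holds everywhere, which is exactly why the problem is complicated. The function name, the default country code, and the minimum length of 8 digits are made up for the example; the maximum of 15 digits comes from the E.164 numbering standard.

```python
import re

def normalize_phone(raw, default_country_code="1"):
    """Return the number as '+<digits>', or None if the length looks wrong.

    Assumption: numbers without an international prefix belong to
    default_country_code, and a leading 0 on them is a trunk prefix.
    """
    text = raw.strip()
    digits = re.sub(r"\D", "", text)  # drop spaces, dashes, parentheses
    if text.startswith("+"):
        pass  # already carries an international code
    elif text.startswith("00"):
        digits = digits[2:]  # "00" is a common international dialing prefix
    else:
        digits = default_country_code + digits.lstrip("0")
    # E.164 allows at most 15 digits; the lower bound of 8 is a guess.
    if not 8 <= len(digits) <= 15:
        return None
    return "+" + digits

print(normalize_phone("(415) 555-0100"))      # +14155550100
print(normalize_phone("0044 20 7946 0958"))   # +442079460958
```

Numbers that come back as None are the ones that still need a person to look at them.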

Web addresses are another issue. When people submit web address information to us, many will leave off the http://, or even the www. It got to the point where I asked the programmers to include the http:// as prepopulated text in the sign-up form, so that it would not be omitted and so that the system could recognize the entry as a complete URL. This was very helpful and a simple solution to an irritating problem.
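For addresses that were already submitted without the prefix, the same fix can be applied after the fact. The sketch below uses Python's standard urllib.parse module; the function name and the rule that a valid host must contain a dot are assumptions chosen to keep the example short.

```python
from urllib.parse import urlparse

def normalize_url(raw, default_scheme="http"):
    """Add a missing scheme and lowercase the host; return None if unusable."""
    text = raw.strip()
    if not text:
        return None
    if "://" not in text:
        text = f"{default_scheme}://{text}"
    parsed = urlparse(text)
    # Assumption: only web addresses are wanted, and a real host has a dot.
    if parsed.scheme not in ("http", "https") or "." not in parsed.netloc:
        return None
    return parsed._replace(netloc=parsed.netloc.lower()).geturl()

print(normalize_url("www.example.com"))    # http://www.example.com
print(normalize_url("Example.com/about"))  # http://example.com/about
```

The sketch does not add a missing www, because not every site answers on both forms of its address.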

A more difficult aspect of data cleansing or data scrubbing in our directory business is knowing, or guessing, when a company's profile is no longer up to date. There are various ways to do this, but it can be expensive. If companies are required to log in every few months or years and click a link acknowledging that their information is still the same, then you will have a record that they logged in and saw their information. If they simply don't contact you or log in, then you are left guessing. For those who can't or won't log in or email you, you can call them. But the labor involved in making thousands of phone calls for the sake of data scrubbing is very costly.
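If the date of each confirmation is stored, picking out the profiles that need a reminder email or a phone call is straightforward. The sketch below assumes each profile carries a hypothetical last_confirmed date (None if the company has never confirmed) and treats anything older than a chosen number of days as stale; the one-year default is arbitrary.

```python
from datetime import date, timedelta

def find_stale_profiles(profiles, today, max_age_days=365):
    """Return names of profiles never confirmed, or confirmed too long ago."""
    cutoff = today - timedelta(days=max_age_days)
    return [
        profile["name"]
        for profile in profiles
        if profile["last_confirmed"] is None or profile["last_confirmed"] < cutoff
    ]

profiles = [
    {"name": "Acme", "last_confirmed": date(2024, 1, 10)},
    {"name": "Globex", "last_confirmed": date(2022, 5, 1)},
    {"name": "Initech", "last_confirmed": None},
]
print(find_stale_profiles(profiles, today=date(2024, 6, 1)))  # ['Globex', 'Initech']
```

This only narrows down who to contact. It cannot tell you whether a stale profile is actually wrong.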

One type of data cleansing that many companies do is really a form of data sorting for quality control. If someone on a directory doesn't log in for a very long time, you can place them lower and lower in the search results the longer they stay away. If you email a company with an outdated profile and they update their info, they can rise in the list again. This requires sophisticated date-sensitive programming, but it is a highly effective way to at least keep the top of the search results clean, which is much more critical than the bottom. One could also call it data refinement, since you are refining the information at the top of the list.
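One possible way to implement this kind of date-sensitive ranking is to multiply each listing's normal search score by a freshness factor that shrinks the longer the company stays away. The sketch below halves the factor every 180 days without a login. The half-life, the relevance field, and the function names are illustrative assumptions, and other decay rules would work just as well.

```python
from datetime import date

def freshness(last_login, today, half_life_days=180):
    """Return 1.0 for a login today, halving every half_life_days after that."""
    days_away = max((today - last_login).days, 0)
    return 0.5 ** (days_away / half_life_days)

def rank_listings(listings, today):
    """Sort listings by relevance weighted by freshness, best first."""
    return sorted(
        listings,
        key=lambda item: item["relevance"] * freshness(item["last_login"], today),
        reverse=True,
    )
```

Because the factor is recomputed from the last login date on every search, a company that logs in and updates its profile moves back up the list immediately.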