excel data mappingcleansing

Forum home » Delegate support and help forum » Microsoft Excel Training and help » Excel data mapping/cleansing

Excel data mapping/cleansing

resolvedResolved · High Priority · Version 2016

Amy has attended:
Excel Advanced course
Excel Advanced - Formulas & Functions course

Excel data mapping/cleansing

Hi,

I'm doing some data categorization for a client, and have a spreadsheet where there are a list of supplier names (nearly 9000), but some of the names are repeated throughout with slight variations e.g.:

A&L Dent
A&L Dent Solutions Ltd
AA Garage Service MB
AA Garage Services MH

someone has done some cleansing to get final "cleansed supplier name" column, so in the case above the adjacent column now reads:

A&L Dent
A&L Dent
AA Garage Service
AA Garage Service

However some of the cleansing has not worked effectively, e.g.

4 D Distributors
4D DISTRIBUTORS

Has been cleansed to:

4 D Distributors
4D Distributors

I'm trying to find these missed ones that haven't been cleansed effectively so that there is only one naming convention for each supplier in the "cleansed column".

I have no idea how the person mapped the supplier names in the first place as all formula's have been removed.

Is there a way for me to ensure that the cleansed supplier names are correct and are mapped properly to the original names?

Thanks
Amy

RE: Excel data mapping/cleansing

Hi Amy,

Thank you for the forum question.

Du you have a list of how the names should be?

Is this a repeating task or something you need to do one time?

Is there a pattern in the mistakes Excel can use to get it right?

If there is a pattern, it could be that some typed UK and others United Kingdom. Then you can tell Excel automatic to replace United Kingdom with UK.

If you have a list of the client's names spelled correct, you can merge the data in Power Query and if you use a anti left/right join Power Query can create a list of all misspelled client names.

You can also download the fuzzy lookup Add-in to Excel. Then you can set a threshold for accuracy. You can set a percentage. If it 90% the same map them.


Kind regards

Jens Bonde
Microsoft Office Specialist Trainer

Tel: 0207 987 3777
STL - https://www.stl-training.co.uk
98%+ recommend us

London's leader with UK wide delivery in Microsoft Office training and management training to global brands, FTSE 100, SME's and the public sector

RE: Excel data mapping/cleansing

Hi,

Thanks for your response, in response to your questions:

I don't have a list of how the names should be.

It's a one time task.

I can't identify a pattern in the mistakes. (In the example above it was the spacing between the number and letter)

I just have one list/column of names with numerous variations throughout that column and I want to end up with a new "cleansed" column/list with no variations, just one naming format for each name. So if I filter that supplier I just have one name to select from not a number of variations.

I'm not sure what was done to get the current cleansed list, but maybe I could start fresh and see how it works out? I'd appreciate any advise on how to do that please?


Many Thanks
Amy

RE: Excel data mapping/cleansing

Hi Amy,


If there is no pattern in the errors it can be very difficult to do. You may be able to do it with a combination of different methods.

If you create a column and use the Left function. If you have have the client names in column A you can in the new column write =Left(A2, 8). This will extract the first 8 characters from the client name.

A&L Dent = A&L Dent
A&L Dent Solutions Ltd = A&L Dent
AA Garage Service MB = AA Garag
AA Garage Services MH = AA Garag

But this method will not help you if you have the problem within the first 8 characters.


There is no functions in Excel which can do cleaning if there is absolutely no pattern. I am almost sure that who ever did the cleaning created a list of how he wanted the suppliers name to be and in the "cleaned" column the person used a Vlookup approximately match.

Kind regards

Jens Bonde
Microsoft Office Specialist Trainer

Tel: 0207 987 3777
STL - https://www.stl-training.co.uk
98%+ recommend us

London's leader with UK wide delivery in Microsoft Office training and management training to global brands, FTSE 100, SME's and the public sector

Tue 29 Sep 2020: Automatically marked as resolved.


 

Excel tip:

Generating Random Numbers

To generate a random number in Excel use the = RAND() function.

The value returned will always be between 0 and 1. To convert this to some other random value, you will need to multiply the result by the highest number you want to consider. For example, if you wanted a random number between 1 and 25, you could use the following code line:
= INT(25 * RAND()+ 1)

Since RAND() will always returns a value between 0 and 1 (but never 1 itself), multiplying what it returns by 25 and then using the Integer function INT on that result will return a whole number between 0 and 24.

Finally, 1 is added to this result, so that x will be equal to a number between 1 and 25, inclusive

View all Excel hints and tips


Server loaded in 0.11 secs.