
I am working with the model names of different products and want to map each one to a common entity. For example, given the list

Toshiba-A40C
Toshiba B30
Toshiba-Z40C411
Asus -X540
Asus R4
Dell XPS 15
Dell Inspiron 13

I would like to get a clean list of

Toshiba
Toshiba
Toshiba
Asus
Asus
Dell
Dell

Is there a programmatic way to do this? Is this what is called record linkage? What would you recommend?

user_01
  • INDEX-MATCH or VLOOKUP in Excel could accomplish this simply. Base R also lets you look up a value in a dataset (like the one you have above) given an input value. – ERT Jul 24 '18 at 20:39
  • Thank you @ERT. What if the first word is misspelled, such as Thoshiba- A40C? Is there a way to generalize it to 'Toshiba'? – user_01 Jul 24 '18 at 20:43

1 Answer


There may be better solutions to this problem, but I used a regular expression to find the first space or dash in each entry and then cut the entry at that position.

import re

example = """Toshiba-A40C
Toshiba B30
Toshiba-Z40C411
Asus -X540
Asus R4
Dell XPS 15
Dell Inspiron 13""".split('\n')

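# Match the first run of spaces or the first dash, whichever comes first.
regular_exp = r' +|-'

for i in range(len(example)):
    match = re.search(regular_exp, example[i])
    # Leave the entry unchanged when it contains no separator.
    if match:
        example[i] = example[i][:match.start()]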

print(example)
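
For the misspelling case raised in the comments (for example Thoshiba- A40C), one option is fuzzy matching against a list of known brands. The following is only a minimal sketch using difflib.get_close_matches from the Python standard library. KNOWN_BRANDS, normalize_brand and the 0.8 cutoff are assumptions made for illustration and are not part of the original question.

```python
import re
from difflib import get_close_matches

# Hypothetical list of canonical brand names (an assumption, not from the question).
KNOWN_BRANDS = ["Toshiba", "Asus", "Dell"]

def normalize_brand(entry, brands=KNOWN_BRANDS, cutoff=0.8):
    """Return the closest known brand for an entry, or its raw first token."""
    # Everything before the first space or dash is the candidate brand.
    token = re.split(r"[ -]", entry.strip(), maxsplit=1)[0]
    # Compare case-insensitively, then map back to the canonical spelling.
    lowered = {b.lower(): b for b in brands}
    matches = get_close_matches(token.lower(), list(lowered), n=1, cutoff=cutoff)
    # Fall back to the raw token when nothing is similar enough.
    return lowered[matches[0]] if matches else token

print([normalize_brand(s) for s in ["Thoshiba- A40C", "Asus -X540", "Dell XPS 15"]])
# ['Toshiba', 'Asus', 'Dell']
```

The cutoff trades false matches against missed ones, so it would need tuning on real data. Entries whose first token is not close to any known brand are returned as that token, which makes them easy to review by hand.
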
RobJan