
# Vectorizing a function in pandas

I have a dataframe that contains a list of lat/lon coordinates:

```python
import pandas as pd

d = {
    'Provider ID': {
        0: '10001', 1: '10005', 2: '10006', 3: '10007', 4: '10008',
        5: '10011', 6: '10012', 7: '10016', 8: '10018', 9: '10019',
    },
    'latitude': {
        0: '31.215379379000467', 1: '34.22133455500045',
        2: '34.795039606000444', 3: '31.292159523000464',
        4: '31.69311635000048', 5: '33.595265517000485',
        6: '34.44060759100046', 7: '33.254429322000476',
        8: '33.50314015000049', 9: '34.74643089500046',
    },
    'longitude': {
        0: ' -85.36146587999968', 1: ' -86.15937514799964',
        2: ' -87.68507485299966', 3: ' -86.25539902199966',
        4: ' -86.26549483099967', 5: ' -86.66531866799966',
        6: ' -85.75726760699968', 7: ' -86.81407933399964',
        8: ' -86.80242858299965', 9: ' -87.69893502799965',
    },
}
df = pd.DataFrame(d)
```
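Note that the coordinates in this data are strings, and the longitudes carry a leading space. A minimal sketch of casting them once up front, rather than on every lookup, might look like the following. The `sample` frame and the `coords` name are hypothetical, used only to keep the snippet self-contained.

```python
import pandas as pd

# Two rows in the same shape as the data above (illustrative subset).
sample = pd.DataFrame({
    "Provider ID": ["10001", "10005"],
    "latitude": ["31.215379379000467", "34.22133455500045"],
    "longitude": [" -85.36146587999968", " -86.15937514799964"],
})

# Strip stray whitespace, then cast each coordinate column to float64.
coords = sample[["latitude", "longitude"]].apply(
    lambda col: col.str.strip().astype("float64")
)

print(coords.dtypes)
```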

I want to use the haversine function to compute the distance, in km, between every pair of providers:

```python
from math import radians, cos, sin, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    """
    Calculate the great circle distance between two points
    on the earth (specified in decimal degrees)
    """
    # convert decimal degrees to radians
    lon1, lat1, lon2, lat2 = map(radians, [lon1, lat1, lon2, lat2])
    # haversine formula
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = sin(dlat/2)**2 + cos(lat1) * cos(lat2) * sin(dlon/2)**2
    c = 2 * asin(sqrt(a))
    # 6367 km is the radius of the Earth
    km = 6367 * c
    return km
```

The result should be a dataframe shaped like `result_df` below, where each value is the distance between the provider in that row and the provider in that column:

```python
result_df = pd.DataFrame(columns=df['Provider ID'], index=df['Provider ID'])
```

I can do this with a nested loop, but it is terribly slow. I'm looking for help converting it to a vectorized method:

```python
for first_hospital_coordinates in result_df.columns:
    # result_df is indexed by provider ID, so iterate over the index
    for second_hospital_coordinates in result_df.index:
        # Coordinates are stored as strings; cast to float and take the single match
        L1 = df[df['Provider ID'] == first_hospital_coordinates]['latitude'].astype('float64').values[0]
        O1 = df[df['Provider ID'] == first_hospital_coordinates]['longitude'].astype('float64').values[0]
        L2 = df[df['Provider ID'] == second_hospital_coordinates]['latitude'].astype('float64').values[0]
        O2 = df[df['Provider ID'] == second_hospital_coordinates]['longitude'].astype('float64').values[0]
        distance = haversine(O1, L1, O2, L2)
        # Write the distance into the row for the second provider
        crit = result_df.index == second_hospital_coordinates
        result_df.loc[crit, first_hospital_coordinates] = distance
```
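One possible way to vectorize this is NumPy broadcasting: subtracting a row vector from a column vector of coordinates produces the full grid of pairwise differences in one step, and the haversine formula can then be applied elementwise. The sketch below assumes the same Earth radius (6367 km) as the function above. The name `haversine_matrix` and the three-row `sample` frame are hypothetical, chosen only to keep the example self-contained.

```python
import numpy as np
import pandas as pd

def haversine_matrix(lat_deg, lon_deg, radius_km=6367.0):
    """Pairwise great-circle distances in km between all points (degrees in)."""
    lat = np.radians(np.asarray(lat_deg, dtype="float64"))
    lon = np.radians(np.asarray(lon_deg, dtype="float64"))
    # Column vector minus row vector broadcasts to an (n, n) grid of differences.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat[:, None]) * np.cos(lat[None, :]) * np.sin(dlon / 2) ** 2)
    # Clip guards against rounding pushing a slightly outside [0, 1].
    return 2 * radius_km * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

# Illustrative subset in the same shape as df (coordinates stored as strings).
sample = pd.DataFrame({
    "Provider ID": ["10001", "10005", "10006"],
    "latitude": ["31.215379379000467", "34.22133455500045", "34.795039606000444"],
    "longitude": [" -85.36146587999968", " -86.15937514799964", " -87.68507485299966"],
})

# Cast once, compute the whole matrix, then label rows and columns by provider.
lat = sample["latitude"].str.strip().astype("float64")
lon = sample["longitude"].str.strip().astype("float64")
dist = haversine_matrix(lat, lon)
result_df = pd.DataFrame(dist, index=sample["Provider ID"], columns=sample["Provider ID"])

print(result_df.round(1))
```

The matrix is symmetric with zeros on the diagonal, so each distance is computed twice. For n providers this allocates several n × n float arrays, which is fine for thousands of rows but may need chunking for much larger inputs.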