Categories
pandas python scikit-learn

Pandas – automatically detect date columns **at run time**

I was wondering if pandas is capable of automatically detecting which columns are datetime objects and read those columns in as dates instead of strings?

I am looking at the api and related stack overflow posts but I can’t seem to figure it out.

This is a black-box system that takes in arbitrary csv schema on production so I do not what the column names will be.

This seems like it would work but you have to know which columns are date fields:

import pandas as pd
#creating the test data
df = pd.DataFrame({'0': ['a', 'b', 'c'], '1': ['2015-12-27','2015-12-28', '2015-12-29'], '2': [11,12,13]})
df.to_csv('test.csv', index=False)
#loading the test data
df = pd.read_csv('test.csv', parse_dates=True)
print df.dtypes
# prints (object, object, int64) instead of (object,datetime, int64)

I am thinking if it cannot do this, then I can write something that:

  1. Finds columns with string type.
  2. Grab a few unique values and try to parse them.
  3. If successful then try to parse the whole column.

Edit. I wrote a simple method convertDateColumns that will do this:

import pandas as pd
from dateutil import parser
def convertDateColumns(self, df):
object_cols = df.columns.values[df.dtypes.values == 'object']
date_cols = [c for c in object_cols if testIfColumnIsDate(df[c], num_tries=3)]
for col in date_cols:
try:
df[col] = pd.to_datetime(df[col], coerce=True, infer_datetime_format=True)
except ValueError:
pass
return df
def testIfColumnIsDate(series, num_tries=4):
""" Test if a column contains date values.
This can try a few times for the scenerio where a date column may have
a couple of null or missing values but we still want to parse when
possible (and convert those null/missing to NaD values)
"""
if series.dtype != 'object':
return False
vals = set()
for val in series:
vals.add(val)
if len(vals) > num_tries:
break
for val in list(vals):
try:
if type(val) is int:
continue
parser.parse(val)
return True
except ValueError:
pass
return False