We are fans of the data science process as outlined by Hadley Wickham in R for Data Science:

Most of the work in a modeling project usually happens at the data preparation and interpretation stages, i.e. before and after the modeling itself. I'm an advocate of efficient data ingestion (an estimated 50-80% of the work is spent on collecting and preparing data), of the iterative process of exploratory data analysis, and of closing the gap between starting with a question and sharing analytical results online and interactively with any audience.


