A programmer’s cleaning guide for messy sensor data | Opensource.com


This tutorial shows how to clean up messy sensor data with Python and pandas. It is written for readers who know the basics of Python but have never used pandas. It covers several cleaning techniques:

  • reading a CSV file with the proper structure
  • sorting your dataset
  • transforming columns by applying a function
  • regulating data frequency
  • interpolating and filling missing data
  • plotting your dataset
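
As a rough preview, the steps listed above can be sketched in a few lines. This is a minimal illustration rather than the tutorial's actual dataset: the inline CSV, the column names `timestamp`, `temp_f`, and `temp_c`, and the one-minute target frequency are all assumptions made up for this example.

```python
import io

import pandas as pd

# Hypothetical sensor log (invented for this sketch): rows are out of order,
# one reading is missing, and the 00:03 sample was never recorded.
RAW_CSV = """timestamp,temp_f
2021-01-01 00:02:00,68.0
2021-01-01 00:00:00,66.2
2021-01-01 00:01:00,
2021-01-01 00:04:00,71.6
"""


def clean(raw: str) -> pd.DataFrame:
    # Read the CSV, parsing the timestamp column and using it as the index.
    df = pd.read_csv(
        io.StringIO(raw), parse_dates=["timestamp"], index_col="timestamp"
    )
    # Sort the rows chronologically.
    df = df.sort_index()
    # Transform a column by applying a function (Fahrenheit to Celsius).
    df["temp_c"] = df["temp_f"].apply(lambda f: (f - 32) * 5 / 9)
    # Regulate the frequency: exactly one row per minute, gaps become NaN.
    df = df.resample("1min").mean()
    # Fill the missing Celsius values by time-weighted linear interpolation.
    df["temp_c"] = df["temp_c"].interpolate(method="time")
    return df


cleaned = clean(RAW_CSV)
print(cleaned)
```

The last step in the list, plotting, is left out of the sketch because it needs matplotlib installed; with it available, `cleaned["temp_c"].plot()` would draw the cleaned series.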


pandas is a Python package providing fast, flexible, and expressive data structures designed to make working with “relational” or “labeled” data both easy and intuitive.

pandas is well suited for many different kinds of data:

  • Tabular data with heterogeneously-typed columns, as in an SQL table or Excel spreadsheet
  • Ordered and unordered (not necessarily fixed-frequency) time series data
  • Arbitrary matrix data (homogeneously typed or heterogeneous) with row and column labels
  • Any other form of observational / statistical data sets; the data need not be labeled at all to be placed into a pandas data structure
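
To make these categories concrete, here is a small sketch of how three of them might look as pandas objects. All values, labels, and variable names below are invented for illustration.

```python
import pandas as pd

# Tabular data with heterogeneously-typed columns (text, float, boolean).
table = pd.DataFrame(
    {
        "sensor": ["a", "b", "c"],
        "reading": [20.5, 21.0, 19.8],
        "active": [True, False, True],
    }
)
print(table.dtypes)

# Time series data that is not fixed-frequency: the gaps between
# timestamps differ, so the index carries no frequency.
series = pd.Series(
    [1.0, 2.0, 4.0],
    index=pd.to_datetime(
        ["2021-01-01 00:00", "2021-01-01 00:07", "2021-01-01 01:30"]
    ),
)
print(series)

# Matrix data with row and column labels.
matrix = pd.DataFrame([[1, 2], [3, 4]], index=["r1", "r2"], columns=["c1", "c2"])
print(matrix.loc["r2", "c1"])
```

Label-based access such as `matrix.loc["r2", "c1"]` is what the row and column labels buy you over a bare array of numbers.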

