Python – Pandas returns “Passed header names mismatches usecols” error

pandas, python

The following works as expected. There are 190 columns that are all read in perfectly.

import pandas as pd

# columns is my list of 190 column names (defined elsewhere)
pd.read_csv("data.csv",
            header=None,
            names=columns,
            # usecols=columns[:10],
            nrows=10
            )

I have used the usecols argument before, so I am puzzled as to why it is no longer working for me. I expected that simply slicing the first 10 column names would work trivially, but I keep getting the "Passed header names mismatches usecols" error.

I am using pandas 0.16.2.

pd.read_csv("data.csv",
            header=None,
            names=columns,
            usecols=columns[:10],
            nrows=10
            )

Traceback

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-44> in <module>()
      3                     nrows=10,
      4                     header=None,
----> 5                     names=columns,
      6                     )

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in parser_f(filepath_or_buffer, sep, dialect, compression, doublequote, escapechar, quotechar, quoting, skipinitialspace, lineterminator, header, index_col, names, prefix, skiprows, skipfooter, skip_footer, na_values, na_fvalues, true_values, false_values, delimiter, converters, dtype, usecols, engine, delim_whitespace, as_recarray, na_filter, compact_ints, use_unsigned, low_memory, buffer_lines, warn_bad_lines, error_bad_lines, keep_default_na, thousands, comment, decimal, parse_dates, keep_date_col, dayfirst, date_parser, memory_map, float_precision, nrows, iterator, chunksize, verbose, encoding, squeeze, mangle_dupe_cols, tupleize_cols, infer_datetime_format, skip_blank_lines)
    472                     skip_blank_lines=skip_blank_lines)
    473
--> 474         return _read(filepath_or_buffer, kwds)
    475
    476     parser_f.__name__ = name

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in _read(filepath_or_buffer, kwds)
    248
    249     # Create the parser.
--> 250     parser = TextFileReader(filepath_or_buffer, **kwds)
    251
    252     if (nrows is not None) and (chunksize is not None):

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, f, engine, **kwds)
    564             self.options['has_index_names'] = kwds['has_index_names']
    565
--> 566         self._make_engine(self.engine)
    567
    568     def _get_options_with_defaults(self, engine):

/.../m9tn/lib/python2.7/site-packages/pandas/io/parsers.pyc in _make_engine(self, engine)
    703     def _make_engine(self, engine='c'):
    704         if engine == 'c':
--> 705             self._engine = CParserWrapper(self.f, **self.options)
    706         else:
    707             if engine == 'python':

/.../lib/python2.7/site-packages/pandas/io/parsers.pyc in __init__(self, src, **kwds)
   1070         kwds['allow_leading_cols'] = self.index_col is not False
   1071
-> 1072         self._reader = _parser.TextReader(src, **kwds)
   1073
   1074         # XXX

pandas/parser.pyx in pandas.parser.TextReader.__cinit__ (pandas/parser.c:4732)()

pandas/parser.pyx in pandas.parser.TextReader._get_header (pandas/parser.c:7330)()

ValueError: Passed header names mismatches usecols

Best Solution

It turns out there were 191 columns in the dataset, not 190. Because names had one entry fewer than the data had fields, pandas automatically used my first column of data as the index. I still don't quite know why that caused the error, since all of the columns in usecols were in fact present in the parsed dataset.
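The implicit-index behaviour described above can be reproduced on a small in-memory file. This is a sketch only: the toy data and the names list are made up for illustration (they are not the question's data.csv), and it was written for Python 3 with a current pandas rather than the 0.16.2 setup above. With one fewer name than fields, pandas uses the leading field as the row index:

```python
import io

import pandas as pd

# Toy stand-in for data.csv (an assumption): every row has 4 fields,
# but only 3 names are supplied, mirroring the 191-vs-190 mismatch.
csv_text = "a0,1,2,3\na1,4,5,6\n"
names = ["x", "y", "z"]  # hypothetical names list, one entry short

df = pd.read_csv(io.StringIO(csv_text), header=None, names=names)

# The surplus leading field becomes the index; the names attach to the rest.
print(df.index.tolist())    # ['a0', 'a1']
print(df.columns.tolist())  # ['x', 'y', 'z']
print(df["x"].tolist())     # [1, 4]
```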

So the solution is to confirm that the number of entries in names exactly matches the number of columns in your dataset.
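A minimal sketch of that check, assuming a headerless, comma-separated file whose first row has the same number of fields as every other row. The `count_fields` helper, the sample data and the `columns` list are hypothetical, and the code targets Python 3 with a current pandas:

```python
import csv
import io

import pandas as pd


def count_fields(buf):
    """Return the number of fields in the first row of a headerless CSV."""
    first_row = next(csv.reader(buf))
    buf.seek(0)  # rewind so the same buffer can be read again by pandas
    return len(first_row)


# Toy stand-in for data.csv (an assumption): 4 fields per row.
buf = io.StringIO("a0,1,2,3\na1,4,5,6\n")
columns = ["id", "x", "y", "z"]  # hypothetical names, one per field

n_fields = count_fields(buf)
if n_fields != len(columns):
    raise ValueError(
        f"names has {len(columns)} entries but the file has {n_fields} fields"
    )

# With the counts matched, selecting a subset of the names via usecols works.
df = pd.read_csv(buf, header=None, names=columns, usecols=columns[:2], nrows=10)
print(df)
```

Counting with the `csv` module only reads the first row, so the check stays cheap even for a large file.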

Also, I found this discussion on GitHub.