API

This part of the documentation covers all the interfaces of Tablib. Where Tablib depends on external libraries, we document the most important parts here and provide links to the canonical documentation.

Dataset Object

class tablib.Dataset(*args, **kwargs)

The Dataset object is the heart of Tablib. It provides all core functionality.

Usually you create a Dataset instance in your main module, and append rows as you collect data.

data = tablib.Dataset()
data.headers = ('name', 'age')

for (name, age) in some_collector():
    data.append((name, age))

Setting columns is similar. The column data length must equal the current height of the data, and headers must be set.

data = tablib.Dataset()
data.headers = ('first_name', 'last_name')

data.append(('John', 'Adams'))
data.append(('George', 'Washington'))

data.append_col((90, 67), header='age')

You can also set rows and headers upon instantiation. This is useful if dealing with dozens or hundreds of Dataset objects.

headers = ('first_name', 'last_name')
data = [('John', 'Adams'), ('George', 'Washington')]

data = tablib.Dataset(*data, headers=headers)

Parameters:
  • *args – (optional) list of rows to populate Dataset
  • headers – (optional) list of strings for Dataset header row

Format Attributes Definition

If you look at the code, the various output/import formats are not defined within the Dataset object. To add support for a new format, see Adding New Formats.

add_formatter(col, handler)

Adds a formatter to the Dataset.

New in version 0.9.5.

Parameters:
  • col – column to format. Accepts an index int or a header str.
  • handler – reference to a callback function to execute against each cell value.

append(row, tags=[])

Adds a row to the Dataset. See Dataset.insert for additional documentation.

append_col(col, header=None)

Adds a column to the Dataset. See Dataset.insert_col for additional documentation.

append_separator(text='-')

Adds a separator to the Dataset.

csv

A CSV representation of the Dataset object. The top row will contain headers, if they have been set. Otherwise, the top row will contain the first row of the dataset.

A dataset object can also be imported by setting the Dataset.csv attribute.

data = tablib.Dataset()
data.csv = 'age, first_name, last_name\n90, John, Adams'

Import assumes (for now) that headers exist.

Binary Warning

Dataset.csv uses \r\n line endings by default, so make sure to write in binary mode:

with open('output.csv', 'wb') as f:
    f.write(data.csv)

If you do not do this, and you export the file on Windows, your CSV file will open in Excel with a blank line between each row.

dbf

A dBASE representation of the Dataset object.

A dataset object can also be imported by setting the Dataset.dbf attribute.

# To import data from an existing DBF file (opened in binary mode):
data = tablib.Dataset()
data.dbf = open('existing_table.dbf', 'rb').read()

# To import data from an ASCII-encoded bytestring:
data = tablib.Dataset()
data.dbf = '<bytestring of tabular data>'

Binary Warning

Dataset.dbf contains binary data, so make sure to write in binary mode:

with open('output.dbf', 'wb') as f:
    f.write(data.dbf)

dict

A native Python representation of the Dataset object. If headers have been set, a list of Python dictionaries will be returned. If no headers have been set, a list of tuples (rows) will be returned instead.

A dataset object can also be imported by setting the Dataset.dict attribute:

data = tablib.Dataset()
data.dict = [{'age': 90, 'first_name': 'John', 'last_name': 'Adams'}]

extend(rows, tags=[])

Adds a list of rows to the Dataset using Dataset.append.

filter(tag)

Returns a new instance of the Dataset, excluding any rows that do not contain the given tag.

get_col(index)

Returns the column from the Dataset at the given index.

headers

An optional list of strings to be used for header rows and attribute names.

This must be set manually. The given list length must equal Dataset.width.

height

The number of rows currently in the Dataset. Cannot be directly modified.

html

An HTML table representation of the Dataset object. If headers have been set, they will be used as table headers.

Note

This attribute can be used for export only.

insert(index, row, tags=[])

Inserts a row to the Dataset at the given index.

Rows inserted must be the correct size (height or width).

The default behaviour is to insert the given row to the Dataset object at the given index.

insert_col(index, col=None, header=None)

Inserts a column to the Dataset at the given index.

Columns inserted must be the correct height.

You can also insert a column of a single callable object, which will add a new column with the return values of the callable each as an item in the column.

# The callable is invoked once per row and receives that row.
data.append_col(col=lambda row: random.randint(0, 100))

If inserting a column, and Dataset.headers is set, the header argument must be given, and it will be used as the header for that column.

See Dynamic Columns for an in-depth example.

Changed in version 0.9.0: If inserting a column, and Dataset.headers is set, the header argument must be given, and it will be used as the header for that column.

New in version 0.9.0: If inserting a row, you can add tags to the row you are inserting. This gives you the ability to filter your Dataset later.

insert_separator(index, text='-')

Adds a separator to the Dataset at the given index.

json

A JSON representation of the Dataset object. If headers have been set, a JSON list of objects will be returned. If no headers have been set, a JSON list of lists (rows) will be returned instead.

A dataset object can also be imported by setting the Dataset.json attribute:

data = tablib.Dataset()
data.json = '[{"age": 90, "first_name": "John", "last_name": "Adams"}]'

Import assumes (for now) that headers exist.

lpop()

Removes and returns the first row of the Dataset.

lpush(row, tags=[])

Adds a row to the top of the Dataset. See Dataset.insert for additional documentation.

lpush_col(col, header=None)

Adds a column to the beginning of the Dataset, as its first column. See Dataset.insert_col for additional documentation.

ods

An OpenDocument Spreadsheet representation of the Dataset object, with Separators. Cannot be set.

Binary Warning

Dataset.ods contains binary data, so make sure to write in binary mode:

with open('output.ods', 'wb') as f:
    f.write(data.ods)

pop()

Removes and returns the last row of the Dataset.

rpop()

Removes and returns the last row of the Dataset.

rpush(row, tags=[])

Adds a row to the end of the Dataset. See Dataset.insert for additional documentation.

rpush_col(col, header=None)

Adds a column to the end of the Dataset. See Dataset.insert_col for additional documentation.

sort(col, reverse=False)

Sort a Dataset by a specific column, given string (for header) or integer (for column index). The order can be reversed by setting reverse to True.

Returns a new Dataset instance in which the rows have been sorted by that column.

stack(other)

Stack two Dataset instances together by joining at the row level, and return a new combined Dataset instance.

stack_cols(other)

Stack two Dataset instances together by joining at the column level, and return a new combined Dataset instance. If either Dataset has headers set, then the other must as well.

transpose()

Transpose a Dataset, turning rows into columns and vice versa, returning a new Dataset instance. The first row of the original instance becomes the new header row.

tsv

A TSV representation of the Dataset object. The top row will contain headers, if they have been set. Otherwise, the top row will contain the first row of the dataset.

A dataset object can also be imported by setting the Dataset.tsv attribute.

data = tablib.Dataset()
data.tsv = 'age\tfirst_name\tlast_name\n90\tJohn\tAdams'

Import assumes (for now) that headers exist.

width

The number of columns currently in the Dataset. Cannot be directly modified.

wipe()

Removes all content and headers from the Dataset object.

xls

A Legacy Excel Spreadsheet representation of the Dataset object, with Separators. Cannot be set.

Note

XLS files are limited to a maximum of 65,536 rows. Use Dataset.xlsx to avoid this limitation.

Binary Warning

Dataset.xls contains binary data, so make sure to write in binary mode:

with open('output.xls', 'wb') as f:
    f.write(data.xls)

xlsx

An Excel 2007+ Spreadsheet representation of the Dataset object, with Separators. Cannot be set.

Binary Warning

Dataset.xlsx contains binary data, so make sure to write in binary mode:

with open('output.xlsx', 'wb') as f:
    f.write(data.xlsx)

yaml

A YAML representation of the Dataset object. If headers have been set, a YAML list of objects will be returned. If no headers have been set, a YAML list of lists (rows) will be returned instead.

A dataset object can also be imported by setting the Dataset.yaml attribute:

data = tablib.Dataset()
data.yaml = '- {age: 90, first_name: John, last_name: Adams}'

Import assumes (for now) that headers exist.

Databook Object

class tablib.Databook(sets=None)

A book of Dataset objects.

add_sheet(dataset)

Adds given Dataset to the Databook.

size

The number of Dataset objects within the Databook.

wipe()

Removes all Dataset objects from the Databook.

Functions

tablib.detect(stream)

Return (format, stream) of given stream.

tablib.import_set(stream)

Return dataset of given stream.

Exceptions

class tablib.InvalidDatasetType

You’re trying to add something that doesn’t quite look right.

class tablib.InvalidDimensions

You’re trying to add something that doesn’t quite fit right.

class tablib.UnsupportedFormat

You’re trying to add something that doesn’t quite taste right.

Now, go start some Tablib Development.

About Tablib

Tablib is an MIT Licensed format-agnostic tabular dataset library, written in Python. It allows you to import, export, and manipulate tabular data sets. Advanced features include segregation, dynamic columns, tags and filtering, and seamless format import and export.
