We introduce a self-describing serialized format for bulk biomedical data
called the Portable Format for Biomedical (PFB) data. PFB is based on Avro
and encapsulates a data model, a
data dictionary, the data itself, and pointers to third-party controlled
vocabularies. In general, each data element in the data dictionary is
associated with a third-party controlled vocabulary to make it easier for
applications to harmonize two or more PFB files. We also introduce an
open-source software development kit (SDK) called PyPFB for creating,
exploring, and modifying PFB files. We describe experimental studies that
show performance improvements when importing and exporting bulk biomedical
data in the PFB format compared with JSON and SQL formats.
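
To make the harmonization idea concrete, the following is a minimal sketch of how a self-describing file that tags each data element with a controlled-vocabulary term could let an application align two files whose field names differ. It is an illustration only, not the PFB encoding or the PyPFB API: it uses plain JSON rather than Avro's binary encoding, and the `term` attribute, the `EX:` codes, and the helper names `make_file`, `vocabulary_map`, and `shared_terms` are all hypothetical.

```python
import json


def make_file(schema, records):
    """Bundle a schema with its records so the file describes itself."""
    return json.dumps({"schema": schema, "records": records})


def vocabulary_map(serialized):
    """Map each vocabulary term in the embedded schema to its local field name."""
    schema = json.loads(serialized)["schema"]
    return {f["term"]: f["name"] for f in schema["fields"] if "term" in f}


def shared_terms(file_a, file_b):
    """Pair up fields from two files that point at the same vocabulary term."""
    a, b = vocabulary_map(file_a), vocabulary_map(file_b)
    return {term: (a[term], b[term]) for term in sorted(a.keys() & b.keys())}


# Two hypothetical files that name the same concept differently.
# "term" is an assumed extra schema attribute; the EX: codes are made up.
file_a = make_file(
    {"type": "record", "name": "Subject", "fields": [
        {"name": "sex", "type": "string", "term": "EX:0001"},
        {"name": "age_years", "type": "int", "term": "EX:0002"},
    ]},
    [{"sex": "female", "age_years": 54}],
)
file_b = make_file(
    {"type": "record", "name": "Participant", "fields": [
        {"name": "gender", "type": "string", "term": "EX:0001"},
        {"name": "weight_kg", "type": "double", "term": "EX:0003"},
    ]},
    [{"gender": "male", "weight_kg": 81.5}],
)

matches = shared_terms(file_a, file_b)
print(matches)
```

Because the schema travels with the data, the matching step needs nothing beyond the two files themselves; the field names `sex` and `gender` are aligned through their shared term rather than by string comparison.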