What are pfiles?

Contents

  • What are pfiles?
    • pfile_utils
      • pfile_create
    • See also

pfile is a binary file format used in automatic speech recognition (ASR) to store feature vectors together with their corresponding labels. The format is sometimes also called the ICSI feature file archive format. It is not limited to ASR, though: it can be used for many other machine learning tasks as well.

The file consists of a fixed-length ASCII header followed by zero or more variable-length binary sections. Each parameter in the header has a name and a list of zero or more value strings. The programmer's interface to pfiles (see param.h and pfile.h) allows each parameter value to be interpreted as an integer, float, string, array, distributed vector, matrix, mapping table, etc.
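To make the name/value-list model concrete, here is a minimal Python sketch that parses header text into a dictionary mapping each parameter name to its list of value strings, and converts a value only when it is asked for. It assumes, as an illustration and not as a specification, that each parameter sits on its own line written as `-name v1 v2 ...` and that an `-end` line terminates the header. `parse_header` and `as_int` are hypothetical helpers, not functions from param.h or pfile.h, and apart from the counts taken from the pfile_info example in this post, the header text is made up.

```python
from typing import Dict, List


def parse_header(text: str) -> Dict[str, List[str]]:
    """Map each parameter name to its list of value strings.

    Assumption: one parameter per line, written as "-name v1 v2 ...",
    with an "-end" line terminating the header.
    """
    params: Dict[str, List[str]] = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        name, values = tokens[0].lstrip("-"), tokens[1:]
        if name == "end":
            break
        params[name] = values
    return params


def as_int(params: Dict[str, List[str]], name: str, index: int = 0) -> int:
    """Interpret one value string of a parameter as an integer."""
    return int(params[name][index])


# Hypothetical header text, for illustration only.
header = """-pfile_header version 0 size 32768
-num_sentences 9581
-num_frames 3158027
-num_features 42
-end
"""
params = parse_header(header)
print(as_int(params, "num_frames"))  # 3158027
```

Keeping the raw strings and converting on demand mirrors the idea that the same value strings can be read as integers, floats, strings and so on, depending on what the caller needs.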

Some special parameter names are associated with a section in the binary part of the pfile. The value strings for these parameters give the size and offset (from the end of the header) of the binary section.
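Because offsets are measured from the end of the header, locating a section means adding the header length first. The following Python sketch shows that arithmetic. Whether `size` and `offset` count bytes or fixed-width values is not stated here, so the `unit_bytes` parameter is an assumption you have to set for your files. `section_byte_range` and `read_section` are hypothetical helpers.

```python
from typing import Tuple


def section_byte_range(header_size: int, offset: int, size: int,
                       unit_bytes: int = 1) -> Tuple[int, int]:
    """Return the [start, end) byte positions of a binary section.

    offset and size come from the section's parameter values and are
    measured from the end of the header. unit_bytes is an assumption:
    1 if they count bytes, or the element width (e.g. 4) if they count values.
    """
    start = header_size + offset * unit_bytes
    return start, start + size * unit_bytes


def read_section(path: str, header_size: int, offset: int, size: int,
                 unit_bytes: int = 1) -> bytes:
    """Read the raw bytes of one binary section from a pfile on disk."""
    start, end = section_byte_range(header_size, offset, size, unit_bytes)
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start)


# Hypothetical values: a 32768-byte header and a section of 100 four-byte values.
print(section_byte_range(32768, 0, 100, unit_bytes=4))  # (32768, 33168)
```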

A binary section can be used as a one-dimensional sequence of values, or as a sequence of fixed-length rows in a two-dimensional matrix.
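The two views differ only in how the same flat sequence is indexed. Here is a small Python sketch, assuming the section has already been decoded into numbers (the byte order and value types of real pfiles are not covered here). `as_rows` is a hypothetical helper. With NumPy, `numpy.asarray(values).reshape(-1, ncol)` gives the same row view.

```python
from typing import List, Sequence


def as_rows(values: Sequence[float], ncol: int) -> List[List[float]]:
    """View a flat binary section as a 2-D matrix with ncol values per row."""
    if ncol <= 0 or len(values) % ncol != 0:
        raise ValueError("section length must be a positive multiple of ncol")
    return [list(values[i:i + ncol]) for i in range(0, len(values), ncol)]


flat = [0, 0, 1.0, 1, 0, 1, 2.0, 0]  # hypothetical: 2 rows of 4 columns
print(as_rows(flat, 4))  # [[0, 0, 1.0, 1], [0, 1, 2.0, 0]]
```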

Some parameters are added automatically by the pfile command. For example, pfile_header is a parameter that contains the length and the version number of the header.

Source: old-site.clsp.jhu.edu/ws96/ris/man/pfile.doc

pfile_utils

pfile_utils is a toolset for managing pfiles. It is part of the SPRACHcore software package. The project is hosted at code.google.com/p/pfile-utilities and appears to be at version 0.51 at the time of writing. The code is written in C++.

pfile_info prints general information about a file:

$ pfile_info all.pfile
all.pfile
9581 sentences, 3158027 frames, 1 label(s), 42 features
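If you want to use these numbers in a script, for example to compute the average number of frames per sentence, you can parse the summary line. This Python sketch assumes the line looks exactly like the one above; other versions of pfile_info may format it differently. `parse_pfile_info` is a hypothetical helper.

```python
import re
from typing import Dict

SUMMARY = re.compile(
    r"(\d+) sentences, (\d+) frames, (\d+) label\(s\), (\d+) features"
)


def parse_pfile_info(line: str) -> Dict[str, int]:
    """Extract the counts from a pfile_info summary line."""
    match = SUMMARY.search(line)
    if match is None:
        raise ValueError(f"unrecognised summary line: {line!r}")
    sentences, frames, labels, features = map(int, match.groups())
    return {"sentences": sentences, "frames": frames,
            "labels": labels, "features": features}


info = parse_pfile_info("9581 sentences, 3158027 frames, 1 label(s), 42 features")
print(round(info["frames"] / info["sentences"], 1))  # 329.6 frames per sentence
```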

pfile_create

You can call pfile_create like this:

$ ./pfile_create -i - -f 1 -l 1 -o output.pfile
0 0 1 1
0 1 2 0
0 2 1 1
0 3 1 1
0 4 42 4
1 0 0 0
1 1 1337 0
1 2 2 2
2 0 3 3

You can end the input with Ctrl + D.

Each line contains:

[sentence-nr] [frame-nr] [feature 1] [feature 2] ... [feature n] [label 1] [label 2] ... [label m]

where the option -f sets the number of features n and -l sets the number of labels m. Note that within one sentence, the frame number has to increase by exactly one from line to line. A sentence can have an arbitrary number of frames, but when you start the next sentence, the sentence number has to increase by exactly one as well.
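Typing frames by hand quickly becomes error-prone, so it can help to generate the input. The following Python sketch writes lines in the format above and checks the numbering rules. It assumes that sentence and frame numbers start at 0, as in the example, and that fields are separated by single spaces. `pfile_create_lines` and `check_numbering` are hypothetical helpers, not part of pfile_utils.

```python
from typing import Iterable, List, Sequence, Tuple

# One frame = (features, labels); one sentence = list of frames.
Frame = Tuple[Sequence[float], Sequence[int]]


def pfile_create_lines(sentences: Iterable[List[Frame]],
                       num_features: int, num_labels: int) -> List[str]:
    """Format sentences as pfile_create input lines.

    Sentence and frame numbers are generated from the position in the
    input, so they always increase by exactly one.
    """
    lines = []
    for sent_nr, frames in enumerate(sentences):
        for frame_nr, (features, labels) in enumerate(frames):
            if len(features) != num_features or len(labels) != num_labels:
                raise ValueError(f"sentence {sent_nr}, frame {frame_nr}: wrong width")
            fields = [sent_nr, frame_nr, *features, *labels]
            lines.append(" ".join(str(x) for x in fields))
    return lines


def check_numbering(lines: Iterable[str]) -> None:
    """Raise ValueError if sentence or frame numbers skip or repeat."""
    expected_sent, expected_frame = 0, 0  # assumption: numbering starts at 0
    for line in lines:
        sent, frame = (int(tok) for tok in line.split()[:2])
        if sent == expected_sent + 1 and frame == 0 and expected_frame > 0:
            expected_sent, expected_frame = sent, 0  # a new sentence begins
        if (sent, frame) != (expected_sent, expected_frame):
            raise ValueError(f"expected {expected_sent} {expected_frame}, got: {line!r}")
        expected_frame += 1


lines = pfile_create_lines([[([1], [1]), ([2], [0])], [([0], [0])]], 1, 1)
check_numbering(lines)
print("\n".join(lines))
```

Saved as a script that prints these lines, its output could be piped into the command shown above, e.g. `python make_input.py | ./pfile_create -i - -f 1 -l 1 -o output.pfile` (the script name is hypothetical).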

See also

  • ICSI Speech FAQ: 3.3 What are the feature data formats?

Published

Jun 27, 2014
by Martin Thoma

Category

Code

Tags

  • ASR
  • Machine Learning
  • pfile
