Classification
Models for object classification
MiniXception
paz.models.classification.xception.MiniXception(input_shape, num_classes, weights=None)
Build MiniXception (see references).
Arguments
- input_shape: List of three integers e.g. [H, W, 3].
- num_classes: Int.
- weights: None or string with pre-trained dataset. Valid datasets include only FER.
Returns
Tensorflow-Keras model.
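A minimal usage sketch, assuming the `paz` package is installed and that the import path matches the signature above. The helpers `check_image_shape` and `build_mini_xception` are hypothetical names introduced here for illustration; they are not part of the library. The import is deferred so that the shape check can be used on its own.

```python
def check_image_shape(input_shape):
    """Hypothetical helper: checks that input_shape looks like [H, W, C]."""
    shape = list(input_shape)
    if len(shape) != 3:
        raise ValueError('input_shape must hold three integers, e.g. [H, W, 3]')
    if not all(isinstance(size, int) and size > 0 for size in shape):
        raise ValueError('input_shape entries must be positive integers')
    return shape


def build_mini_xception(input_shape, num_classes, weights=None):
    """Hypothetical wrapper around the documented constructor."""
    # Deferred import: only needed when the model is actually built.
    from paz.models.classification.xception import MiniXception
    shape = check_image_shape(input_shape)
    return MiniXception(shape, num_classes, weights=weights)


# Shape validation works without the library being installed.
validated_shape = check_image_shape((48, 48, 3))
```

Calling `build_mini_xception(validated_shape, num_classes=7)` would then return the Keras model. The shape and class count here are illustrative values, not requirements stated by the documentation.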
References
ProtoEmbedding
paz.models.classification.protonet.ProtoEmbedding(image_shape, num_blocks)
Embedding convolutional network used for prototypical networks.
Arguments:
- image_shape: List with image shape (H, W, channels).
- num_blocks: Ints. Number of convolution blocks.
Returns:
Keras model.
References:
ProtoNet
paz.models.classification.protonet.ProtoNet(embed, num_classes, num_support, num_queries, image_shape)
Prototypical networks used for few-shot classification.
Arguments:
- embed: Keras network for embedding images into metric space.
- num_classes: Number of ways for few-shot classification.
- num_support: Number of shots used for meta learning.
- num_queries: Number of test images to query.
- image_shape: List with image shape (H, W, channels).
Returns:
Keras model.
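To make the roles of the ways, shots, and queries concrete, here is a small pure-Python sketch of the standard prototypical-network decision rule: each class prototype is the mean of its support embeddings, and each query is assigned to the nearest prototype by squared Euclidean distance. This is an illustration of the general technique, not the PAZ implementation. The embeddings are hand-written 2D vectors standing in for the output of the `embed` network, and all function names are hypothetical.

```python
def mean_vector(vectors):
    """Element-wise mean of a list of equally sized vectors."""
    dimension = len(vectors[0])
    return [sum(vector[i] for vector in vectors) / len(vectors)
            for i in range(dimension)]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def compute_prototypes(support):
    """support[c] holds the num_support embeddings of class c."""
    return [mean_vector(class_embeddings) for class_embeddings in support]


def classify(queries, prototypes):
    """Assigns each query to the index of its nearest prototype."""
    predictions = []
    for query in queries:
        distances = [squared_distance(query, p) for p in prototypes]
        predictions.append(distances.index(min(distances)))
    return predictions


# Toy episode: num_classes=2 (ways), num_support=2 (shots), num_queries=2.
support = [
    [[0.0, 0.0], [0.0, 2.0]],  # embeddings of class 0
    [[4.0, 4.0], [6.0, 4.0]],  # embeddings of class 1
]
queries = [[0.5, 1.0], [5.0, 3.0]]

prototypes = compute_prototypes(support)    # [[0.0, 1.0], [5.0, 4.0]]
predictions = classify(queries, prototypes)  # [0, 1]
```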
References:
CNN2Plus1D
paz.models.classification.cnn2Plus1.CNN2Plus1D(weights=None, input_shape=(38, 96, 96, 3), seed=305865, architecture='CNN2Plus1D')
Binary classification for videos with 2+1D CNNs.
Arguments
- weights: None or string with pre-trained dataset. Valid datasets include only VVAD-LRS3.
- input_shape: List of integers. Input shape to the model in the following format: (frames, height, width, channels), e.g. (38, 96, 96, 3).
- seed: Integer. Seed for random number generator.
- architecture: String. Name of the architecture to use. Currently supported: 'CNN2Plus1D', 'CNN2Plus1D_Filters', 'CNN2Plus1D_Layers', 'CNN2Plus1D_Light'. 'CNN2Plus1D_18' is only available without weights.
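The constraints in the argument list can be sketched as a small pre-flight check. This is a hypothetical helper written for illustration, not a function provided by PAZ. It encodes only what the list above states: a four-element input shape, the listed architecture names, and the rule that 'CNN2Plus1D_18' cannot be combined with pre-trained weights. The exact spelling of the weights string is assumed to be 'VVAD-LRS3'.

```python
SUPPORTED_WITH_WEIGHTS = (
    'CNN2Plus1D', 'CNN2Plus1D_Filters', 'CNN2Plus1D_Layers',
    'CNN2Plus1D_Light')
WITHOUT_WEIGHTS_ONLY = ('CNN2Plus1D_18',)


def check_cnn2plus1d_arguments(weights=None, input_shape=(38, 96, 96, 3),
                               architecture='CNN2Plus1D'):
    """Hypothetical helper: validates arguments before building the model."""
    if len(input_shape) != 4:
        raise ValueError(
            'input_shape must be (frames, height, width, channels)')
    if architecture not in SUPPORTED_WITH_WEIGHTS + WITHOUT_WEIGHTS_ONLY:
        raise ValueError('Unknown architecture: ' + architecture)
    if weights is not None and architecture in WITHOUT_WEIGHTS_ONLY:
        raise ValueError(architecture + ' is only available without weights')
    frames, height, width, channels = input_shape
    return {'frames': frames, 'height': height, 'width': width,
            'channels': channels, 'architecture': architecture,
            'weights': weights}


# Default shape with pre-trained weights (weights string is an assumption).
arguments = check_cnn2plus1d_arguments(weights='VVAD-LRS3')
```

Running such a check before constructing the model surfaces an invalid architecture and weights combination as a readable error.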
Reference
- A Closer Look at Spatiotemporal Convolutions for Action Recognition
- [Video classification with a 3D convolutional neural network](https://www.tensorflow.org/tutorials/video/video_classification#load_and_preprocess_video_data)
VVAD_LRS3_LSTM
paz.models.classification.vvad_lrs3.VVAD_LRS3_LSTM(weights=None, input_shape=(38, 96, 96, 3), seed=305865)
Binary classification for videos using a CNN-based MobileNet with a TimeDistributed layer (LSTM).
Arguments
- weights: None or string with pre-trained dataset. Valid datasets include only VVAD-LRS3.
- input_shape: List of integers. Input shape to the model in the following format: (frames, height, width, channels), e.g. (38, 96, 96, 3).
- seed: Integer. Seed for random number generator.
Reference
- [The VVAD-LRS3 Dataset for Visual Voice Activity Detection](https://api.semanticscholar.org/CorpusID:238198700)
- [VVAD-LRS3 GitHub Repository](https://github.com/adriandavidauer/VVAD)