igc.naive.NaiveTTest
- class igc.naive.NaiveTTest(module, dataset, dtld_kwargs=None, forward_method_name=None, forward_method_kwargs=None, n_embedding_categories=None, dtype=torch.float32, dtype_cat=torch.int32)
Bases: AbstractAttributionMethod
Naive two-sample t-tests between inputs corresponding to categories based on output characteristics.
- Parameters:
module (torch.nn.Module) – PyTorch module defining the model under scrutiny.
dataset (torch.utils.data.Dataset) – PyTorch dataset providing inputs/outputs for any given index. See the PyTorch documentation for more information. In addition, inputs must be organized in a specific manner; see the warning below.
dtld_kwargs (dict) – Additional keyword arguments to the dataloaders (torch.utils.data.DataLoader) constructed around the dataset, except: dataset, batch_size, shuffle, sampler, batch_sampler, and generator.
forward_method_name (str) – Name of the forward method of the module. If None, the default forward is used.
forward_method_kwargs (dict) – Additional keyword arguments to the forward method of the module.
n_embedding_categories (None | int | tuple(int)) – Enable the computation of attributions for categorical inputs associated with torch.nn.Embedding layers, by providing the number of embedding categories.
dtype (torch.dtype) – Default data type of all intermediary tensors. It also defines the NumPy data type of the attribution results.
dtype_cat (torch.dtype) – Default data type of the categorical input tensors.
Notes
Warning
When computing attributions on models using multiple inputs, e.g., x_1, x_2, and x_cat, with x_cat a categorical input, the dataset must return all inputs packed in a tuple, such as: (x_1, x_2, x_cat), y. Note that categorical inputs must be placed at the end of the tuple.
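The packing convention described in this warning can be sketched as follows. This is a minimal illustration only: the class name PackedInputsDataset, the tensor shapes, and the random data are hypothetical and are not part of the library.

```python
import torch
from torch.utils.data import Dataset


class PackedInputsDataset(Dataset):
    """Hypothetical dataset returning ((x_1, x_2, x_cat), y) for each index."""

    def __init__(self, n_samples=8, n_categories=5):
        generator = torch.Generator().manual_seed(0)
        # Two continuous inputs with arbitrary, illustrative shapes.
        self.x_1 = torch.randn(n_samples, 3, generator=generator)
        self.x_2 = torch.randn(n_samples, 4, generator=generator)
        # One categorical input holding integer category indices.
        self.x_cat = torch.randint(
            0, n_categories, (n_samples,), generator=generator, dtype=torch.int32
        )
        self.y = torch.randn(n_samples, 2, generator=generator)

    def __len__(self):
        return self.y.shape[0]

    def __getitem__(self, idx):
        # All inputs are packed in one tuple, with the categorical input last.
        return (self.x_1[idx], self.x_2[idx], self.x_cat[idx]), self.y[idx]


dataset = PackedInputsDataset()
inputs, y = dataset[0]
```

Here inputs is a 3-tuple whose last element is the categorical input, which matches the ordering required above.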
Note
Using categorical inputs with torch.nn.Embedding layers modifies the output shapes of attributions associated with these categorical inputs. The number of embedding categories is appended to original shapes.
- compute(cat_ranges, y_idx=None, batch_size=None, x_seed=None, n_x=None)
Compute the p-values of naive two-sample t-tests between inputs corresponding to categories based on output characteristics.
- Parameters:
cat_ranges (tuple(float)) – Tuple (a, b) defines a first category with entries y < a and a second category with entries y > b.
y_idx (None | int | ArrayLike) –
  None : y_idx_dtld iterates over all output component indices y_idx.
  int : Select a specific output component index y_idx.
  ArrayLike : Select multiple output component indices y_idx.
batch_size (None | int | tuple(int)) –
  None : Set x_bsz = 1 and y_idx_bsz = n_y_idx.
  int : Total batch size budget automatically distributed between x_bsz and y_idx_bsz.
  tuple(int) : Set x_bsz and y_idx_bsz individually.
x_seed (None | int) – Seed associated with x_dtld.
n_x (None | int) –
  None : x_dtld iterates over the whole dataset.
  int : Number of x inputs sampled from the dataset.
- Returns:
p-values of naive two-sample t-tests of shape (n_y_idx, *unbatched x shape)
- Return type:
ArrayLike | tuple(ArrayLike)
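To make the role of cat_ranges concrete, the sketch below splits a toy set of scalar inputs into the two categories (y < a and y > b) and computes a two-sample t statistic between them. It is an illustration under assumptions, not the library's implementation: the helper names are hypothetical, the data is made up, and the Welch (unequal variance) form is chosen here because the documentation does not state which t-test variant is used.

```python
import math
from statistics import mean, variance


def split_by_cat_ranges(xs, ys, cat_ranges):
    """Split inputs into two groups using the (a, b) thresholds on y."""
    a, b = cat_ranges
    low = [x for x, y in zip(xs, ys) if y < a]   # first category: y < a
    high = [x for x, y in zip(xs, ys) if y > b]  # second category: y > b
    return low, high


def welch_t_statistic(group_1, group_2):
    """Welch's two-sample t statistic (assumed variant, unequal variances)."""
    n_1, n_2 = len(group_1), len(group_2)
    standard_error = math.sqrt(variance(group_1) / n_1 + variance(group_2) / n_2)
    return (mean(group_1) - mean(group_2)) / standard_error


# Toy data: one scalar input x and one scalar output y per sample.
xs = [0.1, 0.2, 0.3, 0.4, 1.1, 1.2, 1.3, 1.4]
ys = [-2.0, -1.5, -1.2, -1.8, 1.5, 2.0, 1.7, 0.0]

# Samples with a <= y <= b (here the last one, y = 0.0) fall in neither category.
low, high = split_by_cat_ranges(xs, ys, cat_ranges=(-1.0, 1.0))
t_stat = welch_t_statistic(low, high)
```

A p-value would then follow from the Student t distribution, for instance with scipy.stats.ttest_ind(low, high, equal_var=False). Repeating the test for every input component and every selected output index would give an array of the shape described under Returns.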