igc.base.DataManager

class igc.base.DataManager(attr, y_required=True)

Bases: object

Helps set up the appropriate dataloaders for each context.

Initialized dataloaders iterate over inputs x, true outputs y, baselines x_0, output component indices y_idx, or hidden layer component indices z_idx.

Parameters:
  • attr (AbstractAttributionMethod) – Attribution method.

  • y_required (bool) – Whether true outputs y are required by x_dtld.

Attributes

n_x (int)

Number of x samples.

x_bsz (int)

Batch size of x_dtld.

x_nb (int)

Number of batches of x_dtld.

x_dtld (torch.utils.data.DataLoader | tuple(ArrayLike))

Dataloader iterating over inputs x (and true outputs y if y_required).

n_x_0 (int)

Number of x_0 baselines.

x_0_bsz (int)

Batch size of x_0_dtld.

x_0_nb (int)

Number of batches of x_0_dtld.

x_0_dtld (torch.utils.data.DataLoader | tuple(ArrayLike))

Dataloader iterating over baselines x_0.

n_z_idx (int)

Number of z_idx component indices.

z_idx_bsz (int)

Batch size of z_idx_dtld.

z_idx_nb (int)

Number of batches of z_idx_dtld.

z_idx_dtld (torch.utils.data.DataLoader | tuple(ArrayLike))

Dataloader iterating over hidden layer component indices z_idx.

n_y_idx (int)

Number of y_idx component indices.

y_idx_bsz (int)

Batch size of y_idx_dtld.

y_idx_nb (int)

Number of batches of y_idx_dtld.

y_idx_dtld (torch.utils.data.DataLoader | tuple(ArrayLike))

Dataloader iterating over output component indices y_idx.
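
For each dataloader, the sample count, batch size, and batch count listed above are presumably related by a ceiling division. The sketch below illustrates that relation only. It assumes the last, possibly smaller batch is kept (the equivalent of drop_last=False), which this page does not confirm, and n_batches is a hypothetical helper that is not part of igc.

```python
import math


def n_batches(n_samples: int, batch_size: int) -> int:
    """Number of batches needed to cover n_samples (the last one may be partial)."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    return math.ceil(n_samples / batch_size)


# Hypothetical sizes: 10 inputs in batches of 4, 3 baselines in a single batch.
x_nb = n_batches(10, 4)    # batches of 4, 4, and 2 samples
x_0_nb = n_batches(3, 3)   # one full batch
```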

add_data(x, x_0, y_idx, n_steps, batch_size, x_seed, x_0_seed)

Set up a data manager iterating over inputs x, baselines x_0, and output component indices y_idx (or z_idx).

Parameters:
  • x (None | int | ArrayLike | tuple(ArrayLike)) –

    • None : x_dtld iterates over the whole dataset.

    • int : Number of x inputs sampled from the dataset.

    • ArrayLike | tuple(ArrayLike) : Set new x used by x_dtld.

  • x_0 (None | int | float | ArrayLike | tuple(ArrayLike)) –

    • None : Zero baseline x_0.

    • int : Number of x_0 baselines sampled from the dataset.

    • float : Constant baseline x_0.

    • ArrayLike | tuple(ArrayLike) : Set x_0 baselines used by x_0_dtld.

  • y_idx (None | int | ArrayLike) –

    • None : y_idx_dtld iterates over all output component indices y_idx.

    • int : Select a specific output component index y_idx.

    • ArrayLike : Select multiple output component indices y_idx.

  • n_steps (int) – Number of steps in the Riemann approximation of the underlying Integrated Gradients (IG) integral (see [STY17] for details).

  • batch_size (None | int | tuple(int)) –

    • None : Set x_bsz = 1, x_0_bsz = n_x_0, and y_idx_bsz = n_y_idx (or z_idx_bsz = n_z_idx).

    • int : Total batch size budget automatically distributed between x_bsz, x_0_bsz, and y_idx_bsz (or z_idx_bsz).

    • tuple(int) : Set x_bsz, x_0_bsz, and y_idx_bsz (or z_idx_bsz) individually.

  • x_seed (None | int) – Seed associated with x_dtld.

  • x_0_seed (None | int) – Seed associated with x_0_dtld.

Returns:

Resolved y_idx if it was None.

Return type:

torch.Tensor
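
To make the role of n_steps concrete, here is a minimal sketch of a Riemann approximation of Integrated Gradients for a scalar input. It is not the igc implementation: the midpoint rule, the function name integrated_gradients_1d, and the toy model are all assumptions chosen for illustration, and igc may use a different quadrature rule.

```python
def integrated_gradients_1d(f_grad, x, x_0, n_steps):
    """Midpoint Riemann approximation of IG along the straight path x_0 -> x."""
    total = 0.0
    for k in range(n_steps):
        alpha = (k + 0.5) / n_steps               # midpoint of the k-th sub-interval
        total += f_grad(x_0 + alpha * (x - x_0))  # gradient at the interpolated point
    return (x - x_0) * total / n_steps


# Toy model (hypothetical): f(x) = x**3, whose gradient is 3 * x**2.
def f(v):
    return v ** 3


def f_grad(v):
    return 3.0 * v ** 2


coarse = integrated_gradients_1d(f_grad, x=2.0, x_0=0.0, n_steps=4)
fine = integrated_gradients_1d(f_grad, x=2.0, x_0=0.0, n_steps=256)
exact = f(2.0) - f(0.0)  # completeness: IG sums to f(x) - f(x_0)
```

A larger n_steps brings the approximation closer to f(x) - f(x_0), at the cost of proportionally more gradient evaluations per (x, x_0) pair.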

add_data_bsc(x, x_0, y_idx, n_iter, x_0_batch_size, x_seed, x_0_seed)

Set up a data manager dedicated to Baseline Shapley and Baseline Shapley Correlation attribution methods (igc.bsc.BaselineShapley and igc.bsc.BslShapCorr).

Parameters:
  • x (None | int | ArrayLike) –

    • None : x_dtld iterates over the whole dataset.

    • int : Number of x inputs sampled from the dataset.

    • ArrayLike : Set new x used by x_dtld.

  • x_0 (None | int | float | ArrayLike) –

    • None : Zero baseline x_0.

    • int : Number of x_0 baselines sampled from the dataset.

    • float : Constant baseline x_0.

    • ArrayLike : Set x_0 baselines used by x_0_dtld.

  • y_idx (None | int | ArrayLike) –

    • None : y_idx_dtld iterates over all output component indices y_idx.

    • int : Select a specific output component index y_idx.

    • ArrayLike : Select multiple output component indices y_idx.

  • n_iter (int) – Number of iterations, i.e. the number of random sequences of input component indices enabled one after the other.

  • x_0_batch_size (None | int) –

    • None : Set x_0_bsz = n_x_0.

    • int : Set x_0_bsz.

  • x_seed (None | int) – Seed associated with x_dtld.

  • x_0_seed (None | int) – Seed associated with x_0_dtld.

Returns:

Resolved y_idx if it was None.

Return type:

torch.Tensor
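
The meaning of n_iter can be illustrated with a permutation-sampling estimate of Baseline Shapley values. The sketch below is a generic, pure-Python version of that technique written under stated assumptions. It does not reproduce igc.bsc.BaselineShapley, and the names baseline_shapley, f, and w are hypothetical.

```python
import random


def baseline_shapley(f, x, x_0, n_iter, seed=None):
    """Estimate Baseline Shapley attributions from n_iter random index sequences."""
    rng = random.Random(seed)
    n = len(x)
    attr = [0.0] * n
    for _ in range(n_iter):
        order = list(range(n))
        rng.shuffle(order)          # one random sequence of input component indices
        current = list(x_0)         # start from the baseline
        prev = f(current)
        for i in order:
            current[i] = x[i]       # enable component i
            new = f(current)
            attr[i] += new - prev   # marginal contribution of component i
            prev = new
    return [a / n_iter for a in attr]


# Toy linear model (hypothetical): attributions equal w_i * (x_i - x_0_i).
w = [1.0, -2.0, 0.5]


def f(v):
    return sum(wi * vi for wi, vi in zip(w, v))


attr = baseline_shapley(f, x=[1.0, 2.0, 3.0], x_0=[0.0, 0.0, 0.0], n_iter=8, seed=0)
```

For a linear model every sequence yields the same marginal contributions, so a single iteration would suffice. For models with interactions between components, more iterations reduce the variance of the estimate.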

add_data_iter_x(x, y_idx, batch_size, x_seed)

Set up a data manager iterating over inputs x.

Parameters:
  • x (None | int | ArrayLike | tuple(ArrayLike)) –

    • None : x_dtld iterates over the whole dataset.

    • int : Number of x inputs sampled from the dataset.

    • ArrayLike | tuple(ArrayLike) : Set new x used by x_dtld.

  • y_idx (None | int | ArrayLike) – Selected output component indices. If None, y_idx is resolved to all output component indices.

  • batch_size (None | int) –

    • None : Set x_bsz = 1.

    • int : Set x_bsz.

  • x_seed (None | int) – Seed associated with x_dtld.

Returns:

Resolved y_idx if it was None.

Return type:

torch.Tensor
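
The three accepted forms of y_idx (None, int, ArrayLike) can be pictured as a small normalization step. The sketch below is illustrative only: it returns a plain list rather than a torch.Tensor, and resolve_y_idx and n_outputs are hypothetical names that do not come from igc.

```python
def resolve_y_idx(y_idx, n_outputs):
    """Normalize y_idx to an explicit list of output component indices."""
    if y_idx is None:
        return list(range(n_outputs))   # all output component indices
    if isinstance(y_idx, int):
        return [y_idx]                  # a single output component index
    return [int(i) for i in y_idx]      # several output component indices


all_idx = resolve_y_idx(None, n_outputs=3)
one_idx = resolve_y_idx(2, n_outputs=3)
some_idx = resolve_y_idx((0, 2), n_outputs=3)
```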

add_data_iter_x_y_idx(x, y_idx, batch_size, x_seed)

Set up a data manager iterating over inputs x and output component indices y_idx.

Parameters:
  • x (None | int | ArrayLike | tuple(ArrayLike)) –

    • None : x_dtld iterates over the whole dataset.

    • int : Number of x inputs sampled from the dataset.

    • ArrayLike | tuple(ArrayLike) : Set new x used by x_dtld.

  • y_idx (None | int | ArrayLike) –

    • None : y_idx_dtld iterates over all output component indices y_idx.

    • int : Select a specific output component index y_idx.

    • ArrayLike : Select multiple output component indices y_idx.

  • batch_size (None | int | tuple(int)) –

    • None : Set x_bsz = 1 and y_idx_bsz = n_y_idx.

    • int : Total batch size budget automatically distributed between x_bsz and y_idx_bsz.

    • tuple(int) : Set x_bsz and y_idx_bsz individually.

  • x_seed (None | int) – Seed associated with x_dtld.

Returns:

Resolved y_idx if it was None.

Return type:

torch.Tensor
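
The three forms of batch_size described above can be sketched as follows. The None and tuple branches follow the parameter description. The int branch is purely hypothetical, because this page does not state how the budget is distributed: the rule used here (fill y_idx_bsz first, then give the remainder to x_bsz) is an assumption, as is the helper name resolve_batch_size.

```python
def resolve_batch_size(batch_size, n_y_idx):
    """Return (x_bsz, y_idx_bsz) for the three accepted forms of batch_size."""
    if batch_size is None:
        return 1, n_y_idx                 # documented default
    if isinstance(batch_size, tuple):
        x_bsz, y_idx_bsz = batch_size     # sizes set individually
        return x_bsz, y_idx_bsz
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    # Hypothetical split rule (not taken from igc): cover as many output
    # indices as the budget allows, then spend what is left on inputs.
    y_idx_bsz = min(n_y_idx, batch_size)
    x_bsz = max(1, batch_size // y_idx_bsz)
    return x_bsz, y_idx_bsz


default = resolve_batch_size(None, n_y_idx=10)
explicit = resolve_batch_size((4, 5), n_y_idx=10)
budget = resolve_batch_size(32, n_y_idx=10)
```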

add_data_naive(x, y_idx, batch_size, x_seed)

Set up a data manager dedicated to naive attribution methods (igc.naive.NaiveCorrelation and igc.naive.NaiveTTest).

Parameters:
  • x (None | int | ArrayLike | tuple(ArrayLike)) –

    • None : x_dtld iterates over the whole dataset.

    • int : Number of x inputs sampled from the dataset.

    • ArrayLike | tuple(ArrayLike) : Set new x used by x_dtld.

  • y_idx (None | int | ArrayLike) –

    • None : y_idx_dtld iterates over all output component indices y_idx.

    • int : Select a specific output component index y_idx.

    • ArrayLike : Select multiple output component indices y_idx.

  • batch_size (None | int | tuple(int)) –

    • None : Set x_bsz = 1 and y_idx_bsz = n_y_idx.

    • int : Total batch size budget automatically distributed between x_bsz and y_idx_bsz.

    • tuple(int) : Set x_bsz and y_idx_bsz individually.

  • x_seed (None | int) – Seed associated with x_dtld.

Returns:

Resolved y_idx if it was None.

Return type:

torch.Tensor

get_x_dtype(numpy=False)

Return the PyTorch data types of all inputs x.

Parameters:

numpy (bool) – If True, return NumPy data types instead.

Returns:

Data types of all inputs x.

Return type:

tuple(dtype)

update_x_0_dtld_seed()

Update the seed of the baseline dataloader after its initialization.

If the dataset associated with the dataloader has an attribute dtld.dataset.rng that holds a random number generator (torch.Generator), that generator is updated with the same seed.

Return type:

self