wdfkit package

Python package for WDF data treatment.

class wdfkit.CosmicRayRemover(spike_width: int = 5, spike_threshold: float = 3.5, spike_passes: int = 3, map_sensitivity: float = 0.01, map_disk_radius: int = 3, map_spike_width: int = 5, map_method: str = 'median', map_n_components: int = 3, spectral_dim: str | None = None)

Bases: object

Cosmic-ray removal with automatic routing by data dimensionality.

1D (single spectrum) — always uses the 1D medfilt + MAD engine controlled by spike_width, spike_threshold, spike_passes.

2D (line scan / series / point collection)

  • fewer than 20 spectra → 1D engine applied independently to each spectrum (no population statistics yet).

  • 20 or more spectra → collection engine: global median or PCA reconstruction as reference; map_method selects which.

3D (spatial map)

  • fewer than 20 spectra → same per-spectrum 1D path as above.

  • 20 or more → spatial disk-median engine (map_method="median", default) or PCA engine (map_method="pca"). The disk-median path additionally respects map_sensitivity and map_disk_radius.
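The routing rules above can be summarised as a small decision function. This is only a sketch for illustration: pick_engine, its return labels, and the min_population argument are hypothetical names, and it assumes that the spectral axis is last and that the number of spectra in a 3D map is the product of the two spatial axes.

```python
def pick_engine(shape, map_method="median", min_population=20):
    """Return a label for the engine the routing rules would select.

    `shape` is the array shape with the spectral axis last (assumption).
    """
    ndim = len(shape)
    if ndim == 1:
        return "1d"
    if ndim not in (2, 3):
        raise ValueError("expected 1D, 2D or 3D data")

    # Number of spectra = product of all non-spectral axes.
    n_spectra = 1
    for n in shape[:-1]:
        n_spectra *= n

    if n_spectra < min_population:
        return "loop-1d"  # 1D engine applied to each spectrum in turn
    if map_method == "pca":
        return "pca"
    if map_method != "median":
        raise ValueError(f"unknown map_method: {map_method!r}")
    # "median" means a global median for 2D and a spatial disk-median for 3D.
    return "collection-median" if ndim == 2 else "disk-median"
```

For example, a 4 × 4 map holds only 16 spectra, so under these assumptions it falls back to the per-spectrum path even though the data are 3D.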

Optionally removes broad Nd:YAG harmonics before spike removal via harmonic_check() / remove().

Parameters:
  • spike_width (int) – 1D engine — odd integer ≥ 3. Sets the medfilt window in spectral channels. Raise to 9–13 when cosmic rays span 7–10 channels; keep at 5 for narrow single-channel spikes.

  • spike_threshold (float) – 1D engine — positive float. Spike cutoff = spike_threshold × MAD_noise. Lower → more aggressive. Raise to 5–6 for very noisy spectra to avoid false positives.

  • spike_passes (int) – 1D engine — integer ≥ 1. Iterations of detect → repair. Each pass works on the already-repaired signal so that large spikes no longer mask smaller ones.

  • map_sensitivity (float) – 3D disk-median engine only — scales overall aggressiveness. Larger → more hits (default 0.01).

  • map_disk_radius (int) – 3D disk-median engine only — spatial disk radius for the reference median filter (pixels).

  • map_spike_width (int) – Collection / 3D engines: spectral dilation in channels added around each detected hit (integer ≥ 1). Increase for broader cosmic rays (e.g. 9–15 for multi-channel spikes). The repair region is capped at 2 × map_spike_width channels.

  • map_method (str) – "median" (default): global median spectrum as reference for 2D; spatial disk-median for 3D. "pca": PCA reconstruction as reference for both 2D and 3D.

  • map_n_components (int) – PCA path only — number of principal components for the reconstruction reference. 3–5 covers most real samples; increase for multi-phase or compositionally diverse maps.

  • spectral_dim (str | None) – Name of the spectral axis (default: last dimension).

harmonic_check(spectrum: DataArray) → DataArray

Notch broad harmonics when LaserWaveLength is ~355 nm (Nd:YAG).

If spectrum.attrs['LaserWaveLength'] is outside 354–356 nm, returns spectrum unchanged.

Searches 1064 / 532 / 355 / 266 nm (±2.5 nm); replaces ~1 nm around each found peak with linear interpolation.
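The notch step described above could look roughly like the following NumPy sketch. It is not the harmonic_check() implementation: notch_lines and its arguments are hypothetical, the wavelength axis is assumed to be in nm and ascending, and the sketch simply treats the strongest channel inside each search window as the peak, whereas the actual peak-detection criterion is not documented here.

```python
import numpy as np

def notch_lines(wavelength_nm, intensity,
                lines_nm=(1064.0, 532.0, 355.0, 266.0),
                search_nm=2.5, notch_nm=1.0):
    """Replace ~notch_nm around each line with linear interpolation."""
    wl = np.asarray(wavelength_nm, dtype=float)
    out = np.asarray(intensity, dtype=float).copy()
    for line in lines_nm:
        window = np.abs(wl - line) <= search_nm
        if not window.any():
            continue  # this line lies outside the recorded range
        idx = np.flatnonzero(window)
        # Assumption: the strongest channel in the window is the peak.
        peak = idx[np.argmax(out[idx])]
        bad = np.abs(wl - wl[peak]) <= notch_nm / 2
        good = ~bad
        # Bridge the notched channels from their untouched neighbours.
        out[bad] = np.interp(wl[bad], wl[good], out[good])
    return out
```

Channels outside the notched region are returned unchanged, which mirrors the "replace only around each found peak" behaviour described above.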

map_disk_radius: int = 3
map_method: str = 'median'
map_n_components: int = 3
map_sensitivity: float = 0.01
map_spike_width: int = 5
remove(spectrum: DataArray) → DataArray

Harmonic cleanup first, then cosmic-ray removal.

remove_cosmic_rays(spectrum: DataArray) → DataArray

Spike removal only (no harmonic notch).

remove_cosmic_rays_with_diagnostics(spectrum: DataArray) → tuple[DataArray, dict[str, Any]]

Like remove_cosmic_rays(), but also returns a diagnostics dict for visualization / QC (not written to DataArray.attrs).

Diagnostics keys depend on the engine used:

  • 1D: "cosmic_mask", "corrected_1d"

  • loop-1D (< 20 spectra, 2D/3D): "cosmic_masks"

  • collection (≥ 20 spectra, 2D or 3D PCA): "core_mask", "repair_mask", "residual", "reference", "noise_per_channel", "cutoff"

  • 3D disk-median: same as current map diagnostics ("core_mask", "repair_mask", "residual", "preprocessed", "spatial_median_reference", etc.)

remove_with_diagnostics(spectrum: DataArray) → tuple[DataArray, dict[str, Any]]

Harmonics, then remove_cosmic_rays_with_diagnostics().

spectral_dim: str | None = None
spike_passes: int = 3
spike_threshold: float = 3.5
spike_width: int = 5
transform(spectrum: DataArray) → DataArray

Alias of remove() (harmonics then cosmic rays).

class wdfkit.SpectraCleaner(method: CleanMethod = 'pca', n_components: NComponents = 'mle', subtract_min: bool = True, restore_min: bool = False, spectral_dim: str | None = None, pca_kwargs: dict[str, Any]=<factory>, per_spectrum: bool = False, smoother: SpectraSmoother | None = None)

Bases: object

Denoise a population of spectra by low-rank PCA reconstruction.

Multi-spectrum inputs (2D stacks (n_spectra, spectral) or 3D map cubes (y, x, spectral)): uses PCA to separate shared signal from per-channel noise.

1-D single spectrum (spectral,) or any input when per_spectrum=True: delegates to SpectraSmoother (Savitzky-Golay by default). Pass a pre-configured SpectraSmoother via the smoother parameter to change the method or its settings.
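To make the idea of low-rank PCA reconstruction concrete, here is a minimal NumPy sketch. It is an illustration rather than the SpectraCleaner code path: pca_denoise is a hypothetical helper, it uses a plain SVD instead of sklearn.decomposition.PCA, it takes a fixed integer number of components instead of "mle", and it skips the subtract_min / restore_min handling.

```python
import numpy as np

def pca_denoise(spectra, n_components):
    """Rebuild spectra from their first `n_components` principal components."""
    X = np.asarray(spectra, dtype=float)
    flat = X.reshape(-1, X.shape[-1])      # (n_spectra, n_channels)
    mean = flat.mean(axis=0)               # mean spectrum
    # SVD of the centred data; rows of Vt are the principal axes.
    U, s, Vt = np.linalg.svd(flat - mean, full_matrices=False)
    k = n_components
    coeffs = U[:, :k] * s[:k]              # per-spectrum scores
    recon = coeffs @ Vt[:k] + mean         # low-rank reconstruction
    return recon.reshape(X.shape)          # restore 2D or 3D layout
```

Signal shared across the population survives in the leading components, while noise that is uncorrelated between spectra is mostly spread over the discarded ones.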

Parameters:
  • method (CleanMethod) – PCA denoising method. Currently only "pca"; kept for forward compatibility.

  • n_components (NComponents) – Forwarded to sklearn.decomposition.PCA. "mle" (default), a float in (0, 1) for variance-explained, an int count, or None for min(n_spectra, n_spectral).

  • subtract_min (bool) – Subtract per-spectrum min before the PCA fit.

  • restore_min (bool) – Add the saved per-spectrum min back after reconstruction.

  • spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.

  • pca_kwargs (dict[str, Any]) – Extra kwargs forwarded to sklearn.decomposition.PCA.

  • per_spectrum (bool) – If True, bypass PCA and apply smoother independently to every spectrum regardless of input dimensionality. Useful when you want 1-D-style smoothing on a 2D/3D dataset.

  • smoother (SpectraSmoother | None) – A SpectraSmoother instance used for 1-D input and when per_spectrum=True. None (default) creates a SpectraSmoother() with Savitzky-Golay defaults.

clean(spectra: DataArray) → DataArray

Return a denoised copy of spectra (no decomposition payload).

clean_with_decomposition(spectra: DataArray) → tuple[DataArray, dict[str, Any]]

Like clean(), but also returns the PCA decomposition.

When the smoother path is taken (1-D input or per_spectrum=True), the returned payload is {} — no decomposition is available for per-spectrum filtering.

The PCA payload has keys components, coeffs, mean, explained_variance, explained_variance_ratio, noise_variance.

method: CleanMethod = 'pca'
n_components: NComponents = 'mle'
pca_kwargs: dict[str, Any]
per_spectrum: bool = False
restore_min: bool = False
smoother: SpectraSmoother | None = None
spectral_dim: str | None = None
subtract_min: bool = True
transform(spectra: DataArray) → DataArray

Alias of clean().

class wdfkit.SpectraSmoother(method: Literal['savgol', 'whittaker'] = 'savgol', window_length: int = 11, polyorder: int = 3, lam: float | None = None, d: int = 2, auto_lam_calls: int = 5, spectral_dim: str | None = None)

Bases: object

Per-spectrum 1-D smoothing for DataArrays of any shape.

Applies the chosen filter independently to every spectrum along the spectral axis. Suitable for 1-D single spectra, 2-D stacks (n_spectra, spectral), and 3-D map cubes (y, x, spectral).

Parameters:
  • method (Literal['savgol', 'whittaker']) – "savgol" (default) — Savitzky-Golay filter via scipy.signal.savgol_filter. "whittaker" — Whittaker-Eilers smoother (sparse linear system).

  • window_length (int) – Savitzky-Golay: number of channels in the filter window. Must be odd and >= polyorder + 2.

  • polyorder (int) – Savitzky-Golay: polynomial order (must be < window_length).

  • lam (float | None) – Whittaker-Eilers: smoothness penalty λ. None (default) triggers automatic selection via GCV minimisation (see auto_lam_calls).

  • d (int) – Whittaker-Eilers: difference order (default 2).

  • auto_lam_calls (int) – Maximum GCV evaluations when lam=None (default 5).

  • spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.
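For readers unfamiliar with the Whittaker-Eilers smoother, the sketch below shows the underlying linear system for a single spectrum: the smoothed signal z minimises |y − z|² + λ|D z|², where D is the d-th order difference operator. The helper whittaker_smooth is hypothetical; it uses a dense solve for brevity, whereas the parameter description above mentions a sparse system, and the automatic GCV selection of λ is not shown.

```python
import numpy as np

def whittaker_smooth(y, lam=100.0, d=2):
    """Solve (I + lam * D.T @ D) z = y for the smoothed signal z."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # d-th order difference operator, shape (n - d, n).
    D = np.diff(np.eye(n), n=d, axis=0)
    A = np.eye(n) + lam * (D.T @ D)
    return np.linalg.solve(A, y)
```

With d=2 a straight line has zero penalty, so linear trends pass through unchanged while larger lam values suppress curvature more strongly.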

auto_lam_calls: int = 5
d: int = 2
lam: float | None = None
method: Literal['savgol', 'whittaker'] = 'savgol'
polyorder: int = 3
smooth(spectrum: DataArray) → DataArray

Return a smoothed copy of spectrum.

Works on DataArrays of any shape. Each spectrum (row along the spectral axis) is smoothed independently.

spectral_dim: str | None = None
transform(spectrum: DataArray) → DataArray

Alias of smooth().

window_length: int = 11
class wdfkit.WDFReader(path: str | PathLike[str], *, verbose: bool = False, time_coord: str = 'seconds_elapsed', spectral_dim: str | None = None, chunks: bool | int = False)

Bases: object

Load and expose all parsed data from a Renishaw WiRE .wdf file.

Typical usage:

data_array, white_light_image = WDFReader(path)

After construction every block is accessible as a typed property. The xarray DataArray (shaped by scan type) is in .data; the PIL white-light image (if any) is in .image. Both are also yielded by unpacking the reader directly.
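One common way to let an object be unpacked into two values is to define __iter__ so that it yields exactly those values. The sketch below illustrates that pattern with a hypothetical stand-in class, ReaderLike; how WDFReader itself implements unpacking is not stated here, so treat this purely as an illustration of the usage shown above.

```python
from dataclasses import dataclass
from typing import Any, Iterator, Optional

@dataclass
class ReaderLike:
    """Hypothetical stand-in exposing `.data` and `.image` like a reader."""
    data: Any
    image: Optional[Any] = None

    def __iter__(self) -> Iterator[Any]:
        # Yield exactly two items so that `a, b = reader` works.
        yield self.data
        yield self.image

reader = ReaderLike(data=[1.0, 2.0, 3.0], image=None)
data_array, white_light_image = reader  # tuple-style unpacking
```

The attributes remain available on the object, so callers can choose between unpacking and explicit property access.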

Parameters:
  • spectral_dim – Override for the spectral-axis dimension name. None (default) auto-selects from the XLST units (e.g. RamanShift → "raman_shift").

  • chunks – Dask lazy reading: False (default, eager NumPy), True (auto-chunk ~128 MB), or int (target MB per chunk).

property acquisition: PSet | None

WXDA block parsed as a PSet (scan / acquisition properties).

property acquisition_time: datetime | None

Acquisition start time decoded from the ORGN Time entry.

property app_name: str

WiRE application name string.

property app_version: str

WiRE application version string.

property bkxl: BKXLInfo | None

BKXL block: background X list (mirrored spectral axis).

property calibration: PSet | None

WXCS block parsed as a PSet (calibration settings).

property comment: str | None

Free-text comment from the TEXT block.

property file_uuid: str

Unique file identifier from WDF1 header.

property has_whitelight: bool

True if a WHTL white-light image block is present.

property initial_coordinates: dict | None

Stage XYZ at acquisition time from WXIS.

Returns {"x_um", "y_um", "z_um", "x_str", "y_str", "z_str"} for every measurement type, including Single scans where ORGN carries no spatial origins.

property instrument_status: PSet | None

WXIS block parsed as a PSet (motor positions, instrument state).

property measurement_type: int

Raw measurement type integer (1=Single, 2=Series, 3=Map).

property motor_positions: dict | None

All motor positions from WXIS as {label: (µm, string)}.

property naccum: int

Accumulations per spectrum.

property ncollected: int

Actually collected number of spectra.

property nspectra: int

Planned number of spectra (capacity).

property orgn: list[OrgnEntry]

List of ORGN entries (spatial / time / flags per spectrum).

orgn_by_type(data_type: str) → OrgnEntry | None

Return the first ORGN entry matching data_type.

property raw_data: ndarray | None

Flat spectral array of shape (nspectra, xlist_length), float32.

property scan_type: int

Raw scan type integer.

property whtl_jpeg_bytes: bytes | None

Raw JPEG bytes from the WHTL block, or None if absent.

property wmap: WMAPInfo | None

WMAP block: grid geometry. None if not a map scan.

property xlist_length: int

Number of spectral channels per spectrum.

property xlst: XLSTInfo

XLST block: spectral axis values, data_type, units, dim_name.

property ylist_length: int

Detector Y-axis length (1 for point detectors).

property ylst: YLSTInfo | None

YLST block: detector-Y axis (None for point detectors).

property zeldac: PSet | None

ZLDC block parsed as a PSet (zero level & dark current).

wdfkit.catalog(directory: str | PathLike[str], recursive: bool = False) → Catalog

Scan directory for .wdf files and return a Catalog.

Uses header-only parsing (no spectra loaded) — fast even for large collections.

Parameters:
  • directory – Path to the directory to scan.

  • recursive – If True, walk subdirectories recursively.
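The directory-scanning half of this function can be pictured with the standard library alone. The sketch below only collects candidate paths; find_wdf_files is a hypothetical helper, and neither the header-only parsing nor the Catalog type is reproduced. It also assumes a lowercase .wdf suffix, which matters on case-sensitive file systems.

```python
from pathlib import Path

def find_wdf_files(directory, recursive=False):
    """Return sorted paths of .wdf files under `directory`."""
    root = Path(directory)
    # "**/" also matches the top-level directory itself.
    pattern = "**/*.wdf" if recursive else "*.wdf"
    return sorted(p for p in root.glob(pattern) if p.is_file())
```

Each returned path could then be passed to a header-only routine such as classify() to build a listing without loading spectra.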

wdfkit.classify(path: str | PathLike[str]) → dict

Return scan classification for a WiRE .wdf file without loading the spectral data.

Returns:

Keys: kind, measurement_type, scan_type, wmap_flag, nspectra, npoints, nsteps.

Return type:

dict

wdfkit.normalize(input_spectra: DataArray | ndarray, method: str = 'robust_scale', *, spectral_dim: str | None = None, **kwargs) → DataArray | ndarray

Scale spectra along the spectral axis.

For xarray.DataArray input, the spectral axis defaults to the last dimension (e.g. nm, raman_shift, shifts, …). Pass spectral_dim to select another dimension when spectra are not last.

Dask-backed DataArrays are handled transparently:

  • Per-spectrum methods ("l1", "l2", "max", "min_max", "area"): processed chunk-by-chunk via xr.apply_ufunc — no data is loaded into RAM beyond the current chunk.

  • Global methods ("robust_scale", "wave_number"): require statistics across all spectra; the full array is computed first. A UserWarning is emitted so you know RAM is being used.

Parameters:
  • input_spectra – DataArray or 2D ndarray of shape (n_spectra, n_points).

  • method – One of "l1", "l2", "max", "min_max", "wave_number", "robust_scale", "area".

  • spectral_dim – Spectral dimension name when input_spectra is a DataArray.

  • x_values – Spectral abscissa for ndarray input (default arange(n_points)).

Returns:

Same type as input_spectra; for DataArray output, attrs["treatments"] is updated.
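As an illustration of the per-spectrum methods, the sketch below applies the conventional definitions of "l1", "l2", "max", "min_max" and "area" along the last axis of a NumPy array. The exact formulas used by normalize() are not spelled out in this reference, so these definitions are assumptions, normalize_rows is a hypothetical helper, and no guard against division by zero is included.

```python
import numpy as np

def normalize_rows(spectra, method="l2", x_values=None):
    """Scale each spectrum (last axis) independently."""
    X = np.asarray(spectra, dtype=float)
    if method == "l1":
        scale = np.abs(X).sum(axis=-1, keepdims=True)
    elif method == "l2":
        scale = np.sqrt((X ** 2).sum(axis=-1, keepdims=True))
    elif method == "max":
        scale = X.max(axis=-1, keepdims=True)
    elif method == "min_max":
        lo = X.min(axis=-1, keepdims=True)
        hi = X.max(axis=-1, keepdims=True)
        return (X - lo) / (hi - lo)        # map each spectrum onto [0, 1]
    elif method == "area":
        n = X.shape[-1]
        x = np.arange(n) if x_values is None else np.asarray(x_values, dtype=float)
        # Trapezoid rule written out explicitly.
        traps = (X[..., 1:] + X[..., :-1]) * np.diff(x) / 2
        scale = traps.sum(axis=-1, keepdims=True)
    else:
        raise ValueError(f"unsupported method: {method!r}")
    return X / scale
```

Because every operation reduces along the last axis only, the same function works for a single spectrum, a 2D stack, or a 3D cube, which is also what makes chunk-by-chunk processing possible for these methods.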

wdfkit.read(path: str | PathLike[str], *, verbose: bool = False, spectral_dim: str | None = None, chunks: bool | int = False) → DataArray

Read a WiRE .wdf file and return an xarray.DataArray.

Parameters:
  • path – Path to the .wdf file.

  • spectral_dim – Override for the spectral-axis dimension name.

  • chunks – Dask chunking: False (eager), True (auto), or int (target MB).

Returns:

Shape and dims depend on scan kind; spectral axis is always last.

Return type:

xarray.DataArray

wdfkit.remove_cosmic_rays_1d(y: ndarray, *, kernel_size: int = 5, threshold: float = 5.0, max_passes: int = 3) → tuple[ndarray, ndarray]

Remove sharp positive spikes from one 1D spectrum.

Uses a scipy.signal.medfilt reference and MAD-based noise estimation. Operates on the raw counts / intensity array (only masked indices change).

The algorithm runs up to max_passes iterations. Each pass:

  1. Detects new spikes on the current (already-repaired) signal.

  2. Dilates the new spike mask by 1 channel on each side to catch sub-threshold spike edges.

  3. Accumulates into a single cumulative mask across all passes.

  4. Repairs by linear interpolation from the original signal at all cumulative masked positions — avoids chaining interpolation errors.

Early termination when a pass finds no new spikes.
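The passes described above can be sketched in NumPy as follows. This is an independent illustration, not the library's code: despike_1d and median_filter_1d are hypothetical helpers, the median filter pads with edge values (scipy.signal.medfilt pads with zeros), and the noise level is assumed to be 1.4826 × MAD of the residual against the median-filtered reference.

```python
import numpy as np

def median_filter_1d(y, kernel_size):
    """Running median with edge padding (odd kernel_size)."""
    pad = kernel_size // 2
    padded = np.pad(y, pad, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_size)
    return np.median(windows, axis=-1)

def despike_1d(y, kernel_size=5, threshold=5.0, max_passes=3):
    original = np.asarray(y, dtype=float)
    current = original.copy()
    mask = np.zeros(original.shape, dtype=bool)   # cumulative spike mask
    idx = np.arange(original.size)
    for _ in range(max_passes):
        residual = current - median_filter_1d(current, kernel_size)
        # MAD-based noise estimate, scaled to a Gaussian sigma (assumption).
        noise = 1.4826 * np.median(np.abs(residual - np.median(residual)))
        if noise <= 0:
            break                                 # degenerate noise estimate
        new = (residual > threshold * noise) & ~mask
        if not new.any():
            break                                 # early termination
        grown = new.copy()                        # dilate by one channel per side
        grown[1:] |= new[:-1]
        grown[:-1] |= new[1:]
        mask |= grown
        if mask.all():
            break                                 # nothing left to interpolate from
        # Repair from the ORIGINAL signal at all masked positions.
        current = original.copy()
        current[mask] = np.interp(idx[mask], idx[~mask], original[~mask])
    return current, mask
```

Rebuilding the repaired signal from the original samples on every pass is what keeps interpolation errors from compounding, since no interpolated value is ever used as an anchor for a later repair.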

Parameters:
  • y – One spectral trace (any numeric dtype; cast to float).

  • kernel_size – Odd length >= 3 for medfilt. Increase for broader spikes (e.g. 9–13 for 7–10 channel-wide cosmic rays).

  • threshold – Multiplier on MAD-derived noise (larger → fewer detections).

  • max_passes – Maximum number of detection–repair iterations (default 3).

Returns:

  • corrected_y – Same shape as y; unchanged if no spikes found or noise degenerate.

  • cosmic_mask – Boolean mask, same shape as y; True at all corrected channels.

Submodules

wdfkit.reader

Public WDFReader API plus module-level read() and classify().

class wdfkit.reader.WDFReader(path: str | PathLike[str], *, verbose: bool = False, time_coord: str = 'seconds_elapsed', spectral_dim: str | None = None, chunks: bool | int = False)

Bases: object

Load and expose all parsed data from a Renishaw WiRE .wdf file.

Typical usage:

data_array, white_light_image = WDFReader(path)

After construction every block is accessible as a typed property. The xarray DataArray (shaped by scan type) is in .data; the PIL white-light image (if any) is in .image. Both are also yielded by unpacking the reader directly.

Parameters:
  • spectral_dim – Override for the spectral-axis dimension name. None (default) auto-selects from the XLST units (e.g. RamanShift → "raman_shift").

  • chunks – Dask lazy reading: False (default, eager NumPy), True (auto-chunk ~128 MB), or int (target MB per chunk).

property acquisition: PSet | None

WXDA block parsed as a PSet (scan / acquisition properties).

property acquisition_time: datetime | None

Acquisition start time decoded from the ORGN Time entry.

property app_name: str

WiRE application name string.

property app_version: str

WiRE application version string.

property bkxl: BKXLInfo | None

BKXL block: background X list (mirrored spectral axis).

property calibration: PSet | None

WXCS block parsed as a PSet (calibration settings).

property comment: str | None

Free-text comment from the TEXT block.

property file_uuid: str

Unique file identifier from WDF1 header.

property has_whitelight: bool

True if a WHTL white-light image block is present.

property initial_coordinates: dict | None

Stage XYZ at acquisition time from WXIS.

Returns {"x_um", "y_um", "z_um", "x_str", "y_str", "z_str"} for every measurement type, including Single scans where ORGN carries no spatial origins.

property instrument_status: PSet | None

WXIS block parsed as a PSet (motor positions, instrument state).

property measurement_type: int

Raw measurement type integer (1=Single, 2=Series, 3=Map).

property motor_positions: dict | None

All motor positions from WXIS as {label: (µm, string)}.

property naccum: int

Accumulations per spectrum.

property ncollected: int

Actually collected number of spectra.

property nspectra: int

Planned number of spectra (capacity).

property orgn: list[OrgnEntry]

List of ORGN entries (spatial / time / flags per spectrum).

orgn_by_type(data_type: str) → OrgnEntry | None

Return the first ORGN entry matching data_type.

property raw_data: ndarray | None

Flat spectral array of shape (nspectra, xlist_length), float32.

property scan_type: int

Raw scan type integer.

property whtl_jpeg_bytes: bytes | None

Raw JPEG bytes from the WHTL block, or None if absent.

property wmap: WMAPInfo | None

WMAP block: grid geometry. None if not a map scan.

property xlist_length: int

Number of spectral channels per spectrum.

property xlst: XLSTInfo

XLST block: spectral axis values, data_type, units, dim_name.

property ylist_length: int

Detector Y-axis length (1 for point detectors).

property ylst: YLSTInfo | None

YLST block: detector-Y axis (None for point detectors).

property zeldac: PSet | None

ZLDC block parsed as a PSet (zero level & dark current).

wdfkit.reader.classify(path: str | PathLike[str]) → dict

Return scan classification for a WiRE .wdf file without loading the spectral data.

Returns:

Keys: kind, measurement_type, scan_type, wmap_flag, nspectra, npoints, nsteps.

Return type:

dict

wdfkit.reader.read(path: str | PathLike[str], *, verbose: bool = False, spectral_dim: str | None = None, chunks: bool | int = False) → DataArray

Read a WiRE .wdf file and return an xarray.DataArray.

Parameters:
  • path – Path to the .wdf file.

  • spectral_dim – Override for the spectral-axis dimension name.

  • chunks – Dask chunking: False (eager), True (auto), or int (target MB).

Returns:

Shape and dims depend on scan kind; spectral axis is always last.

Return type:

xarray.DataArray

wdfkit.cosmic_ray

Cosmic-ray removal: CosmicRayRemover and helpers.

class wdfkit.cosmic_ray.CosmicRayRemover(spike_width: int = 5, spike_threshold: float = 3.5, spike_passes: int = 3, map_sensitivity: float = 0.01, map_disk_radius: int = 3, map_spike_width: int = 5, map_method: str = 'median', map_n_components: int = 3, spectral_dim: str | None = None)

Bases: object

Cosmic-ray removal with automatic routing by data dimensionality.

1D (single spectrum) — always uses the 1D medfilt + MAD engine controlled by spike_width, spike_threshold, spike_passes.

2D (line scan / series / point collection)

  • fewer than 20 spectra → 1D engine applied independently to each spectrum (no population statistics yet).

  • 20 or more spectra → collection engine: global median or PCA reconstruction as reference; map_method selects which.

3D (spatial map)

  • fewer than 20 spectra → same per-spectrum 1D path as above.

  • 20 or more → spatial disk-median engine (map_method="median", default) or PCA engine (map_method="pca"). The disk-median path additionally respects map_sensitivity and map_disk_radius.

Optionally removes broad Nd:YAG harmonics before spike removal via harmonic_check() / remove().

Parameters:
  • spike_width (int) – 1D engine — odd integer ≥ 3. Sets the medfilt window in spectral channels. Raise to 9–13 when cosmic rays span 7–10 channels; keep at 5 for narrow single-channel spikes.

  • spike_threshold (float) – 1D engine — positive float. Spike cutoff = spike_threshold × MAD_noise. Lower → more aggressive. Raise to 5–6 for very noisy spectra to avoid false positives.

  • spike_passes (int) – 1D engine — integer ≥ 1. Iterations of detect → repair. Each pass works on the already-repaired signal so that large spikes no longer mask smaller ones.

  • map_sensitivity (float) – 3D disk-median engine only — scales overall aggressiveness. Larger → more hits (default 0.01).

  • map_disk_radius (int) – 3D disk-median engine only — spatial disk radius for the reference median filter (pixels).

  • map_spike_width (int) – Collection / 3D engines: spectral dilation in channels added around each detected hit (integer ≥ 1). Increase for broader cosmic rays (e.g. 9–15 for multi-channel spikes). The repair region is capped at 2 × map_spike_width channels.

  • map_method (str) – "median" (default): global median spectrum as reference for 2D; spatial disk-median for 3D. "pca": PCA reconstruction as reference for both 2D and 3D.

  • map_n_components (int) – PCA path only — number of principal components for the reconstruction reference. 3–5 covers most real samples; increase for multi-phase or compositionally diverse maps.

  • spectral_dim (str | None) – Name of the spectral axis (default: last dimension).

harmonic_check(spectrum: DataArray) → DataArray

Notch broad harmonics when LaserWaveLength is ~355 nm (Nd:YAG).

If spectrum.attrs['LaserWaveLength'] is outside 354–356 nm, returns spectrum unchanged.

Searches 1064 / 532 / 355 / 266 nm (±2.5 nm); replaces ~1 nm around each found peak with linear interpolation.

map_disk_radius: int = 3
map_method: str = 'median'
map_n_components: int = 3
map_sensitivity: float = 0.01
map_spike_width: int = 5
remove(spectrum: DataArray) → DataArray

Harmonic cleanup first, then cosmic-ray removal.

remove_cosmic_rays(spectrum: DataArray) → DataArray

Spike removal only (no harmonic notch).

remove_cosmic_rays_with_diagnostics(spectrum: DataArray) → tuple[DataArray, dict[str, Any]]

Like remove_cosmic_rays(), but also returns a diagnostics dict for visualization / QC (not written to DataArray.attrs).

Diagnostics keys depend on the engine used:

  • 1D: "cosmic_mask", "corrected_1d"

  • loop-1D (< 20 spectra, 2D/3D): "cosmic_masks"

  • collection (≥ 20 spectra, 2D or 3D PCA): "core_mask", "repair_mask", "residual", "reference", "noise_per_channel", "cutoff"

  • 3D disk-median: same as current map diagnostics ("core_mask", "repair_mask", "residual", "preprocessed", "spatial_median_reference", etc.)

remove_with_diagnostics(spectrum: DataArray) → tuple[DataArray, dict[str, Any]]

Harmonics, then remove_cosmic_rays_with_diagnostics().

spectral_dim: str | None = None
spike_passes: int = 3
spike_threshold: float = 3.5
spike_width: int = 5
transform(spectrum: DataArray) → DataArray

Alias of remove() (harmonics then cosmic rays).

wdfkit.cosmic_ray.remove_cosmic_rays_1d(y: ndarray, *, kernel_size: int = 5, threshold: float = 5.0, max_passes: int = 3) → tuple[ndarray, ndarray]

Remove sharp positive spikes from one 1D spectrum.

Uses a scipy.signal.medfilt reference and MAD-based noise estimation. Operates on the raw counts / intensity array (only masked indices change).

The algorithm runs up to max_passes iterations. Each pass:

  1. Detects new spikes on the current (already-repaired) signal.

  2. Dilates the new spike mask by 1 channel on each side to catch sub-threshold spike edges.

  3. Accumulates into a single cumulative mask across all passes.

  4. Repairs by linear interpolation from the original signal at all cumulative masked positions — avoids chaining interpolation errors.

Early termination when a pass finds no new spikes.

Parameters:
  • y – One spectral trace (any numeric dtype; cast to float).

  • kernel_size – Odd length >= 3 for medfilt. Increase for broader spikes (e.g. 9–13 for 7–10 channel-wide cosmic rays).

  • threshold – Multiplier on MAD-derived noise (larger → fewer detections).

  • max_passes – Maximum number of detection–repair iterations (default 3).

Returns:

  • corrected_y – Same shape as y; unchanged if no spikes found or noise degenerate.

  • cosmic_mask – Boolean mask, same shape as y; True at all corrected channels.

wdfkit.spectra_cleaner

Spectral denoising: SpectraCleaner.

class wdfkit.spectra_cleaner.SpectraCleaner(method: CleanMethod = 'pca', n_components: NComponents = 'mle', subtract_min: bool = True, restore_min: bool = False, spectral_dim: str | None = None, pca_kwargs: dict[str, Any]=<factory>, per_spectrum: bool = False, smoother: SpectraSmoother | None = None)

Bases: object

Denoise a population of spectra by low-rank PCA reconstruction.

Multi-spectrum inputs (2D stacks (n_spectra, spectral) or 3D map cubes (y, x, spectral)): uses PCA to separate shared signal from per-channel noise.

1-D single spectrum (spectral,) or any input when per_spectrum=True: delegates to SpectraSmoother (Savitzky-Golay by default). Pass a pre-configured SpectraSmoother via the smoother parameter to change the method or its settings.

Parameters:
  • method (CleanMethod) – PCA denoising method. Currently only "pca"; kept for forward compatibility.

  • n_components (NComponents) – Forwarded to sklearn.decomposition.PCA. "mle" (default), a float in (0, 1) for variance-explained, an int count, or None for min(n_spectra, n_spectral).

  • subtract_min (bool) – Subtract per-spectrum min before the PCA fit.

  • restore_min (bool) – Add the saved per-spectrum min back after reconstruction.

  • spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.

  • pca_kwargs (dict[str, Any]) – Extra kwargs forwarded to sklearn.decomposition.PCA.

  • per_spectrum (bool) – If True, bypass PCA and apply smoother independently to every spectrum regardless of input dimensionality. Useful when you want 1-D-style smoothing on a 2D/3D dataset.

  • smoother (SpectraSmoother | None) – A SpectraSmoother instance used for 1-D input and when per_spectrum=True. None (default) creates a SpectraSmoother() with Savitzky-Golay defaults.

clean(spectra: DataArray) → DataArray

Return a denoised copy of spectra (no decomposition payload).

clean_with_decomposition(spectra: DataArray) → tuple[DataArray, dict[str, Any]]

Like clean(), but also returns the PCA decomposition.

When the smoother path is taken (1-D input or per_spectrum=True), the returned payload is {} — no decomposition is available for per-spectrum filtering.

The PCA payload has keys components, coeffs, mean, explained_variance, explained_variance_ratio, noise_variance.

method: CleanMethod = 'pca'
n_components: NComponents = 'mle'
pca_kwargs: dict[str, Any]
per_spectrum: bool = False
restore_min: bool = False
smoother: SpectraSmoother | None = None
spectral_dim: str | None = None
subtract_min: bool = True
transform(spectra: DataArray) → DataArray

Alias of clean().

wdfkit.spectra_smoother

Per-spectrum smoothing: SpectraSmoother.

class wdfkit.spectra_smoother.SpectraSmoother(method: Literal['savgol', 'whittaker'] = 'savgol', window_length: int = 11, polyorder: int = 3, lam: float | None = None, d: int = 2, auto_lam_calls: int = 5, spectral_dim: str | None = None)

Bases: object

Per-spectrum 1-D smoothing for DataArrays of any shape.

Applies the chosen filter independently to every spectrum along the spectral axis. Suitable for 1-D single spectra, 2-D stacks (n_spectra, spectral), and 3-D map cubes (y, x, spectral).

Parameters:
  • method (Literal['savgol', 'whittaker']) – "savgol" (default) — Savitzky-Golay filter via scipy.signal.savgol_filter. "whittaker" — Whittaker-Eilers smoother (sparse linear system).

  • window_length (int) – Savitzky-Golay: number of channels in the filter window. Must be odd and >= polyorder + 2.

  • polyorder (int) – Savitzky-Golay: polynomial order (must be < window_length).

  • lam (float | None) – Whittaker-Eilers: smoothness penalty λ. None (default) triggers automatic selection via GCV minimisation (see auto_lam_calls).

  • d (int) – Whittaker-Eilers: difference order (default 2).

  • auto_lam_calls (int) – Maximum GCV evaluations when lam=None (default 5).

  • spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.

auto_lam_calls: int = 5
d: int = 2
lam: float | None = None
method: Literal['savgol', 'whittaker'] = 'savgol'
polyorder: int = 3
smooth(spectrum: DataArray) → DataArray

Return a smoothed copy of spectrum.

Works on DataArrays of any shape. Each spectrum (row along the spectral axis) is smoothed independently.

spectral_dim: str | None = None
transform(spectrum: DataArray) → DataArray

Alias of smooth().

window_length: int = 11