wdfkit package
Python package for WDF data treatment.
- class wdfkit.CosmicRayRemover(spike_width: int = 5, spike_threshold: float = 3.5, spike_passes: int = 3, map_sensitivity: float = 0.01, map_disk_radius: int = 3, map_spike_width: int = 5, map_method: str = 'median', map_n_components: int = 3, spectral_dim: str | None = None)
Bases: object
Cosmic-ray removal with automatic routing by data dimensionality.
1D (single spectrum): always uses the 1D medfilt + MAD engine controlled by spike_width, spike_threshold, spike_passes.
2D (line scan / series / point collection):
fewer than 20 spectra → 1D engine applied independently to each spectrum (no population statistics yet).
20 or more spectra → collection engine: global median or PCA reconstruction as reference; map_method selects which.
3D (spatial map):
fewer than 20 spectra → same per-spectrum 1D path as above.
20 or more spectra → spatial disk-median engine (map_method="median", default) or PCA engine (map_method="pca"). The disk-median path additionally respects map_sensitivity and map_disk_radius.
Optionally removes broad Nd:YAG harmonics before spike removal via harmonic_check() / remove().
- Parameters:
spike_width (int) – 1D engine: odd integer ≥ 3. Sets the medfilt window in spectral channels. Raise to 9–13 when cosmic rays span 7–10 channels; keep at 5 for narrow single-channel spikes.
spike_threshold (float) – 1D engine: positive float. Spike cutoff = spike_threshold × MAD_noise. Lower → more aggressive. Raise to 5–6 for very noisy spectra to avoid false positives.
spike_passes (int) – 1D engine: integer ≥ 1. Iterations of detect → repair. Each pass works on the already-repaired signal so that large spikes no longer mask smaller ones.
map_sensitivity (float) – 3D disk-median engine only: scales overall aggressiveness. Larger → more hits (default 0.01).
map_disk_radius (int) – 3D disk-median engine only: spatial disk radius for the reference median filter (pixels).
map_spike_width (int) – Collection / 3D engines: spectral dilation in channels added around each detected hit (integer ≥ 1). Increase for broader cosmic rays (e.g. 9–15 for multi-channel spikes). The repair region is capped at 2 × map_spike_width channels.
map_method (str) – "median" (default): global median spectrum as reference for 2D; spatial disk-median for 3D. "pca": PCA reconstruction as reference for both 2D and 3D.
map_n_components (int) – PCA path only: number of principal components for the reconstruction reference. 3–5 covers most real samples; increase for multi-phase or compositionally diverse maps.
spectral_dim (str | None) – Name of the spectral axis (default: last dimension).
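The routing rules above can be condensed into a small decision function. This is a minimal sketch and not wdfkit's implementation: `choose_engine`, its return labels, and the `min_population` argument are hypothetical names, and the spectral axis is assumed to be the last one.

```python
def choose_engine(shape, map_method="median", min_population=20):
    """Return a label for the engine the routing rules would select.

    Hypothetical helper: the labels are illustrative, not wdfkit identifiers.
    """
    ndim = len(shape)
    if ndim == 1:
        return "1d"
    if ndim not in (2, 3):
        raise ValueError("expected 1D, 2D, or 3D data")
    if map_method not in ("median", "pca"):
        raise ValueError("map_method must be 'median' or 'pca'")

    # Spectra count = product of all non-spectral axes (spectral axis assumed last).
    n_spectra = 1
    for size in shape[:-1]:
        n_spectra *= size

    if n_spectra < min_population:
        return "loop-1d"  # too few spectra: 1D engine on each spectrum
    if map_method == "pca":
        return "pca"  # PCA reference for both 2D and 3D
    return "collection-median" if ndim == 2 else "disk-median"


print(choose_engine((1015,)))
print(choose_engine((10, 1015)))
print(choose_engine((12, 12, 1015)))
```

Note that a 3D map can still take the per-spectrum path: a 3 × 4 map holds only 12 spectra, which is below the threshold of 20.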
- harmonic_check(spectrum: DataArray) → DataArray
Notch broad harmonics when LaserWaveLength is ~355 nm (Nd:YAG).
If spectrum.attrs['LaserWaveLength'] is outside 354–356 nm, returns spectrum unchanged.
Searches 1064 / 532 / 355 / 266 nm (±2.5 nm); replaces ~1 nm around each found peak with linear interpolation.
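The notch step can be illustrated on plain NumPy arrays. The sketch below is not the library's code: `notch_harmonics_sketch` and its arguments are hypothetical, the wavelength axis is assumed to be strictly increasing, and the highest point in each search window is simply taken as the peak. The real method acts only on peaks it actually finds; its detection criterion is not documented here.

```python
import numpy as np

# Candidate harmonic wavelengths (nm) listed in the description above.
HARMONICS_NM = (1064.0, 532.0, 355.0, 266.0)


def notch_harmonics_sketch(wavelength_nm, intensity, laser_nm,
                           search_half_width=2.5, notch_half_width=0.5):
    """Hypothetical stand-in for the notch step on NumPy arrays."""
    x = np.asarray(wavelength_nm, dtype=float)  # assumed strictly increasing
    y = np.asarray(intensity, dtype=float).copy()
    if not 354.0 <= laser_nm <= 356.0:
        return y  # other lasers: spectrum is returned unchanged
    for center in HARMONICS_NM:
        window = np.abs(x - center) <= search_half_width
        if not window.any():
            continue  # harmonic lies outside the measured range
        # Assumption: the window maximum is the harmonic peak.
        peak_x = x[window][np.argmax(y[window])]
        notch = np.abs(x - peak_x) <= notch_half_width  # about 1 nm wide
        keep = ~notch
        # Bridge the notched channels linearly from their neighbours.
        y[notch] = np.interp(x[notch], x[keep], y[keep])
    return y


x = np.linspace(520.0, 545.0, 251)                         # 0.1 nm steps
y = 1.0 + 100.0 * np.exp(-0.5 * ((x - 532.0) / 0.1) ** 2)  # narrow 532 nm line
cleaned = notch_harmonics_sketch(x, y, laser_nm=355.0)
untouched = notch_harmonics_sketch(x, y, laser_nm=785.0)
```

With a 355 nm laser the synthetic 532 nm line is flattened to the baseline, while the same data tagged with a 785 nm laser passes through unchanged.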
- remove_cosmic_rays_with_diagnostics(spectrum: DataArray) → tuple[DataArray, dict[str, Any]]
Like remove_cosmic_rays(), but also returns a diagnostics dict for visualization / QC (not written to DataArray.attrs).
Diagnostics keys depend on the engine used:
1D: "cosmic_mask", "corrected_1d"
loop-1D (< 20 spectra, 2D/3D): "cosmic_masks"
collection (≥ 20 spectra, 2D or 3D PCA): "core_mask", "repair_mask", "residual", "reference", "noise_per_channel", "cutoff"
3D disk-median: same as current map diagnostics ("core_mask", "repair_mask", "residual", "preprocessed", "spatial_median_reference", etc.)
- class wdfkit.SpectraCleaner(method: CleanMethod = 'pca', n_components: NComponents = 'mle', subtract_min: bool = True, restore_min: bool = False, spectral_dim: str | None = None, pca_kwargs: dict[str, Any] = <factory>, per_spectrum: bool = False, smoother: SpectraSmoother | None = None)
Bases: object
Denoise a population of spectra by low-rank PCA reconstruction.
Multi-spectrum inputs (2D stacks (n_spectra, spectral) or 3D map cubes (y, x, spectral)): uses PCA to separate shared signal from per-channel noise.
1-D single spectrum (spectral,) or any input when per_spectrum=True: delegates to SpectraSmoother (Savitzky-Golay by default). Pass a pre-configured SpectraSmoother via the smoother parameter to change the method or its settings.
- Parameters:
method (CleanMethod) – PCA denoising method. Currently only "pca"; kept for forward compatibility.
n_components (NComponents) – Forwarded to sklearn.decomposition.PCA. "mle" (default), a float in (0, 1) for variance-explained, an int count, or None for min(n_spectra, n_spectral).
subtract_min (bool) – Subtract per-spectrum min before the PCA fit.
restore_min (bool) – Add the saved per-spectrum min back after reconstruction.
spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.
pca_kwargs (dict[str, Any]) – Extra kwargs forwarded to sklearn.decomposition.PCA.
per_spectrum (bool) – If True, bypass PCA and apply smoother independently to every spectrum regardless of input dimensionality. Useful when you want 1-D-style smoothing on a 2D/3D dataset.
smoother (SpectraSmoother | None) – A SpectraSmoother instance used for 1-D input and when per_spectrum=True. None (default) creates a SpectraSmoother() with Savitzky-Golay defaults.
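To make the idea of low-rank PCA reconstruction concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not SpectraCleaner's code: `pca_denoise_sketch` is a hypothetical name, it uses a plain SVD instead of sklearn.decomposition.PCA, it accepts only an integer component count (no "mle"), and it returns only three of the payload keys.

```python
import numpy as np


def pca_denoise_sketch(spectra, n_components, subtract_min=True, restore_min=False):
    """Hypothetical low-rank reconstruction of a (n_spectra, n_channels) stack."""
    X = np.asarray(spectra, dtype=float)
    if subtract_min:
        offsets = X.min(axis=1, keepdims=True)  # per-spectrum minimum
    else:
        offsets = np.zeros((X.shape[0], 1))
    X = X - offsets

    mean = X.mean(axis=0)
    # Rows of Vt are the principal axes of the mean-centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    components = Vt[:n_components]
    coeffs = (X - mean) @ components.T      # scores of each spectrum
    denoised = coeffs @ components + mean   # rank-limited reconstruction
    if restore_min:
        denoised = denoised + offsets
    return denoised, {"components": components, "coeffs": coeffs, "mean": mean}


rng = np.random.default_rng(1)
channels = np.arange(100)
peak = np.exp(-0.5 * ((channels - 50) / 5.0) ** 2)
clean = np.linspace(1.0, 3.0, 40)[:, None] * peak  # one shared band, varying height
noisy = clean + rng.normal(0.0, 0.1, clean.shape)

denoised, payload = pca_denoise_sketch(noisy, n_components=1, subtract_min=False)
rms_before = np.sqrt(np.mean((noisy - clean) ** 2))
rms_after = np.sqrt(np.mean((denoised - clean) ** 2))
```

Because every spectrum in this synthetic stack shares one band shape, a single component captures the signal and most of the per-channel noise is discarded. Keeping all components reproduces the input, which is why the component count matters.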
- clean(spectra: DataArray) → DataArray
Return a denoised copy of spectra (no decomposition payload).
- clean_with_decomposition(spectra: DataArray) → tuple[DataArray, dict[str, Any]]
Like clean(), but also returns the PCA decomposition.
When the smoother path is taken (1-D input or per_spectrum=True), the returned payload is {}: no decomposition is available for per-spectrum filtering.
The PCA payload has keys components, coeffs, mean, explained_variance, explained_variance_ratio, noise_variance.
- method: CleanMethod = 'pca'
- n_components: NComponents = 'mle'
- smoother: SpectraSmoother | None = None
- class wdfkit.SpectraSmoother(method: Literal['savgol', 'whittaker'] = 'savgol', window_length: int = 11, polyorder: int = 3, lam: float | None = None, d: int = 2, auto_lam_calls: int = 5, spectral_dim: str | None = None)
Bases: object
Per-spectrum 1-D smoothing for DataArrays of any shape.
Applies the chosen filter independently to every spectrum along the spectral axis. Suitable for 1-D single spectra, 2-D stacks (n_spectra, spectral), and 3-D map cubes (y, x, spectral).
- Parameters:
method (Literal['savgol', 'whittaker']) – "savgol" (default): Savitzky-Golay filter via scipy.signal.savgol_filter. "whittaker": Whittaker-Eilers smoother (sparse linear system).
window_length (int) – Savitzky-Golay: number of channels in the filter window. Must be odd and >= polyorder + 2.
polyorder (int) – Savitzky-Golay: polynomial order (must be < window_length).
lam (float | None) – Whittaker-Eilers: smoothness penalty λ. None (default) triggers automatic selection via GCV minimisation (see auto_lam_calls).
d (int) – Whittaker-Eilers: difference order (default 2).
auto_lam_calls (int) – Maximum GCV evaluations when lam=None (default 5).
spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.
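The roles of lam and d in the Whittaker-Eilers option can be seen in a small dense sketch, which solves (I + λ DᵀD) z = y with D the d-th order difference operator. This is not the library's implementation: `whittaker_sketch` is a hypothetical name, it uses a dense solve rather than a sparse system, and λ is fixed by hand instead of being chosen by GCV.

```python
import numpy as np


def whittaker_sketch(y, lam=100.0, d=2):
    """Hypothetical dense Whittaker-Eilers smoother for one spectrum."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # d-th order difference operator, shape (n - d, n).
    D = np.diff(np.eye(n), n=d, axis=0)
    # Minimises |y - z|^2 + lam * |D z|^2.
    return np.linalg.solve(np.eye(n) + lam * D.T @ D, y)


def roughness(z):
    """Sum of squared second differences, the quantity penalised when d=2."""
    return float(np.sum(np.diff(z, n=2) ** 2))


rng = np.random.default_rng(2)
channels = np.arange(60)
line = 2.0 + 0.5 * channels
noisy = np.sin(channels / 8.0) + rng.normal(0.0, 0.2, channels.size)
smoothed = whittaker_sketch(noisy, lam=50.0)
```

With d=2 a straight line has zero penalty and passes through unchanged, while noisy input comes out with lower roughness. Larger lam pushes the result further toward a straight line.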
- class wdfkit.WDFReader(path: str | PathLike[str], *, verbose: bool = False, time_coord: str = 'seconds_elapsed', spectral_dim: str | None = None, chunks: bool | int = False)
Bases: object
Load and expose all parsed data from a Renishaw WiRE .wdf file.
Typical usage:
data_array, white_light_image = WDFReader(path)
After construction every block is accessible as a typed property. The xarray DataArray (shaped by scan type) is in .data; the PIL white-light image (if any) is in .image. Both are also yielded by unpacking the reader directly.
- Parameters:
spectral_dim – Override for the spectral-axis dimension name. None (default) auto-selects from the XLST units (e.g. RamanShift → "raman_shift").
chunks – Dask lazy reading: False (default, eager NumPy), True (auto-chunk ~128 MB), or int (target MB per chunk).
- property acquisition_time: datetime | None
Acquisition start time decoded from the ORGN Time entry.
- property initial_coordinates: dict | None
Stage XYZ at acquisition time from WXIS.
Returns {"x_um", "y_um", "z_um", "x_str", "y_str", "z_str"} for every measurement type, including Single scans where ORGN carries no spatial origins.
- property instrument_status: PSet | None
WXIS block parsed as a PSet (motor positions, instrument state).
- orgn_by_type(data_type: str) → OrgnEntry | None
Return the first ORGN entry matching data_type.
- property xlst: XLSTInfo
XLST block: spectral axis values, data_type, units, dim_name.
- wdfkit.catalog(directory: str | PathLike[str], recursive: bool = False) → Catalog
Scan directory for .wdf files and return a Catalog.
Uses header-only parsing (no spectra loaded), so it is fast even for large collections.
- Parameters:
directory – Path to the directory to scan.
recursive – If True, walk subdirectories recursively.
- wdfkit.classify(path: str | PathLike[str]) → dict
Return scan classification for a WiRE .wdf file without loading the spectral data.
- Returns:
Keys: kind, measurement_type, scan_type, wmap_flag, nspectra, npoints, nsteps.
- Return type:
dict
- wdfkit.normalize(input_spectra: DataArray | ndarray, method: str = 'robust_scale', *, spectral_dim: str | None = None, **kwargs) → DataArray | ndarray
Scale spectra along the spectral axis.
For xarray.DataArray input, the spectral axis defaults to the last dimension (e.g. nm, raman_shift, shifts, …). Pass spectral_dim to select another dimension when spectra are not last.
Dask-backed DataArrays are handled transparently:
Per-spectrum methods ("l1", "l2", "max", "min_max", "area"): processed chunk-by-chunk via xr.apply_ufunc; no data is loaded into RAM beyond the current chunk.
Global methods ("robust_scale", "wave_number"): require statistics across all spectra; the full array is computed first. A UserWarning is emitted so you know RAM is being used.
- Parameters:
input_spectra – DataArray or 2D ndarray of shape (n_spectra, n_points).
method – One of "l1", "l2", "max", "min_max", "wave_number", "robust_scale", "area".
spectral_dim – Spectral dimension name when input_spectra is a DataArray.
x_values – Spectral abscissa for ndarray input (default arange(n_points)).
- Returns:
Same type as input_spectra, with updated attrs["treatments"] for DataArray output.
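The per-spectrum methods need nothing beyond the spectrum itself, which is why they can run chunk by chunk. The sketch below shows what each could look like on a 2D ndarray. The formulas are the conventional definitions and are an assumption here, since this page does not spell them out; `normalize_sketch` is a hypothetical name and includes no guards against zero denominators.

```python
import numpy as np


def normalize_sketch(spectra, method="l2", x_values=None):
    """Hypothetical per-spectrum scaling of a (n_spectra, n_points) array."""
    y = np.asarray(spectra, dtype=float)
    if method == "l1":
        scale = np.abs(y).sum(axis=1, keepdims=True)
    elif method == "l2":
        scale = np.sqrt((y ** 2).sum(axis=1, keepdims=True))
    elif method == "max":
        scale = y.max(axis=1, keepdims=True)
    elif method == "min_max":
        low = y.min(axis=1, keepdims=True)
        return (y - low) / (y.max(axis=1, keepdims=True) - low)
    elif method == "area":
        if x_values is None:
            x = np.arange(y.shape[1], dtype=float)
        else:
            x = np.asarray(x_values, dtype=float)
        # Trapezoid rule written out explicitly along the spectral axis.
        scale = (0.5 * (y[:, 1:] + y[:, :-1]) * np.diff(x)).sum(axis=1, keepdims=True)
    else:
        raise ValueError(f"unsupported method in this sketch: {method!r}")
    return y / scale


stack = np.array([[1.0, 2.0, 3.0],
                  [2.0, 4.0, 6.0]])
by_l1 = normalize_sketch(stack, "l1")
by_area = normalize_sketch(stack, "area")
by_min_max = normalize_sketch(stack, "min_max")
```

The two rows differ only by a factor of two, so every method maps them to the same normalized spectrum.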
- wdfkit.read(path: str | PathLike[str], *, verbose: bool = False, spectral_dim: str | None = None, chunks: bool | int = False) → DataArray
Read a WiRE .wdf file and return an xarray.DataArray.
- Parameters:
path – Path to the .wdf file.
spectral_dim – Override for the spectral-axis dimension name.
chunks – Dask chunking: False (eager), True (auto), or int (target MB).
- Returns:
Shape and dims depend on scan kind; spectral axis is always last.
- Return type:
xarray.DataArray
- wdfkit.remove_cosmic_rays_1d(y: ndarray, *, kernel_size: int = 5, threshold: float = 5.0, max_passes: int = 3) → tuple[ndarray, ndarray]
Remove sharp positive spikes from one 1D spectrum.
Uses a scipy.signal.medfilt reference and MAD-based noise estimation. Operates on the raw counts / intensity array (only masked indices change).
The algorithm runs up to max_passes iterations. Each pass:
Detects new spikes on the current (already-repaired) signal.
Dilates the new spike mask by 1 channel on each side to catch sub-threshold spike edges.
Accumulates into a single cumulative mask across all passes.
Repairs by linear interpolation from the original signal at all cumulative masked positions, which avoids chaining interpolation errors.
Early termination when a pass finds no new spikes.
- Parameters:
y – One spectral trace (any numeric dtype; cast to float).
kernel_size – Odd length >= 3 for medfilt. Increase for broader spikes (e.g. 9–13 for 7–10 channel-wide cosmic rays).
threshold – Multiplier on MAD-derived noise (larger → fewer detections).
max_passes – Maximum number of detection–repair iterations (default 3).
- Returns:
corrected_y – Same shape as y; unchanged if no spikes found or noise degenerate.
cosmic_mask – Boolean mask, same shape as y; True at all corrected channels.
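The pass structure described above (detect, dilate, accumulate, repair from the original, stop early) can be sketched with NumPy alone. This is an illustration, not the library function: `despike_sketch` and `median_filter` are hypothetical names, the median filter pads with edge values whereas scipy.signal.medfilt pads with zeros, and the 1.4826 factor that converts MAD to a Gaussian sigma is an assumption about how the noise is scaled.

```python
import numpy as np


def median_filter(y, kernel_size):
    """Running median with edge padding (stand-in for scipy.signal.medfilt)."""
    half = kernel_size // 2
    padded = np.pad(y, half, mode="edge")
    windows = np.lib.stride_tricks.sliding_window_view(padded, kernel_size)
    return np.median(windows, axis=1)


def despike_sketch(y, kernel_size=5, threshold=5.0, max_passes=3):
    """Hypothetical multi-pass spike removal for one 1D spectrum."""
    original = np.asarray(y, dtype=float)
    current = original.copy()
    mask = np.zeros(original.shape, dtype=bool)  # cumulative over all passes
    channels = np.arange(original.size)

    for _ in range(max_passes):
        residual = current - median_filter(current, kernel_size)
        # Robust noise level: scaled median absolute deviation.
        noise = 1.4826 * np.median(np.abs(residual - np.median(residual)))
        if noise == 0:
            break  # degenerate noise estimate: stop without further changes
        new = (residual > threshold * noise) & ~mask  # positive spikes only
        if not new.any():
            break  # early termination
        # Dilate the new hits by one channel on each side.
        dilated = new.copy()
        dilated[1:] |= new[:-1]
        dilated[:-1] |= new[1:]
        mask |= dilated
        # Repair from the ORIGINAL signal at every masked channel so that
        # interpolation errors are not chained from pass to pass.
        current = original.copy()
        current[mask] = np.interp(channels[mask], channels[~mask], original[~mask])

    return current, mask


rng = np.random.default_rng(0)
spectrum = 100.0 + rng.normal(0.0, 1.0, 200)
spectrum[50] += 500.0  # synthetic single-channel cosmic ray
corrected, mask = despike_sketch(spectrum)
```

The spike channel and its two neighbours end up in the mask, and every unmasked channel keeps its original value.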
Submodules
wdfkit.reader
Public WDFReader API plus module-level read() and classify().
- class wdfkit.reader.WDFReader(path: str | PathLike[str], *, verbose: bool = False, time_coord: str = 'seconds_elapsed', spectral_dim: str | None = None, chunks: bool | int = False)
Bases: object
Load and expose all parsed data from a Renishaw WiRE .wdf file.
Typical usage:
data_array, white_light_image = WDFReader(path)
After construction every block is accessible as a typed property. The xarray DataArray (shaped by scan type) is in .data; the PIL white-light image (if any) is in .image. Both are also yielded by unpacking the reader directly.
- Parameters:
spectral_dim – Override for the spectral-axis dimension name. None (default) auto-selects from the XLST units (e.g. RamanShift → "raman_shift").
chunks – Dask lazy reading: False (default, eager NumPy), True (auto-chunk ~128 MB), or int (target MB per chunk).
- property acquisition_time: datetime | None
Acquisition start time decoded from the ORGN Time entry.
- property initial_coordinates: dict | None
Stage XYZ at acquisition time from WXIS.
Returns {"x_um", "y_um", "z_um", "x_str", "y_str", "z_str"} for every measurement type, including Single scans where ORGN carries no spatial origins.
- property instrument_status: PSet | None
WXIS block parsed as a PSet (motor positions, instrument state).
- orgn_by_type(data_type: str) → OrgnEntry | None
Return the first ORGN entry matching data_type.
- property xlst: XLSTInfo
XLST block: spectral axis values, data_type, units, dim_name.
- wdfkit.reader.classify(path: str | PathLike[str]) → dict
Return scan classification for a WiRE .wdf file without loading the spectral data.
- Returns:
Keys: kind, measurement_type, scan_type, wmap_flag, nspectra, npoints, nsteps.
- Return type:
dict
- wdfkit.reader.read(path: str | PathLike[str], *, verbose: bool = False, spectral_dim: str | None = None, chunks: bool | int = False) → DataArray
Read a WiRE .wdf file and return an xarray.DataArray.
- Parameters:
path – Path to the .wdf file.
spectral_dim – Override for the spectral-axis dimension name.
chunks – Dask chunking: False (eager), True (auto), or int (target MB).
- Returns:
Shape and dims depend on scan kind; spectral axis is always last.
- Return type:
xarray.DataArray
wdfkit.cosmic_ray
Cosmic-ray removal: CosmicRayRemover and helpers.
- class wdfkit.cosmic_ray.CosmicRayRemover(spike_width: int = 5, spike_threshold: float = 3.5, spike_passes: int = 3, map_sensitivity: float = 0.01, map_disk_radius: int = 3, map_spike_width: int = 5, map_method: str = 'median', map_n_components: int = 3, spectral_dim: str | None = None)
Bases: object
Cosmic-ray removal with automatic routing by data dimensionality.
1D (single spectrum): always uses the 1D medfilt + MAD engine controlled by spike_width, spike_threshold, spike_passes.
2D (line scan / series / point collection):
fewer than 20 spectra → 1D engine applied independently to each spectrum (no population statistics yet).
20 or more spectra → collection engine: global median or PCA reconstruction as reference; map_method selects which.
3D (spatial map):
fewer than 20 spectra → same per-spectrum 1D path as above.
20 or more spectra → spatial disk-median engine (map_method="median", default) or PCA engine (map_method="pca"). The disk-median path additionally respects map_sensitivity and map_disk_radius.
Optionally removes broad Nd:YAG harmonics before spike removal via harmonic_check() / remove().
- Parameters:
spike_width (int) – 1D engine: odd integer ≥ 3. Sets the medfilt window in spectral channels. Raise to 9–13 when cosmic rays span 7–10 channels; keep at 5 for narrow single-channel spikes.
spike_threshold (float) – 1D engine: positive float. Spike cutoff = spike_threshold × MAD_noise. Lower → more aggressive. Raise to 5–6 for very noisy spectra to avoid false positives.
spike_passes (int) – 1D engine: integer ≥ 1. Iterations of detect → repair. Each pass works on the already-repaired signal so that large spikes no longer mask smaller ones.
map_sensitivity (float) – 3D disk-median engine only: scales overall aggressiveness. Larger → more hits (default 0.01).
map_disk_radius (int) – 3D disk-median engine only: spatial disk radius for the reference median filter (pixels).
map_spike_width (int) – Collection / 3D engines: spectral dilation in channels added around each detected hit (integer ≥ 1). Increase for broader cosmic rays (e.g. 9–15 for multi-channel spikes). The repair region is capped at 2 × map_spike_width channels.
map_method (str) – "median" (default): global median spectrum as reference for 2D; spatial disk-median for 3D. "pca": PCA reconstruction as reference for both 2D and 3D.
map_n_components (int) – PCA path only: number of principal components for the reconstruction reference. 3–5 covers most real samples; increase for multi-phase or compositionally diverse maps.
spectral_dim (str | None) – Name of the spectral axis (default: last dimension).
- harmonic_check(spectrum: DataArray) → DataArray
Notch broad harmonics when LaserWaveLength is ~355 nm (Nd:YAG).
If spectrum.attrs['LaserWaveLength'] is outside 354–356 nm, returns spectrum unchanged.
Searches 1064 / 532 / 355 / 266 nm (±2.5 nm); replaces ~1 nm around each found peak with linear interpolation.
- remove_cosmic_rays_with_diagnostics(spectrum: DataArray) → tuple[DataArray, dict[str, Any]]
Like remove_cosmic_rays(), but also returns a diagnostics dict for visualization / QC (not written to DataArray.attrs).
Diagnostics keys depend on the engine used:
1D: "cosmic_mask", "corrected_1d"
loop-1D (< 20 spectra, 2D/3D): "cosmic_masks"
collection (≥ 20 spectra, 2D or 3D PCA): "core_mask", "repair_mask", "residual", "reference", "noise_per_channel", "cutoff"
3D disk-median: same as current map diagnostics ("core_mask", "repair_mask", "residual", "preprocessed", "spatial_median_reference", etc.)
- wdfkit.cosmic_ray.remove_cosmic_rays_1d(y: ndarray, *, kernel_size: int = 5, threshold: float = 5.0, max_passes: int = 3) → tuple[ndarray, ndarray]
Remove sharp positive spikes from one 1D spectrum.
Uses a scipy.signal.medfilt reference and MAD-based noise estimation. Operates on the raw counts / intensity array (only masked indices change).
The algorithm runs up to max_passes iterations. Each pass:
Detects new spikes on the current (already-repaired) signal.
Dilates the new spike mask by 1 channel on each side to catch sub-threshold spike edges.
Accumulates into a single cumulative mask across all passes.
Repairs by linear interpolation from the original signal at all cumulative masked positions, which avoids chaining interpolation errors.
Early termination when a pass finds no new spikes.
- Parameters:
y – One spectral trace (any numeric dtype; cast to float).
kernel_size – Odd length >= 3 for medfilt. Increase for broader spikes (e.g. 9–13 for 7–10 channel-wide cosmic rays).
threshold – Multiplier on MAD-derived noise (larger → fewer detections).
max_passes – Maximum number of detection–repair iterations (default 3).
- Returns:
corrected_y – Same shape as y; unchanged if no spikes found or noise degenerate.
cosmic_mask – Boolean mask, same shape as y; True at all corrected channels.
wdfkit.spectra_cleaner
Spectral denoising: SpectraCleaner.
- class wdfkit.spectra_cleaner.SpectraCleaner(method: CleanMethod = 'pca', n_components: NComponents = 'mle', subtract_min: bool = True, restore_min: bool = False, spectral_dim: str | None = None, pca_kwargs: dict[str, Any] = <factory>, per_spectrum: bool = False, smoother: SpectraSmoother | None = None)
Bases: object
Denoise a population of spectra by low-rank PCA reconstruction.
Multi-spectrum inputs (2D stacks (n_spectra, spectral) or 3D map cubes (y, x, spectral)): uses PCA to separate shared signal from per-channel noise.
1-D single spectrum (spectral,) or any input when per_spectrum=True: delegates to SpectraSmoother (Savitzky-Golay by default). Pass a pre-configured SpectraSmoother via the smoother parameter to change the method or its settings.
- Parameters:
method (CleanMethod) – PCA denoising method. Currently only "pca"; kept for forward compatibility.
n_components (NComponents) – Forwarded to sklearn.decomposition.PCA. "mle" (default), a float in (0, 1) for variance-explained, an int count, or None for min(n_spectra, n_spectral).
subtract_min (bool) – Subtract per-spectrum min before the PCA fit.
restore_min (bool) – Add the saved per-spectrum min back after reconstruction.
spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.
pca_kwargs (dict[str, Any]) – Extra kwargs forwarded to sklearn.decomposition.PCA.
per_spectrum (bool) – If True, bypass PCA and apply smoother independently to every spectrum regardless of input dimensionality. Useful when you want 1-D-style smoothing on a 2D/3D dataset.
smoother (SpectraSmoother | None) – A SpectraSmoother instance used for 1-D input and when per_spectrum=True. None (default) creates a SpectraSmoother() with Savitzky-Golay defaults.
- clean(spectra: DataArray) → DataArray
Return a denoised copy of spectra (no decomposition payload).
- clean_with_decomposition(spectra: DataArray) → tuple[DataArray, dict[str, Any]]
Like clean(), but also returns the PCA decomposition.
When the smoother path is taken (1-D input or per_spectrum=True), the returned payload is {}: no decomposition is available for per-spectrum filtering.
The PCA payload has keys components, coeffs, mean, explained_variance, explained_variance_ratio, noise_variance.
- method: CleanMethod = 'pca'
- n_components: NComponents = 'mle'
- smoother: SpectraSmoother | None = None
wdfkit.spectra_smoother
Per-spectrum smoothing: SpectraSmoother.
- class wdfkit.spectra_smoother.SpectraSmoother(method: Literal['savgol', 'whittaker'] = 'savgol', window_length: int = 11, polyorder: int = 3, lam: float | None = None, d: int = 2, auto_lam_calls: int = 5, spectral_dim: str | None = None)
Bases: object
Per-spectrum 1-D smoothing for DataArrays of any shape.
Applies the chosen filter independently to every spectrum along the spectral axis. Suitable for 1-D single spectra, 2-D stacks (n_spectra, spectral), and 3-D map cubes (y, x, spectral).
- Parameters:
method (Literal['savgol', 'whittaker']) – "savgol" (default): Savitzky-Golay filter via scipy.signal.savgol_filter. "whittaker": Whittaker-Eilers smoother (sparse linear system).
window_length (int) – Savitzky-Golay: number of channels in the filter window. Must be odd and >= polyorder + 2.
polyorder (int) – Savitzky-Golay: polynomial order (must be < window_length).
lam (float | None) – Whittaker-Eilers: smoothness penalty λ. None (default) triggers automatic selection via GCV minimisation (see auto_lam_calls).
d (int) – Whittaker-Eilers: difference order (default 2).
auto_lam_calls (int) – Maximum GCV evaluations when lam=None (default 5).
spectral_dim (str | None) – Name of the spectral axis. Defaults to the last dimension.