Lesson 5: state analysis
========================
**Author: Su Ye (remotesensingsuy@gmail.com)**
**Time series datasets: MODIS & GPCP**
**Application: greenning and precipitation seasonality**
In state-space theory, a state is an unobserved (latent) variable that
evolves through time and governs the dynamics of a system. The
observations we collect (e.g., NDVI, reflectance, precipitation) are
assumed to be generated from these hidden states, with additional noise.
Formally, the basic state-space model is defined by two equations: a
state (transition) equation, which describes the temporal evolution of
the latent state, and an observation equation, which links the latent
state to the observed data. Both equations incorporate stochastic error
terms, reflecting uncertainty in the evolving process and the
measurement process. This stochastic foundation is the reason why the
framework is referred to as stochastic continuous change detection
(S-CCD).
.. math:: a_{t}=Ta_{t-1} + Q
.. math:: y_{t}=Za_{t} + H
- :math:`a_{t}`: the hidden state at time (t)
- :math:`T`: transition matrix (defines how states evolve)
- :math:`Z`: the system matrix that determines the items included in the
observation
- :math:`y_{t}`: the observation
- :math:`Q`: process noise
- :math:`H`: observation noise
In S-CCD [1], the state is represented as a vector comprising three
temporal components: trend, annual, and semiannual. When
``trimodel``\ =True is specified in the flexible mode, an additional
component corresponding to a four-month cycle (``trimodel``) is
included. S-CCD supports the output of states for each of these
components. Unlike the harmonic regression line with fixed coefficients,
the states incorporate both observational and process noise, resulting
in temporal variability that better captures local fluctuations.
This lesson demonstrates a novel application of ``states`` for analyzing
subtle long-term changes that are often overlooked by traditional break
detection approaches.
*[1] Ye, S., Rogan, J., Zhu, Z., & Eastman, J. R. (2021). A
near-real-time approach for monitoring forest disturbance using Landsat
time series: Stochastic continuous change detection. Remote Sensing of
Environment, 252, 112167.*
--------------
Greenning trend analysis
------------------------
.. code:: ipython3
import pandas as pd
import numpy as np
import pathlib
from dateutil import parser
from pyxccd import sccd_detect_flex
TUTORIAL_DATASET = (pathlib.Path.cwd() / 'datasets').resolve() # modify it as you need
assert TUTORIAL_DATASET.exists()
in_path = TUTORIAL_DATASET/ '5_greenning_modis.csv' # read single-pixel MODIS time series
# read example csv for HLS time series
data = pd.read_csv(in_path)
# as the original data doesn't have qa, we append qa as all zeros value (meaning they are all clear)
data['qa'] = np.zeros(data.shape[0])
# force column name to 'date' to let display_sccd_states work
data = data.rename(columns={'date': 'dates'})
# convert them to ordinal dates
ordinal_dates = [pd.Timestamp.toordinal(parser.parse(row)) for row in data["dates"]]
data.loc[:, "dates"] = ordinal_dates
dates, ndvi, qas = data.to_numpy().copy().T
print(sccd_detect_flex(dates, ndvi, qas, lam=20, state_intervaldays=1))
.. raw:: html
.. code:: text
:class: output-block
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[1], line 27
24 data.loc[:, "dates"] = ordinal_dates
26 dates, ndvi, qas = data.to_numpy().copy().T
---> 27 print(sccd_detect_flex(dates, ndvi, qas, lam=20, state_intervaldays=1))
File c:\Users\dell\env_collect\py311_geo\Lib\site-packages\pyxccd\ccd.py:929, in sccd_detect_flex(dates, ts_stack, qas, lam, p_cg, conse, pos, b_c2, output_anomaly, anomaly_pcg, anomaly_conse, state_intervaldays, tmask_b1_index, tmask_b2_index, fitting_coefs, trimodal)
915 _validate_params(
916 func_name="sccd_detect_flex",
917 p_cg=p_cg,
(...)
926 trimodal=trimodal,
927 )
928 # make sure it is c contiguous array and 64 bit
--> 929 dates, ts_stack, qas = _validate_data_flex(dates, ts_stack, qas)
930 valid_num_scenes = ts_stack.shape[0]
931 nbands = ts_stack.shape[1] if ts_stack.ndim > 1 else 1
File c:\Users\dell\env_collect\py311_geo\Lib\site-packages\pyxccd\ccd.py:138, in _validate_data_flex(dates, ts_data, qas)
126 """
127 validate and forcibly change the data format
128 Parameters
(...)
135 ----------
136 """
137 check_consistent_length(dates, ts_data, qas)
--> 138 check_1d(dates, "dates")
139 check_1d(qas, "qas")
141 dates = dates.astype(dtype=numpy.int64, order="C")
File c:\Users\dell\env_collect\py311_geo\Lib\site-packages\pyxccd\_param_validation.py:663, in check_1d(array, var_name)
654 raise ValueError(
655 "Expected 1D array for input {}, but got {}D".format(var_name, array.ndim)
656 )
658 if (
659 (array.dtype != "int64")
660 and (array.dtype != "int32")
661 and (array.dtype != "int16")
662 ):
--> 663 raise ValueError(
664 "Expected int16, int32, int64 for the input, but got {}".format(array.dtype)
665 )
ValueError: Expected int16, int32, int64 for the input, but got object
Oops, we got the error “Expected int16, int32, int64 for the input, but
got object”. This is because is that the datatype for NDVI is double
[-1, 1]. We need to convert it to integer. **Pyxccd only supports
integer input!**
The scale does change the results of CCDC because the Lasso parameter is
sensitive to the magnitude of the input values, which directly
influences the balance between fitting accuracy and regularization
strength. To ensure optimal performance and comparability, we recommend
scaling the floating-point reflectance values so that they fall within
or close to the standard Landsat reflectance range [0,10000]. This
adjustment harmonizes the input data with the scale for which CCDC is
typically calibrated and helps prevent bias in model fitting. In this
case, multiplying the reflectance values by 10,000 provides an
appropriate transformation:
.. code:: ipython3
import pathlib
from datetime import date
from typing import List, Tuple, Dict, Union, Optional
import seaborn as sns
import matplotlib.pyplot as plt
from matplotlib.axes import Axes
from pyxccd.common import SccdOutput
from pyxccd.utils import getcategory_sccd, defaults
def display_sccd_states_flex(
data_df: pd.DataFrame,
states:pd.DataFrame,
axes: Axes,
variable_name: str,
title:str,
band_name:str = "b0",
plot_kwargs: Optional[Dict] = None
):
default_plot_kwargs: Dict[str, Union[int, float, str]] = {
'marker_size': 5,
'marker_alpha': 0.7,
'line_color': 'orange',
'font_size': 14
}
if plot_kwargs is not None:
default_plot_kwargs.update(plot_kwargs)
# convert ordinal dates to calendar
formal_dates = [pd.Timestamp.fromordinal(int(row)) for row in states["dates"]]
states.loc[:, "dates_formal"] = formal_dates
extra = (np.max(states[f"{band_name}_trend"]) - np.min(states[f"{band_name}_trend"])) / 4
axes[0].set(ylim=(np.min(states[f"{band_name}_trend"]) - extra, np.max(states[f"{band_name}_trend"]) + extra))
sns.lineplot(x="dates_formal", y=f"{band_name}_trend", data=states, ax=axes[0], color="orange")
axes[0].set(ylabel=f"Trend")
extra = (np.max(states[f"{band_name}_annual"]) - np.min(states[f"{band_name}_annual"])) / 4
axes[1].set(ylim=(np.min(states[f"{band_name}_annual"]) - extra, np.max(states[f"{band_name}_annual"]) + extra))
sns.lineplot(x="dates_formal", y=f"{band_name}_annual", data=states, ax=axes[1], color="orange")
axes[1].set(ylabel=f"Annual cycle")
extra = (np.max(states[f"{band_name}_semiannual"]) - np.min(states[f"{band_name}_semiannual"])) / 4
axes[2].set(ylim=(np.min(states[f"{band_name}_semiannual"]) - extra, np.max(states[f"{band_name}_semiannual"]) + extra))
sns.lineplot(x="dates_formal", y=f"{band_name}_semiannual", data=states, ax=axes[2], color="orange")
axes[2].set(ylabel=f"Semi-annual cycle")
data_clean = data_df[(data_df["qa"] == 0) | (data_df['qa'] == 1)].copy() # CCDC also processes water pixels
formal_dates = [pd.Timestamp.fromordinal(int(row)) for row in data_clean["dates"]]
data_clean.loc[:, "dates_formal"] = formal_dates # convert ordinal dates to calendar
axes[3].plot(
'dates_formal', variable_name, 'go',
markersize=default_plot_kwargs['marker_size'],
alpha=default_plot_kwargs['marker_alpha'],
data=data_clean
)
if f"{band_name}_trimodal" in list(states.columns):
states["General"] = states[f"{band_name}_annual"] + states[f"{band_name}_trend"] + states[f"{band_name}_semiannual"]+ states[f"{band_name}_trimodal"]
else:
states["General"] = states[f"{band_name}_annual"] + states[f"{band_name}_trend"] + states[f"{band_name}_semiannual"]
g = sns.lineplot(
x="dates_formal", y="General", data=states, label="fit", ax=axes[3], color="orange"
)
axes[3].set_ylabel(f"{variable_name}", fontsize=default_plot_kwargs['font_size'])
axes[3].set_title(title, fontweight="bold", size=16 , pad=2)
band_values = data_df[data_df['qa'] == 0][variable_name]
q01, q99 = np.quantile(band_values, [0.01, 0.99])
extra = (q99 - q01) * 0.4
ylim_low = q01 - extra
ylim_high = q99 + extra
axes[3].set(ylim=(ylim_low, ylim_high))
data_copy = data.copy()
# Multiplying NDVI by 10000
data_copy.NDVI = data_copy.NDVI.multiply(10000)
dates, ndvi, qas = data_copy.to_numpy().astype(np.int64).copy().T
# Turn on the state output by setting state_intervaldays as non-zero value
sccd_results, states = sccd_detect_flex(dates, ndvi, qas, lam=20, state_intervaldays=1)
# Set up plotting style
sns.set(style="darkgrid")
sns.set_context("notebook")
# Create figure and axes
fig, axes = plt.subplots(4, 1, figsize=[12, 10], sharex=True)
plt.subplots_adjust(left=0.08, right=0.98, top=0.92, bottom=0.1)
display_sccd_states_flex(data_df=data_copy, states=states, axes=axes, variable_name="NDVI", title="S-CCD")
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_3_0.png
Multiple studies revealed that our earth has experienced a slowdown of
the vegetation greening [2, 3]. We also visually confirm the same
slowdown trend from ``Trend`` component.
*[2] Feng, X., Fu, B., Zhang, Y., Pan, N., Zeng, Z., Tian, H., … &
Penuelas, J. (2021). Recent leveling off of vegetation greenness and
primary production reveals the increasing soil water limitations on the
greening Earth. Science Bulletin, 66(14), 1462-1471.*
*[3] Ren, Y., Wang, H., Yang, K., Li, W., Hu, Z., Ma, Y., & Qiao, S.
(2024). Vegetation productivity slowdown on the Tibetan Plateau around
the late 1990s. Geophysical Research Letters, 51(4), e2023GL103865.*
Finding the turning point
~~~~~~~~~~~~~~~~~~~~~~~~~
I will use ``kneed`` package to automatically locate the timepoint that
the slowdown trend starts for this MODIS pixel. The ``kneed`` package
found the the ``knee`` of a curve based upon the maximum curvature where
a curve changes from a steep slope to a flatter one.
.. code:: ipython3
from kneed import KneeLocator
# find the minimum NDVI trend value
istart = np.where(states['b0_trend'].to_numpy()==np.min(states['b0_trend'].to_numpy()))[0][0]
# Detecting the knee between the minimum NDVI to the maximum
knee = KneeLocator(states['dates'].to_numpy()[istart:], states['b0_trend'].to_numpy()[istart:], direction="increasing")
xknee = states['dates'].to_numpy()[istart:][np.argmax(knee.y_difference)]
yknee = states['b0_trend'].to_numpy()[istart:][np.argmax(knee.y_difference)]
sns.set(style="darkgrid")
sns.set_context("notebook")
fig, ax = plt.subplots(figsize=(14, 6))
formal_dates = [pd.Timestamp.fromordinal(int(row)) for row in states["dates"]]
states.loc[:, "dates_formal"] = formal_dates
sns.lineplot(data=states, ax=ax, x='dates_formal', y='b0_trend', legend=False, color='orange', palette='colorblind').set_title("The turning point of greening")
ax.scatter(pd.Timestamp.fromordinal(int(xknee)), yknee , s=80, marker='o', facecolors='none', edgecolors='#1f77b4')
print(f"The knee date (i.e., when NDVI saturates) is {pd.Timestamp.fromordinal(int(xknee))}")
print(f"The knee NDVI is {yknee/10000}")
.. code:: text
:class: output-block
The knee date (i.e., when NDVI saturates) is 2005-03-06 00:00:00
The knee NDVI is 0.8471548930431367
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_5_1.png
Summary 1
~~~~~~~~~
S-CCD provides an innovative framework for **decomposing satellite-based
time series into multiple interpretable components** while explicitly
accounting for temporal breaks. By separating the signal into trend, and
seasonal components, S-CCD enables a clearer understanding of ecosystem
dynamics under various disturbance and recovery regimes. The extracted
trend component is particularly informative, as it can reveal subtle or
gradual shifts in vegetation or biophysical conditions that are often
masked by strong seasonal fluctuations or short-term noise. Such
long-term trend changes frequently originate from external drivers such
as climatic variability, human activities, or natural disturbances. By
effectively suppressing seasonal and irregular variations, S-CCD
enhances the detectability of these underlying changes, offering a more
robust and interpretable view of ecosystem trajectories over time.
Precipitation seasonality
-------------------------
The seasonality amplitude of the annual precipitation cycle refers to
the strength of the seasonal contrast. It has been reported that the
seasonality amplitude is being amplified as global warming [4].
indicating a greater difference between the wettest and driest periods.
But this trend might be regional, requiring careful examination.
*[4] Wang, X., Luo, M., Song, F., Wu, S., Chen, Y. D., & Zhang, W.
(2024). Precipitation seasonality amplifies as earth warms. Geophysical
Research Letters, 51(10), e2024GL109132.*
The below is an example that we will evaluate this trend for a pixel in
Arctic. Let’s quickly plot the time series for the first step:
.. code:: ipython3
sns.set_theme(style="darkgrid")
sns.set_context("notebook")
# Create figure and axes
plt.subplots_adjust(left=0.08, right=0.98, top=0.92, bottom=0.1)
TUTORIAL_DATASET = (pathlib.Path.cwd() / 'datasets').resolve() # modify it as you need
assert TUTORIAL_DATASET.exists()
in_path = TUTORIAL_DATASET/ '5_precip_gpcp.csv' # read the MPB-affected plot in CO
# read example csv for HLS time series
data = pd.read_csv(in_path)
# as the original data doesn't have qa, we append qa as all zeros value (meaning they are all clear)
data['qa'] = np.zeros(data.shape[0])
formal_dates = [parser.parse(row) for row in data["time"]]
data.loc[:, "dates_formal"] = formal_dates
# convert formal to ordinaldates
ordinal_dates = [pd.Timestamp.toordinal(parser.parse(row)) for row in data["time"]]
data.loc[:, "time"] = ordinal_dates
fig, axes = plt.subplots(figsize=[12, 4])
axes.plot(
'dates_formal', "precip", 'go',
data=data
)
plt.ylabel("Precip")
plt.xlabel("Date")
.. code:: text
:class: output-block
Text(0.5, 0, 'Date')
.. code:: text
:class: output-block
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_7_2.png
From the above example, we did see the amplitude of the precipitation
seasonality increased since 2019-ish. In most of the high Arctic,
precipitation peaks in summer (largely as rain, due to open water and
more active moisture transport). Warming oceans and reduced sea ice
increase moisture flux into the atmosphere, enhancing summer
precipitation (especially rainfall), increasing the amplitude of the
seasonality. Let’s use S-CCD states to investigate the increasing
amplitude.
.. code:: ipython3
sns.set_theme(style="darkgrid")
sns.set_context("notebook")
# Create figure and axes
fig, axes = plt.subplots(4, 1, figsize=[12, 10], sharex=True)
plt.subplots_adjust(left=0.08, right=0.98, top=0.92, bottom=0.1)
# read example csv for HLS time series
TUTORIAL_DATASET = (pathlib.Path.cwd() / 'datasets').resolve() # modify it as you need
assert TUTORIAL_DATASET.exists()
in_path = TUTORIAL_DATASET/ '5_precip_gpcp.csv' # read the MPB-affected plot in CO
data = pd.read_csv(in_path)
# as the original data doesn't have qa, we append qa as all zeros value (meaning they are all clear)
data['qa'] = np.zeros(data.shape[0])
formal_dates = [parser.parse(row) for row in data["time"]]
data.loc[:, "dates_formal"] = formal_dates
# convert formal to ordinaldates
ordinal_dates = [pd.Timestamp.toordinal(parser.parse(row)) for row in data["time"]]
data.loc[:, "time"] = ordinal_dates
# scale precipitation by 1000
data.loc[:, "precip"] = data.loc[:, "precip"].apply(lambda x: x*1000)
# force column name to 'date' to let display_sccd_states work
data = data.rename(columns={'time': 'dates'})
dates, prep, qas = data.drop('dates_formal', axis=1).astype(np.int32).to_numpy().copy().T
sccd_result, states = sccd_detect_flex(dates, prep, qas, lam=20, state_intervaldays=1)
display_sccd_states_flex(data_df=data, states=states, axes=axes, variable_name="precip", title="precipitation * 1000")
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_9_0.png
Lambda
~~~~~~~~~~~
Unfortunately, we only observed a subtle increase in the amplitude of
the semi-annual cycle toward the end of the time series. Further
investigation can reveal that this pattern resulted from under-fitting
of extreme summer precipitation events, which led to the failure in
capturing the enhanced seasonality signal. To improve the S-CCD fitting
for this dataset, we decreased the Lambda to 0:
.. code:: ipython3
sns.set_theme(style="darkgrid")
sns.set_context("notebook")
# Create figure and axes
fig, axes = plt.subplots(4, 1, figsize=[12, 10], sharex=True)
plt.subplots_adjust(left=0.08, right=0.98, top=0.92, bottom=0.1)
# decrease lambda to 0
sccd_result, states = sccd_detect_flex(dates, prep, qas, lam=0, state_intervaldays=1)
display_sccd_states_flex(data_df=data, states=states, axes=axes, variable_name="precip", title="precipitation * 1000 (Lam=0)")
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_11_0.png
Now, after reaching a better model fit, you could see the increased
seasonality amplitude has been captured in both annual (the second
subfigure) and semi-annual (the third subfigure) cycle components
through enhancing the fitting degree of the curve for the observations.
We could sum up annual and semi-annual into a new component for the
general seasonal cycle for a better investigation:
.. code:: ipython3
def display_sccd_states_flex_adjusted(
data_df: pd.DataFrame,
states:pd.DataFrame,
axes: Axes,
variable_name: str,
title:str,
band_name:str = "b0",
plot_kwargs: Optional[Dict] = None
):
default_plot_kwargs: Dict[str, Union[int, float, str]] = {
'marker_size': 5,
'marker_alpha': 0.7,
'line_color': 'orange',
'font_size': 14
}
if plot_kwargs is not None:
default_plot_kwargs.update(plot_kwargs)
# convert ordinal dates to calendar
formal_dates = [pd.Timestamp.fromordinal(int(row)) for row in states["dates"]]
states.loc[:, "dates_formal"] = formal_dates
extra = (np.max(states[f"{band_name}_trend"]) - np.min(states[f"{band_name}_trend"])) / 4
axes[0].set(ylim=(np.min(states[f"{band_name}_trend"]) - extra, np.max(states[f"{band_name}_trend"]) + extra))
sns.lineplot(x="dates_formal", y=f"{band_name}_trend", data=states, ax=axes[0], color="orange")
axes[0].set(ylabel=f"Trend")
extra = (np.max(states[f"{band_name}_seasonality"]) - np.min(states[f"{band_name}_seasonality"])) / 4
axes[1].set(ylim=(np.min(states[f"{band_name}_seasonality"]) - extra, np.max(states[f"{band_name}_seasonality"]) + extra))
sns.lineplot(x="dates_formal", y=f"{band_name}_seasonality", data=states, ax=axes[1], color="orange")
axes[1].set(ylabel=f"Seasonal cycle")
data_clean = data_df[(data_df["qa"] == 0) | (data_df['qa'] == 1)].copy() # CCDC also processes water pixels
formal_dates = [pd.Timestamp.fromordinal(int(row)) for row in data_clean["dates"]]
data_clean.loc[:, "dates_formal"] = formal_dates # convert ordinal dates to calendar
axes[2].plot(
'dates_formal', variable_name, 'go',
markersize=default_plot_kwargs['marker_size'],
alpha=default_plot_kwargs['marker_alpha'],
data=data_clean
)
if f"{band_name}_trimodal" in list(states.columns):
states["General"] = states[f"{band_name}_annual"] + states[f"{band_name}_trend"] + states[f"{band_name}_semiannual"]+ states[f"{band_name}_trimodal"]
else:
states["General"] = states[f"{band_name}_annual"] + states[f"{band_name}_trend"] + states[f"{band_name}_semiannual"]
g = sns.lineplot(
x="dates_formal", y="General", data=states, label="fit", ax=axes[2], color="orange"
)
axes[2].set_ylabel(f"{variable_name}", fontsize=default_plot_kwargs['font_size'])
axes[2].set_title(title, fontweight="bold", size=16 , pad=2)
band_values = data_df[data_df['qa'] == 0][variable_name]
q01, q99 = np.quantile(band_values, [0.01, 0.99])
extra = (q99 - q01) * 0.4
ylim_low = q01 - extra
ylim_high = q99 + extra
axes[2].set(ylim=(ylim_low, ylim_high))
fig, axes = plt.subplots(3, 1, figsize=[12, 8], sharex=True)
plt.subplots_adjust(left=0.08, right=0.98, top=0.92, bottom=0.1)
sccd_result, states = sccd_detect_flex(dates, prep, qas, lam=0, state_intervaldays=1)
# sum up two components
states['b0_seasonality'] = states['b0_annual'] + states['b0_semiannual']
display_sccd_states_flex_adjusted(data_df=data, states=states, axes=axes, variable_name="precip", title="precipitation * 1000 (Lam=0)")
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_13_0.png
Summary 2
~~~~~~~~~
Further quantification of seasonality amplitude can be achieved by
calculating the difference between the maximum and minimum values of the
“Seasonal cycle” variable derived from the above steps. This approach
provides a more stable estimate of intra-annual variability compared
with directly differencing the maximum and minimum of the raw
observations. The key advantage lies in the fact that S-CCD incorporates
a Kalman filter during the fitting process, which effectively smooths
the time series and reduces the influence of noise, outliers, and
irregular sampling. As a result, the estimated amplitude better reflects
the underlying seasonal signal rather than being distorted by anomalous
measurements.
S-CCD model fit
---------------
Traditionally, curve fitting does not account for temporal breaks, which
can degrade model performance. S-CCD addresses this limitation by
providing three approaches for break-aware model fitting:
(1) directly summing all state components;
(2) applying LASSO regression using all observations within a segment
(``fitting_coefs=True``);
(3) adopting time-specific harmonic model coefficients estimated at the
last model through the Kalman filter (``fitting_coefs=False``).
In general, summing all states (Approach 1) offers the most effective
means of capturing local fluctuations in the time series and therefore
yields the lowest RMSE. Applying LASSO regression (Approach 2)
corresponds to the coefficient-generation strategy used in the
traditional CCDC algorithm and provides the best generalization
capability, as it outputs only eight coefficients that can serve as
shape parameters for machine learning inputs. Using time-specific
harmonic model coefficients (Approach 3) reflects model behavior only at
the most recent observations and may lead to underfitting in earlier
portions of the time series. This approach is particularly suitable for
near-real-time monitoring applications, where the primary focus is on
detecting and characterizing recent disturbances.
The below are the examples for comparing different fitting approaches
using the precipitation datasets:
.. code:: ipython3
from pyxccd.common import SccdOutput, cold_rec_cg, anomaly
from matplotlib.lines import Line2D
def display_sccd_result_single(
data: np.ndarray,
band_names: List[str],
band_index: int,
sccd_result: SccdOutput,
axe: Axes,
title: str = 'S-CCD',
states:Optional[pd.DataFrame] = None,
trimodal: bool = False,
anomaly:Optional[anomaly] = None,
plot_kwargs: Optional[Dict] = None
) -> Tuple[plt.Figure, List[plt.Axes]]:
"""
Compare COLD and SCCD change detection algorithms by plotting their results side by side.
This function takes time series remote sensing data, applies both COLD and SCCD algorithms,
and visualizes the curve fitting and break detection results for comparison.
Parameters:
-----------
data : np.ndarray
Input data array with shape (n_observations, n_bands + 2) where:
- First column: ordinal dates (days since January 1, AD 1)
- Next n_bands columns: spectral band values
- Last column: QA flags (0-clear, 1-water, 2-shadow, 3-snow, 4-cloud)
band_names : List[str]
List of band names corresponding to the spectral bands in the data (e.g., ['red', 'nir'])
band_index : int
1-based index of the band to plot (e.g., 0 for first band, 1 for second band)
sccd_result: SccdOutput
Output of sccd_detect
axe: Axes
An Axes object represents a single plot within that Figure
title: Str
The figure title. The default is "S-CCD"
trimodal: bool
indicate whether using trimodal
anomaly: anomaly, optional
The anomaly detection outputs
plot_kwargs : Dict, optional
Additional keyword arguments to pass to the display function. Possible keys:
- 'marker_size': size of observation markers (default: 5)
- 'marker_alpha': transparency of markers (default: 0.7)
- 'line_color': color of model fit lines (default: 'orange')
- 'font_size': base font size (default: 14)
Returns:
--------
Tuple[plt.Figure, List[plt.Axes]]
A tuple containing the matplotlib Figure object and a list of Axes objects
(top axis is COLD results, bottom axis is SCCD results)
"""
if trimodal:
n_coefs = 8
else:
n_coefs = 6
w = np.pi * 2 / 365.25
# Set default plot parameters
default_plot_kwargs: Dict[str, Union[int, float, str]] = {
'marker_size': 5,
'marker_alpha': 0.7,
'line_color': 'orange',
'font_size': 14
}
if plot_kwargs is not None:
default_plot_kwargs.update(plot_kwargs)
# Extract values with proper type casting
font_size = default_plot_kwargs.get('font_size', 14)
try:
title_font_size = int(font_size) + 2
except (TypeError, ValueError):
title_font_size = 16
# Clean and prepare data
data = data[np.all(np.isfinite(data), axis=1)]
data_df = pd.DataFrame(data, columns=['dates'] + band_names + ['qa'])
# Plot COLD results
w = np.pi * 2 / 365.25
slope_scale = 10000
# Prepare clean data for COLD plot
data_clean = data_df[(data_df['qa'] == 0) | (data_df['qa'] == 1)].copy()
data_clean = data_clean[(data_clean >= 0).all(axis=1) & (data_clean.drop(columns="dates") <= 10000).all(axis=1)]
calendar_dates = [pd.Timestamp.fromordinal(int(row)) for row in data_clean["dates"]]
data_clean.loc[:, 'dates_formal'] = calendar_dates
# Calculate y-axis limits
band_name = band_names[band_index]
band_values = data_clean[data_clean['qa'] == 0 | (data_clean['qa'] == 1)][band_name]
# band_values = band_values[band_values <10000]
q01, q99 = np.quantile(band_values, [0.01, 0.99])
extra = (q99 - q01) * 0.4
ylim_low = q01 - extra
ylim_high = q99 + extra
# Plot SCCD observations
axe.plot(
'dates_formal', band_name, 'go',
markersize=default_plot_kwargs['marker_size'],
alpha=default_plot_kwargs['marker_alpha'],
data=data_clean
)
# Plot SCCD segments
if states is not None:
if trimodal is True:
states['predicted'] = states['b0_trend']+states['b0_annual']+states['b0_semiannual']+states['b0_trimodal']
else:
states['predicted'] = states['b0_trend']+states['b0_annual']+states['b0_semiannual']
calendar_dates = [pd.Timestamp.fromordinal(int(row)) for row in states["dates"]]
states.loc[:, 'dates_formal'] = calendar_dates
g = sns.lineplot(
x="dates_formal", y="predicted",
data=states,
label="Model fit",
ax=axe,
color=default_plot_kwargs['line_color']
)
if g.legend_ is not None:
g.legend_.remove()
else:
for segment in sccd_result.rec_cg:
j = np.arange(segment['t_start'], segment['t_break'] + 1, 1)
if trimodal == True:
plot_df = pd.DataFrame(
{
'dates': j,
'trend': j * segment['coefs'][band_index][1] / slope_scale + segment['coefs'][band_index][0],
'annual': np.cos(w * j) * segment['coefs'][band_index][2] + np.sin(w * j) * segment['coefs'][band_index][3],
'semiannual': np.cos(2 * w * j) * segment['coefs'][band_index][4] + np.sin(2 * w * j) * segment['coefs'][band_index][5],
'trimodal': j * 0
})
else:
plot_df = pd.DataFrame(
{
'dates': j,
'trend': j * segment['coefs'][band_index][1] / slope_scale + segment['coefs'][band_index][0],
'annual': np.cos(w * j) * segment['coefs'][band_index][2] + np.sin(w * j) * segment['coefs'][band_index][3],
'semiannual': np.cos(2 * w * j) * segment['coefs'][band_index][4] + np.sin(2 * w * j) * segment['coefs'][band_index][5],
'trimodal': np.cos(3 * w * j) * segment['coefs'][band_index][6] + np.sin(3 * w * j) * segment['coefs'][band_index][7]
})
# Convert dates and plot model fit
calendar_dates = [pd.Timestamp.fromordinal(int(row)) for row in plot_df["dates"]]
plot_df.loc[:, 'dates_formal'] = calendar_dates
g = sns.lineplot(
x="dates_formal", y="predicted",
data=plot_df,
label="Model fit",
ax=axe,
color=default_plot_kwargs['line_color']
)
if g.legend_ is not None:
g.legend_.remove()
# Plot near-real-time projection for SCCD if available
if hasattr(sccd_result, 'nrt_mode') and (sccd_result.nrt_mode %10 == 1 or sccd_result.nrt_mode == 3 or sccd_result.nrt_mode %10 == 5):
recent_obs = sccd_result.nrt_model['obs_date_since1982'][sccd_result.nrt_model['obs_date_since1982']>0]
j = np.arange(
sccd_result.nrt_model['t_start_since1982'].item() + defaults['COMMON']['JULIAN_LANDSAT4_LAUNCH'],
recent_obs[-1].item()+ defaults['COMMON']['JULIAN_LANDSAT4_LAUNCH']+1,
1
)
if trimodal == True:
plot_df = pd.DataFrame(
{
'dates': j,
'trend': j * sccd_result.nrt_model['nrt_coefs'][band_index][1] / slope_scale + sccd_result.nrt_model['nrt_coefs'][band_index][0],
'annual': np.cos(w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][2] + np.sin(w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][3],
'semiannual': np.cos(2 * w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][4] + np.sin(2 * w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][5],
'trimodal': np.cos(3 * w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][6] + np.sin(3 * w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][7]
})
else:
plot_df = pd.DataFrame(
{
'dates': j,
'trend': j * sccd_result.nrt_model['nrt_coefs'][band_index][1] / slope_scale + sccd_result.nrt_model['nrt_coefs'][band_index][0],
'annual': np.cos(w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][2] + np.sin(w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][3],
'semiannual': np.cos(2 * w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][4] + np.sin(2 * w * j) * sccd_result.nrt_model['nrt_coefs'][band_index][5],
'trimodal': j * 0
})
plot_df['predicted'] = plot_df['trend'] + plot_df['annual'] + plot_df['semiannual']+ plot_df['trimodal']
calendar_dates = [pd.Timestamp.fromordinal(int(row)) for row in plot_df["dates"]]
plot_df.loc[:, 'dates_formal'] = calendar_dates
g = sns.lineplot(
x="dates_formal", y="predicted",
data=plot_df,
label="Model fit",
ax=axe,
color=default_plot_kwargs['line_color']
)
if g.legend_ is not None:
g.legend_.remove()
# add manual legends
if anomaly is not None:
legend_elements = [Line2D([0], [0], label='Disturbance break', color='k'),
Line2D([0], [0], label='recovery break', color='r'),
Line2D([0], [0], marker='o', color="#EAEAF2",
markerfacecolor="#EAEAF2",markeredgecolor="black",
label='Disturbance anomalies', lw=0, markersize=8),
Line2D([0], [0], marker='o', color="#EAEAF2",
markerfacecolor="#EAEAF2",markeredgecolor="red",
label='Recovery anomalies', lw=0, markersize=8)]
else:
legend_elements = [Line2D([0], [0], label='Disturbance break', color='k'),
Line2D([0], [0], label='recovery break', color='r')]
axe.legend(handles=legend_elements, loc='upper left', prop={'size': 9})
# plot breaks
for i in range(len(sccd_result.rec_cg)):
# we used the sign of change magnitude to decide the category of the breaks
if sccd_result.rec_cg[i]['magnitude'] < 0:
axe.axvline(pd.Timestamp.fromordinal(sccd_result.rec_cg[i]['t_break']), color='k')
else:
axe.axvline(pd.Timestamp.fromordinal(sccd_result.rec_cg[i]['t_break']), color='r')
# plot anomalies if available
if anomaly is not None:
for i in range(len(anomaly.rec_cg_anomaly)):
pred_ref = np.asarray(
[
predict_ref(
anomaly.rec_cg_anomaly[i]["coefs"][0],
anomaly.rec_cg_anomaly[i]["obs_date_since1982"][i_conse]
+ defaults['COMMON']['JULIAN_LANDSAT4_LAUNCH'], num_coefficients=n_coefs
) for i_conse in range(3)
]
)
cm = anomaly.rec_cg_anomaly[i]["obs"][0, 0: 3] - pred_ref
# gpp increase is black line
if np.median(cm) > 0:
yc = data[data[:,0] == anomaly.rec_cg_anomaly[i]['t_break']][0][1]
axe.plot(pd.Timestamp.fromordinal(anomaly.rec_cg_anomaly[i]['t_break']), yc,'ro',fillstyle='none',markersize=8)
# gpp decrease is red line
else:
yc = data[data[:,0] == anomaly.rec_cg_anomaly[i]['t_break']][0][1]
axe.plot(pd.Timestamp.fromordinal(anomaly.rec_cg_anomaly[i]['t_break']), yc,'ko',fillstyle='none',markersize=8)
axe.set_ylabel(f"{band_name}", fontsize=default_plot_kwargs['font_size'])
# Handle tick params with type safety
tick_font_size = default_plot_kwargs['font_size']
if isinstance(tick_font_size, (int, float)):
axe.tick_params(axis='x', labelsize=int(tick_font_size)-1)
else:
axe.tick_params(axis='x', labelsize=13) # fallback
axe.set(ylim=(ylim_low, ylim_high))
axe.set_xlabel("", fontsize=6)
# Format spines
for spine in axe.spines.values():
spine.set_edgecolor('black')
title_font_size = int(font_size) + 2 if isinstance(font_size, (int, float)) else 16
axe.set_title(title, fontweight="bold", size=title_font_size, pad=2)
fig, axes = plt.subplots(3, 1, figsize=(12, 11))
plt.subplots_adjust(hspace=0.4)
# fitting using the state output
sccd_result1, states = sccd_detect_flex(dates, prep, qas, lam=0, state_intervaldays=1)
display_sccd_result_single(data=data[['dates', 'precip', 'qa']].to_numpy().astype(np.int64), band_names=['precip'], band_index=0, sccd_result=sccd_result1, axe=axes[0], trimodal=False, states=states, title=f"Fitting using states")
# fitting using lasso regression
sccd_result2= sccd_detect_flex(dates, prep, qas, lam=0, fitting_coefs=True)
display_sccd_result_single(data=data[['dates', 'precip', 'qa']].to_numpy().astype(np.int64), band_names=['precip'], band_index=0, sccd_result=sccd_result2, axe=axes[1], trimodal=False, title=f"Fitting using Lasso regression")
# fitting using the last model cofficients from Kalman filter
sccd_result3= sccd_detect_flex(dates, prep, qas, lam=0, fitting_coefs=False)
display_sccd_result_single(data=data[['dates', 'precip', 'qa']].to_numpy().astype(np.int64), band_names=['precip'], band_index=0, sccd_result=sccd_result2, axe=axes[2], trimodal=False, title=f"Fitting using the last Kalman filter model")
.. image:: 5_state_analysis_greenning&precipitation_coarse_files/5_state_analysis_greenning&precipitation_coarse_16_0.png