hydrobot package

Submodules

hydrobot.data_acquisition module

Main module.

hydrobot.data_acquisition.config_yaml_import(file_name: str)[source]

Import config.yaml.

Parameters:

file_name (str) – Path to config.yaml

Returns:

For inputting into processor processing_parameters

Return type:

dict

hydrobot.data_acquisition.convert_inspection_expiry(processing_parameters)[source]

Interpret inspection_expiry dict as pd.DateOffset.

Parameters:

processing_parameters (dict)

Returns:

processing_parameters with inspection_expiry converted to pd.DateOffset

Return type:

dict
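As a rough illustration of the conversion described above, the sketch below turns a plain dict into a pd.DateOffset. The function name convert_expiry_sketch and the example key ("months") are assumptions made for illustration; the actual layout of inspection_expiry in a hydrobot config is not documented here.

```python
import pandas as pd


def convert_expiry_sketch(processing_parameters: dict) -> dict:
    """Return a copy with 'inspection_expiry' turned into a pd.DateOffset.

    Assumes (hypothetically) that the dict holds keyword arguments that
    pd.DateOffset accepts, e.g. {"months": 6}.
    """
    converted = dict(processing_parameters)
    converted["inspection_expiry"] = pd.DateOffset(**converted["inspection_expiry"])
    return converted


params = convert_expiry_sketch({"inspection_expiry": {"months": 6}})
# A DateOffset can be added directly to a timestamp.
expiry_date = pd.Timestamp("2021-01-01") + params["inspection_expiry"]
```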

hydrobot.data_acquisition.enforce_site_in_hts(hts: Hilltop, site: str)[source]

Raise exception if site not in Hilltop file.

hydrobot.data_acquisition.get_data(base_url, hts, site, measurement, from_date, to_date, tstype='Standard')[source]

Acquire time series data from a web service and return it as an XML tree together with the parsed DataSourceBlobs.

Parameters:
  • base_url (str) – The base URL of the web service.

  • hts (str) – The Hilltop Time Series (HTS) identifier.

  • site (str) – The site name or location.

  • measurement (str) – The type of measurement to retrieve.

  • from_date (str) – The start date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.

  • to_date (str) – The end date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.

  • tstype (str) – Type of data that is sought (default is Standard, can be Standard, Check, or Quality)

Returns:

  • xml.etree.ElementTree – An XML tree containing the acquired time series data.

  • [DataSourceBlob] – XML tree parsed to DataSourceBlobs

hydrobot.data_acquisition.get_depth_profiles(base_url, hts, site, measurement, from_date, to_date, tstype='Standard') [pandas.Series][source]

Call hilltop server for depth profiles.

Parameters:
  • base_url (str) – The base URL of the web service.

  • hts (str) – The Hilltop Time Series (HTS) identifier.

  • site (str) – The site name or location.

  • measurement (str) – The type of measurement to retrieve.

  • from_date (str | pd.Timestamp) – The start date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.

  • to_date (str | pd.Timestamp) – The end date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.

  • tstype (str) – Type of data that is sought (default ‘Standard’, can be Standard, Check, or Quality)

Returns:

A list of pandas series each giving a depth profile.

Return type:

[pandas.Series]

Raises:

KeyError – if there is no measurement for the given parameters

hydrobot.data_acquisition.get_server_dataframe(base_url, hts, site, measurement, from_date, to_date, tstype='Standard') DataFrame[source]

Call hilltop server and transform to pd.DataFrame.

Parameters:
  • base_url (str) – The base URL of the web service.

  • hts (str) – The Hilltop Time Series (HTS) identifier.

  • site (str) – The site name or location.

  • measurement (str) – The type of measurement to retrieve.

  • from_date (str | pd.Timestamp) – The start date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.

  • to_date (str | pd.Timestamp) – The end date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.

  • tstype (str) – Type of data that is sought (default ‘Standard’, can be Standard, Check, or Quality)

Returns:

A dataframe containing the acquired time series data.

Return type:

pandas.DataFrame

Raises:

KeyError – if there is no measurement for the given parameters
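The date parameters above are documented as strings in the format 'YYYY-MM-DD HH:mm'. A minimal sketch of producing such strings with the standard library follows; the constant name HILLTOP_DATE_FORMAT is hypothetical and not part of hydrobot.

```python
from datetime import datetime

# 'YYYY-MM-DD HH:mm' corresponds to this strftime pattern.
HILLTOP_DATE_FORMAT = "%Y-%m-%d %H:%M"

from_date = datetime(2021, 1, 1, 0, 0).strftime(HILLTOP_DATE_FORMAT)
to_date = datetime(2021, 6, 30, 23, 45).strftime(HILLTOP_DATE_FORMAT)
```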

hydrobot.data_acquisition.get_time_range(base_url, hts, site, measurement, tstype='Standard')[source]

Query a web service for the time range of a measurement and return the XML element together with the parsed DataSourceBlobs.

Parameters:
  • base_url (str) – The base URL of the web service.

  • hts (str) – The Hilltop Time Series (HTS) identifier.

  • site (str) – The site name or location.

  • measurement (str) – The type of measurement to retrieve.

  • tstype (str) – Type of data that is sought (default is Standard, can be Standard, Check, or Quality)

Returns:

  • Element – XML element from the server call

  • [DataSourceBlob] – A list of DataSourceBlobs corresponding to all measurements contained in the acquired time series data.

hydrobot.data_sources module

Handling for different types of data sources.

class hydrobot.data_sources.DissolvedOxygenQualityCodeEvaluator(qc_500_limit, qc_600_limit, qc_500_percent, qc_600_percent, constant_check_shift=0)[source]

Bases: QualityCodeEvaluator

QualityCodeEvaluator for DO NEMS.

Constant error plus percentage error.

find_qc(base_datum, check_datum)[source]

Find the base quality codes for DO.

Parameters:
  • base_datum (numerical) – Closest continuous datum point to the check

  • check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift

Returns:

The Quality code

Return type:

int

class hydrobot.data_sources.QualityCodeEvaluator(qc_500_limit, qc_600_limit, constant_check_shift=0)[source]

Bases: object

Basic QualityCodeEvaluator only compares magnitude of differences.

find_qc(base_datum, check_datum)[source]

Find the base quality codes.

Parameters:
  • base_datum (numerical) – Closest continuous datum point to the check

  • check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift

Returns:

The Quality code

Return type:

int
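To make "only compares magnitude of differences" concrete, here is a minimal sketch of such a comparison. The exact rule (600 within qc_600_limit, 500 within qc_500_limit, otherwise 400) is an assumption for illustration and is not confirmed by the documentation above; find_qc_sketch is a hypothetical stand-alone function, not the class method.

```python
def find_qc_sketch(base_datum, check_datum, qc_500_limit, qc_600_limit):
    """Assign a quality code from the size of the base/check difference.

    Assumed rule: a difference within qc_600_limit earns 600, within
    qc_500_limit earns 500, and anything larger earns 400.
    """
    diff = abs(base_datum - check_datum)
    if diff <= qc_600_limit:
        return 600
    if diff <= qc_500_limit:
        return 500
    return 400


good = find_qc_sketch(10.0, 10.1, qc_500_limit=0.5, qc_600_limit=0.2)
fair = find_qc_sketch(10.0, 10.4, qc_500_limit=0.5, qc_600_limit=0.2)
poor = find_qc_sketch(10.0, 11.0, qc_500_limit=0.5, qc_600_limit=0.2)
```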

class hydrobot.data_sources.TwoLevelQualityCodeEvaluator(qc_500_limit, qc_600_limit, qc_500_percent, qc_600_percent, limit_percent_threshold, constant_check_shift=0)[source]

Bases: QualityCodeEvaluator

QualityCodeEvaluator for standards such as water level.

Fixed error up to given threshold, percentage error after that.

find_qc(base_datum, check_datum)[source]

Find the base quality codes with two stages.

The two stages are: a flat and percentage QC threshold.

Parameters:
  • base_datum (numerical) – Closest continuous datum point to the check

  • check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift

Returns:

The Quality code

Return type:

int
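The two-stage idea (a fixed limit below a threshold, a percentage limit above it) might look roughly like the sketch below. It assumes the threshold is compared against base_datum and that percentages are given in percent of base_datum; both are assumptions, and two_level_qc_sketch is a hypothetical function rather than hydrobot's implementation.

```python
def two_level_qc_sketch(
    base_datum,
    check_datum,
    qc_500_limit,
    qc_600_limit,
    qc_500_percent,
    qc_600_percent,
    limit_percent_threshold,
):
    """Pick flat or percentage limits, then compare the difference to them."""
    diff = abs(base_datum - check_datum)
    if base_datum < limit_percent_threshold:
        # Stage one: fixed (flat) error limits.
        limit_500, limit_600 = qc_500_limit, qc_600_limit
    else:
        # Stage two: limits scale with the size of the base value.
        limit_500 = base_datum * qc_500_percent / 100
        limit_600 = base_datum * qc_600_percent / 100
    if diff <= limit_600:
        return 600
    if diff <= limit_500:
        return 500
    return 400


limits = dict(
    qc_500_limit=10,
    qc_600_limit=3,
    qc_500_percent=1.0,
    qc_600_percent=0.3,
    limit_percent_threshold=2000,
)
flat_stage = two_level_qc_sketch(1000, 1002, **limits)
percent_stage = two_level_qc_sketch(5000, 5020, **limits)
percent_stage_poor = two_level_qc_sketch(5000, 5100, **limits)
```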

class hydrobot.data_sources.UncheckedQualityCodeEvaluator[source]

Bases: QualityCodeEvaluator

QualityCodeEvaluator for data without checks.

Returns 200 for QC.

find_qc(base_datum, check_datum)[source]

Return 200 quality code.

Parameters:
  • base_datum (numerical) – Closest continuous datum point to the check

  • check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift

Returns:

The Quality code 200

Return type:

int

class hydrobot.data_sources.With200QualityCodeEvaluator(qc_500_limit, qc_600_limit, qc_400_limit, constant_check_shift=0)[source]

Bases: QualityCodeEvaluator

For standard quality code evaluators that also have QC200 data.

Examples: pH and Conductivity.

find_qc(base_datum, check_datum)[source]

Find the base quality codes.

Parameters:
  • base_datum (numerical) – Closest continuous datum point to the check

  • check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift

Returns:

The Quality code

Return type:

int

hydrobot.data_sources.depth_check_measurement_name_by_data_family(data_family, depth)[source]

Return check measurement name for the data family at depth.

Many data sources have separate measurement name formats for lake sampling, so this maps the data_family/depth to the appropriate check measurement name

Parameters:
  • data_family (str) – data family to find check measurement name for

  • depth (int) – depth of the measurement, in mm

Returns:

The check measurement name

Return type:

str

hydrobot.data_sources.depth_standard_measurement_name_by_data_family(data_family, depth)[source]

Return standard measurement name for the data family at depth.

Many data sources have separate measurement name formats for lake sampling, so this maps the data_family/depth to the appropriate standard measurement name

Parameters:
  • data_family (str) – data family to find standard measurement name for

  • depth (int) – depth of the measurement, in mm

Returns:

The standard measurement name

Return type:

str

hydrobot.data_sources.get_qc_evaluator(family: str)[source]

Get QC evaluator from data family name.

hydrobot.data_sources.hilltop_export(file_location: str, site_name: str, std_series: Series, check_series: Series, qc_series: Series)[source]

Export the 3 main series to csv files ready to import into hilltop.

Parameters:
  • file_location (str) – Where the files are exported to

  • site_name (str) – Site name

  • std_series (pd.Series) – Standard series

  • check_series (pd.Series) – Check series

  • qc_series (pd.Series) – Quality code series

Return type:

None, but makes files

hydrobot.data_sources.series_export_to_csv(file_location: str, series: list[Series]) None[source]

Export the 3 main series to csv.

Parameters:
  • file_location (str) – Where the files are exported to

  • series (pd.Series) – Pandas series to be exported

Return type:

None, but makes files

hydrobot.evaluator module

Tools for checking quality and finding problems in the data.

hydrobot.evaluator.base_data_meets_qc(std_series, qc_series, target_qc)[source]

Find all data where QC targets are met.

Returns only the base series data for which the next date in the qc_series has a quality code equal to target_qc.

Parameters:
  • std_series (pandas.Series) – Data to be filtered

  • qc_series (pandas.Series) – quality code data series, some of which are presumably target_qc

  • target_qc (int) – target quality code

Returns:

Filtered data

Return type:

pandas.Series

hydrobot.evaluator.base_data_qc_filter(std_series, qc_filter)[source]

Filter out data based on quality code filter.

Return only the base series data for which the next date in the qc_filter is ‘true’

Parameters:
  • std_series (pandas.Series) – Data to be filtered

  • qc_filter (pandas.Series of booleans) – Dates for which some condition is met or not

Returns:

The filtered data

Return type:

pandas.Series

hydrobot.evaluator.bulk_downgrade_out_of_validation(qc_frame: DataFrame, check_series: Series, interval_dict: dict, day_end_rounding: bool = True)[source]

Applies caps on quality codes for any data where the gaps between check data are too large.

Utilises single_downgrade_out_of_validation multiple times for different time periods.

Parameters:
  • qc_frame (pd.DataFrame) – Quality series that potentially needs downgrading

  • check_series (pd.Series) – Check series to check for frequency of checks

  • interval_dict (dict) – Key:Value pairs of max_interval:downgraded_qc for single_downgrade_out_of_validation

  • day_end_rounding (bool) – Whether to round to the day end. If true, downgraded data starts at midnight

Returns:

The qc_frame with any downgraded QCs added in

Return type:

pd.DataFrame

hydrobot.evaluator.cap_qc_where_std_high(std_frame, qc_frame, cap_qc, cap_threshold)[source]

Cap the quality code of data where the standard series exceeds some value.

Parameters:
  • std_frame (pd.DataFrame)

  • qc_frame (pd.DataFrame)

  • cap_qc (numeric)

  • cap_threshold (numeric)

Returns:

The qc_frame with quality codes capped at cap_qc wherever the standard data exceeds cap_threshold

Return type:

pd.DataFrame

hydrobot.evaluator.check_data_quality_code(series: Series, check_series: Series, qc_evaluator: QualityCodeEvaluator, gap_limit=10800) DataFrame[source]

Quality Code Check Data.

Quality codes data based on the difference between the standard series and the check data

Parameters:
  • series (pd.Series) – Data to be quality coded

  • check_series (pd.Series) – Check data - must not be empty

  • qc_evaluator (data_sources.QualityCodeEvaluator) – Handler for QC comparisons

  • gap_limit (integer (seconds)) – If the nearest real data point is more than this many seconds away, return 200

Returns:

The QC values of the series, indexed by the END time of the QC period

Return type:

pd.Series

hydrobot.evaluator.diagnose_data(std_series, check_series, qc_series, frequency)[source]

Return description of how much missing data, how much for each QC, etc.

This function is admittedly messy. It is only a diagnostic, however, so it can be changed freely.

Parameters:
  • std_series (pandas.Series) – processed base time series data

  • check_series (pandas.Series) – Check data time series

  • qc_series (pandas.Series) – QC time series

  • frequency (DateOffset or str) – Frequency to which the data gets set to

Returns:

Prints statements that describe the state of the data

Return type:

None

hydrobot.evaluator.find_nearest_time(series, dt)[source]

Find the time in the series that is closest to dt.

For example for the series:

pd.Timestamp("2021-01-01 02:00"): 0.0,
pd.Timestamp("2021-01-01 02:15"): 0.0,

with dt:

pd.Timestamp("2021-01-01 02:13"): 0.0,

the result should be the closer pd.Timestamp("2021-01-01 02:15") value

Parameters:
  • series (pd.Series) – The series indexed by time

  • dt (Datetime) – Time that may or may not exactly line up with the series

Returns:

The value of dt rounded to the nearest timestamp of the series

Return type:

Datetime
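The worked example above (02:00 and 02:15 in the series, dt at 02:13) can be reproduced with a short standard-library sketch. A plain list of datetimes stands in for the series index here, and find_nearest_time_sketch is a hypothetical name, not hydrobot's function.

```python
from datetime import datetime


def find_nearest_time_sketch(index, dt):
    """Return the timestamp in `index` with the smallest distance to `dt`."""
    return min(index, key=lambda t: abs(t - dt))


index = [datetime(2021, 1, 1, 2, 0), datetime(2021, 1, 1, 2, 15)]
nearest = find_nearest_time_sketch(index, datetime(2021, 1, 1, 2, 13))
```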

hydrobot.evaluator.find_nearest_valid_time(series, dt) Timestamp[source]

Find the time in the series that is closest to dt, but ignoring NaN values (gaps).

Parameters:
  • series (pd.Series) – The series indexed by time

  • dt (Datetime) – Time that may or may not exactly line up with the series

Returns:

The value of dt rounded to the nearest timestamp of the series

Return type:

Datetime

hydrobot.evaluator.gap_finder(data: Series) list[source]

Find the indices and lengths of gaps (sequences of NaN values) in a pandas Series.

Parameters:

data (pd.Series) – Input Series containing NaN values.

Returns:

List of tuples, each containing the index of a NaN value, the length of the gap containing it, and True for strictness.

Return type:

list

hydrobot.evaluator.max_qc_limiter(qc_frame: DataFrame, max_qc) DataFrame[source]

Enforce max_qc on a QC series.

Replaces all QC values above max_qc with max_qc.

Parameters:
  • qc_frame (pd.DataFrame) – The series to be limited.

  • max_qc (numerical) – maximum allowed value. None imposes no limit.

Returns:

qc_frame with too high QCs limited to max_qc

Return type:

pd.DataFrame

hydrobot.evaluator.missing_data_quality_code(std_series, qc_data, gap_limit)[source]

Make sure that missing data is QC100.

Returns qc_data with QC100 values added where std_series is NaN

Parameters:
  • std_series (pd.Series) – Base series which may contain NaNs

  • qc_data – QC series for base std_series without QC100 values

  • gap_limit – Maximum size of gaps which will be ignored

Returns:

The modified QC series, indexed by the start time of the QC period

Return type:

pd.Series

hydrobot.evaluator.single_downgrade_out_of_validation(qc_frame: DataFrame, check_series: Series, max_interval: DateOffset, downgraded_qc: int = 200, day_end_rounding: bool = True)[source]

Applies a cap on quality codes for any data where the gaps between check data are too large.

Only applies a single cap quality code, see bulk_downgrade_out_of_validation for multiple steps.

Parameters:
  • qc_frame (pd.DataFrame) – Quality series that potentially needs downgrading

  • check_series (pd.Series) – Check series to check for frequency of checks

  • max_interval (pd.DateOffset) – How long of a gap between checks before the data gets downgraded

  • downgraded_qc (int) – Which code the quality data gets downgraded to

  • day_end_rounding (bool) – Whether to round to the day end. If true, downgraded data starts at midnight

Returns:

The qc_frame with any downgraded QCs added in

Return type:

pd.DataFrame
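One way to picture the downgrade logic is to first find the periods that are "out of validation". The sketch below does only that step, under several assumptions: max_interval is a datetime.timedelta (hydrobot takes a pd.DateOffset), the downgrade starts once max_interval has elapsed since the earlier check, and day-end rounding moves that start forward to the following midnight. The function name is hypothetical.

```python
from datetime import datetime, timedelta


def out_of_validation_periods(check_times, max_interval, day_end_rounding=True):
    """List (start, end) periods where checks are further apart than max_interval."""
    periods = []
    for earlier, later in zip(check_times, check_times[1:]):
        if later - earlier <= max_interval:
            continue  # checks are frequent enough, nothing to downgrade
        start = earlier + max_interval
        if day_end_rounding:
            midnight = start.replace(hour=0, minute=0, second=0, microsecond=0)
            if start != midnight:
                # Assumed behaviour: round forward to the next midnight.
                start = midnight + timedelta(days=1)
        if start < later:
            periods.append((start, later))
    return periods


checks = [
    datetime(2021, 1, 1, 9, 0),
    datetime(2021, 1, 20, 9, 0),
    datetime(2021, 5, 1, 9, 0),
]
periods = out_of_validation_periods(checks, timedelta(days=60))
unrounded = out_of_validation_periods(checks, timedelta(days=60), day_end_rounding=False)
```

Quality codes falling inside each returned period would then be capped at downgraded_qc.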

hydrobot.evaluator.small_gap_closer(series: Series, gap_limit: int) Series[source]

Remove small gaps from a series.

Gaps are defined as consecutive runs of np.nan values. Small gaps are defined as gaps of length gap_limit or less.

Will return series with the nan values in the short gaps removed, and the long gaps untouched.

Parameters:
  • series (pandas.Series) – Data which has gaps to be closed

  • gap_limit (integer) – Maximum length of gaps removed, will remove all np.nan’s in consecutive runs of gap_limit or less

Returns:

Data with any short gaps removed

Return type:

pandas.Series
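The gap-closing rule can be sketched in plain Python, with a list standing in for the pandas Series. This is an illustration of the described behaviour, not hydrobot's implementation; in the real function the timestamps of the removed NaN values would presumably be dropped along with them.

```python
import math
from itertools import groupby


def _is_nan(value):
    return isinstance(value, float) and math.isnan(value)


def small_gap_closer_sketch(values, gap_limit):
    """Drop NaNs that sit in runs of gap_limit or fewer; keep longer runs."""
    result = []
    for is_gap, run in groupby(values, key=_is_nan):
        run = list(run)
        if is_gap and len(run) <= gap_limit:
            continue  # short gap: remove these NaNs entirely
        result.extend(run)
    return result


nan = math.nan
closed = small_gap_closer_sketch([1.0, nan, 2.0, nan, nan, nan, 3.0], gap_limit=2)
```

The single NaN is removed, while the run of three NaNs is longer than gap_limit and stays in place.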

hydrobot.evaluator.splitter(std_series, qc_series)[source]

Split the data up by QC code.

Selects all data which meets a given QC code and pads the rest with NaN values. Does this for all current NEMS values ([0, 100, 200, 300, 400, 500, 600]).

Parameters:
  • std_series (pd.Series) – Time series data to be split up

  • qc_series (pd.Series) – QC values to split the data by

Returns:

Keys are the QC values as ints, values are the series of data that fit each QC value

Return type:

dict of int: pd.Series pairs
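A minimal sketch of the splitting idea, using plain lists in place of pandas Series. It assumes one quality code per data point, already aligned; the real function works from a separate QC series, so the alignment step is omitted here. splitter_sketch and NEMS_CODES are hypothetical names.

```python
import math

NEMS_CODES = [0, 100, 200, 300, 400, 500, 600]


def splitter_sketch(values, qc_codes):
    """Map each QC code to the data that carries it, NaN-padded elsewhere."""
    return {
        code: [v if qc == code else math.nan for v, qc in zip(values, qc_codes)]
        for code in NEMS_CODES
    }


split = splitter_sketch([1.0, 2.0, 3.0], [600, 500, 600])
```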

hydrobot.filters module

General filtering utilities.

hydrobot.filters.clip(unclipped: Series, low_clip: float, high_clip: float)[source]

Clip values in a pandas Series within a specified range.

Parameters:
  • unclipped (pandas.Series) – Input data to be clipped.

  • high_clip (float) – Upper bound for clipping. Values greater than this will be set to NaN.

  • low_clip (float) – Lower bound for clipping. Values less than this will be set to NaN.

Returns:

A Series containing the clipped values with the same index as the input Series.

Return type:

pandas.Series
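Note that, as documented above, out-of-range values become NaN rather than being pulled back to the bound (which is what pandas.Series.clip does). A minimal sketch of that behaviour, assuming the bounds themselves count as in range; clip_sketch is a hypothetical stand-in for the real function.

```python
import pandas as pd


def clip_sketch(unclipped: pd.Series, low_clip: float, high_clip: float) -> pd.Series:
    """Set values outside [low_clip, high_clip] to NaN, keeping the index."""
    in_range = (unclipped >= low_clip) & (unclipped <= high_clip)
    # Series.where keeps values where the condition holds and inserts NaN elsewhere.
    return unclipped.where(in_range)


raw = pd.Series([-5.0, 20.0, 150.0])
clipped = clip_sketch(raw, low_clip=0, high_clip=100)
```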

hydrobot.filters.fbewma(input_data, span: int)[source]

Calculate the Forward-Backward Exponentially Weighted Moving Average (FBEWMA).

Parameters:
  • input_data (pandas.Series) – Input time series data to calculate the FBEWMA on.

  • span (int) – Span parameter for exponential weighting.

Returns:

A Series containing the FBEWMA values with the same index as the input Series.

Return type:

pandas.Series
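A forward-backward EWMA averages one exponentially weighted pass run forwards with one run backwards, which cancels the lag a single pass introduces. The sketch below uses plain lists and the simple recursive form with alpha = 2 / (span + 1); whether hydrobot uses this exact weighting (or pandas' ewm defaults) is an assumption.

```python
def ewma(values, span):
    """Recursive exponentially weighted moving average, alpha = 2 / (span + 1)."""
    alpha = 2 / (span + 1)
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(value)  # seed with the first observation
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed


def fbewma_sketch(values, span):
    """Average a forward EWMA pass with a backward EWMA pass."""
    forward = ewma(values, span)
    backward = ewma(values[::-1], span)[::-1]
    return [(f + b) / 2 for f, b in zip(forward, backward)]


flat = fbewma_sketch([1.0, 1.0, 1.0, 1.0], span=3)
peak = fbewma_sketch([1.0, 2.0, 3.0, 2.0, 1.0], span=3)
```

A symmetric input gives a symmetric output, which a forward-only pass would not.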

hydrobot.filters.flatline_value_remover(series: Series, span: int = 3)[source]

Remove repeated (flatlined) values in a series.

Examines the data to see if any values are exactly repeated over a period. Where values exactly repeat it probably indicates a broken instrument. Replaces all values after the first with NaN. Uses math.isclose() to measure float “equality”.

Parameters:
  • series (pd.Series) – Data to examine for flatlined values

  • span (int) – Amount of allowed repeated values in a row before duplicates are removed

Returns:

Data with the flatlined values replaced with np.nan

Return type:

pd.Series
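The flatline rule can be sketched with plain lists as follows. It assumes a run only counts as flatlined when it holds more than span values that math.isclose() treats as equal; the exact boundary used by hydrobot is not stated above, so treat that as an assumption.

```python
import math


def flatline_value_remover_sketch(values, span=3):
    """Replace repeats after the first value of a long flat run with NaN."""
    result = list(values)
    start = 0
    while start < len(values):
        # Extend the run while values stay (approximately) equal to its first value.
        end = start + 1
        while end < len(values) and math.isclose(values[end], values[start]):
            end += 1
        if end - start > span:
            for i in range(start + 1, end):
                result[i] = math.nan
        start = end
    return result


cleaned = flatline_value_remover_sketch([1.0, 2.0, 2.0, 2.0, 2.0, 3.0, 3.0], span=3)
```

The run of four 2.0 values is reduced to its first value, while the pair of 3.0 values is short enough to be left alone.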

hydrobot.filters.remove_one_spikes(input_data: Series, threshold_factor=3.0, window_size=5) Series[source]

Detect and remove single-point spikes in a time series.

A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.

NOTE: This function only works when baseline data is fairly stable. If baseline data is noisy or has high variability, use remove_one_spikes_mad() instead.

Parameters:
  • input_data (pandas.Series) – The input time series data.

  • threshold_factor (float) – Multiplier for the standard deviation to define the spike threshold. Default is 3.0.

  • window_size (int) – The size of the rolling window to compute local statistics. Default is 5.

Returns:

filtered_data – The time series with one-point spikes removed (set to NaN).

Return type:

pandas.Series

hydrobot.filters.remove_one_spikes_mad(input_data: Series, threshold_factor=2.5) Series[source]

Detect and remove single-point spikes using Median Absolute Deviation (MAD).

A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.

NOTE: This function is more robust to noisy or variable baseline data than remove_one_spikes().

ALSO NOTE: This function is… not very good. I think I need to play with desmos a bit more to get a better thresholding mechanism.

Parameters:
  • input_data (pandas.Series) – The input time series data.

  • threshold_factor (float) – Multiplier for the MAD to define the spike threshold. Default is 2.5.

Returns:

filtered_data – The time series with one-point spikes removed (set to NaN).

Return type:

pandas.Series

hydrobot.filters.remove_outliers(input_data: Series, span: int, delta: float)[source]

Remove outliers.

Remove outliers from a time series by comparing it to the Forward-Backward Exponentially Weighted Moving Average (FBEWMA).

Parameters:
  • input_data (pandas.Series) – Input time series data.

  • span (int) – Span parameter for exponential weighting used in the FBEWMA.

  • delta (float) – Threshold for identifying outliers. Values greater than this threshold will be set to NaN.

Returns:

A Series containing the time series with outliers removed with the same index as the input Series.

Return type:

pandas.Series

hydrobot.filters.remove_range(input_series: Series | DataFrame, from_date: str | None, to_date: str | None, min_gap_length: int = 1, insert_gaps: str = 'none')[source]

Remove data from series in given range.

Returns the input series without data between from_date and to_date inclusive.

A None to_date will remove all data since the from_date (and vice versa). A double None for to_date/from_date removes all data.

Inserts gaps or not depending on insert_gaps

Parameters:
  • input_series (pd.Series | pd.DataFrame) – The series or dataframe to have a section removed

  • from_date (str | None) – Start of removed section

  • to_date (str | None) – End of removed section

  • min_gap_length (int) – Will insert gaps based on insert_gaps strategy if missing more data points than min_gap_length in a row.

  • insert_gaps (str) – If “all” will insert np.nan at every missing point. If “start” will insert np.nan only at from_date. If “end” will insert np.nan only at to_date. If “none” will insert no np.nan values, and remove all timestamps completely.

Returns:

The series with relevant slice removed

Return type:

pd.Series

hydrobot.filters.remove_spikes(input_data: Series, span: int, low_clip: float, high_clip: float, delta: float) Series[source]

Remove spikes.

Remove spikes from time series data using a combination of clipping and interpolation.

Parameters:
  • input_data (pandas.Series) – Input time series data.

  • span (int) – Span parameter for exponential weighting used in outlier detection.

  • low_clip (float) – Lower bound for clipping. Values less than this will be set to NaN.

  • high_clip (float) – Upper bound for clipping. Values greater than this will be set to NaN.

  • delta (float) – Threshold for identifying outliers. Values greater than this threshold will be considered spikes.

Returns:

A Series containing the time series with spikes removed with the same index as the input Series.

Return type:

pandas.Series

hydrobot.filters.trim_series(std_series: Series, check_series: Series | Timestamp) Series[source]

Remove end of std series to match check series.

All data after the last entry in check_series is presumed to be unchecked, so that data is removed from the std_series

If check_series is empty, returns the entire std_series

Parameters:
  • std_series (pd.Series) – The series to be trimmed

  • check_series (pd.Series | pd.DataFrame | pd.Timestamp) – Indicates the end of the usable data

Returns:

std_series with the unchecked elements trimmed

Return type:

pd.Series
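A minimal sketch of the trimming behaviour for the case where check_series is a pandas Series (the pd.Timestamp and pd.DataFrame inputs are not covered here). trim_series_sketch is a hypothetical stand-in and assumes both series have sorted datetime indexes.

```python
import pandas as pd


def trim_series_sketch(std_series: pd.Series, check_series: pd.Series) -> pd.Series:
    """Drop standard data recorded after the last check timestamp."""
    if check_series.empty:
        return std_series
    last_check = check_series.index[-1]
    # Label-based slicing on a sorted DatetimeIndex includes the end label.
    return std_series.loc[:last_check]


std = pd.Series(
    [1.0, 2.0, 3.0, 4.0],
    index=pd.date_range("2021-01-01 00:00", periods=4, freq="15min"),
)
check = pd.Series([2.1], index=[pd.Timestamp("2021-01-01 00:15")])
trimmed = trim_series_sketch(std, check)
```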

hydrobot.plotter module

Tools for displaying potentially problematic data.

hydrobot.plotter.add_qc_limit_bars(qc400, qc500, fig=None, **kwargs: int)[source]

Add horizontal lines to the plot for the QC limits.

Parameters:
  • qc400 (float) – The value of the QC400 limit

  • qc500 (float) – The value of the QC500 limit

  • fig (go.Figure) – The figure to add the horizontal lines to

  • kwargs (dict) – Additional arguments to pass to the lines

Return type:

go.Figure

hydrobot.plotter.plot_check_data(standard_series, check_data, constant_check_shift, tag_list=None, check_names=None, ghosts=False, diffs=False, align_checks=False, fig=None, rain_control=False, **kwargs: int)[source]

Plot the check data.

Parameters:
  • standard_series (pd.Series) – The series to be plotted

  • check_data (pd.DataFrame) – The data to be plotted on top of the standard data

  • constant_check_shift (float) – The shift between the check data and the standard data

  • tag_list (list[str]) – The tags of the check data

  • check_names (list[str]) – The names of the check data

  • ghosts (bool) – Whether to plot the check data where the timestamps are

  • diffs (bool) – Whether to plot the difference between the check data and the standard data

  • align_checks (bool) – Whether to align the check data to the standard data

  • fig (go.Figure) – The figure to add the plot to

  • rain_control (bool) – Adjustment for rain control plot

  • kwargs (dict) – Additional arguments to be passed to the plot

Return type:

go.Figure

hydrobot.plotter.plot_processing_overview_chart(standard_data, quality_data, check_data, constant_check_shift, qc_500_limit, qc_600_limit, tag_list=None, check_names=None, fig=None, rain_control=False, **kwargs)[source]

Plot the standard processing plot with small pcc chart underneath.

Parameters:
  • standard_data (pd.DataFrame) – The data to be plotted

  • quality_data (pd.DataFrame) – The quality data to be plotted

  • check_data (pd.DataFrame) – The check data to be plotted

  • constant_check_shift (float) – The shift between the check data and the standard data

  • qc_500_limit (float) – The value of the QC500 limit

  • qc_600_limit (float) – The value of the QC600 limit

  • tag_list (list[str]) – The tags of the check data

  • check_names (list[str]) – The names of the check data

  • fig (go.Figure, optional) – The figure to add the plot to, will make a new one if none

  • rain_control (bool) – Adjustment for rain control plot

  • kwargs (dict) – Additional arguments to pass to the plot

Return type:

go.Figure

hydrobot.plotter.plot_qc_codes(standard_series, quality_series, fig=None, **kwargs)[source]

Plot data with correct qc colour.

Parameters:
  • standard_series (pd.Series) – Data to be sorted by colour

  • quality_series (pd.Series) – Data to use to determine colour

  • fig (go.Figure | None, optional) – The figure to add info to, will make a figure if None

Return type:

go.Figure

hydrobot.plotter.plot_raw_data(raw_standard_series, fig=None, **kwargs: int)[source]

Plot the raw data with a grey line.

Parameters:
  • raw_standard_series (pd.Series) – The data to be plotted.

  • fig (go.Figure) – The figure to add the plot to

  • kwargs (dict) – Additional arguments to be passed to the plot

Return type:

go.Figure

hydrobot.plotter.qc_colour(qc)[source]

Give the colour of the QC.

Parameters:

qc (int) – Quality code

Returns:

Hex code for the colour of the QC

Return type:

str

hydrobot.processor module

Processor class.

class hydrobot.processor.Processor(self, base_url: str, site: str, standard_hts_filename: str, standard_measurement_name: str, frequency: str | None, data_family: str, from_date: str | None = None, to_date: str | None = None, check_hts_filename: str | None = None, check_measurement_name: str | None = None, defaults: dict | None = None, interval_dict: dict | None = None, constant_check_shift: float = 0, fetch_quality: bool = False, export_file_name: str | None = None, archive_base_url: str | None = None, archive_standard_hts_filename: str | None = None, archive_check_hts_filename: str | None = None, provisional_wq_filename: str | None = None, archive_standard_measurement_name: str | None = None, depth: float | None = None, infer_frequency: bool = True, **kwargs)[source]

Bases: object

Processor class for handling data processing.

_defaults

The default settings.

Type:

dict

_site

The site to be processed.

Type:

str

_standard_measurement_name

The standard measurement to be processed.

Type:

str

_check_measurement_name

The measurement to be checked.

Type:

str

_base_url

The base URL of the Hilltop server.

Type:

str

_standard_hts_filename

The standard Hilltop service.

Type:

str

_check_hts_filename

The Hilltop service to be checked.

Type:

str

_frequency

The frequency of the data.

Type:

str

_from_date

The start date of the data.

Type:

str

_to_date

The end date of the data.

Type:

str

_quality_code_evaluator

The quality code evaluator.

Type:

QualityCodeEvaluator

_interval_dict

Determines how data with old checks is downgraded.

Type:

dict

_standard_data

The standard series data.

Type:

pd.Series

_check_data

The series containing check data.

Type:

pd.Series

_quality_data

The quality series data.

Type:

pd.Series

standard_item_name

The name of the standard item.

Type:

str

standard_data_source_name

The name of the standard data source.

Type:

str

check_item_name

The name of the check item.

Type:

str

check_data_source_name

The name of the check data source.

Type:

str

export_file_name

Where the data is exported to. Used as the default when exporting without a specified file location.

Type:

str

add_check(extra_check)[source]

Incorporate extra check data into the check series using utils.merge_series.

Parameters:

extra_check – extra check data

Return type:

None, but adds data to self.check_series

add_qc_limit_bars(fig=None, **kwargs)[source]

Implement plotter.add_qc_limit_bars.

add_quality(extra_quality)[source]

Incorporate extra quality data into the quality series using utils.merge_series.

Parameters:

extra_quality – extra quality data

Return type:

None, but adds data to self.quality_series

add_standard(extra_standard)[source]

Incorporate extra standard data into the standard series using utils.merge_series.

Parameters:

extra_standard – extra standard data

Return type:

None, but adds data to self.standard_data

property base_url

The base URL of the Hilltop server.

Type:

str

check_data

Decorate class methods to provide Annalist functionality.

Used as a decorator for class methods to provide Annalist logging. Unlike function_logger, this decorator preserves knowledge of the class instance of the method that it decorates, which can be used to log information that is only available at runtime.

This logger looks for input arguments that are named the same as custom fields in the formatter. If no such arguments are found, it looks for attributes on the parent class that match. If any are found, they are passed to Annalist to log them according to the formatter specification.

Examples

Class methods can be decorated with the ClassLogger to provide logging that preserves knowledge of the class instance. However, some linters have a difficult time understanding this syntax. For example, mypy does not like a custom decorator on __init__, even though this is perfectly legal code. In this case, add the linter comment # type: ignore inline:

class MyClass:
    @ClassLogger  # type: ignore
    def __init__(self, prop1, ...):
        self._prop1 = prop1
        ...

It is also possible to decorate properties. These should be decorated on the setter, and not the @property. Once again, mypy is not a big fan of this syntax, so add the # type: ignore line if necessary:

@property
def prop1(self):
    return self._prop1

@ClassLogger  # type: ignore
@prop1.setter
def prop1(self, value):
    self._prop1 = value

Do not decorate the @property method itself. This creates an infinite loop, as the logger calls the property, which calls the property …

Normal methods, static methods, and class methods can be decorated as normal.

@ClassLogger
def normal_method(self, arg):
    ...

@ClassLogger
@staticmethod
def static_method(arg):
    ...

@ClassLogger
@classmethod
def class_method(cls, arg):
    ...

I haven’t tried all the magic methods. __init__ works fine. __repr__ does not, it does the infinite loop thing.

property check_hts_filename

The Hilltop service to be checked.

Type:

str

clip(low_clip: float | None = None, high_clip: float | None = None)[source]

Clip data within specified low and high values.

Parameters:
  • low_clip (float or None, optional) – The lower bound for clipping, by default None. If None, the low clip value from the class defaults is used.

  • high_clip (float or None, optional) – The upper bound for clipping, by default None. If None, the high clip value from the class defaults is used.

Return type:

None

Notes

This method clips the data in both the standard and check series within the specified low and high values. It uses the filters.clip function for the actual clipping process.
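The text above does not say whether out-of-range values are masked or capped. The sketch below assumes masking (out-of-range values become NaN) and contrasts it with the pandas built-in Series.clip, which caps values at the bounds. The helper name mask_outside_range is hypothetical and is not the filters.clip implementation.

```python
import pandas as pd


def mask_outside_range(series, low_clip, high_clip):
    """Replace values outside [low_clip, high_clip] with NaN (assumed behaviour)."""
    return series.where((series >= low_clip) & (series <= high_clip))


values = pd.Series([-5.0, 10.0, 50.0, 120.0])

# Masking keeps the timestamps but blanks the offending values.
masked = mask_outside_range(values, low_clip=0, high_clip=100)

# pandas' own clip caps values at the bounds instead.
capped = values.clip(lower=0, upper=100)
```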

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.clip(low_clip=0, high_clip=100)
>>> processor.standard_data["Value"]
<clipped standard series within the specified range>
>>> processor.check_data["Value"]
<clipped check series within the specified range>

classmethod complete_yaml_parameters(config_path)[source]

Ensure a yaml holds all relevant parameters, filling in missing from/to dates.

data_exporter(file_location=None, ftype='xml', standard: bool = True, quality: bool = True, check: bool = True, trimmed=True)[source]

Export data to file.

Parameters:
  • file_location (str | None) – The file path where the file will be saved. If ‘ftype’ is “csv” or “xml”, this should be a full file path including extension. If ‘ftype’ is “hilltop_csv”, multiple files will be created, so ‘file_location’ should be a prefix that will be appended with “_std_qc.csv” for the file containing the standard and quality data, and “_check.csv” for the check data file. If None, uses self.export_file_name

  • ftype (str, optional) – Available options are “xml”, “hilltop_csv”, “csv”, “check”.

  • standard (bool, optional) – Whether standard data is exported, default True

  • check (bool, optional) – Whether check data is exported, default True

  • quality (bool, optional) – Whether quality data is exported, default True

  • trimmed (bool, optional) – If True, export trimmed data; otherwise, export the full data. Default is True.

Return type:

None

Raises:

ValueError – If ftype is not a recognised string.

Notes

This method exports data to file in the format selected by ftype.
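For the “hilltop_csv” option, the file_location description says the prefix is extended with two suffixes. A minimal sketch of that naming rule follows; hilltop_csv_paths is a hypothetical helper written for illustration, not part of the package.

```python
def hilltop_csv_paths(file_location):
    """Return the two file names derived from a 'hilltop_csv' prefix."""
    return {
        "standard_and_quality": f"{file_location}_std_qc.csv",
        "check": f"{file_location}_check.csv",
    }


paths = hilltop_csv_paths("output/site1")
print(paths)
```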

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.data_exporter("output.xml", trimmed=True)
>>> # Check the generated XML file at 'output.xml'

property defaults

The default settings.

Type:

dict

delete_range(from_date, to_date, tstype_standard=True, tstype_check=False, tstype_quality=False, gap_limit=None)[source]

Delete a range of data from specified time series types.

DEPRECATED: Use of this method is discouraged because it completely removes rows from the dataframes. Use remove_range instead, which marks rows for removal but retains each timestamp so that it stays associated with the other values in the row, such as the raw value and the reason for removal.

Parameters:
  • from_date (str) – The start date of the range to delete.

  • to_date (str) – The end date of the range to delete.

  • tstype_standard (bool, optional) – Flag to delete data from the standard series, by default True.

  • tstype_check (bool, optional) – Flag to delete data from the check series, by default False.

  • tstype_quality (bool, optional) – Flag to delete data from the quality series, by default False.

  • gap_limit (int, optional) – The amount of missing data required before a gap is inserted.

Return type:

None

Notes

This method deletes a specified range of data from the selected time series types. The range is defined by the from_date and to_date parameters.
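The difference between deleting rows and marking them for removal, as described in the deprecation note, can be sketched with plain pandas. The column names "Raw", "Value" and "Remove" are assumptions made for this illustration and may not match the package's dataframe layout.

```python
import numpy as np
import pandas as pd

index = pd.date_range("2022-01-01", periods=5, freq="D")
data = pd.DataFrame({"Raw": [1.0, 2.0, 3.0, 4.0, 5.0]}, index=index)
data["Value"] = data["Raw"]

in_range = (data.index >= "2022-01-02") & (data.index <= "2022-01-03")

# Deleting: the rows, and their timestamps, disappear entirely.
deleted = data.loc[~in_range]

# Marking: the rows stay, the processed value is blanked, and a flag records
# that the row was removed, so the raw value remains available.
marked = data.copy()
marked["Remove"] = False
marked.loc[in_range, "Value"] = np.nan
marked.loc[in_range, "Remove"] = True
```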

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.delete_range(
...     from_date="2022-01-01", to_date="2022-12-31", tstype_standard=True
... )
>>> processor.standard_data
<standard series with specified range deleted>
>>> processor.delete_range(
...     from_date="2022-01-01", to_date="2022-12-31", tstype_check=True
... )
>>> processor.check_data
<check series with specified range deleted>

diagnosis()[source]

Provide a diagnosis of the data.

Return type:

None

Notes

This method analyzes the state of the data, including the standard, check, and quality series. It provides diagnostic information about the data distribution, gaps, and other relevant characteristics.

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.import_data()
>>> processor.diagnosis()
>>> # View diagnostic information about the data.

enforce_measurement_at_site(measurement_name, hilltop)[source]

Unimplemented test that measurement is in a given hilltop.

property frequency

The frequency of the data.

Type:

str

classmethod from_config_yaml(config_path, fetch_quality=False)[source]

Initialises a Processor class given a config file.

Parameters:
  • config_path (str) – Path to config.yaml.

  • fetch_quality (bool, optional) – Whether to fetch any existing quality data, default False

Return type:

Processor, Annalist

property from_date

The start date of the data.

Type:

str

classmethod from_processing_parameters_dict(processing_parameters, fetch_quality=False)[source]

Initialises a Processor class given a dictionary of processing parameters.

Parameters:
  • processing_parameters (dict) – Dictionary of processing parameters

  • fetch_quality (bool, optional) – Whether to fetch any existing quality data, default False

Return type:

Processor, Annalist

gap_closer(gap_limit: int | None = None)[source]

Close small gaps in the standard series.

DEPRECATED: The use of this method is discouraged as it completely removes rows from the dataframes. The gap closing functionality has been moved to data_exporter, where gaps are handled automatically before data export.

Parameters:

gap_limit (int or None, optional) – The maximum number of consecutive missing values to close, by default None. If None, the gap limit from the class defaults is used.

Return type:

None

Notes

This method closes small gaps in the standard series by replacing consecutive missing values with interpolated or backfilled values. The gap closure is performed using the evaluator.small_gap_closer function.
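The deprecation note and the Notes paragraph describe the closing step differently (row removal versus filling). The sketch below assumes the row-removal reading: runs of consecutive NaN values no longer than gap_limit are dropped, while longer runs are kept as real gaps. drop_small_gaps is a hypothetical helper, not evaluator.small_gap_closer.

```python
import numpy as np
import pandas as pd


def drop_small_gaps(series, gap_limit):
    """Drop NaN runs of at most gap_limit values; keep longer runs as gaps."""
    is_nan = series.isna()
    # Give every run of consecutive NaN (or non-NaN) values its own id.
    run_id = is_nan.astype(int).diff().ne(0).cumsum()
    run_length = is_nan.groupby(run_id).transform("size")
    small_gap = is_nan & (run_length <= gap_limit)
    return series[~small_gap]


values = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, np.nan, np.nan, np.nan, 9.0])
closed = drop_small_gaps(values, gap_limit=2)
```

With gap_limit=2 the two-value gap is removed and the four-value gap survives.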

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.gap_closer(gap_limit=5)
>>> processor.standard_data["Value"]
<updated standard series with closed gaps>

get_measurement_dataframe(measurement, hts_type)[source]

Get a dataframe of a given measurement for other processor parameters.

import_check(check_hts_filename: str | None = None, site: str | None = None, check_measurement_name: str | None = None, check_data_source_name: str | None = None, check_item_info: dict | None = None, check_item_name: str | None = None, check_data: DataFrame | None = None, from_date: str | None = None, to_date: str | None = None, base_url: str | None = None)[source]

Import Check data.

Parameters:
  • check_hts_filename (str or None, optional) – Where to get check data from

  • site (str or None, optional) – Which site to get data from

  • check_measurement_name (str or None, optional) – Name for measurement to get

  • check_data_source_name (str or None, optional) – Name for data source to get

  • check_item_info (dict or None, optional) – ItemInfo to be used in hilltop xml

  • check_item_name (str or None, optional) – ItemName to be used in hilltop xml

  • check_data (pd.DataFrame or None, optional) – Existing check data. This appears to be overwritten by the import, so the parameter may be redundant and is a candidate for removal.

  • from_date (str or None, optional) – The start date for data retrieval. If None, defaults to the earliest available data.

  • to_date (str or None, optional) – The end date for data retrieval. If None, defaults to latest available data.

  • base_url (str, optional) – Base of the url to use for the hilltop server request. Defaults to the Processor value.

Returns:

check_data

Return type:

pd.DataFrame

Raises:

TypeError – If the parsed Check data is not a pandas.DataFrame.

Notes

This method imports Check data from the specified server based on the provided parameters. It retrieves data using the data_acquisition.get_data function. The data is parsed and formatted according to the item_info in the data source.

Examples

>>> processor = Processor(...)  # initialize processor instance
>>> processor.import_check(
...     from_date='2022-01-01', to_date='2022-01-10'
... )

import_data(from_date: Timestamp | str | None = None, to_date: Timestamp | str | None = None, standard: bool = True, check: bool = True, quality: bool = True)[source]

Import data using the class parameter range.

Parameters:
  • from_date (pd.Timestamp, str or None, optional) – Start of the data to be imported. If None, the defaults are used.

  • to_date (pd.Timestamp, str or None, optional) – End of the data to be imported. If None, the defaults are used.

  • standard (bool, optional) – Whether to import standard data, by default True.

  • check (bool, optional) – Whether to import check data, by default True.

  • quality (bool, optional) – Whether to import quality data, by default True.

Return type:

None

Notes

This method imports data for the specified date range, using the class parameters _from_date and _to_date. It updates the internal series data in the Processor instance for standard, check, and quality measurements separately.

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.import_data("2022-01-01", "2022-12-31", standard=True, check=True)

import_quality(standard_hts_filename: str | None = None, site: str | None = None, standard_measurement_name: str | None = None, standard_data_source_name: str | None = None, quality_data: DataFrame | None = None, from_date: str | None = None, to_date: str | None = None, base_url: str | None = None)[source]

Import quality data.

Parameters:
  • standard_hts_filename (str or None, optional) – Where to get quality data from

  • site (str or None, optional) – Which site to get data from

  • standard_measurement_name (str or None, optional) – Name for measurement to get

  • standard_data_source_name (str or None, optional) – Name for data source to get

  • quality_data (pd.DataFrame or None, optional) – Existing quality data. This appears to be overwritten by the import, so the parameter may be redundant and is a candidate for removal.

  • from_date (str or None, optional) – The start date for data retrieval. If None, defaults to the earliest available data.

  • to_date (str or None, optional) – The end date for data retrieval. If None, defaults to latest available data.

  • base_url (str, optional) – Base of the url to use for the hilltop server request. Defaults to the Processor value.

Return type:

pd.DataFrame

Raises:

TypeError – If the parsed Quality data is not a pandas.Series.

Notes

This method imports Quality data from the specified server based on the provided parameters. It retrieves data using the data_acquisition.get_data function and updates the Quality Series in the instance. The data is parsed and formatted according to the item_info in the data source.

Examples

>>> processor = Processor(...)  # initialize processor instance
>>> processor.import_quality(
...     from_date='2022-01-01', to_date='2022-01-10'
... )

import_standard(standard_hts_filename: str | None = None, site: str | None = None, standard_measurement_name: str | None = None, standard_data_source_name: str | None = None, standard_item_info: dict | None = None, standard_data: DataFrame | None = None, from_date: str | None = None, to_date: str | None = None, frequency: str | None = None, base_url: str | None = None, infer_frequency: bool | None = None)[source]

Import standard data.

Parameters:
  • standard_hts_filename (str or None, optional) – The standard Hilltop service. If None, defaults to the standard HTS.

  • site (str or None, optional) – The site to be processed. If None, defaults to the site on the processor object.

  • standard_measurement_name (str or None, optional) – The standard measurement to be processed. If None, defaults to the standard measurement name on the processor object.

  • standard_data_source_name (str or None, optional) – The name of the standard data source. If None, defaults to the standard data source name on the processor object.

  • standard_item_info (dict or None, optional) – The item information for the standard data. If None, defaults to the standard item info on the processor object.

  • standard_data (pd.DataFrame or None, optional) – The standard data. If None, makes an empty standard_data object

  • from_date (str or None, optional) – The start date for data retrieval. If None, defaults to the earliest available data.

  • to_date (str or None, optional) – The end date for data retrieval. If None, defaults to latest available data.

  • frequency (str or None, optional) – The frequency of the data. If None, defaults to the frequency on the processor object. If that’s also None, self.infer_frequency is consulted to determine whether to infer the frequency from the data.

  • base_url (str or None, optional) – URL to look for hilltop server. Will use self.base_url if None.

  • infer_frequency (bool or None, optional) – Whether to infer the frequency from the data. Uses self.infer_frequency if None. If True and frequency is also provided, a warning is issued.

Returns:

The standard data

Return type:

pd.DataFrame

Raises:

ValueError – If no standard data is found within the specified date range.

TypeError – If the parsed Standard data is not a pandas.Series.

Notes

This method imports Standard data from the specified server based on the provided parameters. It retrieves data using the data_acquisition.get_data function and updates the Standard Series in the instance. The data is parsed and formatted according to the item_info in the data source.
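The frequency parameter mentions inferring the frequency from the data. How the package does this is not shown here; one way to do it with pandas is pd.infer_freq, sketched below on a made-up 15-minute index. Note that inference returns None when the timestamps are not regular.

```python
import pandas as pd
from pandas.tseries.frequencies import to_offset

# A regular 15-minute index (illustrative data only).
index = pd.date_range("2022-01-01 00:00", periods=8, freq="15min")
inferred = pd.infer_freq(index)

# Removing one timestamp makes the spacing irregular, so nothing is inferred.
irregular = pd.infer_freq(index.delete(3))

print(inferred, irregular)
```

Comparing through to_offset avoids depending on how a given pandas version spells the frequency alias.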

Examples

>>> processor = Processor(...)  # initialize processor instance
>>> processor.import_standard(
...     from_date='2022-01-01', to_date='2022-01-10'
... )

interpolate_depth_profiles(depth: int | float, measurement: str, site: str | None = None, from_date: str | None | Timestamp = None, to_date: str | None | Timestamp = None)[source]

Look up depth profiles and interpolate a value for the given depth.

Parameters:
  • depth (numeric) – what depth to interpolate to, in meters

  • measurement (str) – measurement + data source name e.g. “Water Temperature (Depth Profile)”

  • site (str | None) – Site to use when looking for depth profiles. If None, the default is used.

  • from_date (str | pd.Timestamp | None) – start of period to look for, if none will use

  • to_date (str | pd.Timestamp | None)
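Interpolating a single profile to a target depth can be sketched with numpy. This assumes linear interpolation between the two nearest sampled depths, which the text above does not confirm; interpolate_at_depth and the sample values are hypothetical.

```python
import numpy as np


def interpolate_at_depth(depths_m, values, depth):
    """Linearly interpolate a profile to one depth (assumed method)."""
    depths_m = np.asarray(depths_m, dtype=float)
    values = np.asarray(values, dtype=float)
    order = np.argsort(depths_m)  # np.interp needs increasing x values
    return float(np.interp(depth, depths_m[order], values[order]))


# Water temperature sampled at 0 m, 5 m and 10 m (made-up numbers).
shallow = interpolate_at_depth([0, 5, 10], [20.0, 15.0, 12.0], depth=2.5)
deep = interpolate_at_depth([0, 5, 10], [20.0, 15.0, 12.0], depth=7.5)
```

np.interp holds the end values constant outside the sampled range, so a real implementation may want to reject depths beyond the profile instead.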

pad_data_with_nan_to_set_freq()[source]

Set the data to the correct frequency, filled with NaNs as appropriate.

Return type:

None

Notes

This method adjusts the time series data to the correct frequency, filling missing values with NaNs as appropriate. It modifies the standard series in-place.
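Padding a series to a fixed frequency can be sketched with the pandas asfreq method. Whether the package uses asfreq internally is an assumption; the data below is made up.

```python
import pandas as pd

index = pd.to_datetime(
    ["2022-01-01 00:00", "2022-01-01 00:15", "2022-01-01 01:00"]
)
series = pd.Series([1.0, 2.0, 3.0], index=index)

# Reindex onto a regular 15-minute grid; missing timestamps become NaN.
padded = series.asfreq("15min")
print(padded)
```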

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.pad_data_with_nan_to_set_freq()
>>> processor.standard_data
<standard series with missing values filled with NaNs>

plot_check_data(tag_list=None, check_names=None, ghosts=False, diffs=False, align_checks=False, fig=None, **kwargs)[source]

Implement plotting.plot_qc_codes.

plot_processing_overview_chart(fig=None, **kwargs)[source]

Plot a processing overview chart.

Parameters:
  • fig (plotly.graph_objects.Figure, optional) – The figure to plot on, by default None.

  • kwargs (dict) – Additional keyword arguments to pass to the plot

Returns:

The figure with the processing overview chart.

Return type:

plotly.graph_objects.Figure

plot_qc_codes(fig=None, **kwargs)[source]

Implement plotting.plot_qc_codes.

plot_raw_data(fig=None, **kwargs)[source]

Implement plotting.plot_raw_data.

quality_code_evaluator

quality_data

quality_encoder(gap_limit: int | None = None, max_qc: int | float | None = None, interval_dict: dict | None = None)[source]

Encode quality information in the quality series.

Parameters:
  • gap_limit (int or None, optional) – The maximum number of consecutive missing values to consider as gaps, by default None. If None, the gap limit from the class defaults is used.

  • max_qc (numeric or None, optional) – Maximum quality code possible at the site. If None, the max qc from the class defaults is used.

  • interval_dict (dict or None, optional) – Dictionary that dictates when to downgrade data with old checks. Takes pd.DateOffset:quality_code pairs. If None, the interval_dict from the class defaults is used.

Return type:

None

Notes

This method encodes quality information in the quality series based on the provided standard series, check series, and measurement information. It uses the evaluator.quality_encoder function to determine the quality flags for the data.
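The interval_dict parameter pairs a pd.DateOffset with a quality code. One plausible reading, sketched below, is that data recorded longer after the most recent check than a given offset is downgraded to the paired code. This interpretation, the helper downgrade_by_check_age and the example codes are all assumptions, not the evaluator.quality_encoder logic.

```python
import pandas as pd


def downgrade_by_check_age(timestamps, last_check, interval_dict, base_qc):
    """Assign quality codes based on time elapsed since the last check."""
    qc = pd.Series(base_qc, index=timestamps)
    # Apply shorter intervals first so that longer ones overwrite them.
    ordered = sorted(interval_dict.items(), key=lambda item: last_check + item[0])
    for offset, code in ordered:
        qc[timestamps > last_check + offset] = code
    return qc


timestamps = pd.DatetimeIndex(["2022-01-15", "2022-02-15", "2022-05-01"])
interval_dict = {pd.DateOffset(months=1): 500, pd.DateOffset(months=3): 400}
qc = downgrade_by_check_age(
    timestamps, pd.Timestamp("2022-01-01"), interval_dict, base_qc=600
)
```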

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.quality_encoder(gap_limit=5)
>>> processor.quality_data["Value"]
<updated quality series with encoded quality flags>

remove_flatlined_values(span: int = 3)[source]

Remove repeated values in the standard series, in the manner of flatline_value_remover().

remove_one_spikes(threshold_factor: float = 3.0, window_size: int = 5)[source]

Remove one-spikes from the data.

A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.

NOTE: This function only works when baseline data is fairly stable. If baseline data is noisy or has high variability, use remove_one_spikes_mad() instead.

Parameters:
  • threshold_factor (float) – Multiplier for the standard deviation to define the spike threshold. Default is 3.0. Increasing this value makes the spike detection less sensitive.

  • window_size (int) – The size of the rolling window to compute local statistics. Default is 5. Increasing this value makes the spike detection less sensitive.

Return type:

None

Notes

This method removes spikes from the standard series using the specified parameters. It utilizes the filters.remove_one_spikes function for the actual spike removal process.
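The definition of a one-point spike given above (a point that deviates from both neighbours by more than a multiple of the local standard deviation) can be sketched as follows. This is an illustration only, not the filters.remove_one_spikes code: flag_one_point_spikes is hypothetical, and it assumes the local spread is taken from the window preceding the tested point so that the spike does not inflate its own threshold.

```python
import pandas as pd


def flag_one_point_spikes(series, threshold_factor=3.0, window_size=5):
    """Flag points that jump away from both neighbours in the same direction."""
    # Spread of the preceding window, excluding the point being tested.
    local_std = series.rolling(window_size).std().shift(1)
    threshold = threshold_factor * local_std
    step_in = series - series.shift(1)    # change from the previous point
    step_out = series - series.shift(-1)  # change relative to the next point
    same_direction = (step_in * step_out) > 0
    return same_direction & (step_in.abs() > threshold) & (step_out.abs() > threshold)


values = pd.Series([1.0, 1.1, 0.9, 1.0, 1.1, 9.0, 1.0, 0.9, 1.1, 1.0])
spikes = flag_one_point_spikes(values)
cleaned = values.mask(spikes)  # spike positions become NaN
```

Points without a full preceding window have no threshold and are never flagged in this sketch.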

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_one_spikes(threshold_factor=3.0, window_size=5)
>>> processor.standard_data["Value"]
<standard series with spikes removed>

remove_one_spikes_mad(threshold_factor: float = 2.5)[source]

Remove one-spikes from the data using Median Absolute Deviation (MAD).

A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.

NOTE: This function is more robust to noisy or variable baseline data than remove_one_spikes().

Parameters:
  • threshold_factor (float) – Multiplier for the MAD to define the spike threshold. Default is 2.5.

Return type:

None

Notes

This method removes spikes from the standard series using the specified parameters. It utilizes the filters.remove_one_spikes_mad function for the actual spike removal process.
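A MAD-based variant of the same idea is sketched below. It assumes the MAD is computed over the point-to-point differences of the whole series, which the text above does not specify; flag_one_point_spikes_mad is a hypothetical helper, not filters.remove_one_spikes_mad.

```python
import pandas as pd


def flag_one_point_spikes_mad(series, threshold_factor=2.5):
    """Flag one-point spikes using the MAD of point-to-point differences."""
    steps = series.diff().dropna()
    # Median absolute deviation: robust to the spike itself.
    mad = (steps - steps.median()).abs().median()
    threshold = threshold_factor * mad
    step_in = series - series.shift(1)
    step_out = series - series.shift(-1)
    same_direction = (step_in * step_out) > 0
    return same_direction & (step_in.abs() > threshold) & (step_out.abs() > threshold)


values = pd.Series([1.0, 1.1, 0.9, 1.0, 1.1, 9.0, 1.0, 0.9, 1.1, 1.0])
spikes = flag_one_point_spikes_mad(values)
```

Because the median ignores a few extreme differences, the threshold stays tied to the baseline noise even when large spikes are present.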

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_one_spikes_mad(threshold_factor=2.5)
>>> processor.standard_data["Value"]
<standard series with spikes removed>

remove_outliers(span: int | None = None, delta: float | None = None)[source]

Remove outliers from the data.

Parameters:
  • span (int or None, optional) – The span parameter for smoothing, by default None. If None, the span value from the class defaults is used.

  • delta (float or None, optional) – The delta parameter for identifying outliers, by default None. If None, the delta value from the class defaults is used.

Return type:

None

Notes

This method removes outliers from the standard series using the specified span and delta values. It utilizes the filters.remove_outliers function for the actual outlier removal process.
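The roles of span (smoothing) and delta (outlier threshold) can be illustrated with an exponentially weighted moving average. The source does not state which smoother is used, so the forward-backward average below is an assumption, and both function names are hypothetical.

```python
import pandas as pd


def forward_backward_ewma(series, span):
    """Average of an EWMA run forwards and one run backwards in time."""
    forward = series.ewm(span=span).mean()
    backward = series[::-1].ewm(span=span).mean()[::-1]
    return (forward + backward) / 2


def flag_outliers(series, span, delta):
    """Flag points further than delta from the smoothed series."""
    smooth = forward_backward_ewma(series, span)
    return (series - smooth).abs() > delta


values = pd.Series([10.0] * 5 + [50.0] + [10.0] * 5)
outliers = flag_outliers(values, span=3, delta=10.0)
cleaned = values.mask(outliers)
```

A larger span smooths more heavily, and a larger delta tolerates bigger departures from the smoothed line.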

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_outliers(span=10, delta=2.0)
>>> processor.standard_data["Value"]
<standard series with outliers removed>

remove_range(from_date, to_date)[source]

Mark a range in standard_data for removal.

Parameters:
  • from_date (str) – The start date of the range to mark for removal.

  • to_date (str) – The end date of the range to mark for removal.

Return type:

None

Notes

This method marks a specified range of standard_data for removal rather than deleting the rows. The range is defined by the from_date and to_date parameters.

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_range(from_date="2022-01-01", to_date="2022-12-31")
>>> processor.standard_data
<standard series with specified range marked for removal>

remove_spikes(low_clip: float | None = None, high_clip: float | None = None, span: int | None = None, delta: float | None = None)[source]

Remove spikes from the data.

Parameters:
  • low_clip (float or None, optional) – The lower clipping threshold, by default None. If None, the low_clip value from the class defaults is used.

  • high_clip (float or None, optional) – The upper clipping threshold, by default None. If None, the high_clip value from the class defaults is used.

  • span (int or None, optional) – The span parameter for smoothing, by default None. If None, the span value from the class defaults is used.

  • delta (float or None, optional) – The delta parameter for identifying spikes, by default None. If None, the delta value from the class defaults is used.

Return type:

None

Notes

This method removes spikes from the standard series using the specified parameters. It utilizes the filters.remove_spikes function for the actual spike removal process.

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_spikes(low_clip=10, high_clip=20, span=5, delta=2.0)
>>> processor.standard_data["Value"]
<standard series with spikes removed>

report_processing_issue(start_time=None, end_time=None, code=None, comment=None, series_type=None, message_type=None)[source]

Add an issue to be reported for processing usage.

This method adds an issue to the processing_issues DataFrame.

Parameters:
  • start_time (str | None) – The start time of the issue.

  • end_time (str | None) – The end time of the issue.

  • code (str | None) – The code of the issue.

  • comment (str | None) – The comment of the issue.

  • series_type (str | None) – The type of the series the issue is related to.

  • message_type (str | None) – Should be one of: [“debug”, “info”, “warning”, “error”]
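Appending an issue as a row of a DataFrame can be sketched as below. The column names follow the parameters listed above, but the actual layout of processing_issues is an assumption, and add_issue, the "GAP" code and the comment text are made up for the example.

```python
import pandas as pd

COLUMNS = ["start_time", "end_time", "code", "comment", "series_type", "message_type"]
ALLOWED_MESSAGE_TYPES = ("debug", "info", "warning", "error")


def add_issue(issues, start_time=None, end_time=None, code=None,
              comment=None, series_type=None, message_type=None):
    """Append one issue as a new row and return the DataFrame."""
    if message_type is not None and message_type not in ALLOWED_MESSAGE_TYPES:
        raise ValueError(f"Unknown message_type: {message_type!r}")
    issues.loc[len(issues)] = [
        start_time, end_time, code, comment, series_type, message_type
    ]
    return issues


issues = pd.DataFrame(columns=COLUMNS)
issues = add_issue(
    issues,
    start_time="2022-01-01",
    end_time="2022-01-02",
    code="GAP",  # hypothetical issue code
    comment="Logger offline",
    series_type="standard",
    message_type="warning",
)
```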

property site

The site to be processed.

Type:

str

standard_data

property standard_hts_filename

The standard Hilltop service.

Type:

str

property standard_measurement_name

The standard measurement to be processed.

Type:

str

property to_date

The end date of the data.

Type:

str

to_xml_data_structure(standard=True, quality=True, check=True)[source]

Convert Processor object data to a list of XML data structures.

Returns:

List of DataSourceBlob instances representing the data in the Processor object.

Return type:

list of data_structure.DataSourceBlob

Notes

This method converts the data in the Processor object, including standard, check, and quality series, into a list of DataSourceBlob instances. Each DataSourceBlob contains information about the site, data source, and associated data.

Examples

>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.import_data()
>>> xml_data_list = processor.to_xml_data_structure()
>>> # Convert Processor data to a list of XML data structures.

hydrobot.testicle module

Module contents

Top-level package for Hydro Processing Tools.