hydrobot package¶
Submodules¶
hydrobot.data_acquisition module¶
Main module.
- hydrobot.data_acquisition.config_yaml_import(file_name: str)[source]¶
Import config.yaml.
- Parameters:
file_name (str) – Path to config.yaml
- Returns:
Parameters suitable for passing to the Processor as processing_parameters
- Return type:
dict
- hydrobot.data_acquisition.convert_inspection_expiry(processing_parameters)[source]¶
Interpret inspection_expiry dict as pd.DateOffset.
- Parameters:
processing_parameters (dict)
- Returns:
processing_parameters with inspection_expiry converted to pd.DateOffset
- Return type:
dict
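A minimal sketch of the conversion, assuming that inspection_expiry is stored as a dict of pd.DateOffset keyword arguments such as {"months": 3}. The function name convert_inspection_expiry_sketch is hypothetical and is not the hydrobot implementation.

```python
import pandas as pd


def convert_inspection_expiry_sketch(processing_parameters: dict) -> dict:
    """Hypothetical sketch: turn an inspection_expiry dict into a pd.DateOffset."""
    params = dict(processing_parameters)  # shallow copy; the input dict is left untouched
    expiry = params.get("inspection_expiry")
    if isinstance(expiry, dict):
        # Each key/value pair becomes a DateOffset keyword argument, e.g. months=3
        params["inspection_expiry"] = pd.DateOffset(**expiry)
    return params


params = convert_inspection_expiry_sketch({"inspection_expiry": {"months": 3}})
print(pd.Timestamp("2023-01-15") + params["inspection_expiry"])  # 2023-04-15 00:00:00
```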
- hydrobot.data_acquisition.enforce_site_in_hts(hts: Hilltop, site: str)[source]¶
Raise exception if site not in Hilltop file.
- hydrobot.data_acquisition.get_data(base_url, hts, site, measurement, from_date, to_date, tstype='Standard')[source]¶
Acquire time series data from a web service and return the XML response together with the parsed DataSourceBlobs.
- Parameters:
base_url (str) – The base URL of the web service.
hts (str) – The Hilltop Time Series (HTS) identifier.
site (str) – The site name or location.
measurement (str) – The type of measurement to retrieve.
from_date (str) – The start date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.
to_date (str) – The end date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.
tstype (str) – Type of data that is sought (default is Standard, can be Standard, Check, or Quality)
- Returns:
xml.etree.ElementTree – An XML tree containing the acquired time series data.
[DataSourceBlob] – XML tree parsed to DataSourceBlobs
- hydrobot.data_acquisition.get_depth_profiles(base_url, hts, site, measurement, from_date, to_date, tstype='Standard') [pandas.Series][source]¶
Call the Hilltop server for depth profiles.
- Parameters:
base_url (str) – The base URL of the web service.
hts (str) – The Hilltop Time Series (HTS) identifier.
site (str) – The site name or location.
measurement (str) – The type of measurement to retrieve.
from_date (str | pd.Timestamp) – The start date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.
to_date (str | pd.Timestamp) – The end date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.
tstype (str) – Type of data that is sought (default ‘Standard’, can be Standard, Check, or Quality)
- Returns:
A list of pandas series each giving a depth profile.
- Return type:
[pandas.Series]
- Raises:
KeyError – if there is no measurement for the given parameters
- hydrobot.data_acquisition.get_server_dataframe(base_url, hts, site, measurement, from_date, to_date, tstype='Standard') DataFrame[source]¶
Call the Hilltop server and transform the result to a pd.DataFrame.
- Parameters:
base_url (str) – The base URL of the web service.
hts (str) – The Hilltop Time Series (HTS) identifier.
site (str) – The site name or location.
measurement (str) – The type of measurement to retrieve.
from_date (str | pd.Timestamp) – The start date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.
to_date (str | pd.Timestamp) – The end date and time for data retrieval in the format ‘YYYY-MM-DD HH:mm’.
tstype (str) – Type of data that is sought (default ‘Standard’, can be Standard, Check, or Quality)
- Returns:
A dataframe containing the acquired time series data.
- Return type:
pandas.DataFrame
- Raises:
KeyError – if there is no measurement for the given parameters
- hydrobot.data_acquisition.get_time_range(base_url, hts, site, measurement, tstype='Standard')[source]¶
Request the time range of the available data for a measurement from a web service.
- Parameters:
base_url (str) – The base URL of the web service.
hts (str) – The Hilltop Time Series (HTS) identifier.
site (str) – The site name or location.
measurement (str) – The type of measurement to retrieve.
tstype (str) – Type of data that is sought (default is Standard, can be Standard, Check, or Quality)
- Returns:
Element – XML element from the server call
[DataSourceBlob] – A list of DataSourceBlobs corresponding to all measurements contained in the acquired time series data.
hydrobot.data_sources module¶
Handling for different types of data sources.
- class hydrobot.data_sources.DissolvedOxygenQualityCodeEvaluator(qc_500_limit, qc_600_limit, qc_500_percent, qc_600_percent, constant_check_shift=0)[source]¶
Bases: QualityCodeEvaluator
QualityCodeEvaluator for DO NEMS.
Constant error plus percentage error.
- find_qc(base_datum, check_datum)[source]¶
Find the base quality codes for DO.
- Parameters:
base_datum (numerical) – Closest continuum datum point to the check
check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift
- Returns:
The Quality code
- Return type:
int
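A minimal sketch of "constant error plus percentage error", assuming that each allowed error is the fixed limit plus the given percentage of the base reading, that differences on the boundary still pass, and that failing both limits gives QC400. The function do_find_qc_sketch is hypothetical. Following the parameter description, check_datum is taken as already shifted by any constant_check_shift.

```python
def do_find_qc_sketch(base_datum, check_datum, qc_500_limit, qc_600_limit,
                      qc_500_percent, qc_600_percent):
    """Hypothetical sketch: constant error plus percentage error."""
    # check_datum is assumed to be already shifted by any constant_check_shift
    diff = abs(base_datum - check_datum)
    # The allowed error grows with the magnitude of the reading
    allowed_600 = qc_600_limit + abs(base_datum) * qc_600_percent / 100
    allowed_500 = qc_500_limit + abs(base_datum) * qc_500_percent / 100
    if diff <= allowed_600:
        return 600
    if diff <= allowed_500:
        return 500
    return 400


limits = dict(qc_500_limit=0.5, qc_600_limit=0.3, qc_500_percent=5, qc_600_percent=1)
print(do_find_qc_sketch(10.0, 10.35, **limits))  # 600
print(do_find_qc_sketch(10.0, 10.7, **limits))   # 500
print(do_find_qc_sketch(10.0, 12.0, **limits))   # 400
```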
- class hydrobot.data_sources.QualityCodeEvaluator(qc_500_limit, qc_600_limit, constant_check_shift=0)[source]¶
Bases: object
Basic QualityCodeEvaluator only compares magnitude of differences.
- find_qc(base_datum, check_datum)[source]¶
Find the base quality codes.
- Parameters:
base_datum (numerical) – Closest continuum datum point to the check
check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift
- Returns:
The Quality code
- Return type:
int
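A minimal sketch of a magnitude-only evaluator. It assumes the usual NEMS ordering, where qc_600_limit is the tighter tolerance, and it assumes that boundary values pass and that anything outside qc_500_limit gets QC400. The class BasicQualityCodeEvaluatorSketch is hypothetical and does not reproduce the hydrobot code.

```python
class BasicQualityCodeEvaluatorSketch:
    """Hypothetical sketch: grade a check purely on the size of the difference."""

    def __init__(self, qc_500_limit, qc_600_limit):
        self.qc_500_limit = qc_500_limit  # widest difference still rated 500
        self.qc_600_limit = qc_600_limit  # widest difference still rated 600

    def find_qc(self, base_datum, check_datum):
        diff = abs(base_datum - check_datum)
        if diff <= self.qc_600_limit:
            return 600
        if diff <= self.qc_500_limit:
            return 500
        return 400


evaluator = BasicQualityCodeEvaluatorSketch(qc_500_limit=3, qc_600_limit=1)
print(evaluator.find_qc(100, 100.5))  # 600
print(evaluator.find_qc(100, 102))    # 500
print(evaluator.find_qc(100, 105))    # 400
```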
- class hydrobot.data_sources.TwoLevelQualityCodeEvaluator(qc_500_limit, qc_600_limit, qc_500_percent, qc_600_percent, limit_percent_threshold, constant_check_shift=0)[source]¶
Bases: QualityCodeEvaluator
QualityCodeEvaluator for standards such as water level.
Fixed error up to given threshold, percentage error after that.
- find_qc(base_datum, check_datum)[source]¶
Find the base quality codes with two stages.
The two stages are a flat QC threshold and a percentage QC threshold.
- Parameters:
base_datum (numerical) – Closest continuum datum point to the check
check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift
- Returns:
The Quality code
- Return type:
int
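A minimal sketch of the two-stage rule ("fixed error up to given threshold, percentage error after that"). It assumes that the stage is selected by comparing the base reading against limit_percent_threshold and that boundary values pass. The function two_level_find_qc_sketch is hypothetical.

```python
def two_level_find_qc_sketch(base_datum, check_datum, qc_500_limit, qc_600_limit,
                             qc_500_percent, qc_600_percent, limit_percent_threshold):
    """Hypothetical sketch: fixed limits below the threshold, percentages above it."""
    diff = abs(base_datum - check_datum)
    if abs(base_datum) <= limit_percent_threshold:
        # Stage 1: flat limits
        allowed_600, allowed_500 = qc_600_limit, qc_500_limit
    else:
        # Stage 2: limits proportional to the reading
        allowed_600 = abs(base_datum) * qc_600_percent / 100
        allowed_500 = abs(base_datum) * qc_500_percent / 100
    if diff <= allowed_600:
        return 600
    if diff <= allowed_500:
        return 500
    return 400


limits = dict(qc_500_limit=20, qc_600_limit=10, qc_500_percent=5,
              qc_600_percent=2, limit_percent_threshold=500)
print(two_level_find_qc_sketch(300, 308, **limits))    # 600 (flat stage)
print(two_level_find_qc_sketch(300, 330, **limits))    # 400 (flat stage)
print(two_level_find_qc_sketch(1000, 1015, **limits))  # 600 (percentage stage)
print(two_level_find_qc_sketch(1000, 1030, **limits))  # 500 (percentage stage)
```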
- class hydrobot.data_sources.UncheckedQualityCodeEvaluator[source]¶
Bases: QualityCodeEvaluator
QualityCodeEvaluator for data without checks.
Returns 200 for QC.
- find_qc(base_datum, check_datum)[source]¶
Return 200 quality code.
- Parameters:
base_datum (numerical) – Closest continuum datum point to the check
check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift
- Returns:
The Quality code 200
- Return type:
int
- class hydrobot.data_sources.With200QualityCodeEvaluator(qc_500_limit, qc_600_limit, qc_400_limit, constant_check_shift=0)[source]¶
Bases: QualityCodeEvaluator
For standard quality code evaluators that also have QC200 data.
Examples: pH and Conductivity.
- find_qc(base_datum, check_datum)[source]¶
Find the base quality codes.
- Parameters:
base_datum (numerical) – Closest continuum datum point to the check
check_datum (numerical) – The check data to verify the continuous data, shifted by any constant_check_shift
- Returns:
The Quality code
- Return type:
int
- hydrobot.data_sources.depth_check_measurement_name_by_data_family(data_family, depth)[source]¶
Return check measurement name for the data family at depth.
Many data sources have separate measurement name formats for lake sampling, so this function maps the data_family/depth pair to the appropriate check measurement name.
- Parameters:
data_family (str) – data family to find check measurement name for
depth (int) – depth of the measurement, in mm
- Returns:
The check measurement name
- Return type:
str
- hydrobot.data_sources.depth_standard_measurement_name_by_data_family(data_family, depth)[source]¶
Return standard measurement name for the data family at depth.
Many data sources have separate measurement name formats for lake sampling, so this function maps the data_family/depth pair to the appropriate standard measurement name.
- Parameters:
data_family (str) – data family to find standard measurement name for
depth (int) – depth of the measurement, in mm
- Returns:
The standard measurement name
- Return type:
str
- hydrobot.data_sources.get_qc_evaluator(family: str)[source]¶
Get QC evaluator from data family name.
- hydrobot.data_sources.hilltop_export(file_location: str, site_name: str, std_series: Series, check_series: Series, qc_series: Series)[source]¶
Export the three main series to CSV files ready to import into Hilltop.
- Parameters:
file_location (str) – Where the files are exported to
site_name (str) – Site name
std_series (pd.Series) – Standard series
check_series (pd.Series) – Check series
qc_series (pd.Series) – Quality code series
- Return type:
None (writes the CSV files to file_location)
hydrobot.evaluator module¶
Tools for checking quality and finding problems in the data.
- hydrobot.evaluator.base_data_meets_qc(std_series, qc_series, target_qc)[source]¶
Find all data where QC targets are met.
Returns only the base series data for which the next date in qc_series has a value equal to target_qc.
- Parameters:
std_series (pandas.Series) – Data to be filtered
qc_series (pandas.Series) – quality code data series, some of which are presumably target_qc
target_qc (int) – target quality code
- Returns:
Filtered data
- Return type:
pandas.Series
- hydrobot.evaluator.base_data_qc_filter(std_series, qc_filter)[source]¶
Filter out data based on quality code filter.
Return only the base series data for which the next date in qc_filter is True.
- Parameters:
std_series (pandas.Series) – Data to be filtered
qc_filter (pandas.Series of booleans) – Dates for which some condition is met or not
- Returns:
The filtered data
- Return type:
pandas.Series
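A minimal sketch of the "next date" rule shared by base_data_meets_qc and base_data_qc_filter. It assumes both series have sorted DatetimeIndexes and treats points after the last QC/filter entry as not matching. The *_sketch names are hypothetical. pandas' reindex with method="bfill" does the pairing of each standard timestamp with the next QC timestamp.

```python
import pandas as pd


def base_data_qc_filter_sketch(std_series, qc_filter):
    """Hypothetical sketch: keep std points whose *next* qc_filter entry is True."""
    # method="bfill" pairs each std timestamp with the next qc_filter timestamp at or after it
    aligned = qc_filter.reindex(std_series.index, method="bfill")
    # Points after the last qc_filter entry come back as NaN and are treated as False
    return std_series[aligned.eq(True)]


def base_data_meets_qc_sketch(std_series, qc_series, target_qc):
    """Hypothetical sketch: keep std points whose next QC value equals target_qc."""
    aligned = qc_series.reindex(std_series.index, method="bfill")
    return std_series[aligned.eq(target_qc)]


times = pd.date_range("2021-01-01 00:00", periods=5, freq="15min")
std = pd.Series([1, 2, 3, 4, 5], index=times)
qc = pd.Series([600, 400], index=[times[2], times[4]])
print(base_data_meets_qc_sketch(std, qc, 600).tolist())     # [1, 2, 3]
print(base_data_qc_filter_sketch(std, qc == 400).tolist())  # [4, 5]
```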
- hydrobot.evaluator.bulk_downgrade_out_of_validation(qc_frame: DataFrame, check_series: Series, interval_dict: dict, day_end_rounding: bool = True)[source]¶
Cap quality codes for any data where the gap between consecutive checks is too large.
Applies single_downgrade_out_of_validation repeatedly, once for each max_interval in interval_dict.
- Parameters:
qc_frame (pd.DataFrame) – Quality series that potentially needs downgrading
check_series (pd.Series) – Check series to check for frequency of checks
interval_dict (dict) – Key:Value pairs of max_interval:downgraded_qc for single_downgrade_out_of_validation
day_end_rounding (bool) – Whether to round to the day end. If true, downgraded data starts at midnight
- Returns:
The qc_frame with any downgraded QCs added in
- Return type:
pd.DataFrame
- hydrobot.evaluator.cap_qc_where_std_high(std_frame, qc_frame, cap_qc, cap_threshold)[source]¶
Cap the quality code of data where the standard series exceeds some value.
- Parameters:
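std_frame (pd.DataFrame) – Standard data compared against cap_threshold
qc_frame (pd.DataFrame) – Quality code data to be capped
cap_qc (numeric) – Maximum quality code allowed where the standard data exceeds cap_threshold
cap_threshold (numeric) – Value of the standard data above which quality codes are capped
- Returns:
The capped quality code data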
- Return type:
pd.DataFrame
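A minimal sketch of the capping rule. It simplifies the real signature by assuming the standard and QC values are two Series on the same index. The actual function works on DataFrames whose QC rows may be indexed differently. The name cap_qc_where_std_high_sketch is hypothetical.

```python
import pandas as pd


def cap_qc_where_std_high_sketch(std_series, qc_series, cap_qc, cap_threshold):
    """Hypothetical sketch: cap QC at cap_qc wherever std exceeds cap_threshold."""
    high = std_series > cap_threshold
    # Keep the original QC where the data is not high; elsewhere take min(QC, cap_qc)
    return qc_series.where(~high, qc_series.clip(upper=cap_qc))


std = pd.Series([1.0, 5.0, 10.0])
qc = pd.Series([600, 600, 400])
print(cap_qc_where_std_high_sketch(std, qc, cap_qc=500, cap_threshold=4).tolist())  # [600, 500, 400]
```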
- hydrobot.evaluator.check_data_quality_code(series: Series, check_series: Series, qc_evaluator: QualityCodeEvaluator, gap_limit=10800) DataFrame[source]¶
Assign quality codes using check data.
Quality codes are assigned based on the difference between the standard series and the check data.
- Parameters:
series (pd.Series) – Data to be quality coded
check_series (pd.Series) – Check data - must not be empty
qc_evaluator (data_sources.QualityCodeEvaluator) – Handler for QC comparisons
gap_limit (integer (seconds)) – If the nearest real data point is more than this many seconds away, return 200
- Returns:
The QC values of the series, indexed by the END time of the QC period
- Return type:
pd.Series
- hydrobot.evaluator.diagnose_data(std_series, check_series, qc_series, frequency)[source]¶
Return description of how much missing data, how much for each QC, etc.
This function is admittedly messy, but because it is only a diagnostic it can be changed freely.
- Parameters:
std_series (pandas.Series) – processed base time series data
check_series (pandas.Series) – Check data time series
qc_series (pandas.Series) – QC time series
frequency (DateOffset or str) – Frequency to which the data is set
- Returns:
Prints statements that describe the state of the data
- Return type:
None
- hydrobot.evaluator.find_nearest_time(series, dt)[source]¶
Find the time in the series that is closest to dt.
For example, for a series indexed by pd.Timestamp("2021-01-01 02:00") and pd.Timestamp("2021-01-01 02:15") (both with value 0.0), and dt = pd.Timestamp("2021-01-01 02:13"), the result is the closer timestamp, pd.Timestamp("2021-01-01 02:15").
- Parameters:
series (pd.Series) – The series indexed by time
dt (Datetime) – Time that may or may not exactly line up with the series
- Returns:
The value of dt rounded to the nearest timestamp of the series
- Return type:
Datetime
- hydrobot.evaluator.find_nearest_valid_time(series, dt) Timestamp[source]¶
Find the time in the series that is closest to dt, but ignoring NaN values (gaps).
- Parameters:
series (pd.Series) – The series indexed by time
dt (Datetime) – Time that may or may not exactly line up with the series
- Returns:
The value of dt rounded to the nearest timestamp of the series
- Return type:
Datetime
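A minimal sketch of both nearest-time lookups, assuming a sorted, unique DatetimeIndex (pandas' Index.get_indexer with method="nearest" requires one). The *_sketch names are hypothetical. The "valid" variant simply drops NaN values before the lookup, and that is how gaps get skipped.

```python
import numpy as np
import pandas as pd


def find_nearest_time_sketch(series, dt):
    """Hypothetical sketch: timestamp in series.index closest to dt."""
    # get_indexer with method="nearest" needs a sorted, unique index
    pos = series.index.get_indexer([pd.Timestamp(dt)], method="nearest")[0]
    return series.index[pos]


def find_nearest_valid_time_sketch(series, dt):
    """Hypothetical sketch: same lookup, but NaN values (gaps) are skipped."""
    return find_nearest_time_sketch(series.dropna(), dt)


series = pd.Series(
    [0.0, np.nan],
    index=pd.to_datetime(["2021-01-01 02:00", "2021-01-01 02:15"]),
)
print(find_nearest_time_sketch(series, "2021-01-01 02:13"))        # 2021-01-01 02:15:00
print(find_nearest_valid_time_sketch(series, "2021-01-01 02:13"))  # 2021-01-01 02:00:00
```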
- hydrobot.evaluator.gap_finder(data: Series) list[source]¶
Find the indices and lengths of gaps (sequences of NaN values) in a pandas Series.
- Parameters:
data (pd.Series) – Input Series containing NaN values.
- Returns:
List of tuples, each containing the index of a NaN value, the length of the gap containing it, and True for strictness.
- Return type:
list
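A minimal sketch of locating runs of NaN values with the common shift/cumsum run-labelling idiom. For simplicity it returns (first index label, run length) pairs and omits the strictness flag that the documented return value includes. The name gap_runs_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def gap_runs_sketch(data: pd.Series) -> list:
    """Hypothetical sketch: (first index label, length) of each run of NaN values."""
    is_nan = data.isna()
    # A new run starts wherever the NaN/non-NaN state changes
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    return [
        (run.index[0], len(run))
        for _, run in data[is_nan].groupby(run_id[is_nan])
    ]


data = pd.Series([1.0, np.nan, np.nan, 4.0, np.nan, 6.0])
print(gap_runs_sketch(data))
```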
- hydrobot.evaluator.max_qc_limiter(qc_frame: DataFrame, max_qc) DataFrame[source]¶
Enforce max_qc on a QC series.
Replaces all QC values above max_qc with max_qc.
- Parameters:
qc_frame (pd.DataFrame) – The series to be limited.
max_qc (numerical) – maximum allowed value. None imposes no limit.
- Returns:
qc_frame with too high QCs limited to max_qc
- Return type:
pd.DataFrame
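A minimal sketch of the limiter on a single QC Series. The real function takes a DataFrame, and calling DataFrame.clip would clip every column. The name max_qc_limiter_sketch is hypothetical. It relies on pandas treating upper=None as "no bound", and that matches the documented behaviour for max_qc=None.

```python
import pandas as pd


def max_qc_limiter_sketch(qc_series: pd.Series, max_qc) -> pd.Series:
    """Hypothetical sketch: replace QC values above max_qc with max_qc."""
    # Series.clip treats upper=None as "no upper bound", matching the max_qc=None rule
    return qc_series.clip(upper=max_qc)


qc = pd.Series([600, 500, 400])
print(max_qc_limiter_sketch(qc, 500).tolist())   # [500, 500, 400]
print(max_qc_limiter_sketch(qc, None).tolist())  # [600, 500, 400]
```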
- hydrobot.evaluator.missing_data_quality_code(std_series, qc_data, gap_limit)[source]¶
Make sure that missing data is QC100.
Returns qc_data with QC100 values added where std_series is NaN.
- Parameters:
std_series (pd.Series) – Base series which may contain NaNs
qc_data – QC series for base std_series without QC100 values
gap_limit – Maximum size of gaps which will be ignored
- Returns:
The modified QC series, indexed by the start time of the QC period
- Return type:
pd.Series
- hydrobot.evaluator.single_downgrade_out_of_validation(qc_frame: DataFrame, check_series: Series, max_interval: DateOffset, downgraded_qc: int = 200, day_end_rounding: bool = True)[source]¶
Cap quality codes for any data where the gap between consecutive checks is too large.
Only applies a single cap quality code, see bulk_downgrade_out_of_validation for multiple steps.
- Parameters:
qc_frame (pd.DataFrame) – Quality series that potentially needs downgrading
check_series (pd.Series) – Check series to check for frequency of checks
max_interval (pd.DateOffset) – Maximum allowed gap between checks before the data is downgraded
downgraded_qc (int) – The quality code that the data is downgraded to
day_end_rounding (bool) – Whether to round to the day end. If true, downgraded data starts at midnight
- Returns:
The qc_frame with any downgraded QCs added in
- Return type:
pd.DataFrame
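A minimal sketch of finding the periods that would be downgraded. It assumes that downgrading starts max_interval after a check, rounded up to the next midnight when day_end_rounding is True, and lasts until the next check. The helper out_of_validation_periods_sketch is hypothetical and only returns the periods. It does not modify a QC frame. Because an offset like pd.DateOffset(months=2) has no fixed length, the code adds the offset to the earlier check instead of comparing durations.

```python
import pandas as pd


def out_of_validation_periods_sketch(check_times, max_interval, day_end_rounding=True):
    """Hypothetical sketch: periods whose preceding check is older than max_interval."""
    times = pd.DatetimeIndex(check_times).sort_values()
    periods = []
    for prev_check, next_check in zip(times[:-1], times[1:]):
        # DateOffsets such as months=2 have no fixed length, so add rather than compare
        start = prev_check + max_interval
        if day_end_rounding:
            start = start.ceil("D")  # downgraded data starts at the following midnight
        if start < next_check:
            periods.append((start, next_check))
    return periods


checks = ["2023-01-01 10:00", "2023-01-20 09:00", "2023-05-01 12:00"]
print(out_of_validation_periods_sketch(checks, pd.DateOffset(months=2)))
```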
- hydrobot.evaluator.small_gap_closer(series: Series, gap_limit: int) Series[source]¶
Remove small gaps from a series.
Gaps are defined as runs of consecutive np.nan values. Small gaps are defined as gaps of length gap_limit or less.
Returns the series with the NaN values in the short gaps removed and the long gaps left untouched.
- Parameters:
series (pandas.Series) – Data which has gaps to be closed
gap_limit (integer) – Maximum length of gaps removed; all np.nan values in consecutive runs of gap_limit or less are removed
- Returns:
Data with any short gaps removed
- Return type:
pandas.Series
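A minimal sketch of closing small gaps. It labels runs of consecutive NaN and non-NaN values, measures each run, and drops only the NaN runs that are no longer than gap_limit. The name small_gap_closer_sketch is hypothetical.

```python
import numpy as np
import pandas as pd


def small_gap_closer_sketch(series: pd.Series, gap_limit: int) -> pd.Series:
    """Hypothetical sketch: drop NaN runs of length <= gap_limit, keep longer ones."""
    is_nan = series.isna()
    # Label each run of consecutive NaN / non-NaN values
    run_id = is_nan.ne(is_nan.shift()).cumsum()
    run_length = run_id.map(run_id.value_counts())
    short_gap = is_nan & (run_length <= gap_limit)
    return series[~short_gap]


data = pd.Series([1.0, np.nan, 3.0, np.nan, np.nan, np.nan, 7.0])
closed = small_gap_closer_sketch(data, gap_limit=2)
print(closed.index.tolist())  # [0, 2, 3, 4, 5, 6]
```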
- hydrobot.evaluator.splitter(std_series, qc_series)[source]¶
Split the data up by QC code.
Selects all data which meets a given QC code and pads the rest with NaN values. Does this for all current NEMS values ([0, 100, 200, 300, 400, 500, 600]).
- Parameters:
std_series (pd.Series) – Time series data to be split up
qc_series (pd.Series) – QC values to split the data by
- Returns:
Keys are the QC values as ints; values are the series of data that meet each QC
- Return type:
dict[int, pd.Series]
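A minimal sketch of splitting by QC code. It assumes the same "next QC date" alignment that is described for base_data_meets_qc, and it assumes both series have sorted DatetimeIndexes. The names splitter_sketch and NEMS_QC_VALUES are hypothetical. Each dictionary entry is a full-length copy of the standard series, NaN-padded wherever the aligned QC differs.

```python
import pandas as pd

NEMS_QC_VALUES = [0, 100, 200, 300, 400, 500, 600]


def splitter_sketch(std_series, qc_series):
    """Hypothetical sketch: one NaN-padded copy of std_series per QC value."""
    # Assumes the "next QC date" alignment described for base_data_meets_qc
    aligned = qc_series.reindex(std_series.index, method="bfill")
    return {qc: std_series.where(aligned == qc) for qc in NEMS_QC_VALUES}


times = pd.date_range("2021-01-01 00:00", periods=5, freq="15min")
std = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0], index=times)
qc = pd.Series([600, 400], index=[times[2], times[4]])
split = splitter_sketch(std, qc)
print(split[600].tolist())  # [1.0, 2.0, 3.0, nan, nan]
```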
hydrobot.filters module¶
General filtering utilities.
- hydrobot.filters.clip(unclipped: Series, low_clip: float, high_clip: float)[source]¶
Clip values in a pandas Series to a specified range, replacing out-of-range values with NaN.
- Parameters:
unclipped (pandas.Series) – Input data to be clipped.
low_clip (float) – Lower bound for clipping. Values less than this will be set to NaN.
high_clip (float) – Upper bound for clipping. Values greater than this will be set to NaN.
- Returns:
A Series containing the clipped values with the same index as the input Series.
- Return type:
pandas.Series
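A minimal sketch of the documented clipping behaviour. It keeps values inside [low_clip, high_clip] and sets everything else to NaN. The name clip_sketch is hypothetical. pandas' own Series.clip works differently because it pins out-of-range values to the bound, and this function's behaviour (NaN instead of the bound) keeps bad readings visible as gaps.

```python
import pandas as pd


def clip_sketch(unclipped: pd.Series, low_clip: float, high_clip: float) -> pd.Series:
    """Hypothetical sketch: out-of-range values become NaN; the index is unchanged."""
    in_range = (unclipped >= low_clip) & (unclipped <= high_clip)
    return unclipped.where(in_range)


data = pd.Series([-1.0, 0.0, 50.0, 100.0, 101.0])
print(clip_sketch(data, low_clip=0, high_clip=100).tolist())  # [nan, 0.0, 50.0, 100.0, nan]
```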
- hydrobot.filters.fbewma(input_data, span: int)[source]¶
Calculate the Forward-Backward Exponentially Weighted Moving Average (FBEWMA).
- Parameters:
input_data (pandas.Series) – Input time series data to calculate the FBEWMA on.
span (int) – Span parameter for exponential weighting.
- Returns:
A Series containing the FBEWMA values with the same index as the input Series.
- Return type:
pandas.Series
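A minimal sketch of a forward-backward EWMA. It assumes the forward and backward passes are averaged, which is a common way to cancel the phase lag of a one-directional EWMA, but the exact combination used by hydrobot is not stated here. The name fbewma_sketch is hypothetical. On a symmetric spike the result is symmetric around the spike, and a one-way EWMA would instead smear it forward in time.

```python
import numpy as np
import pandas as pd


def fbewma_sketch(input_data: pd.Series, span: int) -> pd.Series:
    """Hypothetical sketch: average of a forward and a backward EWMA."""
    forward = input_data.ewm(span=span).mean()
    # Reverse, smooth, then reverse back so the result lines up with the original index
    backward = input_data[::-1].ewm(span=span).mean()[::-1]
    return (forward + backward) / 2


spike = pd.Series([1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0])
smoothed = fbewma_sketch(spike, span=3)
print(smoothed.round(3).tolist())
```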
- hydrobot.filters.flatline_value_remover(series: Series, span: int = 3)[source]¶
Remove repeated (flatlined) values in a series.
Examines the data to see if any values are exactly repeated over a period. Exact repeats probably indicate a broken instrument. Replaces all values after the first with NaN. Uses math.isclose() to measure float “equality”.
- Parameters:
series (pd.Series) – Data to examine for flatlined values
span (int) – Number of repeated values allowed in a row before duplicates are removed
- Returns:
Data with the flatlined values replaced with np.nan
- Return type:
pd.Series
- hydrobot.filters.remove_one_spikes(input_data: Series, threshold_factor=3.0, window_size=5) Series[source]¶
Detect and remove single-point spikes in a time series.
A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.
NOTE: This function only works when baseline data is fairly stable. If baseline data is noisy or has high variability, use remove_one_spikes_mad() instead.
- Parameters:
input_data (pandas.Series) – The input time series data.
threshold_factor (float) – Multiplier for the standard deviation to define the spike threshold. Default is 3.0.
window_size (int) – The size of the rolling window to compute local statistics. Default is 5.
- Returns:
filtered_data – The time series with one-point spikes removed (set to NaN).
- Return type:
pandas.Series
- hydrobot.filters.remove_one_spikes_mad(input_data: Series, threshold_factor=2.5) Series[source]¶
Detect and remove single-point spikes using Median Absolute Deviation (MAD).
A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.
NOTE: This function is more robust to noisy or variable baseline data than remove_one_spikes().
ALSO NOTE: This function does not yet perform well; its thresholding mechanism needs further tuning.
- Parameters:
input_data (pandas.Series) – The input time series data.
threshold_factor (float) – Multiplier for the MAD to define the spike threshold. Default is 2.5.
- Returns:
filtered_data – The time series with one-point spikes removed (set to NaN).
- Return type:
pandas.Series
- hydrobot.filters.remove_outliers(input_data: Series, span: int, delta: float)[source]¶
Remove outliers.
Remove outliers from a time series by comparing it to the Forward-Backward Exponentially Weighted Moving Average (FBEWMA).
- Parameters:
input_data (pandas.Series) – Input time series data.
span (int) – Span parameter for exponential weighting used in the FBEWMA.
delta (float) – Threshold for identifying outliers. Values that differ from the FBEWMA by more than this threshold will be set to NaN.
- Returns:
A Series containing the time series with outliers removed with the same index as the input Series.
- Return type:
pandas.Series
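A minimal sketch of outlier removal against an FBEWMA. It assumes the forward/backward averaging used in the fbewma sketch and also assumes that points whose absolute deviation exceeds delta become NaN. The name remove_outliers_sketch is hypothetical. In the example the isolated spike is pulled only partly into the smoothed curve, so its deviation (about 4.2) exceeds delta while its neighbours stay within it.

```python
import pandas as pd


def remove_outliers_sketch(input_data: pd.Series, span: int, delta: float) -> pd.Series:
    """Hypothetical sketch: NaN out points further than delta from an FBEWMA."""
    forward = input_data.ewm(span=span).mean()
    backward = input_data[::-1].ewm(span=span).mean()[::-1]
    smooth = (forward + backward) / 2
    # Keep only points whose distance from the smoothed curve is within delta
    return input_data.where((input_data - smooth).abs() <= delta)


spike = pd.Series([1.0, 1.0, 1.0, 10.0, 1.0, 1.0, 1.0])
print(remove_outliers_sketch(spike, span=3, delta=3.0).tolist())
# [1.0, 1.0, 1.0, nan, 1.0, 1.0, 1.0]
```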
- hydrobot.filters.remove_range(input_series: Series | DataFrame, from_date: str | None, to_date: str | None, min_gap_length: int = 1, insert_gaps: str = 'none')[source]¶
Remove data from series in given range.
Returns the input series without data between from_date and to_date inclusive.
A None to_date will remove all data after the from_date (and vice versa). If both to_date and from_date are None, all data is removed.
Whether gaps are inserted depends on insert_gaps.
- Parameters:
input_series (pd.Series | pd.DataFrame) – The series or dataframe to have a section removed
from_date (str | None) – Start of removed section
to_date (str | None) – End of removed section
min_gap_length (int) – Will insert gaps based on insert_gaps strategy if missing more data points than min_gap_length in a row.
insert_gaps (str) – If “all” will insert np.nan at every missing point. If “start” will insert np.nan only at from_date. If “end” will insert np.nan only at to_date. If “none” will insert no np.nan values, and remove all timestamps completely.
- Returns:
The series with relevant slice removed
- Return type:
pd.Series | pd.DataFrame
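A minimal sketch of only the insert_gaps="none" behaviour. It removes every row between from_date and to_date inclusive, treats a None bound as open-ended, and assumes a DatetimeIndex. The name remove_range_sketch is hypothetical. min_gap_length and the other gap-insertion strategies are not modelled here.

```python
import numpy as np
import pandas as pd


def remove_range_sketch(input_series, from_date=None, to_date=None):
    """Hypothetical sketch of insert_gaps="none": drop rows in [from_date, to_date]."""
    index = input_series.index
    in_range = np.ones(len(index), dtype=bool)
    if from_date is not None:
        in_range &= index >= pd.Timestamp(from_date)
    if to_date is not None:
        in_range &= index <= pd.Timestamp(to_date)
    # Both bounds None leaves in_range all True, so everything is removed
    return input_series.loc[~in_range]


data = pd.Series(range(5), index=pd.date_range("2022-01-01", periods=5, freq="60min"))
print(remove_range_sketch(data, "2022-01-01 01:00", "2022-01-01 02:00").tolist())  # [0, 3, 4]
```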
- hydrobot.filters.remove_spikes(input_data: Series, span: int, low_clip: float, high_clip: float, delta: float) Series[source]¶
Remove spikes.
Remove spikes from time series data using a combination of clipping and interpolation.
- Parameters:
input_data (pandas.Series) – Input time series data.
span (int) – Span parameter for exponential weighting used in outlier detection.
low_clip (float) – Lower bound for clipping. Values less than this will be set to NaN.
high_clip (float) – Upper bound for clipping. Values greater than this will be set to NaN.
delta (float) – Threshold for identifying outliers. Values that deviate by more than this threshold will be considered spikes.
- Returns:
A Series containing the time series with spikes removed with the same index as the input Series.
- Return type:
pandas.Series
- hydrobot.filters.trim_series(std_series: Series, check_series: Series | Timestamp) Series[source]¶
Remove end of std series to match check series.
All data after the last entry in check_series is presumed to be unchecked, so that data is removed from the std_series.
If check_series is empty, returns the entire std_series
- Parameters:
std_series (pd.Series) – The series to be trimmed
check_series (pd.Series | pd.DataFrame | pd.Timestamp) – Indicates the end of the usable data
- Returns:
std_series with the unchecked elements trimmed
- Return type:
pd.Series
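A minimal sketch of the trimming rule. It assumes a sorted DatetimeIndex, uses the last check timestamp (or the Timestamp passed directly) as an inclusive end point, and returns the series unchanged when there are no checks. The name trim_series_sketch is hypothetical.

```python
import pandas as pd


def trim_series_sketch(std_series: pd.Series, check) -> pd.Series:
    """Hypothetical sketch: drop std data after the last check (or given Timestamp)."""
    if isinstance(check, pd.Timestamp):
        end = check
    elif len(check) == 0:
        return std_series  # no checks: nothing is trimmed
    else:
        end = check.index.max()
    # Label slicing on a sorted DatetimeIndex includes the end point
    return std_series.loc[:end]


times = pd.date_range("2022-01-01", periods=5, freq="60min")
std = pd.Series([0.0, 1.0, 2.0, 3.0, 4.0], index=times)
checks = pd.Series([1.5], index=[pd.Timestamp("2022-01-01 02:30")])
print(trim_series_sketch(std, checks).tolist())  # [0.0, 1.0, 2.0]
```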
hydrobot.plotter module¶
Tools for displaying potentially problematic data.
- hydrobot.plotter.add_qc_limit_bars(qc400, qc500, fig=None, **kwargs: int)[source]¶
Add horizontal lines to the plot for the QC limits.
- Parameters:
qc400 (float) – The value of the QC400 limit
qc500 (float) – The value of the QC500 limit
fig (go.Figure) – The figure to add the horizontal lines to
kwargs (dict) – Additional arguments to pass to the lines
- Return type:
go.Figure
- hydrobot.plotter.plot_check_data(standard_series, check_data, constant_check_shift, tag_list=None, check_names=None, ghosts=False, diffs=False, align_checks=False, fig=None, rain_control=False, **kwargs: int)[source]¶
Plot the check data.
- Parameters:
standard_series (pd.Series) – The series to be plotted
check_data (pd.DataFrame) – The data to be plotted on top of the standard data
constant_check_shift (float) – The shift between the check data and the standard data
tag_list (list[str]) – The tags of the check data
check_names (list[str]) – The names of the check data
ghosts (bool) – Whether to plot the check data where the timestamps are
diffs (bool) – Whether to plot the difference between the check data and the standard data
align_checks (bool) – Whether to align the check data to the standard data
fig (go.Figure) – The figure to add the plot to
rain_control (bool) – Adjustment for rain control plot
kwargs (dict) – Additional arguments to be passed to the plot
- Return type:
go.Figure
- hydrobot.plotter.plot_processing_overview_chart(standard_data, quality_data, check_data, constant_check_shift, qc_500_limit, qc_600_limit, tag_list=None, check_names=None, fig=None, rain_control=False, **kwargs)[source]¶
Plot the standard processing plot with small pcc chart underneath.
- Parameters:
standard_data (pd.DataFrame) – The data to be plotted
quality_data (pd.DataFrame) – The quality data to be plotted
check_data (pd.DataFrame) – The check data to be plotted
constant_check_shift (float) – The shift between the check data and the standard data
qc_500_limit (float) – The value of the QC500 limit
qc_600_limit (float) – The value of the QC600 limit
tag_list (list[str]) – The tags of the check data
check_names (list[str]) – The names of the check data
fig (go.Figure, optional) – The figure to add the plot to, will make a new one if none
rain_control (bool) – Adjustment for rain control plot
kwargs (dict) – Additional arguments to pass to the plot
- Return type:
go.Figure
- hydrobot.plotter.plot_qc_codes(standard_series, quality_series, fig=None, **kwargs)[source]¶
Plot data coloured by its quality code.
- Parameters:
standard_series (pd.Series) – Data to be sorted by colour
quality_series (pd.Series) – Data to use to determine colour
fig (go.Figure | None, optional) – The figure to add info to, will make a figure if None
- Return type:
go.Figure
- hydrobot.plotter.plot_raw_data(raw_standard_series, fig=None, **kwargs: int)[source]¶
Plot the raw data with a grey line.
- Parameters:
raw_standard_series (pd.Series) – The data to be plotted.
fig (go.Figure) – The figure to add the plot to
kwargs (dict) – Additional arguments to be passed to the plot
- Return type:
go.Figure
hydrobot.processor module¶
Processor class.
- class hydrobot.processor.Processor(base_url: str, site: str, standard_hts_filename: str, standard_measurement_name: str, frequency: str | None, data_family: str, from_date: str | None = None, to_date: str | None = None, check_hts_filename: str | None = None, check_measurement_name: str | None = None, defaults: dict | None = None, interval_dict: dict | None = None, constant_check_shift: float = 0, fetch_quality: bool = False, export_file_name: str | None = None, archive_base_url: str | None = None, archive_standard_hts_filename: str | None = None, archive_check_hts_filename: str | None = None, provisional_wq_filename: str | None = None, archive_standard_measurement_name: str | None = None, depth: float | None = None, infer_frequency: bool = True, **kwargs)[source]¶
Bases: object
Processor class for handling data processing.
- _defaults¶
The default settings.
- Type:
dict
- _site¶
The site to be processed.
- Type:
str
- _standard_measurement_name¶
The standard measurement to be processed.
- Type:
str
- _check_measurement_name¶
The measurement to be checked.
- Type:
str
- _base_url¶
The base URL of the Hilltop server.
- Type:
str
- _standard_hts_filename¶
The standard Hilltop service.
- Type:
str
- _check_hts_filename¶
The Hilltop service to be checked.
- Type:
str
- _frequency¶
The frequency of the data.
- Type:
str
- _from_date¶
The start date of the data.
- Type:
str
- _to_date¶
The end date of the data.
- Type:
str
- _quality_code_evaluator¶
The quality code evaluator.
- Type:
QualityCodeEvaluator
- _interval_dict¶
Determines how data with old checks is downgraded.
- Type:
dict
- _standard_data¶
The standard series data.
- Type:
pd.Series
- _check_data¶
The series containing check data.
- Type:
pd.Series
- _quality_data¶
The quality series data.
- Type:
pd.Series
- standard_item_name¶
The name of the standard item.
- Type:
str
- standard_data_source_name¶
The name of the standard data source.
- Type:
str
- check_item_name¶
The name of the check item.
- Type:
str
- check_data_source_name¶
The name of the check data source.
- Type:
str
- export_file_name¶
Where the data is exported to. Used as the default when exporting without a specified file location.
- Type:
str
- add_check(extra_check)[source]¶
Incorporate extra check data into the check series using utils.merge_series.
- Parameters:
extra_check – extra check data
- Return type:
None, but adds data to self.check_series
- add_quality(extra_quality)[source]¶
Incorporate extra quality data into the quality series using utils.merge_series.
- Parameters:
extra_quality – extra quality data
- Return type:
None, but adds data to self.quality_series
- add_standard(extra_standard)[source]¶
Incorporate extra standard data into the standard series using utils.merge_series.
- Parameters:
extra_standard – extra standard data
- Return type:
None, but adds data to self.standard_data
- property base_url¶
The base URL of the Hilltop server.
- Type:
str
- check_data¶
The series containing check data.
- property check_hts_filename¶
The Hilltop service to be checked.
- Type:
str
- clip(low_clip: float | None = None, high_clip: float | None = None)[source]¶
Clip data within specified low and high values.
- Parameters:
low_clip (float or None, optional) – The lower bound for clipping, by default None. If None, the low clip value from the class defaults is used.
high_clip (float or None, optional) – The upper bound for clipping, by default None. If None, the high clip value from the class defaults is used.
- Return type:
None
Notes
This method clips the data in both the standard and check series within the specified low and high values. It uses the filters.clip function for the actual clipping process.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.clip(low_clip=0, high_clip=100)
>>> processor.standard_data["Value"]
<clipped standard series within the specified range>
>>> processor.check_data["Value"]
<clipped check series within the specified range>
- classmethod complete_yaml_parameters(config_path)[source]¶
Ensure a yaml holds all relevant parameters, filling in missing from/to dates.
- data_exporter(file_location=None, ftype='xml', standard: bool = True, quality: bool = True, check: bool = True, trimmed=True)[source]¶
Export data to file.
- Parameters:
file_location (str | None) – The file path where the file will be saved. If ‘ftype’ is “csv” or “xml”, this should be a full file path including extension. If ‘ftype’ is “hilltop_csv”, multiple files will be created, so ‘file_location’ should be a prefix that will be appended with “_std_qc.csv” for the file containing the standard and quality data, and “_check.csv” for the check data file. If None, uses self.export_file_name
ftype (str, optional) – Available options are “xml”, “hilltop_csv”, “csv”, “check”.
standard (bool, optional) – Whether standard data is exported, default true
check (bool, optional) – Whether check data is exported, default true
quality (bool, optional) – Whether quality data is exported, default true
trimmed (bool, optional) – If True, export trimmed data; otherwise, export the full data. Default is True.
- Return type:
None
- Raises:
ValueError – If ftype is not a recognised string.
Notes
This method exports the data to one or more files in the format selected by ftype (XML by default).
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.data_exporter("output.xml", trimmed=True)
>>> # Check the generated XML file at 'output.xml'
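The file_location rules above can be summarised as a small path helper. `export_paths` and its `default_location` argument are hypothetical: `default_location` stands in for self.export_file_name. The sketch covers only the “xml”, “csv” and “hilltop_csv” cases the parameter description explains. The “check” option is left out because its output naming is not described.

```python
def export_paths(file_location=None, ftype="xml", default_location="export.xml"):
    """Hypothetical helper: the file(s) data_exporter would write for each ftype."""
    if file_location is None:
        # Stand-in for falling back to self.export_file_name.
        file_location = default_location
    if ftype in ("xml", "csv"):
        # One file: file_location is a full path including the extension.
        return [file_location]
    if ftype == "hilltop_csv":
        # Two files: file_location is a prefix.
        return [file_location + "_std_qc.csv", file_location + "_check.csv"]
    raise ValueError(f"Unrecognised ftype: {ftype!r}")


print(export_paths("out/Site1", "hilltop_csv"))
# ['out/Site1_std_qc.csv', 'out/Site1_check.csv']
```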
- property defaults¶
The default settings.
- Type:
dict
- delete_range(from_date, to_date, tstype_standard=True, tstype_check=False, tstype_quality=False, gap_limit=None)[source]¶
Delete a range of data from specified time series types.
DEPRECATED: The use of this method is discouraged as it completely removes rows from the dataframes. User is encouraged to use ‘remove_range’ which marks rows for removal, but retains the timestamp to be associated with the other values in the row such as the raw value, reason for removal, etc.
- Parameters:
from_date (str) – The start date of the range to delete.
to_date (str) – The end date of the range to delete.
tstype_standard (bool, optional) – Flag to delete data from the standard series, by default True.
tstype_check (bool, optional) – Flag to delete data from the check series, by default False.
tstype_quality (bool, optional) – Flag to delete data from the quality series, by default False.
gap_limit (int, optional) – How large a run of missing data must be before a gap is inserted.
- Return type:
None
Notes
This method deletes a specified range of data from the selected time series types. The range is defined by the from_date and to_date parameters.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.delete_range(from_date="2022-01-01", to_date="2022-12-31", tstype_standard=True)
>>> processor.standard_data
<standard series with specified range deleted>
>>> processor.delete_range(from_date="2022-01-01", to_date="2022-12-31", tstype_check=True)
>>> processor.check_data
<check series with specified range deleted>
- diagnosis()[source]¶
Provide a diagnosis of the data.
- Return type:
None
Notes
This method analyzes the state of the data, including the standard, check, and quality series. It provides diagnostic information about the data distribution, gaps, and other relevant characteristics.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.import_data()
>>> processor.diagnosis()
>>> # View diagnostic information about the data.
- enforce_measurement_at_site(measurement_name, hilltop)[source]¶
Check that the measurement exists in a given Hilltop file. Not yet implemented.
- property frequency¶
The frequency of the data.
- Type:
str
- classmethod from_config_yaml(config_path, fetch_quality=False)[source]¶
Initialises a Processor class given a config file.
- Parameters:
config_path (str) – Path to config.yaml.
fetch_quality (bool, optional) – Whether to fetch any existing quality data, default false
- Return type:
Processor, Annalist
- property from_date¶
The start date of the data.
- Type:
str
- classmethod from_processing_parameters_dict(processing_parameters, fetch_quality=False)[source]¶
Initialises a Processor class given a dictionary of processing parameters.
- Parameters:
processing_parameters (dict) – Dictionary of processing parameters
fetch_quality (bool, optional) – Whether to fetch any existing quality data, default false
- Return type:
Processor, Annalist
- gap_closer(gap_limit: int | None = None)[source]¶
Close small gaps in the standard series.
DEPRECATED: The use of this method is discouraged as it completely removes rows from the dataframes. The gap closing functionality has been moved to data_exporter, where gaps are handled automatically before data export.
- Parameters:
gap_limit (int or None, optional) – The maximum number of consecutive missing values to close, by default None. If None, the gap limit from the class defaults is used.
- Return type:
None
Notes
This method closes small gaps in the standard series by replacing consecutive missing values with interpolated or backfilled values. The gap closure is performed using the evaluator.small_gap_closer function.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.gap_closer(gap_limit=5)
>>> processor.standard_data["Value"]
<updated standard series with closed gaps>
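The deprecation note says gap closing removes rows outright. Here is a minimal sketch of that row-dropping behaviour. It assumes a gap is a run of consecutive missing values, and that runs no longer than gap_limit are dropped while longer runs are kept. The Notes above also mention interpolation and backfilling, and this sketch does not model those. `close_small_gaps` is a hypothetical name, not evaluator.small_gap_closer.

```python
import math


def close_small_gaps(rows, gap_limit):
    """Drop runs of missing values that are no longer than gap_limit.

    rows is a list of (timestamp, value) pairs; None or NaN marks a
    missing value. Longer runs are kept so they still appear as gaps.
    """
    def missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))

    result, run = [], []
    for row in rows:
        if missing(row[1]):
            run.append(row)
            continue
        if len(run) > gap_limit:
            result.extend(run)  # large gap: keep it
        run = []  # small gap: its rows are dropped
        result.append(row)
    if len(run) > gap_limit:
        result.extend(run)
    return result


rows = [(0, 1.0), (1, None), (2, 2.0), (3, None), (4, None), (5, None), (6, 3.0)]
closed = close_small_gaps(rows, gap_limit=2)
# The one-value gap at t=1 is closed; the three-value gap at t=3..5 is kept.
```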
- get_measurement_dataframe(measurement, hts_type)[source]¶
Get a dataframe of a given measurement for other processor parameters.
- import_check(check_hts_filename: str | None = None, site: str | None = None, check_measurement_name: str | None = None, check_data_source_name: str | None = None, check_item_info: dict | None = None, check_item_name: str | None = None, check_data: DataFrame | None = None, from_date: str | None = None, to_date: str | None = None, base_url: str | None = None)[source]¶
Import Check data.
- Parameters:
check_hts_filename (str or None, optional) – Where to get check data from
site (str or None, optional) – Which site to get data from
check_measurement_name (str or None, optional) – Name for measurement to get
check_data_source_name (str or None, optional) – Name for data source to get
check_item_info (dict or None, optional) – ItemInfo to be used in hilltop xml
check_item_name (str or None, optional) – ItemName to be used in hilltop xml
check_data (pd.DataFrame or None, optional) – Check data. This argument appears to be overwritten by the imported data and may be removed in future.
from_date (str or None, optional) – The start date for data retrieval. If None, defaults to the earliest available data.
to_date (str or None, optional) – The end date for data retrieval. If None, defaults to latest available data.
base_url (str, optional) – Base of the url to use for the hilltop server request. Defaults to the Processor value.
- Returns:
check_data
- Return type:
pd.DataFrame
- Raises:
TypeError – If the parsed Check data is not a pandas.DataFrame.
Notes
This method imports Check data from the specified server based on the provided parameters. It retrieves data using the data_acquisition.get_data function. The data is parsed and formatted according to the item_info in the data source.
Examples
>>> processor = Processor(...)  # initialize processor instance
>>> processor.import_check(
...     from_date='2022-01-01', to_date='2022-01-10'
... )
- import_data(from_date: Timestamp | str | None = None, to_date: Timestamp | str | None = None, standard: bool = True, check: bool = True, quality: bool = True)[source]¶
Import data using the class parameter range.
- Parameters:
from_date (str or None, optional) – start of data to be imported, if None will use defaults
to_date (str or None, optional) – end of data to be imported, if None will use defaults
standard (bool, optional) – Whether to import standard data, by default True.
check (bool, optional) – Whether to import check data, by default True.
quality (bool, optional) – Whether to import quality data, by default True.
- Return type:
None
Notes
This method imports data for the specified date range, using the class parameters _from_date and _to_date. It updates the internal series data in the Processor instance for standard, check, and quality measurements separately.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.import_data("2022-01-01", "2022-12-31", standard=True, check=True)
- import_quality(standard_hts_filename: str | None = None, site: str | None = None, standard_measurement_name: str | None = None, standard_data_source_name: str | None = None, quality_data: DataFrame | None = None, from_date: str | None = None, to_date: str | None = None, base_url: str | None = None)[source]¶
Import quality data.
- Parameters:
standard_hts_filename (str or None, optional) – Where to get quality data from
site (str or None, optional) – Which site to get data from
standard_measurement_name (str or None, optional) – Name for measurement to get
standard_data_source_name (str or None, optional) – Name for data source to get
quality_data (pd.DataFrame or None, optional) – Quality data. This argument appears to be overwritten by the imported data and may be removed in future.
from_date (str or None, optional) – The start date for data retrieval. If None, defaults to the earliest available data.
to_date (str or None, optional) – The end date for data retrieval. If None, defaults to latest available data.
base_url (str, optional) – Base of the url to use for the hilltop server request. Defaults to the Processor value.
- Return type:
pd.DataFrame
- Raises:
TypeError – If the parsed Quality data is not a pandas.Series.
Notes
This method imports Quality data from the specified server based on the provided parameters. It retrieves data using the data_acquisition.get_data function and updates the Quality Series in the instance. The data is parsed and formatted according to the item_info in the data source.
Examples
>>> processor = Processor(...)  # initialize processor instance
>>> processor.import_quality(
...     from_date='2022-01-01', to_date='2022-01-10'
... )
- import_standard(standard_hts_filename: str | None = None, site: str | None = None, standard_measurement_name: str | None = None, standard_data_source_name: str | None = None, standard_item_info: dict | None = None, standard_data: DataFrame | None = None, from_date: str | None = None, to_date: str | None = None, frequency: str | None = None, base_url: str | None = None, infer_frequency: bool | None = None)[source]¶
Import standard data.
- Parameters:
standard_hts_filename (str or None, optional) – The standard Hilltop service. If None, defaults to the standard HTS.
site (str or None, optional) – The site to be processed. If None, defaults to the site on the processor object.
standard_measurement_name (str or None, optional) – The standard measurement to be processed. If None, defaults to the standard measurement name on the processor object.
standard_data_source_name (str or None, optional) – The name of the standard data source. If None, defaults to the standard data source name on the processor object.
standard_item_info (dict or None, optional) – The item information for the standard data. If None, defaults to the standard item info on the processor object.
standard_data (pd.DataFrame or None, optional) – The standard data. If None, makes an empty standard_data object
from_date (str or None, optional) – The start date for data retrieval. If None, defaults to the earliest available data.
to_date (str or None, optional) – The end date for data retrieval. If None, defaults to latest available data.
frequency (str or None, optional) – The frequency of the data. If None and infer_frequency, defaults to the frequency on the processor object. If that’s also None, self.infer_frequency is consulted to determine whether to infer the frequency from the data.
base_url (str or None, optional) – URL to look for hilltop server. Will use self.base_url if None.
infer_frequency (bool or None, optional) – Whether to infer the frequency from the data. Uses self.infer_frequency if None. If True and frequency is also provided, a warning is issued.
- Returns:
The standard data
- Return type:
pd.DataFrame
- Raises:
ValueError – If no standard data is found within the specified date range.
TypeError – If the parsed Standard data is not a pandas.Series.
Notes
This method imports Standard data from the specified server based on the provided parameters. It retrieves data using the data_acquisition.get_data function and updates the Standard Series in the instance. The data is parsed and formatted according to the item_info in the data source.
Examples
>>> processor = Processor(...)  # initialize processor instance
>>> processor.import_standard(
...     from_date='2022-01-01', to_date='2022-01-10'
... )
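Frequency inference only makes sense for regularly spaced data. The helper below is a simplified, hypothetical stand-in (`infer_regular_interval` is not a hydrobot function). It returns the common spacing only when every consecutive pair of timestamps is the same distance apart, and otherwise returns None. pandas provides pandas.infer_freq for the same purpose on a DatetimeIndex, but the source does not say whether import_standard uses it.

```python
from datetime import datetime, timedelta


def infer_regular_interval(timestamps):
    """Return the common spacing of sorted timestamps, or None if irregular."""
    if len(timestamps) < 3:
        # Too few points to tell a regular series from a coincidence.
        return None
    deltas = {later - earlier for earlier, later in zip(timestamps, timestamps[1:])}
    return deltas.pop() if len(deltas) == 1 else None


start = datetime(2022, 1, 1)
regular = [start + timedelta(minutes=15 * k) for k in range(5)]
print(infer_regular_interval(regular))  # 0:15:00
```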
- interpolate_depth_profiles(depth: int | float, measurement: str, site: str | None = None, from_date: str | None | Timestamp = None, to_date: str | None | Timestamp = None)[source]¶
Look up depth profiles and interpolate values at the given depth.
- Parameters:
depth (numeric) – The depth to interpolate to, in metres.
measurement (str) – Measurement name combined with the data source name, e.g. “Water Temperature (Depth Profile)”.
site (str | None) – The site whose depth profiles are used. If None, uses the default.
from_date (str | pd.Timestamp | None) – start of period to look for, if none will use
to_date (str | pd.Timestamp | None)
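Here is a sketch of interpolating a single depth profile. It assumes linear interpolation between the two measured depths that bracket the requested depth, and no extrapolation beyond the measured range. The source does not specify the interpolation scheme. `interpolate_at_depth` and the sample profile are hypothetical.

```python
from bisect import bisect_left


def interpolate_at_depth(profile, depth):
    """Linearly interpolate one profile at `depth` (metres).

    profile: list of (depth, value) pairs sorted by increasing depth.
    Returns None when depth lies outside the measured range.
    """
    depths = [d for d, _ in profile]
    if not depths or depth < depths[0] or depth > depths[-1]:
        return None
    i = bisect_left(depths, depth)
    if depths[i] == depth:
        return profile[i][1]  # exact match, no interpolation needed
    (d0, v0), (d1, v1) = profile[i - 1], profile[i]
    return v0 + (v1 - v0) * (depth - d0) / (d1 - d0)


# Hypothetical water temperature profile: (depth in m, degrees C).
profile = [(0.5, 18.0), (2.0, 16.0), (5.0, 12.0)]
print(interpolate_at_depth(profile, 3.5))  # 14.0
```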
- pad_data_with_nan_to_set_freq()[source]¶
Set the data to the correct frequency, filled with NaNs as appropriate.
- Return type:
None
Notes
This method adjusts the time series data to the correct frequency, filling missing values with NaNs as appropriate. It modifies the standard series in-place.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.pad_data_with_nan_to_set_freq()
>>> processor.standard_data
<standard series with missing values filled with NaNs>
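Padding to a set frequency means building a regular time grid from the first to the last timestamp and putting NaN wherever no reading exists. The sketch below uses a plain dict. `pad_to_frequency` is a hypothetical name. Like pandas.Series.asfreq, it drops any timestamps that fall off the grid. Whether the Processor method does the same is an assumption.

```python
import math
from datetime import datetime, timedelta


def pad_to_frequency(series, freq):
    """series: dict of timestamp -> value. Returns a dict on a grid of step freq."""
    if not series:
        return {}
    start, end = min(series), max(series)
    padded, t = {}, start
    while t <= end:
        # Missing grid points become NaN; off-grid timestamps are not carried over.
        padded[t] = series.get(t, math.nan)
        t += freq
    return padded


t0 = datetime(2022, 1, 1)
raw = {t0: 1.0, t0 + timedelta(minutes=15): 1.2, t0 + timedelta(minutes=45): 1.1}
padded = pad_to_frequency(raw, timedelta(minutes=15))
```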
- plot_check_data(tag_list=None, check_names=None, ghosts=False, diffs=False, align_checks=False, fig=None, **kwargs)[source]¶
Implement plotting.plot_qc_codes.
- plot_processing_overview_chart(fig=None, **kwargs)[source]¶
Plot a processing overview chart.
- Parameters:
fig (plotly.graph_objects.Figure, optional) – The figure to plot on, by default None.
kwargs (dict) – Additional keyword arguments to pass to the plot
- Returns:
The figure with the processing overview chart.
- Return type:
plotly.graph_objects.Figure
- quality_code_evaluator¶
Wrapped with ClassLogger to provide Annalist logging; see the ClassLogger notes above.
- quality_data¶
The quality data (a pd.DataFrame, as returned by import_quality). Wrapped with ClassLogger to provide Annalist logging; see the ClassLogger notes above.
- quality_encoder(gap_limit: int | None = None, max_qc: int | float | None = None, interval_dict: dict | None = None)[source]¶
Encode quality information in the quality series.
- Parameters:
gap_limit (int or None, optional) – The maximum number of consecutive missing values to consider as gaps, by default None. If None, the gap limit from the class defaults is used.
max_qc (numeric or None, optional) – Maximum quality code possible at the site. If None, the max_qc from the class defaults is used.
interval_dict (dict or None, optional) – Dictionary that dictates when to downgrade data whose last check is old. Takes pd.DateOffset: quality_code pairs. If None, the interval_dict from the class defaults is used.
- Return type:
None
Notes
This method encodes quality information in the quality series based on the provided standard series, check series, and measurement information. It uses the evaluator.quality_encoder function to determine the quality flags for the data.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.quality_encoder(gap_limit=5)
>>> processor.quality_data["Value"]
<updated quality series with encoded quality flags>
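The interval_dict parameter pairs an age limit with the quality code that data falls to once its last check is older than that limit. Here is a minimal sketch of that downgrade rule. It uses datetime.timedelta in place of pd.DateOffset, and it assumes the longest exceeded interval decides the cap and that a code is never raised. `downgrade_for_stale_check` and the codes 600/400/200 are illustrative, not evaluator.quality_encoder.

```python
from datetime import datetime, timedelta


def downgrade_for_stale_check(qc, last_check, when, interval_dict):
    """Cap quality code `qc` at time `when` based on the age of the last check."""
    age = when - last_check
    exceeded = [interval for interval in interval_dict if age > interval]
    if not exceeded:
        return qc
    # Assumption: the longest exceeded interval decides the cap.
    return min(qc, interval_dict[max(exceeded)])


# Illustrative only: older than 60 days -> at most 400, older than 180 days -> at most 200.
intervals = {timedelta(days=60): 400, timedelta(days=180): 200}
checked = datetime(2023, 1, 1)
print(downgrade_for_stale_check(600, checked, datetime(2023, 4, 1), intervals))  # 400
```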
- remove_flatlined_values(span: int = 3)[source]¶
Remove repeated (flatlined) values in the standard series, as done by flatline_value_remover().
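The span argument suggests a run-length rule. This sketch assumes that when a value repeats at least `span` times in a row, the first value in the run is kept and the repeats are set to NaN, while shorter runs are left alone. That reading is an assumption, not taken from flatline_value_remover(). `remove_flatlined` is a hypothetical name.

```python
import math


def remove_flatlined(values, span=3):
    """Set repeats to NaN in runs of at least `span` identical values."""
    result = list(values)
    start = 0  # index where the current run of equal values began
    for i in range(1, len(values) + 1):
        if i == len(values) or values[i] != values[start]:
            if i - start >= span:
                for j in range(start + 1, i):
                    result[j] = math.nan  # keep the first value, drop repeats
            start = i
    return result


cleaned = remove_flatlined([1.0, 2.0, 2.0, 2.0, 3.0, 3.0, 4.0], span=3)
# The run of three 2.0s is flatlined; the pair of 3.0s is shorter than span.
```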
- remove_one_spikes(threshold_factor: float = 3.0, window_size: int = 5)[source]¶
Remove one-spikes from the data.
A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.
NOTE: This function only works when baseline data is fairly stable. If baseline data is noisy or has high variability, use one_spike_filter_mad() instead.
- Parameters:
threshold_factor (float) – Multiplier for the standard deviation to define the spike threshold. Default is 3.0. Increasing this value makes the spike detection less sensitive.
window_size (int) – The size of the rolling window to compute local statistics. Default is 5. Increasing this value makes the spike detection less sensitive.
- Return type:
None
Notes
This method removes spikes from the standard series using the specified parameters. It utilizes the filters.remove_one_spikes function for the actual spike removal process.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_one_spikes(threshold_factor=3.0, window_size=5)
>>> processor.standard_data["Value"]
<standard series with spikes removed>
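The definition above has two parts: the point sits above (or below) both neighbours, and it is far from the local trend. Here is a simplified sketch of that rule, not filters.remove_one_spikes itself. It assumes the local trend is the mean of the other points in a centred window of `window_size` (3 or more), and that "far" means more than `threshold_factor` times their population standard deviation. Because a stable baseline gives a small spread, the rule suits the stable-baseline case the NOTE describes.

```python
import math
from statistics import mean, pstdev


def remove_one_spikes(values, threshold_factor=3.0, window_size=5):
    """Replace single-point spikes with NaN (simplified sketch)."""
    half = window_size // 2
    result = list(values)
    for i in range(1, len(values) - 1):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        neighbours = [values[j] for j in range(lo, hi) if j != i]
        centre, spread = mean(neighbours), pstdev(neighbours)
        # A one-point spike is above both neighbours or below both of them.
        same_side = (values[i] - values[i - 1]) * (values[i] - values[i + 1]) > 0
        if same_side and abs(values[i] - centre) > threshold_factor * spread:
            result[i] = math.nan
    return result


data = [1.0, 1.0, 1.1, 1.0, 8.0, 1.0, 0.9, 1.0, 1.0]
cleaned = remove_one_spikes(data, threshold_factor=3.0, window_size=5)
```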
- remove_one_spikes_mad(threshold_factor: float = 2.5)[source]¶
Remove one-spikes from the data using Median Absolute Deviation (MAD).
A one-point spike is defined as a data point that deviates significantly from both its preceding and following points and the local trend. For the removal of more complex multi-spikes, use the remove_spikes() function.
NOTE: This function is more robust to noisy or variable baseline data than remove_one_spikes().
- Parameters:
threshold_factor (float) – Multiplier for the MAD to define the spike threshold. Default is 2.5.
- Return type:
None
Notes
This method removes spikes from the standard series using the specified parameters. It utilizes the filters.remove_one_spikes_mad function for the actual spike removal process.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_one_spikes_mad(threshold_factor=2.5)
>>> processor.standard_data["Value"]
<standard series with spikes removed>
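The MAD variant is more robust because a median and a median absolute deviation are not dragged around by the spike itself. This is a sketch of one common formulation, not necessarily the one filters.remove_one_spikes_mad uses. Residuals are taken from a centred rolling median, the MAD of those residuals is scaled by 1.4826 (the usual factor that makes it comparable to a standard deviation for normal data), and points beyond `threshold_factor` scaled MADs are set to NaN. The `window_size` argument is a hypothetical addition: the documented method takes only threshold_factor.

```python
import math
from statistics import median


def remove_one_spikes_mad(values, threshold_factor=2.5, window_size=5):
    """Replace points far from the rolling median (in MAD units) with NaN."""
    half = window_size // 2
    n = len(values)
    # Residual of each point from the median of its centred window.
    residuals = [
        values[i] - median(values[max(0, i - half): min(n, i + half + 1)])
        for i in range(n)
    ]
    centre = median(residuals)
    mad = 1.4826 * median(abs(r - centre) for r in residuals)
    return [math.nan if abs(r - centre) > threshold_factor * mad else v
            for v, r in zip(values, residuals)]


noisy = [1.0, 1.2, 0.8, 10.0, 1.1, 0.9, 1.0]
cleaned = remove_one_spikes_mad(noisy, threshold_factor=2.5)
```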
- remove_outliers(span: int | None = None, delta: float | None = None)[source]¶
Remove outliers from the data.
- Parameters:
span (int or None, optional) – The span parameter for smoothing, by default None. If None, the span value from the class defaults is used.
delta (float or None, optional) – The delta parameter for identifying outliers, by default None. If None, the delta value from the class defaults is used.
- Return type:
None
Notes
This method removes outliers from the standard series using the specified span and delta values. It utilizes the filters.remove_outliers function for the actual outlier removal process.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_outliers(span=10, delta=2.0)
>>> processor.standard_data["Value"]
<standard series with outliers removed>
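The span/delta pair can be read as "smooth, then reject points far from the smooth curve". This sketch uses a forward-backward exponentially weighted moving average with smoothing factor alpha = 2 / (span + 1), then sets to NaN any point more than `delta` from the smoothed value. The choice of smoother is an assumption, and `fbewma` and `remove_outliers` here are illustrative rather than the filters module's implementation.

```python
import math


def fbewma(values, span):
    """Average of a forward and a backward exponentially weighted moving average."""
    alpha = 2 / (span + 1)

    def ewma(seq):
        out, level = [], seq[0]
        for v in seq:
            level = alpha * v + (1 - alpha) * level
            out.append(level)
        return out

    forward = ewma(values)
    backward = ewma(values[::-1])[::-1]
    return [(f + b) / 2 for f, b in zip(forward, backward)]


def remove_outliers(values, span, delta):
    """Set points further than delta from the smoothed series to NaN."""
    if not values:
        return []
    smoothed = fbewma(values, span)
    return [math.nan if abs(v - s) > delta else v for v, s in zip(values, smoothed)]


series = [1.0] * 9
series[4] = 50.0
cleaned = remove_outliers(series, span=3, delta=10.0)
```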
- remove_range(from_date, to_date)[source]¶
Mark a range in standard_data for removal.
- Parameters:
from_date (str) – The start date of the range to delete.
to_date (str) – The end date of the range to delete.
- Return type:
None
Notes
This method marks a specified range of the standard series for removal. Unlike delete_range, the rows and their timestamps are retained. The range is defined by the from_date and to_date parameters.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_range(from_date="2022-01-01", to_date="2022-12-31")
>>> processor.standard_data
<standard series with specified range marked for removal>
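The deprecation note on delete_range contrasts deleting rows with marking them. The sketch below shows the difference on plain row dicts. `delete_range_rows`, `mark_range_for_removal` and the "Remove" and "Reason" keys are hypothetical and do not claim to be the actual standard_data columns. The range is treated as inclusive at both ends, which is also an assumption.

```python
from datetime import date


def delete_range_rows(rows, from_date, to_date):
    """Drop rows inside the range: their timestamps and values are lost."""
    return [row for row in rows if not from_date <= row["Time"] <= to_date]


def mark_range_for_removal(rows, from_date, to_date, reason="remove_range"):
    """Keep every row but flag those inside the range, recording why."""
    marked = []
    for row in rows:
        row = dict(row)  # copy so the input rows are not mutated
        inside = from_date <= row["Time"] <= to_date
        row["Remove"] = inside
        row["Reason"] = reason if inside else None
        marked.append(row)
    return marked


rows = [{"Time": date(2022, 1, d), "Value": float(d)} for d in (1, 2, 3)]
deleted = delete_range_rows(rows, date(2022, 1, 2), date(2022, 1, 2))
marked = mark_range_for_removal(rows, date(2022, 1, 2), date(2022, 1, 2))
```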
- remove_spikes(low_clip: float | None = None, high_clip: float | None = None, span: int | None = None, delta: float | None = None)[source]¶
Remove spikes from the data.
- Parameters:
low_clip (float or None, optional) – The lower clipping threshold, by default None. If None, the low_clip value from the class defaults is used.
high_clip (float or None, optional) – The upper clipping threshold, by default None. If None, the high_clip value from the class defaults is used.
span (int or None, optional) – The span parameter for smoothing, by default None. If None, the span value from the class defaults is used.
delta (float or None, optional) – The delta parameter for identifying spikes, by default None. If None, the delta value from the class defaults is used.
- Return type:
None
Notes
This method removes spikes from the standard series using the specified parameters. It utilizes the filters.remove_spikes function for the actual spike removal process.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.remove_spikes(low_clip=10, high_clip=20, span=5, delta=2.0)
>>> processor.standard_data["Value"]
<standard series with spikes removed>
- report_processing_issue(start_time=None, end_time=None, code=None, comment=None, series_type=None, message_type=None)[source]¶
Add an issue to be reported for processing usage.
This method adds an issue to the processing_issues DataFrame.
- Parameters:
start_time (str | None) – The start time of the issue.
end_time (str | None) – The end time of the issue.
code (str | None) – The code of the issue.
comment (str | None) – The comment of the issue.
series_type (str | None) – The type of the series the issue is related to.
message_type (str | None) – Should be one of: [“debug”, “info”, “warning”, “error”]
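Each call to report_processing_issue adds one row to processing_issues. Here is a minimal sketch of that pattern using a list of dicts instead of a DataFrame. `IssueLog`, `MESSAGE_TYPES` and the code "GAP" are hypothetical. Rejecting message types outside the documented four is an assumption: the parameter description only says the value "should be" one of them.

```python
from dataclasses import dataclass, field

MESSAGE_TYPES = ("debug", "info", "warning", "error")


@dataclass
class IssueLog:
    """Hypothetical stand-in for the processing_issues table (one dict per row)."""
    rows: list = field(default_factory=list)

    def report(self, start_time=None, end_time=None, code=None, comment=None,
               series_type=None, message_type=None):
        # Assumption: reject message types outside the documented set.
        if message_type is not None and message_type not in MESSAGE_TYPES:
            raise ValueError(f"message_type must be one of {MESSAGE_TYPES}")
        self.rows.append({
            "start_time": start_time,
            "end_time": end_time,
            "code": code,
            "comment": comment,
            "series_type": series_type,
            "message_type": message_type,
        })


issues = IssueLog()
issues.report("2022-03-01 00:00", "2022-03-02 12:00", code="GAP",
              comment="Logger offline", series_type="standard", message_type="warning")
```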
- property site¶
The site to be processed.
- Type:
str
- standard_data¶
The standard data (a pd.DataFrame, as returned by import_standard). Wrapped with ClassLogger to provide Annalist logging; see the ClassLogger notes above.
- property standard_hts_filename¶
The standard Hilltop service.
- Type:
str
- property standard_measurement_name¶
The standard measurement to be processed.
- Type:
str
- property to_date¶
The end date of the data.
- Type:
str
- to_xml_data_structure(standard=True, quality=True, check=True)[source]¶
Convert Processor object data to a list of XML data structures.
- Returns:
List of DataSourceBlob instances representing the data in the Processor object.
- Return type:
list of data_structure.DataSourceBlob
Notes
This method converts the data in the Processor object, including standard, check, and quality series, into a list of DataSourceBlob instances. Each DataSourceBlob contains information about the site, data source, and associated data.
Examples
>>> processor = Processor(base_url="https://hilltop-server.com", site="Site1")
>>> processor.import_data()
>>> xml_data_list = processor.to_xml_data_structure()
>>> # Convert Processor data to a list of XML data structures.
hydrobot.testicle module¶
Module contents¶
Top-level package for Hydro Processing Tools.