HydroBr


Here I’ll show you how to use some of HydroBr’s functions. If you have any doubts about the functions/methods available, you can use Python’s help function or read the docs page.

ANA database

Here is an example of how to use the functions that provide data from the Brazilian National Water Agency (Agência Nacional de Águas - ANA) database.

Importing the package

import hydrobr

Getting data from ANA

List of stations

HydroBr has two functions for listing stations from ANA:

hydrobr.get_data.ANA.list_prec_stations - To get the list of precipitation stations
hydrobr.get_data.ANA.list_flow_stations - To get the list of flow/stage stations

These functions can fetch the list of stations directly from the ANA database (source='ANA'), but that list includes many stations with no recorded data at all, which wastes time when you are searching for data.

I highly recommend using the functions' default source, 'ANAF', which returns a filtered list of stations that already includes a short description of each one. The description includes:

date of first measured data (StartDate)
date of last measured data (EndDate)
number of years with data (NYD)
missing data percentage between the start and the last date (MD)
number of years without any missing data (N_YWOMD)
percentage of years with missing data (YWMD).

# Get the list of precipitation stations (source='ANAF' is the default)
list_stations = hydrobr.get_data.ANA.list_prec_stations()
# Show the first five rows of the data
list_stations.head()

	Name	Code	Type	SubBasin	City	State	Responsible	Latitude	Longitude	StartDate	EndDate	NYD	MD	N_YWOMD	YWMD
0	SALINÓPOLIS	47000	2	32	SALINÓPOLIS	PARÁ	INMET	-0.6500	-47.5500	1958/01/01	1964/12/31	7	25.0	0	100.0
1	SALINÓPOLIS	47002	2	32	SALINÓPOLIS	PARÁ	ANA	-0.6231	-47.3536	1977/12/09	2019/08/31	43	3.5	35	18.6
2	CURUÇA	47003	2	32	CURUÇA	PARÁ	ANA	-0.7375	-47.8536	1981/07/01	2019/07/31	39	2.4	29	25.6
3	PRIMAVERA	47004	2	32	PRIMAVERA	PARÁ	ANA	-0.9294	-47.0994	1982/02/18	2019/08/31	38	0.0	35	7.9
4	MARUDA	47005	2	32	MARAPANIM	PARÁ	ANA	-0.6336	-47.6583	1989/08/21	2019/07/31	31	5.0	20	35.5

# Get the codes of the first five stations as a list
stations_code = list_stations.Code.to_list()[:5] 
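Taking the first five codes is only for illustration. In practice the description columns can be used to pick stations with long, mostly complete records. The sketch below assumes pandas is installed and uses a hand-built stand-in for list_stations, copied from the five rows shown above, so it runs without a connection to ANA. The thresholds min_years and max_missing are hypothetical values, not HydroBr defaults, and the codes are written as they appear in the table.

```python
import pandas as pd

# Stand-in for list_stations: only the columns needed here,
# with values copied from the five rows displayed above.
list_stations = pd.DataFrame({
    "Name": ["SALINÓPOLIS", "SALINÓPOLIS", "CURUÇA", "PRIMAVERA", "MARUDA"],
    "Code": [47000, 47002, 47003, 47004, 47005],
    "NYD": [7, 43, 39, 38, 31],
    "MD": [25.0, 3.5, 2.4, 0.0, 5.0],
})

min_years = 30     # hypothetical: at least 30 years with data
max_missing = 5.0  # hypothetical: at most 5% missing data

# Boolean mask combining both conditions
is_good = (list_stations["NYD"] >= min_years) & (list_stations["MD"] <= max_missing)
selected_codes = list_stations.loc[is_good, "Code"].to_list()
print(selected_codes)
```

With these thresholds the first station is dropped (7 years of data, 25% missing) and the other four are kept. The same mask should work on the full list returned by list_prec_stations, since it has the NYD and MD columns.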

If we need to know the spatial distribution of a list of stations, we can use the hydrobr.Plot.spatial_stations function.

To use this function you’ll need a Mapbox access token, which you can obtain from a Mapbox account.
The functions in hydrobr.Plot return a plotly figure, so we need to import the plotly library to display it. This also makes it possible to configure the layout of the plot.
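Rather than pasting the token into a script, one option is to read it from an environment variable. This is a minimal sketch using only the standard library. The helper load_mapbox_token and the variable name MAPBOX_ACCESS_TOKEN are hypothetical names chosen for illustration and are not part of HydroBr or Mapbox.

```python
import os


def load_mapbox_token(env_var="MAPBOX_ACCESS_TOKEN"):
    """Return the token stored in env_var, or raise if it is missing or empty.

    The default variable name is an assumption made for this example.
    """
    token = os.environ.get(env_var, "").strip()
    if not token:
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Mapbox token."
        )
    return token


# Usage (assumes the variable was exported in the shell beforehand):
# map_box_access_token = load_mapbox_token()
```

Keeping the token out of the source file makes it easier to share the script without exposing the token.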

In the example below, I will plot the distribution of flow stations in the state of Rio de Janeiro.

from plotly.offline import plot

map_box_access_token = 'your-mapbox-token-access-here'
flow_stations_list = hydrobr.get_data.ANA.list_flow_stations(
    state='RIO DE JANEIRO')

# Get the spatial stations plot figure
spatial_fig = hydrobr.Plot.spatial_stations(flow_stations_list,
                                            map_box_access_token)
# Optionally update the plot zoom
spatial_fig.update_layout(mapbox=dict(zoom=7))
# Plot and save the map as html
plot(spatial_fig, filename='spatial' + '.html')

The .html figure output:

Stations’ Data

HydroBr has three functions to get data from the ANA database:

hydrobr.get_data.ANA.prec_data - To get the precipitation data from a list of precipitation stations
hydrobr.get_data.ANA.flow_data - To get the flow data from a list of flow/stage stations
hydrobr.get_data.ANA.stage_data - To get the stage data from a list of flow/stage stations

# Get the data
data_stations = hydrobr.get_data.ANA.prec_data(stations_code)

100%|██████████████████████████████████| 5/5 [00:13<00:00,  2.62s/it]

data_stations.info()

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 22523 entries, 1958-01-01 to 2019-08-31
Freq: D
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   00047000  1918 non-null   float64
 1   00047002  14516 non-null  float64
 2   00047003  13397 non-null  float64
 3   00047004  13524 non-null  float64
 4   00047005  10212 non-null  float64
dtypes: float64(5)
memory usage: 1.0 MB

The last command used pandas’ info method, which returns basic information about the data we retrieved. For instance, station 00047003 has 13397 recorded daily values (Freq: D), within an index that runs from 1958-01-01 to 2019-08-31.
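The counts in the info output can be turned into a share of the whole index with simple arithmetic. The sketch below uses only the standard library and the numbers printed above. Note that this share is measured against the full 1958 to 2019 index shared by all five stations, so it is not the same quantity as the MD column, which is measured between each station's own first and last dates.

```python
from datetime import date

# Index limits reported by data_stations.info()
first_day = date(1958, 1, 1)
last_day = date(2019, 8, 31)
total_days = (last_day - first_day).days + 1  # both endpoints included

# Non-null counts copied from the info output above
non_null = {
    "00047000": 1918,
    "00047002": 14516,
    "00047003": 13397,
    "00047004": 13524,
    "00047005": 10212,
}

# Percentage of index days that hold a value, per station
coverage = {
    code: round(100 * count / total_days, 1)
    for code, count in non_null.items()
}

for code, pct in coverage.items():
    print(code, pct)
```

The computed total_days matches the 22523 entries reported by info, which confirms that the index is a continuous daily range.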

We can also visualize the temporal data availability with a Gantt plot.
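A data availability chart of this kind shows, for each station, the periods in which values exist. The sketch below illustrates that idea on a made-up ten-day series: it collapses a daily series into (start, end) pairs of unbroken data. It is not HydroBr's implementation. The function availability_runs is a hypothetical helper, and None stands in for the NaN values pandas would use.

```python
from datetime import date, timedelta


def availability_runs(days, values):
    """Return (start, end) pairs for each unbroken run of non-missing values."""
    runs = []
    run_start = None
    previous_day = None
    for day, value in zip(days, values):
        if value is not None:
            if run_start is None:
                run_start = day  # a new run begins on this day
        elif run_start is not None:
            runs.append((run_start, previous_day))  # the gap closes the run
            run_start = None
        previous_day = day
    if run_start is not None:
        runs.append((run_start, previous_day))  # the series ended inside a run
    return runs


# Made-up example: ten consecutive days, None marks a missing day
start = date(2019, 1, 1)
days = [start + timedelta(days=i) for i in range(10)]
values = [0.0, 2.5, None, None, 1.2, 0.0, 0.0, None, 4.1, 3.3]

runs = availability_runs(days, values)
for run_begin, run_end in runs:
    print(run_begin, "to", run_end)
```

Each pair corresponds to one horizontal bar on the station's row of the chart.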

As with the spatial plot, hydrobr.Plot.gantt returns a plotly figure, so we import plotly to display it and can configure the layout of the plot.

from plotly.offline import plot

# Get the Gantt figure
gantt_fig = hydrobr.Plot.gantt(data_stations)

# Update the layout
gantt_fig.update_layout(
    autosize=False,
    width=1000,
    height=500,
    xaxis_title='Year',
    yaxis_title='Station Code',
    font=dict(family="Courier New, monospace", size=12))

# Plot and save the Gantt plot as html
plot(gantt_fig, filename='gantt' + '.html')

The .html figure output: