
Here I’ll show you how to use some of HydroBr’s functions. If you have any doubts about the functions/methods available, you can use Python’s help function or read the docs page.
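As a minimal sketch of inspecting a function from an interactive session: the built-in help prints the full docstring, while the standard inspect module gives the signature and docstring as objects. json.dumps is used below only as a stand-in so the snippet runs without HydroBr installed; the same calls should work on a HydroBr function such as hydrobr.get_data.ANA.list_prec_stations.

```python
import inspect
import json

# json.dumps is only a stand-in target; swap in any HydroBr function,
# e.g. hydrobr.get_data.ANA.list_prec_stations, once hydrobr is imported.
target = json.dumps

# Parameter names and default values of the function
signature = inspect.signature(target)
print(signature)

# First line of the docstring (help(target) prints the full text)
summary = inspect.getdoc(target).splitlines()[0]
print(summary)
```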

ANA database

Here is an example of how to use the functions that provide data from the Brazilian National Water Agency (Agência Nacional de Águas - ANA) database.

Importing the package


import hydrobr


Getting data from ANA

List of stations

HydroBr has two functions for listing stations from ANA:

  • hydrobr.get_data.ANA.list_prec_stations - To get the list of precipitation stations
  • hydrobr.get_data.ANA.list_flow_stations - To get the list of flow/stage stations

With either of these functions, you can get the list of stations directly from the ANA database (source='ANA'). That list, however, includes many stations with no registered data at all, which wastes time when you are searching for data.

I highly recommend using the functions' default source, 'ANAF', which returns a filtered list of stations that already includes a short description of each one. The description includes:

  • date of first measured data (StartDate)
  • date of last measured data (EndDate)
  • number of years with data (NYD)
  • missing data percentage between the start and the last date (MD)
  • number of years without any missing data (N_YWOMD)
  • percentage of years with missing data (YWMD).
# To get the list of prec stations - source='ANAF' is the default
list_stations = hydrobr.get_data.ANA.list_prec_stations() 
# To show the first five rows of the data
list_stations.head() 


   Name         Code   Type  SubBasin  City         State  Responsible  Latitude  Longitude  StartDate   EndDate     NYD  MD    N_YWOMD  YWMD
0  SALINÓPOLIS  47000  2     32        SALINÓPOLIS  PARÁ   INMET        -0.6500   -47.5500   1958/01/01  1964/12/31  7    25.0  0        100.0
1  SALINÓPOLIS  47002  2     32        SALINÓPOLIS  PARÁ   ANA          -0.6231   -47.3536   1977/12/09  2019/08/31  43   3.5   35       18.6
2  CURUÇA       47003  2     32        CURUÇA       PARÁ   ANA          -0.7375   -47.8536   1981/07/01  2019/07/31  39   2.4   29       25.6
3  PRIMAVERA    47004  2     32        PRIMAVERA    PARÁ   ANA          -0.9294   -47.0994   1982/02/18  2019/08/31  38   0.0   35       7.9
4  MARUDA       47005  2     32        MARAPANIM    PARÁ   ANA          -0.6336   -47.6583   1989/08/21  2019/07/31  31   5.0   20       35.5
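In the five rows shown above, YWMD matches the share of years that are not free of missing data, that is (NYD - N_YWOMD) / NYD expressed as a percentage. The short check below reproduces the values from the table. It is an observation about these rows, not a description of how HydroBr computes the column internally, and the helper name ywmd_from_counts is made up for the example.

```python
# (NYD, N_YWOMD, YWMD) copied from the five rows shown above
rows = {
    47000: (7, 0, 100.0),
    47002: (43, 35, 18.6),
    47003: (39, 29, 25.6),
    47004: (38, 35, 7.9),
    47005: (31, 20, 35.5),
}


def ywmd_from_counts(nyd, n_ywomd):
    """Percentage of years with missing data, from the two year counts."""
    return round(100 * (nyd - n_ywomd) / nyd, 1)


# Print the recomputed value next to the value listed in the table
for code, (nyd, n_ywomd, ywmd) in rows.items():
    print(code, ywmd_from_counts(nyd, n_ywomd), ywmd)
```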


# Getting the first five station codes as a list
stations_code = list_stations.Code.to_list()[:5] 
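Taking the first five rows is convenient for a demonstration, but the descriptive columns also let you select stations by record quality. The sketch below assumes the columns have the names and numeric types shown in the table above. The thresholds are arbitrary illustrations, and the small DataFrame stations_demo is rebuilt from the five displayed rows so the snippet runs offline.

```python
import pandas as pd

# Stand-in for the DataFrame returned by list_prec_stations(),
# rebuilt from the five rows shown above (only the columns used here).
stations_demo = pd.DataFrame({
    "Name": ["SALINÓPOLIS", "SALINÓPOLIS", "CURUÇA", "PRIMAVERA", "MARUDA"],
    "Code": [47000, 47002, 47003, 47004, 47005],
    "NYD": [7, 43, 39, 38, 31],
    "MD": [25.0, 3.5, 2.4, 0.0, 5.0],
})

# Illustrative thresholds: at least 30 years of data and under 5% missing
has_long_record = stations_demo["NYD"] >= 30
has_few_gaps = stations_demo["MD"] < 5.0
selected = stations_demo[has_long_record & has_few_gaps]

# Station codes of the rows that passed both filters
selected_codes = selected["Code"].to_list()
print(selected_codes)
```

With a real list, the same boolean masks can be applied directly to the DataFrame returned by the listing functions.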

If we need to know the spatial distribution of a list of stations, we can use the hydrobr.Plot.spatial_stations function.

  • To use this function you’ll need a Mapbox access token, which you can get from your Mapbox account.
  • The functions in hydrobr.Plot return Plotly figures, so we need to import the plotly library to display them. This also makes it possible to configure the layout of the plot.
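One way to avoid hard-coding the token in a script is to read it from an environment variable. This is only a sketch: the variable name MAPBOX_ACCESS_TOKEN is a hypothetical choice for this example, not something HydroBr looks up on its own.

```python
import os

# MAPBOX_ACCESS_TOKEN is a hypothetical variable name chosen for this example
map_box_access_token = os.environ.get("MAPBOX_ACCESS_TOKEN", "")

# Warn early instead of failing later when the map is drawn
if not map_box_access_token:
    print("Set MAPBOX_ACCESS_TOKEN before calling hydrobr.Plot.spatial_stations")
```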

In the example below, I will plot the flow station distribution in the state of Rio de Janeiro.

from plotly.offline import plot

# Replace with your own Mapbox access token
map_box_access_token = 'your-mapbox-token-access-here'

# List the flow stations in the state of Rio de Janeiro
flow_stations_list = hydrobr.get_data.ANA.list_flow_stations(state='RIO DE JANEIRO')

# Get the spatial stations plot figure
spatial_fig = hydrobr.Plot.spatial_stations(flow_stations_list, map_box_access_token)
# If you want to update the plot zoom
spatial_fig.update_layout(mapbox=dict(zoom=7))
# To plot and save the map as html
plot(spatial_fig, filename='spatial.html')

The .html figure output:


Stations’ Data

HydroBr has three functions to get data from the ANA database:

  • hydrobr.get_data.ANA.prec_data - To get the precipitation data from a list of precipitation stations
  • hydrobr.get_data.ANA.flow_data - To get the flow data from a list of flow/stage stations
  • hydrobr.get_data.ANA.stage_data - To get the stage data from a list of flow/stage stations
# Getting the data
data_stations = hydrobr.get_data.ANA.prec_data(stations_code) 
100%|██████████████████████████████████| 5/5 [00:13<00:00,  2.62s/it]


data_stations.info()
<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 22523 entries, 1958-01-01 to 2019-08-31
Freq: D
Data columns (total 5 columns):
 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   00047000  1918 non-null   float64
 1   00047002  14516 non-null  float64
 2   00047003  13397 non-null  float64
 3   00047004  13524 non-null  float64
 4   00047005  10212 non-null  float64
dtypes: float64(5)
memory usage: 1.0 MB

In the last command, we used pandas’ info method, which returns basic information about the data that we got. For instance, the station with code 00047003 has 13397 registered daily values (Freq: D) within the 1958-01-01 to 2019-08-31 range covered by the whole DataFrame.
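The date range reported by info describes the index of the whole DataFrame, so each station's own first and last record can fall anywhere inside it. As a minimal sketch with made-up values (data_demo is a hypothetical stand-in for data_stations), the per-station period and record count can be summarized with plain pandas calls. The last line pads station codes from the list (for example 47000) to the 8-character column names shown above.

```python
import numpy as np
import pandas as pd

# Hypothetical daily data for two stations (values are made up);
# column names mimic the zero-padded codes shown by info() above.
index = pd.date_range("2000-01-01", periods=10, freq="D")
nan = np.nan
data_demo = pd.DataFrame(
    {
        "00047000": [1.0, 0.0, nan, nan, 2.5, 0.0, nan, nan, nan, nan],
        "00047002": [nan, nan, 0.0, 3.2, 0.0, 0.0, 1.1, nan, 4.0, 0.5],
    },
    index=index,
)

# First and last date with a valid record, and record count, per station
summary = pd.DataFrame({
    "first_valid": pd.Series({c: data_demo[c].first_valid_index() for c in data_demo.columns}),
    "last_valid": pd.Series({c: data_demo[c].last_valid_index() for c in data_demo.columns}),
    "n_records": data_demo.count(),
})
print(summary)

# Pad numeric station codes to match the 8-character column names
padded_codes = [str(code).zfill(8) for code in [47000, 47002]]
print(padded_codes)
```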

We can also plot the temporal data availability using a Gantt plot.

The functions in hydrobr.Plot return Plotly figures, so we need to import the plotly library to display them. This also makes it possible to configure the layout of the plot.

from plotly.offline import plot

# Get the Gantt figure
gantt_fig = hydrobr.Plot.gantt(data_stations)

# Updating the layout
gantt_fig.update_layout(
    autosize=False,
    width=1000,
    height=500,
    xaxis_title='Year',
    yaxis_title='Station Code',
    font=dict(family="Courier New, monospace", size=12))

# To plot and save the gantt plot as html
plot(gantt_fig, filename='gantt.html')

The .html figure output:
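A Gantt chart of data availability draws one bar for each continuous stretch of records. As a rough illustration of that idea, and not HydroBr's own implementation, the sketch below extracts those stretches from a daily series with pandas. The helper name availability_periods and the sample values are made up for this example.

```python
import numpy as np
import pandas as pd


def availability_periods(series):
    """Return (start, end) pairs for each run of consecutive non-missing values."""
    valid = series.notna()
    # A new run starts wherever validity changes from the previous step
    run_id = valid.astype(int).diff().ne(0).cumsum()
    periods = []
    # Group only the valid values by the run they belong to
    for _, run in series[valid].groupby(run_id[valid]):
        periods.append((run.index[0], run.index[-1]))
    return periods


# Hypothetical daily series with two gaps (values are made up)
index = pd.date_range("2000-01-01", periods=8, freq="D")
series = pd.Series([1.0, 2.0, np.nan, np.nan, 0.5, 0.0, np.nan, 3.0], index=index)

periods = availability_periods(series)
for start, end in periods:
    print(start.date(), "to", end.date())
```

Applied to each column of the DataFrame returned by prec_data, the same helper would give the start and end of every bar for that station.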