Forming Queries

Get Familiar with querying the database. BUT don’t forget your cheat sheets!

Process

Getting Connected

Getting connected to the database is easiest done using the snowexsql library function get_db

# Import the function to get connect to the db
from snowexsql.db import get_db

# This is what you will use for all of hackweek to access the db
db_name = 'snow:hackweek@db.snowexdata.org/snowex'

Importing the tables classes

These are critical for build queries. You will need at least one of these every query since they reflect the data were interested in.

from snowexsql.data import SiteData, PointData, LayerData, ImageData

Query Time!

We build queries in python using session.query(). Whatever we put inside of the query parentheses is what we will get back in the results!

# This is what you will use for all of hackweek to access the db
engine, session = get_db(db_name)

# Lets grab a single row from the points table
qry = session.query(PointData).limit(1)

# Execute that query!
result = qry.all()

session.close()

Pause for moment and consider what is in result….

Is it:

A. a single value
B. a bunch of values
C. an object
D. a row of values
# uncomment the line below and print out the results 
print(result)
[<snowexsql.data.PointData object at 0x7fb6d81248b0>]

This feels soooo limited :)

Questions

  • What happens if we changed the number in the limit? What will we get back?

  • Where are our column names?

  • What if I only wanted a single column and not a whole row?

Filtering

The database had a silly number of records, and asking for all of them will crash your computer.

So let talk about using .filter()

All queries can be reduced by applying session.query(__).filter(__) and a lot can go into the parentheses. This is where your cheat sheet will come in handy.

# This is what you will use for all of hackweek to access the db
engine, session = get_db(db_name)

# Its convenient to store a query like the following 
qry = session.query(LayerData)

# Then filter on it to just density profiles
qry = qry.filter(LayerData.type == 'density')

# protect ourselves from a lot of data
qry = qry.limit(5)

result = qry.all()
print(result)

session.close()
[<snowexsql.data.LayerData object at 0x7fb6d81272e0>, <snowexsql.data.LayerData object at 0x7fb6d8127970>, <snowexsql.data.LayerData object at 0x7fb6d8127af0>, <snowexsql.data.LayerData object at 0x7fb6d8127b50>, <snowexsql.data.LayerData object at 0x7fb6d8127bb0>]

Questions

  • What happens if I filter on a qry that’s been filtered?

  • What happens if I just want a single column/attribute back? How do I do that?

How do I know what to filter on?

Queries and .distinct()!

# This is what you will use for all of hackweek to access the db
engine, session = get_db(db_name)

# Get the unique datanames in the table
results = session.query(LayerData.type).distinct().all()
print('Available types = {}'.format(', '.join([r[0] for r in results])))

# Get the unique instrument in the table
results = session.query(LayerData.instrument).distinct().all()
print('\nAvailable Instruments = {}'.format(', '.join([str(r[0]) for r in results])))

# Get the unique dates in the table
results = session.query(LayerData.date).distinct().all()
print('\nAvailable Dates = {}'.format(', '.join([str(r[0]) for r in results])))

# Get the unique surveyors in the table
results = session.query(LayerData.observers).distinct().all()
print('\nAvailable surveyors = {}'.format(', '.join([str(r[0]) for r in results])))

session.close()
Available types = sample_signal, force, grain_size, density, reflectance, permittivity, lwc_vol, manual_wetness, equivalent_diameter, specific_surface_area, grain_type, temperature, hand_hardness
Available Instruments = IS3-SP-15-01US, IRIS, snowmicropen, None, IS3-SP-11-01F
Available Dates = 2020-03-12, 2020-02-04, 2020-03-07, 2020-02-21, 2020-04-06, 2020-01-09, 2020-05-03, 2019-12-10, 2020-01-04, 2020-02-27, 2020-02-13, 2020-02-14, 2020-04-17, 2020-01-19, 2020-01-17, 2020-02-19, 2019-12-18, 2020-02-26, 2019-12-30, 2020-02-03, 2020-03-13, 2020-01-30, 2019-10-24, 2020-03-11, 2020-02-01, 2020-01-24, 2020-04-22, 2020-02-09, 2020-04-19, 2020-03-31, 2020-05-12, 2020-01-08, 2020-01-23, 2020-02-06, 2020-03-05, 2020-04-01, 2019-10-29, 2020-01-31, 2020-04-18, 2020-04-29, 2020-01-22, 2020-02-20, 2020-02-07, 2020-02-12, 2020-05-06, 2020-03-18, 2019-12-03, 2020-02-08, 2020-05-13, 2019-12-20, 2019-12-23, 2020-04-28, 2020-02-28, 2020-05-20, 2020-02-05, 2020-01-28, 2020-01-16, 2020-04-21, 2019-12-19, 2020-02-11, 2020-03-28, 2020-01-27, 2020-02-10, 2020-03-06, 2020-03-04, 2019-11-24, 2020-04-09, 2020-03-21, 2020-03-16, 2020-02-02, 2020-04-27, 2020-02-25, 2020-04-08, 2020-03-25, 2019-12-16, 2020-01-29
Available surveyors = None, Juha Lemmetyinen, Kate Hale, Céline Vargel, Carrie Vuyovich, Juha Lemmetyinen & Ioanna Merkouriadi, Carrie Vuyovich & Juha Lemmetyinen, Kehan Yang

Recap

You just explored using the session object to form queries and compounding filters results with it

You should know:

  • How to build queries using filtering

  • How to isolate column data

  • Determine what values to filter on

If you don’t feel comfortable with these, you are probably not alone, let’s discuss it!