Usage¶

inspectomop ships with a tiny SQLite database (1.4 MB) that gives first-time users a small experimental playground and lets them run the code in the examples below.

Note

InspectOMOP does NOT contain EHR data from real patients. The data are entirely synthetic and come from the SynPUF dataset released by the Centers for Medicare & Medicaid Services (CMS).

Connecting to a database¶

Inspector objects are in charge of interfacing with the backend database, extracting the available OMOP CDM tables, and performing queries.

Inspectors require a single parameter, connection_url, for instantiation:

In [1]: import inspectomop as iomop

In [2]: connection_url = iomop.test.test_connection_url()

In [3]: inspector = iomop.Inspector(connection_url)

connection_url is a database URL defined by SQLAlchemy that describes how to connect to your database. A database URL has three main components: a dialect, a driver, and a URL. The dialect indicates what type of backend DB you wish to connect to. You can use any dialect supported by SQLAlchemy out of the box (MySQL, SQLite, PostgreSQL, etc.) or one written by a third party. See the full list here. The driver indicates which Python DBAPI library you wish to use to run your queries. SQLAlchemy dialects usually have a default DBAPI, so specifying the driver may not be necessary depending on your configuration. Finally, the URL indicates where to look for the database and includes options for supplying a username and password.

'dialect+driver://username:password@host:port/database'
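To make the components concrete, the standard library can split a URL of this shape into its parts. This is only a sketch: the URL, the pymysql driver name, and the port are made-up example values, and SQLAlchemy does its own parsing internally.

```python
from urllib.parse import urlsplit

# Hypothetical URL used only for illustration.
url = 'mysql+pymysql://johnny:appleseed@localhost:3306/omop'

parts = urlsplit(url)

# The scheme holds "dialect+driver"; the driver part is optional.
dialect, _, driver = parts.scheme.partition('+')

components = {
    'dialect': dialect,
    'driver': driver or None,
    'username': parts.username,
    'password': parts.password,
    'host': parts.hostname,
    'port': parts.port,
    'database': parts.path.lstrip('/'),
}

for name, value in components.items():
    print(f'{name}: {value}')
```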

Note

See the SQLAlchemy docs on engine configuration for more details.

Here is an example URL for MySQL:

In [4]: mysql_url = 'mysql://johnny:appleseed@localhost/omop'

and one for SQLite:

In [5]: sql_url = 'sqlite:////abs/path/to/tiny_omop_test.sqlite3'

As you can see, SQLite URLs are slightly different. Because there is no host, the file path follows directly after ‘sqlite:///’, so relative paths have ‘///’ and absolute paths have ‘////’.
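The slash rule can be sketched with a small helper. The function name sqlite_url is hypothetical and not part of inspectomop or SQLAlchemy, and the sketch assumes POSIX-style paths.

```python
from pathlib import PurePosixPath


def sqlite_url(path):
    """Build a SQLite URL from a POSIX-style file path (hypothetical helper)."""
    # 'sqlite://' followed by an empty host leaves 'sqlite:///'; the path comes next.
    # An absolute path starts with '/', which gives four slashes in total.
    return 'sqlite:///' + str(PurePosixPath(path))


relative_url = sqlite_url('data/tiny_omop_test.sqlite3')
absolute_url = sqlite_url('/abs/path/to/tiny_omop_test.sqlite3')

print(relative_url)
print(absolute_url)
```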

Inspecting a database¶

Accessing tables¶

The tables property of an Inspector contains a dictionary of associated OMOP tables that are accessible by table name.

In [6]: inspector.tables.keys()
Out[6]: dict_keys(['location', 'procedure_occurrence', 'cost', 'note', 'relationship', 'cohort_definition', 'device_exposure', 'specimen', 'concept_ancestor', 'drug_strength', 'concept_synonym', 'person', 'note_nlp', 'concept', 'measurement', 'death', 'condition_occurrence', 'dose_era', 'cohort', 'attribute_definition', 'observation', 'fact_relationship', 'provider', 'vocabulary', 'care_site', 'concept_class', 'concept_relationship', 'cohort_attribute', 'drug_era', 'drug_exposure', 'source_to_concept_map', 'condition_era', 'visit_occurrence', 'payer_plan_period', 'observation_period', 'domain', 'cdm_source'])

In [7]: person = inspector.tables['person']

Accessing table columns¶

The columns in each table object are dot accessible and can be assigned to variables to construct query statements.

In [8]: from sqlalchemy import select

In [9]: person_id = person.person_id

In [10]: statement = select(person_id)

In [11]: print(statement)
SELECT main.person.person_id 
FROM main.person

Complete table descriptions¶

You can also get a description of all columns within a table, the data types, etc.

In [12]: inspector.table_info('person')
Out[12]: 
                         column      type  nullable  primary_key
                   person_id   INTEGER     False         True
           gender_concept_id   INTEGER     False        False
               year_of_birth   INTEGER     False        False
              month_of_birth   INTEGER      True        False
                day_of_birth   INTEGER      True        False
              birth_datetime  DATETIME      True        False
             race_concept_id   INTEGER     False        False
        ethnicity_concept_id   INTEGER     False        False
                 location_id   INTEGER      True        False
                 provider_id   INTEGER      True        False
               care_site_id   INTEGER      True        False
        person_source_value      TEXT      True        False
        gender_source_value      TEXT      True        False
   gender_source_concept_id   INTEGER      True        False
          race_source_value      TEXT      True        False
     race_source_concept_id   INTEGER      True        False
     ethnicity_source_value      TEXT      True        False
ethnicity_source_concept_id   INTEGER      True        False
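For a SQLite backend, similar information can be read straight from the database with PRAGMA table_info. The sketch below uses the standard library sqlite3 module and a toy three-column person table. It is not a description of how Inspector.table_info is implemented.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute(
    'CREATE TABLE person ('
    'person_id INTEGER NOT NULL PRIMARY KEY, '
    'year_of_birth INTEGER NOT NULL, '
    'month_of_birth INTEGER)'
)

# PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
description = [
    {
        'column': name,
        'type': col_type,
        'nullable': not notnull,
        'primary_key': bool(pk),
    }
    for _cid, name, col_type, notnull, _default, pk
    in con.execute('PRAGMA table_info(person)')
]
con.close()

for row in description:
    print(row)
```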

Running built-in queries¶

A basic example¶

A variety of built-in queries are available in the queries submodule. A typical query takes arguments for inputs (concept_ids, keywords, etc.), an Inspector to run the query against, and optionally a list of columns to subset from the default columns returned by the query.

# retrieve concepts for a list of concept_ids
In [13]: from inspectomop.queries.general import concepts_for_concept_ids

In [14]: concept_ids = [2, 3, 4, 7, 8, 10, 46287342, 46271022]

In [15]: return_columns = ['concept_name', 'concept_id']

In [16]: statement = concepts_for_concept_ids(concept_ids, inspector, return_columns=return_columns)

In [17]: with inspector.connect() as connection:
   ....:    results = connection.execute(statement).all()
   ....: 

In [18]: results
Out[18]: 
[(2, 'Gender'),
 (3, 'Race'),
 (4, 'Ethnicity'),
 (7, 'Metadata'),
 (8, 'Visit'),
 (10, 'Procedure'),
 (46271022, 'Chronic kidney disease'),
 (46287342, '2 ML Verapamil hydrochloride 2.5 MG/ML Injection')]

Note

You can get a list of columns a query returns by looking at the return_columns parameter in the docstring for each query.

Specifying how results are returned¶

By default, all queries return a sqlalchemy.sql.expression.Executable statement that can be executed within a connection context from Inspector.connect(), exactly as in SQLAlchemy.

See the SQLAlchemy Unified Tutorial.

Working directly with statements¶

In [19]: statement = concepts_for_concept_ids(concept_ids, inspector)

In [20]: with inspector.connect() as connection:
   ....:    results = connection.execute(statement)
   ....: 

# get the return column names
In [21]: results.keys()
Out[21]: RMKeyView(['concept_id', 'concept_name', 'concept_code', 'concept_class_id', 'standard_concept', 'vocabulary_id', 'vocabulary_name'])

# get one row
In [22]: results.fetchone()
Out[22]: (2, 'Gender', 'OMOP generated', 'Domain', '', 'Domain', 'OMOP Domain')

# get many rows
In [23]: two_results = results.fetchmany(2)

In [24]: len(two_results)
Out[24]: 2

# iterate over rows
In [25]: for row in two_results:
   ....:       print(row[:2])
   ....: 
(3, 'Race')
(4, 'Ethnicity')
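The loop above prints the rows for concepts 3 and 4 because fetchone() had already consumed the first row. The sketch below shows the same cursor behavior with the standard library sqlite3 module and a toy concept table, as a stand-in for the results object used above.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE concept (concept_id INTEGER, concept_name TEXT)')
con.executemany(
    'INSERT INTO concept VALUES (?, ?)',
    [(2, 'Gender'), (3, 'Race'), (4, 'Ethnicity'), (7, 'Metadata')],
)

cursor = con.execute(
    'SELECT concept_id, concept_name FROM concept ORDER BY concept_id'
)

first = cursor.fetchone()        # consumes the first row
next_two = cursor.fetchmany(2)   # continues from the second row
remaining = cursor.fetchall()    # whatever is left
con.close()

print(first)
print(next_two)
print(remaining)
```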

Returning results as Pandas DataFrames¶

In addition to the typical database cursor methods like .fetchone() and .fetchall(), inspectomop.Results objects have two handy methods, .as_pandas() and .as_pandas_chunks(), for returning results as pandas DataFrames.

# return the results as a DataFrame
In [26]: with inspector.connect() as connection:
   ....:    results = connection.execute(concepts_for_concept_ids(concept_ids, inspector)).as_pandas()
   ....: 

In [27]: results[['concept_name','vocabulary_id']]
Out[27]: 
                                       concept_name vocabulary_id
0                                            Gender        Domain
1                                              Race        Domain
2                                         Ethnicity        Domain
3                                          Metadata        Domain
4                                             Visit        Domain
5                                         Procedure        Domain
6                            Chronic kidney disease        SNOMED
7  2 ML Verapamil hydrochloride 2.5 MG/ML Injection        RxNorm

# return the results in chunks
In [28]: chunksize = 3

In [29]: with inspector.connect() as connection:
   ....:    results = connection.execute(concepts_for_concept_ids(concept_ids, inspector)).as_pandas_chunks(chunksize)
   ....: 

In [30]: for num, chunk in enumerate(results):
   ....:       print('chunk {}'.format(num + 1))
   ....:       print(chunk['concept_name'])
   ....: 
chunk 1
0       Gender
1         Race
2    Ethnicity
Name: concept_name, dtype: str
chunk 2
0     Metadata
1        Visit
2    Procedure
Name: concept_name, dtype: str
chunk 3
0                              Chronic kidney disease
1    2 ML Verapamil hydrochloride 2.5 MG/ML Injection
Name: concept_name, dtype: str
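The chunking pattern itself can be sketched as a generator that repeatedly calls fetchmany(). The helper name iter_chunks is hypothetical, the table is toy data, and this is not a description of how .as_pandas_chunks() is implemented. With eight rows and a chunk size of three, the last chunk holds the two leftover rows, matching the output above.

```python
import sqlite3


def iter_chunks(cursor, chunksize):
    """Yield lists of at most `chunksize` rows until the cursor is exhausted."""
    while True:
        rows = cursor.fetchmany(chunksize)
        if not rows:
            break
        yield rows


con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE concept (concept_id INTEGER)')
con.executemany('INSERT INTO concept VALUES (?)', [(i,) for i in range(8)])

cursor = con.execute('SELECT concept_id FROM concept ORDER BY concept_id')
chunk_sizes = [len(chunk) for chunk in iter_chunks(cursor, 3)]
con.close()

print(chunk_sizes)
```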

Creating custom queries¶

From SQLAlchemy SQL Expressions¶

Statements built from SQLAlchemy’s SQL Expression API constructs are backend-neutral, which paves the way for shareable code that can be used in a plug-and-play fashion. While there is no guarantee that every query will work with every backend, most basic selects, joins, etc. should run without issue.

SQLAlchemy is extremely powerful, but like any software package, has a bit of a learning curve. It is highly recommended that users read the SQLAlchemy Unified Tutorial and note the warning below.

Below are a few simple examples of using SQLAlchemy expression language constructs for running queries on the OMOP CDM.

Warning

Tables from Inspector.tables are actually mapped to ORM objects. These are NOT the same as Table objects from the SQLAlchemy Core API, although they can be used in nearly identical fashion in SQL Expressions with the following caveat about accessing table columns:

In [31]: from sqlalchemy import alias

In [32]: p = inspector.tables['person']

In [33]: p_alias = alias(inspector.tables['person'], 'p_alias')

# p is an automapped ORM object with dot accessible columns
In [34]: p
Out[34]: sqlalchemy.ext.automap.person

In [35]: p.person_id
Out[35]: <sqlalchemy.orm.attributes.InstrumentedAttribute at 0x73c33a54c7c0>

# p_alias is an Alias object.
# Columns must be accessed using .c.column
In [36]: p_alias
Out[36]: <sqlalchemy.sql.selectable.Alias at 0x73c33c79bc50; p_alias>

In [37]: p_alias.c.person_id
Out[37]: Column('person_id', INTEGER(), table=<p_alias>, primary_key=True, nullable=False)

# and so this fails
In [38]: p_alias.person_id
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Cell In[38], line 1
----> 1 p_alias.person_id

AttributeError: 'Alias' object has no attribute 'person_id'

Explanation: Using a portion of the SQLAlchemy ORM to infer table structure was a conscious design decision. Although it causes a bit of confusion when constructing queries with SQL expressions, users who work in an interactive development environment (IPython, Jupyter Notebooks, etc.) get the benefit of dot-accessible column properties. In addition, automapping alleviates compatibility issues that would inevitably arise with hard-coded table structures on future versions of the OMOP CDM.

Select all of the conditions for person 1:

In [39]: from sqlalchemy import select, and_

In [40]: c = inspector.tables['concept']

In [41]: co = inspector.tables['condition_occurrence']

In [42]: person_id = 1

In [43]: statement = select(co.condition_start_date, co.condition_concept_id, c.concept_name).\
   ....:             where(and_(\
   ....:                 co.person_id == person_id,\
   ....:                 co.condition_concept_id == c.concept_id))
   ....: 

In [44]: print(statement)
SELECT main.condition_occurrence.condition_start_date, main.condition_occurrence.condition_concept_id, main.concept.concept_name 
FROM main.condition_occurrence, main.concept 
WHERE main.condition_occurrence.person_id = :person_id_1 AND main.condition_occurrence.condition_concept_id = main.concept.concept_id

In [45]: with inspector.connect() as con:
   ....:    results = con.execute(statement).as_pandas()
   ....: 

In [46]: results
Out[46]: 
   condition_start_date  ...                                       concept_name
0            2010-03-12  ...                                       Osteoporosis
1            2009-07-25  ...                                           Backache
2            2009-07-25  ...                                      Low back pain
3            2010-08-17  ...                                        Neck sprain
4            2010-11-05  ...                 Subchronic catatonic schizophrenia
5            2009-10-14  ...                                       Hypocalcemia
6            2010-03-12  ...                           Congestive heart failure
7            2010-11-05  ...                                      Schizophrenia
8            2010-03-12  ...               Antiallergenic drug adverse reaction
9            2010-04-01  ...                                   Bipolar disorder
10           2010-03-12  ...                          Pure hypercholesterolemia
11           2009-10-14  ...                                 Postoperative pain
12           2010-04-01  ...  Bipolar I disorder, single manic episode, in f...
13           2009-07-25  ...                                Menopausal syndrome
14           2009-07-25  ...                               Thoracic radiculitis
15           2010-03-12  ...                                 Retention of urine

[16 rows x 3 columns]

Count the number of inpatient and outpatient visits for each person broken down by visit type and sorted by person_id:

In [47]: from sqlalchemy import join, func

In [48]: vo = inspector.tables['visit_occurrence']

In [49]: j = join(vo, c, vo.visit_concept_id == c.concept_id)

In [50]: j2 = join(j, p, vo.person_id == p.person_id)

In [51]: visit_types = ['Inpatient Visit','Outpatient Visit']

In [52]: statement = select(p.person_id, func.count(vo.visit_occurrence_id).label('num_visits'), c.concept_name.label('visit_type')).\
   ....:             select_from(j2).\
   ....:             where(c.concept_name.in_(visit_types)).\
   ....:             group_by(p.person_id, c.concept_name).\
   ....:             order_by(p.person_id)
   ....: 

In [53]: with inspector.connect() as con:
   ....:    results = con.execute(statement).as_pandas()
   ....: 

In [54]: results
Out[54]: 
   person_id  num_visits        visit_type
0          1           1   Inpatient Visit
1          1           1  Outpatient Visit
2          2           4   Inpatient Visit
3          2           2  Outpatient Visit
4          3           1  Outpatient Visit
5          5           4  Outpatient Visit
6          7          18  Outpatient Visit
7          8          11  Outpatient Visit
8          9           2  Outpatient Visit

From Strings¶

You can execute unaltered SQL strings directly, but remember to always use parameterized code for shared/production projects.

Warning

Only use strings for rapid prototyping and in-house projects! Executing strings directly breaks backend compatibility and can potentially lead to SQL injection attacks!

Example:

In [55]: from sqlalchemy import text

In [56]: statement = text('select person_id from person')

In [57]: with inspector.connect() as con:
   ....:    results = con.execute(statement).as_pandas()
   ....: 

In [58]: results
Out[58]: 
   person_id
0          1
1          2
2          3
3          4
4          5
5          6
6          7
7          8
8          9
9         16
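When a string query needs a user-supplied value, bound parameters keep that value out of the SQL text. The sketch below uses the standard library sqlite3 module with a toy person table (the birth years are made up). SQLAlchemy's text() construct accepts the same :name placeholder style, with values passed to execute() as a dictionary.

```python
import sqlite3

con = sqlite3.connect(':memory:')
con.execute('CREATE TABLE person (person_id INTEGER, year_of_birth INTEGER)')
con.executemany(
    'INSERT INTO person VALUES (?, ?)',
    [(1, 1923), (2, 1943), (3, 1936)],
)

# Untrusted input: it is bound as a value, never spliced into the SQL text.
user_input = '1 OR 1=1'

query = 'SELECT person_id FROM person WHERE person_id = :person_id'
injected = con.execute(query, {'person_id': user_input}).fetchall()
legit = con.execute(query, {'person_id': 1}).fetchall()
con.close()

print(injected)  # the injection attempt matches no rows
print(legit)
```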

Sharing custom queries as functions¶