Welcome to InspectOMOP!¶
Date: Jun 01, 2020 Version: 0.1.5
inspecotmop is a database agnostic lightweight python 3 package that assists with abstracting data from electronic health record (EHR) databases that follow the OMOP common data model (CDM). The source is available on GitHub. Feel free to contribute or fork!
- OHDSI: Observational Health Data Sciences and Informatics
- OMOP: Observation Medical Outcomes Partnership
Background¶
Interoperability is an important goal in healthcare and medical informatics research. In an ideal world, one would be able to download the source code for a published project and only have to change a handful of lines of code to repeat an experiment, but this is far from the case. Repeating experiments in medical informatics, especially when utilizing EHR data requires spending an inordinate amount of time on ETL (Extraction, Transformation, and Loading). Why? Part of the reason is that health care data can be recorded using any number of medical vocabularies, ontologies, and data formats which prohibits direct communication and necessitates an intermediate step of data mapping and normalization. This can be achieved by adopting a common data model such as the OMOP CDM from the OHDSI group and serving the data using a relational database management systems (RDBMS), but creates a new interoperability problem. RDBMS typically use structured query language (SQL) as a fundamental mechanism for abstracting data and SQL itself is not universally portable across systems. This is the problem inspectomop addresses. Inspectomop utilizes SQLAlchemy to act as a universal translator of SQL dialects and makes sharing end-to-end informatics projects possible.

Interoperability problems and solutions: P1. Data cannot be directly shared between EHRs. S1. Adopt the OMOP CDM P2. SQL queries are not universally portable across RDBMs. S2. InspectOMOP, an SQL dialect agnostic python package.
Table of Contents¶
Installation¶
From Source¶
Download the inspectomop project from GitHub.
From the root directory of inspectomop run:
python setup.py install
Dependencies¶
- python: version 3.0 or higher
- sqlalchemy: version 1.2.1 or higher
- pandas: version 0.22.0 or higher
Usage¶
There is a tiny SQLite database (1.4 MB) included with inspectomop to give first-time users a limited experimental playground and the ability to run code from the examples below.
Note
Inspectomop does NOT contain EHR data from real patients. The data are entirely synthetic and come from the SynPUF dataset released by Centers for Medicaid and Medicare Services (CMS).
Connecting to a database¶
Inspector
objects are in charge of interfacing with the backend database, extracting the available OMOP CDM tables, and performing queries.
Inspectors require a single parameter, connection_url, for instantiation:
In [1]: import inspectomop as iomop
In [2]: connection_url = iomop.test.test_connection_url()
In [3]: inspector = iomop.Inspector(connection_url)
connection_url is a database URL defined by SQLAlchemy that describes how to connect to your database. A database URL has three main components: a dialect, driver, and URL. The dialect indicates what type of backend DB you wish to connect to. You can use any supported by SQLAlchemy (MySql, SQLite, Postgres, etc.) out-of-the-box or a dialect written by a third party. See the full list here. The driver indicates which python DBAPI library you wish to use to run your queries. The SQLAlchemy dialects often contain a default DBAPI, so this may or may not be necessary depending on your configuration. Finally, the URL indicates where to look for the database and includes options for supplying a username and password.
'dialect+driver://username:password@host:port/database'
Note
See the SQLAlchemy docs on engine configuration for more details.
Here is an example URL for MySQL:
In [4]: mysql_url = 'mysql://johnny:appleseed@localhost/omop'
and one for SQLite:
In [5]: sql_url = 'sqlite:////abs/path/to/tiny_omop_test.sqlite3'
As you can see SQLite URLs are slightly different. They include an extra ‘/’ and thus will have ‘///’ for relative paths and ‘////’ for absolute paths.
Inspecting a database¶
Accessing tables¶
The tables property of an Inspector
contains a dictionary of associated OMOP tables that are accessible by table name.
In [6]: inspector.tables.keys()
Out[6]: dict_keys(['attribute_definition', 'care_site', 'cdm_source', 'cohort', 'cohort_attribute', 'cohort_definition', 'concept', 'concept_ancestor', 'concept_class', 'concept_relationship', 'concept_synonym', 'condition_era', 'condition_occurrence', 'cost', 'death', 'device_exposure', 'domain', 'dose_era', 'drug_era', 'drug_exposure', 'drug_strength', 'fact_relationship', 'location', 'measurement', 'note', 'note_nlp', 'observation', 'observation_period', 'payer_plan_period', 'person', 'procedure_occurrence', 'provider', 'relationship', 'source_to_concept_map', 'specimen', 'visit_occurrence', 'vocabulary'])
In [7]: person = inspector.tables['person']
Accessing table columns¶
The columns in each table object are dot accessible and can be assigned to variables to construct query statements.
In [8]: from sqlalchemy import select
In [9]: person_id = person.person_id
In [10]: statement = select([person_id])
In [11]: print(statement)
SELECT main.person.person_id
FROM main.person
Complete table descriptions¶
You can also get a description of all columns within a table, the data types, etc.
In [12]: inspector.table_info('person')
Out[12]:
column type nullable primary_key
0 person_id INTEGER False True
1 gender_concept_id INTEGER False False
2 year_of_birth INTEGER False False
3 month_of_birth INTEGER True False
4 day_of_birth INTEGER True False
5 birth_datetime DATETIME True False
6 race_concept_id INTEGER False False
7 ethnicity_concept_id INTEGER False False
8 location_id INTEGER True False
9 provider_id INTEGER True False
10 care_site_id INTEGER True False
11 person_source_value TEXT True False
12 gender_source_value TEXT True False
13 gender_source_concept_id INTEGER True False
14 race_source_value TEXT True False
15 race_source_concept_id INTEGER True False
16 ethnicity_source_value TEXT True False
17 ethnicity_source_concept_id INTEGER True False
Running built-in queries¶
A basic example¶
There are a variety of built in queries available in the Queries submodule. A typical query takes arguments for inputs (concept_ids, keywords, etc.), an Inspector
to run the query against, and optionally a list of columns to subset from the default columns returned by the query.
# retrieve concepts for a list of concept_ids
In [13]: from inspectomop.queries.general import concepts_for_concept_ids
In [14]: concept_ids = [2, 3, 4, 7, 8, 10, 46287342, 46271022]
In [15]: return_columns = ['concept_name', 'concept_id']
In [16]: concepts_for_concept_ids(concept_ids, inspector, return_columns=return_columns).fetchall()
Out[16]:
[(2, 'Gender'),
(3, 'Race'),
(4, 'Ethnicity'),
(7, 'Metadata'),
(8, 'Visit'),
(10, 'Procedure'),
(46271022, 'Chronic kidney disease'),
(46287342, '2 ML Verapamil hydrochloride 2.5 MG/ML Injection')]
Note
You can get a list of columns a query returns by looking at the return_columns parameter in the docstring for each query.
Specifying how results are returned¶
By default all queries return a Results
object. Results objects behave like database cursors and have the expected methods such as .fetchone() and .fetchall() for fetching rows.
Note
Results objects ultimately point back to the underlying DBAPI used for interacting with the DB (pymssql for SQL Server, sqlite3 for SQLite, etc). More or less these should follow the pythong DB API spec for cursor objects. Most of this is handled by SQLAlchemy. Results
is a subclass of sqlalchemy.engine.ResultProxy
with additional methods for working with pandas.
Fetching examples¶
In [17]: results = concepts_for_concept_ids(concept_ids, inspector)
#get the return column names
In [18]: results.keys()
Out[18]:
['concept_id',
'concept_name',
'concept_code',
'concept_class_id',
'standard_concept',
'vocabulary_id',
'vocabulary_name']
#get one row
In [19]: results.fetchone()
Out[19]: (2, 'Gender', 'OMOP generated', 'Domain', '', 'Domain', 'OMOP Domain')
#get many rows
In [20]: two_results = results.fetchmany(2)
In [21]: len(two_results)
Out[21]: 2
#iterating over rows
In [22]: for row in results:
....: print(row[:2])
....:
(7, 'Metadata')
(8, 'Visit')
(10, 'Procedure')
(46271022, 'Chronic kidney disease')
(46287342, '2 ML Verapamil hydrochloride 2.5 MG/ML Injection')
Results as pandas DataFrames¶
Results objects also have two handy methods, .as_pandas() and .as_pandas_chunks(), for returning results as pandas DataFrames.
#return the results as as a dataframe
In [23]: results = concepts_for_concept_ids(concept_ids, inspector).as_pandas()
In [24]: results[['concept_name','vocabulary_id']]
Out[24]:
concept_name vocabulary_id
0 Gender Domain
1 Race Domain
2 Ethnicity Domain
3 Metadata Domain
4 Visit Domain
5 Procedure Domain
6 Chronic kidney disease SNOMED
7 2 ML Verapamil hydrochloride 2.5 MG/ML Injection RxNorm
# return the results in chunks
In [25]: chunksize = 3
In [26]: results = concepts_for_concept_ids(concept_ids, inspector).as_pandas_chunks(chunksize)
In [27]: for num, chunk in enumerate(results):
....: print('chunk {}'.format(num + 1))
....: print(chunk['concept_name'])
....:
chunk 1
0 Gender
1 Race
2 Ethnicity
Name: concept_name, dtype: object
chunk 2
0 Metadata
1 Visit
2 Procedure
Name: concept_name, dtype: object
chunk 3
0 Chronic kidney disease
1 2 ML Verapamil hydrochloride 2.5 MG/ML Injection
Name: concept_name, dtype: object
Creating custom queries¶
From SQLAlchemy SQL Expressions¶
Statements built out of constructs from SQLAlchemy’s SQL Expression API make queries backend-neutral paving the way for sharable code that can be used in a plug-and-play fashion. While there is no guarantee that every query will work with every backend, most of the basic selects, joins, etc should run without issue.
SQLAlchemy is extremely powerful, but like any software package, has a bit of a learning curve. It is highly recommended that users read the SQL Expression Language Tutorial and note the warning below.
Below are a few simple examples of using SQLAlchemy expression language constructs for running queries on the OMOP CDM.
Warning
Tables from Inspector.tables are actually mapped to ORM objects. These are NOT the same as Table objects from the SQLAlchemy Core API, although they can be used in nearly identical fashion in SQL Expressions with the following caveat about accessing table columns:
In [28]: from sqlalchemy import alias
In [29]: p = inspector.tables['person']
In [30]: p_alias = alias(inspector.tables['person'], 'p_alias')
# p is an automapped ORM object with dot accessible columns
In [31]: p
Out[31]: sqlalchemy.ext.automap.person
In [32]: p.person_id
Out[32]: <sqlalchemy.orm.attributes.InstrumentedAttribute at 0x7f10b49c5570>
# p_alias is an Alias object.
# Columns must be accessed using .c.column
In [33]: p_alias
Out[33]: <sqlalchemy.sql.selectable.Alias at 0x7f10b4a026d8; p_alias>
In [34]: p_alias.c.person_id
Out[34]: Column('person_id', INTEGER(), table=<p_alias>, primary_key=True, nullable=False)
# and so this fails
In [35]: p_alias.person_id
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-35-a8a43d4647a5> in <module>
----> 1 p_alias.person_id
AttributeError: 'Alias' object has no attribute 'person_id'
Explanation: Using a portion of the SQLAlchemy ORM to infer table structure was a conscious design decision. Although it makes for a bit of confusion when constructing queries with SQL expressions users that work in an interactive development environment (iPython, Jupyter Notebooks, etc.) get the benefit of dot accessible column properties. In addition, automapping alleviates compatibility issues that would inevitably arise with hard-coded table structures on future versions of the OMOP CDM.
Select all of the conditions for person 1:
In [36]: from sqlalchemy import select, and_
In [37]: c = inspector.tables['concept']
In [38]: co = inspector.tables['condition_occurrence']
In [39]: person_id = 1
In [40]: statement = select([co.condition_start_date, co.condition_concept_id, c.concept_name]).\
....: where(and_(\
....: co.person_id == person_id,\
....: co.condition_concept_id == c.concept_id))
....:
In [41]: print(statement)
SELECT main.condition_occurrence.condition_start_date, main.condition_occurrence.condition_concept_id, main.concept.concept_name
FROM main.condition_occurrence, main.concept
WHERE main.condition_occurrence.person_id = :person_id_1 AND main.condition_occurrence.condition_concept_id = main.concept.concept_id
In [42]: inspector.execute(statement).as_pandas()
Out[42]:
condition_start_date ... concept_name
0 2010-03-12 ... Osteoporosis
1 2009-07-25 ... Backache
2 2009-07-25 ... Low back pain
3 2010-08-17 ... Neck sprain
4 2010-11-05 ... Subchronic catatonic schizophrenia
5 2009-10-14 ... Hypocalcemia
6 2010-03-12 ... Congestive heart failure
7 2010-11-05 ... Schizophrenia
8 2010-03-12 ... Antiallergenic drug adverse reaction
9 2010-04-01 ... Bipolar disorder
10 2010-03-12 ... Pure hypercholesterolemia
11 2009-10-14 ... Postoperative pain
12 2010-04-01 ... Bipolar I disorder, single manic episode, in f...
13 2009-07-25 ... Menopausal syndrome
14 2009-07-25 ... Thoracic radiculitis
15 2010-03-12 ... Retention of urine
[16 rows x 3 columns]
Count the number of inpatient and outpatient visits for each person broken down by visit type and sorted by person_id:
In [43]: from sqlalchemy import join, func
In [44]: vo = inspector.tables['visit_occurrence']
In [45]: j = join(vo, c, vo.visit_concept_id == c.concept_id)
In [46]: j2 = join(j, p, vo.person_id == p.person_id)
In [47]: visit_types = ['Inpatient Visit','Outpatient Visit']
In [48]: statement = select([p.person_id, func.count(vo.visit_occurrence_id).label('num_visits'), c.concept_name.label('visit_type')]).\
....: select_from(j2).\
....: where(c.concept_name.in_(visit_types)).\
....: group_by(p.person_id, c.concept_name).\
....: order_by(p.person_id)
....:
In [49]: inspector.execute(statement).as_pandas()
Out[49]:
person_id num_visits visit_type
0 1 1 Inpatient Visit
1 1 1 Outpatient Visit
2 2 4 Inpatient Visit
3 2 2 Outpatient Visit
4 3 1 Outpatient Visit
5 5 4 Outpatient Visit
6 7 18 Outpatient Visit
7 8 11 Outpatient Visit
8 9 2 Outpatient Visit
From Strings¶
You can execute unaltered SQL strings directly, but remember to always used parametrized code for shared/production projects.
Warning
Only use strings for rapid prototyping and in-house projects! Executing strings directly breaks backend compatibility and can potentially lead to SQL injection attacks!
Example:
In [50]: inspector.execute('select person_id from person').as_pandas()
Out[50]:
person_id
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 16
Sharing custom queries as functions¶
Custom queries that may prove useful to the OMOP CDM community can easily be shared by wrapping them in a function and following a standard recipe. View the source code on GitHub to get a better feel of how to construct queries and contribute (via pull request or posting your function in issues).
In general, consider the following:
- appropriately named query functions should begin with the data you intend to return and end with the data/parameters you expect as input. E.g. concepts_for_concept_ids
- the return value for a query should always be a
Results
object. This provides consistency and gives the end-user control over how to process the results. - write a docstring following the numpydoc docstring guide to accompany your code.
Prototype:
def output_for_input(inputs, inspector, return_columns=None):
"""
Short description.
Longer explanation.
Parameters
----------
inputs : type
description of inputs
inspector : inspectomop.inspector.Inspector
return_columns : list of str, optional
- optional subset of columns to return from the query
- columns : ['col_name_1', 'col_name_2']
Returns
-------
results : inspectomop.results.Results
a cursor-like object with methods such as fetchone(), fetchmany() etc.
Notes
-----
Optional
"""
columns = [] # specify return columns
if return_columns: # filter based on end-user selection
columns = [col for col in columns if col.name in return_columns]
statement = select([columns]).where(inputs == criteria)
return inspector.execute(statement)
API Reference¶
Inspector¶
inspectomop.Inspector
Constructor¶
Inspector (connection_url) |
Creates an Inspector object which can be used to run OMOP data queries |
Attributes¶
Inspector.connection_url |
password@host:port/database’ used to specify the dialect, location, etc. |
Inspector.engine |
A convenience hook to the underlying sqlalchemy engine. |
Inspector.tables |
A dictionary containing all OMOP CDM tables in the connected database. |
Inspector.vocabularies_tables |
A dictionary containing all of the Vocabularies OMOP CDM tables in the connected database. |
Inspector.metadata_tables |
A dictionary containing all of the MetaData OMOP CDM tables in the connected database. |
Inspector.clinical_tables |
A dictionary containing all of the Clinical OMOP CDM tables in the connected database. |
Inspector.health_system_tables |
A dictionary containing all of the Health System OMOP CDM tables in the connected database. |
Inspector.health_economics_tables |
A dictionary containing all of the Health Economics OMOP CDM tables in the connected database. |
Inspector.derived_elements_tables |
A dictionary containing all of the Derived Elements OMOP CDM tables in the connected database. |
Methods¶
Inspector.attach_sqlite_db (db_file, schema_name) |
For SQLite backends, attaches an additional sqlite database file. |
Inspector.execute (statement) |
Executes an SQL query on the OMOP CDM. |
Inspector.table_info (table_name) |
Return a Pandas DataFrame describing the fields and properties of a table. |
Results¶
inspectomop.Results
Constructor¶
Results (results_proxy) |
A cursor-like object with methods such as fetchone, fetchmany etc. |
Warning
Although a public constructor exists, Results objects are meant to be instantiated indirectly from calls to inspector.execute()
Methods¶
Results.as_pandas () |
Return all rows from a results object as a pandas DataFrame |
Results.as_pandas_chunks (chunksize) |
Yields a pandas DataFrame with n_rows = chunksize |
Results.fetchone () |
Fetch one row, just like DB-API cursor.fetchone() . |
Results.fetchmany ([size]) |
Fetch many rows, just like DB-API cursor.fetchmany(size=cursor.arraysize) . |
Results.fetchall () |
Fetch all rows, just like DB-API cursor.fetchall() . |
Queries¶
inspectomop.queries
Care Site¶
inspectomop.queries.care_site
facility_counts_by_type (inspector[, …]) |
Returns facility counts by type in the OMOP CDM i.e. |
patient_counts_by_care_site_type (inspector) |
Returns pateints counts by facility type. |
Condition¶
inspectomop.queries.condition
anatomical_site_by_keyword (keyword, inspector) |
Retrieves anitomical site concepts given a keyword. |
condition_concept_for_concept_id (concept_id, …) |
Retrieves the condition concept for a condition_concept_id. |
condition_concepts_for_keyword (keyword, …) |
Retrieves standard concepts for a condition/keyword. |
condition_concepts_for_source_codes (…[, …]) |
Retrieves standard condition concepts for source codes. |
condition_concepts_occurring_at_anatomical_site_concept_id (…) |
Retrieves condition concepts that occur at a given anatomical site. |
conditions_caused_by_pathogen_or_causative_agent_concept_id (…) |
Retreives all conditions caused by a pathogen or other causative agent concept_id. |
disease_causing_agents_for_keyword (keyword, …) |
Retrieves disease causing agents by keyword. |
source_codes_for_concept_ids (concept_ids, …) |
Retreives source condition concepts for OMOP concept_ids. |
pathogen_concept_for_keyword (keyword, inspector) |
Retrieves pathogen concepts based on a keyword with ‘Organsim’ as the concept_class_id. |
place_of_service_counts_for_condition_concept_id (…) |
Provides counts of conditions stratified by place_of_service (Office, Inpatient Hospital, etc.) |
Drug¶
inspectomop.queries.drug
drug_classes_for_drug_concept_id (concept_id, …) |
Returns drug classes for drug or ingredient concept_ids. |
drug_concepts_for_ingredient_concept_id (…) |
Get all drugs that contain a given ingredient. |
indications_for_drug_concept_id (concept_id, …) |
Find all indications for a drug given a concept_id. |
ingredients_for_drug_concept_ids (…[, …]) |
Get ingredients for brand or generic drug concept_ids. |
ingredient_concept_ids_for_ingredient_names (…) |
Get concept_ids for a list of ingredients. |
General¶
inspectomop.queries.general
ancestors_for_concept_id (concept_id, inspector) |
Find all ancestor concepts for a concept_id. |
children_for_concept_id (concept_id, inspector) |
Find all child concepts for a concept_id. |
concepts_for_concept_ids (concept_ids, inspector) |
Returns concept information for a list of concept_ids |
descendants_for_concept_id (concept_id, inspector) |
Find all descendant concepts for a concept_id. |
parents_for_concept_id (concept_id, inspector) |
Find all parent concepts for a concept_id. |
related_concepts_for_concept_id (concept_id, …) |
Find all concepts related to a concept_id. |
siblings_for_concept_id (concept_id, inspector) |
Find all sibling concepts for a concept_id i.e.(concepts that share common parents). |
synonyms_for_concept_ids (concept_ids, inspector) |
Returns concept information for a list of concept_ids |
standard_vocab_for_source_code (source_code, …) |
Convert source code to all mapped standard vocabulary concepts. |
Observation¶
inspectomop.queries.observation
observation_concepts_for_keyword (keyword, …) |
Search for LOINC and UCUM concepts by keyword. |
Payer Plan¶
inspectomop.queries.payer_plan
counts_by_years_of_coverage (inspector) |
Returns counts of payer coverage based on continuous coverage (payer_plan_period_start_date - payer_plan_period_end_date)365.25. |
patient_distribution_by_plan_type (inspector) |
Returns counts of payer coverage by plan type. |
Person¶
inspectomop.queries.person
patient_counts_by_gender (inspector[, …]) |
Returns patient counts grouped by gender for the database or alternativily, for a supplied list of person_ids. |
patient_counts_by_year_of_birth (inspector[, …]) |
Returns patient counts grouped by year of birth for the database or alternativily, for a supplied list of person_ids. |
patient_counts_by_residence_state (inspector) |
Returns patient counts grouped by state for the database or alternativily, for a supplied list of person_ids. |
patient_counts_by_zip_code (inspector[, …]) |
Returns patient counts grouped by zip code for the database or alternativily, for a supplied list of person_ids. |
patient_counts_by_year_of_birth_and_gender (…) |
Returns patient counts stratified by year of birth and gender for the database or alternativily, for a supplied list of person_ids. |
Indices and tables¶
Acknowledgements¶
- This package builds off of amazing open source packages from the python community. Many thanks to the developers and maintainers of pandas, SQLAlchemy, and numpy.
- A big thank you to the OHDSI group for their continued efforts on improving the OMOP CDM and developing outstanding tools to advance the field of medical informatics.
- Most of the queries included in inspectomop were derived from the OMOP-Queries repository on GitHub.
- Test data were taken from a 1k sample of patients from the SynPUF dataset converted by to the OMOP CDM and provided by LTS Computing LLC