.. _usage: ===== Usage ===== There is a tiny SQLite database (1.4 MB) included with **inspectomop** to give first-time users a limited experimental playground and the ability to run code from the examples below. .. note:: Inspectomop does **NOT** contain EHR data from real patients. The data are entirely synthetic and come from the `SynPUF `__ dataset released by Centers for Medicaid and Medicare Services (CMS). Connecting to a database ======================== :py:class:`.Inspector` objects are in charge of interfacing with the backend database, extracting the available OMOP CDM tables, and performing queries. Inspectors require a single parameter, `connection_url`, for instantiation: .. ipython:: python import inspectomop as iomop connection_url = iomop.test.test_connection_url() inspector = iomop.Inspector(connection_url) `connection_url` is a database URL defined by `SQLAlchemy `__ that describes how to connect to your database. A database URL has three main components: a dialect, driver, and URL. The dialect indicates what type of backend DB you wish to connect to. You can use any supported by SQLAlchemy (MySql, SQLite, Postgres, etc.) out-of-the-box or a dialect written by a third party. See the full list `here `__. The driver indicates which python DBAPI library you wish to use to run your queries. The SQLAlchemy dialects often contain a default DBAPI, so this may or may not be necessary depending on your configuration. Finally, the URL indicates where to look for the database and includes options for supplying a username and password. :: 'dialect+driver://username:password@host:port/database' .. note:: See the SQLAlchemy docs on `engine configuration `__ for more details. Here is an example URL for MySQL: .. ipython:: python mysql_url = 'mysql://johnny:appleseed@localhost/omop' and one for SQLite: .. ipython:: python sql_url = 'sqlite:////abs/path/to/tiny_omop_test.sqlite3' As you can see SQLite URLs are slightly different. They include an extra '/' and thus will have '///' for relative paths and '////' for absolute paths. Inspecting a database ===================== Accessing tables ~~~~~~~~~~~~~~~~ The tables property of an :py:class:`.Inspector` contains a dictionary of associated OMOP tables that are accessible by table name. .. ipython:: python inspector.tables.keys() person = inspector.tables['person'] Accessing table columns ~~~~~~~~~~~~~~~~~~~~~~~ The columns in each table object are dot accessible and can be assigned to variables to construct query statements. .. ipython:: python from sqlalchemy import select person_id = person.person_id statement = select([person_id]) print(statement) Complete table descriptions ~~~~~~~~~~~~~~~~~~~~~~~~~~~ You can also get a description of all columns within a table, the data types, etc. .. ipython:: python inspector.table_info('person') Running built-in queries ======================== A basic example ~~~~~~~~~~~~~~~ There are a variety of built in queries available in the :ref:`queries` submodule. A typical query takes arguments for inputs (concept_ids, keywords, etc.), an :py:class:`.Inspector` to run the query against, and optionally a list of columns to subset from the default columns returned by the query. .. ipython:: python :okwarning: # retrieve concepts for a list of concept_ids from inspectomop.queries.general import concepts_for_concept_ids concept_ids = [2, 3, 4, 7, 8, 10, 46287342, 46271022] return_columns = ['concept_name', 'concept_id'] concepts_for_concept_ids(concept_ids, inspector, return_columns=return_columns).fetchall() .. note:: You can get a list of columns a query returns by looking at the `return_columns` parameter in the docstring for each query. Specifying how results are returned =================================== By default all queries return a :py:class:`.Results` object. Results objects behave like database cursors and have the expected methods such as .fetchone() and .fetchall() for fetching rows. .. note:: Results objects ultimately point back to the underlying DBAPI used for interacting with the DB (pymssql for SQL Server, sqlite3 for SQLite, etc). More or less these should follow the `pythong DB API spec `__ for cursor objects. Most of this is handled by SQLAlchemy. :py:class:`.Results` is a subclass of :py:class:`sqlalchemy.engine.ResultProxy` with additional methods for working with pandas. Fetching examples ~~~~~~~~~~~~~~~~~ .. ipython:: python results = concepts_for_concept_ids(concept_ids, inspector) #get the return column names results.keys() #get one row results.fetchone() #get many rows two_results = results.fetchmany(2) len(two_results) #iterating over rows for row in results: print(row[:2]) Results as pandas DataFrames ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Results objects also have two handy methods, .as_pandas() and .as_pandas_chunks(), for returning results as pandas DataFrames. .. ipython:: python #return the results as as a dataframe results = concepts_for_concept_ids(concept_ids, inspector).as_pandas() results[['concept_name','vocabulary_id']] # return the results in chunks chunksize = 3 results = concepts_for_concept_ids(concept_ids, inspector).as_pandas_chunks(chunksize) for num, chunk in enumerate(results): print('chunk {}'.format(num + 1)) print(chunk['concept_name']) Creating custom queries ======================= From SQLAlchemy SQL Expressions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Statements built out of constructs from SQLAlchemy's *SQL Expression API* make queries backend-neutral paving the way for sharable code that can be used in a plug-and-play fashion. While there is no guarantee that `every` query will work with `every` backend, most of the basic selects, joins, etc should run without issue. SQLAlchemy is extremely powerful, but like any software package, has a bit of a learning curve. It is highly recommended that users read the `SQL Expression Language Tutorial `__ and note the warning below. Below are a few simple examples of using SQLAlchemy expression language constructs for running queries on the OMOP CDM. .. warning:: Tables from Inspector.tables are actually mapped to ORM objects. These are *NOT* the same as `Table` objects from the SQLAlchemy Core API, although they can be used in nearly identical fashion in SQL Expressions with the following caveat about accessing table columns: .. ipython:: python :okexcept: from sqlalchemy import alias p = inspector.tables['person'] p_alias = alias(inspector.tables['person'], 'p_alias') # p is an automapped ORM object with dot accessible columns p p.person_id # p_alias is an Alias object. # Columns must be accessed using .c.column p_alias p_alias.c.person_id # and so this fails p_alias.person_id *Explanation:* Using a portion of the SQLAlchemy ORM to infer table structure was a conscious design decision. Although it makes for a bit of confusion when constructing queries with SQL expressions users that work in an interactive development environment (iPython, Jupyter Notebooks, etc.) get the benefit of dot accessible column properties. In addition, automapping alleviates compatibility issues that would inevitably arise with hard-coded table structures on future versions of the OMOP CDM. Select all of the conditions for person 1: .. ipython:: python from sqlalchemy import select, and_ c = inspector.tables['concept'] co = inspector.tables['condition_occurrence'] person_id = 1 statement = select([co.condition_start_date, co.condition_concept_id, c.concept_name]).\ where(and_(\ co.person_id == person_id,\ co.condition_concept_id == c.concept_id)) print(statement) inspector.execute(statement).as_pandas() Count the number of inpatient and outpatient visits for each person broken down by visit type and sorted by person_id: .. ipython:: python from sqlalchemy import join, func vo = inspector.tables['visit_occurrence'] j = join(vo, c, vo.visit_concept_id == c.concept_id) j2 = join(j, p, vo.person_id == p.person_id) visit_types = ['Inpatient Visit','Outpatient Visit'] statement = select([p.person_id, func.count(vo.visit_occurrence_id).label('num_visits'), c.concept_name.label('visit_type')]).\ select_from(j2).\ where(c.concept_name.in_(visit_types)).\ group_by(p.person_id, c.concept_name).\ order_by(p.person_id) inspector.execute(statement).as_pandas() From Strings ~~~~~~~~~~~~ You `can` execute unaltered SQL strings directly, but remember to always used parametrized code for shared/production projects. .. warning:: Only use strings for rapid prototyping and in-house projects! Executing strings directly breaks backend compatibility and can potentially lead to SQL injection attacks! Example: .. ipython:: python inspector.execute('select person_id from person').as_pandas() Sharing custom queries as functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Custom queries that may prove useful to the OMOP CDM community can easily be shared by wrapping them in a function and following a standard recipe. `View the source code `__ on GitHub to get a better feel of how to construct queries and contribute (via pull request or posting your function in issues). In general, consider the following: * appropriately named query functions should begin with the data you intend to return and end with the data/parameters you expect as input. E.g. `concepts_for_concept_ids` * the return value for a query should `always` be a :py:class:`.Results` object. This provides consistency and gives the end-user control over how to process the results. * write a docstring following the `numpydoc docstring guide `__ to accompany your code. Prototype: .. code-block:: python def output_for_input(inputs, inspector, return_columns=None): """ Short description. Longer explanation. Parameters ---------- inputs : type description of inputs inspector : inspectomop.inspector.Inspector return_columns : list of str, optional - optional subset of columns to return from the query - columns : ['col_name_1', 'col_name_2'] Returns ------- results : inspectomop.results.Results a cursor-like object with methods such as fetchone(), fetchmany() etc. Notes ----- Optional """ columns = [] # specify return columns if return_columns: # filter based on end-user selection columns = [col for col in columns if col.name in return_columns] statement = select([columns]).where(inputs == criteria) return inspector.execute(statement)