KG query guide

Other relevant docs

CL_KG user stories, schema and roadmap

OWL-2-NEO mapping

Access

Refer to the Access guide for instructions on accessing the Cell Type Knowledge Graph.

Useful MATCH clauses

Clause to find author-annotated cell sets for a specific dataset:

MATCH (cc:Cell_cluster)-[:has_source]->(ds) 
WHERE ds.publication = ['https://doi.org/10.1038/s41586-021-03852-1']

Alternatively, restrict on ds.title or ds.collection, which take the dataset title and the CxG collection URL respectively.
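
When the dataset clause above is run from a script, the DOI can be passed as a query parameter rather than pasted into the query text. The sketch below is a minimal illustration, not part of the KG tooling: the function name `dataset_cell_sets_query` and the `RETURN cc.label` line are assumptions added for the example. It only builds the query string and parameter map; these could then be handed to a Neo4j client, for example the Python driver's `session.run(query, params)`.

```python
def dataset_cell_sets_query(doi_url):
    """Return a Cypher query and its parameters for one dataset's cell sets.

    Hypothetical helper for illustration only.
    """
    # ds.publication is compared against a list in the clause above, so the
    # $doi parameter is wrapped in a list literal inside the query.
    query = (
        "MATCH (cc:Cell_cluster)-[:has_source]->(ds) "
        "WHERE ds.publication = [$doi] "
        "RETURN cc.label"
    )
    return query, {"doi": doi_url}


query, params = dataset_cell_sets_query(
    "https://doi.org/10.1038/s41586-021-03852-1"
)
print(query)
print(params)
```

Keeping the DOI in a parameter means the same query text can be reused across datasets without string editing.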

Clause to match cell sets to CL annotations:

MATCH p=(c:Class:Cell)<-[:composed_primarily_of]-(s1:Cell_cluster)

Clause that uses the CL structure to find annotations to a specified cell type or any of its subclasses:

MATCH p=(superclass:Cell)<-[:SUBCLASSOF*0..]-(c:Class:Cell)<-[:composed_primarily_of]-(s1:Cell_cluster)
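
The `*0..` in the clause above means "zero or more SUBCLASSOF hops", so the superclass itself is matched along with every direct and indirect subclass. The sketch below mimics that behaviour in plain Python on a toy hierarchy. The edges are a simplified, hypothetical slice of CL (real CL classes can have several parents and intermediate classes), and `self_and_subclasses` is a name invented for this example.

```python
# Toy subclass edges (child -> parent). Simplified and hypothetical:
# intermediate CL classes are omitted and each class has one parent here.
SUBCLASS_OF = {
    "CD4-positive, alpha-beta T cell": "T cell",
    "CD8-positive, alpha-beta T cell": "T cell",
    "T cell": "lymphocyte",
}


def self_and_subclasses(superclass):
    """Mimic SUBCLASSOF*0..: the class itself plus all of its descendants."""
    matched = {superclass}  # zero hops: the superclass matches itself
    changed = True
    while changed:  # keep expanding until no new descendants are found
        changed = False
        for child, parent in SUBCLASS_OF.items():
            if parent in matched and child not in matched:
                matched.add(child)
                changed = True
    return matched


print(sorted(self_and_subclasses("T cell")))
```

With `*1..` instead of `*0..`, the starting class would be excluded, so cell sets annotated directly with the superclass would be missed.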

Clause to find subsets (subclusters) of cell sets:

MATCH (s1)<-[:subcluster_of*..]-(s2)

Clause to find the proportion of cells in a cell set that come from each tissue:

MATCH (cc:Cell_cluster {label: 'Enterocyte'})-[t:tissue]->(anat)
RETURN t.percentage, anat.label, anat.short_form

Putting it all together

Query to find the proportion of cells by tissue for a specific annotation

This query was motivated by the following annotation:

Author_cell_type: 'Enterocyte'; cell_type: 'enterocyte of colon'

Question: Does the tissue origin justify annotating this as a cell type of the colon?

MATCH (cc:Cell_cluster)-[:has_source]->(ds { publication: ['https://doi.org/10.1038/s41586-021-03852-1']}) 
MATCH p=(c:Cell { label: 'enterocyte of colon'})<-[:composed_primarily_of]
-(cc:Cell_cluster{label: 'Enterocyte'})-[t:tissue]->(anat)
RETURN anat.label, t.percentage[0]

| anat.label | t.percentage[0] |
| --- | --- |
| ileum | 58.96 |
| duodenum | 25.89 |
| large intestine | 1.21 |
| jejunum | 6.06 |
| mesenteric lymph node | 0.01 |
| small intestine | 6.96 |
| colon | 0.9 |

Conclusion: >97% of cells are from the small intestine, so the 'enterocyte of colon' annotation is incorrect.
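
The arithmetic behind this conclusion can be checked by summing the returned percentages per region. The sketch below copies the values from the result above. The grouping of tissue labels into small and large intestine is written by hand for this example and is an assumption, not something read from the KG.

```python
# Percentages returned by the query above, keyed by anat.label.
percent_by_tissue = {
    "ileum": 58.96,
    "duodenum": 25.89,
    "large intestine": 1.21,
    "jejunum": 6.06,
    "mesenteric lymph node": 0.01,
    "small intestine": 6.96,
    "colon": 0.9,
}

# Assumption: hand-written grouping of labels into the two regions compared.
small_intestine_labels = {"small intestine", "ileum", "duodenum", "jejunum"}
large_intestine_labels = {"large intestine", "colon"}


def region_total(labels):
    """Sum the percentages for the given tissue labels, rounded to 2 places."""
    return round(sum(percent_by_tissue[label] for label in labels), 2)


small_total = region_total(small_intestine_labels)
large_total = region_total(large_intestine_labels)
print(f"small intestine: {small_total}%  large intestine: {large_total}%")
```

Under this grouping, the small intestine labels account for 97.87% of cells and the large intestine labels for 2.11%.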

For a specific dataset, find author annotations that are more granular than the CxG CL annotation

MATCH (s1:Cell_cluster)-[:has_source]->(ds) 
WHERE ds.publication = ['https://doi.org/10.1038/s41586-021-03852-1']
MATCH p=(c:Class:Cell)<-[:composed_primarily_of]-(s1)<-[:subcluster_of*..]-(s2)
RETURN p

For all datasets, find author annotations that are more granular than the CxG CL annotation, and where the CL term is a leaf node

MATCH p=(c:Class:Cell)<-[:composed_primarily_of]-(s1)<-[:subcluster_of*..]-(s2)
WHERE NOT (c)<-[:SUBCLASSOF]-()
RETURN p

Find leaf node CL terms with nested cell sets underneath

Example results from Sikkema:

image

Query to find a general term used for a specific class

e.g. T-Cell here:

image

MATCH p=(c:Class:Cell)<-[:composed_primarily_of]-(s1:Cluster)
WHERE (c)<-[:SUBCLASSOF]-() AND NOT (s1)<-[:subcluster_of]-()
RETURN p

image

| c.label | s1.label | More specific term needed* | CL term to use | Notes |
| --- | --- | --- | --- | --- |
| serous secreting cell | SMG serous (nasal) | Y | | SMG = submucosal gland. We have no nasal SMG serous term. |
| tracheobronchial goblet cell | Goblet (subsegmental) | ? | | Need to check if could be more precise |
| tracheobronchial serous cell | SMG serous (bronchial) | n | serous secreting cell of bronchus submucosal gland | |
| bronchial goblet cell | Goblet (bronchial) | n | | |
| CD4-positive, alpha-beta T cell | CD4 T cells | n | | |
| mucus secreting cell | SMG mucous | n | mucus secreting cell of bronchus submucosal gland | |
| ciliated columnar cell of tracheobronchial tree | Multiciliated (non-nasal) | n | | |
| dendritic cell | Migratory DCs | y | | |
| lung macrophage | Interstitial Mph perivascular | y | | "perivascular macrophage" is currently brain specific! We also have lung interstitial macrophage |
| plasmacytoid dendritic cell | Plasmacytoid DCs | n | | |
| smooth muscle cell | SM activated stress response | y? | | |
| epithelial cell of alveolus of lung | AT0 | Y | | |
| fibroblast | Subpleural fibroblasts | Y | | |
| tracheobronchial smooth muscle cell | Smooth muscle | N | | |
| epithelial cell of lower respiratory tract | pre-TB secretory | Y | | TB = tracheobronchial |
| CD8-positive, alpha-beta T cell | CD8 T cells | N | | |
| T cell | T cells proliferating | N | | |
| conventional dendritic cell | DC1 | N | | |
| brush cell of trachebronchial tree | Tuft | N | | |

*Quick assessment; may not be 100% accurate.