Cell Visualization
In this tutorial, we will make visualization of the graph components that constitute cells in MPX data.
After completing this tutorial, you will be able to:
- Select components of interest: Based on colocalization scores or other metrics, choose cells with potential uropods or other features of interest.
- Visualize individual cell graphs: Convert the bipartite graph of a component to a layout using networkx or igraph. Color nodes based on pixel type (A or B) or other attributes.
- Visualize multiple components and markers: Calculate layouts, visualize marker abundance on each node (UPIA) using color scales.
- Visualize marker detection: Color nodes based on whether specific markers were detected or not.
Setup
- Python
- R
import igraph
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pandas as pd
from pixelator import read, simple_aggregate
import seaborn as sns
import networkx as nx
library(pixelatorR)
library(SeuratObject)
library(dplyr)
library(tidyr)
library(stringr)
library(ggplot2)
library(here)
library(patchwork)
Load data
In this tutorial we will continue with the stimulated T cell data that we analyzed in the previous tutorial Spatial Analysis (Karlsson et al, bioRxiv 2023). Here we read the data files and apply the same filtering steps as in Spatial Analysis.
- Python
- R
DATA_DIR = Path.cwd().parents[3] / "data"
data_files = [
DATA_DIR / "Uropod_control.dataset.pxl",
DATA_DIR / "Uropod_CD54_fixed_RANTES_stimulated.dataset.pxl",
]
pg_data_combined = simple_aggregate(
["Control", "Stimulated"], [read(path) for path in data_files]
)
components_to_keep = pg_data_combined.adata[
(pg_data_combined.adata.obs["edges"] >= 8000)
& (pg_data_combined.adata.obs["tau_type"] == "normal")
].obs.index
pg_data_combined = pg_data_combined.filter(
components=components_to_keep
)
polarity_scores = pg_data_combined.polarization
colocalization_scores = pg_data_combined.colocalization
edgelist = pg_data_combined.edgelist
# The folders the data is located in:
data_files <-
c(Control = here("data/Uropod_control.dataset.pxl"),
Stimulated = here("data/Uropod_CD54_fixed_RANTES_stimulated.dataset.pxl"))
# Read .pxl files as a list of Seurat objects
pg_data <- lapply(data_files, ReadMPX_Seurat)
# Combine data to Seurat object
pg_data_combined <-
merge(pg_data[[1]],
y = pg_data[-1],
add.cell.ids = names(pg_data))
# Add sample column to meta data
pg_data_combined <-
AddMetaData(pg_data_combined,
metadata = str_remove(colnames(pg_data_combined), "_.*"),
col.name = "sample")
rm(pg_data)
gc()
used (Mb) gc trigger (Mb) max used (Mb)
Ncells 3675024 196.3 5565435 297.3 5565435 297.3
Vcells 58526350 446.6 192361003 1467.6 200309376 1528.3
# Filter cells
pg_data_combined <-
pg_data_combined %>%
subset(edges >= 8000) %>%
subset(tau_type == "normal")
# Fetch polarity scores from Seurat object
polarity_scores <- PolarizationScores(pg_data_combined, meta_data_columns = "sample")
# Fetch colocalization scores from Seurat object
colocalization_scores <- ColocalizationScores(pg_data_combined, meta_data_columns = "sample") %>%
# Remove self-correlation
filter(marker_1 != marker_2)
The data we are using constists of two samples; one control sample of unstimulated T cells, and one sample of T cells stimulated by CD54 immobilization and a chemokine to encourage cells to attain a migratory phenotype. In the previous tutorial, we found that many of the stimulated cells formed uropods, a bulge that helps the cell to migrate, through the increased colocalization of CD37, CD50, and CD162, along with the polarization of these and several other proteins.
- Python
- R
colocalization_df = colocalization_scores[
(colocalization_scores["marker_1"] == "CD162")
& (colocalization_scores["marker_2"] == "CD50")
]
colocalization_plot = (
sns.catplot(colocalization_df, kind="violin", x="sample", y="pearson_z")
.set_xlabels("Sample")
.set_ylabels("MPX colocalization score")
)
colocalization_plot.fig.subplots_adjust(top=0.9)
colocalization_plot.fig.suptitle("Colocalization between CD162 and CD50")
plt.show()
colocalization_scores %>%
filter(marker_1 == "CD162",
marker_2 == "CD50") %>%
ggplot(aes(sample, pearson_z, fill = sample)) +
geom_violin(draw_quantiles = 0.5) +
theme_minimal() +
labs(x = "Sample",
y = "MPX colocalization score",
title = "Colocalization between CD162 and CD50")
In this tutorial, we will try to select some components that have formed the uropod, as well as some that haven’t, and visualize these. Although many more cells seem to have formed uropods in the stimulated samples, it seems like there are also a few in the control sample. To get some examples, let’s select the cells with the highest colocalization of CD50 and CD162 from both the control and stimulated samples. Let’s also pick some cells with low colocalization between CD50 and CD162 to compare with. We will call the cells with high colocalization of CD50 and CD162 “Uropod”, while the ones with low colocalization score “No uropod”.
- Python
- R
plot_components = colocalization_df.groupby(["sample"]).apply(
lambda df: df.sort_values("pearson_z").pipe(
lambda df: df.iloc[[0, 1, len(df) - 2, len(df) - 1]]
)
)
plot_components["component_type"] = plot_components["pearson_z"].transform(
lambda x: "Uropod" if x > 10 else "No uropod"
)
plot_components["component_number"] = np.int64(
plot_components["component"].str.replace(r"RCVCMP(.*)_.*", r"\1", regex=True)
)
plot_components <-
colocalization_scores %>%
unite(contrast, marker_1, marker_2, sep = "/") %>%
filter(contrast == "CD162/CD50") %>%
arrange(-pearson_z) %>%
filter(pearson_z > 0) %>%
group_by(sample) %>%
# Keep the 2 with highest score per sample
filter(row_number() %in% 1:2 | rev(row_number()) %in% 1:2) %>%
select(sample, component, contrast, pearson_z) %>%
# Add a column of whether the cell is "uropod-like"
mutate(uropod_type = ifelse(pearson_z > 10,
"Uropod",
"No uropod"),
component_number = str_extract(component, "[0-9]+") %>% as.numeric())
plot_components
# A tibble: 8 × 6
# Groups: sample [2]
sample component contrast pearson_z uropod_type component_number
<chr> <chr> <chr> <dbl> <chr> <dbl>
1 Stimulated Stimulated_RCVCMP0… CD162/C… 28.7 Uropod 51
2 Stimulated Stimulated_RCVCMP0… CD162/C… 23.7 Uropod 87
3 Control Control_RCVCMP0000… CD162/C… 23.4 Uropod 318
4 Control Control_RCVCMP0000… CD162/C… 23.0 Uropod 2
5 Control Control_RCVCMP0000… CD162/C… 0.164 No uropod 776
6 Control Control_RCVCMP0000… CD162/C… 0.155 No uropod 21
7 Stimulated Stimulated_RCVCMP0… CD162/C… 0.0655 No uropod 909
8 Stimulated Stimulated_RCVCMP0… CD162/C… 0.0227 No uropod 331
Visualize single cell graph components in 2D
To start with, let’s visualize single component. Each graph component corresponds to a cell, and is represented by a bipartite graph where nodes are UPIAs and UPIBs. The edges linking a UPIA node and a UPIB node correspond to antibody molecules that have been detected in the intersection of a specific pair of one A pixel (UPIA) and one B pixel (UPIB). MPX graph data is well represented by force directed layouts, such as the Fruchterman Reingold (fr) and Kamada Kawai (kk) layouts as demonstrated here.
- Python
- R
# Here we demonstrate how to use the networkx plotting function to plot a precomputed
# graph layout. This is useful when you want to plot the entire graph, including
# both vertices and edges.
# For more information on how to plot networkx graphs directly see:
# https://networkx.org/documentation/stable/reference/drawing.html
example_graph = pg_data_combined.graph(
"RCVCMP0000318_Control", simplify=True, use_full_bipartite=True
)
raw_nx_graph = example_graph.raw
nx.draw(raw_nx_graph, node_size=30)
The function Plot2DGraph
can be used to draw components using the 2D
layouts. Before we can draw the component graphs, we need to run
LoadCellGraphs
to load the graphs we want to visualize and then run
ComputeLayout
to create the 2D layout.
When we call LoadCellGraphs
, we can decide exactly what components to
load (cells = ...
). ComputeLayout
will check what components have
been loaded and compute a layout for all of those components.
pg_data_combined <- pg_data_combined %>%
LoadCellGraphs(cells = "Control_RCVCMP0000318") %>%
ComputeLayout(layout_method = "fr") %>%
ComputeLayout(layout_method = "kk")
# Plot component with fr layout
Plot2DGraph(pg_data_combined, cells = "Control_RCVCMP0000318", layout_method = "fr")
# Plot component with kk layout
Plot2DGraph(pg_data_combined, cells = "Control_RCVCMP0000318", layout_method = "kk")
These graph visualizations showcase the structure of the graph of a cell. If we now add the nodes (UPIA and UPIB), we see how the edges in the bipartite graph always connects one UPIA to one UPIB.
- Python
- R
# Note that this is a simplified example, for many real life
# scenarios you will want to control this with a proper
# colors palette.
colors = {"red": "#eb4034", "blue": "#323ea8"}
node_colors = list(
map(
lambda x: colors["red"] if x == "A" else colors["blue"],
example_graph.vs.get_attribute("pixel_type"),
)
)
nx.draw(raw_nx_graph, node_color=node_colors, node_size=30)
Plot2DGraph
can also color the nodes by marker counts with the
marker
parameter. For bi-partite graphs, it is also possible to color
the nodes by their types: A or B.
Plot2DGraph(pg_data_combined, cells = "Control_RCVCMP0000318", layout_method = "fr",
marker = "node_type", show_Bnodes = TRUE)
Now, let’s calculate layouts for all components we picked. Here we only display the UPIA nodes.
- Python
- R
def generate_2d_layouts():
for component in plot_components["component"].unique():
graph = pg_data_combined.graph(component,
add_node_marker_counts=True,
simplify=True,
use_full_bipartite=True)
data = graph.layout_coordinates("fruchterman_reingold",
only_keep_a_pixels=True,
get_node_marker_matrix=True)
data["component"] = component
yield data
component_2d_layouts = pd.concat(generate_2d_layouts())
component_2d_layouts = component_2d_layouts.merge(
plot_components[["component", "sample", "component_type", "component_number"]],
on="component",
)
unique_sample_names = component_2d_layouts["sample"].unique()
number_of_unique_components = component_2d_layouts.groupby(["sample"])[
"component_number"
].nunique()[0]
# Build a grid that we can then add the individual plots to
fig, axes = plt.subplots(
len(unique_sample_names),
number_of_unique_components,
sharex=False,
sharey=False,
figsize=(12, 8),
)
for row, sample_name in enumerate(unique_sample_names):
for col, (col_name, df) in enumerate(
component_2d_layouts[component_2d_layouts["sample"] == sample_name].groupby(
["component_type", "component_number"]
)
):
p = sns.scatterplot(df, x="x", y="y", ax=axes[row, col], alpha=0.3)
removed_labels = p.set(
yticks=[],
xticks=[],
xlabel="",
ylabel="",
title=f"{sample_name} \n {col_name[0]} / {col_name[1]}",
)
sns.despine(fig=fig, top=True, right=True, left=True, bottom=True)
fig
pg_data_combined <- pg_data_combined %>%
LoadCellGraphs(cells = plot_components$component) %>%
ComputeLayout(layout_method = "fr", seed = 1)
Plot2DGraph(pg_data_combined, cells = plot_components$component, layout_method = "fr") +
plot_layout(ncol = 4) &
theme(plot.title = element_text(size = 8))
Now, let’s visualize the abundance of some markers within the
components. We will visualize the uropod markers CD37, CD50, and CD162,
as well as some background markers CD45, CD3E, and B2M. Here we simply
visualize the antibody count (log10(n + 1)
) by coloring each UPIA by
the number of antibody molecules detected, scaled from minimum to
maximum per marker.
- Python
- R
# We select the markers we are interested in plotting
plot_markers = ["CD50", "CD37", "CD162", "CD45", "CD3E", "B2M"]
nodes_and_markers = component_2d_layouts[
["sample", "component_type", "component_number", "x", "y"] + plot_markers
]
# We transform the marker counts using log1p and then we rescale them
# to always be between 0 and 1 by min/max-scaling per marker.
nodes_and_markers[plot_markers] = np.log1p(nodes_and_markers[plot_markers])
nodes_and_markers[plot_markers] = nodes_and_markers.groupby(
["sample", "component_number"]
)[plot_markers].transform(lambda df: df / df.max())
# We reformat the data to be in long format for easier plottning.
nodes_and_markers = pd.melt(
nodes_and_markers,
id_vars=["sample", "component_type", "component_number", "x", "y"],
value_vars=plot_markers,
)
nodes_and_markers["sample/type/component"] = nodes_and_markers["sample"].str.cat(
nodes_and_markers[["component_type", "component_number"]].astype(str), "/"
)
def facet_scatter(x, y, c, **kwargs):
kwargs.pop("color")
scatter_plot = plt.scatter(x, y, c=c, alpha=0.3, **kwargs)
scatter_plot.axes.get_xaxis().set_visible(False)
scatter_plot.axes.get_yaxis().set_visible(False)
# Sort the values to make sure that the high values end up on-top when overplottning
nodes_and_markers = nodes_and_markers.sort_values(["sample/type/component", "value"])
# Build a grid of the components and markers
cmap = sns.color_palette("magma", as_cmap=True)
grid = sns.FacetGrid(
nodes_and_markers, col="sample/type/component", row="variable", margin_titles=True
)
grid = grid.map(facet_scatter, "x", "y", "value", cmap=cmap)
grid = grid.despine(top=True, right=True, left=True, bottom=True)
grid = grid.set_titles(
col_template="{col_name}", row_template="{row_name}", fontweight="bold", size=12
)
# Draw the legend
cax = grid.fig.add_axes([1.02, 0.25, 0.02, 0.6])
points = plt.scatter([], [], c=[], cmap=cmap)
colorbar = grid.fig.colorbar(points, cax=cax)
grid
plot_markers <-
c("CD50", "CD37", "CD162", "CD45", "CD3E", "B2M")
plot_titles <- paste(plot_components$sample,
plot_components$uropod_type,
plot_components$component_number,
sep = "\n") %>%
setNames(nm = plot_components$component)
Plot2DGraphM(pg_data_combined,
layout_method = "fr",
cells = plot_components$component,
markers = plot_markers,
colors = viridis::inferno(n = 11),
titles = plot_titles) +
plot_layout(heights = c(1, rep(2, 6)))
We can also visualize the distribution of a combination of markers. Here, we simply color UPIAs by whether CD50 and/or CD162 are detected.
- Python
- R
# Build up a vectore with information about which of CD50 and CD162 were detected
component_2d_layouts["detected"] = "Not detected"
component_2d_layouts["detected"][
component_2d_layouts["CD50"].astype(bool)
] = "CD50 detected"
component_2d_layouts["detected"][
component_2d_layouts["CD162"].astype(bool)
] = "CD162 detected"
component_2d_layouts["detected"][
component_2d_layouts["CD50"].astype(bool)
& component_2d_layouts["CD162"].astype(bool)
] = "Both detected"
# Sort the values to make sure that the high values end up on-top when overplottning
component_2d_layouts["detected"] = pd.Categorical(
component_2d_layouts["detected"],
categories=["Not detected", "CD50 detected", "CD162 detected", "Both detected"],
ordered=True,
)
component_2d_layouts = component_2d_layouts.sort_values(
["sample", "component_number", "detected"]
)
# We use a custom palette here to make the detection status
# easier to see
palette = sns.color_palette(["#535657", "#0AA3B8", "#DB1048", "#F4B301"])
fig, axes = plt.subplots(
len(unique_sample_names),
number_of_unique_components,
sharex=False,
sharey=False,
figsize=(16, 8),
)
for row, sample_name in enumerate(unique_sample_names):
for col, (col_name, df) in enumerate(
component_2d_layouts[component_2d_layouts["sample"] == sample_name].groupby(
["component_type", "component_number"]
)
):
plot = sns.scatterplot(
df,
x="x",
y="y",
hue="detected",
ax=axes[row, col],
alpha=0.7,
palette=palette,
legend=True,
).set(
yticks=[],
xticks=[],
xlabel="",
ylabel="",
title=f"{sample_name} \n {col_name[0]} / {col_name[1]}",
)
# This a trick to give us a single legend for all the plots
# We pick one of the legends, save it and then remove all
# the individual legends.
# Finally we create an empty axes to place it in on the side.
handles, labels = fig.axes[0].get_legend_handles_labels()
for ax in fig.axes:
ax.get_legend().remove()
cax = fig.add_axes([0.15, 0.1, 0, 0.9])
axis_off = cax.axis("off")
cax.legend(
handles,
labels,
)
sns.despine(fig=fig, top=True, right=True, left=True, bottom=True)
fig
detection_palette <-
c("Not detected" = "gray20",
"CD162 detected" = "#0AA3B8",
"CD50 detected" = "#DB1048",
"Both detected" = "#F4B301")
# Fetch layout coordinates
plot_data <- lapply(plot_components$component, function(nm) {
cg <- CellGraphs(pg_data_combined)[[nm]]
counts <- CellGraphData(cg, slot = "counts")[, c("CD162", "CD50")] %>%
as.matrix() %>% as_tibble()
node_type <- CellGraphData(cg, slot = "cellgraph") %>%
pull(node_type)
CellGraphData(cg, slot = "layout")$fr %>%
mutate(component = nm) %>%
mutate(counts, node_type) %>%
filter(node_type == "A")
}) %>% do.call(bind_rows, .) %>%
left_join(plot_components, by = "component")
# Create categories depending on marker counts
plot_data_detection <-
plot_data %>%
mutate(category = case_when(CD50 == 0 &
CD162 == 0 ~ "Not detected",
CD50 == 0 ~ "CD162 detected",
CD162 == 0 ~ "CD50 detected",
TRUE ~ "Both detected")) %>%
arrange(CD50 + CD162)
plot_data_detection %>%
ggplot(aes(x, y, color = category)) +
geom_point(size = 0.5) +
scale_color_manual(values = detection_palette) +
coord_fixed() +
facet_grid( ~ uropod_type + sample + component_number) +
theme_void() +
ggtitle("Colocalization of CD50 and CD162")
In this tutorial we chose interesting cell components from our MPX data and created layouts depicting their spatial structure and marker abundance in 2D. While 2D visualizations give us valuable insights into the structure and composition of individual cell graphs, they lack the depth to represent the full complexity of the intricate patterns of proteins on the cell membrane.
If you are looking for that, jump to the advanced topics section to learn how to explore cell graphs in 3D in your favourite programming language.