Skip to main content

Cell Visualization

In this tutorial, we will make visualization of the graph components that constitute cells in MPX data.

After completing this tutorial, you will be able to:

  • Select components of interest: Based on colocalization scores or other metrics, choose cells with potential uropods or other features of interest.
  • Visualize individual cell graphs: Convert the bipartite graph of a component to a layout using networkx or igraph. Color nodes based on pixel type (A or B) or other attributes.
  • Visualize multiple components and markers: Calculate layouts, visualize marker abundance on each node (UPIA) using color scales.
  • Visualize marker detection: Color nodes based on whether specific markers were detected or not.

Setup

import igraph
import matplotlib.pyplot as plt
import numpy as np
from pathlib import Path
import pandas as pd
from pixelator import read, simple_aggregate
import seaborn as sns
import networkx as nx

Load data

In this tutorial we will continue with the stimulated T cell data that we analyzed in the previous tutorial Spatial Analysis (Karlsson et al, bioRxiv 2023). Here we read the data files and apply the same filtering steps as in Spatial Analysis.

DATA_DIR = Path.cwd().parents[3] / "data"
data_files = [
DATA_DIR / "Uropod_control.dataset.pxl",
DATA_DIR / "Uropod_CD54_fixed_RANTES_stimulated.dataset.pxl",
]

pg_data_combined = simple_aggregate(
["Control", "Stimulated"], [read(path) for path in data_files]
)


components_to_keep = pg_data_combined.adata[
(pg_data_combined.adata.obs["edges"] >= 8000)
& (pg_data_combined.adata.obs["tau_type"] == "normal")
].obs.index

pg_data_combined = pg_data_combined.filter(
components=components_to_keep
)

polarity_scores = pg_data_combined.polarization
colocalization_scores = pg_data_combined.colocalization
edgelist = pg_data_combined.edgelist

The data we are using constists of two samples; one control sample of unstimulated T cells, and one sample of T cells stimulated by CD54 immobilization and a chemokine to encourage cells to attain a migratory phenotype. In the previous tutorial, we found that many of the stimulated cells formed uropods, a bulge that helps the cell to migrate, through the increased colocalization of CD37, CD50, and CD162, along with the polarization of these and several other proteins.

colocalization_df = colocalization_scores[
(colocalization_scores["marker_1"] == "CD162")
& (colocalization_scores["marker_2"] == "CD50")
]


colocalization_plot = (
sns.catplot(colocalization_df, kind="violin", x="sample", y="pearson_z")
.set_xlabels("Sample")
.set_ylabels("MPX colocalization score")
)


colocalization_plot.fig.subplots_adjust(top=0.9)
colocalization_plot.fig.suptitle("Colocalization between CD162 and CD50")
plt.show()

In this tutorial, we will try to select some components that have formed the uropod, as well as some that haven’t, and visualize these. Although many more cells seem to have formed uropods in the stimulated samples, it seems like there are also a few in the control sample. To get some examples, let’s select the cells with the highest colocalization of CD50 and CD162 from both the control and stimulated samples. Let’s also pick some cells with low colocalization between CD50 and CD162 to compare with. We will call the cells with high colocalization of CD50 and CD162 “Uropod”, while the ones with low colocalization score “No uropod”.

plot_components = colocalization_df.groupby(["sample"]).apply(
lambda df: df.sort_values("pearson_z").pipe(
lambda df: df.iloc[[0, 1, len(df) - 2, len(df) - 1]]
)
)
plot_components["component_type"] = plot_components["pearson_z"].transform(
lambda x: "Uropod" if x > 10 else "No uropod"
)

plot_components["component_number"] = np.int64(
plot_components["component"].str.replace(r"RCVCMP(.*)_.*", r"\1", regex=True)
)

Visualize single cell graph components in 2D

To start with, let’s visualize single component. Each graph component corresponds to a cell, and is represented by a bipartite graph where nodes are UPIAs and UPIBs. The edges linking a UPIA node and a UPIB node correspond to antibody molecules that have been detected in the intersection of a specific pair of one A pixel (UPIA) and one B pixel (UPIB). MPX graph data is well represented by force directed layouts, such as the Fruchterman Reingold (fr) and Kamada Kawai (kk) layouts as demonstrated here.

# Here we demonstrate how to use the networkx plotting function to plot a precomputed
# graph layout. This is useful when you want to plot the entire graph, including
# both vertices and edges.
# For more information on how to plot networkx graphs directly see:
# https://networkx.org/documentation/stable/reference/drawing.html

example_graph = pg_data_combined.graph(
"RCVCMP0000318_Control", simplify=True, use_full_bipartite=True
)
raw_nx_graph = example_graph.raw
nx.draw(raw_nx_graph, node_size=30)

These graph visualizations showcase the structure of the graph of a cell. If we now add the nodes (UPIA and UPIB), we see how the edges in the bipartite graph always connects one UPIA to one UPIB.

# Note that this is a simplified example, for many real life
# scenarios you will want to control this with a proper
# colors palette.
colors = {"red": "#eb4034", "blue": "#323ea8"}
node_colors = list(
map(
lambda x: colors["red"] if x == "A" else colors["blue"],
example_graph.vs.get_attribute("pixel_type"),
)
)
nx.draw(raw_nx_graph, node_color=node_colors, node_size=30)

Now, let’s calculate layouts for all components we picked. Here we only display the UPIA nodes.

def generate_2d_layouts():
for component in plot_components["component"].unique():
graph = pg_data_combined.graph(component,
add_node_marker_counts=True,
simplify=True,
use_full_bipartite=True)
data = graph.layout_coordinates("fruchterman_reingold",
only_keep_a_pixels=True,
get_node_marker_matrix=True)
data["component"] = component
yield data

component_2d_layouts = pd.concat(generate_2d_layouts())
component_2d_layouts = component_2d_layouts.merge(
plot_components[["component", "sample", "component_type", "component_number"]],
on="component",
)
unique_sample_names = component_2d_layouts["sample"].unique()
number_of_unique_components = component_2d_layouts.groupby(["sample"])[
"component_number"
].nunique()[0]

# Build a grid that we can then add the individual plots to
fig, axes = plt.subplots(
len(unique_sample_names),
number_of_unique_components,
sharex=False,
sharey=False,
figsize=(12, 8),
)
for row, sample_name in enumerate(unique_sample_names):
for col, (col_name, df) in enumerate(
component_2d_layouts[component_2d_layouts["sample"] == sample_name].groupby(
["component_type", "component_number"]
)
):
p = sns.scatterplot(df, x="x", y="y", ax=axes[row, col], alpha=0.3)
removed_labels = p.set(
yticks=[],
xticks=[],
xlabel="",
ylabel="",
title=f"{sample_name} \n {col_name[0]} / {col_name[1]}",
)
sns.despine(fig=fig, top=True, right=True, left=True, bottom=True)

fig

Now, let’s visualize the abundance of some markers within the components. We will visualize the uropod markers CD37, CD50, and CD162, as well as some background markers CD45, CD3E, and B2M. Here we simply visualize the antibody count (log10(n + 1)) by coloring each UPIA by the number of antibody molecules detected, scaled from minimum to maximum per marker.

# We select the markers we are interested in plotting
plot_markers = ["CD50", "CD37", "CD162", "CD45", "CD3E", "B2M"]
nodes_and_markers = component_2d_layouts[
["sample", "component_type", "component_number", "x", "y"] + plot_markers
]

# We transform the marker counts using log1p and then we rescale them
# to always be between 0 and 1 by min/max-scaling per marker.
nodes_and_markers[plot_markers] = np.log1p(nodes_and_markers[plot_markers])
nodes_and_markers[plot_markers] = nodes_and_markers.groupby(
["sample", "component_number"]
)[plot_markers].transform(lambda df: df / df.max())

# We reformat the data to be in long format for easier plottning.
nodes_and_markers = pd.melt(
nodes_and_markers,
id_vars=["sample", "component_type", "component_number", "x", "y"],
value_vars=plot_markers,
)
nodes_and_markers["sample/type/component"] = nodes_and_markers["sample"].str.cat(
nodes_and_markers[["component_type", "component_number"]].astype(str), "/"
)

def facet_scatter(x, y, c, **kwargs):
kwargs.pop("color")
scatter_plot = plt.scatter(x, y, c=c, alpha=0.3, **kwargs)
scatter_plot.axes.get_xaxis().set_visible(False)
scatter_plot.axes.get_yaxis().set_visible(False)


# Sort the values to make sure that the high values end up on-top when overplottning
nodes_and_markers = nodes_and_markers.sort_values(["sample/type/component", "value"])

# Build a grid of the components and markers
cmap = sns.color_palette("magma", as_cmap=True)
grid = sns.FacetGrid(
nodes_and_markers, col="sample/type/component", row="variable", margin_titles=True
)
grid = grid.map(facet_scatter, "x", "y", "value", cmap=cmap)
grid = grid.despine(top=True, right=True, left=True, bottom=True)
grid = grid.set_titles(
col_template="{col_name}", row_template="{row_name}", fontweight="bold", size=12
)

# Draw the legend
cax = grid.fig.add_axes([1.02, 0.25, 0.02, 0.6])
points = plt.scatter([], [], c=[], cmap=cmap)
colorbar = grid.fig.colorbar(points, cax=cax)

grid

We can also visualize the distribution of a combination of markers. Here, we simply color UPIAs by whether CD50 and/or CD162 are detected.

# Build up a vectore with information about which of CD50 and CD162 were detected
component_2d_layouts["detected"] = "Not detected"
component_2d_layouts["detected"][
component_2d_layouts["CD50"].astype(bool)
] = "CD50 detected"
component_2d_layouts["detected"][
component_2d_layouts["CD162"].astype(bool)
] = "CD162 detected"
component_2d_layouts["detected"][
component_2d_layouts["CD50"].astype(bool)
& component_2d_layouts["CD162"].astype(bool)
] = "Both detected"

# Sort the values to make sure that the high values end up on-top when overplottning
component_2d_layouts["detected"] = pd.Categorical(
component_2d_layouts["detected"],
categories=["Not detected", "CD50 detected", "CD162 detected", "Both detected"],
ordered=True,
)
component_2d_layouts = component_2d_layouts.sort_values(
["sample", "component_number", "detected"]
)
# We use a custom palette here to make the detection status 
# easier to see
palette = sns.color_palette(["#535657", "#0AA3B8", "#DB1048", "#F4B301"])

fig, axes = plt.subplots(
len(unique_sample_names),
number_of_unique_components,
sharex=False,
sharey=False,
figsize=(16, 8),
)
for row, sample_name in enumerate(unique_sample_names):
for col, (col_name, df) in enumerate(
component_2d_layouts[component_2d_layouts["sample"] == sample_name].groupby(
["component_type", "component_number"]
)
):
plot = sns.scatterplot(
df,
x="x",
y="y",
hue="detected",
ax=axes[row, col],
alpha=0.7,
palette=palette,
legend=True,
).set(
yticks=[],
xticks=[],
xlabel="",
ylabel="",
title=f"{sample_name} \n {col_name[0]} / {col_name[1]}",
)

# This a trick to give us a single legend for all the plots
# We pick one of the legends, save it and then remove all
# the individual legends.
# Finally we create an empty axes to place it in on the side.
handles, labels = fig.axes[0].get_legend_handles_labels()
for ax in fig.axes:
ax.get_legend().remove()

cax = fig.add_axes([0.15, 0.1, 0, 0.9])
axis_off = cax.axis("off")
cax.legend(
handles,
labels,
)
sns.despine(fig=fig, top=True, right=True, left=True, bottom=True)

fig

In this tutorial we chose interesting cell components from our MPX data and created layouts depicting their spatial structure and marker abundance in 2D. While 2D visualizations give us valuable insights into the structure and composition of individual cell graphs, they lack the depth to represent the full complexity of the intricate patterns of proteins on the cell membrane.

If you are looking for that, jump to the advanced topics section to learn how to explore cell graphs in 3D in your favourite programming language.