scSemiProfiler.initial_setup.initsetup

scSemiProfiler.initial_setup.initsetup(name, bulk, logged=False, normed=True, geneselection=True, batch=4)

Performs the initial setup of the semi-profiling pipeline: processes the bulk data and performs clustering to find the initial representatives. Bulk data should be provided as an h5ad file, with sample IDs stored in adata.obs['sample_ids'] and gene names stored in adata.var.index. If active learning is not used for iterative representative selection, set the batch size directly to the total number of representatives desired.

Parameters
  • name (str) – Project name.

  • bulk (str) – Path to bulk data as an h5ad file. Sample IDs should be stored in adata.obs['sample_ids'] and gene names should be stored in adata.var.index.

  • logged (bool) – Whether the data has already been log-transformed.

  • normed (bool) – Whether the library size has already been normalized.

  • geneselection (typing.Union[bool, int]) – Either a boolean indicating whether to perform gene selection using the bulk data, or an integer specifying the number of highly variable genes to select.

  • batch (int) – Representative selection batch size.
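Because geneselection accepts either a bool or an int, and bool is a subclass of int in Python, any code that distinguishes the two cases has to test for bool first. The sketch below illustrates that ordering with a hypothetical helper, resolve_geneselection, which is not part of scSemiProfiler; the default_n argument is likewise an assumption, since this page does not state how many genes are selected when geneselection=True.

```python
from typing import Optional, Union


def resolve_geneselection(
    geneselection: Union[bool, int], default_n: int
) -> Optional[int]:
    """Hypothetical helper: map a bool-or-int flag to a gene count.

    Returns None when gene selection is disabled, default_n when it is
    enabled with True, or the explicit integer otherwise.
    """
    # Check bool before int: isinstance(True, int) is True in Python.
    if isinstance(geneselection, bool):
        return default_n if geneselection else None
    if isinstance(geneselection, int):
        if geneselection <= 0:
            raise ValueError("number of genes must be a positive integer")
        return geneselection
    raise TypeError("geneselection must be a bool or an int")


# default_n=2000 is an arbitrary placeholder, not the library's value.
print(resolve_geneselection(False, default_n=2000))
print(resolve_geneselection(True, default_n=2000))
print(resolve_geneselection(500, default_n=2000))
```

Checking for int first would treat True as the integer 1 and request a single gene, which is why the bool branch comes first in this sketch.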

Return type

None

Returns

None

Example

>>> import scSemiProfiler
>>> name = 'runexample'
>>> bulk = 'example_data/bulkdata.h5ad'
>>> logged = False
>>> normed = True
>>> geneselection = False
>>> batch = 2
>>> scSemiProfiler.initsetup(name, bulk, logged, normed, geneselection, batch)