# >>===================================================<< # ||+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|| # ||--------------------|B|I|F|-----------------------||| # |||B|u|r|r|o|w|s|-|W|h|e|e|l|e|r|-|T|r|a|n|s|f|o|r|m||| # ||+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+|| # >>===================================================<< # Template by Leo Ackermann (2025) from ks import simple_kark_sort # >>==========================================================================<< # >> Textbook FM-index class << # >>==========================================================================<< class TextbookFMindex: # >>===[ Class constructor ]================================================ def __init__(self, seq, verbose=False): ''' Create a new FMindex (textbook version) ''' # Fields declaration self.bwt = None self.sa = None self.countmap = None self.ranks = None self.next_smallest_letter = None # Fields initialization # TODO # >>===[ Pretty printer ]=================================================== def pretty_print(self): ''' Pretty printer for the FM index ''' # This function take a list ((name_i, array_i))_i as input, and print # them vertically, side by side, using name_i's as headers # NOTE. We assume here that {header}\t and {content}\t reach the same # column def print_arrays_vertically(entitled_arrays, tabsize=4): # Prettier printing of None def xstr(s): return 'ยท' if s is None else str(s) if entitled_arrays == []: raise Exception(" cannot be empty") header = "#\t|\t" separator = '-' * len(f"#\t".expandtabs(tabsize)) \ + '+' + '-' * (len(f"+\t".expandtabs(tabsize))-1) for (name, array) in entitled_arrays: header += f"{name}\t" separator += '-' * len(f"{name}\t".expandtabs(tabsize)) print(header) print(separator) len_arrays = (len(entitled_arrays[0][1])) for i in range(len_arrays): row = f"{i}\t|\t" for (name, array) in entitled_arrays: row += f"{xstr(array[i])}\t" print(row) print() print_arrays_vertically([]) print() # >>===[ Attribute initialisation functions ]=============================== def _compute_sa(self, seq): ''' Compute the suffix array of the sequence (with termination symbol), in linear time ''' pass # TODO def _compute_bwt(self, seq): ''' Compute the Burrows-Wheeler transform using the SA-based definition, in linear time ''' pass # TODO def _compute_ranks(self): ''' Compute the ranks arrays while scanning the Burrows-Wheeler transform ''' pass # TODO def _compute_countmap(self): ''' Compute the countmap array from the ranks arrays arrays ''' pass # TODO def _compute_next_smallest_letter(self): ''' Compute the map of -> +{None} that maps a letter to the next one in the lexicographic order, using countmap ''' pass # TODO # >>===[ LF-mapping ]======================================================= def _lf_mapping(self, i): ''' The LF-mapping function, with O(1) lookup time ''' pass # TODO # >>===[ Burrows-Wheeler transform inversion ]============================== def get_string(self): ''' Retrieve the string encoded within the BWT, in linear time ''' pass # TODO # >>===[ Pattern matching functions ]======================================= def _get_pattern_interval(self, pattern): ''' Retrieve the F-interval (i_min, i_max) where a pattern lies. If no such interval exists, returns (None, None). ''' pass # TODO def membership(self, pattern): ''' FM-index based membership pattern-matching ''' pass # TODO def count(self, pattern): ''' FM-index based count pattern-matching ''' pass # TODO def locate(self, pattern): ''' FM-index based locate pattern-matching ''' pass # TODO # >>==========================================================================<< # >> FM-index class << # >>==========================================================================<< class FMindex(TextbookFMindex): # >>===[ Class constructor ]================================================ def __init__(self, seq, verbose=False): ''' Create a new FMindex ''' # Inherit fields and initialisation from the Textbook FMindex class super().__init__(seq, verbose) # Extra fields declaration # TODO # Extra fields initialization # TODO # >>===[ Subsampling and on-the-fly retrieval functions ]=================== def _subsample_ranks(self): ''' Subsample the ranks arrays, setting most entries to ''' pass # TODO def _get_ranks_entry(self, letter, i): ''' Equivalent to TextbookFMindex's ''' pass # TODO def _subsample_sa(self): ''' Subsample the suffix array, setting most entries to ''' pass # TODO def _get_sa_entry(self, i): ''' Equivalent to TextbookFMindex's ''' pass # TODO # >>===[ OVERRIDE functions calling or ]============= # TODO