
Refactor inner join #53

@mchav

The inner join was written in one shot, so there is some repetition: the left and right sides go through the same grouping steps.

        leftIndicesToGroup = M.elems $ M.filterWithKey (\k _ -> k `elem` cs) (D.columnIndices left)
        leftRowRepresentations = VU.generate (fst (D.dimensions left)) (D.mkRowRep leftIndicesToGroup left)
        -- key -> [index0, index1]
        leftKeyCountsAndIndices   = VU.foldr (\(i, v) acc -> M.insertWith (++) v [i] acc) M.empty (VU.indexed leftRowRepresentations)
        -- key -> [index0, index1]
        rightIndicesToGroup = M.elems $ M.filterWithKey (\k _ -> k `elem` cs) (D.columnIndices right)
        rightRowRepresentations = VU.generate (fst (D.dimensions right)) (D.mkRowRep rightIndicesToGroup right)
        rightKeyCountsAndIndices  = VU.foldr (\(i, v) acc -> M.insertWith (++) v [i] acc) M.empty (VU.indexed rightRowRepresentations)
        -- key -> [(left_indexes0, right_indexes1)]
        mergedKeyCountsAndIndices = M.foldrWithKey (\k v m -> if k `M.member` rightKeyCountsAndIndices then M.insert k (VU.fromList v, VU.fromList (rightKeyCountsAndIndices M.! k)) m else m) M.empty leftKeyCountsAndIndices
        -- [(ints, ints)]
        leftAndRightIndicies = M.elems mergedKeyCountsAndIndices
        -- [(ints, ints)] (expanded to n * m)
        expandedIndices = map (\(l, r) -> (mconcat (replicate (VU.length r) l), mconcat (replicate (VU.length l) r))) leftAndRightIndicies
        expandedLeftIndicies = mconcat (map fst expandedIndices)
        expandedRightIndicies = mconcat (map snd expandedIndices)
        -- df
        expandedLeft = left { columns = VB.map (D.atIndicesStable expandedLeftIndicies) (D.columns left), dataframeDimensions = (VU.length expandedLeftIndicies, snd (D.dataframeDimensions left))}
        -- df 
        expandedRight = right { columns = VB.map (D.atIndicesStable expandedRightIndicies) (D.columns right), dataframeDimensions = (VU.length expandedRightIndicies, snd (D.dataframeDimensions right))}
        -- [string]
        leftColumns = D.columnNames left
        rightColumns = D.columnNames right

The comments are also not very informative.

This should be broken into functions and tested.
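One possible decomposition is sketched below. It is not the library's code: it works on plain lists of keys instead of `D.mkRowRep` output and unboxed vectors, and the names `groupIndicesByKey`, `matchKeys`, `expandMatches`, and `innerJoinIndices` are hypothetical. The aim is only to show that the left and right sides can share one grouping function and that each step becomes small enough to test on its own.

```haskell
import qualified Data.Map.Strict as M

-- | Group row indices by join key: key -> row indices in ascending order.
-- Hypothetical helper; the input list stands in for the row representations.
groupIndicesByKey :: Ord k => [k] -> M.Map k [Int]
groupIndicesByKey keys =
  foldr (\(i, k) acc -> M.insertWith (++) k [i] acc) M.empty (zip [0 ..] keys)

-- | Keep only the keys present on both sides, pairing their index lists.
matchKeys :: Ord k => M.Map k [Int] -> M.Map k [Int] -> M.Map k ([Int], [Int])
matchKeys = M.intersectionWith (,)

-- | Expand every matched group of n left and m right indices into
-- n * m (left, right) pairs, returned as two parallel index lists.
expandMatches :: M.Map k ([Int], [Int]) -> ([Int], [Int])
expandMatches matches =
  unzip [(l, r) | (ls, rs) <- M.elems matches, l <- ls, r <- rs]

-- | Row indices to select from the left and right frames for an inner join.
innerJoinIndices :: Ord k => [k] -> [k] -> ([Int], [Int])
innerJoinIndices leftKeys rightKeys =
  expandMatches
    (matchKeys (groupIndicesByKey leftKeys) (groupIndicesByKey rightKeys))
```

For example, `innerJoinIndices "aba" "aac"` evaluates to `([0,0,2,2],[0,1,0,1])`: key `'a'` occurs at left rows 0 and 2 and right rows 0 and 1, so all four combinations appear. A case like this, with two matches on each side, is worth including in the tests. If the two expanded index vectors in the current code are consumed position by position, the `replicate`-based expansion may not yield every combination when both groups have the same size.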
