[WIP] Add rewriter for multiple similar APPROX_PERCENTILEs in same clause#14
[WIP] Add rewriter for multiple similar APPROX_PERCENTILEs in same clause#14jnmugerwa wants to merge 1 commit intoprestodb:mainfrom
Conversation
kaikalur
left a comment
There was a problem hiding this comment.
So you should move lot of this logic to actual visit methods.
|
|
||
| public ApproxPercentilePatternMatcher() | ||
| { | ||
| selectMap = new HashMap<>(); |
There was a problem hiding this comment.
Look at the Presto coding style. We use ImmutableMap.Builder.
There was a problem hiding this comment.
We now return an immutable map.
Done
| private boolean isLiteral(AstNode node) | ||
| { | ||
| return node.getId() == JJTUNSIGNEDNUMERICLITERAL; | ||
| } |
There was a problem hiding this comment.
for now, just move this method into ASTNode. Or better yet have a util class - NodeUtil that has all these static functions. Also, name the function to reflect what it does - like isUnsignedNumericLiteral
There was a problem hiding this comment.
Left in this class for now -- will factor this and similar utility methods into a NodeUtil class in a NodeUtil-specific PR.
Done
|
|
||
| private void processSelectItem(AstNode node, Map<String, List<AstNode>> map) | ||
| { | ||
| AstNode functionCall = node.GetFirstChildOfKind(JJTFUNCTIONCALL); |
There was a problem hiding this comment.
So it will be good to generalize it to handle approx percentiles anywhere in the expression. Best to do this in visitFunctionCall
| return; | ||
| } | ||
| AstNode identifier = functionCall.GetFirstChildOfKind(JJTIDENTIFIER); | ||
| if (identifier == null || !identifier.GetImage().equalsIgnoreCase("APPROX_PERCENTILE")) { |
There was a problem hiding this comment.
Add a utilitiy method - isApproxPercentile and do this check there.
| return; | ||
| } | ||
| map.putIfAbsent(firstArg.GetImage(), new LinkedList<>()); | ||
| map.get(firstArg.GetImage()).add(node); |
There was a problem hiding this comment.
Instead of getImage, use Unparser.unparse(firstArg) so it will be general.
30dc75e to
a8c3d49
Compare
a8c3d49 to
4a46b4c
Compare
4a46b4c to
a1aaeea
Compare
f7444b2 to
92a67b6
Compare
eea9344 to
b4db47a
Compare
| public class ApproxPercentileRewriter | ||
| extends Rewriter | ||
| { | ||
| private static final int ZERO_INDEXING_OFFSET = 1; |
There was a problem hiding this comment.
APPROX_PERCENTILE is 1-indexed while the data structure we store our percentiles in is 0-indexing, making this offset necessary. I feel like this name is descriptive given that context?
| extends Rewriter | ||
| { | ||
| private static final int ZERO_INDEXING_OFFSET = 1; | ||
| private static final String REPLACEMENT_FORMAT = " APPROX_PERCENTILE(%s, ARRAY%s)[%d]"; |
| .anyMatch(key -> firstArgToApproxPercentileNode.get(key).size() >= 2); | ||
| } | ||
|
|
||
| private boolean canApplyApproxPercentileRewrite(AstNode node) |
There was a problem hiding this comment.
It should be more like: needsApproxPercentileRewrite
There was a problem hiding this comment.
Maybe needsRewrite and make it take more specific type: FunctionCall
| return firstArgToPercentiles.get(firstArg).indexOf(secondArg) + ZERO_INDEXING_OFFSET; | ||
| } | ||
|
|
||
| private static Optional<AstNode> getNthArgument(AstNode node, int n) |
There was a problem hiding this comment.
Will be moved to NodeUtils in a future PR.
| return false; | ||
| } | ||
| Optional<String> image = Optional.ofNullable(identifier.get().GetImage()); | ||
| return image.isPresent() && image.get().equalsIgnoreCase("APPROX_PERCENTILE"); |
There was a problem hiding this comment.
Use a static final constant for the name of the function.
| return getNthArgument(node, 1).filter(astNode -> astNode.getId() == JJTUNSIGNEDNUMERICLITERAL).isPresent(); | ||
| } | ||
|
|
||
| private static boolean isApproxPercentileNode(AstNode node) |
There was a problem hiding this comment.
Is it needed here? You should move it to the pattern matcher class.
There was a problem hiding this comment.
Will be moved to NodeUtils in a future PR.
| { | ||
| if (isApproxPercentileNode(node) && hasUnsignedLiteralSecondArg(node)) { | ||
| String firstArg = unparse(getNthArgument(node, 0).get()).trim(); | ||
| String secondArg = unparse(getNthArgument(node, 1).get()).trim(); |
There was a problem hiding this comment.
You don't need to unparser the second argument?
There was a problem hiding this comment.
Agreed -- removed unparsing when we deal with secondArg.
| "SELECT APPROX_PERCENTILE(x, ARRAY[0.1, 0.2, 0.3]) FROM (SELECT APPROX_PERCENTILE(x, 0.1) from T);" | ||
| }; | ||
|
|
||
| private static final ImmutableMap<String, String> STATEMENT_TO_REWRITTEN_STATEMENT = |
There was a problem hiding this comment.
You can simply have two arrays - one original sql and other one positionally matching rewritten sql. Or just array of pairs.
There was a problem hiding this comment.
Implemented two arrays.
Done.
| extends Rewriter | ||
| { | ||
| private static final int ZERO_INDEXING_OFFSET = 1; | ||
| private static final String REPLACEMENT_FORMAT = " APPROX_PERCENTILE(%s, ARRAY%s)[%d]"; |
| .anyMatch(key -> firstArgToApproxPercentileNode.get(key).size() >= 2); | ||
| } | ||
|
|
||
| private boolean canApplyApproxPercentileRewrite(AstNode node) |
There was a problem hiding this comment.
Maybe needsRewrite and make it take more specific type: FunctionCall
e251722 to
e8204c4
Compare
e8204c4 to
85ae620
Compare
Added a rewriter for the following anti-pattern:
-> Query has multiple APPROX_PERCENTILEs in the same clause
-> The first argument for each APPROX_PERCENTILE is identical
-> The second argument for each APPROX_PERCENTILE is a literal