We have started developing a model Domain Specific Language (DSL) that can be
used to solve many of the same problems as model
generators, while still keeping model information in
.pysa files. The DSL aims to provide a compact way to generate models for all
code that matches a given query. This allows users to avoid writing hundreds or
thousands of models.
The most basic way to use Pysa's DSL is to generate models based on function names. To
do so, add a
ModelQuery to your
Things to note in this example:
The find clause lets you pick whether you want to model functions, methods, or attributes.
The where clause is how you refine your criteria for when a model should be generated - in this example, we're filtering for functions where the name matches
The model clause is a list of models to generate. Here, the syntax means that the functions matching the where clause should be modelled as returning
When invoking Pysa, if you add the
--dump-model-query-results flag to your invocation, the generated models will be written to a file in JSON format.
You can then view this file to see the generated models.
find clause specifies what entities to model, and currently supports
"functions" indicates that you're querying for free functions,
"methods" indicates that you're only querying class methods, and
"attributes" indicates that you're querying for attributes on classes.
"attributes" also includes constructor-initialized attributes, such as
C.y in the following case:
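As a minimal sketch of the distinction (the class body and values are illustrative assumptions, not taken from Pysa itself), x below is declared in the class body, while y only comes into existence when the constructor runs. Both count as attributes of C in the sense described above.

```python
class C:
    # Class-level attribute, declared directly in the class body.
    x = 0

    def __init__(self) -> None:
        # Constructor-initialized attribute: it exists on instances
        # only after __init__ has run.
        self.y = 1


instance = C()
declared_on_class = "x" in vars(C)        # x lives on the class object
set_in_constructor = "y" in vars(instance)  # y lives on the instance
```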
Where clauses are a list of predicates, all of which must match for an entity to be modelled. Note that certain predicates are only compatible with specific find clause kinds.
The most basic query predicate is a name match - the name you're searching for is compiled as a regex, and the entity's fully qualified name is compared against it. A fully qualified name includes the module and class - for example, for a method
foo in class
C which is part of module
bar, the fully qualified name is
This clause will match when the entity's fully qualified name is exactly the same as the specified string.
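The difference between the two name predicates can be mimicked in plain Python. This is only a sketch of the semantics described above, not Pysa's implementation: the helper names are hypothetical, the dotted join of module, class, and method is an assumption about how the fully qualified name is formed, and the sketch assumes the regex may match anywhere in the name (an unanchored search), which the text does not state explicitly.

```python
import re


def fully_qualified_name(module: str, cls: str, method: str) -> str:
    # Hypothetical helper: joins module, class and method with dots.
    return ".".join([module, cls, method])


def name_matches(pattern: str, qualified_name: str) -> bool:
    # Assumption: unanchored regex search over the fully qualified name.
    return re.search(pattern, qualified_name) is not None


def name_equals(expected: str, qualified_name: str) -> bool:
    # Exact string comparison; no regex is involved.
    return expected == qualified_name


target = fully_qualified_name("bar", "C", "foo")
regex_hit = name_matches(r"C\.foo$", target)
exact_hit = name_equals("foo", target)
```

Under these assumptions, the regex predicate accepts the method while the exact predicate rejects the bare name "foo", because the comparison is against the whole qualified name.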
Model queries allow for querying based on the return annotation of a function. Pysa currently only allows querying whether a function type is
Model queries allow matching callables where any parameter matches a given clause. For now, the only clauses we support for parameters are type-based ones.
This model query will taint all functions which have one parameter with type
There are cases when we want to model entities which match any of a set of clauses. The
AnyOf clause represents exactly this case.
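How a list of where predicates combines with AnyOf can be sketched with ordinary Python predicates. The helper names (all_of, any_of) and the sample predicates are hypothetical stand-ins used for illustration; they are not part of Pysa.

```python
from typing import Callable, Iterable

Predicate = Callable[[str], bool]


def all_of(predicates: Iterable[Predicate]) -> Predicate:
    # Mirrors a where list: every predicate must match.
    preds = list(predicates)
    return lambda name: all(p(name) for p in preds)


def any_of(*predicates: Predicate) -> Predicate:
    # Mirrors AnyOf: at least one predicate must match.
    return lambda name: any(p(name) for p in predicates)


# Hypothetical predicates over fully qualified names.
starts_with_get: Predicate = lambda name: name.split(".")[-1].startswith("get_")
in_views: Predicate = lambda name: name.startswith("views.")
in_api: Predicate = lambda name: name.startswith("api.")

# "name starts with get_" AND ("lives in views" OR "lives in api")
where = all_of([starts_with_get, any_of(in_views, in_api)])
```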
Decorator clauses are used to find callables decorated with decorators that match a pattern. The syntax for using this clause is
Decorator(<name clause>, [<arguments clause>]).
The first argument to
Decorator should be a name clause, which is used to match the name of a decorator. The supported name clauses are the same as the ones discussed above for model query constraints, i.e.
name.matches("pattern"), which will match when the decorator matches the regex pattern specified as a string, and
name.equals("foo.bar.d1") which will match when the fully-qualified name of the decorator equals the specified string exactly.
For example, if you wanted to find all functions which are decorated by
@app.route(), a decorator imported from
my_module, you can write:
The second argument to
Decorator is an optional arguments clause, which is used to match on the arguments provided to the decorator. The supported arguments clauses are
arguments.contains(...), which will match when the arguments specified are a subset of the decorator's arguments, and
arguments.equals(...), which will match when the decorator has the specified arguments exactly.
arguments.contains() supports both positional and keyword arguments. For positional arguments, the list of positional arguments supplied to the
arguments.contains() clause must be a prefix of the list of positional arguments on the actual decorator, i.e. the value of the argument at each position should be the same. For example, with the following Python code:
This query will match both
match2(), but not
nomatch(), since the values of the positional arguments don't match up.
For keyword arguments in
arguments.contains(), the specified keyword arguments must be a subset of the decorator's keyword arguments, but can be specified in any order. For example, with the following Python code:
This query will match both
arguments.equals() operates similarly, but will only match if the specified arguments match the decorator's arguments exactly. This means that for positional arguments, all arguments in each position must match by value exactly. Keyword arguments can be specified in a different order, but the set of specified keyword arguments and the set of the decorator's actual keyword arguments must be the same. For example, with the following Python code:
This query will match both
match2(), but not
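The matching rules for arguments.contains() and arguments.equals() can be sketched as follows. This illustrates the rules as described in the text and is not Pysa's implementation; the function names and the sample decorator arguments are hypothetical.

```python
from typing import Any, Mapping, Sequence


def arguments_contains(
    spec_args: Sequence[Any],
    spec_kwargs: Mapping[str, Any],
    actual_args: Sequence[Any],
    actual_kwargs: Mapping[str, Any],
) -> bool:
    # Positional: the specified values must be a prefix of the actual ones.
    prefix_ok = list(actual_args[: len(spec_args)]) == list(spec_args)
    # Keyword: every specified pair must appear among the actual keywords;
    # order is irrelevant because the lookup is by key.
    kwargs_ok = all(
        key in actual_kwargs and actual_kwargs[key] == value
        for key, value in spec_kwargs.items()
    )
    return prefix_ok and kwargs_ok


def arguments_equals(
    spec_args: Sequence[Any],
    spec_kwargs: Mapping[str, Any],
    actual_args: Sequence[Any],
    actual_kwargs: Mapping[str, Any],
) -> bool:
    # Positional values must agree position by position, and the two
    # keyword sets must be identical (again regardless of order).
    return list(spec_args) == list(actual_args) and dict(spec_kwargs) == dict(
        actual_kwargs
    )


# Hypothetical decorator call: @d(1, 3, a=0, b=2)
actual_args, actual_kwargs = [1, 3], {"a": 0, "b": 2}

contains_hit = arguments_contains([1], {"b": 2}, actual_args, actual_kwargs)
contains_miss = arguments_contains([3], {}, actual_args, actual_kwargs)
equals_hit = arguments_equals([1, 3], {"b": 2, "a": 0}, actual_args, actual_kwargs)
equals_miss = arguments_equals([1], {"b": 2}, actual_args, actual_kwargs)
```

In this sketch, `[3]` fails the contains check because 3 is not the first positional argument, and the last call fails the equals check because it omits arguments the decorator actually has.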
You may use the
parent clause to specify predicates on the parent class. This predicate can only be used when the find clause specifies methods or attributes.
parent.equals clause is used to model entities when the parent's fully qualified name is an exact match for the specified string.
parent.matches clause is used to model entities when the parent's fully qualified name matches the provided regex.
parent.extends clause is used to model entities when the parent's class is a subclass of the provided class name.
By default, it only matches if the parent class is the specified class itself or a direct subclass of it. For example, with classes:
the above query will only model the attributes
C is considered to extend itself, and
D is a direct subclass of
C. However, it will not model
E is a sub-subclass of
If you would like to model a class and all subclasses transitively, you can use the
is_transitive flag to get this behavior.
This query will model
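The difference between the default behavior and is_transitive can be reproduced with ordinary Python classes. The hierarchy follows the description above (D subclasses C, and E subclasses D); the helper parent_extends is a hypothetical stand-in for the clause and does not reflect how Pysa implements it.

```python
class C:
    pass


class D(C):
    pass


class E(D):
    pass


def parent_extends(parent: type, base: type, is_transitive: bool = False) -> bool:
    if is_transitive:
        # The class itself or any descendant, at any depth.
        return issubclass(parent, base)
    # The class itself or a direct subclass only.
    return parent is base or base in parent.__bases__


default_matches = [c.__name__ for c in (C, D, E) if parent_extends(c, C)]
transitive_matches = [
    c.__name__ for c in (C, D, E) if parent_extends(c, C, is_transitive=True)
]
```

With the default setting only C and D qualify; with is_transitive enabled, E qualifies as well.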
Not clause negates any existing clause that is valid for the entity being modelled.
The last part of a model query generates models for all entities that match the provided where clauses. For callables, we support generating models for parameters by name or position, as well as generating models for all parameters. Additionally, we support generating models for the return annotation.
Returned taint takes the form of
TaintSpecification is either a taint annotation or a list of taint annotations.
Parameters can be tainted using the
Parameters() clause. By default, all parameters will be tainted with the supplied taint specification. If you would like to taint only specific parameters matching certain conditions, an optional
where clause can be specified to accomplish this, allowing for constraints on parameter names, the annotation type of the parameter, or parameter position. For example:
To specify a constraint on parameter name, the
name.equals() or name.matches() clauses can be used. As in the main
where clause of the model query,
equals() searches for an exact match on the specified string, while
matches() allows a regex to be supplied as a pattern to match against.
To specify a constraint on parameter position, the
index.equals() clause can be used. It takes a single integer denoting the position of the parameter.
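Selecting parameters by name or position can be sketched with the standard inspect module. The predicates below only mimic the constraints described above; the helper names and the sample function are hypothetical, and the sketch assumes positions are zero-based, which the text does not state.

```python
import inspect
import re
from typing import Callable, List

ParamPredicate = Callable[[int, str], bool]


def parameters_where(func: Callable, predicate: ParamPredicate) -> List[str]:
    # Walk the parameters in declaration order and keep those that match.
    params = inspect.signature(func).parameters
    return [name for index, name in enumerate(params) if predicate(index, name)]


def name_matches(pattern: str) -> ParamPredicate:
    # Regex constraint on the parameter name.
    return lambda index, name: re.search(pattern, name) is not None


def name_equals(expected: str) -> ParamPredicate:
    # Exact-match constraint on the parameter name.
    return lambda index, name: name == expected


def index_equals(position: int) -> ParamPredicate:
    # Assumption: positions are counted from zero.
    return lambda index, name: index == position


def handler(request, user_id, user_name):  # hypothetical function to model
    return None


by_regex = parameters_where(handler, name_matches(r"^user_"))
by_name = parameters_where(handler, name_equals("user_id"))
by_index = parameters_where(handler, index_equals(0))
```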
This clause is used to specify a constraint on parameter type annotation. Currently the clauses supported are:
type_annotation.equals(), which takes the fully-qualified name of a Python type or class and matches when there is an exact match,
type_annotation.matches(), which takes a regex pattern to match type annotations against, and
type_annotation.is_annotated_type(), which will match parameters of type
AnyOf clauses can be used in the same way as they are in the main
where clause of the model query.
Not can be used to negate any existing clause, and
AnyOf can be used to match when any one of several supplied clauses match.
Taint for attribute models requires an
AttributeModel model clause, which can only be used when the find clause specifies attributes.