Model Domain Specific Language (DSL)

We have started developing a model Domain Specific Language (DSL) that can be used to solve many of the same problems as model generators, while still keeping model information in .pysa files. The DSL aims to provide a compact way to generate models for all code that matches a given query. This allows users to avoid writing hundreds or thousands of models.

Basics​

The most basic form of querying Pysa's DSL is by generating models based on function names. To do so, add a ModelQuery to your .pysa file:

ModelQuery(
# Indicates the name of the query
name = "get_foo_sources",
# Indicates that this query is looking for functions
find = "functions",
# Indicates those functions should be called 'foo'
where = [name.matches("foo")],
# Indicates that matched function should be modeled as returning 'Test' taint
model = [
Returns(TaintSource[Test]),
],
# Indicates that the generated models should include the 'foo' and 'foo2' functions
expected_models = [
"def file.foo() -> TaintSource[Test]: ...",
"def file.foo2() -> TaintSource[Test]: ..."
],
# Indicates that the generated models should not include the 'bar' function
unexpected_models = [
"def file.bar() -> TaintSource[Test]: ..."
]
)

Things to note in this example:

  1. The name clause is the name of your query.
  2. The find clause lets you pick whether you want to model functions, methods, attributes or globals.
  3. The where clause is how you refine your criteria for when a model should be generated - in this example, we're filtering for functions whose names contain foo.
  4. The model clause is a list of models to generate. Here, the syntax means that the functions matching the where clause should be modelled as returning TaintSource[Test].
  5. The expected_models and unexpected_models clauses are optional and allow you to specify models that should or should not be generated by your query.
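
The check described by expected_models and unexpected_models can be pictured as two set comparisons. The Python sketch below only illustrates that idea: check_query_output is a hypothetical helper, not part of Pysa, and models are compared as plain strings.

```python
def check_query_output(generated, expected=(), unexpected=()):
    """Return (missing, forbidden) for the output of one query.

    missing:   expected models that were not generated.
    forbidden: unexpected models that were generated anyway.
    """
    generated = set(generated)
    missing = [model for model in expected if model not in generated]
    forbidden = [model for model in unexpected if model in generated]
    return missing, forbidden


generated = [
    "def file.foo() -> TaintSource[Test]: ...",
    "def file.foo2() -> TaintSource[Test]: ...",
]
missing, forbidden = check_query_output(
    generated,
    expected=generated,
    unexpected=["def file.bar() -> TaintSource[Test]: ..."],
)
print(missing, forbidden)  # both lists are empty when the query behaves as intended
```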

When invoking Pysa, if you add the --dump-model-query-results /path/to/output/file flag to your invocation, the generated models, sorted under the respective ModelQuery that created them, will be written to a file in JSON format.

$ pyre analyze --dump-model-query-results /path/to/output/file.txt
...
> Emitting the model query results to `/my/home/dir/.pyre/model_query_results.pysa`

You can then view this file to see the generated models.

You can also test DSL queries using pyre query.

Name clauses​

The name clause describes what the query is meant to find. Normally it follows the format of get_ + [what the query matches with in the where clause] + [_sinks, _sources and/or _tito]. This clause should be unique for every ModelQuery within a file.

Find clauses​

The find clause specifies what entities to model, and currently supports "functions", "methods", "attributes", and "globals". "functions" indicates that you're querying for free functions, "methods" indicates that you're only querying class methods, "attributes" indicates that you're querying for attributes on classes, and "globals" indicates that you're querying for names available in the global scope.

Note that "attributes" also includes constructor-initialized attributes, such as C.y in the following case:

class C:
    x = ...

    def __init__(self):
        self.y = ...

Note that "globals" queries currently do not infer the type annotation of a global's value, so querying is more effective when globals are explicitly annotated.

def fun(x: int, y: str) -> int:
    return x + int(y)

a = fun(1, "2") # -> typing.Any
b: int = fun(1, "2") # -> int

Where clauses​

where clauses are a list of predicates, all of which must match for an entity to be modelled. Note that certain predicates are only compatible with specific find clause kinds.

fully_qualified_name.matches​

The most basic query predicate is a name match - the name you're searching for is compiled as a regex, and the entity's fully qualified name is compared against it. A fully qualified name includes the module and class - for example, for a method foo in class C which is part of module bar, the fully qualified name is bar.C.foo.

Example:

ModelQuery(
name = "get_starting_with_foo",
find = ...,
where = [
fully_qualified_name.matches("foo.*")
],
model = ...
)
caution

matches performs a partial match! For instance, matches("bar") will match against a function named my_module.foobarbaz. To perform a full match, use ^ and $. For instance: matches("^.*\.bar$").
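
The difference between a partial and an anchored match can be tried out with Python's re module. This is only a sketch: it assumes that an unanchored re.search behaves like matches for these simple patterns, and Pysa's own regex engine may differ in other details. The helper name matches is a stand-in, not a Pysa API.

```python
import re


def matches(pattern: str, name: str) -> bool:
    """Unanchored regex search, mirroring the partial-match behaviour."""
    return re.search(pattern, name) is not None


print(matches("bar", "my_module.foobarbaz"))         # True: "bar" occurs inside the name
print(matches(r"^.*\.bar$", "my_module.foobarbaz"))  # False: the name does not end in ".bar"
print(matches(r"^.*\.bar$", "my_module.bar"))        # True
```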

fully_qualified_name.equals​

This clause will match when the entity's fully qualified name is exactly the same as the specified string.

Example:

ModelQuery(
name = "get_bar_C_foo",
find = ...,
where = [
fully_qualified_name.equals("bar.C.foo")
],
model = ...
)

name.matches​

The name.matches clause is similar to fully_qualified_name.matches, but matches against the actual name of the entity, excluding module and class names.

Example:

ModelQuery(
name = "get_starting_with_foo",
find = ...,
where = [
name.matches("foo.*")
],
model = ...
)
caution

matches performs a partial match! For instance, matches("bar") will match against a function named foobarbaz. To perform a full match, use ^ and $. For instance: matches("^.*bar$").

name.equals​

The name.equals clause is similar to fully_qualified_name.equals, but matches against the actual name of the entity, excluding module and class names.

ModelQuery(
name = "get_foo",
find = ...,
where = [
name.equals("foo")
],
model = ...
)

return_annotation clauses​

Model queries allow for querying based on the return annotation of a callable. Note that this where clause does not work when the find clause specifies "attributes".

return_annotation.equals​

The clause will match when the fully-qualified name of the callable's return type matches the specified value exactly.

ModelQuery(
name = "get_return_HttpRequest_sources",
find = "functions",
where = [
return_annotation.equals("django.http.HttpRequest"),
],
model = Returns(TaintSource[UserControlled, Via[http_request]])
)

return_annotation.matches​

This is similar to the previous clause, but will match when the fully-qualified name of the callable's return type matches the specified pattern.

ModelQuery(
name = "get_return_Request_sources",
find = "methods",
where = [
return_annotation.matches(".*Request"),
],
model = Returns(TaintSource[UserControlled, Via[http_request]])
)

return_annotation.is_annotated_type​

This will match when a callable's return type is annotated with typing.Annotated. This is a type used to decorate existing types with context-specific metadata, e.g.

from typing import Annotated

def bad() -> Annotated[str, "SQL"]:
    ...

Example:

ModelQuery(
name = "get_return_annotated_sources",
find = "functions",
where = [
return_annotation.is_annotated_type(),
],
model = Returns(TaintSource[SQL])
)

This query would match on functions like the one shown above.

return_annotation.extends​

This will match when a callable's return type is a class that is a subclass of the provided class name. Note that this only works on class names; more complex types such as Union or Callable are not supported. The extends clause also takes two boolean parameters: is_transitive, which when set to True matches transitive subclasses (otherwise only direct subclasses match), and includes_self, which determines whether extends(T) should include T itself.

Example:

ModelQuery(
name = "get_return_annotation_extends",
find = "functions",
where = [
return_annotation.extends("test.A", is_transitive=True, includes_self=False),
],
model = Returns(TaintSource[Test])
)

Given the following Python code in module test:

class A:
    pass

class B(A):
    pass

class C(B):
    pass

def foo() -> A: ...
def bar() -> B: ...
def baz() -> C: ...

The above query would match bar and baz which are transitive subclasses of A, but not foo, since includes_self was False.

If the return type is Optional[T] or ReadOnly[T], it is treated as if it were type T for the purpose of matching.

from typing import Optional
from pyre_extensions import ReadOnly

# These should all also match
def bar_optional() -> Optional[B]: ...
def bar_readonly() -> ReadOnly[B]: ...
def baz2() -> Optional[ReadOnly[Optional[C]]]: ...
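
One way to picture this unwrapping is to peel Optional layers off the annotation before comparing classes. The sketch below uses only the standard typing module; strip_optional is a hypothetical helper, and ReadOnly is left out because it lives in the third-party pyre_extensions package.

```python
from typing import Optional, Union, get_args, get_origin


class A: ...
class B(A): ...


def strip_optional(annotation):
    """Peel Optional[...] layers off an annotation and return the inner type."""
    while get_origin(annotation) is Union:
        inner = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(inner) != 1:
            break  # a genuine Union of several types is left untouched
        annotation = inner[0]
    return annotation


# Optional[B] is compared as if it were B, so it still counts as a subclass of A.
print(issubclass(strip_optional(Optional[B]), A))  # True
```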

type_annotation clauses​

Model queries allow for querying based on the type annotation of a global. Note that this is similar to the return_annotation clauses shown previously. See also: Parameters model type_annotation clauses.

type_annotation.equals​

The clause will match when the fully-qualified name of the global's explicitly annotated type matches the specified value exactly.

ModelQuery(
name = "get_string_dicts",
find = "globals",
where = [
type_annotation.equals("typing.Dict[(str, str)]"),
],
model = GlobalModel(TaintSource[SelectDict])
)

For example, the above query when run on the following code:

unannotated_dict = {"hello": "world", "abc": "123"}
annotated_dict: Dict[str, str] = {"hello": "world", "abc": "123"}

will result in a model for annotated_dict: TaintSource[SelectDict].

type_annotation.matches​

This is similar to the previous clause, but will match when the fully-qualified name of the global's explicit type annotation matches the specified pattern.

ModelQuery(
name = "get_anys",
find = "globals",
where = [
type_annotation.matches(".*typing.Any.*"),
],
model = GlobalModel(TaintSource[SelectAny])
)

type_annotation.is_annotated_type​

This will match when a global's type is annotated with typing.Annotated. This is a type used to decorate existing types with context-specific metadata, e.g.

from typing import Annotated

result: Annotated[str, "SQL"] = ...

Example:

ModelQuery(
name = "get_return_annotated_sources",
find = "globals",
where = [
type_annotation.is_annotated_type(),
],
model = GlobalModel(TaintSource[SQL])
)

This query would match on globals like the one shown above.

type_annotation.extends​

This behaves the same way as the return_annotation.extends() clause. Please refer to the section above.

any_parameter clauses​

Model queries allow matching callables where any parameter matches a given clause. For now, the only clauses we support for parameters are conditions on the type annotation of a callable's parameters. These can be used in conjunction with the Parameters model clause (see type_annotation) to taint specific parameters. Note that this where clause does not work when the find clause specifies "attributes".

any_parameter.annotation.equals​

This clause will match all callables which have at least one parameter where the fully-qualified name of the parameter type matches the specified value exactly.

Example:

ModelQuery(
name = "get_parameter_HttpRequest_sources",
find = "functions",
where = [
any_parameter.annotation.equals("django.http.HttpRequest")
],
model =
Parameters(
TaintSource[UserControlled],
where=[
name.equals("request"),
name.matches("data$")
]
)
)

any_parameter.annotation.matches​

This clause will match all callables which have at least one parameter where the fully-qualified name of the parameter type matches the specified pattern.

Example:

ModelQuery(
name = "get_parameter_Request_sources",
find = "methods",
where = [
any_parameter.annotation.matches(".*Request")
],
model =
Parameters(
TaintSource[UserControlled],
where=[
type_annotation.matches(".*Request"),
]
)
)

any_parameter.annotation.is_annotated_type​

This clause will match all callables which have at least one parameter with type typing.Annotated.

Example:

ModelQuery(
name = "get_parameter_annotated_sources",
find = "functions",
where = [
any_parameter.annotation.is_annotated_type()
],
model =
Parameters(
TaintSource[Test],
where=[
type_annotation.is_annotated_type(),
]
)
)

AnyOf clauses​

There are cases when we want to model entities which match any of a set of clauses. The AnyOf clause represents exactly this case.

Example:

ModelQuery(
name = "get_AnyOf_example",
find = "methods",
where = [
AnyOf(
any_parameter.annotation.is_annotated_type(),
return_annotation.is_annotated_type(),
)
],
model = ...
)

AllOf clauses​

There are cases when we want to model entities which match all of a set of clauses. The AllOf clause may be used in this case.

Example:

ModelQuery(
name = "get_AllOf_example",
find = "methods",
where = [
AnyOf(
AllOf(
cls.extends("a.b"),
cls.name.matches("Foo"),
),
AllOf(
cls.extends("c.d"),
cls.name.matches("Bar")
)
)
],
model = ...
)
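
The way AnyOf and AllOf nest can be modelled with ordinary predicate functions. In this Python sketch the capitalised helpers are stand-ins that only mimic the DSL names, and entities are plain dictionaries with hypothetical cls_name and bases keys; none of this is Pysa's implementation.

```python
import re


def AllOf(*predicates):
    """Combined predicate that holds only when every inner predicate holds."""
    return lambda entity: all(predicate(entity) for predicate in predicates)


def AnyOf(*predicates):
    """Combined predicate that holds when at least one inner predicate holds."""
    return lambda entity: any(predicate(entity) for predicate in predicates)


def cls_extends(base):
    return lambda entity: base in entity["bases"]


def cls_name_matches(pattern):
    return lambda entity: re.search(pattern, entity["cls_name"]) is not None


query = AnyOf(
    AllOf(cls_extends("a.b"), cls_name_matches("Foo")),
    AllOf(cls_extends("c.d"), cls_name_matches("Bar")),
)

print(query({"cls_name": "FooHandler", "bases": ["a.b"]}))  # True: first AllOf holds
print(query({"cls_name": "FooHandler", "bases": ["c.d"]}))  # False: neither AllOf holds
```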

Decorator clauses​

Decorator clauses are used to find callables decorated with decorators that match a pattern. This clause takes decorator clauses as arguments.

Decorator fully_qualified_callee clauses​

The fully_qualified_callee decorator clause is used to match on the fully qualified name of a decorator. That is, the fully qualified name of a higher order function. The supported name clauses are the same as the ones discussed above for model query constraints, i.e.,

  • fully_qualified_callee.matches("pattern"), which will match when the decorator matches the regex pattern specified as a string, and
  • fully_qualified_callee.equals("foo.bar.d1"), which will match when the fully-qualified name of the decorator equals the specified string exactly.

For example, if you wanted to find all functions that are decorated by @App().route(), a decorator whose definition is in file my_module.py:

class App:
    def route(self, func: Callable) -> Callable:
        ...

You can write:

ModelQuery(
name = "get_my_module_app_route_decorator",
find = "functions",
where = Decorator(fully_qualified_callee.equals("my_module.App.route")),
...
)

which is arguably better because it is more precise than regex matching, or

ModelQuery(
name = "get_app_route_decorator",
find = "functions",
where = Decorator(fully_qualified_callee.matches(".*\.App\.route")),
...
)

Clarification. As another example, assume the following code is in file test.py:

class Flask:
    def route(self, func: Callable) -> Callable:
        ...

application = Flask()

@application.route
def my_view():
    pass

Then, for the decorator @application.route, the fully_qualified_callee clause matches against the decorator's fully qualified name test.Flask.route, as opposed to test.application.route, the fully qualified name of the local identifier that refers to this decorator.
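
Plain Python offers a rough analogy: a bound method remembers the class that defines it, regardless of the variable it was reached through. The sketch below is only an illustration of that distinction, not of how Pysa resolves callees.

```python
class Flask:
    def route(self, func):
        return func


application = Flask()


@application.route
def my_view():
    pass


# The decorator is looked up through the local name `application`,
# but its qualified name still points at the defining class.
decorator = application.route
print(decorator.__qualname__)  # Flask.route when run as a top-level script
```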

Decorator name clauses​

The name clause is similar to fully_qualified_callee, but matches against the actual name of the decorator, excluding module and class names.

Decorator arguments clauses​

The arguments clause is used to match on the arguments provided to the decorator. The supported arguments clauses are arguments.contains(...), which will match when the arguments specified are a subset of the decorator's arguments, and arguments.equals(...), which will match when the decorator has the specified arguments exactly.

arguments.contains() supports both positional and keyword arguments. For positional arguments, the list of positional arguments supplied to the arguments.contains() clause must be a prefix of the list of positional arguments on the actual decorator, i.e. the value of the argument at each position should be the same. For example, with the following Python code:

@d1(a, 2)
def match1():
    ...

@d1(a, 2, 3, 4)
def match2():
    ...

@d1(2, a)
def nomatch():
    ...

This query will match both match1() and match2(), but not nomatch(), since the values of the positional arguments don't match up.

ModelQuery(
name = "get_d1_decorator",
find = "functions",
where = Decorator(
fully_qualified_name.matches("d1"),
arguments.contains(a, 2)
),
...
)

For keyword arguments in arguments.contains(), the specified keyword arguments must be a subset of the decorator's keyword arguments, but can be specified in any order. For example, with the following Python code:

@d1(a, 2, foo="Bar")
def match1():
    ...

@d1(baz="Boo", foo="Bar")
def match2():
    ...

This query will match both match1() and match2():

ModelQuery(
name = "get_d1_decorator",
find = "functions",
where = Decorator(
fully_qualified_name.matches("d1"),
arguments.contains(foo="Bar")
),
...
)

arguments.equals() operates similarly, but will only match if the specified arguments match the decorator's arguments exactly. This means that for positional arguments, all arguments in each position must match by value exactly. Keyword arguments can be specified in a different order, but the set of specified keyword arguments and the set of the decorator's actual keyword arguments must be the same. For example, with the following Python code:

@d1(a, 2, foo="Bar", baz="Boo")
def match1():
    ...

@d1(a, 2, baz="Boo", foo="Bar")
def match2():
    ...

@d1(2, a, baz="Boo", foo="Bar")
def nomatch1():
    ...

@d1(a, 2, 3, baz="Boo", foo="Bar")
def nomatch2():
    ...

This query will match both match1() and match2(), but not nomatch1() or nomatch2():

ModelQuery(
name = "get_d1_decorator",
find = "functions",
where = Decorator(
fully_qualified_name.matches("d1"),
arguments.equals(a, 2, foo="Bar", baz="Boo")
),
...
)
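
The matching rules for arguments.contains() and arguments.equals() can be summarised in a few lines of Python. This is a sketch of the rules as described in this section, not Pysa's implementation; the two helper functions and the string stand-in for the name a are hypothetical.

```python
def arguments_contains(actual_args, actual_kwargs, *args, **kwargs):
    """Positional args must be a prefix; keyword args must be a subset."""
    positional_ok = tuple(actual_args[: len(args)]) == args
    keyword_ok = all(
        key in actual_kwargs and actual_kwargs[key] == value
        for key, value in kwargs.items()
    )
    return positional_ok and keyword_ok


def arguments_equals(actual_args, actual_kwargs, *args, **kwargs):
    """Positional args must match one-for-one; keyword sets must be identical."""
    return tuple(actual_args) == args and dict(actual_kwargs) == kwargs


a = "a"  # stand-in for the name `a` used in the decorators above

# @d1(a, 2) and @d1(a, 2, 3, 4) contain (a, 2); @d1(2, a) does not.
print(arguments_contains((a, 2), {}, a, 2))        # True
print(arguments_contains((a, 2, 3, 4), {}, a, 2))  # True
print(arguments_contains((2, a), {}, a, 2))        # False

# Keyword order is irrelevant, but an extra positional argument breaks equality.
actual_kwargs = {"baz": "Boo", "foo": "Bar"}
print(arguments_equals((a, 2), actual_kwargs, a, 2, foo="Bar", baz="Boo"))     # True
print(arguments_equals((a, 2, 3), actual_kwargs, a, 2, foo="Bar", baz="Boo"))  # False
```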

Decorator Not, AllOf and AnyOf clauses​

The Not, AllOf and AnyOf clauses can be used in decorator clauses in the same way as they are in the main where clause of the model query.

cls.fully_qualified_name.equals clause​

You may use the cls clause to specify predicates on the class. This predicate can only be used when the find clause specifies methods or attributes.

The cls.fully_qualified_name.equals clause is used to model entities when the class's fully qualified name is an exact match for the specified string.

Example:

ModelQuery(
name = "get_childOf_foo_Bar",
find = "methods",
where = cls.fully_qualified_name.equals("foo.Bar"),
...
)

cls.fully_qualified_name.matches clause​

The cls.fully_qualified_name.matches clause is used to model entities when the class's fully qualified name matches the provided regex.

Example:

ModelQuery(
name = "get_childOf_Foo",
find = "methods",
where = cls.fully_qualified_name.matches(".*Foo.*"),
...
)

cls.name.matches clause​

The cls.name.matches clause is similar to cls.fully_qualified_name.matches, but matches against the actual name of the class, excluding modules.

cls.name.equals clause​

The cls.name.equals clause is similar to cls.fully_qualified_name.equals, but matches against the actual name of the class, excluding modules.

cls.extends clause​

The cls.extends clause is used to model entities when the class is a subclass of the provided class name.

Example:

ModelQuery(
name = "get_subclassOf_C",
find = "attributes",
where = cls.extends("C"),
...
)

The default behavior is that it will only match if the class is the specified class itself, or a direct subclass of it. For example, with classes:

class C:
    x = ...

class D(C):
    y = ...

class E(D):
    z = ...

the above query will only model the attributes C.x and D.y, since C is considered to extend itself, and D is a direct subclass of C. However, it will not model E.z, since E is a sub-subclass of C.

If you would like to model a class and all subclasses transitively, you can use the is_transitive flag.

Example:

ModelQuery(
name = "get_transitive_subclassOf_C",
find = "attributes",
where = cls.extends("C", is_transitive=True),
...
)

This query will model C.x, D.y and E.z.

If you do not want to match on the class itself, you can use the includes_self flag.

Example:

ModelQuery(
name = "get_transitive_subclassOf_C",
find = "attributes",
where = cls.extends("C", is_transitive=True, includes_self=False),
...
)

This query will model D.y and E.z.
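
The three variants above can be reproduced with a small Python model of the two flags. The extends helper below is hypothetical and works on real Python classes; its defaults (is_transitive=False, includes_self=True) are taken from the default behavior described in this section.

```python
class C: ...
class D(C): ...
class E(D): ...


def extends(candidate, base, is_transitive=False, includes_self=True):
    """Approximate cls.extends(base, ...) for real Python classes."""
    if candidate is base:
        return includes_self
    if is_transitive:
        return issubclass(candidate, base)
    return base in candidate.__bases__  # direct subclasses only


classes = [C, D, E]
print([k.__name__ for k in classes if extends(k, C)])
# ['C', 'D']
print([k.__name__ for k in classes if extends(k, C, is_transitive=True)])
# ['C', 'D', 'E']
print([k.__name__ for k in classes if extends(k, C, is_transitive=True, includes_self=False)])
# ['D', 'E']
```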

cls.decorator clause​

The cls.decorator clause is used to specify constraints on a class decorator, so you can choose to model entities only if the class they belong to has the specified decorator.

The arguments for this clause are identical to those of the non-class constraint Decorator. For more information, please see the Decorator clauses section.

Example:

ModelQuery(
name = "get_childOf_d1_decorator_sources",
find = "methods",
where = [
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
),
name.equals("__init__")
],
model = [
Parameters(TaintSource[Test], where=[
Not(name.equals("self")),
Not(name.equals("a"))
])
]
)

For example, the above query when run on the following code:

@d1(2)
class Foo:
    def __init__(self, a, b):
        ...

@d1()
class Bar:
    def __init__(self, a, b):
        ...

@d2(2)
class Baz:
    def __init__(self, a, b):
        ...

will result in a model for def Foo.__init__(b: TaintSource[Test]).

cls.any_child clause​

The cls.any_child clause is used to model entities when any child of the current class meets the specified constraints.

The arguments for this clause are any combination of valid class constraints (cls.name.equals, cls.name.matches, cls.fully_qualified_name.equals, cls.fully_qualified_name.matches, cls.extends, cls.decorator) and logical clauses (AnyOf, AllOf, Not), along with the optional is_transitive and includes_self clauses.

Example:

ModelQuery(
name = "get_parent_of_d1_decorator_sources",
find = "methods",
where = [
cls.any_child(
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
)
),
name.equals("__init__")
],
model = [
Parameters(TaintSource[Test], where=[
Not(name.equals("self")),
Not(name.equals("a"))
])
]
)

Similar to the cls.extends constraint, the default behavior is that it will only match if any immediate children (or itself) of the class of the method or attribute matches against the inner clause. For example, with classes:

class Foo:
    def __init__(self, a, b):
        ...

class Bar(Foo):
    def __init__(self, a, b):
        ...

@d1(2)
class Baz(Bar):
    def __init__(self, a, b):
        ...

The above query will only model the methods Bar.__init__ and Baz.__init__, since Bar is an immediate parent of Baz, and Baz is considered to extend itself. However, it will not model Foo.__init__, since Baz is a sub-subclass of Foo.

If you would like to model a class and all subclasses transitively, you can use the is_transitive flag.

Example:

ModelQuery(
name = "get_transitive_parent_of_d1_decorator_sources",
find = "methods",
where = [
cls.any_child(
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
),
is_transitive=True
),
name.equals("__init__")
],
...
)

This query will model Foo.__init__, Bar.__init__ and Baz.__init__.

If the class itself should not be checked against the inner clause, so that only its children are considered, you can set the includes_self flag to False.

Example:

ModelQuery(
name = "get_transitive_parent_of_d1_decorator_sources",
find = "methods",
where = [
cls.any_child(
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
),
is_transitive=True,
includes_self=False
),
name.equals("__init__")
],
...
)

This query will model Foo.__init__, Bar.__init__ but NOT Baz.__init__.

tip

We recommend always specifying both is_transitive and includes_self to avoid confusion.
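
The effect of the two flags on cls.any_child can be checked with a small Python model of the class hierarchy used above. Everything here is a sketch: d1, has_d1_with_2 and any_child are hypothetical helpers working on real Python classes, and the defaults are taken from the default behavior described in this section.

```python
def d1(*args):
    """Hypothetical decorator that records its arguments on the class."""
    def wrap(cls):
        cls._d1_args = args
        return cls
    return wrap


class Foo: ...
class Bar(Foo): ...

@d1(2)
class Baz(Bar): ...


def has_d1_with_2(cls):
    # vars() looks at the class itself, so an inherited marker is ignored.
    return vars(cls).get("_d1_args", ())[:1] == (2,)


def descendants(cls):
    for child in cls.__subclasses__():
        yield child
        yield from descendants(child)


def any_child(cls, predicate, is_transitive=False, includes_self=True):
    children = list(descendants(cls)) if is_transitive else cls.__subclasses__()
    candidates = ([cls] if includes_self else []) + children
    return any(predicate(candidate) for candidate in candidates)


classes = [Foo, Bar, Baz]
print([k.__name__ for k in classes if any_child(k, has_d1_with_2)])
# ['Bar', 'Baz']
print([k.__name__ for k in classes if any_child(k, has_d1_with_2, is_transitive=True)])
# ['Foo', 'Bar', 'Baz']
print([k.__name__ for k in classes
       if any_child(k, has_d1_with_2, is_transitive=True, includes_self=False)])
# ['Foo', 'Bar']
```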

cls.any_parent clause​

The cls.any_parent clause is used to model entities when any parent of the current class meets the specified constraints.

The arguments for this clause are any combination of valid class constraints (cls.name.equals, cls.name.matches, cls.fully_qualified_name.equals, cls.fully_qualified_name.matches, cls.extends, cls.decorator) and logical clauses (AnyOf, AllOf, Not), along with the optional is_transitive and includes_self clauses.

Example:

ModelQuery(
name = "get_children_of_d1_decorator_sources",
find = "methods",
where = [
cls.any_parent(
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
)
),
name.equals("__init__")
],
model = [
Parameters(TaintSource[Test], where=[
Not(name.equals("self")),
Not(name.equals("a"))
])
]
)

Similar to the cls.extends constraint, the default behavior is that it will only match if any immediate parent (or itself) of the class of the method or attribute matches against the inner clause. For example, with classes:

@d1(2)
class Foo:
    def __init__(self, a, b):
        ...

class Bar(Foo):
    def __init__(self, a, b):
        ...

class Baz(Bar):
    def __init__(self, a, b):
        ...

The above query will only model the methods Bar.__init__ and Foo.__init__, since Foo is an immediate parent of Bar, and Foo is considered to extend itself. However, it will not model Baz.__init__, since Foo is not an immediate parent of Baz.

If you would like to model a class and all transitive parents, you can use the is_transitive flag.

Example:

ModelQuery(
name = "get_transitive_children_of_d1_decorator_sources",
find = "methods",
where = [
cls.any_parent(
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
),
is_transitive=True
),
name.equals("__init__")
],
...
)

This query will model Foo.__init__, Bar.__init__ and Baz.__init__.

If the class itself should not be checked against the inner clause, so that only its parents are considered, you can set the includes_self flag to False.

Example:

ModelQuery(
name = "get_transitive_children_of_d1_decorator_sources",
find = "methods",
where = [
cls.any_parent(
cls.decorator(
fully_qualified_name.matches("d1"),
arguments.contains(2)
),
is_transitive=True,
includes_self=False
),
name.equals("__init__")
],
...
)

This query will model Bar.__init__, Baz.__init__ but NOT Foo.__init__.

tip

We recommend always specifying both is_transitive and includes_self to avoid confusion.
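
The same kind of Python model works for cls.any_parent, this time walking up the hierarchy. As before, d1, has_d1_with_2 and any_parent are hypothetical helpers on real Python classes, with defaults taken from the default behavior described in this section.

```python
def d1(*args):
    """Hypothetical decorator that records its arguments on the class."""
    def wrap(cls):
        cls._d1_args = args
        return cls
    return wrap


@d1(2)
class Foo: ...
class Bar(Foo): ...
class Baz(Bar): ...


def has_d1_with_2(cls):
    # vars() looks at the class itself, so Bar and Baz do not inherit Foo's marker.
    return vars(cls).get("_d1_args", ())[:1] == (2,)


def any_parent(cls, predicate, is_transitive=False, includes_self=True):
    if is_transitive:
        parents = [k for k in cls.__mro__[1:] if k is not object]
    else:
        parents = [k for k in cls.__bases__ if k is not object]
    candidates = ([cls] if includes_self else []) + parents
    return any(predicate(candidate) for candidate in candidates)


classes = [Foo, Bar, Baz]
print([k.__name__ for k in classes if any_parent(k, has_d1_with_2)])
# ['Foo', 'Bar']
print([k.__name__ for k in classes if any_parent(k, has_d1_with_2, is_transitive=True)])
# ['Foo', 'Bar', 'Baz']
print([k.__name__ for k in classes
       if any_parent(k, has_d1_with_2, is_transitive=True, includes_self=False)])
# ['Bar', 'Baz']
```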

Not clauses​

The Not clause negates any existing clause that is valid for the entity being modelled.

Example:

ModelQuery(
name = "get_Not_example",
find = "methods",
where = [
Not(
name.matches("foo.*"),
cls.fully_qualified_name.matches("testing.unittest.UnitTest"),
)
],
model = ...
)

Generated models (Model clauses)​

The last bit of model queries is actually generating models for all entities that match the provided where clauses. For callables, we support generating models for parameters by name or position, as well as generating models for all parameters. Additionally, we support generating models for the return annotation.

Returned taint​

Returned taint takes the form of Returns(TaintSpecification), where TaintSpecification is either a taint annotation or a list of taint annotations.

ModelQuery(
name = "get_Returns_sources",
find = "methods",
where = ...,
model = [
Returns(TaintSource[Test, Via[foo]])
]
)

Parameter taint​

Parameters can be tainted using the Parameters() clause. By default, all parameters will be tainted with the supplied taint specification. If you would like to only taint specific parameters matching certain conditions, an optional where clause can be specified to accomplish this, allowing for constraints on parameter names, the annotation type of the parameter, or parameter position. For example:

ModelQuery(
name = "get_Parameters_sources",
find = "methods",
where = ...,
model = [
Parameters(TaintSource[A]), # will taint all parameters by default
Parameters(
TaintSource[B],
where=[
Not(index.equals(0)) # will only taint parameters that are not the first parameter
]
),
]
)
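
Which parameters a Parameters() clause would select can be previewed in Python with inspect.signature. The sketch below assumes zero-based parameter positions, as the comment on index.equals(0) above suggests; select_parameters and handler are hypothetical names used only for illustration.

```python
import inspect


def select_parameters(func, where=lambda index, name: True):
    """Names of the parameters of func that satisfy where(index, name)."""
    names = inspect.signature(func).parameters
    return [name for index, name in enumerate(names) if where(index, name)]


def handler(self, request, data):  # hypothetical method used as a test subject
    pass


# No where clause: every parameter is selected.
print(select_parameters(handler))
# ['self', 'request', 'data']

# Equivalent of Not(index.equals(0)): everything except the first parameter.
print(select_parameters(handler, lambda index, name: index != 0))
# ['request', 'data']
```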

name clauses​

To specify a constraint on parameter name, the name.equals() or name.matches() clauses can be used. As in the main where clause of the model query, equals() searches for an exact match on the specified string, while matches() allows a regex to be supplied as a pattern to match against.

Example:

ModelQuery(
name = "get_request_data_sources",
find = "methods",
where = ...,
model = [
Parameters(
TaintSource[Test],
where=[
name.equals("request"),
name.matches("data$")
]
)
]
)

index clause​

To specify a constraint on parameter position, the index.equals() clause can be used. It takes a single integer denoting the position of the parameter.

Example:

ModelQuery(
name = "get_index_sources",
find = "methods",
where = ...,
model = [
Parameters(
TaintSource[Test],
where=[
index.equals(1)
]
)
]
)

type_annotation clause​

This clause is used to specify a constraint on parameter type annotation. Currently the clauses supported are: type_annotation.equals(), which takes the fully-qualified name of a Python type or class and matches when there is an exact match, type_annotation.matches(), which takes a regex pattern to match type annotations against, and type_annotation.is_annotated_type(), which will match parameters of type typing.Annotated.

Example:

ModelQuery(
name = "get_annotated_parameters_sources",
find = "methods",
where = ...,
model = [
Parameters(
TaintSource[Test],
where=[
type_annotation.equals("foo.bar.C"), # exact match
type_annotation.matches("^List\["), # regex match
type_annotation.is_annotated_type(), # matches Annotated[T, x]
]
)
]
)

To match on the annotation portion of Annotated types, consider the following example. Suppose this code was in test.py:

from enum import Enum
from typing import Annotated, Optional

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

class Foo:
    x: Annotated[Optional[int], Color.RED]
    y: Annotated[Optional[int], Color.BLUE]
    z: Annotated[int, "z"]

Note that the type name that should be matched against is its fully qualified name, which also includes the fully qualified name of any other types referenced (for example, typing.Optional rather than just Optional). When multiple arguments are provided to the type they are implicitly treated as being in a tuple.

Here are some examples of where clauses that can be used to specify models for the annotated attributes in this case:

ModelQuery(
name = "get_annotated_attributes_sources",
find = "attributes",
where = [
AnyOf(
type_annotation.equals("typing.Annotated[(typing.Optional[int], test.Color.RED)]"),
type_annotation.equals("typing.Annotated[(int, z)]"),
type_annotation.matches(".*Annotated\[.*Optional\[int\].*Color\..*\]"),
type_annotation.is_annotated_type()
)
],
model = [
AttributeModel(TaintSource[Test]),
]
)

This query should generate the following models:

test.Foo.x: TaintSource[Test]
test.Foo.y: TaintSource[Test]
test.Foo.z: TaintSource[Test]
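
Because type_annotation.matches() takes a regular expression, literal square brackets in a type name need escaping; an unescaped [int] is read as a character class. The sketch below shows the difference using Python's re module as a stand-in, on the assumption that Pysa's regex engine treats character classes the same way.

```python
import re

# Fully qualified annotation of test.Foo.x, in the form used by type_annotation.equals().
annotation_x = "typing.Annotated[(typing.Optional[int], test.Color.RED)]"

escaped = r".*Annotated\[.*Optional\[int\].*Color\..*\]"
unescaped = r".*Annotated\[.*Optional[int].*Color\..*\]"

print(re.search(escaped, annotation_x) is not None)    # True
print(re.search(unescaped, annotation_x) is not None)  # False: "[int]" means one of i, n, t
```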

Not, AllOf and AnyOf clauses​

The Not, AllOf and AnyOf clauses can be used in the same way as they are in the main where clause of the model query. Not can be used to negate any existing clause, AllOf to match when all of several supplied clauses match, and AnyOf can be used to match when any one of several supplied clauses match.

Example:

ModelQuery(
name = "get_Not_AnyOf_AllOf_example_sources",
find = "methods",
where = ...,
model = [
Parameters(
TaintSource[Test],
where=[
Not(
AnyOf(
AllOf(
cls.extends("a.b"),
cls.name.matches("Foo"),
),
AllOf(
cls.extends("c.d"),
cls.name.matches("Bar")
)
)
)
]
)
]
)

Using ViaTypeOf with the Parameters clause​

Usually when specifying a ViaTypeOf the argument that you want to capture the value or type of should be specified. However, when writing model queries and trying to find all parameters that match certain conditions, we may not know the exact name of the parameters that will be modelled. For example:

def f1(bad_1, good_1, good_2):
    pass

def f2(good_3, bad_2, good_4):
    pass

Suppose we wanted to model all parameters with the prefix bad_ here and attach a ViaTypeOf to them. In this case it is still possible to attach these features to the parameter model, by using a standalone ViaTypeOf as follows:

ModelQuery(
name = "get_f_sinks",
find = "functions",
where = name.matches("f"),
model = [
Parameters(
TaintSink[Test, ViaTypeOf],
where=[
name.matches("bad_")
]
)
]
)

This would produce models equivalent to the following:

def f1(bad_1: TaintSink[Test, ViaTypeOf[bad_1]]): ...
def f2(bad_2: TaintSink[Test, ViaTypeOf[bad_2]]): ...

Models for attributes​

Taint for attribute models requires an AttributeModel model clause, which can only be used when the find clause specifies attributes.

Example:

ModelQuery(
name = "get_attribute_sources_sinks",
find = "attributes",
where = ...,
model = [
AttributeModel(TaintSource[Test], TaintSink[Test])
]
)

Using ViaAttributeName with the AttributeModel clause​

ViaAttributeName can be used within AttributeModel to add a feature containing the name of the attribute to any taint flowing through the given attributes.

For instance:

ModelQuery(
name = "get_attribute_of_Foo",
find = "attributes",
where = [cls.name.equals("Foo")],
model = [
AttributeModel(ViaAttributeName[WithTag["Foo"]])
]
)

On the following code:

class Foo:
    first_name: str
    last_name: str

def last_name_to_sink(foo: Foo):
    sink(foo.last_name)

This will add the feature via-Foo-attribute:last_name on the flow to the sink.

Models for globals​

Taint for global models requires a GlobalModel model clause, which can only be used when the find clause specifies globals.

Example:

ModelQuery(
name = "get_global_sources",
find = "globals",
where = ...,
model = [
GlobalModel(TaintSource[Test])
]
)

Models for setting modes​

Unlike the other model clauses in this section, this clause does not produce taint for the models it targets. Instead, it updates those models with specific modes that change how taint analysis treats them.

The available modes are:

  • Obscure
    • Marks the function or method as obscure
  • SkipObscure
    • Prevents a function or method from being marked as obscure
  • SkipAnalysis
    • Skips inference of the targeted function or method, and forces the use of user-defined models for taint flow
  • SkipOverrides
    • Prevents taint propagation from the targeted model into and from overridden methods on subclasses
  • Entrypoint
    • Specifies functions or methods to be used as entrypoints for analysis, so only transitive calls from that function are analyzed
  • SkipDecoratorWhenInlining
    • Prevents the selected decorator from being inlined during analysis
    • Note: this mode will be a no-op, since model queries are generated after decorators are inlined
  • SkipModelBroadening
    • Prevents model broadening for the given function or method

For instance, instead of annotating each function separately, as in the following .pysa file:

@Entrypoint
def myfile.func1(): ...

@Entrypoint
def myfile.func2(): ...

@Entrypoint
def myfile.func3(): ...

@Entrypoint
def myfile.func4(): ...

One could instead use the following model query:

ModelQuery(
name = "get_myfile_entrypoint_functions",
find = "functions",
where = [
name.matches("myfile\.func.*")
],
model = [
Modes([Entrypoint])
]
)

The benefit is that any new function that matches that name will also be considered an entrypoint.

Note that it is also possible to include multiple modes in a Modes model clause by extending the list (e.g. Modes([SkipOverrides, Obscure])).

Expected and Unexpected Models clauses​

The optional expected_models and unexpected_models clauses let you list models whose equivalents your ModelQuery should or should not generate. The models in these clauses should be syntactically correct Pysa models (see this documentation for a guide on how to write a Pysa model). If your query does not generate a model listed in expected_models, or if it generates a model listed in unexpected_models, an error will be raised.

Example:

ModelQuery(
name = "get_foo_returns_sources",
find = "functions",
where = [name.matches("foo")],
model = [
Returns(TaintSource[Test]),
],
expected_models = [
"def file.foo() -> TaintSource[Test]: ...",
"def file.foo2() -> TaintSource[Test]: ..."
],
unexpected_models = [
"def file.bar() -> TaintSource[Test]: ..."
]
)

This would not produce any errors, since the models the ModelQuery generates will contain expected_models and not unexpected_models.
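
The check described above can be pictured as two membership tests. The following Python sketch is a simplification under stated assumptions: the function check_query_expectations is hypothetical, and it compares model strings literally, whereas Pysa compares parsed models for equivalence.

```python
def check_query_expectations(generated, expected, unexpected):
    """Return a list of error messages; an empty list means the query passes.

    Hypothetical helper: models are compared as plain strings here.
    """
    generated_set = set(generated)
    errors = []
    for model in expected:
        if model not in generated_set:
            errors.append(f"expected model was not generated: {model}")
    for model in unexpected:
        if model in generated_set:
            errors.append(f"unexpected model was generated: {model}")
    return errors


generated = [
    "def file.foo() -> TaintSource[Test]: ...",
    "def file.foo2() -> TaintSource[Test]: ...",
]
expected = [
    "def file.foo() -> TaintSource[Test]: ...",
    "def file.foo2() -> TaintSource[Test]: ...",
]
unexpected = ["def file.bar() -> TaintSource[Test]: ..."]

errors = check_query_expectations(generated, expected, unexpected)
```

With the models from the example, errors is empty. If the query also generated the file.bar model, the list would contain one entry.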

Cache Queries​

Generating models for a large number of queries can be quite slow. Cache queries speed up model generation by factoring out queries with similar where clauses into a single query, which builds a mapping from an arbitrary name to a set of matching entities. Other queries can then read from this cache, making them quick to execute.

For instance, imagine having the following queries:

ModelQuery(
...
find = "methods",
where = [
AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar")),
fully_qualified_name.matches("\.ClassA\.method$"),
],
model = ...
)
ModelQuery(
...
find = "methods",
where = [
AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar")),
fully_qualified_name.matches("\.ClassB\.method$"),
],
model = ...
)
ModelQuery(
...
find = "methods",
where = [
AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar")),
fully_qualified_name.matches("\.ClassC\.other_method$"),
],
model = ...
)
# etc.

We can factor out the expensive where clause into a single query which writes to a key-value cache, using the WriteToCache clause.

ModelQuery(
...
find = "methods",
where = [AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar"))],
model = WriteToCache(kind="FooBar", name=f"{class_name}:{function_name}")
)

All matching methods will be stored in a cache named FooBar, under the key {class_name}:{function_name}.

After executing the query, we might get the following cache FooBar:

ClassA:method -> {some_module.ClassA.method}
ClassB:method -> {some_other_module.ClassB.method}
ClassC:other_method -> {some_module.ClassC.other_method}

We can then read from the cache using the where clause read_from_cache:

ModelQuery(
find = "methods",
where = read_from_cache(kind="FooBar", name="ClassA:method"),
model = ...
)
ModelQuery(
find = "methods",
where = read_from_cache(kind="FooBar", name="ClassB:method"),
model = ...
)
ModelQuery(
find = "methods",
where = read_from_cache(kind="FooBar", name="ClassC:other_method"),
model = ...
)

This will generate the same models as the first example, but model generation will be a lot faster.

In terms of time complexity, if the number of entities (methods here) is N, the number of queries is Q and the average cost of evaluating a where clause is C, the first example has O(N*Q*C) complexity. Using cache queries, this becomes O(N*C+Q): the expensive where clause is evaluated once per entity, and each reading query only performs a cache lookup.
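
To make the difference concrete, here is a small Python simulation that counts how often the expensive clause is evaluated in each approach. It is a toy model, not Pysa's implementation: the METHODS tuples, the expensive_where stand-in and the helper names are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical entities: (class_name, method_name, base_class).
METHODS = [
    ("ClassA", "method", "my_module.Foo"),
    ("ClassB", "method", "other_module.Bar"),
    ("ClassC", "other_method", "my_module.Foo"),
    ("ClassD", "method", "unrelated.Base"),
]
TARGETS = ["ClassA:method", "ClassB:method", "ClassC:other_method"]

expensive_calls = 0


def expensive_where(method):
    """Stand-in for the costly AnyOf(cls.extends(...), cls.extends(...)) clause."""
    global expensive_calls
    expensive_calls += 1
    return method[2] in {"my_module.Foo", "other_module.Bar"}


def key_of(method):
    return f"{method[0]}:{method[1]}"


def run_without_cache():
    # Every query re-evaluates the expensive clause on every method: N * Q calls.
    return {
        target: {m for m in METHODS if expensive_where(m) and key_of(m) == target}
        for target in TARGETS
    }


def run_with_cache():
    # One writing pass evaluates the expensive clause once per method: N calls.
    cache = defaultdict(set)
    for m in METHODS:
        if expensive_where(m):
            cache[key_of(m)].add(m)
    # Each reading query is then a single dictionary lookup.
    return {target: cache.get(target, set()) for target in TARGETS}


expensive_calls = 0
naive = run_without_cache()
naive_calls = expensive_calls

expensive_calls = 0
cached = run_with_cache()
cached_calls = expensive_calls
```

Both runs select the same methods, but with 4 methods and 3 queries the first performs 12 evaluations of the expensive clause while the cached version performs 4.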

WriteToCache clause​

WriteToCache is a model clause that is used to store entities into a cache. It takes the following arguments:

  • A kind, which is the name of the cache.
  • A name as an f-string, which will be the key for the entity in the cache.

The name can use the following variables:

  • function_name: The (non-qualified) name of the function.
  • method_name: The (non-qualified) name of the method.
  • class_name: The (non-qualified) name of the class.
  • capture(identifier): The regular expression capture group called identifier. See documentation below.

For instance:

ModelQuery(
...
find = "methods",
model = WriteToCache(kind="cache_name", name=f"{class_name}:{function_name}")
)

Note that you can write multiple entities under the same name. For instance, this happens if you use name=f"{class_name}" and multiple methods of the same class match against the where clause.

read_from_cache clause​

read_from_cache is a where clause that will only match against entities with the given name in the cache. It takes the following arguments:

  • A kind, which is the name of the cache.
  • A name as a string, which is the key for the entities in the cache.

For instance:

ModelQuery(
find = "methods",
where = read_from_cache(kind="cache_name", name="Class:method"),
model = ...
)

Note that you can use read_from_cache in combination with other where clauses, as long as at least one read_from_cache clause is active on all branches.

For instance, this is disallowed:

ModelQuery(
find = "methods",
where = AnyOf(
read_from_cache(kind="cache_name", name="Class:method"),
cls.extends("module.Foo")
),
model = ...
)
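
One way to read the "active on all branches" rule is as a recursive check over the where clause: a conjunction is fine if any of its parts reads from the cache, while a disjunction needs every alternative to do so. The Python sketch below encodes that reading. It is an interpretation of the rule, not Pysa's actual validation code, and the tuple representation of clauses is invented for the example.

```python
def reads_cache_on_all_branches(clause):
    """Check a where clause encoded as a tuple: (kind, *children_or_arguments).

    Hypothetical encoding, e.g. ("AnyOf", child1, child2) or
    ("read_from_cache", "cache_name", "Class:method").
    """
    kind, children = clause[0], clause[1:]
    if kind == "read_from_cache":
        return True
    if kind == "AllOf":
        # A conjunction is restricted as soon as one part reads from the cache.
        return any(reads_cache_on_all_branches(child) for child in children)
    if kind == "AnyOf":
        # A disjunction is only restricted if every alternative reads from the cache.
        return len(children) > 0 and all(
            reads_cache_on_all_branches(child) for child in children
        )
    # Any other clause (cls.extends, name.matches, Not, ...) does not read the cache.
    return False


cache_read = ("read_from_cache", "cache_name", "Class:method")
extends = ("cls.extends", "module.Foo")

disallowed = ("AnyOf", cache_read, extends)
allowed = ("AllOf", cache_read, extends)
```

Under this reading, disallowed corresponds to the query above, whose cls.extends branch never consults the cache, whereas allowed combines the same two clauses as a conjunction.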

Regular expression capture​

name.matches and cls.name.matches clauses can use named capturing groups, which can then be referenced in the name of WriteToCache clauses.

For instance:

ModelQuery(
find = "functions",
where = name.matches("^get_(?P<attribute>[a-z]+)$"),
model = WriteToCache(kind="cache_name", name=f"{capture(attribute)}")
)

For a function get_foo, this will create a cache entry under the key foo.

caution

Be careful when using regular expression captures. If the capture group is not found (e.g., because of a typo), WriteToCache will use the empty string.

Note that we do not support numbered capture groups, e.g. Foo(.*).
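
The capture behavior can be approximated with Python's re module, which uses the same (?P<name>...) syntax for named groups. This is only an illustration: Pysa evaluates regular expressions with its own engine, and the helpers capture and cache_key are hypothetical names.

```python
import re

PATTERN = re.compile(r"^get_(?P<attribute>[a-z]+)$")


def capture(match, identifier):
    """Return the named group, or the empty string if no such group exists."""
    return match.groupdict().get(identifier) or ""


def cache_key(function_name):
    """Compute the key that f"{capture(attribute)}" would produce, if any."""
    match = PATTERN.search(function_name)
    if match is None:
        # The where clause does not match, so nothing is written to the cache.
        return None
    return capture(match, "attribute")


key = cache_key("get_foo")

# A misspelled group name silently yields the empty string.
typo_key = capture(PATTERN.search("get_foo"), "atribute")
```

Here key is "foo", while typo_key is the empty string, which is why a misspelled group name is easy to miss.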

Logging group clauses​

The logging_group_name clause specifies that the model query should be considered part of the given group for logging purposes. This is useful when auto-generating large numbers of model queries. When verbose logging is enabled (-n), Pysa will print a single line Model Query group 'XXX' generated YYY models instead of printing one line per model query in the group.

For instance:

ModelQuery(
name = "generated_dangerous_foo",
logging_group_name = "generated_dangerous",
find = "methods",
where = read_from_cache(kind="annotated", name="foo"),
model = ...
)
ModelQuery(
name = "generated_dangerous_bar",
logging_group_name = "generated_dangerous",
find = "methods",
where = read_from_cache(kind="annotated", name="bar"),
model = ...
)