Model Domain Specific Language (DSL)

We have started developing a model Domain Specific Language (DSL) that can be used to solve many of the same problems as model generators, while still keeping model information in .pysa files. The DSL aims to provide a compact way to generate models for all code that matches a given query. This allows users to avoid writing hundreds or thousands of models.

Basics

The most basic way to use Pysa's DSL is to generate models based on function names. To do so, add a ModelQuery to your .pysa file:

ModelQuery(
  # Indicates the name of the query
  name = "get_foo_sources",
  # Indicates that this query is looking for functions
  find = "functions",
  # Indicates those functions should have names containing 'foo'
  where = [name.matches("foo")],
  # Indicates that matched functions should be modeled as returning 'Test' taint
  model = [
    Returns(TaintSource[Test]),
  ],
  # Indicates that the generated models should include the 'foo' and 'foo2' functions
  expected_models = [
    "def file.foo() -> TaintSource[Test]: ...",
    "def file.foo2() -> TaintSource[Test]: ..."
  ],
  # Indicates that the generated models should not include the 'bar' function
  unexpected_models = [
    "def file.bar() -> TaintSource[Test]: ..."
  ]
)

Things to note in this example:

The name clause is the name of your query.
The find clause lets you pick whether you want to model functions, methods, attributes or globals.
The where clause is how you refine your criteria for when a model should be generated - in this example, we're filtering for functions whose names contain foo.
The model clause is a list of models to generate. Here, the syntax means that the functions matching the where clause should be modelled as returning TaintSource[Test].
The expected_models and unexpected_models clauses are optional and allow you to specify models that should or should not be generated by your query.

When invoking Pysa, if you add the --dump-model-query-results /path/to/output/file flag to your invocation, the generated models, sorted under the respective ModelQuery that created them, will be written to a file in JSON format.

$ pyre analyze --dump-model-query-results /path/to/output/file.txt
...
> Emitting the model query results to `/my/home/dir/.pyre/model_query_results.pysa`

You can then view this file to see the generated models.

You can also test DSL queries using pyre query.

Name clauses

The name clause describes what the query is meant to find. By convention, it follows the format get_ + [what the query matches in the where clause] + [_sinks, _sources and/or _tito]. The name should be unique for every ModelQuery within a file.

Find clauses

The find clause specifies what entities to model, and currently supports "functions", "methods", "attributes", and "globals". "functions" indicates that you're querying for free functions, "methods" indicates that you're only querying class methods, "attributes" indicates that you're querying for attributes on classes, and "globals" indicates that you're querying for names available in the global scope.

Note that "attributes" also includes constructor-initialized attributes, such as C.y in the following case:

class C:
  x = ...

  def __init__(self):
    self.y = ...

Note that Pysa currently does not infer the type of a global from its value, so querying globals is more effective when they are explicitly annotated:

def fun(x: int, y: str) -> int:
    return x + int(y)

a = fun(1, "2") # -> typing.Any
b: int = fun(1, "2") # -> int

Where clauses

where clauses are a list of predicates, all of which must match for an entity to be modelled. Note that certain predicates are only compatible with specific find clause kinds.

`fully_qualified_name.matches`

The most basic query predicate is a name match - the name you're searching for is compiled as a regex, and the entity's fully qualified name is compared against it. A fully qualified name includes the module and class - for example, for a method foo in class C which is part of module bar, the fully qualified name is bar.C.foo.

Example:

ModelQuery(
  name = "get_starting_with_foo",
  find = ...,
  where = [
    fully_qualified_name.matches("foo.*")
  ],
  model = ...
)

caution

matches performs a partial match! For instance, matches("bar") will match against a function named my_module.foobarbaz. To perform a full match, use ^ and $. For instance: matches("^.*\.bar$").
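Plain Python can approximate the difference between partial and anchored matching with re.search, which also succeeds when the pattern occurs anywhere in the string. This sketch assumes that Pysa's regex semantics agree with Python's re for simple patterns like these. The helper matches is hypothetical and is not part of the DSL.

```python
import re


def matches(pattern: str, fully_qualified_name: str) -> bool:
    """Approximate fully_qualified_name.matches(pattern) as a partial match.

    Assumption: this mirrors Python's re.search, which succeeds if the
    pattern matches anywhere in the name unless it is anchored with ^ and $.
    """
    return re.search(pattern, fully_qualified_name) is not None


# Unanchored patterns match any substring of the name.
print(matches("bar", "my_module.foobarbaz"))         # True
# Anchored: the last name component must be exactly "bar".
print(matches(r"^.*\.bar$", "my_module.foobarbaz"))  # False
print(matches(r"^.*\.bar$", "my_module.bar"))        # True
# "foo.*" behaves like "foo": it does not require the name to start with foo.
print(matches("foo.*", "bar.C.foo"))                 # True
print(matches("^foo", "bar.C.foo"))                  # False
```

One consequence is that a pattern such as "foo.*" does not mean "starts with foo". Add ^ when you want a prefix match.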

`fully_qualified_name.equals`

This clause will match when the entity's fully qualified name is exactly the same as the specified string.

Example:

ModelQuery(
  name = "get_bar_C_foo",
  find = ...,
  where = [
    fully_qualified_name.equals("bar.C.foo")
  ],
  model = ...
)

`name.matches`

The name.matches clause is similar to fully_qualified_name.matches, but matches against the actual name of the entity, excluding module and class names.

Example:

ModelQuery(
  name = "get_starting_with_foo",
  find = ...,
  where = [
    name.matches("foo.*")
  ],
  model = ...
)

caution

matches performs a partial match! For instance, matches("bar") will match against a function named foobarbaz. To perform a full match, use ^ and $. For instance: matches("^.*bar$").

`name.equals`

The name.equals clause is similar to fully_qualified_name.equals, but matches against the actual name of the entity, excluding module and class names.

ModelQuery(
  name = "get_foo",
  find = ...,
  where = [
    name.equals("foo")
  ],
  model = ...
)
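The sketch below contrasts the name predicates with the fully_qualified_name predicates, using the method foo of class C in module bar. It assumes that an entity's name is simply the last dot-separated component of its fully qualified name. The helper functions are hypothetical stand-ins, not Pysa APIs.

```python
import re


def short_name(fully_qualified_name: str) -> str:
    """Assumed: an entity's name is the last dot-separated component."""
    return fully_qualified_name.rsplit(".", 1)[-1]


def name_equals(value: str, fully_qualified_name: str) -> bool:
    # Sketch of name.equals: exact comparison against the short name.
    return short_name(fully_qualified_name) == value


def name_matches(pattern: str, fully_qualified_name: str) -> bool:
    # Sketch of name.matches: partial regex match against the short name.
    return re.search(pattern, short_name(fully_qualified_name)) is not None


fqn = "bar.C.foo"  # method foo of class C in module bar
print(fqn == "bar.C.foo")          # True: what fully_qualified_name.equals compares
print(name_equals("foo", fqn))     # True
print(name_matches("^foo$", fqn))  # True
print(name_matches("C", fqn))      # False: the class name is not part of the name
```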

`return_annotation` clauses

Model queries allow for querying based on the return annotation of a callable. Note that this where clause does not work when the find clause specifies "attributes".

`return_annotation.equals`

The clause will match when the fully-qualified name of the callable's return type matches the specified value exactly.

ModelQuery(
  name = "get_return_HttpRequest_sources",
  find = "functions",
  where = [
    return_annotation.equals("django.http.HttpRequest"),
  ],
  model = Returns(TaintSource[UserControlled, Via[http_request]])
)

`return_annotation.matches`

This is similar to the previous clause, but will match when the fully-qualified name of the callable's return type matches the specified pattern.

ModelQuery(
  name = "get_return_Request_sources",
  find = "methods",
  where = [
    return_annotation.matches(".*Request"),
  ],
  model = Returns(TaintSource[UserControlled, Via[http_request]])
)

`return_annotation.is_annotated_type`

This will match when a callable's return type is annotated with typing.Annotated. This is a type used to decorate existing types with context-specific metadata, e.g.

from typing import Annotated

def bad() -> Annotated[str, "SQL"]:
  ...

Example:

ModelQuery(
  name = "get_return_annotated_sources",
  find = "functions",
  where = [
    return_annotation.is_annotated_type(),
  ],
  model = Returns(TaintSource[SQL])
)

This query would match on functions like the one shown above.

`return_annotation.extends`

This will match when a callable's return type is a class that is a subclass of the provided class name. Note that this only works on class names; more complex types like Union or Callable are not supported. The extends clause also takes two boolean parameters. The first is is_transitive: when it is set to True, the clause matches transitive subclasses, and otherwise it matches only direct subclasses. The second is includes_self, which determines whether extends(T) also matches T itself.

Example:

ModelQuery(
  name = "get_return_annotation_extends",
  find = "functions",
  where = [
    return_annotation.extends("test.A", is_transitive=True, includes_self=True),
  ],
  model = Returns(TaintSource[Test])
)

Given the following Python code in module test:

class A:
  pass

class B(A):
  pass

class C:
  pass

def foo() -> A: ...
def bar() -> B: ...
def baz() -> C: ...

The above query would match foo, since includes_self is True and A is the class itself, and bar, since B is a subclass of A. It would not match baz, since C is not a subclass of A.

If the return type is Optional[T] or ReadOnly[T], it is effectively treated as T for the purpose of matching.

from typing import Optional
from pyre_extensions import ReadOnly

# These should all also match
def bar_optional() -> Optional[B]: ...
def bar_readonly() -> ReadOnly[B]: ...
def bar_nested() -> Optional[ReadOnly[Optional[B]]]: ...

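Python's typing module exposes enough structure to illustrate this unwrapping. The sketch below strips Optional[...] from an annotation before doing a subclass check, as a rough analogue of what the extends clause does. It is an assumption-laden illustration rather than Pysa's implementation. It also leaves out pyre_extensions.ReadOnly, so it runs with the standard library alone.

```python
from typing import Optional, Union, get_args, get_origin


class A: ...


class B(A): ...


def strip_optional(annotation):
    """Peel off Optional[...] wrappers so Optional[T] is treated as T.

    Sketch only: pyre_extensions.ReadOnly is not handled here.
    """
    while get_origin(annotation) is Union:
        non_none = [arg for arg in get_args(annotation) if arg is not type(None)]
        if len(non_none) != 1:
            break  # a genuine Union such as Union[int, str] is left alone
        annotation = non_none[0]
    return annotation


unwrapped = strip_optional(Optional[B])
print(unwrapped is B)            # True
print(issubclass(unwrapped, A))  # True: an extends check on A now applies
```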
`type_annotation` clauses

Model queries allow for querying based on the type annotation of a global. Note that this is similar to the return_annotation clauses shown previously. See also: Parameters model type_annotation clauses.

`type_annotation.equals`

The clause will match when the fully-qualified name of the global's explicitly annotated type matches the specified value exactly.

ModelQuery(
  name = "get_string_dicts",
  find = "globals",
  where = [
    type_annotation.equals("typing.Dict[(str, str)]"),
  ],
  model = GlobalModel(TaintSource[SelectDict])
)

For example, the above query when run on the following code:

from typing import Dict

unannotated_dict = {"hello": "world", "abc": "123"}
annotated_dict: Dict[str, str] = {"hello": "world", "abc": "123"}

will result in a model for annotated_dict: TaintSource[SelectDict].

`type_annotation.matches`

This is similar to the previous clause, but will match when the fully-qualified name of the global's explicit type annotation matches the specified pattern.

ModelQuery(
  name = "get_anys",
  find = "globals",
  where = [
    type_annotation.matches(".*typing.Any.*"),
  ],
  model = GlobalModel(TaintSource[SelectAny])
)

`type_annotation.is_annotated_type`

This will match when a global's type is annotated with typing.Annotated. This is a type used to decorate existing types with context-specific metadata, e.g.

from typing import Annotated

result: Annotated[str, "SQL"] = ...

Example:

ModelQuery(
  name = "get_annotated_globals_sources",
  find = "globals",
  where = [
    type_annotation.is_annotated_type(),
  ],
  model = GlobalModel(TaintSource[SQL])
)

This query would match on globals like the one shown above.

`type_annotation.extends`

This behaves the same way as the return_annotation.extends() clause. Please refer to the section above.

`any_parameter` clauses

Model queries allow matching callables where any parameter matches a given clause. For now, the only supported parameter clauses are conditions on the type annotation of a callable's parameters. These can be used together with the Parameters model clause (see type_annotation) to taint specific parameters. Note that this where clause does not work when the find clause specifies "attributes".

`any_parameter.annotation.equals`

This clause will match all callables which have at least one parameter where the fully-qualified name of the parameter type matches the specified value exactly.

Example:

ModelQuery(
  name = "get_parameter_HttpRequest_sources",
  find = "functions",
  where = [
    any_parameter.annotation.equals("django.http.HttpRequest")
  ],
  model =
    Parameters(
      TaintSource[UserControlled],
      where=[
        AnyOf(
          name.equals("request"),
          name.matches("data$")
        )
      ]
    )
)

`any_parameter.annotation.matches`

This clause will match all callables which have at least one parameter where the fully-qualified name of the parameter type matches the specified pattern.

Example:

ModelQuery(
  name = "get_parameter_Request_sources",
  find = "methods",
  where = [
    any_parameter.annotation.matches(".*Request")
  ],
  model =
    Parameters(
      TaintSource[UserControlled],
      where=[
        type_annotation.matches(".*Request"),
      ]
    )
)

`any_parameter.annotation.is_annotated_type`

This clause will match all callables which have at least one parameter with type typing.Annotated.

Example:

ModelQuery(
  name = "get_parameter_annotated_sources",
  find = "functions",
  where = [
    any_parameter.annotation.is_annotated_type()
  ],
  model =
    Parameters(
      TaintSource[Test],
      where=[
        type_annotation.is_annotated_type(),
      ]
    )
)

`AnyOf` clauses

There are cases when we want to model entities which match any of a set of clauses. The AnyOf clause represents exactly this case.

Example:

ModelQuery(
  name = "get_AnyOf_example",
  find = "methods",
  where = [
    AnyOf(
      any_parameter.annotation.is_annotated_type(),
      return_annotation.is_annotated_type(),
    )
  ],
  model = ...
)

`AllOf` clauses

There are cases when we want to model entities which match all of a set of clauses. The AllOf clause may be used in this case.

Example:

ModelQuery(
  name = "get_AllOf_example",
  find = "methods",
  where = [
    AnyOf(
      AllOf(
        cls.extends("a.b"),
        cls.name.matches("Foo"),
      ),
      AllOf(
        cls.extends("c.d"),
        cls.name.matches("Bar")
      )
    )
  ],
  model = ...
)
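Every predicate in the top-level where list must hold, so the list behaves like an implicit AllOf. AnyOf and Not supply disjunction and negation. The sketch below models these combinators as Python functions over fully qualified names. The names mirror the DSL clauses, but they are hypothetical helpers. Real DSL predicates can also inspect classes, annotations and decorators.

```python
import re
from typing import Callable

# A predicate receives a fully qualified name and reports whether it matches.
Predicate = Callable[[str], bool]


def AnyOf(*predicates: Predicate) -> Predicate:
    return lambda fqn: any(p(fqn) for p in predicates)


def AllOf(*predicates: Predicate) -> Predicate:
    return lambda fqn: all(p(fqn) for p in predicates)


def Not(predicate: Predicate) -> Predicate:
    return lambda fqn: not predicate(fqn)


def fqn_matches(pattern: str) -> Predicate:
    return lambda fqn: re.search(pattern, fqn) is not None


def where(*predicates: Predicate) -> Predicate:
    # The top-level where list behaves like an implicit AllOf.
    return AllOf(*predicates)


query = where(
    AnyOf(fqn_matches(r"^a\.b\."), fqn_matches(r"^c\.d\.")),
    Not(fqn_matches(r"\.test_")),
)
print(query("a.b.Foo.run"))       # True
print(query("c.d.Bar.test_run"))  # False: excluded by Not
print(query("e.f.Baz.run"))       # False: neither AnyOf branch matches
```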

`Decorator` clauses

Decorator clauses are used to find callables decorated with decorators that match a pattern. This clause takes decorator clauses as arguments.

Decorator `fully_qualified_callee` clauses

The fully_qualified_callee decorator clause is used to match on the fully qualified name of a decorator. That is, the fully qualified name of a higher order function. The supported name clauses are the same as the ones discussed above for model query constraints, i.e.,

fully_qualified_callee.matches("pattern"), which will match when the decorator matches the regex pattern specified as a string, and
fully_qualified_callee.equals("foo.bar.d1"), which will match when the fully-qualified name of the decorator equals the specified string exactly.

For example, suppose you want to find all functions decorated with @App().route, where App is defined in file my_module.py:

from typing import Callable

class App:
  def route(self, func: Callable) -> Callable:
    ...

You can write:

ModelQuery(
  name = "get_my_module_app_route_decorator",
  find = "functions",
  where = Decorator(fully_qualified_callee.equals("my_module.App.route")),
  ...
)

which is arguably better because it is more precise than regex matching, or

ModelQuery(
  name = "get_app_route_decorator",
  find = "functions",
  where = Decorator(fully_qualified_callee.matches(".*\.App\.route")),
  ...
)

As another example, assume the following code is in file test.py:

from typing import Callable

class Flask:
  def route(self, func: Callable) -> Callable:
    ...

application = Flask()
@application.route
def my_view():
  pass

Then, for the decorator @application.route, the fully_qualified_callee clause matches against the decorator's fully qualified name, test.Flask.route. It does not use test.application.route, the fully qualified name of the local identifier that refers to the decorator.

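Plain Python draws the same distinction at runtime. The name application.route refers to a bound method whose underlying function is Flask.route. The snippet below is only an analogy, because Pysa resolves decorators statically without running the code. The body of route is a stand-in.

```python
from typing import Callable


class Flask:
    def route(self, func: Callable) -> Callable:
        # Stand-in body: a real framework would register func here.
        return func


application = Flask()


@application.route
def my_view():
    pass


decorator = application.route  # a bound method of the instance
print(decorator.__func__.__qualname__)  # Flask.route
```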
Decorator `name` clauses

The name clause is similar to fully_qualified_callee, but matches against the decorator's own name, excluding module and class names.

Decorator `arguments` clauses

The arguments clause is used to match on the arguments provided to the decorator. Two arguments clauses are supported. arguments.contains(...) matches when the specified arguments are a subset of the decorator's arguments. arguments.equals(...) matches when the decorator has exactly the specified arguments.

arguments.contains() supports both positional and keyword arguments. For positional arguments, the list supplied to arguments.contains() must be a prefix of the decorator's actual positional arguments, i.e. the argument at each position must have the same value. For example, with the following Python code:

@d1(a, 2)
def match1():
  ...

@d1(a, 2, 3, 4)
def match2():
  ...

@d1(2, a)
def nomatch():
  ...

The following query will match both match1() and match2(), but not nomatch(), since the values of the positional arguments don't line up:

ModelQuery(
  name = "get_d1_decorator",
  find = "functions",
  where = Decorator(
    fully_qualified_name.matches("d1"),
    arguments.contains(a, 2)
  ),
  ...
)

For keyword arguments in arguments.contains(), the specified keyword arguments must be a subset of the decorator's keyword arguments, but can be specified in any order. For example, with the following Python code:

@d1(a, 2, foo="Bar")
def match1():
  ...

@d1(baz="Boo", foo="Bar")
def match2():
  ...

This query will match both match1() and match2():

ModelQuery(
  name = "get_d1_decorator",
  find = "functions",
  where = Decorator(
    fully_qualified_name.matches("d1"),
    arguments.contains(foo="Bar")
  ),
  ...
)

arguments.equals() operates similarly, but will only match if the specified arguments match the decorator's arguments exactly. This means that for positional arguments, all arguments in each position must match by value exactly. Keyword arguments can be specified in a different order, but the set of specified keyword arguments and the set of the decorator's actual keyword arguments must be the same. For example, with the following Python code:

@d1(a, 2, foo="Bar", baz="Boo")
def match1():
  ...

@d1(a, 2, baz="Boo", foo="Bar")
def match2():
  ...

@d1(2, a, baz="Boo", foo="Bar")
def nomatch1():
  ...

@d1(a, 2, 3, baz="Boo", foo="Bar")
def nomatch2():
  ...

This query will match both match1() and match2(), but not nomatch1() or nomatch2():

ModelQuery(
  name = "get_d1_decorator",
  find = "functions",
  where = Decorator(
    fully_qualified_name.matches("d1"),
    arguments.equals(a, 2, foo="Bar", baz="Boo")
  ),
  ...
)
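The matching rules for arguments.contains() and arguments.equals() come down to two checks. contains() requires the positional arguments to form a prefix and the keyword arguments to form a subset. equals() requires both to be equal. The sketch below encodes these rules over a hypothetical representation in which each decorator argument is recorded as its source text, such as "a" or "2". It illustrates the rules described above and does not reflect how Pysa stores decorator arguments.

```python
from typing import Mapping, Sequence

# Hypothetical representation: each argument is the source text of the
# expression, so @d1(a, 2, foo="Bar") is ("a", "2") plus {"foo": '"Bar"'}.


def arguments_contains(
    actual_args: Sequence[str],
    actual_kwargs: Mapping[str, str],
    args: Sequence[str],
    kwargs: Mapping[str, str],
) -> bool:
    """Positional arguments must form a prefix; keywords must be a subset."""
    is_prefix = list(actual_args[: len(args)]) == list(args)
    is_subset = all(
        key in actual_kwargs and actual_kwargs[key] == value
        for key, value in kwargs.items()
    )
    return is_prefix and is_subset


def arguments_equals(
    actual_args: Sequence[str],
    actual_kwargs: Mapping[str, str],
    args: Sequence[str],
    kwargs: Mapping[str, str],
) -> bool:
    """Positional arguments must be identical; keywords compared as a set."""
    return list(actual_args) == list(args) and dict(actual_kwargs) == dict(kwargs)


# arguments.contains(a, 2)
print(arguments_contains(("a", "2"), {}, ("a", "2"), {}))            # True
print(arguments_contains(("a", "2", "3", "4"), {}, ("a", "2"), {}))  # True
print(arguments_contains(("2", "a"), {}, ("a", "2"), {}))            # False

# arguments.equals(a, 2, foo="Bar", baz="Boo"): keyword order is irrelevant
wanted = {"foo": '"Bar"', "baz": '"Boo"'}
print(arguments_equals(("a", "2"), {"baz": '"Boo"', "foo": '"Bar"'}, ("a", "2"), wanted))  # True
print(arguments_equals(("a", "2", "3"), wanted, ("a", "2"), wanted))                       # False
```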

Decorator `Not`, `AllOf` and `AnyOf` clauses

The Not, AllOf and AnyOf clauses can be used in decorators clauses in the same way as they are in the main where clause of the model query.

`cls.fully_qualified_name.equals` clause

You may use the cls clause to specify predicates on the class. This predicate can only be used when the find clause specifies methods or attributes.

The cls.fully_qualified_name.equals clause is used to model entities when the class's fully qualified name is an exact match for the specified string.

Example:

ModelQuery(
  name = "get_childOf_foo_Bar",
  find = "methods",
  where = cls.fully_qualified_name.equals("foo.Bar"),
  ...
)

`cls.fully_qualified_name.matches` clause

The cls.fully_qualified_name.matches clause is used to model entities when the class's fully qualified name matches the provided regex.

Example:

ModelQuery(
  name = "get_childOf_Foo",
  find = "methods",
  where = cls.fully_qualified_name.matches(".*Foo.*"),
  ...
)

`cls.name.matches` clause

The cls.name.matches clause is similar to cls.fully_qualified_name.matches, but matches against the actual name of the class, excluding modules.

`cls.name.equals` clause

The cls.name.equals clause is similar to cls.fully_qualified_name.equals, but matches against the actual name of the class, excluding modules.

`cls.extends` clause

The cls.extends clause is used to model entities when the class is a subclass of the provided class name.

Example:

ModelQuery(
  name = "get_subclassOf_C",
  find = "attributes",
  where = cls.extends("C"),
  ...
)

By default, it only matches if the class is the specified class itself or a direct subclass of it. For example, with classes:

class C:
  x = ...

class D(C):
  y = ...

class E(D):
  z = ...

the above query will only model the attributes C.x and D.y, since C is considered to extend itself, and D is a direct subclass of C. However, it will not model E.z, since E is a sub-subclass of C.

If you would like to model a class and all subclasses transitively, you can use the is_transitive flag.

Example:

ModelQuery(
  name = "get_transitive_subclassOf_C",
  find = "attributes",
  where = cls.extends("C", is_transitive=True),
  ...
)

This query will model C.x, D.y and E.z.

If you do not want to match on the class itself, you can use the includes_self flag.

Example:

ModelQuery(
  name = "get_transitive_subclassOf_C",
  find = "attributes",
  where = cls.extends("C", is_transitive=True, includes_self=False),
  ...
)

This query will model D.y and E.z.
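Python's own class hierarchy is enough to sketch the three flag combinations shown above. The helper below is hypothetical. It assumes the defaults described for cls.extends: direct subclasses only, with the class itself included. Pysa consults its statically built class hierarchy rather than Python's runtime one. The same two flags also appear on return_annotation.extends.

```python
def extends(
    cls: type, target: type, is_transitive: bool = False, includes_self: bool = True
) -> bool:
    """Sketch of cls.extends(target, is_transitive=..., includes_self=...)."""
    if cls is target:
        return includes_self
    if is_transitive:
        return issubclass(cls, target)  # any ancestor, however distant
    return target in cls.__bases__  # direct parents only


class C:
    x = ...


class D(C):
    y = ...


class E(D):
    z = ...


for cls in (C, D, E):
    print(
        cls.__name__,
        extends(cls, C),
        extends(cls, C, is_transitive=True),
        extends(cls, C, is_transitive=True, includes_self=False),
    )
# C True True False
# D True True True
# E False True True
```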

`cls.decorator` clause

The cls.decorator clause is used to specify constraints on a class decorator, so you can choose to model entities on classes only if the class it is part of has the specified decorator.

The arguments for this clause are identical to those of the non-class Decorator constraint; for more information, see the Decorator clauses section.

Example:

ModelQuery(
  name = "get_childOf_d1_decorator_sources",
  find = "methods",
  where = [
    name.equals("__init__"),
    cls.decorator(
      fully_qualified_name.matches("d1"),
      arguments.contains(2)
    ),
  ],
  model = [
    Parameters(TaintSource[Test], where=[
        Not(name.equals("self")),
        Not(name.equals("a"))
    ])
  ]
)

For example, the above query when run on the following code:

@d1(2)
class Foo:
  def __init__(self, a, b):
     ...

@d1()
class Bar:
  def __init__(self, a, b):
    ...

@d2(2)
class Baz:
  def __init__(self, a, b):
    ...

will result in a model for def Foo.__init__(b: TaintSource[Test]).

`cls.any_child` clause

The cls.any_child clause is used to model entities when any child of the current class meets the specified constraints.

The arguments for this clause are any combination of valid class constraints (cls.name.equals, cls.name.matches, cls.fully_qualified_name.equals, cls.fully_qualified_name.matches, cls.extends, cls.decorator) and logical clauses (AnyOf, AllOf, Not), along with the optional is_transitive and includes_self clauses.

Example:

ModelQuery(
  name = "get_parent_of_d1_decorator_sources",
  find = "methods",
  where = [
    name.equals("__init__"),
    cls.any_child(
      cls.decorator(
        fully_qualified_name.matches("d1"),
        arguments.contains(2)
      )
    ),
  ],
  model = [
    Parameters(TaintSource[Test], where=[
        Not(name.equals("self")),
        Not(name.equals("a"))
    ])
  ]
)

Similar to the cls.extends constraint, by default it only matches if the class of the method or attribute itself, or one of its immediate children, matches the inner clause. For example, with classes:

class Foo:
  def __init__(self, a, b):
     ...

class Bar(Foo):
  def __init__(self, a, b):
    ...

@d1(2)
class Baz(Bar):
  def __init__(self, a, b):
    ...

The above query will only model the methods Bar.__init__ and Baz.__init__. Baz is an immediate child of Bar, and Baz is considered to be its own child. However, it will not model Foo.__init__, since Baz is a sub-subclass of Foo rather than an immediate child.

If you would like to match against all descendants of a class transitively, you can use the is_transitive flag.

Example:

ModelQuery(
  name = "get_transitive_parent_of_d1_decorator_sources",
  find = "methods",
  where = [
    name.equals("__init__"),
    cls.any_child(
      cls.decorator(
        fully_qualified_name.matches("d1"),
        arguments.contains(2)
      ),
      is_transitive=True
    ),
  ],
  ...
)

This query will model Foo.__init__, Bar.__init__ and Baz.__init__.

If you would like to match only against descendants, excluding the class itself, you can use the includes_self flag.

Example:

ModelQuery(
  name = "get_transitive_parent_of_d1_decorator_sources",
  find = "methods",
  where = [
    name.equals("__init__"),
    cls.any_child(
      cls.decorator(
        fully_qualified_name.matches("d1"),
        arguments.contains(2)
      ),
      is_transitive=True,
      includes_self=False
    ),
  ],
  ...
)

This query will model Foo.__init__, Bar.__init__ but NOT Baz.__init__.
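The direction of cls.any_child can be sketched in Python by using __subclasses__() to walk down the hierarchy. The decorator d1 and the predicate below are hypothetical stand-ins for cls.decorator(fully_qualified_name.matches("d1"), arguments.contains(2)). The predicate reads only the class's own __dict__, so subclasses do not appear to inherit the decoration. Pysa's static class hierarchy may differ from this runtime view.

```python
def d1(*args):
    """Hypothetical class decorator that records its arguments on the class."""
    def wrap(cls):
        cls.d1_args = args
        return cls
    return wrap


def decorated_with_d1_2(cls: type) -> bool:
    # Stand-in for cls.decorator(...); ignores marks inherited from parents.
    return cls.__dict__.get("d1_args", ())[:1] == (2,)


def descendants(cls: type, transitive: bool) -> list:
    found = []
    for child in cls.__subclasses__():
        found.append(child)
        if transitive:
            found.extend(descendants(child, transitive=True))
    return found


def any_child(cls, predicate, is_transitive=False, includes_self=True) -> bool:
    candidates = ([cls] if includes_self else []) + descendants(cls, is_transitive)
    return any(predicate(candidate) for candidate in candidates)


class Foo:
    def __init__(self, a, b): ...


class Bar(Foo):
    def __init__(self, a, b): ...


@d1(2)
class Baz(Bar):
    def __init__(self, a, b): ...


for cls in (Foo, Bar, Baz):
    print(
        cls.__name__,
        any_child(cls, decorated_with_d1_2),
        any_child(cls, decorated_with_d1_2, is_transitive=True),
        any_child(cls, decorated_with_d1_2, is_transitive=True, includes_self=False),
    )
# Foo False True True
# Bar True True True
# Baz True True False
```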

tip

We recommend always specifying both is_transitive and includes_self to avoid confusion.

`cls.any_parent` clause

The cls.any_parent clause is used to model entities when any parent of the current class meets the specified constraints.

Example:

ModelQuery(
  name = "get_children_of_d1_decorator_sources",
  find = "methods",
  where = [
    name.equals("__init__"),
    cls.any_parent(
      cls.decorator(
        fully_qualified_name.matches("d1"),
        arguments.contains(2)
      )
    ),
  ],
  model = [
    Parameters(TaintSource[Test], where=[
        Not(name.equals("self")),
        Not(name.equals("a"))
    ])
  ]
)

Similar to the cls.extends constraint, by default it only matches if the class of the method or attribute itself, or one of its immediate parents, matches the inner clause. For example, with classes:

@d1(2)
class Foo:
  def __init__(self, a, b):
     ...

class Bar(Foo):
  def __init__(self, a, b):
    ...

class Baz(Bar):
  def __init__(self, a, b):
    ...

The above query will only model the methods Foo.__init__ and Bar.__init__. Foo matches the inner clause itself, and Foo is an immediate parent of Bar. However, it will not model Baz.__init__, since Foo is not an immediate parent of Baz.

If you would like to match against all ancestors of a class transitively, you can use the is_transitive flag.

Example:

ModelQuery(
  name = "get_transitive_children_of_d1_decorator_sources",
  find = "methods",
  where = [
    cls.any_parent(
      cls.decorator(
        fully_qualified_name.matches("d1"),
        arguments.contains(2)
      ),
      is_transitive=True
    ),
    name.equals("__init__")
  ],
  ...
)

This query will model Foo.__init__, Bar.__init__ and Baz.__init__.

If you would like to match only against ancestors, excluding the class itself, you can use the includes_self flag.

Example:

ModelQuery(
  name = "get_transitive_parent_of_d1_decorator_sources",
  find = "methods",
  where = [
    cls.any_parent(
      cls.decorator(
        fully_qualified_name.matches("d1"),
        arguments.contains(2)
      ),
      is_transitive=True,
      includes_self=False
    ),
    name.equals("__init__")
  ],
  ...
)

This query will model Bar.__init__, Baz.__init__ but NOT Foo.__init__.
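cls.any_parent walks the hierarchy in the opposite direction. In Python, that corresponds to __bases__ for immediate parents and __mro__ for all ancestors. The decorator d1 and its predicate are hypothetical stand-ins for the cls.decorator clause in the queries above. The sketch only approximates Pysa's static class hierarchy.

```python
def d1(*args):
    """Hypothetical class decorator that records its arguments on the class."""
    def wrap(cls):
        cls.d1_args = args
        return cls
    return wrap


def decorated_with_d1_2(cls: type) -> bool:
    # Stand-in for cls.decorator(...); ignores marks inherited from parents.
    return cls.__dict__.get("d1_args", ())[:1] == (2,)


def any_parent(cls, predicate, is_transitive=False, includes_self=True) -> bool:
    # __bases__: immediate parents; __mro__[1:]: every ancestor (ends with object).
    parents = list(cls.__mro__[1:]) if is_transitive else list(cls.__bases__)
    candidates = ([cls] if includes_self else []) + parents
    return any(predicate(candidate) for candidate in candidates)


@d1(2)
class Foo:
    def __init__(self, a, b): ...


class Bar(Foo):
    def __init__(self, a, b): ...


class Baz(Bar):
    def __init__(self, a, b): ...


for cls in (Foo, Bar, Baz):
    print(
        cls.__name__,
        any_parent(cls, decorated_with_d1_2),
        any_parent(cls, decorated_with_d1_2, is_transitive=True),
        any_parent(cls, decorated_with_d1_2, is_transitive=True, includes_self=False),
    )
# Foo True True False
# Bar True True True
# Baz False True True
```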

tip

We recommend always specifying both is_transitive and includes_self to avoid confusion.

`any_overriden_method` clause

The any_overriden_method clause is used to model methods that override a method meeting the specified constraints.

This clause accepts a single argument: the constraint on the overridden method. It can use any constraint valid for methods, including AllOf, AnyOf and Not.

Example:

ModelQuery(
  name = "get_Foo_bar_overrides",
  find = "methods",
  where = [
    name.equals("bar"),
    any_overriden_method(
      cls.name.equals("Foo")
    )
  ],
  model = [
    Parameters(TaintSource[Test], where=[
        Not(name.equals("self")),
        Not(name.equals("a"))
    ])
  ]
)

This will add sources to all methods that override the method Foo.bar.

tip

This constraint is expensive to compute. To speed up model generation, we recommend putting cheaper constraints before it so that evaluation short-circuits. For instance, this example places name.equals("bar") before any_overriden_method, which makes model generation faster.

To use multiple constraints inside any_overriden_method, use AllOf:

ModelQuery(
  name = "get_Foo_bar_overrides",
  find = "methods",
  where = [
    name.equals("bar"),
    any_overriden_method(AllOf(
      cls.name.equals("Foo"),
      Decorator(fully_qualified_callee.equals("my_module.my_decorator")),
    ))
  ],
  model = [
    Parameters(TaintSource[Test], where=[
        Not(name.equals("self")),
        Not(name.equals("a"))
    ])
  ]
)

To match all method overrides, use any_overriden_method(True). True is a constraint that is always met.
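One way to picture any_overriden_method in plain Python is this: a class overrides a method when it defines the method itself and some ancestor in its MRO also defines it. The sketch below applies a predicate to each such ancestor. The helper is hypothetical and is spelled any_overridden_method to keep it distinct from the DSL clause. Pysa's actual override graph may be built differently.

```python
from typing import Callable


def any_overridden_method(
    cls: type, method_name: str, predicate: Callable[[type], bool]
) -> bool:
    """True if cls defines method_name itself and an ancestor that also
    defines it satisfies predicate."""
    if method_name not in cls.__dict__:
        return False  # inherited, not overridden
    return any(
        predicate(base)
        for base in cls.__mro__[1:]
        if method_name in base.__dict__
    )


class Foo:
    def bar(self, a, b): ...


class Child(Foo):
    def bar(self, a, b): ...  # overrides Foo.bar


class Inherits(Foo):
    pass  # uses Foo.bar without overriding it


class Unrelated:
    def bar(self, a, b): ...


def is_foo(base: type) -> bool:
    return base.__name__ == "Foo"  # plays the role of cls.name.equals("Foo")


for cls in (Foo, Child, Inherits, Unrelated):
    print(cls.__name__, any_overridden_method(cls, "bar", is_foo))
# Foo False
# Child True
# Inherits False
# Unrelated False

# Analogue of any_overriden_method(True): any override at all.
print(any_overridden_method(Child, "bar", lambda base: True))  # True
```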

`Not` clauses

The Not clause negates any existing clause that is valid for the entity being modelled.

Example:

ModelQuery(
  name = "get_Not_example",
  find = "methods",
  where = [
    Not(
      name.matches("foo.*"),
      cls.fully_qualified_name.matches("testing.unittest.UnitTest"),
    )
  ],
  model = ...
)

Generated models (Model clauses)

The last part of a model query generates models for all entities that match the provided where clauses. For callables, we support generating models for parameters by name or position, as well as for all parameters. We also support generating models for the return value.

Returned taint

Returned taint takes the form of Returns(TaintSpecification), where TaintSpecification is either a taint annotation or a list of taint annotations.

ModelQuery(
  name = "get_Returns_sources",
  find = "methods",
  where = ...,
  model = [
    Returns(TaintSource[Test, Via[foo]])
  ]
)

Parameter taint

Parameters can be tainted using the Parameters() clause. By default, all parameters will be tainted with the supplied taint specification. To taint only parameters that meet certain conditions, add an optional where clause. It can constrain parameter names, parameter type annotations, or parameter positions. For example:

ModelQuery(
  name = "get_Parameters_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(TaintSource[A]), # will taint all parameters by default
    Parameters(
      TaintSource[B],
      where=[
        Not(index.equals(0))   # will only taint parameters that are not the first parameter
      ]
    ),
  ]
)

`name` clauses

To specify a constraint on parameter name, the name.equals() or name.matches() clauses can be used. As in the main where clause of the model query, equals() searches for an exact match on the specified string, while matches() allows a regex to be supplied as a pattern to match against.

Example:

ModelQuery(
  name = "get_request_data_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(
      TaintSource[Test],
      where=[
        AnyOf(
          name.equals("request"),
          name.matches("data$")
        )
      ]
    )
  ]
)
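This sketch assumes that every predicate in a Parameters where list must hold, as it must in the top-level where clause. Under that assumption, listing name.equals("request") and name.matches("data$") side by side selects no parameter at all, because no name satisfies both. Wrapping them in AnyOf selects parameters that satisfy either one. The sketch mimics both readings over a Python signature using inspect. The helper names are hypothetical.

```python
import inspect
import re


def select_all_of(func) -> list:
    # where=[name.equals("request"), name.matches("data$")]: both must hold.
    return [
        name
        for name in inspect.signature(func).parameters
        if name == "request" and re.search(r"data$", name)
    ]


def select_any_of(func) -> list:
    # where=[AnyOf(name.equals("request"), name.matches("data$"))]: either may hold.
    return [
        name
        for name in inspect.signature(func).parameters
        if name == "request" or re.search(r"data$", name)
    ]


def view(request, form_data, user_id, metadata, data_source): ...


print(select_all_of(view))  # []
print(select_any_of(view))  # ['request', 'form_data', 'metadata']
```

Because matches() is a partial match, metadata is selected as well. Use a pattern such as "_data$" if that is not intended.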

`index` clause

To specify a constraint on parameter position, the index.equals() clause can be used. It takes a single integer denoting the position of the parameter.

Example:

ModelQuery(
  name = "get_index_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(
      TaintSource[Test],
      where=[
        index.equals(1)
      ]
    )
  ]
)

`has_position` clause

To match on parameters that have a position, the has_position() clause can be used. This is mostly used to exclude keyword-only parameters, *args and **kwargs.

Example:

ModelQuery(
  name = "get_positional_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(
      TaintSource[Test],
      where=[
        has_position()
      ]
    )
  ]
)

`has_name` clause

To match on parameters that have a name, the has_name() clause can be used. This is mostly used to exclude *args and **kwargs.

Example:

ModelQuery(
  name = "get_named_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(
      TaintSource[Test],
      where=[
        has_name()
      ]
    )
  ]
)
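The sketch below approximates index, has_position and has_name using the parameter kinds that Python's inspect module reports. The mapping is an assumption with three parts. Positions are counted from 0 over positional parameters only, with self at index 0 as in the Not(index.equals(0)) example above. Keyword-only parameters have a name but no position. *args and **kwargs have neither. The helper names are hypothetical.

```python
import inspect

POSITIONAL = (inspect.Parameter.POSITIONAL_ONLY, inspect.Parameter.POSITIONAL_OR_KEYWORD)
VARIADIC = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)


def parameters(func) -> list:
    return list(inspect.signature(func).parameters.values())


def index_equals(func, index: int) -> list:
    """Names of parameters at the given position (0-based, self included)."""
    positional = [p.name for p in parameters(func) if p.kind in POSITIONAL]
    return positional[index : index + 1]


def has_position(func) -> list:
    """Drops keyword-only parameters, *args and **kwargs."""
    return [p.name for p in parameters(func) if p.kind in POSITIONAL]


def has_name(func) -> list:
    """Drops only *args and **kwargs."""
    return [p.name for p in parameters(func) if p.kind not in VARIADIC]


class Handler:
    def handle(self, request, flags, *args, verbose, **kwargs): ...


print(index_equals(Handler.handle, 1))  # ['request']
print(has_position(Handler.handle))     # ['self', 'request', 'flags']
print(has_name(Handler.handle))         # ['self', 'request', 'flags', 'verbose']
```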

`type_annotation` clause

This clause is used to specify a constraint on parameter type annotation. Currently the clauses supported are: type_annotation.equals(), which takes the fully-qualified name of a Python type or class and matches when there is an exact match, type_annotation.matches(), which takes a regex pattern to match type annotations against, and type_annotation.is_annotated_type(), which will match parameters of type typing.Annotated.

Example:

ModelQuery(
  name = "get_annotated_parameters_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(
      TaintSource[Test],
      where=[
        AnyOf(
          type_annotation.equals("foo.bar.C"),         # exact match
          type_annotation.matches("^typing\.List\["),  # regex match
          type_annotation.is_annotated_type(),         # matches Annotated[T, x]
        )
      ]
    )
  ]
)

To match on the annotation portion of Annotated types, consider the following example. Suppose this code was in test.py:

from enum import Enum
from typing import Annotated, Optional

class Color(Enum):
    RED = 1
    GREEN = 2
    BLUE = 3

class Foo:
  x: Annotated[Optional[int], Color.RED]
  y: Annotated[Optional[int], Color.BLUE]
  z: Annotated[int, "z"]

Note that the type name that should be matched against is its fully qualified name, which also includes the fully qualified name of any other types referenced (for example, typing.Optional rather than just Optional). When multiple arguments are provided to the type they are implicitly treated as being in a tuple.

Here are some examples of where clauses that can be used to specify models for the annotated attributes in this case:

ModelQuery(
  name = "get_annotated_attributes_sources",
  find = "attributes",
  where = [
    AnyOf(
      type_annotation.equals("typing.Annotated[(typing.Optional[int], test.Color.RED)]"),
      type_annotation.equals("typing.Annotated[(int, z)]"),
      type_annotation.matches(".*Annotated\[.*Optional\[int\].*Color\..*\]"),
      type_annotation.is_annotated_type()
    )
  ],
  model = [
    AttributeModel(TaintSource[Test]),
  ]
)

This query should generate the following models:

test.Foo.x: TaintSource[Test]
test.Foo.y: TaintSource[Test]
test.Foo.z: TaintSource[Test]
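When writing type_annotation.matches patterns, keep in mind that square brackets are regex metacharacters: [int] is a character class matching a single i, n or t, so the brackets must be escaped to match [int] literally. The Python sketch below checks this with the re module. It assumes, as the earlier name.matches("foo") example suggests, that matches succeeds when the pattern is found anywhere in the string (like re.search), and it assumes the annotation string for y follows the same format as the one shown for x.

```python
import re

# Fully qualified annotation strings for the attributes of test.Foo,
# written in the format used by the type_annotation.equals examples above.
annotations = {
    "test.Foo.x": "typing.Annotated[(typing.Optional[int], test.Color.RED)]",
    "test.Foo.y": "typing.Annotated[(typing.Optional[int], test.Color.BLUE)]",
    "test.Foo.z": "typing.Annotated[(int, z)]",
}

escaped = r".*Annotated\[.*Optional\[int\].*Color\..*\]"
unescaped = r".*Annotated\[.*Optional[int].*Color\..*\]"


def matching(pattern):
    # Assumes search semantics: the pattern may match anywhere in the string.
    return sorted(name for name, ann in annotations.items() if re.search(pattern, ann))


print(matching(escaped))    # ['test.Foo.x', 'test.Foo.y']
print(matching(unescaped))  # [] because "[int]" needs one of i, n, t after "Optional"
```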

`Not`, `AllOf` and `AnyOf` clauses

The Not, AllOf and AnyOf clauses work the same way as in the main where clause of the model query: Not negates any existing clause, AllOf matches when all of several supplied clauses match, and AnyOf matches when at least one of several supplied clauses matches.

Example:

ModelQuery(
  name = "get_Not_AnyOf_AllOf_example_sources",
  find = "methods",
  where = ...,
  model = [
    Parameters(
      TaintSource[Test],
      where=[
        Not(
          AnyOf(
            AllOf(
              cls.extends("a.b"),
              cls.name.matches("Foo"),
            ),
            AllOf(
              cls.extends("c.d"),
              cls.name.matches("Bar")
            )
          )
        )
      ]
    )
  ]
)
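The semantics of these combinators can be sketched as ordinary predicate composition in Python. This is an illustrative model only: the Not, AllOf and AnyOf functions, the dictionary representation of a candidate, and the simplified stand-ins for cls.extends and cls.name.matches are all hypothetical. The nesting mirrors the example above.

```python
import re


# Hypothetical predicate combinators mirroring Not, AllOf and AnyOf.
def Not(clause):
    return lambda entity: not clause(entity)


def AllOf(*clauses):
    return lambda entity: all(clause(entity) for clause in clauses)


def AnyOf(*clauses):
    return lambda entity: any(clause(entity) for clause in clauses)


# Simplified stand-ins for cls.extends and cls.name.matches.
def cls_extends(base):
    return lambda entity: base in entity["bases"]


def cls_name_matches(pattern):
    return lambda entity: re.search(pattern, entity["class_name"]) is not None


# Same nesting as the Not/AnyOf/AllOf example above.
clause = Not(
    AnyOf(
        AllOf(cls_extends("a.b"), cls_name_matches("Foo")),
        AllOf(cls_extends("c.d"), cls_name_matches("Bar")),
    )
)

print(clause({"class_name": "FooImpl", "bases": {"a.b"}}))  # False
print(clause({"class_name": "BarImpl", "bases": {"a.b"}}))  # True
```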

Using `ViaTypeOf` with the `Parameters` clause

Usually, when specifying a ViaTypeOf, you must name the argument whose type should be captured. However, when writing model queries that find all parameters matching certain conditions, we may not know the exact names of the parameters that will be modelled. For example:

def f1(bad_1, good_1, good_2):
  pass

def f2(good_3, bad_2, good_4):
  pass

Suppose we want to model all parameters with the prefix bad_ here and attach a ViaTypeOf to them. This is still possible by using a standalone ViaTypeOf, which applies to whichever parameter is being modelled:

ModelQuery(
  name = "get_f_sinks",
  find = "functions",
  where = name.matches("f"),
  model = [
    Parameters(
      TaintSink[Test, ViaTypeOf],
      where=[
        name.matches("bad_")
      ]
    )
  ]
)

This would produce models equivalent to the following:

def f1(bad_1: TaintSink[Test, ViaTypeOf[bad_1]]): ...
def f2(bad_2: TaintSink[Test, ViaTypeOf[bad_2]]): ...
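One way to picture the standalone ViaTypeOf is as a rewrite step that fills in the name of each matched parameter. The Python sketch below reproduces the two models above from the parameter lists of f1 and f2. The function expand_models is hypothetical and only illustrates the expansion, not how Pysa implements it.

```python
import re


def expand_models(functions, pattern):
    """Build one model string per function with matching parameters (hypothetical).

    Each matching parameter gets TaintSink[Test, ViaTypeOf[<its own name>]].
    """
    models = []
    for func_name, params in functions.items():
        annotated = [
            f"{p}: TaintSink[Test, ViaTypeOf[{p}]]"
            for p in params
            if re.search(pattern, p)
        ]
        if annotated:
            models.append(f"def {func_name}({', '.join(annotated)}): ...")
    return models


functions = {
    "f1": ["bad_1", "good_1", "good_2"],
    "f2": ["good_3", "bad_2", "good_4"],
}
for model in expand_models(functions, "bad_"):
    print(model)
```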

Models for attributes

Taint for attribute models requires an AttributeModel model clause, which can only be used when the find clause specifies attributes.

Example:

ModelQuery(
  name = "get_attribute_sources_sinks",
  find = "attributes",
  where = ...,
  model = [
    AttributeModel(TaintSource[Test], TaintSink[Test])
  ]
)

Using `ViaAttributeName` with the `AttributeModel` clause

ViaAttributeName can be used within AttributeModel to add a feature containing the name of the attribute to any taint flowing through the given attributes.

For instance:

ModelQuery(
  name = "get_attribute_of_Foo",
  find = "attributes",
  where = [cls.name.equals("Foo")],
  model = [
    AttributeModel(ViaAttributeName[WithTag["Foo"]])
  ]
)

On the following code:

class Foo:
  first_name: str
  last_name: str

def last_name_to_sink(foo: Foo):
  sink(foo.last_name)

This will add the feature via-Foo-attribute:last_name on the flow to the sink.

Models for globals

Taint for global models requires a GlobalModel model clause, which can only be used when the find clause specifies globals.

Example:

ModelQuery(
  name = "get_global_sources",
  find = "globals",
  where = ...,
  model = [
    GlobalModel(TaintSource[Test])
  ]
)

Models for setting modes

This model clause differs from the others in this section: rather than producing taint for the entities it targets, it updates their models with specific modes that change how the taint analysis treats them.

The available modes are:

Obscure: Marks the function or method as obscure;
SkipObscure: Prevents the function or method from being marked as obscure;
SkipAnalysis: Skips inference for the targeted function or method, and forces the use of user-defined models for taint flow;
SkipOverrides: Prevents taint from propagating into and from the methods that override the targeted method in subclasses;
Entrypoint: Marks the function or method as an entrypoint for analysis, so that only the entrypoints and the functions they transitively call are analyzed;
SkipModelBroadening: Prevents model broadening for the given function or method.

For instance, instead of annotating each function separately, as in the following .pysa file:

@Entrypoint
def myfile.func1(): ...

@Entrypoint
def myfile.func2(): ...

@Entrypoint
def myfile.func3(): ...

@Entrypoint
def myfile.func4(): ...

One could instead use the following model query:

ModelQuery(
  name = "get_myfile_entrypoint_functions",
  find = "functions",
  where = [
    fully_qualified_name.matches("myfile\.func.*")
  ],
  model = [
    Modes([Entrypoint])
  ]
)

The benefit is that any new function whose name matches the pattern will also be considered an entrypoint, without any change to the .pysa file.

Note that it is also possible to include multiple modes in a Modes model clause by extending the list (e.g. Modes([SkipOverrides, Obscure])).

Expected and Unexpected Models clauses

The optional expected_models and unexpected_models clauses allow you to specify models that your ModelQuery should or should not generate an equivalent of. The models in these clauses must be syntactically correct Pysa models (see the Pysa documentation on writing models for a guide). If your query does not generate a model listed in expected_models, or if it generates a model listed in unexpected_models, an error will be raised.

Example:

ModelQuery(
  name = "get_foo_returns_sources",
  find = "functions",
  where = [name.matches("foo")],
  model = [
    Returns(TaintSource[Test]),
  ],
  expected_models = [
    "def file.foo() -> TaintSource[Test]: ...",
    "def file.foo2() -> TaintSource[Test]: ..."
  ],
  unexpected_models = [
    "def file.bar() -> TaintSource[Test]: ..."
  ]
)

This would not produce any errors: the query generates models for file.foo and file.foo2, which are listed in expected_models, and does not generate one for file.bar, whose name does not match foo.
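The check performed by these clauses amounts to two set comparisons. Below is a minimal Python sketch, assuming the generated models are available as normalized strings; Pysa compares models rather than raw text, so string equality is a simplification here. The function check_expectations and its error messages are hypothetical, not Pysa's actual output.

```python
def check_expectations(generated, expected, unexpected):
    """Return a list of error messages (hypothetical format)."""
    generated = set(generated)
    errors = []
    for model in expected:
        if model not in generated:
            errors.append(f"expected model was not generated: {model}")
    for model in unexpected:
        if model in generated:
            errors.append(f"unexpected model was generated: {model}")
    return errors


generated = [
    "def file.foo() -> TaintSource[Test]: ...",
    "def file.foo2() -> TaintSource[Test]: ...",
]
expected = list(generated)
unexpected = ["def file.bar() -> TaintSource[Test]: ..."]
print(check_expectations(generated, expected, unexpected))  # []
```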

Cache Queries

Generating models for a large number of queries can be quite slow. Cache queries speed up model generation by factoring a where clause shared by many queries into a single query, which builds a mapping from arbitrary names to sets of matching entities. Other queries can then read from this cache, which makes them quick to execute.

For instance, imagine having the following queries:

ModelQuery(
  ...
  find = "methods",
  where = [
    AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar")),
    fully_qualified_name.matches("\.ClassA\.method$"),
  ],
  model = ...
)
ModelQuery(
  ...
  find = "methods",
  where = [
    AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar")),
    fully_qualified_name.matches("\.ClassB\.method$"),
  ],
  model = ...
)
ModelQuery(
  ...
  find = "methods",
  where = [
    AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar")),
    fully_qualified_name.matches("\.ClassC\.other_method$"),
  ],
  model = ...
)
# etc.

We can factor out the expensive where clause into a single query which writes to a key-value cache, using the WriteToCache clause.

ModelQuery(
  ...
  find = "methods",
  where = [AnyOf(cls.extends("my_module.Foo"), cls.extends("other_module.Bar"))],
  model = WriteToCache(kind="FooBar", name=f"{class_name}:{function_name}")
)

All matching methods will be stored in a cache named FooBar, under the key {class_name}:{function_name}.

After executing the query, we might get the following cache FooBar:

ClassA:method -> {some_module.ClassA.method}
ClassB:method -> {some_other_module.ClassB.method}
ClassC:other_method -> {some_module.ClassC.other_method}
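Conceptually, the cache written by WriteToCache maps each rendered key to a set of entities. The Python sketch below rebuilds the FooBar cache shown above from a hypothetical list of matching methods (the module names are taken from that cache); it is an illustration, not Pysa's data structure. Keying by class name alone would collect several methods of the same class under one key.

```python
from collections import defaultdict

# Hypothetical methods that satisfied the where clause of the caching query,
# as (module, class_name, function_name) triples.
matches = [
    ("some_module", "ClassA", "method"),
    ("some_other_module", "ClassB", "method"),
    ("some_module", "ClassC", "other_method"),
]


def write_to_cache(entities, key):
    """Group fully qualified names under the key computed by `key`."""
    cache = defaultdict(set)
    for module, class_name, function_name in entities:
        cache[key(class_name, function_name)].add(f"{module}.{class_name}.{function_name}")
    return dict(cache)


foobar = write_to_cache(matches, lambda class_name, function_name: f"{class_name}:{function_name}")
print(foobar["ClassA:method"])  # {'some_module.ClassA.method'}
```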

We can then read from the cache using the where clause read_from_cache:

ModelQuery(
  find = "methods",
  where = read_from_cache(kind="FooBar", name="ClassA:method"),
  model = ...
)
ModelQuery(
  find = "methods",
  where = read_from_cache(kind="FooBar", name="ClassB:method"),
  model = ...
)
ModelQuery(
  find = "methods",
  where = read_from_cache(kind="FooBar", name="ClassC:other_method"),
  model = ...
)

This will generate the same models as the first example, but model generation will be a lot faster.

In terms of time complexity, if the number of entities (methods here) is N, the number of queries is Q and the average cost of evaluating a where clause is C, the first example has O(N*Q*C) complexity, since every query evaluates its where clause on every entity. With cache queries, the expensive where clause is evaluated once per entity and each query becomes a cache lookup, so the complexity drops to O(N*C+Q).
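The difference can be made concrete by counting how often the expensive where clause is evaluated. In this hypothetical Python sketch, expensive_where stands in for the shared AnyOf(cls.extends(...)) clause, each entity is keyed by its own name, and each query looks up one key. N and Q are arbitrary sizes chosen for illustration.

```python
from collections import defaultdict

N, Q = 1000, 50  # number of entities and number of queries (arbitrary)
entities = [f"Class{i}:method" for i in range(N)]
calls = 0


def expensive_where(entity):
    """Stand-in for a costly where clause; counts its own evaluations."""
    global calls
    calls += 1
    return True


# Without a cache: every query re-evaluates the where clause on every entity.
calls = 0
for q in range(Q):
    key = entities[q]
    _ = [e for e in entities if expensive_where(e) and e == key]
naive_calls = calls

# With a cache: evaluate once per entity, then each query is a dictionary lookup.
calls = 0
cache = defaultdict(set)
for e in entities:
    if expensive_where(e):
        cache[e].add(e)
for q in range(Q):
    _ = cache.get(entities[q], set())
cached_calls = calls

print(naive_calls, cached_calls)  # 50000 1000
```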

WriteToCache clause

WriteToCache is a model clause that is used to store entities into a cache. It takes the following arguments:

A kind, which is the name of the cache.
A name as a format string, which will be the key for the entity in the cache.

For instance:

ModelQuery(
  ...
  find = "methods",
  model = WriteToCache(kind="cache_name", name=f"{class_name}:{function_name}")
)

Note that you can write multiple entities under the same name. For instance, this happens if you use name=f"{class_name}" and multiple methods of the same class match against the where clause.

read_from_cache clause

read_from_cache is a where clause that only matches the entities stored under the given name in the cache. It takes the following arguments:

A kind, which is the name of the cache.
A name as a string, which is the key for the entities in the cache.

For instance:

ModelQuery(
  find = "methods",
  where = read_from_cache(kind="cache_name", name="Class:method"),
  model = ...
)

Note that you can use read_from_cache in combination with other where clauses, as long as every branch of the where clause contains at least one read_from_cache clause.

For instance, the following is disallowed, because the cls.extends branch of the AnyOf does not read from the cache:

ModelQuery(
  find = "methods",
  where = AnyOf(
    read_from_cache(kind="cache_name", name="Class:method"),
    cls.extends("module.Foo")
  ),
  model = ...
)
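The branch rule can be expressed as a recursive check over the where clause. The Python sketch below uses a hypothetical tuple encoding of clauses. It is one interpretation of the rule, not Pysa's validator, and assumes that an AllOf is covered when any child reads from the cache, that an AnyOf is covered only when every child is, and that other clauses (including Not) never count.

```python
def reads_cache_on_all_branches(clause):
    """Return True if every branch of `clause` contains a read_from_cache."""
    kind = clause[0]
    if kind == "read_from_cache":
        return True
    if kind == "AllOf":
        # One child reading from the cache constrains the whole conjunction.
        return any(reads_cache_on_all_branches(c) for c in clause[1:])
    if kind == "AnyOf":
        # Every alternative must read from the cache on its own.
        return all(reads_cache_on_all_branches(c) for c in clause[1:])
    return False


disallowed = ("AnyOf",
              ("read_from_cache", "cache_name", "Class:method"),
              ("cls.extends", "module.Foo"))
allowed = ("AllOf",
           ("read_from_cache", "cache_name", "Class:method"),
           ("cls.extends", "module.Foo"))
print(reads_cache_on_all_branches(disallowed))  # False
print(reads_cache_on_all_branches(allowed))     # True
```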

Format strings

Format strings can be used to craft a string using information from the matched entity. They can be used in the WriteToCache name argument as well as the CrossRepositoryTaintAnchor canonical name and port arguments.

For instance:

WriteToCache(kind="cache_name", name=f"{class_name}:{function_name}")
CrossRepositoryTaintAnchor[TaintSink[Thrift], f"{class_name}:{function_name}", f"formal({parameter_position + 1})"]

The following variables can be used:

function_name: The (non-qualified) name of the function;
method_name: The (non-qualified) name of the method;
class_name: The (non-qualified) name of the class;
parameter_name: The parameter name, when used within the Parameters clause;
parameter_position: The parameter position, when used within the Parameters clause. This will give -1 for keyword-only parameters;
capture(identifier): The regular expression capture group called identifier. See documentation below.

Math operators such as +, - and * can be used on parameter_position and integer literals, such as f"{parameter_position * 2 + 1}".
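To see how these variables and arithmetic combine, here is a small Python renderer for such templates. It is a hypothetical sketch, not Pysa's implementation: the f prefix of the DSL is dropped, identifiers are substituted from a dictionary of sample values, only +, - and * on integers are evaluated (using Python's ast module), and capture(...) is not supported.

```python
import ast
import operator
import re

_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def _evaluate(node, variables):
    """Evaluate a restricted expression: names, integer literals, +, -, *."""
    if isinstance(node, ast.Expression):
        return _evaluate(node.body, variables)
    if isinstance(node, ast.Name):
        return variables[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        left = _evaluate(node.left, variables)
        right = _evaluate(node.right, variables)
        return _OPS[type(node.op)](left, right)
    raise ValueError(f"unsupported expression: {ast.dump(node)}")


def render(template, variables):
    """Replace each {expression} in `template` with its evaluated value."""
    return re.sub(
        r"\{([^{}]+)\}",
        lambda m: str(_evaluate(ast.parse(m.group(1), mode="eval"), variables)),
        template,
    )


variables = {"class_name": "Foo", "function_name": "bar", "parameter_position": 2}
print(render("{class_name}:{function_name}", variables))      # Foo:bar
print(render("formal({parameter_position + 1})", variables))  # formal(3)
print(render("{parameter_position * 2 + 1}", variables))      # 5
```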

Regular expression capture

The name.matches and cls.name.matches clauses can use named capture groups, whose matched text can be used in the name of WriteToCache clauses.

For instance:

ModelQuery(
  find = "functions",
  where = name.matches("^get_(?P<attribute>[a-z]+)$"),
  model = WriteToCache(kind="cache_name", name=f"{capture(attribute)}")
)

For a function named get_foo, this will store the function in the cache under the key foo.

Caution: be careful when using regular expression captures. If the capture group is not found (e.g. because of a typo in its name), WriteToCache will use the empty string.

Note that we do not support numbered capture groups, such as Foo(.*).
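Python's re module supports the same (?P<name>...) syntax for named groups, so the key computation in the example above can be sketched as follows. The helper capture is hypothetical; it returns the empty string for an unknown group name, mirroring the behavior described in the caution.

```python
import re

PATTERN = re.compile(r"^get_(?P<attribute>[a-z]+)$")


def capture(match, identifier):
    """Return the text of a named group, or "" if the group does not exist."""
    return match.groupdict().get(identifier) or ""


m = PATTERN.search("get_foo")
print(capture(m, "attribute"))        # foo
print(repr(capture(m, "atribute")))   # '' (typo in the group name)
```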

Logging group clauses

The logging_group_name clause specifies that the model query should be considered part of the given group for logging purposes. This is useful when auto-generating large numbers of model queries. When verbose logging is enabled (-n), Pysa will print a single line, Model Query group 'XXX' generated YYY models, instead of printing one line per model query in the group.

For instance:

ModelQuery(
  name = "generated_dangerous_foo",
  logging_group_name = "generated_dangerous",
  find = "methods",
  where = read_from_cache(kind="annotated", name="foo"),
  model = ...
)
ModelQuery(
  name = "generated_dangerous_bar",
  logging_group_name = "generated_dangerous",
  find = "methods",
  where = read_from_cache(kind="annotated", name="bar"),
  model = ...
)
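The effect on the log output can be sketched as summing per-query model counts by logging_group_name. In this hypothetical Python sketch, the query names come from the example above, the model counts are made-up sample values, the group line follows the message quoted earlier, and the per-query line format is an invented placeholder; Pysa's real logging code is not shown here.

```python
from collections import Counter

# Hypothetical (query name, logging group or None, number of generated models).
results = [
    ("generated_dangerous_foo", "generated_dangerous", 3),
    ("generated_dangerous_bar", "generated_dangerous", 4),
    ("get_foo_sources", None, 2),
]


def log_lines(results):
    """One line per logging group, plus one line per ungrouped query."""
    group_totals = Counter()
    lines = []
    for name, group, count in results:
        if group is None:
            # Placeholder format for ungrouped queries (assumption).
            lines.append(f"Model Query '{name}' generated {count} models")
        else:
            group_totals[group] += count
    for group, total in group_totals.items():
        lines.append(f"Model Query group '{group}' generated {total} models")
    return lines


for line in log_lines(results):
    print(line)
```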
