Dynamically Generating Models

Some sources and sinks may be too numerous, or may change too rapidly, to be defined statically. For these scenarios, Pysa has the concept of model generators, which generate taint models by reading the project's source code before static analysis starts. The current set of model generators is stored in tools/generate_taint_models within the pyre-check repository.

Pysa now has the concept of a Model DSL, which supports some model generation use cases that could previously only be handled with model generators. You should prefer the Model DSL if it supports your use case.

Running Model Generators

The majority of model generators require access to a running environment. For example, the RESTApiSourceGenerator needs to be able to access the urlpatterns configured for Django, meaning it has to import (and implicitly run) the file you use to configure routing. The recommended way to run model generators is to set up a small script within your repository that can run within the virtual environment for your project. This tutorial exercise provides an example of how to set up and use model generators.

Example Model Generators

The set of model generators is always changing, but below are some examples of model generators which are currently provided out of the box with Pysa.

RESTApiSourceGenerator

This model generator is intended to taint all arguments to Django view functions as UserControlled. This is useful when you have views that receive user-controlled data as arguments separate from the HttpRequest parameter, such as when capturing values from the request path.
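To make the idea concrete, the following is a minimal sketch of what "tainting all arguments of a view" amounts to. It does not use the real RESTApiSourceGenerator; the helper `user_controlled_model` and the view `user_profile` are hypothetical names introduced only for illustration, and the sketch assumes the view has plain positional parameters (keyword-only markers are not handled).

```python
import inspect
from typing import Callable


def user_controlled_model(view: Callable[..., object]) -> str:
    """Render a Pysa-style model that marks every parameter as UserControlled.

    Hypothetical helper for illustration; not part of pyre-check.
    """
    parameters = []
    for name, parameter in inspect.signature(view).parameters.items():
        # Keep the star prefixes so *args / **kwargs stay recognizable.
        if parameter.kind is inspect.Parameter.VAR_POSITIONAL:
            prefix = "*"
        elif parameter.kind is inspect.Parameter.VAR_KEYWORD:
            prefix = "**"
        else:
            prefix = ""
        parameters.append(f"{prefix}{name}: TaintSource[UserControlled]")
    qualified_name = f"{view.__module__}.{view.__qualname__}"
    return f"def {qualified_name}({', '.join(parameters)}): ..."


# Hypothetical Django-style view: `user_id` would be captured from the request path.
def user_profile(request, user_id):
    return f"profile {user_id}"


print(user_controlled_model(user_profile))
```

The printed line has the shape of a model you could otherwise write by hand in a .pysa file; a generator simply produces one such line for every view it discovers.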

ExitNodeGenerator

This generator is intended to taint all data returned from Django view functions as ReturnedToUser. This is useful when you have decorators which allow your view functions to return raw Python types, rather than HttpResponse objects. Note that you do not need this generator if you always construct HttpResponse objects, because they are already annotated as ReturnedToUser sinks.

GraphQLSourceGenerator

This model generator is similar to the RESTApiSourceGenerator and ExitNodeGenerator discussed above, but it is intended to generate models with UserControlled and ReturnedToUser annotations for graphene-style GraphQL resolver functions.

AnnotatedFreeFunctionWithDecoratorGenerator

This model generator provides general-purpose functionality to annotate all free functions which have a given decorator. The annotations can be used to mark any of the function's arguments or return types as sources, sinks, features, etc. This is useful whenever you have a function which modifies taint analysis expectations. For example, if you had a decorator which applies rate limiting to functions, you could use this model generator to add a feature to all flows passing through rate-limited functions, so that you can filter them out from a given rule.
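The rate-limiting scenario can be sketched as follows. This is not the implementation of AnnotatedFreeFunctionWithDecoratorGenerator; it is a small standalone approximation built on Python's `ast` module. The helper names, the module name `my_app.api`, the decorator `rate_limited`, and the feature name in the annotation are all assumptions made for the example.

```python
import ast
from typing import List


def decorator_name(node: ast.expr) -> str:
    """Return the bare name for @name, @name(...) or @pkg.name decorators."""
    if isinstance(node, ast.Call):
        node = node.func
    if isinstance(node, ast.Attribute):
        return node.attr
    if isinstance(node, ast.Name):
        return node.id
    return ""


def models_for_decorated_functions(
    source: str, module: str, decorator: str, annotation: str
) -> List[str]:
    """Emit one model per top-level function carrying `decorator`."""
    models = []
    # Only module-level statements are inspected, so methods are skipped
    # and just free functions are modeled.
    for node in ast.parse(source).body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        if not any(decorator_name(d) == decorator for d in node.decorator_list):
            continue
        parameters = ", ".join(f"{arg.arg}: {annotation}" for arg in node.args.args)
        models.append(f"def {module}.{node.name}({parameters}): ...")
    return models


# Hypothetical project source; it is only parsed, never executed.
EXAMPLE_SOURCE = '''
@rate_limited(calls_per_minute=10)
def send_message(sender, body):
    pass

def unrelated(value):
    pass
'''

for model in models_for_decorated_functions(
    EXAMPLE_SOURCE,
    module="my_app.api",
    decorator="rate_limited",
    annotation="AddFeatureToArgument[Via[rate_limited]]",
):
    print(model)
```

Only `send_message` is modeled here, because `unrelated` does not carry the decorator. Working on the syntax tree rather than on imported modules means the sketch does not need a running environment, at the cost of matching decorators by name only.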

Writing Model Generators

All model generator code lives in tools/generate_taint_models within the pyre-check repository.

Adding a new model generator

This commit provides an example of how to add a new model generator.

The basic workflow is:

  1. Create a new file under generate_taint_models of the form get_<pattern of model>.
  2. Write a class that inherits from ModelGenerator.
  3. Collect all the callables you're interested in modeling via gather_functions_to_model.
  4. Convert the callables you've collected into models. The CallableModel class is a convenience that pretty-prints the model in the right format: you only need to specify the callable to model and the kind of taint its parameters and return value should have, then call generate().
  5. Register your class in the registry (example).
  6. Write unit tests (example).
  7. Import your new class in the __init__ file (example).
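Steps 2 to 5 of the workflow above can be sketched in a self-contained form. The classes below are simplified stand-ins that mirror the names used in the steps (ModelGenerator, CallableModel, gather_functions_to_model, generate()); they are not the real pyre-check classes, whose constructors and signatures may differ. The `compute_models` method name, the `REGISTRY` dictionary, the `HandlerModelGenerator` class and the `lookup` handler are assumptions introduced for illustration.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Callable, Dict, Iterable, List, Optional, Type

Handler = Callable[..., object]


class ModelGenerator(ABC):
    """Simplified stand-in for the base class you inherit from (step 2)."""

    @abstractmethod
    def gather_functions_to_model(self) -> Iterable[Handler]:
        """Step 3: collect the callables to model."""

    @abstractmethod
    def compute_models(self, functions: Iterable[Handler]) -> Iterable[str]:
        """Step 4: convert the collected callables into model strings."""

    def generate_models(self) -> List[str]:
        return sorted(self.compute_models(self.gather_functions_to_model()))


class CallableModel:
    """Simplified stand-in for the pretty-printing convenience class (step 4)."""

    def __init__(
        self,
        callable_object: Handler,
        parameter_taint: Optional[str] = None,
        return_taint: Optional[str] = None,
    ) -> None:
        self.callable_object = callable_object
        self.parameter_taint = parameter_taint
        self.return_taint = return_taint

    def generate(self) -> str:
        parameters = []
        for name in inspect.signature(self.callable_object).parameters:
            taint = self.parameter_taint
            parameters.append(f"{name}: {taint}" if taint else name)
        returns = f" -> {self.return_taint}" if self.return_taint else ""
        target = self.callable_object
        qualified_name = f"{target.__module__}.{target.__qualname__}"
        return f"def {qualified_name}({', '.join(parameters)}){returns}: ..."


class HandlerModelGenerator(ModelGenerator):
    """Hypothetical generator: models an explicit list of handler functions."""

    def __init__(self, handlers: Iterable[Handler]) -> None:
        self.handlers = list(handlers)

    def gather_functions_to_model(self) -> Iterable[Handler]:
        # A real generator would discover these, e.g. from a routing table.
        return self.handlers

    def compute_models(self, functions: Iterable[Handler]) -> Iterable[str]:
        for function in functions:
            yield CallableModel(
                function,
                parameter_taint="TaintSource[UserControlled]",
                return_taint="TaintSink[ReturnedToUser]",
            ).generate()


# Step 5, in miniature: a hypothetical name-to-class registry.
REGISTRY: Dict[str, Type[ModelGenerator]] = {
    "get_handler_models": HandlerModelGenerator,
}


def lookup(request, item_id):
    return item_id


for model in REGISTRY["get_handler_models"]([lookup]).generate_models():
    print(model)
```

Separating gathering from conversion keeps the discovery logic, which usually needs the running environment, apart from the formatting logic, which is easy to unit test (step 6) with a handful of plain functions.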