Dynamically Generating Models
Some sources and sinks may be too numerous, or may change too rapidly, for
defining them statically to be practical. For these scenarios, Pysa has the
concept of model generators, which can generate taint models by reading the
project's source code before static analysis is started. The current set of
model generators is stored in tools/generate_taint_models within the pyre-check
repository.
Pysa now has the concept of a Model DSL, which supports some model generation use cases that could previously only be handled with model generators. You should prefer the Model DSL if it supports your use case.
Running Model Generators
The majority of model generators require access to a running environment. For
example, the RESTApiSourceGenerator needs to be able to access the urlpatterns
configured for Django, meaning it has to import (and implicitly run) the file
you use to configure routing. The recommended way to run model generators is to
set up a small script within your repository that can run within the virtual
environment for your project. This tutorial exercise provides an example of how
to set up and use model generators.
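As a rough illustration of what such a script might look like, the sketch below maps generator names to plain functions that return model strings, then writes one file of models per selected generator. Everything in it is an assumption made for illustration: the `write_generated_models` helper, the `GENERATORS` registry, the `generated_<name>.pysa` file naming, and the hard-coded model are hypothetical and do not reproduce the interface of the real model generators.

```python
import tempfile
from pathlib import Path


def rest_api_models():
    # Hypothetical stand-in: a real generator would import the project's
    # routing configuration here, which is why it needs the project's
    # virtual environment.
    return ["def my_app.views.user_profile(request: TaintSource[UserControlled]): ..."]


# Hypothetical registry mapping a generator name to a function returning models.
GENERATORS = {"rest_api": rest_api_models}


def write_generated_models(generators, selected, output_directory):
    """Run the selected generators and write one .pysa file per generator."""
    output_directory = Path(output_directory)
    output_directory.mkdir(parents=True, exist_ok=True)
    written = {}
    for name in selected:
        # Deduplicate and sort so repeated runs produce identical files.
        models = sorted(set(generators[name]()))
        path = output_directory / f"generated_{name}.pysa"
        path.write_text("\n".join(models) + "\n")
        written[name] = path
    return written


with tempfile.TemporaryDirectory() as directory:
    written = write_generated_models(GENERATORS, ["rest_api"], directory)
    contents = written["rest_api"].read_text()

print(contents)
```

In a real project the output directory would be one that Pysa is configured to read taint models from, and the script would be run before the analysis starts.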
Example Model Generators
The set of model generators is always changing, but below are some examples of model generators which are currently provided out of the box with Pysa.
RESTApiSourceGenerator
This model generator is intended to taint all arguments to Django view
functions as UserControlled. This is useful when you have views that receive
user-controlled data as arguments separate from the HttpRequest parameter,
such as when capturing values from the request path.
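To make the effect concrete, here is a minimal sketch of the kind of model this produces for a single view. It is not the generator's actual implementation: the `user_controlled_model` helper, the `user_profile` view, and the `my_app.views` module name are all hypothetical, and the sketch only uses the standard library's `inspect` module to read the view's signature.

```python
import inspect


def user_controlled_model(view, qualified_name):
    """Render a Pysa model marking every parameter of `view` as UserControlled."""
    parameters = []
    for name, parameter in inspect.signature(view).parameters.items():
        # Preserve the * and ** markers for variadic parameters.
        prefix = ""
        if parameter.kind is inspect.Parameter.VAR_POSITIONAL:
            prefix = "*"
        elif parameter.kind is inspect.Parameter.VAR_KEYWORD:
            prefix = "**"
        parameters.append(f"{prefix}{name}: TaintSource[UserControlled]")
    return f"def {qualified_name}({', '.join(parameters)}): ..."


# Hypothetical Django-style view that captures `user_id` from the request path.
def user_profile(request, user_id):
    return f"profile {user_id}"


model = user_controlled_model(user_profile, "my_app.views.user_profile")
print(model)
```

With this model in place, `user_id` is treated as a source even though it never passes through the HttpRequest object.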
ExitNodeGenerator
This generator is intended to taint all data returned from Django view
functions as ReturnedToUser. This is useful when you have decorators which
allow your view functions to return raw Python types, rather than HttpResponse
objects. Note that you do not need this generator if you always construct
HttpResponse objects, because they are already annotated as ReturnedToUser
sinks.
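A minimal sketch of the corresponding model shape, assuming a view that returns a plain dictionary, might look like the following. The `returned_to_user_model` helper, the `account_summary` view, and the `my_app.views` module name are hypothetical, and the sketch assumes the view takes only ordinary named parameters.

```python
import inspect


def returned_to_user_model(view, qualified_name):
    """Render a Pysa model marking the return value of `view` as a sink."""
    # Parameters are listed without annotations; only the return is tainted.
    names = list(inspect.signature(view).parameters)
    return f"def {qualified_name}({', '.join(names)}) -> TaintSink[ReturnedToUser]: ..."


# Hypothetical view that returns a raw dict instead of an HttpResponse,
# relying on a decorator (not shown) to serialize it.
def account_summary(request, account_id):
    return {"account": account_id}


model = returned_to_user_model(account_summary, "my_app.views.account_summary")
print(model)
```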
GraphQLSourceGenerator
This model generator is similar to the RESTApiSourceGenerator and
ExitNodeGenerator discussed above, but it is intended to generate models with
UserControlled and ReturnedToUser annotations for graphene-style GraphQL
resolver functions.
AnnotatedFreeFunctionWithDecoratorGenerator
This model generator provides general-purpose functionality to annotate all free functions which have a given decorator. The annotations can be used to mark any of the function's arguments or return types as sources, sinks, features, etc. This is useful whenever you have a decorator which changes what taint analysis should expect of the functions it wraps. For example, if you had a decorator which applies rate limiting to functions, you could use this model generator to add a feature to all flows passing through rate-limited functions, enabling you to filter those flows out of a given rule.
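The rate limiting example can be sketched with the standard library's `ast` module. This is an illustration of the idea only, not the generator's implementation: the `decorated_free_functions` and `annotate` helpers, the sample source, the `my_app.reports` module name, and the `rate_limited` feature name are all assumptions.

```python
import ast

# Hypothetical module source; only `fetch_report` is a decorated free function.
SOURCE = '''
def rate_limited(function):
    return function

@rate_limited
def fetch_report(user_id, report_id):
    return (user_id, report_id)

def helper(value):
    return value

class Service:
    @rate_limited
    def method(self):
        return None
'''


def decorated_free_functions(source, decorator_name):
    """Yield top-level function definitions decorated with `decorator_name`."""
    tree = ast.parse(source)
    for node in tree.body:  # top level only, so methods inside classes are skipped
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        for decorator in node.decorator_list:
            # Handle both `@name` and `@name(...)` forms.
            target = decorator.func if isinstance(decorator, ast.Call) else decorator
            if isinstance(target, ast.Name) and target.id == decorator_name:
                yield node
                break


def annotate(node, module, parameter_annotation):
    """Render a model applying the same annotation to each positional parameter."""
    parameters = ", ".join(
        f"{argument.arg}: {parameter_annotation}" for argument in node.args.args
    )
    return f"def {module}.{node.name}({parameters}): ..."


models = [
    annotate(node, "my_app.reports", "AddFeatureToArgument[Via[rate_limited]]")
    for node in decorated_free_functions(SOURCE, "rate_limited")
]
print("\n".join(models))
```

A rule could then exclude flows carrying the `rate_limited` feature, while the undecorated `helper` function and the decorated method are left unmodeled.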
Writing Model Generators
All model generator code lives in tools/generate_taint_models within the
pyre-check repository.
Adding a new model generator
This commit provides an example of how to add a new model generator.
The basic workflow is:
- Create a new file under generate_taint_models of the form get_<pattern of model>.
- Write a class that inherits from ModelGenerator.
- Collect all the callables you're interested in modeling via gather_functions_to_model.
- Convert the callables you've collected into models. The CallableModel class is a convenience that pretty-prints things in the right way: you just need to specify what kind of taint the parameters and return value should have, specify the callable to model, and call generate().
- Write unit tests (example).
- Import your new class in the __init__ file (example).
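The overall shape of this workflow can be sketched as follows. To keep the example self-contained, it defines local stand-ins (`ModelGeneratorSketch` and `CallableModelSketch`) rather than importing the real ModelGenerator and CallableModel classes, whose constructors and method signatures may differ. The `compute_models` and `generate_models` method names, the `parameter_taint` and `return_taint` arguments, and the `PrefixedFunctionGenerator` class are assumptions made for illustration.

```python
import inspect
from abc import ABC, abstractmethod


class CallableModelSketch:
    """Local stand-in for the CallableModel convenience class."""

    def __init__(self, callable_object, parameter_taint=None, return_taint=None):
        self.callable_object = callable_object
        self.parameter_taint = parameter_taint
        self.return_taint = return_taint

    def generate(self):
        function = self.callable_object
        parameters = []
        for name in inspect.signature(function).parameters:
            if self.parameter_taint:
                parameters.append(f"{name}: {self.parameter_taint}")
            else:
                parameters.append(name)
        returns = f" -> {self.return_taint}" if self.return_taint else ""
        qualified = f"{function.__module__}.{function.__name__}"
        return f"def {qualified}({', '.join(parameters)}){returns}: ..."


class ModelGeneratorSketch(ABC):
    """Local stand-in for the ModelGenerator base class."""

    @abstractmethod
    def gather_functions_to_model(self):
        """Return the callables that should receive models."""

    @abstractmethod
    def compute_models(self, functions_to_model):
        """Convert the gathered callables into model strings."""

    def generate_models(self):
        return self.compute_models(self.gather_functions_to_model())


class PrefixedFunctionGenerator(ModelGeneratorSketch):
    """Hypothetical generator: taint parameters of functions with a name prefix."""

    def __init__(self, candidates, prefix):
        self.candidates = candidates
        self.prefix = prefix

    def gather_functions_to_model(self):
        return [f for f in self.candidates if f.__name__.startswith(self.prefix)]

    def compute_models(self, functions_to_model):
        return [
            CallableModelSketch(f, parameter_taint="TaintSource[UserControlled]").generate()
            for f in functions_to_model
        ]


def handle_upload(payload):
    return payload


def internal_cleanup(path):
    return path


generator = PrefixedFunctionGenerator([handle_upload, internal_cleanup], "handle_")
models = generator.generate_models()
print("\n".join(models))
```

Keeping the gathering step separate from the conversion step makes each one straightforward to unit test: the first can be checked against a fixed list of callables, and the second against the exact model strings expected.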