Running Pysa

This page walks you through the basics of running Pysa. If you want exercises to walk you through using Pysa's more advanced features, check out this tutorial.

Setup

The setup requires the following 4 types of files.

  1. Source Code (*.py): This is your application's code.
  2. Taint Config (taint.config): This file declares sources, sinks, features, and rules.
  3. Taint Models (.pysa): These files link together the information in your source code and taint.config. They tell Pysa where in our code there exist sources and sinks.
  4. Pysa Configuration (.pyre_configuration): Parts of this file are critical to using Pysa. source_directories tells Pysa the directory containing the source code you want to analyze. taint_models_path tells Pysa where to find the config and model files.

Example

Let's look at a simple taint analysis example. To follow along, create a directory static_analysis_example and navigate to it. Paste the code snippets into the appropriately named files.

1. Source Code

# static_analysis_example/source.py
import os
def get_image(url):
command = "wget -q https:{}".format(url)
return os.system(command)
def convert():
image_link = input("image link: ")
image = get_image(image_link)

Notice the following:

  • The input function is a taint source since it gets input directly from the user.
  • The os.system function is a taint sink, since we do not want user-controlled values to flow into it.
  • The return value of input is used as the URL for a wget call, which is executed by os.system. The wget can therefore be doing anything, out of the programmer's control.
  • This data flow should be identified as a potential security issue.

2. Taint Config

# static_analysis_example/stubs/taint/taint.config
{
sources: [
{
name: "UserControlled",
comment: "use to annotate user input"
}
],
sinks: [
{
name: "RemoteCodeExecution",
comment: "use to annotate execution of code"
}
],
features: [],
rules: [
{
name: "Possible shell injection",
code: 5001,
sources: [ "UserControlled" ],
sinks: [ "RemoteCodeExecution" ],
message_format: "Data from [{$sources}] source(s) may reach [{$sinks}] sink(s)"
}
]
}

This declares the valid sources and sinks that Pysa should recognize. We also tell Pysa that data flowing from a UserControlled source to a RemoteCodeExecution sink is a possible shell injection.

3. Taint Models

# static_analysis_example/stubs/taint/general.pysa
# model for raw_input
def input(__prompt = ...) -> TaintSource[UserControlled]: ...
# model for os.system
def os.system(command: TaintSink[RemoteCodeExecution]): ...

This file links together the information in source.py and taint.config. We use it to tell Pysa where in our code there exist sources and sinks.

4. Pysa Configuration

# static_analysis_example/.pyre_configuration
{
"source_directories": ["."],
"taint_models_path": "stubs/taint"
}

Pysa needs to know what directory to analyze, as well as where to find the config and model files.

Analysis

Now let's run the static analysis:

[~/static_analysis_example] $ pyre analyze
ƛ Fixpoint iterations: 2
[
{
"line": 9,
"column": 22,
"path": "source.py",
"code": 5001,
"name": "Possible shell injection",
"description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"long_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"concise_description":
"Possible shell injection [5001]: Data from [UserControlled] source(s) may reach [RemoteCodeExecution] sink(s)",
"inference": null,
"define": "source.convert"
}
]

Looking at the output, we see that pyre surfaces the tainted data flow that we expected.

Let's run it again and save the results:

[~/static_analysis_example] $ pyre analyze --save-results-to ./

The --save-results-to option will save more detailed results to ./taint-output.json.

Understanding the results

See Static Analysis Post Processor.