1. Smali API

Implementation of a line-based Smali source code parser using a visitor API.

1.1. Overview

The overall structure of a decompiled Smali class is rather simple. A decompiled source class contains:

  • A section describing the class’ modifiers (such as public or private), its name, super class and the implemented interfaces.

  • One section per annotation at class level. Each section describes the name, modifiers (such as runtime or system), and the type. Optionally, there are sub-annotations included as values in each section.

  • One section per field declared in this class. Each section describes the name, modifiers (such as public or static), and the type. Optionally, there are annotations placed within the section.

  • One section per method, constructor (<init>) and static initializer block (<clinit>) describing their name, parameter types and return type. Each section may store instruction information.

  • Sections for line-based comments (lines that start with a #).

As Smali source code files do not contain package or import statements like those present in Java classes, all names must be fully qualified. The structure of these names is described in section 2.1.2 of the Java ASM documentation [1].

1.1.1. Type descriptors

Type descriptors used in Smali source code are similar to those used in compiled Java classes. The following list was taken from the ASM API documentation [1]:

Type descriptors of some Smali types

Smali type    Type descriptor       Example value
----------    ------------------    -------------
void          V
boolean       Z                     true or false
char          C                     'a'
byte          B                     1t
short         S                     2s
int           I                     0x1
float         F                     3.0f
long          J                     5l
double        D                     2.0
Object        Ljava/lang/Object;
boolean[]     [Z

The descriptors of primitive types are single characters. Class type descriptors always start with an L and end with a semicolon. Array type descriptors additionally start with one opening square bracket per dimension. For instance, a two-dimensional array gets two opening square brackets in its type descriptor.
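The rules above can be made concrete with a small standalone sketch. It does not use the smali package; the helper name describe and the PRIMITIVES table are assumptions introduced only for illustration.

```python
# Illustrative helper, not part of the smali package.
PRIMITIVES = {
    "V": "void", "Z": "boolean", "C": "char", "B": "byte", "S": "short",
    "I": "int", "F": "float", "J": "long", "D": "double",
}


def describe(descriptor: str) -> str:
    """Translate a type descriptor into a Java-style type name."""
    # Every leading '[' adds one array dimension.
    element = descriptor.lstrip("[")
    dims = len(descriptor) - len(element)
    if element in PRIMITIVES:
        name = PRIMITIVES[element]
    elif len(element) > 2 and element.startswith("L") and element.endswith(";"):
        # Class descriptors wrap the internal name in 'L' ... ';'.
        name = element[1:-1].replace("/", ".")
    else:
        raise ValueError(f"not a valid type descriptor: {descriptor!r}")
    return name + "[]" * dims


print(describe("[Z"))                    # boolean[]
print(describe("[[Ljava/lang/String;"))  # java.lang.String[][]
```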

This API contains a class called SVMType that can be used to retrieve type descriptors as well as class names:

from smali import SVMType, Signature

# simple type instance
t = SVMType("Lcom/example/Class;")
t.simple_name # Class
t.pretty_name # com.example.Class
t.dvm_name # com/example/Class
t.full_name # Lcom/example/Class;
t.svm_type # SVMType.TYPES.CLASS

# create new type instance for method signature
m = SVMType("getName([BLjava/lang/String;)Ljava/lang/String;")
m.svm_type # SVMType.TYPES.METHOD
# retrieve signature instance
s = m.signature or Signature(m)
s.return_type # SVMType("Ljava/lang/String;")
s.parameter_types # [SVMType("[B"), SVMType("Ljava/lang/String;")]
s.name # getName
s.declaring_class # would return the class before '->' (only if defined)

# array types
array = SVMType("[B")
array.svm_type # SVMType.TYPES.ARRAY
array.array_type # SVMType("B")
array.dim # 1 (one dimension)

Anything that represents a class can be used as input: a type descriptor, the original class name or the internal name (array types are supported as well).
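How the three notations relate to each other can be sketched without the library. The function to_class_descriptor below is a hypothetical helper and makes no claim about how SVMType resolves its input internally; it only shows that the dotted class name and the internal name differ in the separator, and that the descriptor wraps the internal name.

```python
def to_class_descriptor(name: str) -> str:
    """Return the descriptor for a class given in any of the three notations."""
    if name.startswith("L") and name.endswith(";"):
        return name  # already a type descriptor
    # Dotted Java names and slash-separated internal names differ only in
    # the separator, so both are reduced to the internal form first.
    return "L" + name.replace(".", "/") + ";"


for spelling in ("com.example.Class", "com/example/Class", "Lcom/example/Class;"):
    print(to_class_descriptor(spelling))  # Lcom/example/Class; (three times)
```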

1.1.2. Method descriptors

Unlike method descriptors in compiled Java classes, Smali’s method descriptors contain the method’s name. The general structure, described in detail in the ASM API [1] documentation, is otherwise the same. To read the contents of a method descriptor, the SVMType class introduced above can be used again:

from smali import SVMType

method = SVMType("getName([BLjava/lang/String;)Ljava/lang/String;")
# get the method's signature
signature = method.signature
# get parameter type descriptors
params: list[SVMType] = signature.parameter_types
# get return type descriptor
return_type = signature.return_type

# the class type can be retrieved if defined
cls: SVMType = signature.declaring_class

Caution

The initial method descriptor must be valid; passing arbitrary custom strings can cause undefined behaviour.
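To illustrate the structure of a Smali method descriptor, and one way to reject malformed input before handing it on, here is a hand-rolled sketch built on regular expressions. It is not the library's implementation; split_method, METHOD_RE and TYPE_RE are names invented for this example.

```python
import re

# Optional "Lcls;->" prefix, then name, parameter list and return type.
METHOD_RE = re.compile(
    r"^(?:(?P<cls>L[^;]+;)->)?(?P<name>[^(\s]+)\((?P<params>[^)]*)\)(?P<ret>\S+)$"
)
# One type: any number of '[' followed by a primitive letter or 'L...;'.
TYPE_RE = re.compile(r"\[*(?:[ZCBSIFJD]|L[^;]+;)")


def split_method(descriptor: str):
    """Split a method descriptor into (class, name, parameters, return type)."""
    match = METHOD_RE.match(descriptor)
    if match is None:
        raise ValueError(f"not a method descriptor: {descriptor!r}")
    params = match["params"]
    types = TYPE_RE.findall(params)
    # Reject parameter lists that contain anything besides valid types.
    if "".join(types) != params:
        raise ValueError(f"invalid parameter list: {params!r}")
    return match["cls"], match["name"], types, match["ret"]


print(split_method("getName([BLjava/lang/String;)Ljava/lang/String;"))
# (None, 'getName', ['[B', 'Ljava/lang/String;'], 'Ljava/lang/String;')
```

The declaring class is only reported when the descriptor carries a prefix such as Lcom/example/A;->run()V; otherwise the first element is None.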

1.2. Interfaces and components

The Smali Visitor-API for generating and transforming Smali source files (no bytecode data) is based on the ClassVisitor class, similar to the ASM API [1] in Java. Each method in this class is called whenever the corresponding code structure has been parsed. There are two ways to visit a code structure:

  1. Simple visit:

    All necessary information is given within the method parameters

  2. Extended visit:

    To dive deeper into the source code, another visitor instance is needed (for fields, methods, sub-annotations or annotations and even inner classes)

The same rules apply to all other visitor classes. The base class of all visitors must be VisitorBase, as it contains the common methods all subclasses need:

class VisitorBase:
    def __init__(self, delegate) -> None: ...
    def visit_comment(self, text: str) -> None: ...
    def visit_eol_comment(self, text: str) -> None: ...
    def visit_end(self) -> None: ...

All visitor classes accept a delegate, another visitor to which method calls are forwarded. For instance, we can chain our own visitor class with the provided SmaliWriter, which automatically writes the source code.

Note

The delegate must be an instance of the same class, so FieldVisitor objects can’t be applied to MethodVisitor objects as a delegate.
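The delegate idea can be shown in isolation. The following sketch uses hypothetical classes (SketchVisitor, Recorder, UpperCase) rather than the library's visitors, and it assumes that a visitor forwards each call to its delegate unless a subclass overrides the method.

```python
class SketchVisitor:
    """Hypothetical stand-in for a visitor base class with a delegate."""

    def __init__(self, delegate=None):
        self.delegate = delegate

    def visit_comment(self, text):
        # Default behaviour: pass the event on to the delegate, if any.
        if self.delegate is not None:
            self.delegate.visit_comment(text)


class Recorder(SketchVisitor):
    """End of the chain: stores what it receives."""

    def __init__(self):
        super().__init__()
        self.comments = []

    def visit_comment(self, text):
        self.comments.append(text)


class UpperCase(SketchVisitor):
    """Transforms the event before the delegate sees it."""

    def visit_comment(self, text):
        super().visit_comment(text.upper())


recorder = Recorder()
chain = UpperCase(recorder)
chain.visit_comment("generated file")
print(recorder.comments)  # ['GENERATED FILE']
```

Chaining a transforming visitor in front of a writing one is the same arrangement as placing a custom visitor in front of a writer.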

The Smali API provides three core components:

  • The SmaliReader class is an implementation of a line-based parser that can handle .smali files. It accepts either utf-8 strings or bytes as input. It calls the corresponding visit_XXX methods on the ClassVisitor.

  • The SmaliWriter is a subclass of ClassVisitor that tries to build a Smali file based on the visited statements. It comes together with an AnnotationWriter, FieldWriter and MethodWriter. It produces a utf-8 output string that can be encoded into bytes.

  • The XXXVisitor classes delegate method calls to internal delegate candidates that must be set on initialisation.

The next sections provide basic usage examples on how to generate or transform Smali class files with these components.

1.2.1. Parsing classes

The only required component to parse an existing Smali source file is the SmaliReader component. As an example, assume we want to print out the parsed class name, super class and implemented interfaces:

from smali import ClassVisitor

class SmaliClassPrinter(ClassVisitor):
    def visit_class(self, name: str, access_flags: int) -> None:
        # The provided name is the type descriptor - if we want the
        # Java class name, use a SVMType() call:
        # cls_name = SVMType(name).simple_name
        print(f'.class {name}')

    def visit_super(self, super_class: str) -> None:
        print(f".super {super_class}")

    def visit_implements(self, interface: str) -> None:
        print(f".implements {interface}")

The second step is to use our previously defined visitor class with a SmaliReader component:

from smali import SmaliReader

# Let's assume the source code is stored here
source = ...

printer = SmaliClassPrinter()
reader = SmaliReader(comments=False)
reader.visit(source, printer)

The SmaliReader is created with comments=False, so it ignores all comments in the source file. The final call to visit parses the source code and reports each parsed structure to the printer.
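What "line-based" parsing means in practice can be sketched with a toy reader that handles only the three header directives used above. This is not the SmaliReader implementation: read_header and HeaderCollector are hypothetical names, and unlike the documented visit_class, which receives integer access flags, this sketch simply passes the modifier keywords as a list.

```python
class HeaderCollector:
    """Hypothetical visitor that records the class header events."""

    def __init__(self):
        self.events = []

    def visit_class(self, name, modifiers):
        self.events.append(("class", name, modifiers))

    def visit_super(self, super_class):
        self.events.append(("super", super_class))

    def visit_implements(self, interface):
        self.events.append(("implements", interface))


def read_header(source, visitor):
    """Toy line-based reader: one directive per line, one visit call each."""
    for raw_line in source.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and comments are skipped
        directive, *tokens = line.split()
        if not tokens:
            continue  # directive without operands, nothing to report
        if directive == ".class":
            # The descriptor comes last; everything before it is a modifier.
            visitor.visit_class(tokens[-1], tokens[:-1])
        elif directive == ".super":
            visitor.visit_super(tokens[0])
        elif directive == ".implements":
            visitor.visit_implements(tokens[0])


SOURCE = """\
# sample header
.class public abstract Lcom/example/Car;
.super Ljava/lang/Object;
.implements Ljava/lang/Runnable;
"""

collector = HeaderCollector()
read_header(SOURCE, collector)
for event in collector.events:
    print(event)
```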

1.2.2. Generating classes

The only required component to generate a new Smali source file is the SmaliWriter component. For instance, consider the following class:

.class public abstract Lcom/example/Car;
.super Ljava/lang/Object;

.implements Ljava/lang/Runnable;

.field private id:I

.method public abstract run()V
.end method

It can be generated with seven method calls on a SmaliWriter and the visitors it returns:

from smali import SmaliWriter, AccessType

writer = SmaliWriter()
# Create the .class statement
writer.visit_class("Lcom/example/Car;", AccessType.PUBLIC + AccessType.ABSTRACT)
# Declare the super class
writer.visit_super("Ljava/lang/Object;")
# Visit the interface implementation
writer.visit_implements("Ljava/lang/Runnable;")

# Create the field id
writer.visit_field("id", AccessType.PRIVATE, "I")

# Create the method
m_writer = writer.visit_method("run", AccessType.PUBLIC + AccessType.ABSTRACT, [], "V")
m_writer.visit_end()

# finish class creation
writer.visit_end()
source_code = writer.code

At line 3 a SmaliWriter is created that will actually build the source code string.

The call to visit_class defines the class header (see line 1 of the Smali source code above). The first argument represents the class’ type descriptor and the second its modifiers. To specify additional modifiers, use the AccessType class. It provides two ways to obtain the modifier flags:

  • Either by referencing the enum (like AccessType.PUBLIC)

  • or by providing a list of keywords that should be translated into modifier flags:

    modifiers = AccessType.get_flags(["public", "final"])
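The translation from keywords to modifier flags can be pictured as a lookup into a bit-flag enum. The sketch below is an assumption about the general technique, not the AccessType source: SketchAccess and flags_from_keywords are hypothetical names, and only a subset of the flag values defined by the Dalvik executable format is listed.

```python
from enum import IntFlag


class SketchAccess(IntFlag):
    """Hypothetical subset of access flags (values from the Dalvik format)."""
    PUBLIC = 0x1
    PRIVATE = 0x2
    PROTECTED = 0x4
    STATIC = 0x8
    FINAL = 0x10
    ABSTRACT = 0x400


def flags_from_keywords(keywords):
    """Combine modifier keywords into one bit mask."""
    result = SketchAccess(0)
    for keyword in keywords:
        # Enum members are looked up by their upper-case name.
        result |= SketchAccess[keyword.upper()]
    return result


modifiers = flags_from_keywords(["public", "final"])
print(int(modifiers))  # 17
```

An unknown keyword raises a KeyError in this sketch, which makes typos in modifier lists visible early.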
    

The call to visit_super defines the super class of our previously defined class, and the call to visit_implements specifies which interface the class implements. All arguments must be type descriptors to generate accurate Smali code (see section Type descriptors for more information on the type class).