2. Smali Reader API

This module contains a line-based Smali source code parser. It can be used to parse Smali files produced by a decompiler.

The parsing model is simple and is described in section 2.1.

Note

No optimization is done by default: methods or fields that are not visited by a SmaliWriter are not covered and will not appear in the final file.

To copy non-visited structures, just add the reader variable to the SmaliWriter when creating a new one:

reader = SmaliReader(...)
writer = SmaliWriter(reader)

reader.visit(source_code, writer)

Hint

You can add your own copy_handler to the reader instance if you want to use your own callback to copy raw lines.

2.1. Parsing model

Parsing is done line by line: each line is inspected as a possible statement. Although some statements span more than one line, only one line per statement is used.

Statements can be divided into the following groups:

  • Token:

    Token statements begin with a leading . and can open a new statement block. They are specified in the Token class described in Smali token.

  • Invocation blocks:

    Block statements used within method declarations start with a : and just specify the block’s id.

  • Annotation values:

    Annotation values don’t have a leading identifier and will only be parsed within .annotation or .subannotation statements.

  • Method instructions:

    The same applies to method instructions: they are handled only if a method context is present.
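The four groups above can be told apart by the first character of a line together with the current parsing context. The following is a minimal sketch of that dispatch, not the library's implementation; the function name classify and the flags in_method and in_annotation are hypothetical.

```python
def classify(line, in_method=False, in_annotation=False):
    """Return the statement group for one line (illustrative only)."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return "ignored"            # blank lines and comments carry no statement
    if stripped.startswith("."):
        return "token"              # e.g. .class, .method, .field, .end
    if stripped.startswith(":"):
        return "block"              # block id inside a method declaration
    if in_annotation:
        return "annotation-value"   # only meaningful inside .annotation/.subannotation
    if in_method:
        return "instruction"        # only meaningful inside a method context
    return "unknown"


for text, kwargs in [
    (".method public foo()V", {}),
    (":cond_0", {"in_method": True}),
    ("const/16 v0, 0xB", {"in_method": True}),
    ("value = 1", {"in_annotation": True}),
]:
    print(text, "->", classify(text, **kwargs))
```

Lines without a leading identifier are ambiguous on their own, which is why the context flags decide between annotation values and instructions.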

class smali.reader.SupportsCopy

Interface for classes that can act as a copy handler for a SmaliReader.

Note that the context is used to distinguish the current visitor.

copy(line: str, context: type = <class 'smali.visitor.ClassVisitor'>) -> None

Copies the given line.

Parameters:

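  • line (str) – the line to copy

  • context (type, optional) – the visitor type used to distinguish the current visitor, defaults to ClassVisitor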
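A copy handler only needs a copy method with the signature shown above. The sketch below is a hypothetical handler that records each raw line together with the name of the context type. The ClassVisitor class here is a local stand-in so the example runs without the library, and CollectingCopyHandler is an assumed name, not part of the API.

```python
class ClassVisitor:
    """Local stand-in for smali.visitor.ClassVisitor (assumption for this sketch)."""


class CollectingCopyHandler:
    """Hypothetical copy handler that stores raw lines instead of writing them."""

    def __init__(self):
        self.lines = []

    def copy(self, line: str, context: type = ClassVisitor) -> None:
        # The context type tells the handler which kind of visitor is active.
        self.lines.append((context.__name__, line))


handler = CollectingCopyHandler()
handler.copy(".field public x:I")
print(handler.lines)
```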

class smali.reader.SmaliReader(validate: bool = True, comments: bool = False, snippet: bool = False, errors: str = 'strict')

Basic implementation of a line-based Smali source code parser.

Parameters:
  • validate (bool, optional) – Indicates the reader should validate the input code, defaults to True

  • comments (bool, optional) – With this option enabled, the parser will also notify about comments in the source file, defaults to False

  • snippet (bool, optional) – With this option enabled, the initial class definition will be skipped, defaults to False

  • errors (str, optional) – Indicates whether this reader should throw errors (values: strict, ignore), defaults to ‘strict’

_class_def(next_line=True, inner_class=False)

Parses (and verifies) the class definition.

Parameters:
  • visitor (ClassVisitor) – the visitor instance

  • next_line (bool, optional) – whether the next line should be used, defaults to True

  • inner_class (bool, optional) – whether the class is an inner class, defaults to False

Raises:
  • SyntaxError – if EOF is reached

  • SyntaxError – if EOL is reached

Returns:

an inner class ClassVisitor instance if inner_class is True

Return type:

ClassVisitor | None

_collect_values(strip_chars=None) -> list

Collects all values stored in the rest of the current line.

Note that values will be split if ',' is in a value, for instance:

>>> line = "const/16 b,0xB"
>>> _collect_values(',')
['const/16', 'b', '0xB']

Parameters:

strip_chars (str, optional) – the chars to strip first, defaults to None

Returns:

the collected values

Return type:

list
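The splitting behaviour described above can be reproduced with a small standalone function. This is a sketch under the assumption that every character in strip_chars is treated like whitespace; collect_values is a hypothetical free function, not the reader's private method.

```python
def collect_values(line, strip_chars=None):
    """Split a line into values, treating strip_chars as extra separators."""
    if strip_chars:
        for ch in strip_chars:
            line = line.replace(ch, " ")
    # str.split() without arguments collapses runs of whitespace.
    return line.split()


print(collect_values("const/16 b,0xB", ","))
```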

_do_visit() -> None

Performs the source code visit.

Parameters:
  • source (io.IOBase) – the source to read from

  • visitor (ClassVisitor) – the visitor to notify

_handle_end() -> None

Removes the active visitor from the stack.

_handle_source() -> None

Handles .source definitions and their comments.

Parameters:

visitor (ClassVisitor) – the visitor to notify

_next_line()

Reads until the next code statement.

Comments will be returned to the visitor immediately.

Parameters:
  • source (io.IOBase) – the source to read from

  • visitor (ClassVisitor) – the visitor to notify

Raises:

EOFError – if the end of file has been reached
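The read-until-next-statement behaviour can be sketched as a loop over readline(). This is an illustration only: next_line and on_comment are hypothetical names, and it assumes that comments occupy a whole line starting with #.

```python
import io


def next_line(source, on_comment=None):
    """Return the next non-empty, non-comment line from a text stream."""
    while True:
        raw = source.readline()
        if raw == "":
            # readline() returns an empty string only at end of file.
            raise EOFError("end of file has been reached")
        stripped = raw.strip()
        if not stripped:
            continue
        if stripped.startswith("#"):
            # Comments are handed over immediately instead of being returned.
            if on_comment is not None:
                on_comment(stripped[1:].strip())
            continue
        return stripped


source = io.StringIO("# header\n\n.class public Lcom/example/ABC;\n")
comments = []
print(next_line(source, comments.append), comments)
```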

_read_access_flags() -> list

Tries to resolve all access flags of the current line.

Returns:

the list of access flags

Return type:

list
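As an illustration of what resolving access flags means, the sketch below collects the leading modifier keywords that follow a directive. It is not the reader's implementation: read_access_flags and ACCESS_FLAGS are hypothetical names, and the flag set is the usual list of Smali access modifiers.

```python
ACCESS_FLAGS = frozenset({
    "public", "private", "protected", "static", "final", "synchronized",
    "volatile", "bridge", "transient", "varargs", "native", "interface",
    "abstract", "strictfp", "synthetic", "annotation", "enum",
    "constructor", "declared-synchronized",
})


def read_access_flags(line):
    """Collect consecutive access flags that follow the leading directive."""
    flags = []
    for word in line.split()[1:]:   # skip the directive, e.g. ".method"
        if word not in ACCESS_FLAGS:
            break                   # the first non-flag word ends the modifier list
        flags.append(word)
    return flags


print(read_access_flags(".method public static main([Ljava/lang/String;)V"))
```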

_validate_descriptor(name: str) -> None

Validates the given name if validation is enabled.

Parameters:

name (str) – the type descriptor, e.g. ‘Lcom/example/ABC;’

Raises:

SyntaxError – if the provided string is not a valid descriptor
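A descriptor check of this kind can be sketched with a regular expression. The pattern below is a simplified assumption about what counts as valid (primitive types, class types of the form L...;, and arrays of either) and may be stricter or looser than the library's own check. The function name validate_descriptor and its validate flag are hypothetical.

```python
import re

# Simplified pattern: optional array prefix, then a primitive or a class type.
# It is deliberately loose (for example it also accepts "[V").
_DESCRIPTOR = re.compile(r"\[*(?:[ZBSCIJFDV]|L[^;\[\s]+;)")


def validate_descriptor(name, validate=True):
    """Raise SyntaxError if name is not a type descriptor (when validation is on)."""
    if not validate:
        return
    if _DESCRIPTOR.fullmatch(name) is None:
        raise SyntaxError(f"invalid type descriptor: {name!r}")


validate_descriptor("Lcom/example/ABC;")   # passes silently
validate_descriptor("[[I")                 # array of int arrays, passes silently
```

With validation disabled the function returns without inspecting the input, which mirrors the "if validation is enabled" wording above.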

_validate_token(token: str, expected: Token) -> None

Validates the given token if validation is enabled.

Parameters:
  • token (str) – the token to verify

  • expected (Token) – the expected token value

Raises:

SyntaxError – if validation failed

property _visitor: VisitorBase

Returns the active visitor instance.

Returns:

the active visitor.

Return type:

VisitorBase

comments: bool = True

With this option enabled, the parser will also notify about comments in the source file.

errors: str = 'strict'

Indicates whether this reader should throw errors (values: ‘strict’, ‘ignore’).

line: Line = <smali.base.Line object>

The current line. (Mainly used for debugging purposes)

snippet: bool = False

With this option enabled, the initial class definition will be skipped.

source: IOBase

The source to read from.

stack: list = []

Stores the current visitors (index 0 stores the initial visitor).

A None value indicates that no visitors are registered for the current parsing context.

validate: bool = False

Indicates the reader should validate the input code.

visit(source: IOBase, visitor: ClassVisitor) -> None

Parses the given input which can be any readable source.

Parameters:
  • source (io.IOBase | str | bytes) – the Smali source code

  • visitor (ClassVisitor, optional) – the visitor to use, defaults to None

Raises:
  • ValueError – if the provided values are None

  • TypeError – if the source type is not accepted

  • ValueError – if the source is not readable
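The accepted source types and the listed exceptions suggest a normalization step before parsing. The sketch below shows one way such a step could look; as_readable is a hypothetical helper, and decoding bytes as UTF-8 is an assumption rather than documented behaviour.

```python
import io


def as_readable(source):
    """Turn str, bytes, or a stream into a readable stream (illustrative only)."""
    if source is None:
        raise ValueError("source must not be None")
    if isinstance(source, str):
        return io.StringIO(source)
    if isinstance(source, (bytes, bytearray)):
        # Assumption: Smali sources are UTF-8 encoded.
        return io.StringIO(bytes(source).decode("utf-8"))
    if isinstance(source, io.IOBase):
        if not source.readable():
            raise ValueError("source is not readable")
        return source
    raise TypeError(f"unsupported source type: {type(source).__name__}")


print(as_readable(b".class public Lcom/example/ABC;").readline())
```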