Open-source, server-side web archiving

How it works

ReproZip-Web is an approach to dynamic web archiving that captures the front- and back-end of a site and encapsulates it in a single distributable, preservable file.

Step 1: Trace and pack

Use ReproZip to capture your web application and generate an .rpz file

Use ReproZip to capture the back-end assets. This requires a Linux system on which the web app you want to archive runs as expected.

Run the dynamic web app through ReproZip, so that it is traced as it executes. ReproZip uses ptrace, a system call that lets one process (ReproZip) observe and control the execution of another process (the web app). As the web app executes, ReproZip records everything it touches in a SQLite database. That database is then used to create a configuration .yaml file containing administrative and technical metadata about what happened during the web app’s execution. ReproZip traces and records which software was used and in which version, which operating system was used, any input and output files, and the order in which commands ran (in case multiple commands are needed to get the app running).
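To get a feel for what the trace recorded, the SQLite database can be opened with any SQLite client. The following is a minimal sketch in Python that lists each table and its row count without assuming anything about the schema. The default path `.reprozip-trace/trace.sqlite3` is an assumption about where ReproZip writes the trace, and `summarize_trace` is a hypothetical helper, not part of ReproZip.

```python
import sqlite3
import sys
from pathlib import Path


def summarize_trace(db_path):
    """Return {table_name: row_count} for every table in a SQLite file."""
    # Open the file read-only so that inspecting the trace cannot modify it.
    uri = Path(db_path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
            )
        ]
        # Table names come from the database itself, so quote them defensively.
        return {
            name: conn.execute(f'SELECT COUNT(*) FROM "{name}"').fetchone()[0]
            for name in tables
        }
    finally:
        conn.close()


if __name__ == "__main__":
    # Assumed default location of the trace database; pass another path to override.
    path = sys.argv[1] if len(sys.argv) > 1 else ".reprozip-trace/trace.sqlite3"
    for table, count in summarize_trace(path).items():
        print(f"{table}: {count} rows")
```

Because the sketch only reads table names and counts, it should keep working even if the trace schema differs between ReproZip versions.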

You can edit the config file before the package is created, but that is generally discouraged because changes can affect how the app runs when it is replayed later.

After tracing, use ReproZip again to pack the web app and all its dependencies into the .rpz file.
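The trace-then-pack sequence can be sketched as a small Python wrapper around the `reprozip trace` and `reprozip pack` commands. This is an illustration under stated assumptions, not the project's own tooling: the app command `python3 app.py` and the package name `my-app.rpz` are hypothetical, and the wrapper defaults to a dry run that only prints the commands.

```python
import shlex
import subprocess


def trace_command(app_argv):
    """Build the command line that runs the web app under ReproZip's tracer."""
    return ["reprozip", "trace", *app_argv]


def pack_command(package_path):
    """Build the command line that packs the traced run into an .rpz file."""
    return ["reprozip", "pack", package_path]


def trace_and_pack(app_argv, package_path, dry_run=True):
    """Print (and optionally run) the trace step followed by the pack step."""
    commands = [trace_command(app_argv), pack_command(package_path)]
    for argv in commands:
        print(shlex.join(argv))
        if not dry_run:
            # The trace step blocks until the web app exits, so stop the
            # server once you have exercised it; packing happens afterwards.
            subprocess.run(argv, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical app command and package name, for illustration only.
    trace_and_pack(["python3", "app.py"], "my-app.rpz")
```

Keeping the dry run as the default makes it easy to review the exact commands before tracing a long-running server.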

Step 2: Upload

Upload your .rpz bundle to ReproServer

Step 3: Record

Click the “Web Capture” button to use Webrecorder to run a high-fidelity crawl of the site

Once your .rpz is uploaded to ReproServer, follow the instructions to record the front-end assets using automatic or manual recording. Interact with the site as much as you like. When you are done, click the button to end the recording and save your .wacz. The export is an .rpz with a .wacz inside.

Step 4: Download and replay

Download the resulting .rpz file, which contains a .wacz! This file can then be replayed within ReproServer and deposited into a repository or archive.

Your .rpz file is now ready for preservation! Anyone with the .rpz can upload it to ReproServer to interact with the archived site. Share the URL to the replayed app on ReproServer with others, or deposit the .rpz file in an institutional repository for long-term preservation.
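Before depositing the file, it can be useful to confirm that the downloaded .rpz really carries a web capture. The sketch below assumes that the .rpz is a tar archive, as standard ReproZip packages are, and that the .wacz is stored as a member of that outer archive rather than nested inside a compressed inner archive. The exact location of the .wacz is not stated here, so the hypothetical helper `list_wacz_members` searches by file extension only.

```python
import sys
import tarfile


def list_wacz_members(rpz_path):
    """Return the names of all .wacz files stored in the outer .rpz archive."""
    # "r:*" lets tarfile detect whether the archive is compressed.
    with tarfile.open(rpz_path, "r:*") as archive:
        return [
            member.name
            for member in archive.getmembers()
            if member.isfile() and member.name.lower().endswith(".wacz")
        ]


if __name__ == "__main__":
    if len(sys.argv) != 2:
        sys.exit("usage: python check_rpz.py <package.rpz>")
    found = list_wacz_members(sys.argv[1])
    if found:
        for name in found:
            print(f"found web capture: {name}")
    else:
        print("no .wacz found in the outer archive")
```

An empty result does not prove that the capture is missing, since it may be stored in a nested archive that this sketch does not open.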