Diffstat (limited to 'fi-prune-empty2/HACKING.md')
-rw-r--r-- fi-prune-empty2/HACKING.md | 161
1 files changed, 161 insertions, 0 deletions
diff --git a/fi-prune-empty2/HACKING.md b/fi-prune-empty2/HACKING.md
new file mode 100644
index 0000000..da6ed91
--- /dev/null
+++ b/fi-prune-empty2/HACKING.md
@@ -0,0 +1,161 @@
+# DESCRIPTION
+
+Command fi-prune-empty is a filter that prunes a fast-import stream,
+removing empty commits and empty merges. It writes `git replace`
+references to map between the original git objects and the resulting
+git objects.
+
+Like all of the filters in this repository, fi-prune-empty works by
+implementing the `fiutil.Handler` interface; `main.go` passes that
+implementation to `fiutil.RunHandler()`. However, there's an
+additional wrinkle: fi-prune-empty doesn't just care whether
+something *is* empty; it cares whether that thing *became* empty as a
+result of filtering the input stream. So it needs a way (the
+`--srcdir=` option) to bypass the input filtering and query the
+original source repo directly. So there are two sources of input. And
+the "output" stream actually has information flowing bi-directionally,
+since we can send `get-mark` queries to it; so kinda three sources of input.
+
+# CODE LAYOUT
+
+`main.go`
+: Parses the arguments, and sets up the whole thing running by calling
+ `fiutil.RunHandler()`.
+
+`prune.go`
+: Contains logic for deciding if a commit should be pruned or not,
+ which primarily consists of the `Pruner` object.
+
+`replace.go`
+: Contains the logic for writing `git replace` refs, which primarily
+ consists of the `Replacer` object.
+
+`stream.go`
+: Contains the logic for calling the `Pruner` and the `Replacer`
+ (which `main.go` aggregates into a single implementation of the
+ `stream.go:Driver` interface) and then applying the results to
+ the stream. This primarily consists of the `Handler`
+ object.
+
+`srcrepo.go`
+: Contains the systems-y helper functions for interacting directly
+ with the source repo.
+
+# ARCHITECTURE OVERVIEW
+
+The fi-prune-empty `Handler` is one of the largest handlers that I've
+implemented, and it is sort-of spread across several files.
+
+The `stream.go:Handler` struct contains the system logic of processing
+the input stream and emitting the output stream, calling out to an inner
+`stream.go:Driver` interface for all of the business logic.
+
+```ascii
+ +-[fi-prune-empty]--------------------------------------------------+
+ | |
++----------+ | +-[fiutil.RunHandler]-----------------+ | +----------+
+| src repo | +-[git fast-export]-+ +-[???]-+ | {frontend} | +-{stream.go:Handler}---+ | | | dst repo |
+| >--->| |>-->| |>-->|>------------->|>-, | | | | | |
+| | +-------------------+ +-------+ | | `->|>-, +-{Driver}--+ | | | | |
+| | | | | `->| |<><>|<>-, | {backend} | +-[git fast-import]-+ | |
+| | {--srcdir=} | {args.srcdir} | | | | | }-------------<>|<>--<>| |>---> |
+| >----------------------------------------------------------------------->| |>-------' | | +-------------------+ | |
+| | | | | +-----------+ | | | | |
++----------+ | | | | | | +----------+
+ | | +-----------------------+ | |
+ | +-------------------------------------+ |
+ | |
+ +-------------------------------------------------------------------+
+```
+
+The `Handler` contains a lot of boring glue, but it's *not just*
+boring glue. It contains all of the logic for keeping track of which
+marks and refs have been emitted, what each ref currently points to,
+and what's in the tree of the current commit. It deals with a lot of
+state. State is gross and kind of wants to gobble up all of the other
+logic. Don't let it! Keep it easy to look at how decisions are made
+by forcing it to call out to an external business-logic `Driver` for
+those decisions!
+
+> First of all, I want to say that the `Handler` was designed by
+> writing the "business-logic" how I thought best, and then
+> implementing the "system-logic" around that to make it work. It was
+> designed in a business→system order. But I'm going to explain it in
+> a system→business order.
+
+The `Driver` interface is relatively simple (I've ordered the methods
+in the rough order that they get called in):
+
+```ascii
+ +-{Driver}------------+
+ | |
+(input-commit) >---> ProcessCommit() >---> (output-commit)
+ | | - Do we even output this commit?
+ | | - Do we need to change this
+ | | commit's list of parents?
+ | |
+ | GotMark() <---------< (output-mark, output-hash)
+ | |
+ (input-mark) >---> FixMark() >---------> (output-mark)
+ | |
+ (input-mark) >---> FixCommitIsh() >----> (output-mark)
+ | |
+ | HandleDone() >------> (arbitrary stream output)
+ | |
+ +---------------------+
+```
+
+`GotMark()` is mostly just called immediately after `Handler` deals
+with the result of `ProcessCommit()`, but there are a few other times
+it can get called as well.
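+
+As a rough sketch of the shape in the diagram above (not the actual
+declaration in `stream.go`): the `Mark` and `Commit` types, every
+method signature, and the `passthrough` driver are hypothetical
+stand-ins; the real interface presumably uses `fiutil` types.
+
+```go
+package main
+
+import "fmt"
+
+// Mark and Commit are hypothetical stand-ins for the real types.
+type Mark int
+
+type Commit struct {
+	Mark    Mark
+	Parents []Mark
+}
+
+// Driver mirrors the five methods in the diagram; the signatures
+// are guesses made for illustration.
+type Driver interface {
+	ProcessCommit(in Commit) (out *Commit) // nil: don't output it
+	GotMark(out Mark, hash string)
+	FixMark(in Mark) (out Mark)
+	FixCommitIsh(in Mark) (out Mark)
+	HandleDone() []string // arbitrary extra stream output
+}
+
+// passthrough is a trivial Driver that never prunes anything.
+type passthrough struct{ hashes map[Mark]string }
+
+func (p *passthrough) ProcessCommit(in Commit) *Commit { return &in }
+func (p *passthrough) GotMark(out Mark, hash string)   { p.hashes[out] = hash }
+func (p *passthrough) FixMark(in Mark) Mark            { return in }
+func (p *passthrough) FixCommitIsh(in Mark) Mark       { return in }
+func (p *passthrough) HandleDone() []string            { return nil }
+
+func main() {
+	var d Driver = &passthrough{hashes: map[Mark]string{}}
+	// The Handler would call these in roughly this order.
+	if out := d.ProcessCommit(Commit{Mark: 1}); out != nil {
+		d.GotMark(out.Mark, "<hash>") // placeholder for a get-mark reply
+	}
+	fmt.Println(d.FixMark(1), len(d.HandleDone()))
+}
+```
+
+Returning `nil` from `ProcessCommit()` to mean "don't output this
+commit" is just one way to model the first question in the diagram.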
+
+The `Driver` is actually implemented in two parts:
+
+ 1. The `Replacer`, which mostly just listens until the very end, when
+ `HandleDone()` is called and it emits a bunch of `refs/replace/`
+ refs that record everything that happened.
+ 2. The `Pruner`, which takes an active role in deciding what and how
+ to prune.
+
+`main.go` contains the aggregate `Driver` implementation; its `type
+driver struct` aggregates the `Replacer` and the `Pruner` together
+into a complete `Driver` implementation:
+
+```ascii
+ +-{main.go:driver}----------------------+
+ | |
+ | +-{Replacer}----+-{Pruner}-----+ |
+ | | : | |
+ | | : | |
+ | | : | |
+ | ,---> Replacer.ProcessCommit() | |
+(input-commit) >-------> Pruner.ProcessCommit() >-------> (output-commit)
+ | | : | |
+ | | : | |
+ | | : | |
+ | | : | |
+ | | Replacer.GotMark() <---------, |
+ | | Pruner.GotMark() <-------------< (output-mark, output-hash)
+ | | : | |
+ | | : | |
+ | | +-------------+ | |
+ | | | | |
+ | | | | |
+ (input-mark) >--------> Pruner.FixMark() >--------------> (output-mark)
+ | | | | |
+ | | | | |
+ | | | | |
+ | | | | |
+ (input-mark) >--------> Pruner.FixCommitIsh() >---------> (output-mark)
+ | | | | |
+ | | | | |
+ | | +--------------------------+ | |
+ | | | | |
+ | | | | |
+ | | Replacer.HandleDone() >----------> (arbitrary stream output)
+ | | | | |
+ | | | | |
+ | +----------------------------+-+ |
+ | |
+ +--------------------------------------+
+```
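+
+The routing in the diagram above could be sketched roughly like this.
+This is an illustration, not the code in `main.go`: the `Mark` and
+`Commit` types, the method signatures, the use of struct embedding,
+and the toy `Replacer` and `Pruner` bodies are all assumptions.
+
+```go
+package main
+
+import "fmt"
+
+// Mark and Commit are hypothetical stand-ins for the real types.
+type Mark int
+
+type Commit struct {
+	Mark  Mark
+	Empty bool // pretend emptiness has already been decided
+}
+
+// Replacer (toy): listens, then emits everything in HandleDone.
+type Replacer struct{ log []string }
+
+func (r *Replacer) ProcessCommit(in Commit) {
+	r.log = append(r.log, fmt.Sprintf("# saw input :%d", in.Mark))
+}
+func (r *Replacer) GotMark(out Mark, hash string) {
+	r.log = append(r.log, fmt.Sprintf("# output :%d is %s", out, hash))
+}
+func (r *Replacer) HandleDone() []string { return r.log }
+
+// Pruner (toy): decides what gets pruned. A real one would also
+// remap pruned marks in FixMark and FixCommitIsh.
+type Pruner struct{ pruned map[Mark]bool }
+
+func (p *Pruner) ProcessCommit(in Commit) *Commit {
+	if in.Empty {
+		p.pruned[in.Mark] = true
+		return nil
+	}
+	return &in
+}
+func (p *Pruner) GotMark(out Mark, hash string) {}
+func (p *Pruner) FixMark(in Mark) Mark          { return in }
+func (p *Pruner) FixCommitIsh(in Mark) Mark     { return in }
+
+// driver aggregates both halves. FixMark, FixCommitIsh, and
+// HandleDone are promoted from the embedded structs; the two
+// methods that both halves define are forwarded by hand.
+type driver struct {
+	*Replacer
+	*Pruner
+}
+
+func (d driver) ProcessCommit(in Commit) *Commit {
+	d.Replacer.ProcessCommit(in)
+	return d.Pruner.ProcessCommit(in)
+}
+
+func (d driver) GotMark(out Mark, hash string) {
+	d.Replacer.GotMark(out, hash)
+	d.Pruner.GotMark(out, hash)
+}
+
+func main() {
+	d := driver{&Replacer{}, &Pruner{pruned: map[Mark]bool{}}}
+	for _, c := range []Commit{{Mark: 1}, {Mark: 2, Empty: true}} {
+		if out := d.ProcessCommit(c); out != nil {
+			d.GotMark(out.Mark, "<hash>") // placeholder hash
+		}
+	}
+	for _, line := range d.HandleDone() {
+		fmt.Println(line)
+	}
+}
+```
+
+Embedding keeps the single-owner methods free of boilerplate, while
+the two methods that fan out to both halves stay explicit.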