field | value | date |
---|---|---|
author | Luke Shumaker <lukeshu@parabola.nu> | 2021-05-01 14:42:19 -0600 |
committer | Luke Shumaker <lukeshu@parabola.nu> | 2021-05-01 15:42:56 -0600 |
commit | 11ecb691f395289f2908b6f3ebdff294e3b6133d | |
tree | b7140342f3f6958749fd142fb30ddc462b967306 | |
parent | 043c052c810ac0bdb6e276e6418f8c075242e534 | |
fi-prune-empty2: Comments and source docs
mode | file | lines changed |
---|---|---|
-rw-r--r-- | fi-prune-empty2/HACKING.md | 161 |
-rw-r--r-- | fi-prune-empty2/prune.go | 21 |
-rw-r--r-- | fi-prune-empty2/replace.go | 15 |
-rw-r--r-- | fi-prune-empty2/srcrepo.go | 15 |
-rw-r--r-- | fi-prune-empty2/stream.go | 115 |
5 files changed, 290 insertions, 37 deletions
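The HACKING.md added by this commit says that `main.go`'s `driver` splits the `Driver` interface between the `Replacer` (which mostly just listens) and the `Pruner` (which makes the decisions). The Go sketch below wires up that routing as the `main.go:driver` diagram draws it: both halves see `ProcessCommit` and `GotMark`, only the Pruner answers `FixMark` and `FixCommitIsh`, and only the Replacer handles `HandleDone`. It is not the actual `main.go`. `Mark`, `Hash` and `Commit` are simplified, and the `listener`/`decider` interfaces and the toy implementations are hypothetical. Which half's `GotMark` result is returned in the real code isn't shown, so this sketch simply takes the Pruner's.

```go
package main

// Simplified stand-ins for the types in stream.go (assumed, not the real
// definitions).
type Mark int
type Hash string

type Commit struct {
	Mark    Mark
	Parents []Mark
}

// Driver has the same method set as the interface documented in stream.go.
type Driver interface {
	ProcessCommit(Commit) (*Commit, error)
	GotMark(Mark, Hash) bool
	FixMark(Mark) Mark
	FixCommitIsh(string) string
	HandleDone(refs map[string]Mark) error
}

// listener is a hypothetical interface for the Replacer's half: it only
// observes commits and marks, and acts at the end of the stream.
type listener interface {
	ProcessCommit(Commit)
	GotMark(Mark, Hash)
	HandleDone(refs map[string]Mark) error
}

// decider is a hypothetical interface for the Pruner's half: it makes
// every decision that changes the output stream.
type decider interface {
	ProcessCommit(Commit) (*Commit, error)
	GotMark(Mark, Hash) bool
	FixMark(Mark) Mark
	FixCommitIsh(string) string
}

// driver routes each call the way the main.go:driver diagram shows.
type driver struct {
	replacer listener
	pruner   decider
}

var _ Driver = driver{} // compile-time check that driver satisfies Driver

func (d driver) ProcessCommit(c Commit) (*Commit, error) {
	d.replacer.ProcessCommit(c)      // the Replacer sees every input commit
	return d.pruner.ProcessCommit(c) // the Pruner decides what is emitted
}

func (d driver) GotMark(m Mark, h Hash) bool {
	d.replacer.GotMark(m, h)
	return d.pruner.GotMark(m, h) // assumption: the Pruner's answer is used
}

func (d driver) FixMark(m Mark) Mark                   { return d.pruner.FixMark(m) }
func (d driver) FixCommitIsh(s string) string          { return d.pruner.FixCommitIsh(s) }
func (d driver) HandleDone(refs map[string]Mark) error { return d.replacer.HandleDone(refs) }

// countingReplacer is a toy listener that only counts what it is told.
type countingReplacer struct {
	commits, marks int
	done           bool
}

func (r *countingReplacer) ProcessCommit(Commit) { r.commits++ }
func (r *countingReplacer) GotMark(Mark, Hash)   { r.marks++ }
func (r *countingReplacer) HandleDone(map[string]Mark) error {
	r.done = true
	return nil
}

// toyPruner is a toy decider that drops a fixed set of input marks.
type toyPruner struct {
	drop   map[Mark]bool // input marks to omit from the output
	hashes map[Hash]bool // output hashes already reported via GotMark
}

func (p *toyPruner) ProcessCommit(c Commit) (*Commit, error) {
	if p.drop[c.Mark] {
		return nil, nil // (nil, nil) means "omit this commit"
	}
	return &c, nil
}

// GotMark returns true only for a Hash it has not seen before.
func (p *toyPruner) GotMark(_ Mark, h Hash) bool {
	isNew := !p.hashes[h]
	p.hashes[h] = true
	return isNew
}

func (p *toyPruner) FixMark(m Mark) Mark          { return m }
func (p *toyPruner) FixCommitIsh(s string) string { return s }
```

Splitting the method set like this means the Replacer half has no way to change the output stream until `HandleDone`, which fits the description of it as mostly just listening until the very end.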
diff --git a/fi-prune-empty2/HACKING.md b/fi-prune-empty2/HACKING.md new file mode 100644 index 0000000..da6ed91 --- /dev/null +++ b/fi-prune-empty2/HACKING.md @@ -0,0 +1,161 @@ +# DESCRIPTION + +Command fi-prune-empty is a filter that prunes a fast-import stream, +removing empty commits and empty merges. It writes `git replace` +references to map between the original git objects and the resulting +git objects. + +Like all of the filters in this repository, fi-prune-empty works by +implementing the `fiutil.Handler` interface with an object that `main.go` passes to +`fiutil.RunHandler()`. However, there's an additional wrinkle: +because fi-prune-empty doesn't just care whether something *is* empty, +but whether that thing *became* empty as a result of +filtering the input stream, it needs a way (the `--srcdir=` option) to +bypass the input filtering and ask questions directly to the original +source repo. So there are two sources of input. And the "output" +stream actually has information flow bi-directionally, since we can +send `get-mark` queries to it; so kinda three sources of input. + +# CODE LAYOUT + +`main.go` +: Parses the arguments, and sets up the whole thing running by calling + `fiutil.RunHandler()`. + +`prune.go` +: Contains logic for deciding if a commit should be pruned or not, + which primarily consists of the `Pruner` object. + +`replace.go` +: Contains the logic for writing `git replace` refs, which primarily + consists of the `Replacer` object. + +`stream.go` +: Contains the logic for calling to the `Pruner` and to the + `Replacer` (which `main.go` aggregates together into an + implementation of the `stream.go:Driver` interface) and then + applying the results to the stream. This primarily consists of the + `Handler` object. + +`srcrepo.go` +: Contains the systems-y helper functions for interacting directly + with the source repo.
+ +# ARCHITECTURE OVERVIEW + +The fi-prune-empty `Handler` is one of the largest handlers that I've +implemented, and it is sort-of spread across several files. + +The `stream.go:Handler` struct contains the system logic of processing +the input stream and emitting the output stream, calling to an inner +`stream.go:Driver` interface for all of the business logic. + +```ascii + +-[fi-prune-empty]--------------------------------------------------+ + | | ++----------+ | +-[fiutil.RunHandler]-----------------+ | +----------+ +| src repo | +-[git fast-export]-+ +-[???]-+ | {frontend} | +-{stream.go:Handler}---+ | | | dst repo | +| >--->| |>-->| |>-->|>------------->|>-, | | | | | | +| | +-------------------+ +-------+ | | `->|>-, +-{Driver}--+ | | | | | +| | | | | `->| |<><>|<>-, | {backend} | +-[git fast-import]-+ | | +| | {--srcdir=} | {args.srcdir} | | | | | }-------------<>|<>--<>| |>---> | +| >----------------------------------------------------------------------->| |>-------' | | +-------------------+ | | +| | | | | +-----------+ | | | | | ++----------+ | | | | | | +----------+ + | | +-----------------------+ | | + | +-------------------------------------+ | + | | + +-------------------------------------------------------------------+ +``` + +The `Handler` contains a lot of boring glue, but it's *not just* +boring glue. It contains all of the logic for keeping track of which +marks and refs have been emitted, what each ref currently points to, +and what's in the tree of the current commit. It deals with a lot of +state. State is gross and kind of wants to gobble up all of the other +logic. Don't let it! Keep it easy to look at how decisions are made +by forcing it to call out to an external business-logic `Driver` for +those decisions! + +> First of all, I want to say that the `Handler` was designed by +> writing the "business-logic" how I thought best, and then +> implementing the "system-logic" around that to make it work.
It was +> designed in a business→system order. But I'm going to explain it in +> a system→business order. + +The `Driver` interface is relatively simple (I've ordered the methods +in the rough order that they get called in): + +```ascii + +-{Driver}------------+ + | | +(input-commit) >---> ProcessCommit() >---> (output-commit) + | | - Do we even output this commit? + | | - Do we need to change this + | | commit's list of parents? + | | + | GotMark() <---------< (output-mark, output-hash) + | | + (input-mark) >---> FixMark() >---------> (output-mark) + | | + (input-mark) >---> FixCommitIsh() >----> (output-mark) + | | + | HandleDone() >------> (arbitrary stream output) + | | + +---------------------+ +``` + +`GotMark()` is mostly just called immediately after `Handler` deals +with the result of `ProcessCommit()`, but there are a few other times +it can get called as well. + +The `Driver` is actually implemented in two parts: + + 1. The `Replacer`, which mostly just listens until the very end, when + `HandleDone()` is called and it emits a bunch of `refs/replace/` + refs that record everything that happened. + 2. The `Pruner`, which takes an active role in deciding what and how to + prune.
+ +`main.go` contains the aggregate `Driver` implementation; its `type +driver struct` aggregates the `Replacer` and the `Pruner` together +into a complete `Driver` implementation: + +```ascii + +-{main.go:driver}----------------------+ + | | + | +-{Replacer}----+-{Pruner}-----+ | + | | : | | + | | : | | + | | : | | + | ,---> Replacer.ProcessCommit() | | +(input-commit) >-------> Pruner.ProcessCommit() >-------> (output-commit) + | | : | | + | | : | | + | | : | | + | | : | | + | | Replacer.GotMark() <---------, | + | | Pruner.GotMark() <-------------< (output-mark, output-hash) + | | : | | + | | : | | + | | +-------------+ | | + | | | | | + | | | | | + (input-mark) >--------> Pruner.FixMark() >--------------> (output-mark) + | | | | | + | | | | | + | | | | | + | | | | | + (input-mark) >--------> Pruner.FixCommitIsh() >---------> (output-mark) + | | | | | + | | | | | + | | +--------------------------+ | | + | | | | | + | | | | | + | | Replacer.HandleDone() >----------> (arbitrary stream output) + | | | | | + | | | | | + | +----------------------------+-+ | + | | + +--------------------------------------+ +``` diff --git a/fi-prune-empty2/prune.go b/fi-prune-empty2/prune.go index 8933872..ff0119b 100644 --- a/fi-prune-empty2/prune.go +++ b/fi-prune-empty2/prune.go @@ -1,3 +1,18 @@ +// Copyright 2019-2021 Luke Shumaker <lukeshu@parabola.nu> +// +// This program is free software: you can redistribute it and/or modify +// it under the terms of the GNU Affero General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. +// +// This program is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU Affero General Public License for more details. +// +// You should have received a copy of the GNU Affero General Public License +// along with this program.
If not, see <http://www.gnu.org/licenses/>. + package main import ( @@ -212,6 +227,8 @@ func (p *Pruner) wasFastForward(commit Commit) (bool, error) { return (tree == parentTree), nil } +// isEmpty returns whether `git commit` would require `--allow-empty` +// in order to create that commit. func (p *Pruner) isEmpty(commit Commit) (bool, error) { switch len(commit.Parents) { case 0: @@ -228,6 +245,10 @@ func (p *Pruner) isAncestor(ancestor, descendant Mark) bool { return ret } +// pruneParents prunes "duplicate" parents of a commit. It is +// equivalent to running `git merge-base --independent` in the +// destination repo (note: it isn't possible for us to run anything in +// the destination repo). func (p *Pruner) pruneParents(arg []Mark) []Mark { var ret []Mark outer: diff --git a/fi-prune-empty2/replace.go b/fi-prune-empty2/replace.go index 92bbdf7..7a12b66 100644 --- a/fi-prune-empty2/replace.go +++ b/fi-prune-empty2/replace.go @@ -1,3 +1,18 @@ +// Copyright 2019-2021 Luke Shumaker <lukeshu@parabola.nu> +// +// This program is free software: you can redistribute it and/or modify +// it under the terms of the GNU Affero General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. +// +// This program is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU Affero General Public License for more details. +// +// You should have received a copy of the GNU Affero General Public License +// along with this program. If not, see <http://www.gnu.org/licenses/>. 
+ package main import ( diff --git a/fi-prune-empty2/srcrepo.go b/fi-prune-empty2/srcrepo.go index b788a49..c771bc5 100644 --- a/fi-prune-empty2/srcrepo.go +++ b/fi-prune-empty2/srcrepo.go @@ -1,3 +1,18 @@ +// Copyright 2019-2021 Luke Shumaker <lukeshu@parabola.nu> +// +// This program is free software: you can redistribute it and/or modify +// it under the terms of the GNU Affero General Public License as published by +// the Free Software Foundation, either version 3 of the License, or +// (at your option) any later version. +// +// This program is distributed in the hope that it will be useful, +// but WITHOUT ANY WARRANTY; without even the implied warranty of +// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +// GNU Affero General Public License for more details. +// +// You should have received a copy of the GNU Affero General Public License +// along with this program. If not, see <http://www.gnu.org/licenses/>. + package main import ( diff --git a/fi-prune-empty2/stream.go b/fi-prune-empty2/stream.go index a785faa..da4b7be 100644 --- a/fi-prune-empty2/stream.go +++ b/fi-prune-empty2/stream.go @@ -78,10 +78,44 @@ type Commit struct { } type Driver interface { + // ProcessCommit takes an input Commit, and + // + // 1. Records any information about that Commit that the + // Driver will need for future operations. + // + // 2. Decides whether the output stream should contain that + // Commit at all; return (nil, nil) if the output stream + // should omit this Commit. + // + // 3. Potentially mutates the list of commit.Parents, which it + // returns. ProcessCommit(Commit) (*Commit, error) + + // GotMark is called to inform the Driver what output-repo + // Hash a given output-stream Mark refers to. The Driver + // returns true if this is a new Hash that has never been + // passed to GotMark before; the return value is just used + // for progress statistics.
GotMark(Mark, Hash) bool + + // FixMark takes an input-stream Mark, and returns the + // equivalent output-stream Mark. An error is signaled by + // returning a negative Mark. FixMark(Mark) Mark + + // FixCommitIsh takes an input-stream commitish, and returns + // the equivalent output-stream commitish. An error is + // signaled by returning an empty string. + // + // As a point of interest, the actual implementation (in + // prune.go) requires that the input commitish be a Mark; + // errors otherwise. FixCommitIsh(string) string + + // HandleDone gets called before the "done" command is + // emitted, but after any other commands that would be emitted + // by the Handler. It is called with a map of every ref that + // has been defined in the output-repo. HandleDone(refs map[string]Mark) error } @@ -145,7 +179,7 @@ func (h *Handler) CmdCommitEnd(cmd libfastimport.CmdCommitEnd) error { }) }() - // build the aggregated object for ProcessCommit + // Build the aggregated object for ProcessCommit. var cmt Commit var err error cmt.Mark = Mark(h.commitMeta.Mark) @@ -186,17 +220,21 @@ func (h *Handler) CmdCommitEnd(cmd libfastimport.CmdCommitEnd) error { sort.Stable(h.commitFile) cmt.Tree = h.commitFile - // remember this + // Remember this. defer func() { h.refs[h.commitMeta.Ref] = Mark(h.commitMeta.Mark) }() - // call ProcessCommit + // Call ProcessCommit. cmtptr, err := h.driver.ProcessCommit(cmt) if err != nil { return err } + + // Do something with the result. if cmtptr == nil { + // Drop the commit. 
+ if _, refExists := h.refs[h.commitMeta.Ref]; !refExists { mark := h.driver.FixMark(Mark(h.commitMeta.Mark)) commitIsh := fmt.Sprintf(":%d", mark) @@ -212,46 +250,49 @@ func (h *Handler) CmdCommitEnd(cmd libfastimport.CmdCommitEnd) error { } } h.refs[h.commitMeta.Ref] = Mark(h.commitMeta.Mark) - return nil - } - - // apply the result of ProcessCommit to h.commitMeta - if len(cmtptr.Parents) == 0 { - h.commitMeta.From = EmptyHash - h.commitMeta.Merge = nil } else { - h.commitMeta.From = fmt.Sprintf(":%d", cmtptr.Parents[0]) - h.commitMeta.Merge = nil - for _, merge := range cmtptr.Parents[1:] { - h.commitMeta.Merge = append(h.commitMeta.Merge, fmt.Sprintf(":%d", merge)) + // Emit the commit. + + // apply the mutations from ProcessCommit to + // h.commitMeta + if len(cmtptr.Parents) == 0 { + h.commitMeta.From = EmptyHash + h.commitMeta.Merge = nil + } else { + h.commitMeta.From = fmt.Sprintf(":%d", cmtptr.Parents[0]) + h.commitMeta.Merge = nil + for _, merge := range cmtptr.Parents[1:] { + h.commitMeta.Merge = append(h.commitMeta.Merge, fmt.Sprintf(":%d", merge)) + } } - } - // emit the commit - if err := h.backend.Do(h.commitMeta); err != nil { - return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark) - } - if err := h.backend.Do(libfastimport.FileDeleteAll{}); err != nil { - return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark) - } - for _, file := range h.commitFile { - if err := h.backend.Do(file); err != nil { + // actually emit the commit + if err := h.backend.Do(h.commitMeta); err != nil { return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark) } - } + if err := h.backend.Do(libfastimport.FileDeleteAll{}); err != nil { + return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark) + } + for _, file := range h.commitFile { + if err := h.backend.Do(file); err != nil { + return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark) + } + } - sha1, err := h.backend.GetMark(libfastimport.CmdGetMark{ - Mark: 
h.commitMeta.Mark, - }) - if err != nil { - return err - } - hash, err := AsHash(sha1) - if err != nil { - return err - } - if h.driver.GotMark(Mark(h.commitMeta.Mark), hash) { - h.commitsOut++ + // tell the driver about the emitted commit + sha1, err := h.backend.GetMark(libfastimport.CmdGetMark{ + Mark: h.commitMeta.Mark, + }) + if err != nil { + return err + } + hash, err := AsHash(sha1) + if err != nil { + return err + } + if h.driver.GotMark(Mark(h.commitMeta.Mark), hash) { + h.commitsOut++ + } } return nil |
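The prune.go comment added in this commit says `pruneParents` is equivalent to running `git merge-base --independent` in the destination repo, which the tool can't do directly. That command keeps only the given commits that cannot be reached from any of the others. Below is a minimal Go sketch of the idea over an in-memory parent graph. It is not the body of prune.go (the diff doesn't show it). `graph`, `isAncestor` and `independent` are hypothetical names, although `isAncestor(ancestor, descendant Mark) bool` mirrors the shape of the Pruner method visible in the diff.

```go
package main

// Mark identifies a commit (simplified stand-in for stream.go's Mark).
type Mark int

// graph maps each commit to its parents. It is a hypothetical stand-in for
// whatever ancestry bookkeeping the real Pruner keeps.
type graph map[Mark][]Mark

// isAncestor reports whether ancestor can be reached from descendant by
// following parent links (a commit counts as reachable from itself).
func (g graph) isAncestor(ancestor, descendant Mark) bool {
	seen := make(map[Mark]bool)
	stack := []Mark{descendant}
	for len(stack) > 0 {
		m := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		if m == ancestor {
			return true
		}
		if seen[m] {
			continue
		}
		seen[m] = true
		stack = append(stack, g[m]...)
	}
	return false
}

// independent drops every parent that is reachable from another parent in
// the list (and exact duplicates), keeping the survivors in input order.
func (g graph) independent(parents []Mark) []Mark {
	var ret []Mark
outer:
	for i, a := range parents {
		for j, b := range parents {
			switch {
			case i == j:
				continue
			case a == b:
				if j < i {
					continue outer // duplicate: keep only the first copy
				}
			case g.isAncestor(a, b):
				continue outer // a is already reachable through b
			}
		}
		ret = append(ret, a)
	}
	return ret
}
```

Keeping the survivors in input order is a choice of this sketch; it matters because `stream.go` turns `Parents[0]` into the commit's `from` and the remaining parents into `merge` lines, so reordering would change which parent is first.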