author Luke Shumaker <lukeshu@parabola.nu> 2021-05-01 14:42:19 -0600
committer Luke Shumaker <lukeshu@parabola.nu> 2021-05-01 15:42:56 -0600
commit 11ecb691f395289f2908b6f3ebdff294e3b6133d
tree b7140342f3f6958749fd142fb30ddc462b967306
parent 043c052c810ac0bdb6e276e6418f8c075242e534
fi-prune-empty2: Comments and source docs
-rw-r--r-- fi-prune-empty2/HACKING.md 161
-rw-r--r-- fi-prune-empty2/prune.go 21
-rw-r--r-- fi-prune-empty2/replace.go 15
-rw-r--r-- fi-prune-empty2/srcrepo.go 15
-rw-r--r-- fi-prune-empty2/stream.go 115
5 files changed, 290 insertions, 37 deletions
diff --git a/fi-prune-empty2/HACKING.md b/fi-prune-empty2/HACKING.md
new file mode 100644
index 0000000..da6ed91
--- /dev/null
+++ b/fi-prune-empty2/HACKING.md
@@ -0,0 +1,161 @@
+# DESCRIPTION
+
+Command fi-prune-empty is a filter that prunes a fast-import stream,
+removing empty commits and empty merges. It writes `git replace`
+references to map between the original git objects and the resulting
+git objects.
+
+Like all of the filters in this repository, fi-prune-empty works by
+implementing the `fiutil.Handler` interface, which `main.go` passes to
+`fiutil.RunHandler()`. However, there's an additional wrinkle:
+because fi-prune-empty doesn't just care whether something *is* empty,
+but also whether that thing *became* empty as a result of
+filtering the input stream, it needs a way (the `--srcdir=` option) to
+bypass the input filtering and ask questions directly of the original
+source repo. So there are two sources of input. And the "output"
+stream actually has information flowing bi-directionally, since we can
+send `get-mark` queries to it; so kinda three sources of input.
+
+# CODE LAYOUT
+
+`main.go`
+: Parses the arguments, and sets the whole thing running by calling
+ `fiutil.RunHandler()`.
+
+`prune.go`
+: Contains logic for deciding if a commit should be pruned or not,
+ which primarily consists of the `Pruner` object.
+
+`replace.go`
+: Contains the logic for writing `git replace` refs, which primarily
+ consists of the `Replacer` object.
+
+`stream.go`
+: Contains the logic for calling out to the `Pruner` and to the
+ `Replacer` (which `main.go` aggregates together into an
+ implementation of the `stream.go:Driver` interface) and then
+ applying the results to the stream. This primarily consists of the
+ `Handler` object.
+
+`srcrepo.go`
+: Contains the systems-y helper functions for interacting directly
+ with the source repo.
+
+# ARCHITECTURE OVERVIEW
+
+The fi-prune-empty `Handler` is one of the largest handlers that I've
+implemented, and it is sort-of spread across several files.
+
+The `stream.go:Handler` struct contains the system logic of processing
+the input stream and emitting the output stream, calling to an inner
+`stream.go:Driver` interface for all of the business logic.
+
+```ascii
+ +-[fi-prune-empty]--------------------------------------------------+
+ | |
++----------+ | +-[fiutil.RunHandler]-----------------+ | +----------+
+| src repo | +-[git fast-export]-+ +-[???]-+ | {frontend} | +-{stream.go:Handler}---+ | | | dst repo |
+| >--->| |>-->| |>-->|>------------->|>-, | | | | | |
+| | +-------------------+ +-------+ | | `->|>-, +-{Driver}--+ | | | | |
+| | | | | `->| |<><>|<>-, | {backend} | +-[git fast-import]-+ | |
+| | {--srcdir=} | {args.srcdir} | | | | | }-------------<>|<>--<>| |>---> |
+| >----------------------------------------------------------------------->| |>-------' | | +-------------------+ | |
+| | | | | +-----------+ | | | | |
++----------+ | | | | | | +----------+
+ | | +-----------------------+ | |
+ | +-------------------------------------+ |
+ | |
+ +-------------------------------------------------------------------+
+```
+
+The `Handler` contains a lot of boring glue, but it's *not just*
+boring glue. It contains all of the logic for keeping track of which
+marks and refs have been emitted, what each ref currently points to,
+and what's in the tree of the current commit. It deals with a lot of
+state. State is gross and kind of wants to gobble up all of the other
+logic. Don't let it! Keep it easy to look at how decisions are made
+by forcing it to call out to an external business-logic `Driver` for
+those decisions!
+
+> First of all, I want to say that the `Handler` was designed by
+> writing the "business-logic" how I thought best, and then
+> implementing the "system-logic" around that to make it work. It was
+> designed in a business→system order. But I'm going to explain it in
+> a system→business order.
+
+The `Driver` interface is relatively simple (I've ordered the methods
+in the rough order that they get called in):
+
+```ascii
+ +-{Driver}------------+
+ | |
+(input-commit) >---> ProcessCommit() >---> (output-commit)
+ | | - Do we even output this commit?
+ | | - Do we need to change this
+ | | commit's list of parents?
+ | |
+ | GotMark() <---------< (output-mark, output-hash)
+ | |
+ (input-mark) >---> FixMark() >---------> (output-mark)
+ | |
+ (input-mark) >---> FixCommitIsh() >----> (output-mark)
+ | |
+ | HandleDone() >------> (arbitrary stream output)
+ | |
+ +---------------------+
+```
+
+`GotMark()` is mostly just called immediately after `Handler` deals
+with the result of `ProcessCommit()`, but there are a few other times
+it can get called as well.
+
+The `Driver` is actually implemented in two parts:
+
+ 1. The `Replacer`, which mostly just listens until the very end, when
+ `HandleDone()` is called and it emits a bunch of `refs/replace/`
+ refs that record everything that happened.
+ 2. The `Pruner`, which takes an active role in deciding what to
+ prune and how.
+
+`main.go` contains the aggregate `Driver` implementation; its `type
+driver struct` aggregates the `Replacer` and the `Pruner` together
+into a complete `Driver` implementation:
+
+```ascii
+ +-{main.go:driver}----------------------+
+ | |
+ | +-{Replacer}----+-{Pruner}-----+ |
+ | | : | |
+ | | : | |
+ | | : | |
+ | ,---> Replacer.ProcessCommit() | |
+(input-commit) >-------> Pruner.ProcessCommit() >-------> (output-commit)
+ | | : | |
+ | | : | |
+ | | : | |
+ | | : | |
+ | | Replacer.GotMark() <---------, |
+ | | Pruner.GotMark() <-------------< (output-mark, output-hash)
+ | | : | |
+ | | : | |
+ | | +-------------+ | |
+ | | | | |
+ | | | | |
+ (input-mark) >--------> Pruner.FixMark() >--------------> (output-mark)
+ | | | | |
+ | | | | |
+ | | | | |
+ | | | | |
+ (input-mark) >--------> Pruner.FixCommitIsh() >---------> (output-mark)
+ | | | | |
+ | | | | |
+ | | +--------------------------+ | |
+ | | | | |
+ | | | | |
+ | | Replacer.HandleDone() >----------> (arbitrary stream output)
+ | | | | |
+ | | | | |
+ | +----------------------------+-+ |
+ | |
+ +--------------------------------------+
+```
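As a reading aid for the routing in the diagram above, here is a minimal Go sketch of how such an aggregate might be wired. It is not the code in `main.go`: `replacer`, `pruner`, `newDriver`, and the `Empty` field are hypothetical, cut-down stand-ins, and `FixCommitIsh` is left out for brevity. The only thing it tries to show is which half each `Driver` method is forwarded to.

```go
package main

import "fmt"

type Mark int
type Hash string

// Commit is a cut-down stand-in for stream.go's Commit. The Empty
// field is an assumption made for this sketch; the real Pruner works
// emptiness out for itself.
type Commit struct {
	Mark    Mark
	Parents []Mark
	Empty   bool
}

// replacer is a hypothetical miniature of Replacer: it only listens,
// then reports when HandleDone is called.
type replacer struct {
	inputs []Mark
	hashes map[Mark]Hash
}

func (r *replacer) ProcessCommit(c Commit) { r.inputs = append(r.inputs, c.Mark) }
func (r *replacer) GotMark(m Mark, h Hash) { r.hashes[m] = h }
func (r *replacer) HandleDone(refs map[string]Mark) error {
	fmt.Printf("%d input commits, %d output commits, %d refs\n",
		len(r.inputs), len(r.hashes), len(refs))
	return nil
}

// pruner is a hypothetical miniature of Pruner with a toy policy:
// drop any commit flagged Empty that has exactly one parent.
type pruner struct {
	fixed map[Mark]Mark // input mark -> output mark
	seen  map[Hash]bool
}

func (p *pruner) ProcessCommit(c Commit) (*Commit, error) {
	parents := make([]Mark, 0, len(c.Parents))
	for _, in := range c.Parents {
		out := p.FixMark(in)
		if out < 0 {
			return nil, fmt.Errorf("commit :%d: unknown parent :%d", c.Mark, in)
		}
		parents = append(parents, out)
	}
	if c.Empty && len(parents) == 1 {
		p.fixed[c.Mark] = parents[0] // pruned: the parent stands in for it
		return nil, nil
	}
	p.fixed[c.Mark] = c.Mark
	c.Parents = parents
	return &c, nil
}

func (p *pruner) GotMark(m Mark, h Hash) bool {
	isNew := !p.seen[h]
	p.seen[h] = true
	return isNew
}

func (p *pruner) FixMark(m Mark) Mark {
	if out, ok := p.fixed[m]; ok {
		return out
	}
	return -1 // negative Mark signals an error
}

// driver forwards each call the way the diagram routes it.
type driver struct {
	r *replacer
	p *pruner
}

func (d driver) ProcessCommit(c Commit) (*Commit, error) {
	d.r.ProcessCommit(c) // the Replacer only observes
	return d.p.ProcessCommit(c)
}

func (d driver) GotMark(m Mark, h Hash) bool {
	d.r.GotMark(m, h)
	return d.p.GotMark(m, h)
}

func (d driver) FixMark(m Mark) Mark                   { return d.p.FixMark(m) }
func (d driver) HandleDone(refs map[string]Mark) error { return d.r.HandleDone(refs) }

func newDriver() driver {
	return driver{
		r: &replacer{hashes: map[Mark]Hash{}},
		p: &pruner{fixed: map[Mark]Mark{}, seen: map[Hash]bool{}},
	}
}

func main() {
	d := newDriver()
	d.ProcessCommit(Commit{Mark: 1})
	d.GotMark(1, "aaaa")
	out, _ := d.ProcessCommit(Commit{Mark: 2, Parents: []Mark{1}, Empty: true})
	fmt.Println(out == nil, d.FixMark(2)) // prints "true 1"
	_ = d.HandleDone(map[string]Mark{"refs/heads/main": 1})
}
```

Named fields are used here instead of struct embedding so that the two `ProcessCommit` and `GotMark` methods cannot collide; whether the real `driver` does the same is not something this sketch claims.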
diff --git a/fi-prune-empty2/prune.go b/fi-prune-empty2/prune.go
index 8933872..ff0119b 100644
--- a/fi-prune-empty2/prune.go
+++ b/fi-prune-empty2/prune.go
@@ -1,3 +1,18 @@
+// Copyright 2019-2021 Luke Shumaker <lukeshu@parabola.nu>
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License as published by
+// the Free Software Foundation, either version 3 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU Affero General Public License for more details.
+//
+// You should have received a copy of the GNU Affero General Public License
+// along with this program. If not, see <http://www.gnu.org/licenses/>.
+
package main
import (
@@ -212,6 +227,8 @@ func (p *Pruner) wasFastForward(commit Commit) (bool, error) {
return (tree == parentTree), nil
}
+// isEmpty returns whether `git commit` would require `--allow-empty`
+// in order to create that commit.
func (p *Pruner) isEmpty(commit Commit) (bool, error) {
switch len(commit.Parents) {
case 0:
@@ -228,6 +245,10 @@ func (p *Pruner) isAncestor(ancestor, descendant Mark) bool {
return ret
}
+// pruneParents prunes "duplicate" parents of a commit. It is
+// equivalent to running `git merge-base --independent` in the
+// destination repo (note: it isn't possible for us to run anything in
+// the destination repo).
func (p *Pruner) pruneParents(arg []Mark) []Mark {
var ret []Mark
outer:
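The hunk above cuts off before the body of `pruneParents`, so the following is only a sketch of the idea behind `git merge-base --independent`, not the author's implementation: keep the parents that are not reachable from any other parent. The `graph` type and its recursive `isAncestor` are hypothetical stand-ins for whatever ancestry bookkeeping the real `Pruner` keeps.

```go
package main

import "fmt"

type Mark int

// graph maps each commit to its parents. It is a hypothetical
// stand-in for the real Pruner's ancestry bookkeeping.
type graph map[Mark][]Mark

// isAncestor reports whether anc is a strict ancestor of desc.
func (g graph) isAncestor(anc, desc Mark) bool {
	for _, p := range g[desc] {
		if p == anc || g.isAncestor(anc, p) {
			return true
		}
	}
	return false
}

// pruneParents keeps only the parents that are not reachable from
// another parent, and collapses exact duplicates to their first
// occurrence.
func (g graph) pruneParents(parents []Mark) []Mark {
	var ret []Mark
outer:
	for i, a := range parents {
		for j, b := range parents {
			switch {
			case i == j:
				continue
			case a == b:
				if j < i {
					continue outer // duplicate of an earlier parent
				}
			case g.isAncestor(a, b):
				continue outer // a is already implied by b
			}
		}
		ret = append(ret, a)
	}
	return ret
}

func main() {
	// History: 1 <- 2 <- 3, and 1 <- 4.
	g := graph{1: nil, 2: {1}, 3: {2}, 4: {1}}
	fmt.Println(g.pruneParents([]Mark{3, 1, 4, 3})) // prints "[3 4]"
}
```

In the example, parent 1 is dropped because it is an ancestor of 3, and the second 3 is dropped as a duplicate. The order of the survivors follows the input order, which is a choice made for this sketch.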
diff --git a/fi-prune-empty2/replace.go b/fi-prune-empty2/replace.go
index 92bbdf7..7a12b66 100644
--- a/fi-prune-empty2/replace.go
+++ b/fi-prune-empty2/replace.go
@@ -1,3 +1,18 @@
+// Copyright 2019-2021 Luke Shumaker <lukeshu@parabola.nu>
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License as published by
+// the Free Software Foundation, either version 3 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU Affero General Public License for more details.
+//
+// You should have received a copy of the GNU Affero General Public License
+// along with this program. If not, see <http://www.gnu.org/licenses/>.
+
package main
import (
diff --git a/fi-prune-empty2/srcrepo.go b/fi-prune-empty2/srcrepo.go
index b788a49..c771bc5 100644
--- a/fi-prune-empty2/srcrepo.go
+++ b/fi-prune-empty2/srcrepo.go
@@ -1,3 +1,18 @@
+// Copyright 2019-2021 Luke Shumaker <lukeshu@parabola.nu>
+//
+// This program is free software: you can redistribute it and/or modify
+// it under the terms of the GNU Affero General Public License as published by
+// the Free Software Foundation, either version 3 of the License, or
+// (at your option) any later version.
+//
+// This program is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+// GNU Affero General Public License for more details.
+//
+// You should have received a copy of the GNU Affero General Public License
+// along with this program. If not, see <http://www.gnu.org/licenses/>.
+
package main
import (
diff --git a/fi-prune-empty2/stream.go b/fi-prune-empty2/stream.go
index a785faa..da4b7be 100644
--- a/fi-prune-empty2/stream.go
+++ b/fi-prune-empty2/stream.go
@@ -78,10 +78,44 @@ type Commit struct {
}
type Driver interface {
+ // ProcessCommit takes an input Commit, and
+ //
+ // 1. Records any information about that Commit that the
+ // Driver will need for future operations.
+ //
+ // 2. Decides whether the output stream should contain that
+ // Commit at all; returns (nil, nil) if the output stream
+ // should omit this Commit.
+ //
+ // 3. Potentially mutates the list of commit.Parents, which it
+ // returns.
ProcessCommit(Commit) (*Commit, error)
+
+ // GotMark is called to inform the Driver what output-repo
+ // Hash a given output-stream Mark refers to. The Driver
+ // returns true if this is a new Hash that has never been
+ // passed to GotMark before; the return value is just used
+ // for progress statistics.
GotMark(Mark, Hash) bool
+
+ // FixMark takes an input-stream Mark, and returns the
+ // equivalent output-stream Mark. An error is signaled by
+ // returning a negative Mark.
FixMark(Mark) Mark
+
+ // FixCommitIsh takes an input-stream commitish, and returns
+ // the equivalent output-stream commitish. An error is
+ // signaled by returning an empty string.
+ //
+ // As a point of interest, the actual implementation (in
+ // prune.go) requires that the input commitish be a Mark,
+ // and signals an error otherwise.
FixCommitIsh(string) string
+
+ // HandleDone gets called before the "done" command is
+ // emitted, but after any other commands that would be emitted
+ // by the Handler. It is called with a map of every ref that
+ // has been defined in the output-repo.
HandleDone(refs map[string]Mark) error
}
@@ -145,7 +179,7 @@ func (h *Handler) CmdCommitEnd(cmd libfastimport.CmdCommitEnd) error {
})
}()
- // build the aggregated object for ProcessCommit
+ // Build the aggregated object for ProcessCommit.
var cmt Commit
var err error
cmt.Mark = Mark(h.commitMeta.Mark)
@@ -186,17 +220,21 @@ func (h *Handler) CmdCommitEnd(cmd libfastimport.CmdCommitEnd) error {
sort.Stable(h.commitFile)
cmt.Tree = h.commitFile
- // remember this
+ // Remember this.
defer func() {
h.refs[h.commitMeta.Ref] = Mark(h.commitMeta.Mark)
}()
- // call ProcessCommit
+ // Call ProcessCommit.
cmtptr, err := h.driver.ProcessCommit(cmt)
if err != nil {
return err
}
+
+ // Do something with the result.
if cmtptr == nil {
+ // Drop the commit.
+
if _, refExists := h.refs[h.commitMeta.Ref]; !refExists {
mark := h.driver.FixMark(Mark(h.commitMeta.Mark))
commitIsh := fmt.Sprintf(":%d", mark)
@@ -212,46 +250,49 @@ func (h *Handler) CmdCommitEnd(cmd libfastimport.CmdCommitEnd) error {
}
}
h.refs[h.commitMeta.Ref] = Mark(h.commitMeta.Mark)
- return nil
- }
-
- // apply the result of ProcessCommit to h.commitMeta
- if len(cmtptr.Parents) == 0 {
- h.commitMeta.From = EmptyHash
- h.commitMeta.Merge = nil
} else {
- h.commitMeta.From = fmt.Sprintf(":%d", cmtptr.Parents[0])
- h.commitMeta.Merge = nil
- for _, merge := range cmtptr.Parents[1:] {
- h.commitMeta.Merge = append(h.commitMeta.Merge, fmt.Sprintf(":%d", merge))
+ // Emit the commit.
+
+ // apply the mutations from ProcessCommit to
+ // h.commitMeta
+ if len(cmtptr.Parents) == 0 {
+ h.commitMeta.From = EmptyHash
+ h.commitMeta.Merge = nil
+ } else {
+ h.commitMeta.From = fmt.Sprintf(":%d", cmtptr.Parents[0])
+ h.commitMeta.Merge = nil
+ for _, merge := range cmtptr.Parents[1:] {
+ h.commitMeta.Merge = append(h.commitMeta.Merge, fmt.Sprintf(":%d", merge))
+ }
}
- }
- // emit the commit
- if err := h.backend.Do(h.commitMeta); err != nil {
- return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark)
- }
- if err := h.backend.Do(libfastimport.FileDeleteAll{}); err != nil {
- return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark)
- }
- for _, file := range h.commitFile {
- if err := h.backend.Do(file); err != nil {
+ // actually emit the commit
+ if err := h.backend.Do(h.commitMeta); err != nil {
return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark)
}
- }
+ if err := h.backend.Do(libfastimport.FileDeleteAll{}); err != nil {
+ return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark)
+ }
+ for _, file := range h.commitFile {
+ if err := h.backend.Do(file); err != nil {
+ return errors.Wrapf(err, "processing commit :%d", h.commitMeta.Mark)
+ }
+ }
- sha1, err := h.backend.GetMark(libfastimport.CmdGetMark{
- Mark: h.commitMeta.Mark,
- })
- if err != nil {
- return err
- }
- hash, err := AsHash(sha1)
- if err != nil {
- return err
- }
- if h.driver.GotMark(Mark(h.commitMeta.Mark), hash) {
- h.commitsOut++
+ // tell the driver about the emitted commit
+ sha1, err := h.backend.GetMark(libfastimport.CmdGetMark{
+ Mark: h.commitMeta.Mark,
+ })
+ if err != nil {
+ return err
+ }
+ hash, err := AsHash(sha1)
+ if err != nil {
+ return err
+ }
+ if h.driver.GotMark(Mark(h.commitMeta.Mark), hash) {
+ h.commitsOut++
+ }
}
return nil
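The parent rewriting in the "Emit the commit" branch above can be tried in isolation. The sketch below pulls that step into a hypothetical helper, `parentRefs`, which is not a function in `stream.go`. The value used when there are no parents is passed in as `root` because the definition of `EmptyHash` is not shown in this diff; the `"<EmptyHash>"` string in `main` is only a placeholder.

```go
package main

import "fmt"

type Mark int

// parentRefs splits a parent list into the value for fast-import's
// "from" and the values for its "merge" lines. root is what to use
// for "from" when there are no parents at all (the Handler uses its
// EmptyHash constant there).
func parentRefs(parents []Mark, root string) (from string, merge []string) {
	if len(parents) == 0 {
		return root, nil
	}
	from = fmt.Sprintf(":%d", parents[0])
	for _, m := range parents[1:] {
		merge = append(merge, fmt.Sprintf(":%d", m))
	}
	return from, merge
}

func main() {
	from, merge := parentRefs([]Mark{7, 9, 12}, "<EmptyHash>")
	fmt.Println(from, merge) // prints ":7 [:9 :12]"
}
```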