HBS2 P2P Storage and Platform

hbs2-git

About

hbs2-git is a Git extension that allows Git to use HBS2 storage as a backend.

It provides push and fetch operations for Git, with P2P replication among the peers subscribed to the given repositories.

hbs2-git works only while hbs2-peer is running. So first run hbs2-peer on the host, either as a service or as a regular application:

hbs2-peer run

hbs2-git interacts with hbs2-peer via RPC; it’s just a client application for hbs2-peer.

Check if hbs2-peer is up and hbs2-git is ready to work. If you get something like this:

hbs2-git3 hbs2:peer:poke
peer-key: "5GnroAC8FXNRL8rcgJj6RTu9mt1AbuNd5MZVnDBcCKzb"
udp: "0.0.0.0:7354"
tcp: "tcp://0.0.0.0:3001"
local-multicast: "239.192.152.145:10153"
rpc: "/tmp/hbs2-rpc.socket"
http-port: 5000

it means that everything is OK, and hbs2-git is good to go.

Note: the name hbs2-git originally referred to the previous version, which is not compatible with the current one (hbs2-git3).

Right now, hbs2-git is just an alias for hbs2-git3, so the two names may be considered the same.

Note: git-hbs2 is a Git extension that enables the git hbs2 command. So git hbs2 is a synonym for hbs2-git.

Prerequisites

git version >= 2.42
git version 2.44 has an issue and does not work
git version >= 2.45 works
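To check a machine programmatically, the following minimal Python sketch applies the version rules above to the output of `git --version` (normally of the form `git version 2.45.1`). The helper names are hypothetical and not part of hbs2-git.

```python
import re
import subprocess


def parse_git_version(text: str) -> tuple[int, int]:
    """Extract (major, minor) from output such as 'git version 2.45.1'."""
    m = re.search(r"git version (\d+)\.(\d+)", text)
    if m is None:
        raise ValueError(f"unrecognized git version string: {text!r}")
    return int(m.group(1)), int(m.group(2))


def is_supported(version: tuple[int, int]) -> bool:
    """True for git >= 2.42, except the 2.44 series, which does not work."""
    return version >= (2, 42) and version != (2, 44)


def check_local_git() -> bool:
    """Run `git --version` for the git found on PATH and apply the rules."""
    out = subprocess.run(["git", "--version"], capture_output=True,
                         text=True, check=True).stdout
    return is_supported(parse_git_version(out))
```

Calling `check_local_git()` tests the git found on `PATH`; it raises an exception if git is not installed.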

Using

Brief

Set up a new repo

git hbs2 init --new

Add an existing repo

git remote add origin hbs23://<repo>
git fetch origin

Clone an existing repo

git clone hbs23://<repo>

Set up a new unencrypted repository (Git remote)

Unencrypted means public. Anyone may access the repository if it’s available on their peers.

However, by design, only one author is able to write to the given repository.

By author, we mean the owner of the private key that corresponds to the repository's public key.

To create a new repository:

git hbs2 init --new
added git remote crater-stamp hbs23://CRciezyozQDQjbNbjc77eQHbXugELveRhCTDWmWpenV6

That’s it. You’ve just created a reference, i.e., a Git remote with the name crater-stamp.

git remote show crater-stamp
use git fetch to get the latest versions
* remote crater-stamp
  Fetch URL: hbs23://CRciezyozQDQjbNbjc77eQHbXugELveRhCTDWmWpenV6
  Push  URL: hbs23://CRciezyozQDQjbNbjc77eQHbXugELveRhCTDWmWpenV6
  HEAD branch: (unknown)

Note: the hbs23:// prefix denotes hbs2-git3, i.e., hbs2-git version 3. It was introduced so as not to interfere with the previous version of hbs2-git. Once hbs2-git version 2 is fully replaced by the current version, hbs2:// will become a synonym for it; the hbs23:// prefix will continue to work.
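To make the URL forms concrete, here is a small Python sketch that splits a remote URL into its scheme and key. It assumes, without confirmation from this document, that the key is a Base58-encoded (Bitcoin alphabet) 32-byte public key; `parse_remote_url` and `b58decode` are hypothetical helpers, not hbs2 APIs.

```python
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"
ACCEPTED_SCHEMES = ("hbs23://", "hbs2://")


def b58decode(s: str) -> bytes:
    """Decode a Base58 string (Bitcoin alphabet) into bytes."""
    n = 0
    for ch in s:
        idx = B58_ALPHABET.find(ch)
        if idx < 0:
            raise ValueError(f"invalid Base58 character: {ch!r}")
        n = n * 58 + idx
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    # Each leading '1' encodes one leading zero byte.
    pad = len(s) - len(s.lstrip("1"))
    return b"\x00" * pad + body


def parse_remote_url(url: str) -> tuple[str, bytes]:
    """Split 'hbs23://<key>' into (scheme, raw key bytes)."""
    for scheme in ACCEPTED_SCHEMES:
        if url.startswith(scheme):
            key = b58decode(url[len(scheme):])
            if len(key) != 32:  # assumption: keys are 32 bytes long
                raise ValueError(f"unexpected key length {len(key)}")
            return scheme[:-3], key
    raise ValueError(f"not an hbs2 remote URL: {url!r}")
```

The sketch returns the scheme instead of rejecting hbs2:// outright, since which version that prefix refers to depends on the migration state described above.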

You may see some details about your repo:

git hbs2 repo:manifest crater-stamp
(manifest
 (hbs2-git 3)
 (seed 7577042651815261350)
 (public)
 (reflog HbeVoQpGPCK7Z3Rcx9MPpzF858WuAeoM7JHqZcVPscNT))

Set up a new encrypted repository (testing!)

Key management is described in a separate document, so only the essential details are given here.

git hbs2 init --new --encrypted GK-HASH

GK-HASH is a hash of a group encryption key.

How to get it.

Brief:

Generate (or reuse existing) encryption keys: one key, or several if the repository needs multiple readers (hbs2-cli hbs2:keyring:new)
Create a group key with those keys (hbs2-cli hbs2:groupkey:create)
Store the generated group key in the storage (hbs2-cli hbs2:groupkey:store)
Use the hash of the stored group key as GK-HASH

Use hbs2-keyman list to list the known registered keys.

Example:

hbs2-cli [hbs2:groupkey:store  \
    [hbs2:groupkey:create \
        [list CcRDzezX1XQdPxRMuMKzJkfHFB4yG7vGJeTYvScKkbP8 \
         6v2cEWU5Kdmk1LLyCZRcLf2McpR6WDNbhcNJkKV8a6Ls ]]]

4Ps2JWv1EFP7B8DBHcTYvyuSUDw3LcffebQWY4jid2Vn

$ hbs2-cli hbs2:groupkey:dump 4Ps2JWv1EFP7B8DBHcTYvyuSUDw3LcffebQWY4jid2Vn
; fancy group key
group-key-id 9dr5CvitkECqsjLfT88U3cbtmjstCNGy1PmaSxsHMvTy
group-key-id-scheme basic1
group-key-timestamp 1742463870

member "CcRDzezX1XQdPxRMuMKzJkfHFB4yG7vGJeTYvScKkbP8"
member "6v2cEWU5Kdmk1LLyCZRcLf2McpR6WDNbhcNJkKV8a6Ls"

Here it is: the created group key.

Now initialize the repo:

git hbs2 init --new --encrypted 4Ps2JWv1EFP7B8DBHcTYvyuSUDw3LcffebQWY4jid2Vn

Now only owners of the private keys for CcRDzezX1XQdPxRMuMKzJkfHFB4yG7vGJeTYvScKkbP8 and 6v2cEWU5Kdmk1LLyCZRcLf2McpR6WDNbhcNJkKV8a6Ls can decrypt this repo.

Detailed:

First, generate a keyring with encryption keys.

We’re using the hbs2-cli tool for this:

hbs2-cli [hbs2:keyman:keys:add [hbs2:keyring:new 1]]

It generates a new keyring with one encryption key ([hbs2:keyring:new 1]), stores it in the hbs2-keyman default key directory ([hbs2:keyman:keys:add ...]), and calls hbs2-keyman update so that the created keys are tracked.

Note: The generated key contains private information and must be kept secure and protected from leaks. Treat it as an SSH private key.

The key is stored in the hbs2-keyman default key path, which may be set in the hbs2-keyman configuration file along with other useful options:

cat ~/.config/hbs2-keyman/config

default-key-path /home/user/keys
key-files        /home/user/keys/**/*.key
key-files        /home/user/test-keys/**/*.key

You may keep your keys wherever you want. The keys automatically generated with hbs2-cli will be stored in default-key-path.

If default-key-path is not set, the default key path is ~/.hbs2-keyman/keys.

Note: secure key management is out of scope for now; you may, for instance, keep the keys on a mounted encrypted file system.

hbs2 tools do not keep or cache secret keys and only read them when needed, e.g., to decrypt data for reading.

After creating the encryption key (key pair), you need to create a group encryption key:

hbs2-cli hbs2:groupkey:store \
    [ hbs2:groupkey:create BbFbftY2fdpGa6eAP5dBeUVtqK1JLArH7tyJ7Du8upMu \
    AFFRDDokP4TzDv5gMGAEgSFVDR59NQbgPqxWN5iqcmhi ]

D7CuArtUA5u8yTQzxyMc37s3gfsMP9LjJ58u31jA6P6K

How it works

Technically, the repository is an LWW (last-write-wins CRDT) reference.

This reference is updated by issuing transactions, each of which is signed with the reference's private key by its owner.

Therefore, only the owner can update this reference.

This reference points to a block that defines the repository, for example:

git hbs2 repo:manifest origin
(manifest
 (hbs2-git 3)  ;; hbs2-git version
 (seed 15200703225588472373) ;; some random data
 (public) ;; public/private flag, not really used so far
 (reflog G7tbCgFxgULK8U2NZZUthG5dB62cwLb62MgPM1UuadTk)) ;; reference to the reflog

The reflog is an important part. It is another kind of reference: a mutable pointer to a Merkle tree of transactions.

In a reflog, each transaction is signed with the private key of its owner.

Hosts share information about references. In the case of an LWWRef, each host chooses the properly signed LWWRef transaction with the greatest sequence number:

hbs2-cli hbs2:lwwref:get 95EKVZ5wuZjZnc4FdytjhytdnDyApnrXpqetE7yG9bfr
(lwwref
  (seq 1738601750)
  (value "42nttBEUkzzZRx3sHnH1sUvYYSUAPY6iL5xsKRfBQ8Rd")
  )
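A toy Python model of this selection rule: among incoming transactions that verify against the reference's key, a host keeps the one with the greatest sequence number. Real hbs2 transactions carry public-key signatures; here an HMAC over the payload stands in for the signature purely so the example runs with the standard library. `LwwTx`, `sign`, `hmac_verifier`, and `merge_lww` are hypothetical names.

```python
import hashlib
import hmac
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LwwTx:
    seq: int     # sequence number; the greater one wins
    value: str   # hash of the value block (for hbs2-git, the manifest tree)
    sig: bytes   # "signature" over (seq, value)


def payload(seq: int, value: str) -> bytes:
    return f"{seq}:{value}".encode()


# Stand-in for a real public-key signature: an HMAC keyed with the
# owner's secret. It only illustrates the "properly signed" check.
def sign(secret: bytes, seq: int, value: str) -> LwwTx:
    mac = hmac.new(secret, payload(seq, value), hashlib.sha256).digest()
    return LwwTx(seq, value, mac)


def hmac_verifier(secret: bytes) -> Callable[[LwwTx], bool]:
    def verify(tx: LwwTx) -> bool:
        expected = hmac.new(secret, payload(tx.seq, tx.value),
                            hashlib.sha256).digest()
        return hmac.compare_digest(expected, tx.sig)
    return verify


def merge_lww(current: Optional[LwwTx], incoming: LwwTx,
              verify: Callable[[LwwTx], bool]) -> Optional[LwwTx]:
    """Keep the properly signed transaction with the greatest sequence number."""
    if not verify(incoming):
        return current  # forged or badly signed: ignore it
    if current is None or incoming.seq > current.seq:
        return incoming
    return current
```

Because the result depends only on which valid transactions a host has seen, hosts converge regardless of arrival order. On equal sequence numbers this sketch simply keeps the current value; a real implementation needs a deterministic tie-break.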

The value here points to the LWWRef's value block, which may contain arbitrary user data. In the case of hbs2-git, it points to a tree with hbs2-git-specific metadata:

;; print the hashes of the Merkle tree's blocks

hbs2 cat -H 42nttBEUkzzZRx3sHnH1sUvYYSUAPY6iL5xsKRfBQ8Rd
63rVF2w1xLKHjD7bihEuB8gkhMRuMzfUVxzEe2jD5VN8

;; print block contents as raw data

hbs2 cat 63rVF2w1xLKHjD7bihEuB8gkhMRuMzfUVxzEe2jD5VN8
(hbs2-git 3)
(seed 15200703225588472373)
(public)
(reflog G7tbCgFxgULK8U2NZZUthG5dB62cwLb62MgPM1UuadTk)

So the manifest is literally a plain-text block with repository metadata.
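Since the manifest is plain text made of S-expressions, it is easy to read programmatically. A minimal Python sketch that handles both forms seen above, the `(manifest ...)` wrapper printed by `git hbs2 repo:manifest` and the bare entries stored in the block, including `;;` comments. It only supports the simple shapes shown here; `parse_manifest` is a hypothetical helper, not an hbs2-git API.

```python
import re

TOKEN = re.compile(r'\(|\)|"[^"]*"|[^\s()"]+')


def tokenize(text: str) -> list[str]:
    # Drop ';' comments up to end of line (assumes ';' never occurs in values).
    lines = [line.split(";", 1)[0] for line in text.splitlines()]
    return TOKEN.findall("\n".join(lines))


def parse(tokens: list[str], pos: int = 0):
    """Parse one S-expression starting at tokens[pos]; return (value, next_pos)."""
    tok = tokens[pos]
    if tok == "(":
        items, pos = [], pos + 1
        while tokens[pos] != ")":
            item, pos = parse(tokens, pos)
            items.append(item)
        return items, pos + 1
    return tok.strip('"'), pos + 1


def parse_manifest(text: str) -> dict[str, list[str]]:
    """Map each manifest entry's head symbol to its arguments."""
    tokens = tokenize(text)
    forms, pos = [], 0
    while pos < len(tokens):
        form, pos = parse(tokens, pos)
        forms.append(form)
    # `git hbs2 repo:manifest` wraps the entries in (manifest ...);
    # the stored block contains the entries directly.
    if len(forms) == 1 and forms[0] and forms[0][0] == "manifest":
        forms = forms[0][1:]
    return {form[0]: form[1:] for form in forms}
```

For instance, `parse_manifest(...)["reflog"]` yields the reflog key as a one-element list, and a flag entry such as `(public)` maps to an empty list.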

A reflog is identified by a public key; the corresponding private key is used to sign the reflog transactions.

In hbs2-git, the reflog's private key is cryptographically derived from the LWWRef private key and the seed, in order to minimize the number of keys being tracked.
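This document does not specify the derivation function. Purely to illustrate deriving one key deterministically from another key plus a seed, here is a Python sketch using HMAC-SHA256 as a KDF. `derive_reflog_seed` and the domain-separation label are hypothetical; in practice the 32-byte output would seed a signing key pair (an Ed25519 key, for example, can be created from a 32-byte seed) rather than be used directly.

```python
import hashlib
import hmac


def derive_reflog_seed(lww_secret: bytes, seed: int) -> bytes:
    """Derive 32 bytes of key material for the reflog signing key.

    Hypothetical scheme: HMAC-SHA256 keyed with the LWWRef secret over a
    domain-separation label plus the manifest seed. The same inputs always
    produce the same output, so only the LWWRef key has to be stored.
    """
    msg = b"hbs2-git reflog:" + str(seed).encode()
    return hmac.new(lww_secret, msg, hashlib.sha256).digest()
```

Because such a derivation is deterministic, the holder of the LWWRef private key can always recreate the reflog key from the seed published in the manifest, while a different seed yields an unrelated reflog key.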

Therefore, the hbs2-git repository is actually two references.

It is made this way so that the reflog format can be changed in the future while the published reference stays the same.

It also makes it possible to change the content of the repository, say, to repack it and reduce the number of transactions in order to improve performance.

Both references show up in hbs2-peer's poll list:

hbs2-peer poll list | grep -A1 95EKVZ5wuZjZnc4FdytjhytdnDyApnrXpqetE7yG9bfr
95EKVZ5wuZjZnc4FdytjhytdnDyApnrXpqetE7yG9bfr 31  lwwref
G7tbCgFxgULK8U2NZZUthG5dB62cwLb62MgPM1UuadTk 17  reflog

hbs2-peer is subscribed to both of these references.

Data distribution occurs only if hbs2-peer is subscribed to some references; otherwise, you must fetch blocks or trees manually.

Each peer subscribed to a reference shares its data with other peers subscribed to the same reference.

In most cases, a peer does not distribute any data it is not subscribed to.

Thus, if you want your data to be distributed among multiple peers, all those peers must subscribe to the corresponding references.

For example, if some users have cloned your repository, their peers will distribute the repository’s data, even if it is encrypted and they lack the access key. They will simply relay the encrypted data. Encrypted data is just as valid for distribution as unencrypted data; it simply remains encrypted.

A block is a block, whether encrypted or not.

Reflogs

A reflog is a CRDT Grow-Only Set of signed transactions. Technically, it is a Merkle Tree of properly signed transactions.

Each peer queries other peers for the reflog’s value and listens for RefLogUpdate transactions via GOSSIP.

If a received transaction belongs to a watched reflog and is properly signed, the peer merges it into its reflog and announces the new Merkle Tree hash for it.

Eventually, all peers will converge on the same Merkle Tree for the reflog, with an identical hash.
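A minimal Python sketch of why peers converge: the reflog behaves as a grow-only set, so merging is set union, and if the Merkle root is computed over a canonical (sorted) order of transaction hashes, peers holding the same set get the same root regardless of arrival order. The sorting, SHA-256, and the binary tree shape are assumptions for illustration; hbs2's actual tree layout and hash function are not described here.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def merkle_root(leaves: list[bytes]) -> bytes:
    """Binary Merkle root over leaf hashes; an odd node is carried up as-is."""
    if not leaves:
        return h(b"")
    level = leaves
    while len(level) > 1:
        nxt = [h(level[i] + level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])
        level = nxt
    return level[0]


class Reflog:
    """Grow-only set of transactions: only adds and unions, never removals."""

    def __init__(self) -> None:
        self.txs: set[bytes] = set()

    def add(self, tx: bytes) -> None:
        # A real peer verifies the transaction's signature before this step.
        self.txs.add(tx)

    def merge(self, other: "Reflog") -> None:
        self.txs |= other.txs

    def root(self) -> bytes:
        # Canonical order: sort transaction hashes before building the tree.
        return merkle_root(sorted(h(tx) for tx in self.txs))
```

Union is commutative, associative, and idempotent, which is what lets peers gossip updates in any order and still agree on the same root hash.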

Specific to this version (3) of hbs2-git:

There are two types of transactions, Segment (S) and Checkpoint (C):

git hbs2 repo:tx:list  origin
...
S 2xCfJ21esWV1DQeU6anr9af82X4VXMjrCEaNFKLG2jtQ CkaeFXfogVXuXXZUbUpqSbBYP1mXum1TScCyQ9jrgQrm
C HL4ckHuXGKJ2cFWWyDj35Br9yNiebifQC3rtJQfYujWY 459W4XQbRK3ePYXo33HmZdFnwVodNFwC8aftQfSbgzk3 1739026121
C 5XZk7JuivaeyLYNiuwKBhXkiUdY4gSjKrorQBeURBU6H Cyxj8XaD8yLapM4aXiG1Kc9eAbz6v2DgBZFVA16yZep7 1738785042
S E6aeeN5dQUNQcr23twubCJZTCrcRbuqys3dz1piAGHHk AEr8pvbpFUf8YcxbtomUXCQc3uP7bMwnkPH2KhTEk9rq
C NWZjvdPthMTbAYHvXDEmhhEeVd4E8jqbyMnbqoYSKC5  DCE3mEQoTY1xY546mWqjSBiY1Qg2Ca4h4UpQe5vCK7B7 1738602021
C CNpWtoGpKZsiqDGfLxSznDvTpHq6kfweyeDFXqntnkjK 8UcvNM2HEebBZUMx1MFCmwoW93aei32Cp6Uh3djxtLhK 1739453427
S 3BBWEHpM8BY5EzHhvo88jxbyxHCCPJADYBTPMFejjhgF GxcEBazZRAbNmKVkDdRrXC7kYGxBAiEGdW1B8DHr3LXH
...
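Such a listing can be processed mechanically. Here is a Python sketch that parses lines in the format shown above and picks the newest checkpoint. The meaning of the columns is an assumption: the hash columns are kept as-is, and the trailing number on a `C` line is treated as a Unix timestamp. `TxLine`, `parse_tx_list`, and `latest_checkpoint` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TxLine:
    kind: str                 # "S" (segment) or "C" (checkpoint)
    hashes: tuple[str, ...]   # hash columns, kept as printed
    timestamp: Optional[int]  # assumed Unix time; only present on C lines


def parse_tx_list(text: str) -> list[TxLine]:
    rows = []
    for line in text.splitlines():
        cols = line.split()  # split() also absorbs repeated spaces
        if not cols or cols[0] not in ("S", "C"):
            continue  # skip "..." and any other non-transaction lines
        ts = int(cols[-1]) if cols[0] == "C" and cols[-1].isdigit() else None
        hashes = tuple(cols[1:-1] if ts is not None else cols[1:])
        rows.append(TxLine(cols[0], hashes, ts))
    return rows


def latest_checkpoint(rows: list[TxLine]) -> Optional[TxLine]:
    """Checkpoint with the greatest timestamp, or None if there is none."""
    checkpoints = [r for r in rows if r.kind == "C" and r.timestamp is not None]
    return max(checkpoints, key=lambda r: r.timestamp, default=None)
```

Note that the listing is not ordered by time, so the newest checkpoint has to be found by comparing timestamps rather than by taking the first or last `C` line.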

Segments contain git repository data (blobs, trees, commits, references, etc.), and a checkpoint points to a state of the reflog that is considered consistent, i.e., one that contains a consistent set of git objects.

That is, once a checkpoint and all of its data are downloaded, there is enough data to make a consistent copy of the git repository.

Technically, segments are key-value logs of git objects and metadata, compressed with zstd. They are not git packs.

The reasons are the following. The git packing scheme is quite good for its purpose; however, it is complicated to reproduce without git itself, slow to use with git commands, and, most importantly, it does not guarantee minimal data duplication in a P2P environment, because each clone of a git repository may produce git packs of its own. Packing is not a deterministic process in any way.
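To illustrate a deterministic, compressed key-value log, here is a Python sketch. `zlib` stands in for zstd so the example needs only the standard library, and the record layout (length-prefixed key and value, records sorted by key) is an assumption, not the actual hbs2-git segment format. Sorting makes the output depend only on the set of objects, so two peers packing the same objects with the same compressor version and level produce byte-identical segments with the same hash.

```python
import struct
import zlib


def pack_segment(objects: dict[bytes, bytes], level: int = 9) -> bytes:
    """Serialize {object hash: object data} as a sorted KV log, then compress."""
    out = bytearray()
    for key in sorted(objects):  # canonical order => deterministic output
        value = objects[key]
        out += struct.pack(">I", len(key)) + key
        out += struct.pack(">I", len(value)) + value
    return zlib.compress(bytes(out), level)


def unpack_segment(blob: bytes) -> dict[bytes, bytes]:
    """Inverse of pack_segment: decompress and read the records back."""
    raw = zlib.decompress(blob)
    objects, pos = {}, 0
    while pos < len(raw):
        (klen,) = struct.unpack_from(">I", raw, pos)
        key = raw[pos + 4 : pos + 4 + klen]
        pos += 4 + klen
        (vlen,) = struct.unpack_from(">I", raw, pos)
        objects[key] = raw[pos + 4 : pos + 4 + vlen]
        pos += 4 + vlen
    return objects
```

The whole segment has to be decompressed to read any single object, which is the data-access tradeoff discussed next.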

We cannot just share plain git data either: it will not work, or will not work well enough, or will produce conflicts (git references are not versioned).

Another reason is that, in spite of git’s sophisticated delta mechanisms, zstd simply compresses data better than git when segments are large enough (50 MB and more).

At high zstd compression levels, the result may be several times smaller than git packs.

The tradeoff here is data access: if all objects are packed together, they cannot be read separately.

The good side is that we do not need random access to the objects yet, because we merely dump the objects to the git repository.

The bad side is that objects are duplicated both in HBS2 storage and in the git repositories, and this could matter for huge repositories like the Linux kernel.

If the size of a zstd-packed segment is chosen so that its compression ratio is still as good as git’s, the blocks will be small enough to operate on quickly; and since a block typically contains related objects, simple LRU caching will work well enough.

When hbs2-git unpacks the segments, it simply produces valid git pack files (with minimal compression and no xdelta). So it is a good idea to run git gc from time to time to let git repack the objects.

Security Model

In the case of an encrypted repository:

Segments are encrypted.

So the original data of your git repository is end-to-end (E2E) encrypted with the group key.

Checkpoints are not encrypted; they are merely lists of hashes of reflog transactions.

The manifest is not encrypted, though it could potentially be. It is up to you whether to put any sensitive data in the manifest.

A group key is a symmetric secret key encrypted separately with the public key of each participant.

The group key itself is not secret information. The private keys corresponding to the members’ public keys are secret and should be treated accordingly.

Therefore, encrypted repositories may be distributed even by peers that have no read access to their data.
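A structural Python sketch of why a group key can be public: it holds one copy of a symmetric secret per member, each sealed for that member's public key, and only a member's private key can unseal its copy. `seal` and `open_` are placeholders for a real public-key sealing primitive (for example, a libsodium sealed box); they, `GroupKey`, and the other names are hypothetical, not hbs2 APIs.

```python
import os
from dataclasses import dataclass
from typing import Callable, Optional

Seal = Callable[[bytes, bytes], bytes]            # (member public key, secret) -> box
Open = Callable[[bytes, bytes], Optional[bytes]]  # (member private key, box) -> secret


@dataclass
class GroupKey:
    boxes: dict[bytes, bytes]  # member public key -> sealed copy of the secret

    def members(self) -> set[bytes]:
        return set(self.boxes)


def create_group_key(members: list[bytes], seal: Seal) -> GroupKey:
    """Generate a fresh 32-byte symmetric secret and seal it for every member."""
    secret = os.urandom(32)
    return GroupKey({pk: seal(pk, secret) for pk in members})


def recover_secret(gk: GroupKey, my_public: bytes, my_private: bytes,
                   open_: Open) -> Optional[bytes]:
    """Only a listed member holding the matching private key gets the secret."""
    box = gk.boxes.get(my_public)
    return None if box is None else open_(my_private, box)


def rotate(gk: GroupKey, excluded: set[bytes], seal: Seal) -> GroupKey:
    """New secret for the remaining members; earlier ciphertexts are unaffected."""
    return create_group_key([pk for pk in gk.boxes if pk not in excluded], seal)
```

`rotate` models what happens when a member key is compromised: new data can use a new secret without that member, but anything encrypted under the old secret stays readable to whoever holds it.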

Key updates are supported by the following commands:

repo:gk
repo:gk:add:extra:keys
repo:gk:journal
repo:gk:journal:import
repo:gk:journal:imported
repo:gk:update

Group keys can also be managed with hbs2-cli:

hbs2-cli --help hbs2:groupkey
hbs2:groupkey:create
hbs2:groupkey:decrypt-block
hbs2:groupkey:dump
hbs2:groupkey:encrypt-block
hbs2:groupkey:find-secret
hbs2:groupkey:list-public-keys
hbs2:groupkey:load
hbs2:groupkey:publish
hbs2:groupkey:store
hbs2:groupkey:update

hbs2-git does not rotate the group key automatically yet, and it is questionable whether that would make any sense while the member keys of the group key remain the same.

If any member key is compromised, a third party may gain access to all data accessible to that member.

New data can be encrypted with a new group key that excludes the compromised keys, but older data remains as it is, and there is no way to change immutable data retroactively.

So there is obviously no perfect forward secrecy (PFS): new members must have access to the whole repository data, which is the opposite of what PFS guarantees. This is by design.

This may not sound good, but note that it is no worse than any centralized service. At least HBS2 uses open and clear cryptography, data protection, and data redundancy schemes, so you can manage the risks yourself.

Besides this, hbs2-peer uses transport-layer encryption and access control lists, so you can limit the set of peers that have access to the repositories.

To make it more secure, you could use an overlay network with peers bound to it, as well as full-disk encryption.

Again, just a reminder: a typical centralized service does not guarantee data security or data redundancy in any way. We know of no precedents where a centralized service took any responsibility for leaking or losing user data, at least for ordinary users (small companies, private persons).

Large companies and corporations run their own solutions, and for a reason.

Even so, there are documented cases of hacker attacks on corporate infrastructure that destroyed all the source code (which is actually ridiculous), as well as well-known leaks of the source code of proprietary products.

In HBS2, you may control your data safety on your own.

The number of redundant copies of repository data equals the number of peers subscribed to the data plus the number of backups, if you make any.

Just note that any peer can distribute the repository data even if it is unable to decrypt it.

Changes compared to hbs2-git v2

hbs2-git version 2 is going to be discontinued; fortunately, it has not been widely adopted.

The SQLite database has been replaced by an LSMT-like structure, improving performance by an order of magnitude for time-critical operations.
Fixed git reference handling: references are now versioned and can be deleted and written again multiple times, and this works correctly. Each git reference is timestamped, and tombstones and the LWW CRDT mechanism are used for conflict resolution.
The metadata structure has been redesigned to be simpler and clearer.
Introduced new, more reliable group key tracking and distribution mechanisms.
Git pack generation, improving the performance of the import operation.
Introduced the bf6 scripting language for integration and automation.
Tested and works on an “OpenWrt”-sized repository.
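The versioned reference handling in the list above can be modelled as a last-write-wins map with tombstones. A Python sketch, assuming a deletion is recorded as an entry with no value and that equal timestamps are resolved by comparing values (so a write beats a deletion); the names and the tie-break rule are illustrative, not hbs2-git's actual data format.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RefEntry:
    ts: int               # timestamp of the write
    value: Optional[str]  # commit hash, or None for a tombstone (deleted ref)


def newer(a: RefEntry, b: RefEntry) -> RefEntry:
    """Last write wins; equal timestamps are ordered by value, so every
    replica picks the same entry and a write beats a tombstone."""
    return max(a, b, key=lambda e: (e.ts, e.value or ""))


def merge_refs(*logs: dict[str, RefEntry]) -> dict[str, RefEntry]:
    """Merge per-replica ref maps; the result does not depend on merge order."""
    out: dict[str, RefEntry] = {}
    for log in logs:
        for name, entry in log.items():
            out[name] = entry if name not in out else newer(out[name], entry)
    return out


def live_refs(state: dict[str, RefEntry]) -> dict[str, str]:
    """Refs visible to git: tombstoned names are hidden."""
    return {n: e.value for n, e in state.items() if e.value is not None}
```

Keeping the tombstone instead of dropping the name is what lets a deletion win over an older write arriving late, while a later write can still bring the reference back.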