HuggingBento: A Bento-flavoured distro running Hugging Face Transformers #108

Draft
wants to merge 3 commits into
base: main
1 change: 1 addition & 0 deletions .dockerignore
@@ -3,4 +3,5 @@ icon.png
LICENSE
README.md
target/bin/bento
target/bin/huggingbento
target/dist
10 changes: 10 additions & 0 deletions .github/workflows/release.yml
@@ -115,3 +115,13 @@ jobs:
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.docker_meta.outputs.tags }}

- name: Build and push
uses: docker/build-push-action@v6
with:
context: ./
file: ./resources/huggingbento/Dockerfile
builder: ${{ steps.buildx.outputs.name }}
platforms: linux/amd64,linux/arm64
push: true
tags: ${{ steps.docker_meta.outputs.tags }}
20 changes: 17 additions & 3 deletions Makefile
@@ -22,9 +22,10 @@ DATE := $(shell date +"%Y-%m-%dT%H:%M:%SZ")

VER_FLAGS = -X main.Version=$(VERSION) -X main.DateBuilt=$(DATE)

LD_FLAGS ?= -w -s
GO_FLAGS ?=
DOCS_FLAGS ?=
LD_FLAGS ?= -w -s
CGO_LDFLAGS ?=
GO_FLAGS ?=
DOCS_FLAGS ?=

APPS = bento
all: $(APPS)
@@ -63,6 +64,15 @@ $(PATHINSTSERVERLESS)/%: $(SOURCE_FILES)

$(SERVERLESS): %: $(PATHINSTSERVERLESS)/%

HUGGINGBENTO = huggingbento
hugging-bento: $(HUGGINGBENTO)

$(PATHINSTBIN)/$(HUGGINGBENTO): $(SOURCE_FILES)
@CGO_ENABLED=1 \
go build $(GO_FLAGS) -tags "$(TAGS) huggingbento" -ldflags "$(LD_FLAGS) $(VER_FLAGS) -X main.BinaryName=huggingbento -X main.ProductName=huggingbento" -o $@ ./cmd/bento

$(HUGGINGBENTO): %: $(PATHINSTBIN)/%

docker-tags:
@echo "latest,$(VER_CUT),$(VER_MAJOR).$(VER_MINOR),$(VER_MAJOR)" > .tags

@@ -80,6 +90,10 @@ docker-cgo:
@docker build -f ./resources/docker/Dockerfile.cgo . -t $(DOCKER_IMAGE):$(VER_CUT)-cgo
@docker tag $(DOCKER_IMAGE):$(VER_CUT)-cgo $(DOCKER_IMAGE):latest-cgo

docker-huggingbento:
@docker build -f ./resources/huggingbento/Dockerfile . -t ghcr.io/warpstreamlabs/huggingbento:$(VER_CUT)
@docker tag ghcr.io/warpstreamlabs/huggingbento:$(VER_CUT) ghcr.io/warpstreamlabs/huggingbento:latest

fmt:
@go list -f {{.Dir}} ./... | xargs -I{} gofmt -w -s {}
@go list -f {{.Dir}} ./... | xargs -I{} goimports -w -local github.com/warpstreamlabs/bento {}
19 changes: 18 additions & 1 deletion README.md
@@ -143,7 +143,24 @@ go install -tags "x_bento_extra" github.com/warpstreamlabs/bento/cmd/bento@lates
make TAGS=x_bento_extra
```

Note that this tag may change or be broken out into granular tags for individual components outside of major version releases. If you attempt a build and these dependencies are not present you'll see error messages such as `ld: library not found for -lzmq`.
### hugging-bento

`hugging-bento` is a Bento distribution that supports running [ONNX models](https://onnxruntime.ai/). It leverages the [`knights-analytics/hugot`](https://github.com/knights-analytics/hugot) package, which in turn has two external dependencies:
- An `onnxruntime.*` ONNX Runtime dynamic library file, which can be obtained from the [onnxruntime project releases page](https://github.com/microsoft/onnxruntime/releases). hugot links it dynamically, and it is used by the onnxruntime inference library `onnxruntime_go`. Its location can be set with the `onnx_library_path` flag when loading your `config.yaml`.
- The `tokenizers.a` file. By default this should be at `/usr/lib/tokenizers.a`. If it lives elsewhere, point the linker at the directory that contains it with `CGO_LDFLAGS="-L/path/to/dir"` so that hugot can load it.

There are instructions for configuring these external libraries at [`knights-analytics/hugot#use-it-as-a-library`](https://github.com/knights-analytics/hugot?tab=readme-ov-file#use-it-as-a-library). Alternatively, you can use the `hugging-bento` [Docker image](resources/huggingbento/Dockerfile), which has all of these dependencies baked in.

```shell
# The directory containing the tokenizers.a file
export CGO_LDFLAGS="-L/usr/lib"

# With go
go install -tags "huggingbento" github.com/warpstreamlabs/bento/cmd/bento@latest

# Using make
make TAGS=huggingbento NODOWNLOAD
```
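
A quick pre-flight check can catch a missing dependency before the linker does. The sketch below is illustrative only: `has_tokenizers`, `has_onnxruntime`, and `check_deps` are hypothetical helper names that are not part of Bento or hugot, and it assumes the static library is named `tokenizers.a` as described above.

```shell
#!/bin/sh
# Pre-flight check for the two native dependencies described above.
# All function names here are hypothetical and exist only for this sketch.

# Succeeds if the given directory contains tokenizers.a.
has_tokenizers() {
  [ -f "$1/tokenizers.a" ]
}

# Succeeds if the given path is a readable regular file
# (expected to be the ONNX Runtime dynamic library).
has_onnxruntime() {
  [ -f "$1" ] && [ -r "$1" ]
}

# check_deps <tokenizers-dir> <onnxruntime-library-path>
# Reports every missing dependency on stderr and returns non-zero if any is absent.
check_deps() {
  rc=0
  if ! has_tokenizers "$1"; then
    echo "missing: $1/tokenizers.a" >&2
    rc=1
  fi
  if ! has_onnxruntime "$2"; then
    echo "missing: $2" >&2
    rc=1
  fi
  return "$rc"
}
```

For example, `check_deps /usr/lib /path/to/onnxruntime.so && export CGO_LDFLAGS="-L/usr/lib"` would set the linker flag only once both files are found. The ONNX Runtime path in that command is a placeholder.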

## Docker Builds

4 changes: 3 additions & 1 deletion cmd/bento/main.go
@@ -16,14 +16,16 @@ var (
DateBuilt string
// BinaryName binary name.
BinaryName string = "bento"
// ProductName name of product for CLI.
ProductName string = "Bento"
)

func main() {
service.RunCLI(
context.Background(),
service.CLIOptSetVersion(Version, DateBuilt),
service.CLIOptSetBinaryName(BinaryName),
service.CLIOptSetProductName("Bento"),
service.CLIOptSetProductName(ProductName),
service.CLIOptSetDocumentationURL("https://warpstreamlabs.github.io/bento/docs"),
service.CLIOptSetShowRunCommand(true),
)
78 changes: 45 additions & 33 deletions go.mod
@@ -3,9 +3,9 @@ module github.com/warpstreamlabs/bento
replace github.com/99designs/keyring => github.com/Jeffail/keyring v1.2.3

require (
cloud.google.com/go/bigquery v1.59.1
cloud.google.com/go/pubsub v1.36.1
cloud.google.com/go/storage v1.37.0
cloud.google.com/go/bigquery v1.61.0
cloud.google.com/go/pubsub v1.38.0
cloud.google.com/go/storage v1.40.0
cuelang.org/go v0.7.0
github.com/Azure/azure-sdk-for-go/sdk/azcore v1.9.2
github.com/Azure/azure-sdk-for-go/sdk/azidentity v1.5.1
@@ -50,13 +50,14 @@ require (
github.com/clbanning/mxj/v2 v2.7.0
github.com/colinmarc/hdfs v1.1.3
github.com/couchbase/gocb/v2 v2.9.1
github.com/daulet/tokenizers v0.9.0
github.com/denisenkom/go-mssqldb v0.12.3
github.com/dgraph-io/ristretto v0.1.1
github.com/dop251/goja v0.0.0-20231014103939-873a1496dc8e
github.com/dop251/goja_nodejs v0.0.0-20231122114759-e84d9a924c5c
github.com/dustin/go-humanize v1.0.1
github.com/eclipse/paho.mqtt.golang v1.4.3
github.com/fatih/color v1.16.0
github.com/fatih/color v1.17.0
github.com/fsnotify/fsnotify v1.7.0
github.com/generikvault/gvalstrings v0.0.0-20180926130504-471f38f0112a
github.com/getsentry/sentry-go v0.27.0
@@ -80,6 +81,7 @@ require (
github.com/jmespath/go-jmespath v0.4.0
github.com/klauspost/compress v1.17.9
github.com/klauspost/pgzip v1.2.6
github.com/knights-analytics/hugot v0.1.7-0.20240823085553-7da587ad260a
github.com/lib/pq v1.10.9
github.com/linkedin/goavro/v2 v2.12.0
github.com/matoous/go-nanoid/v2 v2.0.0
@@ -122,12 +124,13 @@ require (
github.com/trinodb/trino-go-client v0.313.0
github.com/twmb/franz-go v1.16.1
github.com/twmb/franz-go/pkg/kmsg v1.7.0
github.com/urfave/cli/v2 v2.27.1
github.com/urfave/cli/v2 v2.27.4
github.com/vmihailenco/msgpack/v5 v5.4.1
github.com/xdg-go/scram v1.1.2
github.com/xeipuuv/gojsonschema v1.2.0
github.com/xitongsys/parquet-go v1.6.2
github.com/xitongsys/parquet-go-source v0.0.0-20211228015320-b4f792c43cd0
github.com/yalue/onnxruntime_go v1.11.0
github.com/youmark/pkcs8 v0.0.0-20201027041543-1326539a0a0a
go.etcd.io/etcd/api/v3 v3.5.14
go.etcd.io/etcd/client/v3 v3.5.14
@@ -141,25 +144,27 @@ require (
go.opentelemetry.io/otel/sdk v1.24.0
go.opentelemetry.io/otel/trace v1.24.0
go.uber.org/multierr v1.11.0
golang.org/x/crypto v0.25.0
golang.org/x/exp v0.0.0-20231006140011-7918f672742d
golang.org/x/net v0.27.0
golang.org/x/oauth2 v0.17.0
golang.org/x/sync v0.7.0
golang.org/x/text v0.16.0
google.golang.org/api v0.162.0
golang.org/x/crypto v0.26.0
golang.org/x/exp v0.0.0-20240823005443-9b4947da3948
golang.org/x/net v0.28.0
golang.org/x/oauth2 v0.22.0
golang.org/x/sync v0.8.0
golang.org/x/text v0.17.0
google.golang.org/api v0.184.0
google.golang.org/grpc v1.64.0
google.golang.org/protobuf v1.34.2
gopkg.in/natefinch/lumberjack.v2 v2.2.1
gopkg.in/yaml.v3 v3.0.1
modernc.org/sqlite v1.28.0
)

require (
cloud.google.com/go v0.112.0 // indirect
cloud.google.com/go/compute v1.24.0 // indirect
cloud.google.com/go/compute/metadata v0.2.3 // indirect
cloud.google.com/go/iam v1.1.6 // indirect
cloud.google.com/go/trace v1.10.5 // indirect
cloud.google.com/go v0.114.0 // indirect
cloud.google.com/go/auth v0.5.1 // indirect
cloud.google.com/go/auth/oauth2adapt v0.2.2 // indirect
cloud.google.com/go/compute/metadata v0.3.0 // indirect
cloud.google.com/go/iam v1.1.8 // indirect
cloud.google.com/go/trace v1.10.7 // indirect
dario.cat/mergo v1.0.0 // indirect
github.com/99designs/go-keychain v0.0.0-20191008050251-8e49817e8af4 // indirect
github.com/99designs/keyring v1.2.2 // indirect
@@ -177,9 +182,11 @@ require (
github.com/andybalholm/brotli v1.1.0 // indirect
github.com/apache/arrow/go/arrow v0.0.0-20211112161151-bc219186db40 // indirect
github.com/apache/arrow/go/v14 v14.0.2 // indirect
github.com/apache/arrow/go/v15 v15.0.2 // indirect
github.com/apache/thrift v0.18.1 // indirect
github.com/ardielle/ardielle-go v1.5.2 // indirect
github.com/armon/go-metrics v0.3.4 // indirect
github.com/aws/aws-sdk-go v1.55.5 // indirect
github.com/aws/aws-sdk-go-v2/aws/protocol/eventstream v1.6.0 // indirect
github.com/aws/aws-sdk-go-v2/feature/dynamodb/attributevalue v1.12.16 // indirect
github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.14.11 // indirect
@@ -206,7 +213,7 @@ require (
github.com/btnguyen2k/consu/reddo v0.1.8 // indirect
github.com/btnguyen2k/consu/semita v0.1.5 // indirect
github.com/bufbuild/protocompile v0.8.0 // indirect
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/cespare/xxhash/v2 v2.3.0 // indirect
github.com/cockroachdb/apd/v3 v3.2.1 // indirect
github.com/containerd/continuity v0.3.0 // indirect
github.com/coreos/go-semver v0.3.0 // indirect
@@ -215,7 +222,7 @@ require (
github.com/couchbase/gocbcoreps v0.1.3 // indirect
github.com/couchbase/goprotostellar v1.0.2 // indirect
github.com/couchbaselabs/gocbconnstr/v2 v2.0.0-20240607131231-fb385523de28 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.2 // indirect
github.com/cpuguy83/go-md2man/v2 v2.0.4 // indirect
github.com/danieljoos/wincred v1.2.0 // indirect
github.com/davecgh/go-spew v1.1.1 // indirect
github.com/dgryski/go-rendezvous v0.0.0-20200823014737-9f7001d12a5f // indirect
@@ -232,17 +239,18 @@ require (
github.com/form3tech-oss/jwt-go v3.2.5+incompatible // indirect
github.com/frankban/quicktest v1.14.6 // indirect
github.com/gabriel-vasile/mimetype v1.4.2 // indirect
github.com/go-errors/errors v1.5.1 // indirect
github.com/go-faster/city v1.0.1 // indirect
github.com/go-faster/errors v0.7.1 // indirect
github.com/go-logr/logr v1.4.1 // indirect
github.com/go-logr/logr v1.4.2 // indirect
github.com/go-logr/stdr v1.2.2 // indirect
github.com/go-sourcemap/sourcemap v2.1.3+incompatible // indirect
github.com/goccy/go-json v0.10.2 // indirect
github.com/gogo/protobuf v1.3.2 // indirect
github.com/golang-jwt/jwt v3.2.2+incompatible // indirect
github.com/golang-sql/civil v0.0.0-20220223132316-b832511892a9 // indirect
github.com/golang-sql/sqlexp v0.1.0 // indirect
github.com/golang/glog v1.2.0 // indirect
github.com/golang/glog v1.2.1 // indirect
github.com/golang/groupcache v0.0.0-20210331224755-41bb18bfe9da // indirect
github.com/golang/protobuf v1.5.4 // indirect
github.com/golang/snappy v0.0.4 // indirect
@@ -252,7 +260,7 @@ require (
github.com/google/shlex v0.0.0-20191202100458-e7afc7fbc510 // indirect
github.com/google/uuid v1.6.0 // indirect
github.com/googleapis/enterprise-certificate-proxy v0.3.2 // indirect
github.com/googleapis/gax-go/v2 v2.12.0 // indirect
github.com/googleapis/gax-go/v2 v2.12.5 // indirect
github.com/gorilla/css v1.0.0 // indirect
github.com/gosimple/unidecode v1.0.1 // indirect
github.com/govalues/decimal v0.1.29 // indirect
@@ -278,8 +286,10 @@ require (
github.com/jcmturner/gokrb5/v8 v8.4.4 // indirect
github.com/jcmturner/rpc/v2 v2.0.3 // indirect
github.com/josharian/intern v1.0.0 // indirect
github.com/json-iterator/go v1.1.12 // indirect
github.com/kballard/go-shellquote v0.0.0-20180428030007-95032a82bc51 // indirect
github.com/klauspost/cpuid/v2 v2.2.5 // indirect
github.com/knights-analytics/HuggingFaceModelDownloader v1.3.5 // indirect
github.com/kr/fs v0.1.0 // indirect
github.com/kylelemons/godebug v1.1.0 // indirect
github.com/lann/builder v0.0.0-20180802200727-47ae307949d0 // indirect
@@ -289,6 +299,8 @@ require (
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.16 // indirect
github.com/moby/term v0.5.0 // indirect
github.com/modern-go/concurrent v0.0.0-20180306012644-bacd9c7ef1dd // indirect
github.com/modern-go/reflect2 v1.0.2 // indirect
github.com/montanaflynn/stats v0.7.0 // indirect
github.com/mpvl/unique v0.0.0-20150818121801-cbe035fff7de // indirect
github.com/mtibben/percent v0.2.1 // indirect
@@ -316,33 +328,33 @@ require (
github.com/shopspring/decimal v1.3.1 // indirect
github.com/spaolacci/murmur3 v1.1.0 // indirect
github.com/stretchr/objx v0.5.2 // indirect
github.com/viant/afs v1.25.1 // indirect
github.com/viant/afsc v1.9.3 // indirect
github.com/vmihailenco/tagparser/v2 v2.0.0 // indirect
github.com/xdg-go/pbkdf2 v1.0.0 // indirect
github.com/xdg-go/stringprep v1.0.4 // indirect
github.com/xeipuuv/gojsonpointer v0.0.0-20190905194746-02993c407bfb // indirect
github.com/xeipuuv/gojsonreference v0.0.0-20180127040603-bd5ef7bd5415 // indirect
github.com/xrash/smetrics v0.0.0-20201216005158-039620a65673 // indirect
github.com/xrash/smetrics v0.0.0-20240521201337-686a1a2994c1 // indirect
github.com/zeebo/xxh3 v1.0.2 // indirect
go.etcd.io/bbolt v1.3.10 // indirect
go.etcd.io/etcd/client/pkg/v3 v3.5.14 // indirect
go.opencensus.io v0.24.0 // indirect
go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc v0.49.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.47.0 // indirect
go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp v0.49.0 // indirect
go.opentelemetry.io/otel/metric v1.24.0 // indirect
go.opentelemetry.io/proto/otlp v1.1.0 // indirect
go.uber.org/atomic v1.11.0 // indirect
go.uber.org/zap v1.27.0 // indirect
golang.org/x/mod v0.17.0 // indirect
golang.org/x/mod v0.20.0 // indirect
golang.org/x/sys v0.24.0 // indirect
golang.org/x/term v0.22.0 // indirect
golang.org/x/term v0.23.0 // indirect
golang.org/x/time v0.5.0 // indirect
golang.org/x/tools v0.21.1-0.20240508182429-e35e4ccd0d2d // indirect
golang.org/x/tools v0.24.0 // indirect
golang.org/x/xerrors v0.0.0-20231012003039-104605ab7028 // indirect
google.golang.org/appengine v1.6.8 // indirect
google.golang.org/genproto v0.0.0-20240227224415-6ceb2ff114de // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240227224415-6ceb2ff114de // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240401170217-c3f982113cda // indirect
google.golang.org/grpc v1.63.2 // indirect
google.golang.org/genproto v0.0.0-20240604185151-ef581f913117 // indirect
google.golang.org/genproto/googleapis/api v0.0.0-20240610135401-a8a62080eff3 // indirect
google.golang.org/genproto/googleapis/rpc v0.0.0-20240610135401-a8a62080eff3 // indirect
gopkg.in/inf.v0 v0.9.1 // indirect
gopkg.in/jcmturner/aescts.v1 v1.0.1 // indirect
gopkg.in/jcmturner/dnsutils.v1 v1.0.1 // indirect
@@ -360,4 +372,4 @@ require (
modernc.org/token v1.1.0 // indirect
)

go 1.21
go 1.22.0