Running the app yourself
From the beginning, I wanted to make running TalkingCode as easy as possible, and that includes making it easy for people to host it themselves. This article walks you through the steps needed to host TalkingCode yourself.
Summary
To host TalkingCode yourself, you will need to:
- Install Docker and Docker Compose
- Get an OpenAI API key and a GitHub personal access token
- Copy over the compose files
- Create a .env file with the necessary config variables
- Run the containers
- Run the pipeline using Dagster
- Use the application through the OpenAPI docs, curl, or your own frontend
Requirements
To host TalkingCode, you need Docker and Docker Compose installed on your machine, plus an OpenAI API key and a GitHub personal access token. The sections below walk through each step in detail.
Getting started
After you have the above requirements, create an empty folder, for example TalkingCode. Then copy over both the compose.backend.yml and compose.dataprocessing.yml files from the repository. You can find them here. You do not need to copy the compose.*.override.yml files. Alternatively, you can clone the repository instead of copying the files over.
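If you go the copying route, the setup looks something like the following sketch, where <path-to-repo> is a placeholder for wherever you have a checkout of the repository:

```shell
# Create a working directory for the deployment
mkdir TalkingCode
cd TalkingCode

# Copy the two compose files; <path-to-repo> is a placeholder
cp <path-to-repo>/compose.backend.yml .
cp <path-to-repo>/compose.dataprocessing.yml .
```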
Configuration through environment variables
After you have copied the files over, you need to create a file called .env. You can do this by running touch .env. Then add a few mandatory environment variables; the rest are optional. Alternatively, you can use the config/env.template file as a starting point: copy its contents and paste them into the .env.
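The two options above can be combined into one command: start from the template if you have a clone of the repository at hand, and fall back to an empty file otherwise:

```shell
# Use the template as a starting point if it is available,
# otherwise create an empty .env and fill it in by hand
cp config/env.template .env 2>/dev/null || touch .env
```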
Mandatory environment variables
These are the mandatory environment variables that you need to add to the .env. Without these, the application will not work. The list is
kept as small as possible to make it easier to get started.
- OPENAI_EMBEDDING_API_KEY - Your OpenAI API key.
- GITHUB_API_TOKEN - Your GitHub personal access token.
- WHITELISTED_EXTENSIONS - The file extensions that are allowed to be embedded, formatted as a JSON list. For example, '["py", "js"]'.
- SYSTEM_PROMPT - The prompt that the chat model will use. This should be a short to medium length description of the tasks the model will be asked to perform. For example, I specify that it needs to answer questions about my projects without straying too far from the topic.
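Put together, a minimal .env looks something like this (the key values are placeholders; substitute your own):

```shell
OPENAI_EMBEDDING_API_KEY=sk-...        # your OpenAI API key
GITHUB_API_TOKEN=ghp_...               # your GitHub personal access token
WHITELISTED_EXTENSIONS='["py", "js"]'  # JSON list of extensions to embed
SYSTEM_PROMPT='You answer questions about my software projects. Stay on topic.'
```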
Optional environment variables
Aside from the mandatory environment variables, there are also optional environment variables that you can add to the .env. These are used to configure the Postgres database and tweak the models used by the application. If you're just trying it out, you can stick with the defaults.
- POSTGRES_USER - The username for the Postgres database. For example, postgres.
- POSTGRES_PASSWORD - The password for the Postgres database. For example, postgres.
- POSTGRES_DB - The name of the Postgres database. For example, TalkingCode.
- POSTGRES_HOST - The host for the Postgres database. For example, localhost.
- POSTGRES_PORT - The port for the Postgres database. For example, 5432.
- BLACKLISTED_FILES - The files that are not allowed to be embedded, formatted as a JSON list. For example, '["package-lock.json", "package.json"]'.
- EMBEDDING_MODEL - The OpenAI model to use for embedding. For example, text-embedding-3-large. The choice is up to you; the full list of models can be found on OpenAI's website.
- MAX_SPEND - The maximum amount of money you want to spend on the OpenAI API per day. The number isn't 100% accurate, but it's a good, conservative estimate. The default is 1.5 dollars per day.
- CHAT_MODEL - The OpenAI model to use for generating answers. For example, gpt-4o. There is a trade-off between speed, cost, and quality, where gpt-4o does well in all three. The full list of models can be found on OpenAI's website.
- TOP_K - The number of retrieved documents to consider for the RAG model. The default is 5, but you can change it to any number you like.
- DAGSTER_PORT - The port that the Dagster UI will be available on. The default is 3000, but that might conflict with other services you have running. You can change it to any number you like. The UI will be available on localhost:PORT.
- BACKEND_PORT - The port that the FastAPI backend will be available on. The default is 8000, but that might conflict with other services you have running. You can change it to any number you like. The backend will be available on localhost:PORT. You can browse to localhost:PORT/docs to start making requests with a low barrier to entry.
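Note that WHITELISTED_EXTENSIONS and BLACKLISTED_FILES must be valid JSON lists, not plain comma-separated strings, because the value is read with a JSON parser. A small illustrative sketch (not TalkingCode's actual code) of how such a value gets parsed:

```python
import json
import os

# Example value, formatted the way the .env file expects it
os.environ["WHITELISTED_EXTENSIONS"] = '["py", "js"]'

# json.loads turns the string into a real Python list;
# a plain "py, js" string would raise a JSONDecodeError instead
extensions = json.loads(os.environ["WHITELISTED_EXTENSIONS"])
print(extensions)
```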
Running the containers
The next step is to run the containers. You can do this by running the following commands:
docker-compose --env-file .env -f compose.backend.yml up -d
docker-compose --env-file .env -f compose.dataprocessing.yml up -d
You can check that everything started by running docker ps. You should see three containers running: one for the backend, one for the database, and one for the data processing.
Running the pipeline using Dagster
After you have the containers running, you can run the pipeline using Dagster. You can browse
to localhost:3000 or localhost:DAGSTER_PORT if you have changed the port. This will bring up
the Dagster UI. You can then start the pipeline by clicking the Materialize all button.
Using the application
After you have started the pipeline, you can start using the application. You can browse to localhost:8000/docs or localhost:BACKEND_PORT/docs if you have changed the port. This will bring
up the FastAPI docs. You can then start making requests to the API. You will have to provide your
own frontend to interact with the API. You're free to use the frontend I used as a starting point.
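Before wiring up a frontend, a quick way to check that the backend is reachable is to request the OpenAPI schema that FastAPI serves alongside the docs (substitute your BACKEND_PORT if you changed it):

```shell
# FastAPI exposes the raw OpenAPI schema at /openapi.json
curl -s localhost:8000/openapi.json

# The interactive docs at localhost:8000/docs let you discover the
# available endpoints and try requests directly in the browser.
```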
Conclusion
That's it! You now have a fully working version of TalkingCode running on your own machine.