feat: embeddings model support #86

Merged
merged 13 commits into from
Mar 18, 2025

Conversation

hahaQWQ
Collaborator

@hahaQWQ hahaQWQ commented Mar 10, 2025

No description provided.

@hahaQWQ hahaQWQ mentioned this pull request Mar 10, 2025
40 tasks
@hahaQWQ hahaQWQ added the PR label Mar 10, 2025
@@ -1,12 +1,11 @@
 import * as input from '@inquirer/prompts'
 import { useLogger } from '@tg-search/common'
 import { EmbeddingService } from '@tg-search/core'
-import { findMessagesByChatId, updateMessageEmbeddings } from '@tg-search/db'
+import { findMessageMissingEmbed, updateMessageEmbeddings, useEmbeddingTable } from '@tg-search/db'
Collaborator

Would findMissingEmbedMessage be a better name?

-const messages = await findMessagesByChatId(Number(chatId))
-const messagesToEmbed = messages.items.filter(m => !m.embedding && m.content)
-const totalMessages = messagesToEmbed.length
+const messages = await findMessageMissingEmbed(Number(chatId), embedding.getEmbeddingConfig())
Collaborator

Would missingMessage be a better name?

Just a suggestion, not a requirement, but it might give better DX.

@@ -48,6 +48,8 @@ export class EmbedCommandHandler {

 // Initialize embedding service
 const embedding = new EmbeddingService()
+await useEmbeddingTable(embedding.getEmbeddingConfig())
+logger.debug('awdadwd')
Collaborator

What does this mean?

Collaborator Author

OK, I will correct these mistakes. (:3 ⌒゙)

return {
...this.embedding_config,
// Replace - with _
model: this.embedding_config.model.replace(/-/g, '_'),
Collaborator

Why is the conversion needed?

Collaborator Author

Because it looks good. If it's not suitable, I can revert it.

Collaborator Author

Some embedding models, such as nomic-embed-text, have a hyphen in their names. Hyphens are not allowed in unquoted PostgreSQL table names, so a conversion is needed.

PostgreSQL table naming rules
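
A minimal sketch of the conversion discussed in this thread, assuming the model name ends up inside an unquoted table name. Only the hyphen-to-underscore replacement comes from the diff; the helper name toTableSafeModelName, the extra validation, and the lowercasing are assumptions added for illustration.

```typescript
// Hypothetical helper (not from the PR): make an embedding model name usable
// as part of an unquoted PostgreSQL identifier.
// Unquoted identifiers start with a letter or underscore and continue with
// letters, digits, or underscores; PostgreSQL folds them to lowercase.
function toTableSafeModelName(model: string): string {
  // Same replacement as in the diff: every hyphen becomes an underscore.
  const replaced = model.replace(/-/g, '_')
  // Assumption: refuse anything that would still be an invalid identifier
  // (for example names containing ':' or '/') instead of guessing a mapping.
  if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(replaced)) {
    throw new Error(`Model name cannot be used in a table name: ${model}`)
  }
  return replaced.toLowerCase()
}

console.log(toTableSafeModelName('nomic-embed-text')) // nomic_embed_text
```

Keeping this in one helper would also make it easy to move into the models layer, as suggested later in the thread.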

Collaborator

What do you think about moving the conversion logic to the models layer, which is more relevant?

Collaborator Author

I will try it.

Collaborator

Did you regenerate the Drizzle migration file after modifying the schema?

Collaborator Author

Hmm... No😲

Collaborator

If you use a dynamic table, it is unnecessary.

Collaborator Author

okay

@luoling8192 luoling8192 changed the title from "embeddings model support" to "feat: embeddings model support" Mar 10, 2025
@luoling8192 luoling8192 requested a review from Copilot March 10, 2025 18:55
@Copilot Copilot AI left a comment

PR Overview

This pull request adds support for embedding models by introducing an EmbeddingTableConfig and related helper functions. Key changes include:

  • Adding new schema and utility functions for creating and using embedding tables.
  • Extending configuration and types to include the embedding dimensions.
  • Updating message handling and search functionality to work with the new embedding model support.
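
To make the overview concrete, here is a rough sketch of how a per-model embedding table could be derived from such a config. The overview and diff only confirm that the config carries a model name and dimensions; the table naming scheme, the message_uuid column, and the function names below are assumptions for illustration, not the PR's actual code. The vector(n) column type is the one provided by pgvector.

```typescript
// Assumed shape: the PR confirms a model name and dimensions; the exact
// field names in the real EmbeddingTableConfig may differ.
interface EmbeddingTableConfig {
  model: string
  dimensions: number
}

// Hypothetical naming scheme: one table per (model, dimensions) pair.
function embeddingTableName(config: EmbeddingTableConfig): string {
  const model = config.model.replace(/-/g, '_').toLowerCase()
  // The name is interpolated into SQL below, so only accept plain identifiers.
  if (!/^[a-z_][a-z0-9_]*$/.test(model)) {
    throw new Error(`Unsupported model name: ${config.model}`)
  }
  return `embeddings_${model}_${config.dimensions}`
}

// Build the DDL for the table. The column layout is illustrative only.
function createEmbeddingTableSql(config: EmbeddingTableConfig): string {
  if (!Number.isInteger(config.dimensions) || config.dimensions <= 0) {
    throw new Error(`Invalid embedding dimensions: ${config.dimensions}`)
  }
  const table = embeddingTableName(config)
  return [
    `CREATE TABLE IF NOT EXISTS ${table} (`,
    `  message_uuid uuid PRIMARY KEY,`,
    `  embedding vector(${config.dimensions}) NOT NULL`,
    `)`,
  ].join('\n')
}

console.log(createEmbeddingTableSql({ model: 'nomic-embed-text', dimensions: 768 }))
```

Encoding the dimensions in the table name is one way to keep vectors of different sizes apart when the configured model changes; the PR may handle this differently.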

Reviewed Changes

| File | Description |
| --- | --- |
| packages/db/src/schema/embedding.ts | Introduces embedding table creation and use functions. |
| packages/db/src/schema/types.ts | Adds the EmbeddingTableConfig interface including dimensions. |
| packages/db/src/models/message/embedding.ts | Updates functions to use the new embedding table and config; refactors types. |
| packages/db/src/models/message/message.test.ts | Skips the findSimilarMessages test indicating decreased test coverage for that feature. |
| packages/common/src/types/config.ts | Adds the embedding dimensions to the API config. |
| config/config.example.yaml | Updates the example configuration with dimensions for embeddings. |
| packages/core/src/services/embedding.ts | Updates the embedding service to incorporate the new EmbeddingTableConfig. |
| apps/cli/src/commands/embed.ts | Refactors the embed command to use the new embedding config and table initialization. |
| packages/common/src/composable/config.ts | Adds a default dimensions value to the configuration. |
| apps/server/src/routes/config.ts | Updates the configuration schema to include embedding dimensions. |
| packages/db/src/models/message/types.ts | Adds an optional uuid field for message creation inputs. |
| packages/db/src/schema/index.ts | Re-exports the embedding schema. |
| apps/server/src/services/commands/embed.ts | Updates the server command for embedding to use new embedding table/config functions. |
| apps/cli/src/commands/search.ts | Updates the search command to utilize the new embedding configuration. |
| apps/server/src/routes/search.ts | Adjusts server search routes to use the updated embedding API. |
| packages/db/src/schema/message.ts | Updates the message table schema by removing the embedding field from the ORM schema. |
| apps/frontend/src/pages/settings.vue | Adds a settings input for configuring embedding dimensions. |

Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.

Comments suppressed due to low confidence (2)

packages/db/src/schema/message.ts:29

  • The embedding column is defined in the CREATE TABLE SQL block but has been removed from the ORM schema definition. Ensure that the ORM schema and SQL table definitions remain consistent.
embedding: vector('embedding'),

packages/db/src/models/message/message.test.ts:95

  • The findSimilarMessages test is skipped, which may leave the vector search functionality unverified. Consider re-enabling or adding tests to ensure proper coverage.
describe.skip('findSimilarMessages', () => {

const messagesToEmbed = messages.items.filter(m => !m.embedding && m.content)
const totalMessages = messagesToEmbed.length
const messages = await findMessageMissingEmbed(Number(chatId), embedding.getEmbeddingConfig())
logger.debug('awdadwd')
Copilot AI Mar 10, 2025

[nitpick] The debug log message 'awdadwd' appears to be a placeholder and is not descriptive. Please remove or replace it with a meaningful message.

Suggested change
-logger.debug('awdadwd')
+logger.debug('Successfully retrieved messages for embedding processing')

@hahaQWQ hahaQWQ requested a review from luoling8192 March 14, 2025 13:00
@hahaQWQ hahaQWQ force-pushed the dev/embeddings-model branch from 430340f to 3c5ffc7 Compare March 14, 2025 13:07
@hahaQWQ hahaQWQ force-pushed the dev/embeddings-model branch from aaffcbf to da8e73e Compare March 15, 2025 07:45
hahaQWQ and others added 4 commits March 15, 2025 15:47
… result handling

- Added TypeScript interfaces for Command and CommandMetadata to improve type safety.
- Updated the search function to return a structured SearchResponse.
- Introduced a new types file for SearchResult and SearchResponse to standardize search-related data structures.
…ings components

- Updated pagination logic in search.vue to ensure total is a valid number before comparison with pageSize.
- Ensured apiId in settings.vue is converted to a string before saving the configuration.
- Removed unused imports in search.ts to clean up the codebase.
- Adjusted parameters in findSimilarMessages function to include targetChatId for better search results.
- Implemented migration for creating embedding_models table with necessary fields and triggers for automatic timestamp updates.
- Added logic to create corresponding embedding tables for existing message tables, including necessary constraints and indexes.
- Enhanced logging for better tracking of migration progress and errors.
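
The pagination fix mentioned in the commit notes above (checking that total is a valid number before comparing it with pageSize) could look roughly like the sketch below. The actual code in search.vue is not shown on this page, so the helper name shouldShowPagination and its signature are hypothetical.

```typescript
// Hypothetical helper: decide whether pagination controls are needed.
// `total` may be undefined, null, or NaN while a search is still loading or
// after a failed request, so it is validated before the comparison.
function shouldShowPagination(total: unknown, pageSize: number): boolean {
  if (typeof total !== 'number' || !Number.isFinite(total)) {
    return false
  }
  return total > pageSize
}

console.log(shouldShowPagination(undefined, 20)) // false
console.log(shouldShowPagination(25, 20)) // true
```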
@luoling8192 luoling8192 mentioned this pull request Mar 18, 2025
@luoling8192
Collaborator

LGTM, Thank you very much!

@luoling8192 luoling8192 merged commit 032cbb4 into main Mar 18, 2025
3 checks passed
luoling8192 added a commit that referenced this pull request Mar 21, 2025
@luoling8192 luoling8192 deleted the dev/embeddings-model branch May 3, 2025 05:26

2 participants