feat: embeddings model support #86
Conversation
@@ -1,12 +1,11 @@
import * as input from '@inquirer/prompts'
import { useLogger } from '@tg-search/common'
import { EmbeddingService } from '@tg-search/core'
import { findMessagesByChatId, updateMessageEmbeddings } from '@tg-search/db'
import { findMessageMissingEmbed, updateMessageEmbeddings, useEmbeddingTable } from '@tg-search/db'
Would findMissingEmbedMessage be a better name?
const messages = await findMessagesByChatId(Number(chatId))
const messagesToEmbed = messages.items.filter(m => !m.embedding && m.content)
const totalMessages = messagesToEmbed.length
const messages = await findMessageMissingEmbed(Number(chatId), embedding.getEmbeddingConfig())
Would missingMessage be a better name?
Just a suggestion, not a requirement, but it might give better DX.
@@ -48,6 +48,8 @@ export class EmbedCommandHandler {

// Initialize embedding service
const embedding = new EmbeddingService()
await useEmbeddingTable(embedding.getEmbeddingConfig())
logger.debug('awdadwd')
What does this mean?
OK, I will correct these mistakes. (:3 ⌒゙)
return {
...this.embedding_config,
// Replace '-' with '_'
model: this.embedding_config.model.replace(/-/g, '_'),
Why is this conversion needed?
Because it looks better. If it's not suitable, I can revert it.
Some embedding models, such as nomic-embed-text, have hyphens in their names. PostgreSQL does not allow hyphens in unquoted table names, so the name needs to be converted.
What do you think about moving the conversion logic to the models layer, which is more relevant?
I will try it.
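For illustration, the conversion discussed in this thread could be isolated in a small helper. This is only a sketch: `toTableSafeModelName`, `embeddingTableName`, and the `embeddings_<model>_<dimensions>` naming scheme are hypothetical and not taken from the PR, and 768 is just an example dimension value.

```typescript
// Hypothetical helper, not taken from the PR: normalizes a model name so it
// can be used as part of an unquoted PostgreSQL table name.
function toTableSafeModelName(model: string): string {
  // Lowercase first, then replace every character that is not a letter,
  // digit, or underscore (for example '-', '.', ':') with an underscore.
  return model.toLowerCase().replace(/[^a-z0-9_]/g, '_')
}

// Hypothetical naming scheme: one table per model and dimension count.
// The fixed prefix guarantees the name starts with a letter.
function embeddingTableName(model: string, dimensions: number): string {
  return `embeddings_${toTableSafeModelName(model)}_${dimensions}`
}

console.log(embeddingTableName('nomic-embed-text', 768))
// prints "embeddings_nomic_embed_text_768"
```

The sketch does not handle PostgreSQL's identifier length limit, so very long model names would still need truncation or hashing.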
packages/db/src/schema/embedding.ts
Outdated
Did you regenerate the Drizzle migration file after modifying the schema?
Hmm... No😲
If you use a dynamic table, that is unnecessary.
okay
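To make the "dynamic table" point concrete: if the embedding table is created at runtime from the configuration, its definition never appears in the static schema, so there is nothing for a generated migration to capture. A rough sketch of that idea follows. The function name, the table naming scheme, and the column names are assumptions for illustration; only the `model` and `dimensions` fields are mentioned in this PR, and `vector(n)` is the pgvector column type.

```typescript
// Assumed shape: the PR mentions `model` and `dimensions`; any other
// fields of the real EmbeddingTableConfig are omitted here.
interface EmbeddingTableConfig {
  model: string
  dimensions: number
}

// Hypothetical helper: builds DDL that creates the per-model table on first use.
function buildCreateEmbeddingTableSql(config: EmbeddingTableConfig): string {
  // Validate inputs because they are interpolated into the SQL text.
  if (!Number.isInteger(config.dimensions) || config.dimensions <= 0)
    throw new Error(`Invalid dimensions: ${config.dimensions}`)
  if (!/^[a-z_][a-z0-9_]*$/.test(config.model))
    throw new Error(`Model name is not a safe identifier: ${config.model}`)

  // Hypothetical table and column names.
  const table = `embeddings_${config.model}_${config.dimensions}`
  return [
    `CREATE TABLE IF NOT EXISTS ${table} (`,
    `  message_uuid uuid PRIMARY KEY,`,
    `  embedding vector(${config.dimensions}) NOT NULL`,
    `)`,
  ].join('\n')
}

console.log(buildCreateEmbeddingTableSql({ model: 'nomic_embed_text', dimensions: 768 }))
```

Because the statement uses `IF NOT EXISTS`, running it on every startup is harmless, which is what lets the table live outside the migration files.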
PR Overview
This pull request adds support for embedding models by introducing an EmbeddingTableConfig and related helper functions. Key changes include:
- Adding new schema and utility functions for creating and using embedding tables.
- Extending configuration and types to include the embedding dimensions.
- Updating message handling and search functionality to work with the new embedding model support.
Reviewed Changes
File | Description |
---|---|
packages/db/src/schema/embedding.ts | Introduces embedding table creation and use functions. |
packages/db/src/schema/types.ts | Adds the EmbeddingTableConfig interface including dimensions. |
packages/db/src/models/message/embedding.ts | Updates functions to use the new embedding table and config; refactors types. |
packages/db/src/models/message/message.test.ts | Skips the findSimilarMessages test, which reduces test coverage for that feature. |
packages/common/src/types/config.ts | Adds the embedding dimensions to the API config. |
config/config.example.yaml | Updates the example configuration with dimensions for embeddings. |
packages/core/src/services/embedding.ts | Updates the embedding service to incorporate the new EmbeddingTableConfig. |
apps/cli/src/commands/embed.ts | Refactors the embed command to use the new embedding config and table initialization. |
packages/common/src/composable/config.ts | Adds a default dimensions value to the configuration. |
apps/server/src/routes/config.ts | Updates the configuration schema to include embedding dimensions. |
packages/db/src/models/message/types.ts | Adds an optional uuid field for message creation inputs. |
packages/db/src/schema/index.ts | Re-exports the embedding schema. |
apps/server/src/services/commands/embed.ts | Updates the server command for embedding to use new embedding table/config functions. |
apps/cli/src/commands/search.ts | Updates the search command to utilize the new embedding configuration. |
apps/server/src/routes/search.ts | Adjusts server search routes to use the updated embedding API. |
packages/db/src/schema/message.ts | Updates the message table schema by removing the embedding field from the ORM schema. |
apps/frontend/src/pages/settings.vue | Adds a settings input for configuring embedding dimensions. |
Copilot reviewed 19 out of 19 changed files in this pull request and generated 1 comment.
Comments suppressed due to low confidence (2)
packages/db/src/schema/message.ts:29
- The embedding column is defined in the CREATE TABLE SQL block but has been removed from the ORM schema definition. Ensure that the ORM schema and SQL table definitions remain consistent.
embedding: vector('embedding'),
packages/db/src/models/message/message.test.ts:95
- The findSimilarMessages test is skipped, which may leave the vector search functionality unverified. Consider re-enabling or adding tests to ensure proper coverage.
describe.skip('findSimilarMessages', () => {
apps/cli/src/commands/embed.ts
Outdated
const messagesToEmbed = messages.items.filter(m => !m.embedding && m.content)
const totalMessages = messagesToEmbed.length
const messages = await findMessageMissingEmbed(Number(chatId), embedding.getEmbeddingConfig())
logger.debug('awdadwd')
[nitpick] The debug log message 'awdadwd' appears to be a placeholder and is not descriptive. Please remove or replace it with a meaningful message.
logger.debug('awdadwd')
logger.debug('Successfully retrieved messages for embedding processing')
430340f to 3c5ffc7
…e embedding table retrieval
…clude chatId for improved retrieval
aaffcbf to da8e73e
… result handling
- Added TypeScript interfaces for Command and CommandMetadata to improve type safety.
- Updated the search function to return a structured SearchResponse.
- Introduced a new types file for SearchResult and SearchResponse to standardize search-related data structures.

…ings components
- Updated pagination logic in search.vue to ensure total is a valid number before comparison with pageSize.
- Ensured apiId in settings.vue is converted to a string before saving the configuration.
- Removed unused imports in search.ts to clean up the codebase.
- Adjusted parameters in findSimilarMessages function to include targetChatId for better search results.

- Implemented migration for creating embedding_models table with necessary fields and triggers for automatic timestamp updates.
- Added logic to create corresponding embedding tables for existing message tables, including necessary constraints and indexes.
- Enhanced logging for better tracking of migration progress and errors.
LGTM, thank you very much!
This reverts commit 032cbb4.
No description provided.