KG + LLM
Represent entities and relations in the knowledge graph as dense vector embeddings, which can be incorporated into the language model’s input or output representations. This allows the model to capture and utilize relational knowledge during training or inference; a minimal sketch follows the example embedding families below.
Tensor Factorization Models (TuckER, m-CP)
Hyperbolic and Rotational Embeddings (MuRP, RotatE)
Geometric Embeddings (RefE, BoxE)
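A minimal sketch of this fusion, assuming PyTorch and a bank of pretrained KG entity embeddings (the module name `KGAugmentedInput` and the tensor shapes are illustrative): project the entity vectors into the LM's hidden size and prepend them to the token embeddings as extra positions the model can attend to.

```python
import torch
import torch.nn as nn

class KGAugmentedInput(nn.Module):
    def __init__(self, kg_dim: int, lm_hidden: int):
        super().__init__()
        # Linear bridge from the KG embedding space (e.g., TuckER vectors)
        # to the LM embedding space.
        self.proj = nn.Linear(kg_dim, lm_hidden)

    def forward(self, token_embeds: torch.Tensor, entity_embeds: torch.Tensor):
        # token_embeds:  (batch, seq_len, lm_hidden)
        # entity_embeds: (batch, n_entities, kg_dim) -- embeddings of entities linked to the input
        projected = self.proj(entity_embeds)                # (batch, n_entities, lm_hidden)
        return torch.cat([projected, token_embeds], dim=1)  # prepend as extra "tokens"
```

The concatenated sequence is then fed into the transformer layers in place of the plain token embeddings, with the attention mask extended accordingly.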
Augment the language model’s training data with knowledge graph triples, effectively teaching the model to better represent and reason over structured knowledge.
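One low-cost way to do this is to verbalize triples into sentences and mix them into the pretraining or fine-tuning corpus. The sketch below uses only the Python standard library; the relation templates and triples are made-up examples.

```python
# Turn KG triples into natural-language training sentences.
TEMPLATES = {
    "born_in":    "{head} was born in {tail}.",
    "capital_of": "{head} is the capital of {tail}.",
}

def verbalize(triples):
    """Yield one training sentence per (head, relation, tail) triple."""
    for head, relation, tail in triples:
        template = TEMPLATES.get(relation, "{head} {relation} {tail}.")
        yield template.format(head=head, relation=relation.replace("_", " "), tail=tail)

triples = [("Marie Curie", "born_in", "Warsaw"), ("Paris", "capital_of", "France")]
training_texts = list(verbalize(triples))
# -> ["Marie Curie was born in Warsaw.", "Paris is the capital of France."]
```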
During inference, retrieve relevant knowledge graph subgraphs or facts based on the input text, and provide this structured knowledge as additional context to the language model.
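A minimal sketch of this retrieve-then-prompt pattern, with entity linking reduced to naive substring matching (a real system would use a proper entity linker); the fact format and prompt template are assumptions.

```python
def retrieve_facts(prompt: str, kg_triples):
    """Return triples whose head or tail entity is mentioned in the prompt."""
    prompt_lower = prompt.lower()
    return [t for t in kg_triples
            if t[0].lower() in prompt_lower or t[2].lower() in prompt_lower]

def build_context(prompt: str, kg_triples) -> str:
    """Prepend the retrieved facts to the prompt as structured context."""
    facts = retrieve_facts(prompt, kg_triples)
    fact_lines = "\n".join(f"- {h} {r.replace('_', ' ')} {t}" for h, r, t in facts)
    return f"Known facts:\n{fact_lines}\n\nQuestion: {prompt}"

kg = [("Paris", "capital_of", "France"), ("Paris", "located_on", "Seine")]
print(build_context("Which river runs through Paris?", kg))
```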
Develop language generation models that can directly generate text conditioned on knowledge graph facts, ensuring the generated output respects and accurately reflects the provided knowledge.
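A sketch of how training pairs for such a generator can be constructed, using the <H>/<R>/<T> linearization seen in KG-to-text work such as WebNLG (the tokens and the example pair are illustrative, not a fixed standard).

```python
def linearize(triples):
    """Flatten a set of triples into a single input string for a seq2seq model."""
    return " ".join(f"<H> {h} <R> {r.replace('_', ' ')} <T> {t}" for h, r, t in triples)

example = {
    "input": linearize([("Paris", "capital_of", "France"),
                        ("Paris", "population", "2.1 million")]),
    "target": "Paris, home to about 2.1 million people, is the capital of France.",
}
# A sequence-to-sequence model fine-tuned on such pairs learns to generate text
# that stays faithful to the triples it is conditioned on.
```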
Train language models on a combination of traditional language tasks (e.g., text generation, question answering) and knowledge graph tasks (e.g., link prediction, path ranking), enabling them to develop both language understanding and structured knowledge capabilities.
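A sketch of mixing the two task types as text-to-text examples; the record format and prompts are assumptions rather than an established schema.

```python
import random

def link_prediction_example(head, relation, tail):
    # Cast a KG link-prediction instance as a text-to-text example.
    return {"input": f"Complete the triple: {head} [{relation}] ?", "target": tail}

def qa_example(question, answer):
    # Ordinary language task, formatted the same way.
    return {"input": f"Answer: {question}", "target": answer}

kg_tasks = [link_prediction_example("Paris", "capital_of", "France")]
lm_tasks = [qa_example("What is the capital of France?", "Paris")]

mixed = kg_tasks + lm_tasks
random.shuffle(mixed)  # interleave both task types within each training epoch
```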
Vector-based retrieval: Vectorize your KG and store the embeddings in a vector store. Vectorize the natural language prompt with the same model and find the stored vectors most similar to it. Because those vectors correspond to entities in your graph, this returns the most ‘relevant’ entities for the prompt. It is the same process described above under the tagging capability: we are essentially ‘tagging’ a prompt with relevant tags from our KG.
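A minimal sketch of the similarity-search step, assuming the prompt and each KG entity have already been embedded with the same encoder (NumPy stands in for a real vector store):

```python
import numpy as np

def cosine_top_k(prompt_vec, entity_vecs, entity_ids, k=5):
    """Return the k entities whose embeddings are most similar to the prompt."""
    a = prompt_vec / np.linalg.norm(prompt_vec)
    b = entity_vecs / np.linalg.norm(entity_vecs, axis=1, keepdims=True)
    scores = b @ a                      # cosine similarity of each entity to the prompt
    top = np.argsort(-scores)[:k]
    return [(entity_ids[i], float(scores[i])) for i in top]

# entity_vecs: (n_entities, dim) matrix built offline and kept in the vector store
# prompt_vec:  embedding of the natural language prompt, produced at query time
```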
Prompt-to-query retrieval: Alternatively, you could use an LLM to generate a SPARQL or Cypher query and execute it against the graph to retrieve the most relevant data.
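A sketch of this flow against a Neo4j graph using the official neo4j Python driver; `llm` is a placeholder for whatever model call you use to produce the Cypher, and the schema string is illustrative. Generated queries should be validated (or at least run read-only) before execution.

```python
from neo4j import GraphDatabase

SCHEMA = "(:Person {name})-[:WORKS_AT]->(:Company {name})"

def generate_cypher(question: str, llm) -> str:
    prompt = (
        f"Graph schema: {SCHEMA}\n"
        f"Write a Cypher query that answers: {question}\n"
        "Return only the query."
    )
    return llm(prompt)  # placeholder LLM call returning a Cypher string

def answer_from_graph(question: str, llm, uri, user, password):
    query = generate_cypher(question, llm)
    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            # Run the generated Cypher and return plain dicts for the LLM to consume.
            return [record.data() for record in session.run(query)]
    finally:
        driver.close()
```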
Train language models to reason over multi-hop knowledge graph paths, enabling them to answer complex queries by traversing and combining multiple facts.
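A minimal sketch of the traversal side: a bounded breadth-first search that collects the relation paths connecting two entities, which can then be verbalized for the LM as the chain of supporting facts. The toy KG is illustrative.

```python
from collections import deque

def find_paths(kg_triples, start, goal, max_hops=3):
    """Return lists of triples connecting start to goal within max_hops."""
    adjacency = {}
    for h, r, t in kg_triples:
        adjacency.setdefault(h, []).append((r, t))

    paths, queue = [], deque([(start, [])])
    while queue:
        node, path = queue.popleft()
        if node == goal and path:
            paths.append(path)
            continue
        if len(path) >= max_hops:
            continue
        for r, t in adjacency.get(node, []):
            queue.append((t, path + [(node, r, t)]))
    return paths

kg = [("Marie Curie", "born_in", "Warsaw"), ("Warsaw", "located_in", "Poland")]
print(find_paths(kg, "Marie Curie", "Poland"))
# -> [[('Marie Curie', 'born_in', 'Warsaw'), ('Warsaw', 'located_in', 'Poland')]]
```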
Incorporate symbolic reasoning capabilities into language models, allowing them to perform logical operations and inference over structured knowledge representations.
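A minimal sketch of one such symbolic operation: a single forward-chaining step that applies a transitivity rule to derive new facts the model can rely on. The rule and relation name are illustrative.

```python
def apply_transitivity(triples, relation="located_in"):
    """Derive (a, rel, c) whenever (a, rel, b) and (b, rel, c) both hold."""
    facts = set(triples)
    derived = set()
    for h1, r1, t1 in facts:
        for h2, r2, t2 in facts:
            if r1 == r2 == relation and t1 == h2:
                derived.add((h1, relation, t2))
    return derived - facts  # only the newly inferred facts

kg = [("Warsaw", "located_in", "Poland"), ("Poland", "located_in", "Europe")]
print(apply_transitivity(kg))  # {('Warsaw', 'located_in', 'Europe')}
```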
Develop hybrid neuro-symbolic approaches that combine the strengths of neural networks (pattern recognition, generalization) and symbolic reasoning (interpretability, logical consistency).
Develop attention mechanisms that can effectively attend to relevant knowledge graph entities and relations during language model inference, enabling more focused and contextualized knowledge utilization.
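A sketch of such a mechanism, assuming PyTorch: a cross-attention block in which the LM hidden states act as queries over a bank of already-projected KG embeddings, with a residual connection folding the attended knowledge back in. The module name and head count are arbitrary.

```python
import torch
import torch.nn as nn

class KGCrossAttention(nn.Module):
    def __init__(self, hidden_dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, hidden_states: torch.Tensor, kg_embeds: torch.Tensor):
        # hidden_states: (batch, seq_len, hidden_dim) -- LM token representations
        # kg_embeds:     (batch, n_facts, hidden_dim) -- projected KG entity/relation embeddings
        attended, _ = self.attn(query=hidden_states, key=kg_embeds, value=kg_embeds)
        return hidden_states + attended  # residual fusion of KG context into the LM states
```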
Enforce knowledge graph constraints (e.g., type constraints, cardinality constraints) during language model training or inference to ensure generated outputs respect the underlying knowledge graph structure and semantics.
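A minimal sketch of a type-constraint check applied to candidate triples before they are emitted or written back to the graph; the schema and type table are toy examples.

```python
# Allowed (head type, tail type) signature for each relation.
SCHEMA = {
    "capital_of": ("City", "Country"),
    "born_in":    ("Person", "City"),
}

ENTITY_TYPES = {"Paris": "City", "France": "Country", "Marie Curie": "Person"}

def violates_schema(head, relation, tail):
    expected = SCHEMA.get(relation)
    if expected is None:
        return True  # unknown relation: reject or flag for review
    return (ENTITY_TYPES.get(head), ENTITY_TYPES.get(tail)) != expected

print(violates_schema("Paris", "capital_of", "France"))   # False -> keep
print(violates_schema("France", "capital_of", "Paris"))   # True  -> reject
```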
Develop techniques to refine and extend knowledge graphs based on language model outputs, enabling a symbiotic relationship where the knowledge graph enhances the language model, and the language model, in turn, helps refine and expand the knowledge graph.
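A sketch of the write-back direction: triples extracted from LLM output (extraction itself is out of scope here) are filtered against the existing graph and a validity predicate, and the remainder become candidate additions for review. All names are illustrative.

```python
def propose_new_facts(extracted_triples, kg_triples, is_valid):
    """Return extracted triples that pass validation and are not yet in the KG."""
    existing = set(kg_triples)
    return [t for t in extracted_triples if t not in existing and is_valid(*t)]

candidates = propose_new_facts(
    extracted_triples=[("Marie Curie", "born_in", "Warsaw")],
    kg_triples=[("Paris", "capital_of", "France")],
    is_valid=lambda h, r, t: True,  # plug in a schema check such as the one sketched above
)
# Candidate facts would then pass through human or automated review before being merged.
```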
The purpose of this fusion is to control hallucinations.
A fusion of an enterprise knowledge graph (EKG) and an LLM can control hallucinations in three ways:
Post-generation Verification: After text is generated, a knowledge graph is used to verify the claims in the text and flag potential hallucinations (see the sketch after this list).
Direct Querying: In the process of text generation, the model may query a knowledge graph for specific information to use in the generation, which could help prevent hallucinations by providing the model with accurate information directly.
Training Enhancements: A knowledge graph could be used during the training of the RAG model to teach it to avoid certain types of hallucinations. This would involve using the knowledge graph as a source of “ground truth” during training.
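A minimal sketch of the post-generation check from the first item: a toy pattern-based extractor pulls candidate triples from the generated text, and any claim not supported by the graph is flagged. A production system would use a proper relation extractor or an LLM-based claim decomposition step instead of the regex shown here.

```python
import re

def extract_capital_claims(text):
    """Toy extractor for sentences like 'X is the capital of Y'."""
    pattern = r"(\w[\w ]*?) is the capital of (\w[\w ]*)"
    return [(h.strip(), "capital_of", t.strip()) for h, t in re.findall(pattern, text)]

def verify(text, kg_triples):
    """Return claims found in the text that the KG does not support."""
    kg = set(kg_triples)
    claims = extract_capital_claims(text)
    return [claim for claim in claims if claim not in kg]

kg = [("Paris", "capital_of", "France")]
print(verify("Lyon is the capital of France", kg))
# -> [('Lyon', 'capital_of', 'France')]  flagged as a potential hallucination
```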