RAG Implementation Step-by-Step, Embedding & Searching: Query Translation


문괜 2024. 9. 20. 12:00

~~Virtual environment setup and MVP implementation~~
~~Dataset and Vector DB implementation~~
1. ~~Dataset acquisition~~
2. ~~Vector DB implementation~~
Embedding & Searching implementation
Generation implementation

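The figure above is a diagram of the parts of RAG that LangChain lets you customize to fit your needs.

As the figure shows, RAG turned out to be more than a simple setup that retrieves documents and generates from the results. Retrieval, generation, and even data acquisition can each be adapted in many ways to fit the intent of the planned service.

So in this Embedding & Searching step, in addition to the similarity approach of embedding the Query and then finding the most similar Vectors, I went one step further with Query Translation: analyzing the Query and retrieving based on that analysis.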

For the basics of Similarity Search, see the post below.

What on Earth Is Similarity Search? An Analysis of LangChain's .as_retriever()

It also helps to read the post below first.

What on Earth Is LCEL?

First, my motivation for deciding to do Query Translation was as follows.

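Motivation for Query Translation

I wanted to improve retrieval accuracy.
- Similarity Search compares the given Query directly, so I thought that if the Query itself is thin on content, retrieval accuracy could drop considerably.
I thought retrieval could behave differently depending on the characteristics of the stored Documents.
- Most example Documents were a single large Document split by Chunking. In contrast, the Documents I currently have are Crawled data with clearly distinguishable values. For that reason, I thought plain Similarity Search might not be very useful.
I knew an LLM could be used for generation. That made me wonder: 'Couldn't an LLM be used for retrieval as well?'
- For generation, I knew that Prompt Engineering can make the LLM generate in line with the intent of the service. Then I thought, 'Couldn't this be used for retrieval too, not just for generation?', and realized that LLMs with different roles could work together.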

A basic Query Similarity Search proceeds in the following order.

First, load a Vector Store such as FAISS.
Turn the Vector Store into a Retriever.
Invoke the Retriever with the Query.

In code, this can be written simply as below.

# Load the FAISS index
from langchain_community.vectorstores import FAISS
from langchain_openai import OpenAIEmbeddings

# faiss_index_path is the path to the DB created with LangChain's FAISS wrapper.
# The embedding model must match the one used when the index was built.
embeddings = OpenAIEmbeddings(model="text-embedding-3-large")
# allow_dangerous_deserialization loads a pickle file, so only enable it for an index you created yourself.
loaded_vector_store = FAISS.load_local(
    faiss_index_path, embeddings, allow_dangerous_deserialization=True
)
retriever = loaded_vector_store.as_retriever()

selected_docs = retriever.invoke("question")
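To make the three steps above a little more concrete without an API key, here is a minimal sketch of what a similarity lookup does conceptually. The three-dimensional vectors, the document ids, and the use of cosine similarity are all assumptions for illustration; a real FAISS index stores model-generated embeddings and may use a different metric such as L2 distance.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Return the ids of the k documents most similar to the query vector."""
    ranked = sorted(
        doc_vecs.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical toy embeddings standing in for the stored Documents.
doc_vecs = {
    "cafe": [0.9, 0.1, 0.0],
    "museum": [0.1, 0.9, 0.0],
    "park": [0.0, 0.2, 0.9],
}
nearest = top_k([1.0, 0.2, 0.0], doc_vecs, k=2)
print(nearest)  # ['cafe', 'museum']
```

The result depends entirely on where the Query vector lands, which is why a thin or vague Query can pull back weak matches.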

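Even in this basic form, the retriever exposes various fields that can be configured, but it still falls short in some respects.

Starting from the basic form above, there are several ways to make retrieval more sophisticated through Query Translation.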

Multi-Query
RAG-Fusion
Decomposition
- Answer Recursively
- Answer Individually
Step Back
HyDE

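Each is suited to different situations and favors a different Document structure in the Vector Store. In this case, the Documents have Fields that distinguish their data, so I used Multi-Query. (Each Query Translation approach is only a reference point, so switch between them as needed.)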

Multi-Query

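The core of this approach is to generate several Queries related to the incoming Query and then collect the unique Documents retrieved for those Queries. So only two things need to be added to the basic form above.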

Generate Queries
- Prompt
Retrieval Chain

Generate Queries is the part that defines the LLM that generates the Queries.

Retrieval Chain uses LCEL to combine Generate Queries, the Retriever, and duplicate removal into a single Chain, which serves as the Retrieval part of the RAG Application.
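Before the LangChain version, the overall flow can be sketched in plain Python. This is only an illustration under stated assumptions: `toy_generate_queries` stands in for the LLM, `toy_retrieve` stands in for the Retriever with a naive word-overlap match, and the three documents are made up.

```python
from typing import Callable

# Hypothetical toy corpus standing in for the Vector Store.
DOCS = {
    "d1": "riverside cafe with night view",
    "d2": "indoor climbing gym for groups",
    "d3": "quiet cafe near the palace",
}


def toy_generate_queries(question: str) -> list[str]:
    """Stand-in for the LLM: the original question plus fixed rephrasings."""
    return [question, "cafe", "night view"]


def toy_retrieve(query: str) -> list[str]:
    """Stand-in for the Retriever: ids of docs sharing at least one word with the query."""
    words = set(query.lower().split())
    return [doc_id for doc_id, text in DOCS.items() if words & set(text.split())]


def multi_query_retrieve(
    question: str,
    generate: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Generate several queries, retrieve for each, and keep each document once."""
    seen: set[str] = set()
    merged: list[str] = []
    for query in generate(question):
        for doc_id in retrieve(query):
            if doc_id not in seen:
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


result = multi_query_retrieve("romantic place", toy_generate_queries, toy_retrieve)
print(result)  # ['d1', 'd3']
```

In this toy setup the original question alone matches nothing, while the rephrased Queries surface two documents, which is the effect Multi-Query is aiming for.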

So, looking first at the code for Generate Queries:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI

# Multi Query: Different Perspectives
template = """
    You are an expert planning a date for a loved one, family and friends.
    Your task is retrieving relevant data to generate a date plan.
    You have access to a database of locations for dating in Seoul.
    Your task is to generate five different versions of the given user question to retrieve relevant documents from a vector database.
    By generating multiple perspectives on the user question, your goal is to help the user overcome some of the limitations of the distance-based similarity search.


    Each row in the table represents a location and its features.
    Features are separated by [SEP].
    If a row has 'None' in the feature, it means that the row doesn't have that feature.
    Every row is in Korean while column names are in English.
    Provide these alternative questions separated by newlines.
    Original question: {question}"""
prompt_perspectives = ChatPromptTemplate.from_template(template)

generate_queries = (
    prompt_perspectives
    | ChatOpenAI(temperature=0)
    | StrOutputParser()
    # Split into one query per line, dropping blank lines so no empty query reaches the retriever
    | (lambda x: [q.strip() for q in x.split("\n") if q.strip()])
)

As in the code above, you build a Chain from a Prompt that specifies the LLM's Perspective, the LLM to use, and the output format.
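The last step of that Chain turns the raw LLM text into a list of Queries. LLM output often comes with list numbering or blank lines, so a slightly more defensive parser can help. The sketch below is an assumption about what such a parser could look like, not part of the original implementation; `parse_queries` and the sample text are hypothetical.

```python
import re


def parse_queries(llm_output: str) -> list[str]:
    """Split LLM output into queries, dropping blank lines and leading list markers."""
    queries = []
    for line in llm_output.splitlines():
        # Remove markers such as "1.", "2)", "-" or "*" at the start of a line.
        cleaned = re.sub(r"^\s*(?:\d{1,2}[.)]|[-*])\s*", "", line).strip()
        if cleaned:
            queries.append(cleaned)
    return queries


sample = "1. 서울 야경 데이트 장소\n\n2. 한강 근처 카페\n- 실내 데이트 코스\n"
parsed = parse_queries(sample)
print(parsed)  # ['서울 야경 데이트 장소', '한강 근처 카페', '실내 데이트 코스']
```

Because LCEL coerces plain functions into runnables when they are piped after one, a function like this could take the place of a splitting lambda at the end of the Chain.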

Next, build the Retrieval Chain, which connects Generate Queries, the Retriever, and a duplicate-removal function, as below.

from langchain.load import dumps, loads

def get_unique_union(documents: list[list]):
    """ Unique union of retrieved docs """
    # Flatten list of lists, and convert each Document to string
    flattened_docs = [dumps(doc) for sublist in documents for doc in sublist]
    # Get unique documents
    unique_docs = list(set(flattened_docs))
    # Return
    return [loads(doc) for doc in unique_docs]

retriever = loaded_vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 5, "fetch_k": 10}
)

# Retrieval chain
retrieval_chain = generate_queries | retriever.map() | get_unique_union
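One detail worth knowing about get_unique_union: going through set() does not preserve the order in which Documents were retrieved. If the ranking matters downstream, an order-preserving variant is possible. The sketch below uses a hypothetical `Doc` dataclass in place of a LangChain Document and assumes the metadata is JSON-serializable.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: dict = field(default_factory=dict)


def unique_union_ordered(documents: list[list[Doc]]) -> list[Doc]:
    """Flatten the per-query results and keep the first occurrence of each document."""
    unique: dict[str, Doc] = {}
    for sublist in documents:
        for doc in sublist:
            # Serialize content and metadata into a stable string key.
            key = json.dumps(
                {"content": doc.page_content, "metadata": doc.metadata},
                sort_keys=True,
                ensure_ascii=False,
            )
            unique.setdefault(key, doc)
    # Dicts keep insertion order, so first-seen order is preserved.
    return list(unique.values())


a = Doc("한강 카페", {"category": "cafe"})
b = Doc("실내 클라이밍", {"category": "activity"})
results = unique_union_ordered([[a, b], [b, a], [Doc("한강 카페", {"category": "cafe"})]])
print([d.page_content for d in results])  # ['한강 카페', '실내 클라이밍']
```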

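The retrieval_chain built this way can be tested by invoking it with a question, for example retrieval_chain.invoke({"question": ...}), and it can also be connected to a separate Generation Chain in the RAG pipeline.

As of LangChain 0.3, a retrieval chain like this can also be built more simply with MultiQueryRetriever. (When I first built this, the version was 0.2, and 0.3 was released just one week later.)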

See the link below for concrete implementation examples.

How to use the MultiQueryRetriever | 🦜️🔗 LangChain (python.langchain.com)

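Of course, Multi-Query was not a cure-all. I ran into the following problem.

Problems

TPM

The TPM (tokens per minute) limit was hit here as well. It seems to have happened mainly because nothing constrained the Queries produced by Generate Queries. Looking at the generated Queries, I also had no safeguard against them growing long. These points will need to be addressed later in the Optimization stage.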
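As a rough idea of the kind of safeguard that was missing, the sketch below caps both the number and the length of generated Queries. It is an assumption about one possible mitigation, not the optimization actually applied later; `cap_queries`, `max_queries`, and `max_chars` are hypothetical names, and character count is only a crude proxy for tokens.

```python
def cap_queries(queries: list[str], max_queries: int = 5, max_chars: int = 200) -> list[str]:
    """Keep at most max_queries non-empty queries, each truncated to max_chars characters."""
    capped: list[str] = []
    for query in queries:
        query = query.strip()
        if not query:
            continue  # skip blank entries
        capped.append(query[:max_chars])
        if len(capped) == max_queries:
            break  # stop once the cap is reached
    return capped


limited = cap_queries(["a" * 300, "", "b", "c"], max_queries=2, max_chars=10)
print(limited)  # ['aaaaaaaaaa', 'b']
```

In the Chain above, a function like this could presumably sit between generate_queries and retriever.map() so that the retrieval fan-out stays bounded.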

The RAG implemented so far has the structure shown below. In addition, from this point on I have to assume that requests come in from a client, so the request Query is split into fields.

The concrete implementation can be found at the link below.

GitHub: RAG_APPLICATION_QUERY_ANALYSIS

RAG/rag-practice/rag-langchain/RAG_application_mvp_query_analysis_prompt_langhcain.ipynb at main · jwywoo/RAG (github.com)

Now let's move on to Generation, the last part of RAG.

RAG Implementation Step-by-Step: Generation and API Conversion

References

LangChain: RAG from Scratch

GitHub - langchain-ai/rag-from-scratch (github.com)

License: Attribution, NonCommercial, NoDerivatives (CC BY-NC-ND)
