
2022_Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions (IR-RAG hands-on)

jhworld 2024. 2. 18. 15:41


This whole post will be rewritten.

What is IR-RAG

To improve QA performance the method uses CoT, and it runs RAG at every reasoning step to make the CoT itself better. (The paper and its code call this IRCoT.)
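The idea can be sketched as a small loop. This is only my own illustration, not the repository's code: `retrieve` and `next_cot_sentence` are hypothetical callables standing in for the retriever and the LLM, and stopping on the phrase "answer is" is an assumption made for the sketch.

```python
from typing import Callable, List


def interleaved_retrieval_cot(
    question: str,
    retrieve: Callable[[str], List[str]],
    next_cot_sentence: Callable[[str, List[str], List[str]], str],
    max_steps: int = 4,
) -> List[str]:
    """Alternate between retrieval and generating one CoT sentence."""
    paragraphs = list(retrieve(question))  # first retrieval uses the question itself
    chain: List[str] = []
    for _ in range(max_steps):
        sentence = next_cot_sentence(question, paragraphs, chain)
        chain.append(sentence)
        if "answer is" in sentence.lower():  # assumed stop condition
            break
        # The newest reasoning sentence becomes the next retrieval query.
        for paragraph in retrieve(sentence):
            if paragraph not in paragraphs:
                paragraphs.append(paragraph)
    return chain


# Toy stand-ins so the loop can be run without a retriever or an LLM.
CORPUS = {
    "lost gravity": "Lost Gravity was manufactured by Mack Rides.",
    "mack rides": "Mack Rides is a company from Germany.",
}


def toy_retrieve(query: str) -> List[str]:
    return [text for key, text in CORPUS.items() if key in query.lower()]


def toy_reasoner(question: str, paragraphs: List[str], chain: List[str]) -> str:
    if not chain:
        return "Lost Gravity was manufactured by Mack Rides."
    if any("Germany" in p for p in paragraphs):
        return "So the answer is Germany."
    return "I need more information."


chain = interleaved_retrieval_cot(
    "In what country was Lost Gravity manufactured?", toy_retrieve, toy_reasoner
)
```

The second paragraph only becomes available after the first reasoning sentence mentions "Mack Rides", which is the whole point of retrieving at every step.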

The concept is very simple, so let's take a quick look at the components of the experiment.

- Data

: We evaluate our method on 4 multi-step QA datasets in the open-domain setting: HotpotQA (Yang et al., 2018), 2WikiMultihopQA (Ho et al., 2020), answerable subset of MuSiQue (Trivedi et al., 2022), and answerable subset of IIRC (Ferguson et al., 2020).

- Retriever

: We use BM25

- Metric

: F1, EM score.
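For reference, here is a minimal sketch of how EM and token-level F1 are commonly computed for QA answers. The normalization (lowercasing, dropping punctuation and English articles) follows the usual SQuAD-style convention; I am assuming that convention here and have not checked it against the repository's own evaluator.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, strip punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    # Multiset intersection counts each shared token at most min(count) times.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

EM is all-or-nothing, while F1 gives partial credit when the prediction contains the gold answer plus extra words.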

Let's go through the code first and come back to these in more detail afterwards.

Installation

: Install the required modules.

conda create -n ircot python=3.8.0 -y && conda activate ircot
pip install -r requirements.txt

# spaCy is a widely used Python library for NLP (natural language processing);
# this command downloads and installs its English model en_core_web_sm.
python -m spacy download en_core_web_sm

Prepare Data

The original datasets are called raw_data, and processed_data is what you get after cleaning that raw_data once into the format used in this paper. This step deserves a closer look later, but for now let's just download processed_data directly.

./download/processed_data.sh

Once the script finishes, the datasets have been downloaded, though I don't yet know what format they are in.

Prepare Prompts

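All the prompts used in the experiments are stored in the repo. Naturally, these prompts were not written by hand but generated by code, and the code that generates them is provided as well.

- prompt_generator: the Python code that generates the prompts

- prompts: where the prompts generated by the code above are stored.

This part also deserves a closer look later, but for now let's start with what is already generated in prompts.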

Looking at how the program runs, the basic flow is as follows.

reproduce.sh -> runner.py -> run.py.

- At a glance, reproduce.sh consists of four main steps.

echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py $1 $2 $3 write --prompt_set 1

echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 predict --prompt_set 1

echo ">>>> Run evaluation for different HPs on the dev set. <<<<"
python runner.py $1 $2 $3 evaluate --prompt_set 1

echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py $1 $2 $3 summarize --prompt_set 1

- The values that can be passed as $1, $2, and $3 are as follows.

$1=("ircot" "ircot_qa" "oner" "oner_qa" "nor_qa")
$2=("codex" "flan-t5-xxl" "flan-t5-xl" "flan-t5-large" "flan-t5-base" "none")
$3=("hotpotqa" "2wikimultihopqa" "musique" "iirc")

- I plan to step through the code carefully, so I'll start by running just one combination: ($1, $2, $3) = (ircot, codex, hotpotqa). The arguments passed to runner.py are then as follows.

echo ">>>> Instantiate experiment configs with different HPs and write them in files. <<<<"
python runner.py ircot codex hotpotqa write --prompt_set 1

echo ">>>> Run experiments for different HPs on the dev set. <<<<"
python runner.py ircot codex hotpotqa predict --prompt_set 1

echo ">>>> Run evaluation for different HPs on the dev set. <<<<"
python runner.py ircot codex hotpotqa evaluate --prompt_set 1

echo ">>>> Show results for experiments with different HPs <<<<"
python runner.py ircot codex hotpotqa summarize --prompt_set 1

- To see which arguments finally reach run.py for these inputs, I ran runner.py under the debugger with the arguments above and set a breakpoint just before run.py is invoked.

As a result, the arguments above end up being passed to run.py as follows.

# python runner.py ircot codex hotpotqa write --prompt_set 1
python run.py write ircot_codex_hotpotqa --instantiation_scheme ircot --prompt_set 1 --no_diff

# python runner.py ircot codex hotpotqa predict --prompt_set 1
python run.py predict ircot_codex_hotpotqa --instantiation_scheme ircot --prompt_set 1 --evaluation_path processed_data/hotpotqa/dev_subsampled.jsonl --skip_if_exists --silent

# python runner.py ircot codex hotpotqa evaluate --prompt_set 1
python run.py evaluate ircot_codex_hotpotqa --instantiation_scheme ircot --prompt_set 1 --evaluation_path processed_data/hotpotqa/dev_subsampled.jsonl

# python runner.py ircot codex hotpotqa summarize --prompt_set 1
python run.py summarize ircot_codex_hotpotqa --instantiation_scheme ircot --prompt_set 1 --evaluation_path processed_data/hotpotqa/dev_subsampled.jsonl
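The mapping observed above can be written down as a small function. This is reconstructed purely from the four commands I saw in the debugger, not from runner.py's source; `build_run_command` is a hypothetical name, and setting `--instantiation_scheme` equal to the system name is only confirmed for the `ircot` case.

```python
def build_run_command(system, model, dataset, command, prompt_set=1):
    """Rebuild the run.py command line seen for a given runner.py call."""
    experiment = f"{system}_{model}_{dataset}"
    parts = [
        "python", "run.py", command, experiment,
        "--instantiation_scheme", system,  # assumption: same as the system name
        "--prompt_set", str(prompt_set),
    ]
    if command == "write":
        parts.append("--no_diff")
    else:
        # predict / evaluate / summarize all point at the subsampled dev set.
        parts += ["--evaluation_path",
                  f"processed_data/{dataset}/dev_subsampled.jsonl"]
    if command == "predict":
        parts += ["--skip_if_exists", "--silent"]
    return " ".join(parts)
```

Calling `build_run_command("ircot", "codex", "hotpotqa", "predict")` gives back the same string as the predict command listed above.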

- Now all that's left is to set a breakpoint on the first line of run.py's main function, pass these arguments in debug mode, and step through line by line.

1. python run.py write ircot_codex_hotpotqa --instantiation_scheme ircot --prompt_set 1 --no_diff

Running the command above simply generates the configs used for the experiments. How they are used comes later.

2. python run.py predict ircot_codex_hotpotqa --instantiation_scheme ircot --prompt_set 1 --evaluation_path processed_data/hotpotqa/dev_subsampled.jsonl --skip_if_exists --silent

run.py contains seven subprocess.call invocations in total. With the arguments above, the one at line 727 is the one that fires, and it calls predict.py.

python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__2.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__3.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__4___distractor_count__1.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__4___distractor_count__2.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__4___distractor_count__3.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__6___distractor_count__1.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__6___distractor_count__2.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__6___distractor_count__3.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__8___distractor_count__1.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__8___distractor_count__2.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
python predict.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__8___distractor_count__3.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --silent
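The twelve calls above are just a grid over two hyperparameters: bm25_retrieval_count in {2, 4, 6, 8} and distractor_count in {1, 2, 3}. A short sketch that regenerates the same config paths (the function name and its defaults are mine, chosen to match the list above):

```python
from itertools import product


def config_paths(
    experiment="ircot_codex_hotpotqa",
    prompt_set=1,
    retrieval_counts=(2, 4, 6, 8),
    distractor_counts=(1, 2, 3),
):
    """List the instantiated config files for every hyperparameter pair."""
    paths = []
    for k, d in product(retrieval_counts, distractor_counts):
        name = (
            f"{experiment}____prompt_set_{prompt_set}"
            f"___bm25_retrieval_count__{k}___distractor_count__{d}"
        )
        paths.append(f"instantiated_configs/{name}.jsonnet")
    return paths


paths = config_paths()
```

`product` iterates the second tuple fastest, which is why the order matches the list above (2/1, 2/2, 2/3, 4/1, ...).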

Let's run predict.py again with the first of these argument sets.

predict.py in turn contains three subprocess.call invocations, and all three are called in order. The commands they run are as follows.

# Run predict_command: 
RETRIEVER_HOST=http://localhost RETRIEVER_PORT=8000 LLM_SERVER_HOST=http://localhost LLM_SERVER_PORT=8010 python -m commaqa.inference.configurable_inference --config instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1.jsonnet --input processed_data/hotpotqa/dev_subsampled.jsonl --output predictions/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1/prediction__hotpotqa_to_hotpotqa__dev_subsampled.json --silent

# Run evaluate_command: 
python evaluate.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl

# Run evaluate_command: 
python evaluate.py instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1.jsonnet processed_data/hotpotqa/dev_subsampled.jsonl --official

So it's back to going line by line... what a chore. The first command declares the four environment variables below,

RETRIEVER_HOST=http://localhost 
RETRIEVER_PORT=8000 
LLM_SERVER_HOST=http://localhost 
LLM_SERVER_PORT=8010

then runs the module commaqa.inference.configurable_inference (configurable_inference.py) and passes it the four arguments below.

 --config instantiated_configs/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1.jsonnet
 --input processed_data/hotpotqa/dev_subsampled.jsonl
 --output predictions/ircot_codex_hotpotqa____prompt_set_1___bm25_retrieval_count__2___distractor_count__1/prediction__hotpotqa_to_hotpotqa__dev_subsampled.json
 --silent
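To make clear what this command carries, here is a rough sketch of how a script could consume the same environment variables and flags. It is not the repository's parsing code: `service_url` and `parse_inference_args` are hypothetical helpers, and only the variable names and flag names are taken from the command above.

```python
import argparse
import os


def service_url(prefix, environ=None):
    """Join <PREFIX>_HOST and <PREFIX>_PORT into one base URL."""
    env = os.environ if environ is None else environ
    return f"{env[prefix + '_HOST']}:{env[prefix + '_PORT']}"


def parse_inference_args(argv):
    """Parse the four flags seen in the predict command."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--input", required=True)
    parser.add_argument("--output", required=True)
    parser.add_argument("--silent", action="store_true")
    return parser.parse_args(argv)


example_env = {
    "RETRIEVER_HOST": "http://localhost", "RETRIEVER_PORT": "8000",
    "LLM_SERVER_HOST": "http://localhost", "LLM_SERVER_PORT": "8010",
}
retriever_url = service_url("RETRIEVER", example_env)
llm_url = service_url("LLM_SERVER", example_env)
```

The practical consequence is that a debugger session needs the same four variables set, otherwise the lookups fail before any inference happens.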

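To step through this in the debugger I set the environment variables by hand and started the debugger, which raised an error. The cause is that launching with "Python Debugger: Current File with Arguments" runs configurable_inference.py directly as a script from inside its directory rather than as a module of the commaqa package, so imports that expect the package to be resolved from the repository root presumably fail.

To fix this, add an entry to launch.json as below so that the debugger launches with "Python Debugger: Module", and set the module name to "commaqa.inference.configurable_inference" so that it is run as a module instead.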

 # launch.json
{
    "version": "0.2.0",
    "configurations": [
        {
            "name": "Python Debugger: Module",
            "type": "debugpy",
            "request": "launch",
            "module": "commaqa.inference.configurable_inference",
            "args": "${command:pickArgs}",
        },
        {
            "name": "Python Debugger: Current File with Arguments",
            "type": "debugpy",
            "request": "launch",
            "program": "${file}",
            "console": "integratedTerminal",
            "args": "${command:pickArgs}",
            "subProcess": true,
        },
    ]
}

The structure turns out to be very simple.

Stepping into build_decomposer_and_models gives:

retriever : "retrieve_and_reset_paragraphs" -> class RetrieveAndResetParagraphsParticipant

cot_reasoning_gen: "step_by_step_cot_gen" -> class StepByStepCOTGenParticipant

exit_controller: "step_by_step_exit_controller" -> class StepByStepExitControllerParticipant

All three classes above have ParticipantModel as their parent class.

model_map is just a map holding the model information parsed from the jsonnet.

ModelController can be thought of as the object that holds model_map.

Yet the decomposer ends up holding model_map as well... hmm, maybe the code was put together a bit loosely.
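The name-to-class lookup described above can be pictured roughly like this. Only the three registry keys and class names come from what I saw in the debugger; the base class body, the config layout (`{"retriever": {"name": ...}}`) and `build_model_map` are assumptions made up for illustration.

```python
class ParticipantModel:
    """Stand-in base class; the real one has a richer interface."""

    def query(self, state):
        raise NotImplementedError


class RetrieveAndResetParagraphsParticipant(ParticipantModel):
    def query(self, state):
        return "retrieve"


class StepByStepCOTGenParticipant(ParticipantModel):
    def query(self, state):
        return "generate one CoT sentence"


class StepByStepExitControllerParticipant(ParticipantModel):
    def query(self, state):
        return "decide whether to stop"


# Registry from the config's model name to the class that implements it.
MODEL_NAME_CLASS = {
    "retrieve_and_reset_paragraphs": RetrieveAndResetParagraphsParticipant,
    "step_by_step_cot_gen": StepByStepCOTGenParticipant,
    "step_by_step_exit_controller": StepByStepExitControllerParticipant,
}


def build_model_map(models_config):
    """Instantiate one participant per entry of the (assumed) config dict."""
    return {
        key: MODEL_NAME_CLASS[spec["name"]]()
        for key, spec in models_config.items()
    }


model_map = build_model_map({
    "retriever": {"name": "retrieve_and_reset_paragraphs"},
    "cot_reasoning_gen": {"name": "step_by_step_cot_gen"},
    "exit_controller": {"name": "step_by_step_exit_controller"},
})
```

Seen this way, anything that needs to hand control from one participant to the next only needs `model_map`, which may be why more than one object keeps a reference to it.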

Now let's look at load_reader.

Nothing special here; the only reader implemented is class MultiParaRCReader.

Next time, the place to pick up is MultiParaRCReader.read_examples(), since 'processed_data/hotpotqa/dev_subsampled.jsonl' is what gets passed to it as input.


Paper Review "Retrieval-Augmented Generation for Large Language Models: A Survey" (Korean)

jhworld 2024. 2. 2. 00:53


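I've been studying LLMs for less than a month, so some explanations may be wrong. Please keep that in mind.

+ This paper is so long that I'll probably split the review into several posts, though I'm not sure how many. I'll write a rough version first and may revise it later.

+ I write a little at a time after work, so this will be completed gradually.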

1. What is RAG?

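My prior understanding of RAG was: "when running inference on an LLM, retrieve relevant information and provide the results as context to improve the quality of generation." This paper, however, uses RAG in a broader sense. As I read it, the paper's notion of RAG is as follows.

RAG means that when performing some task X, you don't just perform X on its own but bring in external information to perform it.

In other words, the RAG I had in mind is, from the authors' point of view, "RAG where the task X is inference." The paper then classifies this task X into three kinds.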

1. Pre-training

2. Inference

3. Fine-tuning

(I like this definition of RAG, because it encourages thinking about RAG more broadly.)

2. Inference RAG

Various kinds of RAG interest me, but I originally picked up this paper to improve inference performance, so I'll cover Inference RAG first.

Inference RAG breaks down into three broad stages, each described below.

1. Indexing

- Prepare the external documents to be used for RAG, then split them into suitable units (chunks). I'll call this process "Chunking".

- Each chunk is then mapped by an embedding model to a vector in an N-dimensional semantic space, and the collection of these vectors forms the vector DB. ~~(If you're not sure what embedding, or mapping into a semantic space, means, please see here.)~~

2. Retrieval

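- Embed the user's input (query) to turn it into a position in the semantic space.

- Then find the chunks closest to where the query was mapped.

- Being close in the semantic space means having a similar meaning, so on the assumption that these are the chunks needed to answer the query, they are used as the relevant chunks.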

3. Generation

- Build the question to send to the LLM from the user's original query and the relevant chunks obtained in the retrieval step (e.g., "Questions: ~~~. Please answer the above questions based on the following contexts: ~~~~").
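The three stages above can be sketched end to end in a few lines. To keep it runnable without any model, the "embedding" here is just a bag-of-words count vector and similarity is cosine; a real system would use a learned embedding model and a vector DB. All function names are my own.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(chunks):
    """Indexing: store every chunk next to its vector."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def retrieve(query, index, k=2):
    """Retrieval: return the k chunks closest to the query vector."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]


def build_prompt(query, contexts):
    """Generation: combine the query and retrieved chunks into one prompt."""
    joined = "\n".join(f"- {c}" for c in contexts)
    return (
        f"Question: {query}\n"
        "Please answer the above question based on the following contexts:\n"
        f"{joined}"
    )


index = build_index([
    "BM25 is a ranking function used in search.",
    "Transformers use self-attention layers.",
    "Paris is the capital of France.",
])
query = "What is the capital of France?"
top = retrieve(query, index, k=1)
prompt = build_prompt(query, top)
```

The resulting `prompt` is what would be sent to the LLM; the LLM call itself is left out on purpose.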

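3. Classifying methods for improving Inference RAG

If Inference RAG is viewed as finding books in a library, reading them, and answering a question, the methods fall into three broad groups: 1. Indexing (build the library's books well), 2. Retrieval (fetch the right books), 3. Generation (make good use of the books that were fetched).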

3.1 Indexing

3.1.1 Chunk

When managing external documents, the initial step involves breaking them down into smaller chunks. There are mechanical ways of splitting and context-aware ways of splitting. Mechanical splitting includes fixed size, sliding window, variable size, and recursive chunking. Context-aware splitting parses the document according to its format; the simple versions, such as splitting on "\n\n" or splitting by section using Markdown syntax, are the boring ones. The more interesting context-aware approaches are metadata filtering and graph indexing.

- Metadata filtering: as a simple example, when recency matters, attach the document's creation date to each chunk as metadata. Filtering by recent dates afterwards then makes the answer rely on up-to-date information.

- Graph indexing: to me the biggest advantage of this approach is that explicit logical relations can be applied directly and retrieved easily, without fine-tuning the embedding model. For details see the following: NebulaGraph Launches Industry-First Graph RAG: Retrieval augmented Generation with LLM Based on Knowledge Graphs, RET-LLM [Modarressi et al., 2023], SURGE [Kang et al., 2023]
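Two of the ideas above are easy to show in code: sliding-window chunking and date-based metadata filtering. This is a minimal sketch of my own; the chunk dictionary layout with a `created` field is an assumption, not something the survey prescribes.

```python
from datetime import date


def sliding_window_chunks(words, size, overlap):
    """Split a word list into windows of `size` sharing `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(words[start:start + size])
        if start + size >= len(words):  # the last window reached the end
            break
    return chunks


def filter_recent(chunks, cutoff):
    """Keep only chunks whose creation date is on or after `cutoff`."""
    return [c for c in chunks if c["created"] >= cutoff]


windows = sliding_window_chunks(list("abcdefg"), size=4, overlap=2)

tagged = [
    {"text": "old pricing page", "created": date(2020, 1, 1)},
    {"text": "new pricing page", "created": date(2024, 2, 1)},
]
recent = filter_recent(tagged, cutoff=date(2023, 1, 1))
```

With `overlap=0` the same function gives plain fixed-size chunking, so the two mechanical strategies differ only in one parameter.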

3.1.2 Fine-tuning Embedding Models

Honestly, I think using an embedding model provided by OpenAI or a similar vendor is the most cost-effective option, so I'll skip this part.

3.2 Retrieval

3.2.1 Aligning Queries and Documents

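The basic assumption is this: map the query into the semantic space via embedding, fetch the contexts nearest to that position, and those contexts will be the ones most relevant to the query and therefore helpful for the answer. When I first heard this, I wondered: "What is a transformer?" is a question and "The transformer is ~~~~" is a declarative sentence, so would the two really be mapped close together? (Of course, RAG has already been shown to work, so the mapping must be doing quite well. Strictly speaking, though, the question remains: do declarative text and questions map well onto each other?)

I wasn't the only one with this concern; the paper calls it "Aligning Queries and Documents".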

- Query Rewriting for Retrieval-Augmented Large Language Models

: Feed the query to an LLM to split it into sub-queries in a suitable declarative form, then use those sub-queries for retrieval.

- Precise Zero-Shot Dense Retrieval without Relevance Labels(HyDE)
: Instead of searching with the query directly, give the query to an LLM, ask it to generate a passage, and retrieve with that passage.

- Query2doc: Query Expansion with Large Language Models

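: A step beyond HyDE. First, Query2doc generates the passage with few-shot rather than zero-shot prompting to raise its quality. Next, rather than searching with the passage alone as HyDE does, it prepends several copies of the query to the passage to balance the weights. Finally, the expanded text is used as the query for BM25 retrieval.

(BM25 (Best Matching 25) is a document ranking algorithm widely used in information retrieval. Given a search query, it ranks the documents in a collection by relevance and returns the most relevant ones. BM25 builds on the TF-IDF (Term Frequency-Inverse Document Frequency) model and adds refinements such as term-frequency saturation and document-length normalization to produce more accurate results.)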
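To make the last two items concrete, here is a small sketch of BM25 scoring together with Query2doc-style expansion. It is a plain textbook BM25 with commonly used defaults (k1 = 1.5, b = 0.75), not the retriever of any particular paper; `expand_query`, the repeat count, and the hand-written pseudo passage are assumptions for illustration (in practice the passage would come from an LLM).

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document in a non-empty list against the query."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avgdl = sum(len(toks) for toks in tokenized) / n
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):  # repeated query terms count repeatedly
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


def expand_query(query, pseudo_passage, repeats=5):
    """Repeat the query in front of a generated passage before searching."""
    return " ".join([query] * repeats + [pseudo_passage])


docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
]
plain = bm25_scores("cat mat", docs)
expanded = expand_query("cat mat", "a pet resting indoors", repeats=2)
```

Because each occurrence of a query term adds to the score, repeating the original query keeps its words from being drowned out by the much longer generated passage.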

3.2.2 Aligning Retriever and LLM

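Suppose we run top-K retrieval to use RAG with an LLM trained only on English, and the retrieved documents contain exactly the key information needed to answer the question but are all in Korean. The RAG output will be a mess. That is, enhancing retrieval hit rate through various techniques may not necessarily improve the final outcome.

Of course, this is a very extreme example. The point is that although an LLM is trained on as diverse a corpus as possible, some bias in corpus preparation or in the training process itself may leave the LLM with a preferred form of chunk.

So what form of chunk does an LLM prefer? There is no fixed answer. We cannot statistically analyze the enormous corpus used for a given LLM, and even if we could, I'm not sure it would be meaningful. Still, I think following how the papers below develop their thinking on this question can sharpen our intuition.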

- REPLUG [Shi et al., 2023]

- UPRISE [Cheng et al., 2023a]

- Atlas [Izacard et al., 2022] ... etc

3.3 Generation

Generation is responsible for combining the retrieved information with the query and feeding it to the LLM. Ways of doing this well fall into two broad categories: 1) Post-retrieval with Frozen LLM, 2) Fine-tuning LLM for RAG.

3.3.1 Post-retrieval with Frozen LLM

3.3.1.1 Information Compression

If the retrieved information contains material unrelated to the query, it can act as noise. Chunks that are too large also may not fit into the prompt. So if we compress the retrieved contexts down to the parts that directly matter, we can both reduce noise and stay within the token limit.

- PRCA [Yang et al., 2023b]: training an information extractor (not read yet).

- RECOMP [Xu et al., 2023a]: training an information condenser using contrastive learning (not read yet).

- Large language model is not a good few-shot information extractor, but a good reranker for hard samples! [Ma et al., 2023b]: have a small language model generate candidate extracted sentences from the chunk -> pass them to the LLM, which picks the candidate that is most concise and also semantically correct -> use that sentence in place of the chunk.

3.3.1.2 Reranking

Fetching the k chunks closest to the query in the semantic space via embedding is, at its core, retrieval by "similarity". Document search can rely on this kind of similarity, but it can also use "query likelihood models".

- Open-source large language models are strong zero-shot query likelihood models for document ranking. [Zhuang et al., 2023]

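: Put very simply, each retrieved chunk is given back to the LLM with a prompt such as "Please write a question based on this passage. {chunk}". Rather than letting the LLM sample a question of its own, we read off the token probabilities it assigns (the softmax output over its logits) and score how likely our own query is as that question; the chunks are then 'reranked' from most to least likely.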
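The scoring idea can be tried without an LLM. In the sketch below I swap the LLM for a classic smoothed unigram language model built from each passage, so this is the traditional query likelihood model rather than the paper's LLM-based version; only the principle (rank passages by how probable the query is given the passage) is the same. The function names and add-alpha smoothing are my choices.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def query_log_likelihood(query, passage, vocab_size, alpha=1.0):
    """log P(query | passage) under an add-alpha smoothed unigram model."""
    tf = Counter(tokenize(passage))
    total = sum(tf.values())
    return sum(
        math.log((tf[term] + alpha) / (total + alpha * vocab_size))
        for term in tokenize(query)
    )


def rerank(query, passages):
    """Order passages from most to least likely to have produced the query."""
    vocab = {t for p in passages for t in tokenize(p)} | set(tokenize(query))
    return sorted(
        passages,
        key=lambda p: query_log_likelihood(query, p, len(vocab)),
        reverse=True,
    )


passages = [
    "Kimchi is a fermented dish.",
    "Seoul is the capital of South Korea.",
]
ranked = rerank("capital of Korea", passages)
```

An LLM-based version would replace `query_log_likelihood` with the sum of the model's log-probabilities for the query tokens, conditioned on the prompt that contains the passage.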

3.3.2 Fine-tuning LLM for RAG

I don't know much about fine-tuning yet, so I haven't read this part.

Things I'd like to at least skim

- On self-verification: the paper "Language Models (Mostly) Know What They Know"

- Tree of Clarifications

(I'll write the rest when I next have time.)
