Chat with Repo (PRs) Using Llama 3.1
Author: Tobi.A
Introduction:
When working with large repositories, keeping up with pull requests (PRs), especially those spanning thousands of lines of code, can be a real challenge. Whether it's understanding the impact of a specific change or navigating a massive update, PR reviews can quickly become overwhelming. To tackle this, I set out to build a project that would let me quickly and effectively understand the changes in these large PRs.
Using retrieval-augmented generation (RAG) combined with Langtrace's observability tooling, I built "Chat with Repo(PRs)", a tool designed to simplify the review of large PRs. In addition, I recorded and compared the performance of Llama 3.1 70B against GPT-4o. Through this project, I explored how these models handle code explanation and summarization, and which offers the best balance of speed and accuracy for this use case.
All the code used in this blog can be found here.
Before diving into the details, here is an overview of the key tools used in this project:
LLM services:
- OpenAI API
- Groq API
- Ollama (for local LLMs)
Embedding model:
- SentenceTransformers(特别是“all-mpnet-base-v2”)
Vector database:
- FAISS (Facebook AI Similarity Search)
LLM observability:
- Langtrace for end-to-end tracing and metrics
How Chat with Repo works:
The Chat with Repo(PRs) system implements a straightforward RAG architecture for PR analysis. It begins by fetching PR data through GitHub's API, chunking large files to stay within token limits. These chunks are vectorized with SentenceTransformers, producing dense embeddings that capture the semantics of the code. A FAISS index enables sub-linear-time similarity search over these embeddings. Queries go through the same embedding process, enabling semantic matching against the code index. The retrieved chunks form a dynamic context for the selected LLM (via OpenAI, Groq, or Ollama), which then performs contextualized inference. This approach combines the efficiency of vector search with the generative capabilities of LLMs, allowing nuanced code understanding that adapts to PRs of varying complexity. Finally, the Langtrace integration provides fine-grained observability into the embedding and LLM operations, offering insight into performance bottlenecks and potential optimizations in the RAG pipeline. Let's dive into its key components.
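The flow just described (ingest, embed, index, retrieve, prompt) can be sketched end to end in a few lines. The sketch below is illustrative only: it swaps SentenceTransformers for a trivial bag-of-words "embedding", replaces FAISS with a brute-force scan, and stops short of the LLM call, but the pipeline shape matches the description above.

```python
from collections import Counter
import math

def embed(text: str) -> Counter:
    # Stand-in for SentenceTransformers: a bag-of-words pseudo-embedding
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. "Ingest" PR chunks and index their embeddings (stand-in for FAISS)
chunks = [
    "def add(a, b): return a + b  # new helper in utils.py",
    "README updated with install instructions",
    "fix: handle None payload in webhook handler",
]
index = [embed(c) for c in chunks]

# 2. Embed the query the same way and retrieve the top-k most similar chunks
query = "what changed in the webhook handler"
q = embed(query)
top_k = sorted(range(len(chunks)), key=lambda i: cosine(q, index[i]), reverse=True)[:2]

# 3. The retrieved chunks become the dynamic context handed to the LLM
context = "\n---\n".join(chunks[i] for i in top_k)
prompt = f"Context:\n{context}\n\nQuestion: {query}"
```

In the real system each of these stand-ins is replaced by the production component, but the data flow is identical.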
The chunking process:
The chunking process in this system is designed to break large pull requests into manageable, context-rich pieces. The core of this process lives in the IngestionService class, specifically in the chunk_large_file and create_chunks_from_patch methods.
When a PR is ingested, each file's patch is processed individually. The chunk_large_file method is responsible for splitting large files:
```python
def chunk_large_file(self, file_patch: str, chunk_size: int = config.CHUNK_SIZE) -> List[str]:
    lines = file_patch.split('\n')
    chunks = []
    current_chunk = []
    current_chunk_size = 0
    for line in lines:
        line_size = len(line)
        if current_chunk_size + line_size > chunk_size and current_chunk:
            chunks.append('\n'.join(current_chunk))
            current_chunk = []
            current_chunk_size = 0
        current_chunk.append(line)
        current_chunk_size += line_size
    if current_chunk:
        chunks.append('\n'.join(current_chunk))
    return chunks
```
This method splits the file according to the configured chunk size, ensuring that no chunk exceeds the limit. It is a line-based approach that keeps logical units of code together as far as the size constraint allows.
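To see the splitting behaviour concretely, here is the same logic lifted out as a standalone function, run with a deliberately tiny chunk size (the 40-character limit and the sample patch are illustrative, not values from the project):

```python
def chunk_large_file(file_patch: str, chunk_size: int = 1000) -> list:
    # Standalone copy of the method above, for demonstration
    lines = file_patch.split('\n')
    chunks, current_chunk, current_chunk_size = [], [], 0
    for line in lines:
        line_size = len(line)
        if current_chunk_size + line_size > chunk_size and current_chunk:
            chunks.append('\n'.join(current_chunk))
            current_chunk, current_chunk_size = [], 0
        current_chunk.append(line)
        current_chunk_size += line_size
    if current_chunk:
        chunks.append('\n'.join(current_chunk))
    return chunks

patch = "+def greet(name):\n+    return f'hello {name}'\n+print(greet('world'))"
chunks = chunk_large_file(patch, chunk_size=40)
# Each line alone fits under 40 characters, but no two lines fit together,
# so the three-line patch becomes three one-line chunks.
```

Note that a chunk is only flushed once the next line would overflow it, so an individual line longer than chunk_size still lands in a chunk of its own rather than being split mid-line.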
Once a file has been split into chunks, the create_chunks_from_patch method processes each one, enriching it with contextual information:
```python
def create_chunks_from_patch(self, repo_info, pr_info, file_info, repo_explanation, pr_explanation):
    code_blocks = self.chunk_large_file(file_info['patch'])
    # File-level explanation, generated once per file (mirrors the per-chunk call below)
    file_explanation = self.generate_safe_explanation(
        f"Explain the purpose of the file {file_info['filename']} and its changes"
    )
    chunks = []
    for i, block in enumerate(code_blocks):
        chunk_explanation = self.generate_safe_explanation(
            f"Explain this part of the code and its changes: {block}"
        )
        chunk = {
            "code": block,
            "explanations": {
                "repository": repo_explanation,
                "pull_request": pr_explanation,
                "file": file_explanation,
                "code": chunk_explanation
            },
            "metadata": {
                "repo": repo_info["name"],
                "pr_number": pr_info["number"],
                "file": file_info["filename"],
                "chunk_number": i + 1,
                "total_chunks": len(code_blocks),
                "timestamp": pr_info["updated_at"]
            }
        }
        chunks.append(chunk)
    return chunks
```
It uses the LLM service to generate an explanation for each code block.
It attaches metadata, including the repository name, PR number, filename, chunk number, and timestamp.
It includes broader context, such as the repository and pull-request explanations.
This approach ensures that each chunk is not just a snippet of code but a rich, context-aware unit that includes:
- The actual code changes
- Explanations of those changes
- File-level context
- PR-level context
- Repository-level context
Embedding and similarity search:
The EmbeddingService class handles embedding creation and similarity search:
1. Embedding creation:
For each chunk, we create an embedding using SentenceTransformer:
```python
text_to_embed = self.get_full_context(chunk)
embedding = self.model.encode([text_to_embed])[0]
```
The embedding combines the code content with the code, file, PR, and repository explanations.
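The exact layout produced by get_full_context isn't shown above, but given the chunk structure from the ingestion step, a plausible reconstruction concatenates the explanation layers ahead of the code so the embedding captures semantics as well as syntax (the field order and labels here are my assumption, not the project's verbatim implementation):

```python
def get_full_context(chunk: dict) -> str:
    # Hypothetical sketch: join every explanation layer with the raw code
    # so the resulting text reflects all levels of context.
    ex = chunk["explanations"]
    return "\n".join([
        f"Repository: {ex['repository']}",
        f"Pull request: {ex['pull_request']}",
        f"File: {ex['file']}",
        f"Change: {ex['code']}",
        f"Code:\n{chunk['code']}",
    ])

chunk = {
    "code": "+ retries = 3",
    "explanations": {
        "repository": "HTTP client library",
        "pull_request": "Adds retry support",
        "file": "Defines client defaults",
        "code": "Introduces a retry count constant",
    },
}
text = get_full_context(chunk)
```

The concatenated string is what gets passed to model.encode, so a query phrased in natural language ("why were retries added?") can still land near this chunk even though the diff itself never says "retry support".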
2. Indexing:
We use FAISS to index these embeddings:
```python
self.index.add(np.array([embedding]))
```
3. Query processing:
When a question is asked, we create an embedding for the query and perform a similarity search:
```python
query_vector = self.model.encode([query])
D, I = self.index.search(query_vector, k)
```
4. Chunk Selection:
The system selects the top k chunks (default 3) with the highest similarity scores.
This captures both code structure and semantic meaning, allowing for relevant chunk retrieval even when queries don't exactly match code syntax. FAISS enables efficient similarity computations, making it quick to find relevant chunks in large repositories.
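Conceptually, the search call above is a nearest-neighbour scan: a flat L2 index returns the distances (D) and positions (I) of the k stored vectors closest to the query. A library-free toy equivalent (assuming plain L2 distance, which is what FAISS's flat index computes, just without its optimized kernels):

```python
import math

def l2(a, b):
    # Euclidean distance between two equal-length vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search(index_vectors, query_vector, k=3):
    # Brute-force equivalent of D, I = index.search(query, k):
    # distances and ids of the k nearest stored vectors, nearest first.
    scored = sorted(
        ((l2(v, query_vector), i) for i, v in enumerate(index_vectors)),
        key=lambda t: t[0],
    )[:k]
    D = [d for d, _ in scored]
    I = [i for _, i in scored]
    return D, I

vectors = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1]]
D, I = search(vectors, [1.0, 0.0], k=2)
# Vector 1 matches the query exactly; vector 2 is its nearest neighbour.
```

FAISS does this same computation over thousands of high-dimensional embeddings in optimized batched form, which is what keeps retrieval fast even for large PRs.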
Langtrace Integration:
To ensure comprehensive observability and performance monitoring, we've integrated Langtrace into our "Chat with Repo(PRs)" application. Langtrace provides real-time tracing, evaluations, and metrics for our LLM interactions, vector database operations, and overall application performance.
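Wiring Langtrace in is a minimal setup step. The sketch below follows the pattern in the Langtrace docs (check them for the current SDK surface); the API key is a placeholder for your own:

```python
from langtrace_python_sdk import langtrace

# Initialize once, before importing the instrumented LLM libraries,
# so their calls are traced automatically.
langtrace.init(api_key="<YOUR_LANGTRACE_API_KEY>")
```

From that point on, embedding calls and LLM requests show up as traces with latency and token metrics, which is what the comparisons in the next section are based on.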
Model Performance Evaluation: Llama 3.1 70B (Open-Source) vs. GPT-4o (Closed-Source) in Large-Scale Code Review:
To explore how open-source models compare to their closed-source counterparts in handling large PRs, I conducted a comparative analysis between Llama 3.1 70B (open-source) and GPT-4o (closed-source). The test case involved a significant update to the Langtrace repository, with over 2,300 additions, nearly 200 deletions, 250 commits, and changes across 47 files. My goal was to quickly understand these specific changes and assess how each model performs in code review tasks.
Methodology:
I posed a set of technical questions related to the pull request (PR), covering:
- Specific code change explanations
- Broader architectural impacts
- Potential performance issues
- Compatibility concerns
Both models were provided with the same code snippets and contextual information. Their responses were evaluated based on:
- Technical accuracy
- Depth of understanding
- Ability to infer broader system impacts
Key Findings:
Code Understanding:
- Llama 3.1 70B performed well in understanding individual code changes, especially with workflow updates and React component changes.
- GPT-4o had a slight edge in connecting changes to the overall system architecture, such as identifying the ripple effect of modifying API routes on Prisma schema updates.
Knowledge of Frameworks:
- Both models demonstrated strong understanding of frameworks like React, Next.js, and Prisma.
- Llama 3.1 70B's versatility is impressive, particularly in web development knowledge, showing that open-source models are closing the gap on specialized domain expertise.
Architectural Insights:
- GPT-4o excelled in predicting the broader implications of local changes, such as how adjustments to token-counting methods could affect the entire application.
- Llama 3.1 70B, while precise in explaining immediate code impacts, was less adept at extrapolating these changes to system-wide consequences.
Handling Uncertainty:
- Both models appropriately acknowledged uncertainty when presented with incomplete data, which is crucial for reliable code review.
- Llama 3.1 70B's ability to express uncertainty highlights the progress open-source models have made in mimicking sophisticated reasoning.
Technical Detail vs. Broader Context:
- Llama 3.1 70B provided highly focused and technically accurate explanations for specific code changes.
- GPT-4o offered broader system context, though sometimes at the expense of missing finer technical details.
Question Comparison:
Below are examples of questions posed to both models, the expected output, and their respective answers:
Conclusion:
While GPT-4o remains stronger in broader architectural insights, Llama 3.1b's rapid progress and versatility in code comprehension make it a powerful option for code review. Open-source models are catching up quickly, and as they continue to improve, they could play a significant role in democratizing AI-assisted software development. The ability to tailor and integrate these models into specific development workflows could soon make them indispensable tools for reviewing, debugging, and managing large codebases.
We'd love to hear your thoughts! Join our community on Discord or reach out at support@langtrace.ai to share your experiences, insights, and suggestions. Together, we can continue advancing observability in LLM development and beyond.
Happy tracing!
Useful resources
Getting started with Langtrace: https://docs.langtrace.ai/introduction
Langtrace Twitter (X): https://x.com/langtrace_ai
Langtrace LinkedIn: https://www.linkedin.com/company/langtrace/about/
Langtrace website: https://langtrace.ai/
Langtrace Discord: https://discord.langtrace.ai/
Langtrace GitHub: https://github.com/Scale3-Labs/langtrace