🚨CODE:RED🚨

Crawling and machine learning

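Development period: 24.04.12 ~ 24.04.18

Project overview

Scrapes the current day's Naver News and filters the articles to be processed. A model trained on text data about accidents and incidents then classifies each article to determine whether or not it reports an accident or incident.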

Key features

Dynamic page handling with Selenium
- Clicks the "More" button on the news tab
Static data collection with BeautifulSoup
- Collects news articles
String similarity algorithm (Jaro-Winkler similarity)
- Compares article summaries and titles with each other, scores their string similarity, and removes duplicates
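
The deduplication step can be sketched as follows. This is a minimal, standard-library illustration of Jaro-Winkler similarity, not the code used in this repository. The function names, the 0.9 threshold, and the sample titles are all assumptions made for the example, and only titles are compared here.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


def drop_near_duplicates(items, threshold=0.9):
    """Keep an item only if it is not too similar to any item already kept.

    The threshold is an assumed value, not one taken from the project.
    """
    kept = []
    for item in items:
        if all(jaro_winkler(item, other) < threshold for other in kept):
            kept.append(item)
    return kept


# Hypothetical sample titles.
titles = [
    "Fire breaks out at warehouse in Seoul",
    "Fire breaks out at warehouse in Seoul.",
    "Stock market closes higher",
]
unique_titles = drop_near_duplicates(titles)
print(unique_titles)
```

Comparing every new item against every kept item is quadratic in the number of articles, which is usually acceptable for one day of news. The threshold controls how aggressively near-identical titles are merged.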
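Machine learning (Naive Bayes algorithm)
- Defines two classes: accident and etc (other)
- Given a string, computes the conditional probability of each class and selects the class with the highest probability
- Classifies each article as an accident/incident or as other news, then saves it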
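
The classification step described above can be illustrated with a small multinomial Naive Bayes classifier. This is a sketch under stated assumptions, not the project's actual model. The class name `NaiveBayes`, the whitespace tokenizer, the Laplace smoothing value, and the toy training sentences are all hypothetical. Real Korean news text would likely need a proper tokenizer.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha                       # smoothing strength (assumed)
        self.class_counts = Counter()            # documents per class
        self.word_counts = defaultdict(Counter)  # token counts per class
        self.vocab = set()

    @staticmethod
    def tokenize(text: str):
        # Assumption: simple lowercase whitespace split.
        return text.lower().split()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_counts[label] += 1
            for token in self.tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)
        return self

    def log_scores(self, text: str):
        """Return log P(class) + sum of log P(token | class) for each class."""
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            score = math.log(n_docs / total_docs)
            total_words = sum(self.word_counts[label].values())
            denom = total_words + self.alpha * len(self.vocab)
            for token in self.tokenize(text):
                if token not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][token]
                score += math.log((count + self.alpha) / denom)
            scores[label] = score
        return scores

    def predict(self, text: str) -> str:
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Hypothetical toy training data for the two classes named above.
texts = [
    "fire breaks out at warehouse",
    "car crash injures three on highway",
    "building collapse leaves two injured",
    "stock market closes higher",
    "new smartphone released this week",
    "team wins championship final",
]
labels = ["accident"] * 3 + ["etc"] * 3

model = NaiveBayes().fit(texts, labels)
print(model.predict("highway crash leaves two injured"))
print(model.predict("stock market wins"))
```

Working in log space avoids numeric underflow when many small probabilities are multiplied, and the smoothing term keeps a token that never appeared in one class from forcing that class's probability to zero.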

📚 stack

Getting started

Requirements

python 3.12.3
pip 24.0
Flask 3.0.3
Werkzeug 3.0.2

Installation

$ git clone https://github.com/startcoriny/CODE-RED_Crawling.git
$ cd CODE-RED_Crawling

BackEnd

$ pip install -r requirements.txt
$ python app.py

Blog: startcoriny
