trackupdates

A simple YAML-based XPath crawler framework for easily tracking site updates. Visit https://zhupeng.github.io/trackupdates/

Getting Started

git clone git@github.com:ZhuPeng/trackupdates.git
cd trackupdates
pip install -r requirements.txt
# update the SMTP mail configuration in the yaml file to your own settings
python trackupdates/trackupdates.py examples/githubtrending.yaml

The script above runs according to the specified YAML configuration file: it tracks updates to GitHub trending on the configured cron schedule and sends a notification for the items that match the keywords you specified.

You can also visit localhost:5000 to view the results on a web page.
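
To illustrate the "match the keywords" step, here is a minimal sketch of how crawled items could be filtered against a job's `match` section. The `matches` helper and the sample items are hypothetical, and the case-insensitive substring comparison is an assumption; the project's actual matching logic may differ.

```python
def matches(item, match):
    """Return True if every field named in `match` is found in the item.

    Assumption: a case-insensitive substring test. The real project may
    use exact or regular-expression matching instead.
    """
    return all(
        str(expected).lower() in str(item.get(field, "")).lower()
        for field, expected in match.items()
    )


# Hypothetical crawl results, shaped like the `attr` fields of a parser.
items = [
    {"repo": "example/alpha", "lang": "Python"},
    {"repo": "example/beta", "lang": "Go"},
]

# Mirrors `update.match` in the YAML job definition.
match = {"lang": "Python"}

to_notify = [item for item in items if matches(item, match)]
print(to_notify)
```

Only the items that pass the filter would be handed to the configured receiver.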

Yaml Configuration

The YAML configuration file syntax was inspired by Prometheus and Alertmanager. Below is an example configuration (examples/githubtrending.yaml) that tracks updates to GitHub trending and sends a notification when a new Python project appears. The crawl results are stored in github.db, a SQLite database.

global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'example.smtp.com:587'
  smtp_from: 'example@example.com'
  smtp_auth_username: 'example@example.com'
  smtp_auth_password: 'example'

  store: 'github.db'

jobs:
- name: 'githubtrending'
  # run every hour, at minute 0
  cron: '*/1|0'
  url:
    test_target: 'examples/githubtrending'
    target: 'https://github.com/trending{lang}?since=daily'
    query_parameter:
      lang: '/python,/go,'
  parser: 'githubtrending'
  update:
    receiver: 'example'
    match:
      lang: 'Python'

parsers:
- name: 'githubtrending'
  base_url: 'https://github.com'
  base_xpath:
  - "//li[@class='col-12 d-block width-full py-4 border-bottom']"
  attr:
    url: 'div/h3/a/@href'
    repo: 'div/h3/a'
    desc: "div[@class='py-1']/p"
    lang: "div/span[@itemprop='programmingLanguage']"
    star: "div/a[@aria-label='Stargazers']"
    fork: "div/a[@aria-label='Forks']"
    today: "div/span[@class='float-right']"
  format:
    markdown: '[{lang}: {repo}]({url}), star: {star}, fork: {fork}, today-star: {today} <br> {desc}'
    html: '<p><a href="{url}">{lang}: {repo}</a> star: {star}, fork: {fork}, today-star: {today}, {desc}</p>'

receivers:
- name: 'example'
  email_configs:
    to:
    - 'example@example.com'
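
The `parsers` section above combines three ideas: `base_xpath` selects one node per item, each `attr` expression is evaluated relative to that node, and `format` renders the extracted fields. The following is a rough sketch of that flow, not the project's implementation. It uses the standard library's `xml.etree.ElementTree`, which supports only a subset of XPath and requires well-formed markup, so a trailing `/@attribute` step is handled by hand. The sample document, the simplified `repo-item` class, the `extract` and `parse` helpers, and the `base_url` prefixing rule are all assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for a crawled page.
SAMPLE = """
<ol>
  <li class="repo-item">
    <div><h3><a href="/example/alpha">example / alpha</a></h3></div>
    <div class="py-1"><p>A sample description</p></div>
    <div><span itemprop="programmingLanguage">Python</span></div>
  </li>
</ol>
"""

# Python mirror of a trimmed-down `parsers` entry.
PARSER = {
    "base_url": "https://github.com",
    "base_xpath": ["//li[@class='repo-item']"],
    "attr": {
        "url": "div/h3/a/@href",
        "repo": "div/h3/a",
        "desc": "div[@class='py-1']/p",
        "lang": "div/span[@itemprop='programmingLanguage']",
    },
    "format": {"markdown": "[{lang}: {repo}]({url}) <br> {desc}"},
}


def extract(node, xpath):
    """Evaluate one `attr` expression relative to `node`."""
    attribute = None
    if "/@" in xpath:
        # ElementTree cannot select attributes, so split off the last step.
        xpath, attribute = xpath.rsplit("/@", 1)
    found = node.find(xpath)
    if found is None:
        return ""
    if attribute is not None:
        return found.get(attribute, "")
    return "".join(found.itertext()).strip()


def parse(document, parser):
    """Return one dict of extracted fields per node matched by base_xpath."""
    root = ET.fromstring(document)
    items = []
    for base in parser["base_xpath"]:
        # ElementTree needs a relative path, hence the leading ".".
        for node in root.findall("." + base):
            item = {name: extract(node, xp) for name, xp in parser["attr"].items()}
            # Assumption: relative links are prefixed with base_url.
            if item.get("url", "").startswith("/"):
                item["url"] = parser["base_url"] + item["url"]
            items.append(item)
    return items


items = parse(SAMPLE, PARSER)
lines = [PARSER["format"]["markdown"].format(**item) for item in items]
print(lines[0])
```

Real pages are rarely well-formed XML, so a crawler would normally rely on an HTML-aware parser with full XPath support; the standard library is used here only to keep the sketch self-contained.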

Many more ready-made configurations are available in examples/public.yaml, which covers the CoreOS blog, the Kubernetes Blog, and others. You can see the results at https://zhupeng.github.io/trackupdates/.
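
When adapting one of these configurations, it helps to see how a job's `url.target` template and `query_parameter` values might combine into concrete URLs. The sketch below assumes that each comma-separated value is substituted into the matching `{name}` placeholder, and that an empty value (the trailing comma in `'/python,/go,'`) leaves the placeholder empty. The `expand_targets` helper is hypothetical and this behavior is inferred from the example configuration, not confirmed against the project's code.

```python
def expand_targets(target, query_parameter):
    """Expand a URL template into one URL per combination of parameter values.

    Assumption: values are comma-separated, and an empty value simply
    removes the placeholder from the URL.
    """
    urls = [target]
    for name, values in query_parameter.items():
        urls = [
            url.replace("{" + name + "}", value)
            for url in urls
            for value in values.split(",")
        ]
    return urls


urls = expand_targets(
    "https://github.com/trending{lang}?since=daily",
    {"lang": "/python,/go,"},
)
for url in urls:
    print(url)
```

Under these assumptions the example job would crawl three pages: the Python and Go trending pages, plus the overall trending page.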

License

MIT, please see LICENSE.
