commit for comments #2
base: master
This file was deleted.
New file, hunk `@@ -0,0 +1,63 @@`:

```elixir
defmodule GithubWebscraping.ExtractFileInfos do
  def fetch_file_name(html) do
    file_name =
```
> **Reviewer:** You don't need to assign the variable if you're already returning the result of the function's last line.
>
> **Reviewer:** +1
>
> **Author:** Good call.
```elixir
      html
      |> Floki.find("div.repository-content")
      |> Floki.find("div.d-flex.flex-items-start.flex-shrink-0")
      |> Floki.find("strong.final-path")
      |> Floki.text()

    file_name
  end

  def fetch_extension(html) do
    file_name =
      html
      |> Floki.find("div.repository-content")
      |> Floki.find("div.d-flex.flex-items-start.flex-shrink-0")
      |> Floki.find("strong.final-path")
      |> Floki.text()

    Path.extname(file_name)
```
Comment on lines +15 to +21

> **Reviewer:** Would this be the same as `html |> Floki.find(...) |> Floki.find(...) |> Floki.find(...) |> Floki.text() |> Path.extname()`?
>
> **Author:** Yes.
```elixir
  end

  def fetch_line_numbers(html) do
    line_numbers =
      html
      |> Floki.find("div.text-mono.f6.flex-auto.pr-3")
      |> Floki.text()

    if String.contains?(line_numbers, "sloc") do
      kylo_bytes = List.first(List.first(Regex.scan(~r/(\d+) lines/, line_numbers)))
      kylo_bytes = List.first(List.first(Regex.scan(~r/(\d+)/, kylo_bytes)))
      {lines, _} = Integer.parse(kylo_bytes)
      lines
    else
      0
    end
  end

  def fetch_file_size_in_bytes(html) do
    string_bytes =
      html
      |> Floki.find("div.text-mono.f6.flex-auto.pr-3")
      |> Floki.text()

    cond do
      String.contains?(string_bytes, "KB") ->
```
> **Reviewer:** I prefer pattern matching over `cond`, so I'd create a helper function to handle these differences in the code.
>
> **Author:** How would I do that with pattern matching? Is there an example?
>
> **Reviewer:** And on this line, instead of the `cond`, you'd do:
>
> While we're at it, lines 1 and 2 of each `cond` branch are identical, i.e. duplicated code that can be extracted into new functions :)
```elixir
        k_bytes = String.trim(List.last(List.last(Regex.scan(~r/(\d+)?.?(\d+)/, string_bytes))))
        {bytes, _} = Float.parse(k_bytes)
        bytes * 1000.0

      String.contains?(string_bytes, "MB") ->
        m_bytes = String.trim(List.last(List.last(Regex.scan(~r/(\d+)?.?(\d+)/, string_bytes))))
        {bytes, _} = Float.parse(m_bytes)
        bytes * 1_000_000.0

      String.contains?(string_bytes, "Bytes") ->
        bytes = String.trim(List.last(List.last(Regex.scan(~r/(\d+)/, string_bytes))))
```
> **Reviewer:** You could have used `|>` here.
```elixir
        {bytes, _} = Float.parse(bytes)
        bytes
    end
  end
end
```
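A minimal sketch of the refactor the comments on this file point toward, assuming the helpers receive the text that `Floki.text()` would return, so it runs without Floki or network access. `FileInfoParsing`, `extension/1`, `line_count/1`, `size_in_bytes/1`, `multiplier/1`, and `parse_number/1` are hypothetical names, and a string like `"63 lines (52 sloc)  1.5 KB"` is an assumed shape for GitHub's file-info line, not a verified format. Pattern-matched `multiplier/1` clauses replace the `cond`, one `Regex.run/3` with capture groups replaces the nested `Regex.scan` calls, and each function returns its pipeline directly. Unlike the original, `line_count/1` keys on the `N lines` text rather than on `sloc`.

```elixir
defmodule FileInfoParsing do
  # Hypothetical module: every function takes already-extracted text,
  # standing in for the result of the Floki pipeline in the PR.

  # Pipe straight into Path.extname/1 instead of binding file_name first.
  def extension(file_name_text) do
    file_name_text
    |> String.trim()
    |> Path.extname()
  end

  # One Regex.run/3 with a capture group replaces the two nested Regex.scan calls.
  def line_count(info_text) do
    case Regex.run(~r/(\d+) lines/, info_text, capture: :all_but_first) do
      [lines] -> String.to_integer(lines)
      nil -> 0
    end
  end

  # Capture the number and the unit once; pattern matching picks the multiplier.
  def size_in_bytes(info_text) do
    case Regex.run(~r/(\d+(?:\.\d+)?)\s*(KB|MB|Bytes)/, info_text, capture: :all_but_first) do
      [number, unit] -> parse_number(number) * multiplier(unit)
      nil -> 0.0
    end
  end

  # Same decimal multipliers as the original cond branches.
  defp multiplier("KB"), do: 1_000.0
  defp multiplier("MB"), do: 1_000_000.0
  defp multiplier("Bytes"), do: 1.0

  # Float.parse/1 also accepts integer strings such as "512".
  defp parse_number(number) do
    {value, ""} = Float.parse(number)
    value
  end
end
```

Supporting another unit means adding one `multiplier/1` clause and the unit to the regex; text with no recognized unit falls through to the `nil` branch.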
New file, hunk `@@ -0,0 +1,8 @@`:

```elixir
defmodule Lib.GithubWebscraping do
```
> **Reviewer:** There are now two files with the same name. Maybe this one could simply be `ProjectName.Api`.
```elixir
  @moduledoc """
  GithubWebscraping API module
  """
  alias GithubWebscraping.MappingRepository

  defdelegate get_repository_infos(url), to: MappingRepository, as: :process
```
> **Reviewer:** Dave Thomas sleeps happy.
```elixir
end
```
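A sketch of the rename suggested above, with `GithubWebscraping.Api` as an assumed concrete value for `ProjectName.Api`. The `MappingRepository` module here is a hypothetical stub that returns a fixed map so the snippet runs on its own; `defdelegate` with `as:` forwards `get_repository_infos/1` to `process/1` unchanged.

```elixir
# Hypothetical stub standing in for the real GithubWebscraping.MappingRepository.
defmodule GithubWebscraping.MappingRepository do
  def process(url), do: %{url: url, all_lines: 0, all_bytes: 0, files: []}
end

# Public entry point under an assumed name following the "ProjectName.Api" suggestion.
defmodule GithubWebscraping.Api do
  @moduledoc """
  GithubWebscraping API module
  """
  alias GithubWebscraping.MappingRepository

  # Calls to get_repository_infos/1 are forwarded to MappingRepository.process/1.
  defdelegate get_repository_infos(url), to: MappingRepository, as: :process
end
```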
New file, hunk `@@ -0,0 +1,26 @@`:

```elixir
defmodule GithubWebscraping.GroupFileInformation do
  @spec group_infos(list(%GithubWebscraping.Schemas.GithubFile{})) :: map()
  def group_infos(files) do
    all_lines = get_all_lines(files)
    all_bytes = get_all_bytes(files)

    repo_info = %{
      all_lines: all_lines,
      all_bytes: all_bytes,
      files: files
    }

    repo_info
  end

  defp get_all_lines(files) do
    all_lines = Enum.reduce(files, 0, fn file, acc -> file.file_lines + acc end)
```
> **Reviewer:** No need for two lines; the method name is enough to understand what it does.
```elixir
    all_lines
  end

  defp get_all_bytes(files) do
    all_bytes = Enum.reduce(files, 0, fn file, acc -> file.file_bytes + acc end)
```
> **Reviewer:** Same as the comment above.
```elixir
    all_bytes
  end
end
```
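Following the two comments above, a sketch with each helper reduced to one expression and `group_infos/1` returning the map literal directly. It swaps `Enum.reduce/3` for `Enum.map/2` piped into `Enum.sum/1`, an equivalent alternative rather than what the PR uses. The `@spec` is omitted only so the sketch compiles without the `GithubFile` struct; the code assumes nothing beyond `file_lines` and `file_bytes` fields, so plain maps work as well as `GithubFile` structs.

```elixir
defmodule GithubWebscraping.GroupFileInformation do
  # Return the map literal directly instead of binding repo_info first.
  def group_infos(files) do
    %{
      all_lines: get_all_lines(files),
      all_bytes: get_all_bytes(files),
      files: files
    }
  end

  # Each helper is a single expression; the function name documents the intent.
  defp get_all_lines(files), do: files |> Enum.map(& &1.file_lines) |> Enum.sum()

  defp get_all_bytes(files), do: files |> Enum.map(& &1.file_bytes) |> Enum.sum()
end
```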
New file, hunk `@@ -0,0 +1,84 @@`:

```elixir
defmodule GithubWebscraping.MappingRepository do
```
> **Reviewer:** I'm not sure I understand the "Mapping" in the name. What do you think of something like
```elixir
  alias GithubWebscraping.{ExtractFileInfos, GroupFileInformation}
  alias GithubWebscraping.Schemas.GithubFile

  @git_url_base "https://github.com"

  @spec process(String.t()) :: map()
  def process(url) do
    files = mapping_repository_by_url(url)
```
> **Reviewer:** `|>`
```elixir
    files_with_all_infos = GroupFileInformation.group_infos(files)
    files_with_all_infos
  end

  defp mapping_repository_by_url(url) do
    urls = get_urls(is_first_url(url))
```
> **Reviewer:** `|>`
```elixir
    files_url = get_files_url(urls)
    pastes_url = get_pastes_url(urls)

    files =
      Enum.map(files_url, fn file_url ->
        build_file(@git_url_base <> file_url)
      end)

    IO.puts("\n----------Returned files----------\n")
    IO.inspect(files)
    IO.puts("\n----------Returned urls-----------\n")
    IO.inspect(pastes_url)
```
Comment on lines +24 to +27

> **Reviewer:** I don't know how much this opinion is worth in an Elixir project, but if I were doing this in OO, I'd create a separate class to handle "presentation", basically a Presenter or View. I'd probably create a separate module to group this output responsibility.
>
> **Author:** I only put this in for testing.
>
> **Author:** But good point anyway.
```elixir
    other_files =
      if Enum.count(pastes_url) > 0 do
        Enum.map(pastes_url, fn url ->
```
> **Reviewer:** 💅 `Enum.map(dirs, &mapping_repository_by_url/1)`
```elixir
          mapping_repository_by_url(url)
        end)
      else
        []
      end

    if Enum.count(other_files) > 0 do
      Enum.concat(files, Enum.reduce(other_files, fn elem, acc -> elem ++ acc end))
    else
      files
    end
  end

  defp build_file(url) do
    html = download_string_url(url) |> Floki.parse_document!()
    name = ExtractFileInfos.fetch_file_name(html)
    lines = ExtractFileInfos.fetch_line_numbers(html)
    extension = ExtractFileInfos.fetch_extension(html)
    bytes = ExtractFileInfos.fetch_file_size_in_bytes(html)

    GithubFile.build(url, name, extension, bytes, lines)
  end

  defp get_urls(url) do
    url
    |> download_string_url()
    |> Floki.parse_document!()
    |> Floki.find("div.js-details-container.Details")
    |> Floki.find("div.js-navigation-item")
    |> Floki.find("a.js-navigation-open.Link--primary")
    |> Floki.attribute("href")
  end

  defp download_string_url(url) do
    HTTPoison.get!(url).body
  end

  defp get_files_url(urls) do
    Enum.filter(urls, fn url -> url =~ "blob" end)
  end

  defp get_pastes_url(urls) do
```
> **Reviewer:** `get_directories_urls`*
>
> **Author:** In my defense, that word does exist in English.
```elixir
    Enum.filter(urls, fn url -> url =~ "tree" end)
  end

  defp is_first_url(url) do
```
> **Reviewer:** The method name is poor.
```elixir
    if url =~ "https://github.com/" do
      url
    else
      "https://github.com" <> url
    end
  end
end
```
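The review comments on this module (piping, `Enum.map(dirs, &mapping_repository_by_url/1)`, the `get_directories_urls` name, the `is_first_url` name) could be combined along the lines of the sketch below. It is not the author's code: an in-memory map from a page path to the hrefs listed on that page stands in for `HTTPoison` plus `Floki`, and `RepositoryWalker`, `normalize_url/1`, `split_urls/1`, and `collect_files/2` are hypothetical names. It uses `Enum.flat_map/2` in place of `Enum.map/2` followed by `Enum.concat`/`Enum.reduce`, which also removes both `Enum.count(...) > 0` checks, because mapping over an empty list already yields `[]`.

```elixir
defmodule RepositoryWalker do
  # Hypothetical sketch; the page map replaces real HTTP requests.
  @git_url_base "https://github.com"

  # Descriptive replacement for is_first_url/1: prefix relative hrefs with the base URL.
  def normalize_url(url) do
    if String.starts_with?(url, @git_url_base), do: url, else: @git_url_base <> url
  end

  # Split a page's hrefs into file links ("blob") and directory links ("tree").
  def split_urls(hrefs) do
    {files, others} = Enum.split_with(hrefs, &(&1 =~ "blob"))
    {files, Enum.filter(others, &(&1 =~ "tree"))}
  end

  # `pages` maps a page path to the hrefs found on it. Returns absolute URLs of
  # every file on this page and, recursively, in every subdirectory.
  def collect_files(path, pages) do
    {files, dirs} = pages |> Map.get(path, []) |> split_urls()
    Enum.map(files, &normalize_url/1) ++ Enum.flat_map(dirs, &collect_files(&1, pages))
  end
end
```

In the real module, `Map.get(path, [])` would be replaced by the existing `get_urls/1` download-and-parse pipeline.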
This file was deleted.
This file was deleted.
This file was deleted.
New file, hunk `@@ -0,0 +1,16 @@`:

```elixir
defmodule GithubWebscraping.Schemas.GithubFile do
  alias GithubWebscraping.Schemas.GithubFile

  @derive {Jason.Encoder, only: [:file_url, :file_name, :extension, :file_bytes, :file_lines]}
  defstruct [:file_url, :file_name, :extension, :file_bytes, :file_lines]

  def build(file_url, file_name, extension, file_bytes, file_lines) do
```
> **Reviewer:** Elixir has a `struct(%GithubFile{}, [key: value])` function that returns a struct. You wouldn't need `build`.
```elixir
    %GithubFile{
      file_url: file_url,
      file_name: file_name,
      extension: extension,
      file_bytes: file_bytes,
      file_lines: file_lines
    }
  end
end
```
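A sketch of the reviewer's suggestion: `Kernel.struct/2` takes the struct's module (or an existing struct) plus a keyword list or map, so a dedicated `build/5` is not required. The `@derive {Jason.Encoder, ...}` line is left out only so the snippet runs without the Jason dependency, and the URL and field values are made-up sample data.

```elixir
defmodule GithubWebscraping.Schemas.GithubFile do
  # Same fields as the PR's struct (Jason.Encoder derive omitted for this sketch).
  defstruct [:file_url, :file_name, :extension, :file_bytes, :file_lines]
end

alias GithubWebscraping.Schemas.GithubFile

# Kernel.struct/2 builds the struct straight from a keyword list.
file =
  struct(GithubFile,
    file_url: "https://github.com/owner/repo/blob/master/mix.exs",
    file_name: "mix.exs",
    extension: ".exs",
    file_bytes: 1500.0,
    file_lines: 63
  )
```

`struct!/2` is the stricter variant: it raises on keys the struct does not define, whereas `struct/2` silently ignores them.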
New file, hunk `@@ -0,0 +1,10 @@`:

```elixir
defmodule GithubWebscrapingWeb.GithubWebscrapingController do
  use GithubWebscrapingWeb, :controller

  alias Lib.GithubWebscraping

  def index(conn, %{"github_url" => github_url}) do
```
> **Reviewer:** Should this route really be `index`? Or should it be `show`, receiving the URL?
```elixir
    files = GithubWebscraping.get_repository_infos(github_url)
    json(conn, files)
  end
end
```
This file was deleted.

> **Reviewer:** This module receives an HTML file, right? Looking at the functions, maybe it could just be called `File`. `HTMLFile.get_name`