Reborn digital i black box – wpływ procesu archiwizacji na zasób archiwów Webu

Konopa, Bartłomiej

doi:AKZ.2019.008

Article details

Journal

Archiwa – Kancelarie – Zbiory

2019 | 10(12) | 147-167

Article title

Reborn digital i black box – wpływ procesu archiwizacji na zasób archiwów Webu

Authors

Bartłomiej Konopa

Content

Full texts:

https://apcz.umk.pl/czasopisma/index.php/AKZ/article/view/AKZ.2019.008 [remote]

Title variants

Reborn digital and black box – impact of archiving processes on holdings of Web archives

Languages of publication

PL EN

Abstracts

PL

W artykule podjęte zostały rozważania nad ogólną charakterystyką zasobów znajdujących się w różnorodnych archiwach Webu. Zrozumienie problemu postawionego w tytule wydaje się być kluczowe dla refleksji nad tym nowym rodzajem źródeł oraz wykorzystaniem ich w późniejszych badaniach. Użytkownik chcący zagłębić się w dawną Sieć musi wiedzieć, co przechowują tego rodzaju cyfrowe repozytoria i jaki jest charakter tych zbiorów. Problem ten został przedstawiony na dwóch płaszczyznach, które wynikają z dwóch etapów archiwizacji Webu – selekcji i gromadzenia. Pierwszy aspekt – teoretyczny zależy przede wszystkim od gromadzenia zasobów metodą harvestingu, czyli z wykorzystaniem crawlerów. Ich możliwości oraz ograniczenia przekładają się na to, co zostanie zarchiwizowane i jaka będzie tego postać. Należy odnotować fakt, iż prowadzi to do pewnego przekształcenia zasobów Sieci, a więc po zarchiwizowaniu nie będą już one dokładnie tym, czym były wcześniej. Drugi aspekt – praktyczny jest efektem selekcji, a więc wszystkich decyzji podejmowanych przez pracowników archiwum przed rozpoczęciem gromadzenia. Zaliczyć można do nich m.in. określenie celu i zakresu archiwizacji oraz wybór strategii pozwalających je realizować. W tekście przedstawione zostały dwie podstawowe metody – archiwizacja masowa oraz selektywna. Znaczącym utrudnieniem dla użytkowników archiwów Webu jest brak informacji dotyczący stosowanych kryteriów selekcji lub logów crawlera. Zasoby dawnej Sieci mogą stanowić pewnego rodzaju zagadkę, ponieważ nie zawsze można wskazać, co się w nich znalazło, a co nie, i jaka była tego przyczyna.

EN

The article considers the general characteristics of the holdings of various Web archives. Understanding the problem posed in the title seems crucial for reflecting on this new type of source and for using it in later research. A user who wants to explore the old Web must know what this type of digital repository stores and what characterizes its holdings. The problem is presented on two levels, which follow from two stages of Web archiving: selection and acquisition. The first aspect, theoretical in character, depends mostly on acquiring resources by harvesting, that is, with crawlers. Their capabilities and limitations determine what will be archived and in what form. It should be noted that this leads to a certain transformation of Web resources, so that after archiving they are no longer exactly what they were before. The second aspect, practical in character, is the effect of selection, i.e. all decisions made by the archive's staff before acquisition begins. These decisions include, among others, specifying the aim and scope of archiving and choosing strategies to accomplish them. The text presents two basic strategies: mass archiving and selective archiving. A significant obstacle for users of Web archives is the lack of information about the selection criteria applied or about crawler logs. The holdings of the old Web can be something of a mystery, because it is not always possible to say what they contain, what they do not, and why.

Keywords

PL

archiwizacja Webu; archiwa Webu; źródła cyfrowe; zasoby cyfrowe; historia Webu; reborn digital; black box

EN

Web archiving; Web archives; digital sources; digital collections; Web history; reborn digital; black box

Publisher

Uniwersytet Mikołaja Kopernika w Toruniu. Wydawnictwo UMK

Contributors

author

Bartłomiej Konopa

bartlomiejkonopa@gmai.com

Archiwum Państwowe w Bydgoszczy

References

„About DACHS | DACHS | East Asian Library”. Dostęp 26.08.2019. https://www.zo.uni-heidelberg.de/boa/digital_resources/dachs/about_en.html.
AlSum, Ahmed, Michele C. Weigle, Michael L. Nelson, i Herbert Van de Sompel. „Profiling Web Archive Coverage for Top-Level Domain and Content Language”. International Journal on Digital Libraries 14, nr 3–4 (sierpień 2014): 149–66. https://doi.org/10.1007/s00799-014-0118-y.
Archive-It. „About Us”. Dostęp 26.08.2019. https://archive-it.org/blog/learn-more/.
Archive-It. „Harvard University Archives”. Dostęp 26.08.2019. https://archive-it.org/organizations/935.
Archive-It. „MIT Libraries”. Dostęp 26.08.2019. https://archive-it.org/home/MIT.
„Archive Team Collections.” Dostęp 26.08.2019. https://archive.org/details/archiveteam?tab=about.
Ben-David, Anat, i Adam Amram. „The Internet Archive and the Socio-Technical Construction of Historical Facts”. Internet Histories 2, nr 1–2 (3 kwiecień 2018): 179–201. https://doi.org/10.1080/24701475.2018.1455412.
Bodleian Libraries. „BEAM: Bodleian Libraries’ Web Archive”. Dostęp 26.08.2019. https://www.bodleian.ox.ac.uk/beam/webarchive.
„Browse DACHS | DACHS | East Asian Library”. Dostęp 26.08.2019. https://www.zo.uni-heidelberg.de/boa/digital_resources/dachs/browse_en.html.
Brügger, Niels. Archiving Websites: general Considerations and Strategies. Aarhus: The Centre for Internet Research, 2005. http://cfi.au.dk/fileadmin/www.cfi.au.dk/publikationer/archiving_underside/archiving.pdf.
Brügger, Niels. „Web Archiving – Between Past, Present, and Future.” W Handbook of Internet Studies, zredagowali Mia Consalvo, Charles Ess, 24–42. Oxford, UK: Wiley-Blackwell, 2011.
Brügger, Niels. „Web Historiography and Internet Studies: Challenges and Perspectives”. New Media & Society 15, nr 5 (sierpień 2013): 752–64. https://doi.org/10.1177/
Brügger, Niels. „Wenn Das Web Vergangenheit Wird: Web-Geschichtsschreibung, Digitale Geschichte Und Internet-Forschung / When the Present Web Is Later the Past: Web Historiography, Digital History and Internet Studies”. Historical Social Research 37, No. 4 (2012): 102–117. https://doi.org/10.12759/HSR.37.2012.4.102-117.
Columbia University Libraries. „Web Archives at Columbia.” Dostęp 26.08.2019. https://library.columbia.edu/collections/web-archives.html.
Common Crawl. „In a Nutshell, Here’s Who We Are.” Dostęp 26.08.2019. https://commoncrawl.org/about/.
Costa, Miguel, i Mário J. Silva. „Evaluating Web Archive Search Systems”. W Web Information Systems Engineering – WISE 2012, zredagowali X. Sean Wang, Isabel Cruz, Alex Delis, i Guangyan Huang, 440–454. Berlin, Heidelberg: Springer Berlin Heidelberg, 2012. https://doi.org/10.1007/978-3-642-35063-4_32.
„DACHS – Leiden: The Digital Archive for Chinese Studies, Leiden Division - Homepage”. Dostęp 26.08.2019. https://projects.zo.uni-heidelberg.de/archive2/DACHS_Leiden/.
„End of Term Web Archive: U.S. Government Websites”. Dostęp 26.08.2019. http://eotarchive.cdlib.org/.
European University Institute. „About the Web Archive of the EU Institutions”. Dostęp 26.08.2019. https://www.eui.eu/Research/HistoricalArchivesOfEU/WebsitesArchivesofEUInstitutions.aspx.
Geeraert, Friedel, i Sébastien Soyez. „The first steps towards a Belgian web archive: a federal strategy.” Dostęp 26.08.2019. http://netpreserve.org/ga2019/wp-content/uploads//07/IIPCWAC2019-FRIEDEL_GEERAERT__SEBASTIEN_SOYEZ-The_first_steps_towards_a_Belgian_web_archive-a_federal_strategy.pdf.
Holub, Karolina, i Ingeborg Rudomino. „A decade of web archiving in the National and University Library in Zagreb.” Dostęp 26.08.2019. http://library.ifla.org/1092/1/090-holub-en.pdf.
International Organization for Standardization. Information and documentation – Statistics and quality issues for web archiving. ISO/TR 14873. Genewa: ISO, opublikowana 01.12.2013.
„Internet Archive: About IA”. Dostęp 26.08.2019. https://archive.org/about/.
Keskitalo, Esa-Pekka. Web Archiving in Finland: memorandum for the members of the CDNL. 2010. http://www.doria.fi/bitstream/handle/10024/67051/webarchivingfinland_cdnl.pdf.
Koninklijke Bibliotheek. „Selection.” Dostęp 26.08.2019. https://www.kb.nl/en/organisation/research-expertise/long-term-usability-of-digital-resources/web-archiving/selection.
Konopa, Bartłomiej. „Archiwa Internetu jako nowe bazy źródłowe”. Archiwa – Kancelarie – Zbiory 9(11) (2018): 49–62. https://doi.org/10.12775/AKZ.2018.003.
Król, Karol. „Z archiwów internetu: zmiany w sposobie prezentacji oferty agroturystycznej.” Marketing i Rynek 24, nr 11 (2017): 19–27. http://homeproject.pl/wp-content/uploads/2018/12/Krol_MiR_11_2017_NR.pdf.
Library of Congress. „Archived Websites | Web Archiving | Programs at the Library of Congress | Library of Congress”. Dostęp 26.08.2019. https://www.loc.gov/programs/web-archiving/archived-websites/.
Masanès, Julien. „Selection for Web Archives.” W Web Archiving, zredagował Julien Masanès, 71–91. Berlin, Heidelberg: Springer Berlin Heidelberg, 2006.
Masanès, Julien. „Web Archiving Methods and Approaches: A Comparative Study”. Library Trends 54, nr 1 (2005): 72–90. https://doi.org/10.1353/lib.2006.0005.
Milligan, Ian. „Lost in the Infinite Archive: The Promise and Pitfalls of Web Archives”. International Journal of Humanities and Arts Computing 10, nr 1 (marzec 2016): 78–94. https://doi.org/10.3366/ijhac.2016.0161.
Nacionalna i sveučilišna knjižnica u Zagrebu, National and University Library in Zagreb, i University Computing Centre Zagreb Sveučilišni računski centar (Srce). „Hrvatski arhiv weba, HAW.” Dostęp 26.08.2019. http://haw.nsk.hr/en.
Nacionalna i sveučilišna knjižnica u Zagrebu, National and University Library in Zagreb, i University Computing Centre Zagreb Sveučilišni računski centar (Srce). „Thematic harvesting.” Dostęp 26.08.2019. http://haw.nsk.hr/en/thematic-harvestings.
National Diet Library. „Archiving Internet Information.” Dostęp 26.08.2019. https://www.ndl.go.jp/en/collect/internet/index.html.
Netarkivet. „Selektive høstninger.” Dostęp 26.08.2019. http://netarkivet.dk/om-netarkivet/selektive-hostninger_2016/.
Nielsen, Janne. Using Web Archives in Research: an Introduction. Aarhus: NetLab, 2016. http://www.netlab.dk/wp-content/uploads/2016/10/Nielsen_Using_Web_Archives_in_Research.pdf.
„Ondarenet”. Dostęp 26.08.2019. http://www.ondarenet.kultura.ejgv.euskadi.eus:8085/ondarenet/.
Pamuła-Cieślak, Natalia. „Ukryty Internet – nowe podejście.” W Oblicza przestrzeni informacyjnej w dobie Web 2.0, zredagowali Katarzyna Domańska, Ewa Głowacka i Paweł Marzec, 35–48. Bydgoszcz: Wydawnictwo Uniwersytetu Kazimierza Wielkiego, 2016.
Padicat. „Mission and objectives.” Dostęp 26.08.2019. https://www.padicat.cat/en/about-us/what-padicat/mission-and-objectives.
Padicat. „Monographics.” Dostęp 26.08.2019. https://www.padicat.cat/en/search-and-discover/monographics.
Schostag, Sabine, i Eva Fønss-Jørgensen. „Webarchiving: Legal deposit of internet in Denmark: a curatorial perspective.” Microform & Digitization Review 41, nr 3–4 (2012): 110–120.
Spaniol, Marc, Dimitar Denev, Arturas Mazeika, Gerhard Weikum, i Pierre Senellart. „Data Quality in Web Archiving”. W WICOW '09 Proceedings of the 3rd workshop on Information credibility on the web, 19–26. Nowy Jork: ACM Press, 2009. https://doi.org/10.1145/1526993.1526999.
Summers, Ed, i Ricardo Punzalan. „Bots, Seeds and People: Web Archives as Infrastructure”. W Proceedings of the 2017 ACM Conference on Computer Supported Cooperative Work and Social Computing - CSCW ’17, 821–834. Portland, Oregon, USA: ACM Press, 2017. https://doi.org/10.1145/2998181.2998345.
The British Library. „UK Web Archive”. Dostęp 26.08.2019. https://www.bl.uk/collection-guides/uk-web-archive.
The National Archives. „UK Government Web Archive”. Dostęp 26.08.2019. http://www.nationalarchives.gov.uk/webarchive/.
The National Archives, Washington D.C. „Congressional & Federal Government Web Harvests.” Dostęp 26.08.2019. https://www.webharvest.gov/.
Thouvenin, Florent, Peter Hettich, Herbert Burkert, i Urs Gasser. Remembering and Forgetting in the Digital Age. T. 38. Law, Governance and Technology Series. Cham: Springer International Publishing, 2018. https://doi.org/10.1007/978-3-319-90230-2.
Trove. „Australian Web Archive.” Dostęp 26.08.2019. https://trove.nla.gov.au/website.
UK Web Archives. „Topics and Themes.” Dostęp 26.08.2019. https://www.webarchive.org.uk/en/ukwa/collection.
UNT Libraries. „CyberCemetery Home.” Dostęp 26.08.2019. https://govinfo.library.unt.edu/.
Vernalte, Francisca P., i Sonia M. Maciá. „Capturing the Basque Web.” Dostęp 26.08.2019. http://eprints.rclis.org/13164/1/EN_Lida_paper_Ondarenet_APA.pdf.
Web Archive Singapore. „Frequently asked questions.” Dostęp 26.08.2019. http://eresources.nlb.gov.sg/webarchives/faq.
Web Archive Singapore. „Special collections.” Dostęp 26.08.2019. http://eresources.nlb.gov.sg/webarchives/special-collection.
„Web Archiving Project (WARP)”. Dostęp 26.08.2019. http://warp.da.ndl.go.jp/?_lang=en.
„Wikimedia Foundation Collections.” Dostęp 26.08.2019. https://archive.org/details/wikimediadownloads?tab=collection.

Identifiers

DOI

AKZ.2019.008

YADDA identifier

bwmeta1.element.desklight-95cd8f0d-bfb0-493c-9eb3-fe06041b1371
