@misc{zhang2024autocoderover,
title={AutoCodeRover: Autonomous Program Improvement},
author={Yuntong Zhang and Haifeng Ruan and Zhiyu Fan and Abhik Roychoudhury},
year={2024},
eprint={2404.05427},
archivePrefix={arXiv},
primaryClass={cs.SE}
}
A SequenceInputStream represents the logical concatenation of other input streams. It starts out with an ordered collection of input streams and reads from the first one until end of file is reached, whereupon it reads from the second one, and so on, until end of file is reached on the last of the contained input streams.
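The concatenation behaviour described above can be sketched as follows. This is a minimal illustration, assuming Java 9 or later for `InputStream.readAllBytes`; the class and method names (`SequenceInputStreamSketch`, `readConcatenated`) are hypothetical and exist only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.SequenceInputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical demo class; only SequenceInputStream itself comes from the text above.
final class SequenceInputStreamSketch {

    // Wraps each part in its own stream, then reads them back as one logical stream, in order.
    static String readConcatenated(String... parts) {
        List<InputStream> streams = new ArrayList<>();
        for (String part : parts) {
            streams.add(new ByteArrayInputStream(part.getBytes(StandardCharsets.UTF_8)));
        }
        // SequenceInputStream accepts an Enumeration of the streams to chain.
        try (InputStream in = new SequenceInputStream(Collections.enumeration(streams))) {
            // readAllBytes keeps reading until the last contained stream reports end of file.
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readConcatenated("first|", "second|", "third"));
    }
}
```

Closing the SequenceInputStream also closes the contained streams, which is why a single try-with-resources is enough here.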
pgloader will keep a separate file of rejected data, but continue trying to copy good data into your database.
pgloader also implements data reformatting; a typical example is the transformation of the MySQL datestamps 0000-00-00 and 0000-00-00 00:00:00 to the PostgreSQL NULL value.
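As a rough illustration of the zero-date reformatting mentioned above, the transformation amounts to mapping two sentinel strings to NULL and passing everything else through. The sketch below is not pgloader code; `ZeroDateSketch`, `zeroDateToNull`, and `reformatColumn` are hypothetical names, and Java `null` stands in for the PostgreSQL NULL value.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, written only to illustrate the idea; pgloader itself is configured, not called like this.
final class ZeroDateSketch {

    // Maps the two MySQL zero datestamps to null and leaves every other value unchanged.
    static String zeroDateToNull(String value) {
        if ("0000-00-00".equals(value) || "0000-00-00 00:00:00".equals(value)) {
            return null;
        }
        return value;
    }

    // Applies the transformation to every value of a column.
    static List<String> reformatColumn(List<String> values) {
        List<String> reformatted = new ArrayList<>();
        for (String value : values) {
            reformatted.add(zeroDateToNull(value));
        }
        return reformatted;
    }
}
```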
A very common workflow is to index some data based on its embeddings and then given a new query embedding retrieve the most similar examples with k-Nearest Neighbor search. For example, you can imagine embedding a large collection of papers by their abstracts and then given a new paper of interest retrieve the most similar papers to it.
TLDR in my experience it ~always works better to use an SVM instead of kNN, if you can afford the slight computational hit
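For reference, the kNN retrieval step described above might look like the following brute-force sketch. It assumes cosine similarity as the metric and non-zero embedding vectors, neither of which the text specifies; the names `KnnSketch`, `cosine`, and `nearest` are hypothetical. It shows only the kNN baseline, not the SVM alternative.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical brute-force kNN over an in-memory index of embeddings.
final class KnnSketch {

    // Cosine similarity between two vectors of equal length (assumed non-zero).
    static double cosine(double[] a, double[] b) {
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Returns the row indices of the k embeddings most similar to the query, best first.
    static List<Integer> nearest(double[][] index, double[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < index.length; i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble((Integer i) -> cosine(index[i], query)).reversed());
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }
}
```

Sorting the whole index is the simplest way to show the idea; a large collection would normally use a partial sort or an approximate index instead.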
["slug" being an entity attribute]
Spring Data offers an existsBy query method, which we can define in the PostRepository, as follows:
@Repository
public interface PostRepository
        extends JpaRepository<Post, Long> {

    boolean existsBySlug(String slug);
}
[another] option to emulate existence is using a CASE WHEN EXISTS native SQL query:
@Repository
public interface PostRepository
        extends JpaRepository<Post, Long> {

    @Query(value = """
        SELECT
            CASE WHEN EXISTS (
                SELECT 1
                FROM post
                WHERE slug = :slug
            )
            THEN 'true'
            ELSE 'false'
            END
        """,
        nativeQuery = true
    )
    boolean existsBySlugWithCase(@Param("slug") String slug);
}
@Repository
public interface PostRepository extends BaseJpaRepository<Post, Long> {

    @Query("""
        select p
        from Post p
        where date(p.createdOn) >= :sinceDate
        """
    )
    @QueryHints(
        @QueryHint(name = AvailableHints.HINT_FETCH_SIZE, value = "25")
    )
    Stream<Post> streamByCreatedOnSince(@Param("sinceDate") LocalDate sinceDate);
}
The FETCH_SIZE JPA query hint is necessary for PostgreSQL and MySQL to instruct the JDBC Driver to prefetch at most 25 records. Otherwise, the PostgreSQL and MySQL JDBC Drivers would prefetch all the query results prior to traversing the underlying ResultSet.
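At the plain JDBC level, the fetch size hint roughly corresponds to `Statement.setFetchSize`. The sketch below is an assumption-laden illustration, not the code Spring Data or Hibernate generates: the class `FetchSizeSketch`, the method `countPostsSince`, and the `post.created_on` column name are hypothetical, and the query is simplified relative to the JPQL above. Disabling auto-commit is included because the PostgreSQL driver only fetches in batches inside a transaction; other drivers may need their own settings.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.time.LocalDate;

// Hypothetical JDBC-level counterpart of the fetch size query hint.
final class FetchSizeSketch {

    // Same value as the query hint in the repository method above.
    static final int FETCH_SIZE = 25;

    // Walks the result set row by row while asking the driver to fetch 25 rows at a time.
    static long countPostsSince(Connection connection, LocalDate sinceDate) throws SQLException {
        // Assumption: batched fetching needs an open transaction (true for the PostgreSQL driver).
        connection.setAutoCommit(false);
        String sql = "SELECT id FROM post WHERE created_on >= ?"; // hypothetical table and column
        try (PreparedStatement statement = connection.prepareStatement(sql)) {
            statement.setFetchSize(FETCH_SIZE);
            statement.setObject(1, sinceDate);
            try (ResultSet rows = statement.executeQuery()) {
                long count = 0;
                while (rows.next()) {
                    count++;
                }
                return count;
            }
        }
    }
}
```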
E. Pinheiro, W. Weber, and L. Barroso. Proceedings of the 5th USENIX Conference on File and Storage Technologies, page 2--2. Berkeley, CA, USA, USENIX Association, (2007)
T. Reps, S. Horwitz, and M. Sagiv. Proceedings of the 22nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, page 49--61. New York, NY, USA, ACM, (1995)
W. Scholz, T. Thüm, S. Apel, and C. Lengauer. Proceedings of the 15th International Software Product Line Conference, Volume 2, page 7:1--7:8. New York, NY, USA, ACM, (2011)
H. Kim, J. Choi, D. Choi, H. Choi, and P. Kim. Proceedings of the 2012 ACM Research in Applied Computation Symposium, page 310--315. New York, NY, USA, ACM, (2012)
G. Mishne and M. de Rijke. Proceedings of the 7th International Conference on Computer-Assisted Information Retrieval (Recherche d'Information et ses Applications) - RIAO 2004, page 539--554. CID, (April 2004)
H. Traunmüller. Experiments in speech processes, volume XII of Phonetic Experimental Research at the Institute of Linguistics, The Institute of Linguistics, University of Stockholm, Stockholm, (1991)
S. Gulwani and M. Marron. Proceedings of the 2014 ACM SIGMOD International Conference on Management of Data, page 803--814. New York, NY, USA, ACM, (2014)
B. Sacaleanu and G. Neumann. Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC-2012), European Language Resources Association (ELRA), (2012)
M. Schuhmacher and S. Ponzetto. Proceedings of the 7th ACM International Conference on Web Search and Data Mining, page 543--552. New York, NY, USA, ACM, (2014)
D. Pavlovic, P. Pepper, and D. Smith. Mathematics of Program Construction, volume 6120 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, (2010)
M. Baroni and R. Zamparelli. Proceedings of the 2010 Conference on Empirical Methods in Natural Language Processing, page 1183--1193. Stroudsburg, PA, USA, Association for Computational Linguistics, (2010)